Minutes of July 16, 2003 Meeting

Present: Isaiah Beard (recorder), Anne Butman, Tom Frusciano, Judy Gardner, Michael Giarlo, Nick Gonzaga, Dave Hoover, Patrick Huey, Ron Jantz, Linda Langschied, Sam McDonald, Ann Montanaro, Lynn S. Mullins, Robert E. Nahory, Jeffrey Triggs, Karen Wenk, Yu Yang (all group members present)

1. Review of Digital Project Process - AM

The Digital Project Process (see attachment 1) was presented and reviewed by the group to assess where DAWG stands in its ability to evaluate and support a digital project if an individual or group were to present one.

Re: Considerations for project acceptance - In discussion, KW suggested that two review streams may need to be developed: one to approve content and one to approve the project based on collection criteria, support requirements, etc. AM stressed that weight should also be given to a project's value in showing what Rutgers has to offer the community.

Re: Post-acceptance - LL raised the issue of student labor and where funding for it will come from. Availability of equipment, scheduling, and file storage are also important factors.

Re: Data preservation - RJ posed the question: what gets preserved and what doesn't? Preservation of all data is a massive undertaking, and we will need to determine what should be archived and preserved and what should not be archived once support is dropped. DH and IB suggested the possibility of placing archival responsibilities in the hands of requestors in cases where we choose not to back up using Fedora and the mass storage device.

2. Discussion of goal for Mass Storage Device (MSD) - DH

DH reviewed discussions with Grace Agnew on what role the mass storage device would play in archiving and backing up data. Per the original concept, the MSD would primarily store metadata and objects. Other individual server components (OS, interface, server apps) would be backed up locally. In such a case, Fedora would act as the store-and-use interface through which the MSD would be accessed.

Per group suggestions, there should also be directly accessible partitions for server backups. The MSD will not be directly accessible to servers that do not share its local network, but options such as FTP can be used to transfer files directly to these partitions.

Consensus is that the Fedora interface is a preservation platform for external applications, as well as a native application environment for new projects.

3. Book Object Structure - RJ

RJ presented several models for the Book/Complex Object Structure (see attachment 2), along with pros and cons for each model. Models vary based on datastreams, etc., and some were based on models suggested by the University of Virginia.

Based on group discussion, Book Object Example 1b seems to best meet our needs.

[Note: Post-meeting, RJ submitted a revised model, Example 1c, which has been included in attachment 2. Object model 1c incorporates some of the changes to 1b that were discussed in the meeting. 1c should be used as the preferred model for books until we learn more from experimentation.]

4. Workflow Update - PH

Development of the workflow continues. Feedback is currently being sought from RJ, and PH is waiting for a list of metadata fields to incorporate.

5. Fedora Update - JT

Fedora 1.0 is running. Two interfaces are currently available: a search page for end users, and a management page to upload, expunge and manipulate objects. Photographs and simple objects have been ingested, and JT is now looking for complex objects to test with.

For the next meeting, JT is to test items for Book Object Structure 1b using Fedora, including full text searching. JT is to use books from the Realiti project for testing purposes.

6. Scanning Standards Update - IB

Several draft documents are ready. A one-page standards summary detailing the image specification requirements for NJDH was distributed, along with a document summarizing quality control factors and workflow, and a third document outlining environmental and basic hardware requirements for lab setup. Group members were asked to review these documents and provide feedback and suggestions.

Next Meeting: Wednesday, August 20, 9:30 a.m. in the Heyer Conference Room.

Attachment 1

Digital Project Process

  1. An individual or group wants to begin a digital project -
    • To what body is the idea presented?
    • What processes are in place to decide if the project will be accepted?
  2. Once the project is accepted -
    • What information is needed to begin?
    • How will the metadata scheme be determined?
    • Are the scanning standards ready for use?
    • What equipment is available for use?
    • Is the workflow process in place?
    • Who are the contacts from the project side and the implementation side?
  3. Once the project work is about to begin -
    • What does a programmer need to know or have to do to prepare for a new project?
    • What decisions have to be made to be ready to begin handling a new project?
    • When is the persistent ID assigned?
  4. Once the work begins on the project -
    • Is the workflow evaluated? If so, by whom?
    • Who determines if the processes (scanning and metadata) are being followed?
    • Is the integrity of the data checked?
  5. While the work is in progress but before it is publicly available -
    • Will the project participants have access to their images and data?
    • Can they change and/or delete images and data?
    • Is there a testing mechanism for indexing and displays?
    • Is there a sign-off for the project manager?
    • Are there any projects that will not be publicly accessible? If so, how are the restrictions handled?
  6. Making the project accessible -
    • What changes are needed to the web pages?
    • Does anyone have to approve the changes?
    • Who makes the changes?
    • Are projects made available as ready or is there a specific time-table for new work?
  7. Backups -
    • Will all of the projects be housed on the mass storage device?

Attachment 2

Book Object Example 1
(Simple object with only two datastreams)

Notes:

  1. Easy to manage. For example, only one object ID to delete.
  2. Only two datastreams. Tiffs not included, so probably not good for preservation (unless we wanted to use pdf as a preservation format).
  3. Reduced flexibility. For example, you can't create another object that points to one of this book's chapters. Probably inconsistent with UVa approach.
  4. OCRed text not available for full text searching.

Book Object Example 1a
(Simple object with multiple datastreams)

Notes:

  1. Easy to manage. For example, only one object ID to delete.
  2. For a book with 200 pages, we have 602 datastreams. What are the performance or capacity issues?
  3. Reduced flexibility. For example, you can't create another object that points to one of this book's chapters. Probably inconsistent with UVa approach.
  4. Is this approach easier to index with tools like Amberfish?

Book Object Example 1b
(Simple object with multiple datastreams, separate objects for tiffs)

Notes:

  1. This approach separates the presentation formats from the preservation format.
  2. For a book with 200 pages, we have (only) 403 datastreams. What are the performance or capacity issues?
  3. Reduced flexibility. For example, you can't create another object that points to one of this book's chapters, unless you used the tiff objects.
  4. More difficult to ingest. You have to ingest tiff objects before you can create the book object.

Book Object Example 1c (Simple object with multiple datastreams, separate objects for tiffs)

Notes:

  1. This approach separates the presentation formats from the preservation format.
  2. For a book with 200 pages, we have only 4 datastreams.
  3. Additional dynamic objects (e.g. for a specific chapter) could be created from the separate tiff page objects.
  4. With djvu server and a new disseminator, we can address individual djvu pages.
  5. Given that Fedora pre-empts the use of the structure map in the metadata section, the added XML stream here provides the structmap.
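
The notes for Examples 1a, 1b and 1c give only total datastream counts for a 200-page book (602, 403 and 4). As a rough sketch of how those totals scale with book length, the Python below assumes each total is a number of per-page datastreams times the page count, plus a fixed number of book-level datastreams. The per-page/fixed splits are not stated in the minutes; they are hypothetical values chosen only because they reproduce the quoted totals, and the function and variable names are illustrative.

```python
def datastream_count(pages, per_page, fixed):
    """Total datastreams in one book object: per-page streams plus book-level streams."""
    return pages * per_page + fixed


# ASSUMPTION: (per_page, fixed) splits are not given in the minutes. These values
# are picked only because they reproduce the totals quoted for a 200-page book.
ASSUMED_SPLITS = {
    "1a": (3, 2),  # 3 * 200 + 2 = 602
    "1b": (2, 3),  # 2 * 200 + 3 = 403
    "1c": (0, 4),  # constant: 4 datastreams regardless of page count
}


def counts_for(pages):
    """Return the datastream total for each model under the assumed splits."""
    return {
        model: datastream_count(pages, per_page, fixed)
        for model, (per_page, fixed) in ASSUMED_SPLITS.items()
    }


if __name__ == "__main__":
    for model, total in counts_for(200).items():
        print(f"Example {model}: {total} datastreams")
```

Under these assumed splits, the counts for 1a and 1b grow linearly with the number of pages, while 1c stays at 4 for any book length, which is consistent with the performance and capacity questions raised for 1a and 1b.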

Book Object Example 2 (complex object - separate objects for page images)

Notes:

  1. Flexibility in generating other objects (for example, a chapter object for a course). Similar to what UVa was suggesting.
  2. More complex structure seems to create some difficult management problems.
  3. Need to find and use tools for xml query and path tracing (xpath, xquery).
  4. More difficult to ingest. You have to ingest tiff objects before you can create the book object.


Issues and Questions

URL: http://www.libraries.rutgers.edu/rul/staff/groups/dig_infrastructure/minutes/dawg_03_07_16.shtml