Digital Preservation: A Case Study Analysis

ARTICLE SYNOPSIS

The research article From Accession to Access: A Born-Digital Materials Case Study, by Cyndi Shein (2014), documents a project undertaken by staff at the J. Paul Getty Trust Institutional Archives to accession, ingest, process, and provide access to a hybrid collection of paper and born-digital audiovisual materials. The collection consisted of recorded and transcribed oral history interviews that the institution inherited as part of Pacific Standard Time: Art in Los Angeles, an art exhibit funded by grant money from the Getty Research Institute.

When this collection was transferred to the J. Paul Getty Trust Institutional Archives in 2011, it was the institution’s first experience managing a collection in which the bulk of the transferred content consisted of born-digital objects. As Shein notes, the repository faced several complicating factors with the arrival of this collection. The first was that the decision to send the material to the Getty institutional archives came well after the grant project had been planned and completed, meaning that archivists within the institution were never given the opportunity to provide input on the content before or during its creation. The second was the institution’s lack of formal policy and procedure for accessioning, processing, and providing access to a collection consisting heavily of born-digital objects. The final complicating factor described by Shein was an expectation from the administration of the Getty Institute that these materials be made available online immediately after being deposited with the institutional archives. This latter complication was made all the more difficult by the fact that not all of the born-digital materials submitted to the archives came with rights information authorizing the Getty institutional archives to make content freely available online.

The archivists at the J. Paul Getty Trust Institutional Archives began their work by performing a literature review of similar born-digital projects, with the aim of gaining insight and guidance on how to proceed. However, they discovered that the vast majority of available case studies presented situations and contexts that were completely unlike the one facing the relatively small J. Paul Getty Trust Institutional Archives. As Shein notes, most of the available literature describes work performed on major projects in institutional settings where technical expertise and funding were much more readily available than at the Getty institutional archives. As a result, this case study seeks to provide a template for strategies that might be used to fulfill mandates and administrative expectations on a small scale and at relatively low cost.

PRESERVATION STRATEGIES, POLICIES, TECHNOLOGIES, AND/OR METHODOLOGIES USED

In the course of handling materials in the Pacific Standard Time: Art in Los Angeles collection, the small staff of the J. Paul Getty Trust Institutional Archives developed and utilized a host of digital preservation strategies and techniques. These activities can be broadly classified under three distinct groups: actions related to ingest, actions related to processing, and actions related to access.

While the archives had some limited experience handling digital objects in the past, this was the first case in which a systematic approach needed to be developed and employed to ensure proper handling and ingest of digital objects into the permanent repository. A major challenge faced by the archivists was that the individual digital objects that made up the collection arrived at the archives with varying levels of accompanying rights information. As a result, a policy was drafted that separated materials in the collection by level of available rights documentation. Materials with clear rights statements were given priority for ingest and processing, while those without were given lower priority (with the stipulation that they would not be posted online and would be made available for use onsite only). Another ingest strategy was the development of an ad hoc format standard for all materials entering the digital preservation environment. Archivists determined appropriate stable formats for materials and converted collection materials into those formats before ingest. This action became part of a larger workflow that emerged within the archives, which included virus checking materials, removing materials from the media they were transferred on (typically optical discs and hard drives), and generating hash checksums, all prior to ingest. A final digital preservation action related to the acquisition and ingest of the materials in this case study was an institutional decision to severely restrict access to the digital preservation server where the digital objects were ingested and preserved.
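The checksum and rights-triage steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow the Getty staff actually ran: the article as summarized here does not name a hash algorithm or a tool, so SHA-256, the function names, and the `rights_cleared` field are all hypothetical choices made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum before ingest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    failures = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel)
    return failures


def ingest_order(items: list[dict]) -> list[dict]:
    """Place items with clear rights statements first (assumed policy field).

    sorted() is stable, so items keep their original order within each group.
    """
    return sorted(items, key=lambda item: not item["rights_cleared"])
```

In use, `build_manifest` would be run once on the files copied off the transfer media, and `verify_manifest` run again after the copy to the preservation server; an empty list means every file arrived intact.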

Once the digital objects from the Pacific Standard Time: Art in Los Angeles collection were stabilized and ingested into the institutional archives’ digital preservation system, staff needed to develop and enact processing procedures that would directly enhance discovery of and access to these materials by users. Due to the unique nature of the collection, and the administrative demands to make it available online as soon as possible, the archivists assigned to the project developed and implemented a modified more product, less process (MPLP) approach for the materials. MPLP has gained traction as a preferred method for processing analog collections since it was first proposed by Greene and Meissner (2005) in their seminal article in The American Archivist. Archivists processing the digital objects in this case study used the Archivists’ Toolkit to manually enter descriptive metadata for individual objects in the collection. Metadata (both extracted and manually entered) was wrapped in a METS wrapper for each digital object and stored separately from the digital object(s), with unique identifiers serving as the means of linking each object to its corresponding metadata. Archivists also embedded some descriptive metadata elements within the objects themselves at the file level.
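To make the idea of a METS wrapper that is stored apart from the object but linked to it by identifier more concrete, here is a minimal sketch using Python's standard library. It is a simplified illustration and has not been validated against the METS schema. The use of Dublin Core for the descriptive section, the identifier format, the file path, and the function name `build_mets` are assumptions made for the example; the summary above does not say which descriptive schema or identifier scheme the Getty archivists used.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(object_id: str, title: str, file_href: str, checksum: str) -> ET.Element:
    """Build a small METS document that points at an externally stored file."""
    dmd_id = f"dmd-{object_id}"    # ID of the descriptive metadata section
    file_id = f"file-{object_id}"  # ID of the file entry

    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})

    # Descriptive metadata, wrapped inline (Dublin Core is an assumption).
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title
    ET.SubElement(xml_data, f"{{{DC_NS}}}identifier").text = object_id

    # File inventory: the object itself lives elsewhere and is referenced by href.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "preservation"})
    file_el = ET.SubElement(
        group,
        f"{{{METS_NS}}}file",
        {"ID": file_id, "DMDID": dmd_id, "CHECKSUM": checksum, "CHECKSUMTYPE": "SHA-256"},
    )
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_href},
    )

    # Structural map tying the description and the file together.
    struct = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct, f"{{{METS_NS}}}div", {"DMDID": dmd_id})
    ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return mets


if __name__ == "__main__":
    # Hypothetical identifier, title, path, and (placeholder) checksum.
    record = build_mets(
        "ia2011-0001",
        "Oral history interview (example)",
        "file:///preservation/ia2011-0001/interview.mov",
        "0" * 64,
    )
    print(ET.tostring(record, encoding="unicode"))
```

The point of the design is that the `OBJID`, the `dc:identifier`, and the file reference all carry the same identifier, so the metadata record can sit in a different location from the object and still be matched back to it.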

Finally, this collection served as a catalyst for the institutional archives in the development and implementation of digital preservation access standards, policies, and procedures. Again, because of the limited amount of time staff had to handle this collection before it was placed online, an institutional decision was made to rely exclusively on descriptive EAD-encoded XML finding aids and tagging for discovery, instead of the more time-intensive MARC encoding that most collections handled by the institution receive. The institutional archives also needed to establish a formal policy to govern the formats of access copies of materials retrieved from the digital preservation system. Lastly, a policy decision was made that making entire videos of oral history interviews available online was too resource intensive for the limited amount of server space available to the institution. Instead, archivists decided to make PDF copies of whole oral history transcripts available along with brief clips from video interviews. The thinking was that a brief clip would allow users to put a face and voice to the interview transcript while staying within the limits of available server storage. It should be noted that the Getty institutional archives does make the complete video files of oral history interviews available to users upon request.

LESSONS LEARNED

Cyndi Shein’s article, From Accession to Access: A Born-Digital Materials Case Study, concludes with a thorough examination of the case study and posits several lessons learned by staff of the J. Paul Getty Trust Institutional Archives. As Shein describes it, these lessons will play a large role in defining (and refining) the institution’s digital preservation program in the years ahead.

One key lesson learned while handling digital objects from the Pacific Standard Time: Art in Los Angeles collection was that not all born-digital materials are created equal. That is, any effective digital preservation program must develop and apply different preservation strategies for different digital objects. Another insight gained from the experience is that increased automation of the tools and services linked to digital preservation efforts is mandatory. Otherwise, digital preservation can become far too cumbersome and labor intensive to carry out effectively. A third important lesson learned by the institutional archives was the need for ongoing and enhanced cross-departmental collaboration. Building bridges in this manner can make the workload both easier to manage and less expensive through the use of shared resources. In this vein, a fourth valuable lesson is the need to use open source tools whenever possible. Doing so in this case helped keep costs manageable. A fifth and final lesson articulated by Shein is the need to form a decision matrix that allows the inherent value and importance of a given collection to determine the appropriate level of action and response by archives staff.

In my own experience as an archivist and fledgling digital preservationist, many of the lessons learned in this case study are extremely valuable and important to keep in mind while moving forward with work at my own institution. First and foremost is the need for institutionalized (and effective) preservation planning. Engaging in this ongoing activity will help any institution arrive at sound policy and procedure for how to deal with particular collections and/or types of digital objects. This activity is also needed to provide transparency to all digital preservation efforts, particularly with regard to the allocation of funding for those efforts. I also found the lesson of seeking out collaboration with disparate (but connected) stakeholders highly valuable. This is a lesson whose importance we are learning at my own institution. The more we collaborate with our information technology section and learn to speak to their concerns, the more they learn about our unique needs and try to respond to them in appropriate ways. Overall, I found the lessons learned from the J. Paul Getty Trust Institutional Archives’ experience in handling digital objects from the Pacific Standard Time: Art in Los Angeles collection to be extremely valuable, particularly to its target audience of archivists and curators at smaller institutions who are attempting to enact digital preservation standards and programming with limited expertise and/or available resources.

SOURCES

Greene, M. A., & Meissner, D. (2005). More Product, Less Process: Revamping Traditional Archival Processing. The American Archivist, 68(2), 208-263.

Shein, C. (2014). From Accession to Access: A Born-Digital Materials Case Study. Journal of Western Archives, 5(1), 1-42.