About the Archive

Methodology and Standards

Technical Summary

During its first phase (1995-2000), the Whitman Archive was developed as a prototype HTML site aimed at making a large amount of material available to users quickly. Since 2000 we have worked exclusively at a more sophisticated and demanding level, adding new content to the Archive only in the form of structured data. To facilitate preservation and searchability, we believe that the textual content of the entire Archive must be XML encoded. XML (Extensible Markup Language) is sometimes called the "acid-free paper of the digital age" because, as a platform-independent and non-proprietary format, it stands a good chance of meeting the demands of long-term electronic preservation. XML encoding will also enable sophisticated searches of Whitman's writings. Users will be able, for example, to construct searches contingent upon a document's date, its poetic structures, and/or its relationships to Whitman's other published and unpublished work.

The grammar of our encoding is established in a Whitman project "document type definition" (DTD), which we have developed as an extension of the Text Encoding Initiative (TEI), the de facto international XML standard for sophisticated electronic scholarly editions. This development has been conducted in cooperation with the University of Virginia's Institute for Advanced Technology in the Humanities (IATH). The Whitman DTD extends the TEI standard to deal with complex issues raised by manuscripts as both material objects and intellectual constructs. Our project's encoding guidelines are posted on the web for our geographically dispersed project staff, for interested users, and for creators of similar projects who may find them useful.
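
As a hedged illustration of a search contingent on a document's date and poetic structure, the Python sketch below filters a small set of TEI-style documents by year and counts their encoded verse lines. The markup is invented and only loosely modeled on TEI P4 elements (`TEI.2`, `teiHeader`, `date`, `lg`, `l`); it is not drawn from the Whitman DTD, and the document ids are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified TEI-style documents; not actual Whitman Archive markup.
DOCS = {
    "doc-a": """<TEI.2><teiHeader><date value="1855"/></teiHeader>
      <text><body><lg type="stanza"><l>First line</l><l>Second line</l></lg></body></text></TEI.2>""",
    "doc-b": """<TEI.2><teiHeader><date value="1860"/></teiHeader>
      <text><body><lg type="stanza"><l>Only line</l></lg><p>Prose note</p></body></text></TEI.2>""",
}

def search(docs, start_year, end_year):
    """Return (doc_id, verse_line_count) for documents dated within [start_year, end_year]."""
    results = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        date = root.find("teiHeader/date")
        if date is None:
            continue  # undated documents cannot match a date-based query
        year = int(date.get("value"))
        if start_year <= year <= end_year:
            # Count <l> elements inside <lg> groups, i.e. encoded verse lines.
            lines = root.findall(".//lg/l")
            results.append((doc_id, len(lines)))
    return results

print(search(DOCS, 1850, 1856))  # [('doc-a', 2)]
```

Because the date and the verse structure are explicit elements rather than visual formatting, both conditions reduce to simple tree lookups.
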
A password-protected document tracking database, specifically tailored to our project, enables us to manage the flow of work as we assign documents and track their progress through the stages of transcription, encoding, proofing, and web publication in the form of both digital images and searchable electronic text.
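
The tracking database itself is not described in detail here. As a hypothetical sketch only, the Python code below models one tracked document moving through the four stages named above and records who is assigned at each step; the class, field, and stage names are assumptions, not the project's schema.

```python
from enum import IntEnum

class Stage(IntEnum):
    """Workflow stages named in the text, in order; the numbering is an assumption."""
    TRANSCRIPTION = 1
    ENCODING = 2
    PROOFING = 3
    PUBLISHED = 4

class TrackedDocument:
    """Hypothetical record for one document in the workflow."""

    def __init__(self, doc_id, assignee):
        self.doc_id = doc_id
        self.assignee = assignee
        self.stage = Stage.TRANSCRIPTION
        self.history = [Stage.TRANSCRIPTION]

    def advance(self, new_assignee=None):
        """Move to the next stage, optionally reassigning; refuse to go past publication."""
        if self.stage == Stage.PUBLISHED:
            raise ValueError(f"{self.doc_id} is already published")
        self.stage = Stage(self.stage + 1)
        self.history.append(self.stage)
        if new_assignee is not None:
            self.assignee = new_assignee
        return self.stage

doc = TrackedDocument("hypothetical-ms-001", "transcriber A")
doc.advance("encoder B")
doc.advance("proofreader C")
print(doc.stage.name, doc.assignee)  # PROOFING proofreader C
```

Keeping the stages ordered makes it impossible to skip a step such as proofing before publication.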

Our long-term goal is to encode, and to provide digital images of, all the documents in Whitman's vast oeuvre, including manuscripts, letters, notebooks, daybooks, and published work. These documents will be made available in facsimile (when permissions can be secured), so that users of the transcriptions will have access to a clear image of the document on which each transcription is based. Because no one claims copyright on the content of Whitman's manuscripts, we are free to publish transcriptions of them. Reproducing manuscript images does require permission from the holding repositories, and all of the major holders (the University of Virginia, the New York Public Library, the University of Texas at Austin, Duke University, and the Library of Congress) have granted us the needed permissions. We have had similar good fortune with the institutions that have smaller Whitman holdings.

Our practice is to procure from the various repositories 24-bit color scans of the original manuscripts, made at 600 dpi and saved as TIFF files. The scanner hardware, software, and settings are all recorded. When scanning is not possible, we photograph the manuscripts ourselves with a digital camera and record the lighting conditions and camera settings. From the archival-quality TIFFs we derive faster-loading, high-quality JPEG images, cleaned and cropped for web delivery, and members of the project staff examine each digital image during this process. We have benefited from the experiences of the Blake Archive and the Rossetti Archive, two IATH projects with extensive imaging components.
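
A quick calculation shows why the TIFF masters are kept for preservation while lighter JPEG derivatives are served on the web. A 24-bit scan stores three bytes per pixel, so a hypothetical 8.5 x 11 inch leaf at 600 dpi comes to 5100 x 6600 pixels x 3 bytes = 100,980,000 bytes (about 96.3 MiB) before any TIFF headers or compression. The Python helper below performs this arithmetic; the page size is an illustrative assumption, since Whitman's manuscripts vary widely in size.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi=600, bits_per_pixel=24):
    """Raw pixel-data size of a scan, ignoring TIFF headers and any compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

size = uncompressed_scan_bytes(8.5, 11)  # hypothetical letter-size leaf
print(size, f"{size / 2**20:.1f} MiB")   # 100980000 96.3 MiB
```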

Our manuscript finding aids make extensive use of Encoded Archival Description (EAD), an internationally recognized and widely used document type definition for archival finding aids that facilitates sophisticated search and retrieval of manuscripts. Some repositories have provided finding aids already encoded in EAD. Other institutions have provided paper or HTML finding aids and have allowed us to convert them. Using EAD gives us greater bibliographical control of documents while also offering our users one entryway to the manuscript material. Our integration of the disparate Whitman collections into a single, detailed, item-level finding aid provides a model for future projects that wish to organize similar data. Although Whitman's manuscripts are scattered across more than seventy repositories, by far the greatest number of them, approximately eighty percent, are held in a handful of collections at Duke University, the University of Virginia, the New York Public Library, the University of Texas at Austin, and the Library of Congress. All of these major libraries are cooperating with our EAD work. In advancing this aspect of our work, we have benefited from a small grant from the Gladys Krieble Delmas Foundation and a larger grant from the Institute of Museum and Library Services (IMLS).
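
The idea of a single item-level finding aid drawn from many repositories can be pictured with a short Python sketch that merges components from several EAD documents into one index. The element names (`archdesc`, `dsc`, `c01`/`c02`, `did`, `unittitle`, `unitdate`, `repository`) and the `level` attribute are EAD's, but the fragments and repository names are invented and far simpler than real finding aids, and this is not the Archive's actual conversion process.

```python
import xml.etree.ElementTree as ET

# Two hypothetical, heavily simplified EAD fragments standing in for different repositories.
FINDING_AIDS = [
    """<ead><archdesc level="collection">
         <did><repository>Repository A</repository></did>
         <dsc><c01 level="item"><did><unittitle>Manuscript leaf</unittitle>
           <unitdate>1856</unitdate></did></c01></dsc>
       </archdesc></ead>""",
    """<ead><archdesc level="collection">
         <did><repository>Repository B</repository></did>
         <dsc><c01 level="series"><c02 level="item"><did><unittitle>Letter draft</unittitle>
           <unitdate>1864</unitdate></did></c02></c01></dsc>
       </archdesc></ead>""",
]

def is_component(tag):
    """EAD components are the unnumbered <c> or the numbered <c01> through <c12>."""
    return tag == "c" or (len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit())

def item_index(finding_aids):
    """Merge item-level components from several finding aids into one date-sorted list."""
    items = []
    for xml_text in finding_aids:
        root = ET.fromstring(xml_text)
        repository = root.findtext("archdesc/did/repository", default="unknown")
        # Components can nest to any depth, so walk every element in the tree.
        for element in root.iter():
            if is_component(element.tag) and element.get("level") == "item":
                items.append({
                    "repository": repository,
                    "title": element.findtext("did/unittitle"),
                    "date": element.findtext("did/unitdate"),
                })
    # Four-digit years sort correctly as strings; real dates would need normalizing.
    return sorted(items, key=lambda item: item["date"])

for item in item_index(FINDING_AIDS):
    print(item["date"], item["title"], "|", item["repository"])
```

Each entry keeps its repository, so a merged view still tells the user where the physical manuscript is held.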

The electronic files containing the texts (both TEI representations and EAD descriptions), images, and supporting apparatus for the Walt Whitman Archive are disseminated through an Apache HTTPD server, with separate daemons for supporting applications as appropriate. The Apache HTTPD server runs under the Solaris operating system on a Sun Fire V240 server (libtextcenter.unl.edu), which is located at the University of Nebraska-Lincoln and administered by Computing Operations and Research Services (CORS) of the University of Nebraska-Lincoln Libraries.

Extensible Markup Language is used to represent a wide variety of textual materials: machine-readable transcriptions of manuscript and published items (Text Encoding Initiative, TEI P4: http://www.tei-c.org/P4X/); description of and intellectual access to archival and manuscript materials (Encoded Archival Description, EAD version 2.0: http://www.loc.gov/ead/); and descriptive, administrative, and structural information for Archive resources (Metadata Encoding and Transmission Standard, METS: http://www.loc.gov/standards/mets/).

XML is also used, as appropriate, for the context and help pages associated with the Walt Whitman Archive and for the publication of databases.

METS is an emerging standard for describing and controlling digital objects, and is intended to support long-term preservation and access. The standard is currently maintained jointly by the Library of Congress and the Digital Library Federation. METS is designed to be used with other metadata schemas: in particular, schemas for descriptive data, such as EAD and the Metadata Object Description Schema (MODS: http://www.loc.gov/standards/mods/), and schemas for administrative data, such as the techniques and conditions of creation and storage, intellectual property rights, and the provenance of digital files (source, generational derivation, and migration/transformation information). To the extent possible, the Walt Whitman Archive monitors and adheres to emerging archive and library standards and best practices for descriptive, administrative, and structural data.
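
As a hedged illustration of how METS ties files to structure, the Python sketch below builds a skeletal METS record for one manuscript page, linking a TIFF master and a JPEG derivative through a `fileSec` and a `structMap`. The METS namespace and the element and attribute names (`fileGrp`, `file`, `FLocat`, `div`, `fptr`, `FILEID`) come from the METS schema; the file IDs, `USE` values, labels, and paths are hypothetical, and the descriptive and administrative sections (`dmdSec`, `amdSec`) that would point to EAD or MODS records are omitted.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

def build_mets(page_label, master_href, derivative_href):
    """Build a skeletal METS record linking a TIFF master and a JPEG derivative to one page."""
    mets = ET.Element(m("mets"))
    file_sec = ET.SubElement(mets, m("fileSec"))  # fileSec must precede structMap
    for use, file_id, href, mime in [
        ("master", "FILE-TIFF", master_href, "image/tiff"),
        ("reference", "FILE-JPEG", derivative_href, "image/jpeg"),
    ]:
        group = ET.SubElement(file_sec, m("fileGrp"), USE=use)
        f = ET.SubElement(group, m("file"), ID=file_id, MIMETYPE=mime)
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
    # The structural map says which files represent which logical unit.
    struct_map = ET.SubElement(mets, m("structMap"))
    page = ET.SubElement(struct_map, m("div"), TYPE="page", LABEL=page_label)
    for file_id in ("FILE-TIFF", "FILE-JPEG"):
        ET.SubElement(page, m("fptr"), FILEID=file_id)
    return mets

record = build_mets("leaf 1", "masters/hypothetical-0001.tif", "web/hypothetical-0001.jpg")
print(ET.tostring(record, encoding="unicode"))
```

Recording the master and its derivative side by side is one way to capture the generational derivation mentioned above.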

The TEI is maintained by the Text Encoding Initiative Consortium, hosted at the University of Virginia, Brown University, Oxford University, and a collaborative group based in Nancy, France, that includes ATILF, INIST, and Loria. The TEI-derived Walt Whitman Archive DTD has been developed in consultation with XML and TEI experts at IATH. The modifications to TEI conform to the guidelines provided in Chapter 29, "Modifying and Customizing the TEI DTD," of the Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange (Oxford: TEI Consortium, 2002). We eventually plan to make the Walt Whitman Archive DTD and documentation of our modifications to the TEI available on our site. In the meantime, we will consider, on a case-by-case basis, requests from individuals working on related projects.

EAD is jointly maintained by the Library of Congress and the Society of American Archivists. Version 2.0 was released in the fall of 2002. The Walt Whitman Archive follows the guidelines provided in the EAD Tag Library (Chicago: Society of American Archivists and Library of Congress, 1998) and the EAD Application Guidelines (Chicago: Society of American Archivists and Library of Congress, 1999). The Archive has also collaborated with repositories holding Whitman manuscripts in creating and modifying EAD finding aids and, to the extent feasible, has worked to reconcile differences in practice and to adhere to the emerging national consensus on best practice.

Indexing of EAD finding aids, TEI texts, and the descriptive data in METS instances will be provided by Tamino, Software AG's XML indexing software. Tamino is installed on a Sun Enterprise 420R server and employs a dedicated Apache HTTPD daemon. Software AG is committed to supporting XML and related technologies, such as XPath and XQuery for standards-based querying and Extensible Stylesheet Language Transformations (XSLT) for converting XML to HTML for viewing. In addition, we use a combination of commercial and open-source software for creation, maintenance, parsing, and entity management (NoteTab, oXygen, GNU Emacs, XML Validator, and so on). The project employs the XML Catalog standard developed by the Organization for the Advancement of Structured Information Standards (OASIS: http://www.oasis-open.org/committees/entity/spec-2001-08-06.html).
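
An XML catalog lets tools find a DTD on local disk instead of fetching it from a remote address. The Python sketch below reads `public` and `system` entries from an OASIS catalog and resolves an identifier, trying system-identifier matches before public ones as the OASIS specification describes. It is a simplified assumption-laden illustration: it ignores the `prefer` attribute, delegation, rewriting, and chained catalogs, and the identifiers and paths are hypothetical rather than the Archive's real ones.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog; the identifiers and local paths are invented for illustration.
CATALOG = f"""<catalog xmlns="{CATALOG_NS}">
  <public publicId="-//Hypothetical//DTD Example Whitman TEI//EN" uri="dtd/example-whitman.dtd"/>
  <system systemId="http://example.org/dtd/example-whitman.dtd" uri="dtd/example-whitman.dtd"/>
</catalog>"""

def load_catalog(text):
    """Map public and system identifiers to local URIs (only <public> and <system> entries)."""
    root = ET.fromstring(text)
    public, system = {}, {}
    for entry in root.iter(f"{{{CATALOG_NS}}}public"):
        public.setdefault(entry.get("publicId"), entry.get("uri"))  # first match wins
    for entry in root.iter(f"{{{CATALOG_NS}}}system"):
        system.setdefault(entry.get("systemId"), entry.get("uri"))
    return public, system

def resolve(catalog, public_id=None, system_id=None):
    """Try a system-identifier match first, then a public-identifier match; else None."""
    public, system = catalog
    if system_id in system:
        return system[system_id]
    if public_id in public:
        return public[public_id]
    return None

catalog = load_catalog(CATALOG)
print(resolve(catalog, public_id="-//Hypothetical//DTD Example Whitman TEI//EN"))
```

Returning `None` for an unknown identifier leaves the caller free to fall back to normal network resolution or to report an error.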

PHP, a scripting language that extracts information from a database to generate web pages dynamically, is also used for three major sections of the Archive: the Walt Whitman Encyclopedia entries, the Whitman Image Gallery, and the Bibliography of Scholarship. PHP allows us to keep content files separate from design files, greatly reducing the chance of error during revisions. It also enables customized display of information stored in a MySQL database; for example, bibliographic entries can be sorted by author, year, title, or annotation.
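
The sortable bibliography is built with PHP over MySQL; the sketch below uses Python's built-in `sqlite3` module as a stand-in to show the same idea, and the table name, column names, and records are hypothetical. Because SQL placeholders can bind values but not column names, the requested sort key is checked against a whitelist before it is placed in the `ORDER BY` clause.

```python
import sqlite3

# Columns a reader may sort by, mirroring the options named in the text.
SORTABLE = {"author", "year", "title", "annotation"}

def sorted_entries(conn, sort_by):
    """Return bibliography rows ordered by a whitelisted column, then by title."""
    if sort_by not in SORTABLE:
        raise ValueError(f"cannot sort by {sort_by!r}")
    # Column names cannot be bound as parameters, so the whitelist above is what
    # keeps arbitrary user input out of the ORDER BY clause.
    query = f"SELECT author, year, title FROM bibliography ORDER BY {sort_by}, title"
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bibliography (author TEXT, year INTEGER, title TEXT, annotation TEXT)")
conn.executemany(
    "INSERT INTO bibliography VALUES (?, ?, ?, ?)",
    [  # Invented placeholder records, not real bibliography entries.
        ("Author B", 1999, "Title X", "note 2"),
        ("Author A", 2003, "Title Y", "note 1"),
        ("Author C", 1987, "Title Z", "note 3"),
    ],
)
for row in sorted_entries(conn, "year"):
    print(row)
```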

Images

We are currently upgrading our display of manuscript pages to offer users three viewing options: two sizes of static images plus high-resolution scalable images. The scalable images are created in Zoomify, a Flash-based sequential image viewer that allows users to zoom in and pan around images. Viewing the scalable images requires that Flash be installed in the user's browser. (Most users of the Archive will not need to install anything, as Flash is already installed in 98% of browsers.)
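
Tiled zoom viewers of this kind cut a large image into a pyramid of small tiles at successively halved resolutions, so the browser fetches only the tiles currently in view. The Python sketch below estimates the size of such a pyramid, assuming 256-pixel square tiles and halving with rounding up; Zoomify's exact tiling rules may differ. The 5100 x 6600 pixel example corresponds to a hypothetical 8.5 x 11 inch leaf scanned at 600 dpi.

```python
import math

TILE = 256  # Assumed tile edge in pixels.

def tile_pyramid(width, height, tile=TILE):
    """List (width, height, tile_count) per zoom level, from full resolution down to one tile."""
    levels = []
    while True:
        count = math.ceil(width / tile) * math.ceil(height / tile)
        levels.append((width, height, count))
        if width <= tile and height <= tile:
            break  # this level already fits in a single tile
        # Halve each dimension, rounding up so edge pixels are not lost.
        width, height = (width + 1) // 2, (height + 1) // 2
    return levels

levels = tile_pyramid(5100, 6600)
print(len(levels), sum(count for _, _, count in levels))  # 6 702
```

For that example the sketch reports six levels and 702 tiles, of which 520 belong to the full-resolution level, while a user viewing one region downloads only a handful of them.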

Networked Communication

The Walt Whitman Archive currently maintains a closed listserv, whit-proj, to facilitate communication among the collaborating archivists, librarians, scholars, and technologists. The listserv currently resides on lists.village.virginia.edu and is maintained by IATH; it will soon be moved to a server at the University of Nebraska-Lincoln. All traffic on whit-proj is automatically archived, and we regard this archive as a partial but important record of the building of the Walt Whitman Archive.

Access Tracking

Access to the Apache web server on libtextcenter.unl.edu is tracked using Advanced Web Statistics (AWStats) 6.4. This software provides daily records, in both graphical and statistical form, that allow the Walt Whitman Archive collaborators to see how frequently texts and images are requested, and by which users and IP addresses.
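
AWStats builds its reports from the web server's access logs. As a rough illustration of that kind of counting, not of how AWStats itself works, the Python sketch below parses lines in Apache's Common Log Format and tallies successful GET requests per path and per client IP. The log lines and paths are invented, and the IP addresses come from ranges reserved for documentation.

```python
import re
from collections import Counter

# Apache Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)')

def summarize(lines):
    """Count successful (2xx) GET requests per requested path and per client IP."""
    by_path, by_ip = Counter(), Counter()
    for line in lines:
        match = CLF.match(line)
        if not match:
            continue  # skip malformed lines
        ip, _timestamp, method, path, status, _size = match.groups()
        if method == "GET" and status.startswith("2"):
            by_path[path] += 1
            by_ip[ip] += 1
    return by_path, by_ip

SAMPLE = [  # Invented log lines.
    '192.0.2.10 - - [10/Oct/2005:13:55:36 -0500] "GET /manuscripts/leaf1.jpg HTTP/1.1" 200 48213',
    '192.0.2.10 - - [10/Oct/2005:13:56:02 -0500] "GET /texts/poem1.html HTTP/1.1" 200 9120',
    '198.51.100.7 - - [10/Oct/2005:14:01:11 -0500] "GET /manuscripts/leaf1.jpg HTTP/1.1" 304 -',
    '198.51.100.7 - - [10/Oct/2005:14:02:45 -0500] "GET /manuscripts/leaf1.jpg HTTP/1.1" 200 48213',
]
paths, ips = summarize(SAMPLE)
print(paths.most_common(1), dict(ips))
```

The 304 ("not modified") line is excluded here because no content was sent; whether to count such responses is a reporting choice.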

Backup and Record Keeping

All of the Archive's data is safeguarded by daily incremental backups to magnetic tape. The University of Nebraska-Lincoln Libraries' Computing Operations and Research Services (CORS) is responsible for backups and data integrity.
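
The backup software used by CORS is not described here. As a generic sketch of the idea behind an incremental backup, the Python function below selects only the files modified since the previous run, assuming that modification times are a reliable signal of change; the directory and file names in the demonstration are invented.

```python
import os
import tempfile
import time

def files_changed_since(root, last_backup_time):
    """Return paths under root modified after last_backup_time (seconds since the epoch).

    Only the selection step of an mtime-based incremental backup: nothing is copied,
    and deletions, renames, and clock skew are not handled.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return sorted(changed)

# Demonstration in a throwaway directory with two invented files.
with tempfile.TemporaryDirectory() as tmp:
    last_backup = time.time()
    for name, offset in (("old.xml", -3600), ("new.xml", 60)):
        path = os.path.join(tmp, name)
        open(path, "w").close()
        # Backdate or postdate the file relative to the previous backup run.
        os.utime(path, (last_backup + offset, last_backup + offset))
    changed_names = [os.path.basename(p) for p in files_changed_since(tmp, last_backup)]

print(changed_names)  # ['new.xml']
```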

The Archive's project office keeps paper copies of all print records and of the TEI-derived DTD, and stores uncompressed image files on archival optical disc media.

Distributed under a Creative Commons License. Matt Cohen, Ed Folsom, & Kenneth M. Price, editors.