THE VHF MULTIMEDIA DATA COLLECTION

 

The Shoah Foundation has started to catalog its entire collection. Four thousand testimonies in English (less than 10% of the collection) have been cataloged with full segment-level description, drawing on a domain-specific thesaurus containing 21,000 places and concepts.
Names of people (about 70 per testimony, for a total of 280,000 name references to date) are cataloged separately. Although such extensive manual cataloging supports search and browsing well, the cost and the language skills needed to catalog multilingual materials impose severe limitations. Some form of automated support is clearly essential if access to collections of this scale is to be achieved.

 

The cataloging process starts with the division of an interview into small segments that reflect natural topic boundaries in the interview. For example, an interview might be divided into segments discussing the individual’s family life, education, aspects of ghetto life, etc. Each segment is then labeled with the appropriate VHF Thesaurus terms that indicate experiences and geographical locations associated with that segment. Often, the thesaurus terms assigned to a segment do not occur explicitly in the segment. For example, if an individual is describing the inability to obtain kosher food in Auschwitz, the segment might be indexed with the term “Food during deportation,” even if those specific words were not spoken by the survivor.
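
The segment-level labeling described above can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the Foundation's actual cataloging system: the Segment class, the build_term_index function, the time offsets, the transcript snippets, and the use of "Family life" and "Auschwitz" as index terms are all hypothetical. Only the term "Food during deportation" comes from the example in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One topically coherent stretch of an interview (hypothetical structure)."""
    start_sec: int            # made-up offsets into the recording
    end_sec: int
    transcript: str           # made-up spoken words for the segment
    terms: set = field(default_factory=set)  # thesaurus terms assigned by a cataloger


def build_term_index(segments):
    """Map each assigned term to the positions of the segments labeled with it."""
    index = {}
    for pos, seg in enumerate(segments):
        for term in seg.terms:
            index.setdefault(term, []).append(pos)
    return index


# Two illustrative segments; all values are invented for the example.
segments = [
    Segment(0, 180, "We lived with my grandparents before the war.",
            {"Family life"}),
    Segment(180, 420, "There was nothing kosher to eat there.",
            {"Food during deportation", "Auschwitz"}),
]

index = build_term_index(segments)

# The term retrieves the segment even though its words were never spoken.
hits = index["Food during deportation"]
spoken = "Food during deportation" in segments[hits[0]].transcript
```

In this sketch a lookup by term returns the second segment, while a plain search of the spoken words for the same phrase would find nothing, which is why the assigned terms, rather than the transcript alone, carry the retrieval burden.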