The Shoah Foundation has
started to catalog its entire collection. Four thousand testimonies in English
(less than 10% of the collection) have been cataloged with full segment-level
description, drawing on a domain-specific thesaurus containing 21,000 places and
concepts.
Names of people (about 70 per
testimony for a total of 280,000 name references to date) are cataloged
separately. Although such extensive manual cataloging supports search and
browsing well, its cost and the language skills needed to catalog multilingual
materials impose severe limitations. Some form of automated support is clearly
essential if access to collections of this scale is to be achieved.
The cataloging process
starts with the division of an interview into small segments that reflect
natural topic boundaries in the interview. For example, an interview might be
divided into segments discussing the individual’s family life, education,
aspects of ghetto life, etc. Each segment is then labeled with the appropriate
VHF Thesaurus terms that indicate experiences and geographical locations
associated with that segment. Often, the
thesaurus terms assigned to a segment do not occur verbatim in the segment.
For example, if an individual is describing the inability to obtain kosher food
in Auschwitz, the segment might be indexed with the term “Food during
deportation,” even if those specific words were not spoken by the
survivor.
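
The segment-level cataloging described above can be pictured as a small data
structure. The following is only a minimal sketch under stated assumptions, not
the Foundation's actual system: the Segment class, its field names, the
build_term_index helper, the interview identifier, the timings, and the
placeholder transcript text are all hypothetical, and apart from "Food during
deportation" the term strings are stand-ins rather than confirmed thesaurus
entries. It illustrates why assigned terms support retrieval even when the
words never appear in what was said.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One topically coherent stretch of an interview (hypothetical schema)."""
    interview_id: str
    start_sec: int
    end_sec: int
    transcript: str  # placeholder text, not real testimony
    terms: set[str] = field(default_factory=set)  # terms assigned by a cataloger


def build_term_index(segments):
    """Map each assigned thesaurus term to the segments labeled with it."""
    index = defaultdict(list)
    for seg in segments:
        for term in seg.terms:
            index[term].append(seg)
    return index


# Illustrative data only: two segments of one invented interview.
segments = [
    Segment("interview-001", 0, 180,
            "(speaker describes home and schooling)",
            {"Family life", "Education"}),
    Segment("interview-001", 180, 420,
            "(speaker describes being unable to obtain kosher food)",
            {"Food during deportation"}),
]

index = build_term_index(segments)

# Lookup goes through the assigned term, not the spoken words.
hits = index["Food during deportation"]
term_spoken = "food during deportation" in hits[0].transcript.lower()
```

Here the lookup returns the second segment although the phrase itself is absent
from its text (term_spoken is False), which is the behavior that a purely
word-based search over the speech would miss.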