IBM Research

IBM collaborates with UCLA to digitize Hearst Newsreel Collection

IBM Haifa Labs News Center




UCLA has engaged IBM to digitize the Hearst Metrotone Newsreel Collection, one of the most important resources for the history of the twentieth century. The collection consists of approximately 850 hours of newsreel footage covering major events from 1914 to 1971, including the formation of the League of Nations in 1919, Charles Lindbergh's solo crossing of the Atlantic in 1927, and the first flights into outer space. The collection was bequeathed to the UCLA archive in 1981 and is documented primarily on aging paper records, including 675,000 typed index cards, 7,700 synopsis sheets, and 190,000 disposition sheets. Access to the collection has been restricted to preserve the deteriorating paper catalogue. The cards provide detailed descriptions of the events documented in each newsreel and are irreplaceable historical documents.

For ten years, UCLA searched for an affordable way to digitize the paper documentation of the Hearst Newsreel Collection in order to create a searchable online database accessible to the general public. In August 2002, the UCLA Film and Television Archive and IBM began investigating the possibility of digitizing the paper records. Recently, with the help of IBM Research scientists in Haifa, the Archive began using newly developed software designed to make the project a reality.

When the project was first brought to IBM's attention, Dr. Jeffrey Schick, Director of Content Management Worldwide for IBM, was fascinated by the complexity of the Hearst documentation and recognized its research, educational, and historical value. He was confident that IBM had the resources needed to tackle the project. IBM Haifa scientists worked for more than six months to develop software capable of highly accurate optical character recognition, with the ability to automatically place the scanned material into discrete database fields. The software uses innovative scanning technology that can be applied to complex and varied records. This allows the index cards to be scanned and the resulting image files to be saved in IBM Content Manager. Users will be able to search the database by subject, description, or date.

Dr. Ehud Karnin, Manager of Signal Processing and Image Technologies at the IBM Haifa Lab, noted that "IBM is very enthusiastic about the project and happy to collaborate on an effort with such historical significance. This research also opens the door to opportunities for digitizing many different archives, offering scholars and educators the chance to study information that would otherwise be inaccessible." For IBM itself, the project creates new opportunities in content management for large archives and in applying scientific innovations to this area.

Optical character recognition had previously been dismissed because software capable of completing the project at an acceptable level of quality simply did not exist. When asked why IBM was able to meet the challenge where others could not, Dr. Karnin pointed to advanced technologies developed in Haifa and adapted to the project's difficult problems. These technologies cover binarization, cleaning of the text, separation of individual characters, and identification of the characters. He also cited IBM Content Manager, a storage system designed specifically for media content.
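The article does not describe how the Haifa binarization, cleaning, and character-separation stages work. Purely as a generic illustration, the sketch below applies three textbook techniques to a tiny hypothetical grayscale "card": Otsu's global threshold for binarization, an isolated-pixel filter for cleaning, and a vertical projection profile for separating characters. The function names and the sample image are hypothetical, character identification is omitted, and this is not IBM's method.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a grayscale image as rows of 0-255 ints (0 = black ink).
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Binarization: pick the gray level t that maximizes between-class variance (Otsu).

    Pixels <= t are treated as ink. The variance is left unnormalized (pixel counts
    instead of probabilities), which does not change which t wins.
    """
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w_low, sum_low = 0, 0  # pixel count and intensity sum of the class <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(img: Image, t: int) -> List[List[int]]:
    """Map each pixel to 1 (ink) if it is at or below the threshold, else 0."""
    return [[1 if p <= t else 0 for p in row] for row in img]


def despeckle(binary: List[List[int]]) -> List[List[int]]:
    """Cleaning: drop ink pixels that have no ink among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not any(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ):
                out[y][x] = 0
    return out


def segment_columns(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Character separation: half-open [start, end) column spans that contain ink."""
    width = len(binary[0])
    ink_per_column = [sum(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, count in enumerate(ink_per_column):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


# Hypothetical 5x9 "card": two dark glyphs plus one isolated speck at the far right.
SAMPLE: Image = [
    [250] * 9,
    [250, 20, 20, 250, 250, 30, 250, 250, 250],
    [250, 20, 20, 250, 250, 30, 250, 250, 10],
    [250, 20, 20, 250, 250, 30, 250, 250, 250],
    [250] * 9,
]

t = otsu_threshold(SAMPLE)
print(t, segment_columns(despeckle(binarize(SAMPLE, t))))  # 30 [(1, 3), (5, 6)]
```

Without the cleaning step, the speck would be reported as a third "character" at columns 8 to 9. Simple projection profiles like this one also fail on touching or overlapping characters, which is typical of aging typed cards, so production systems need more robust segmentation.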

The following are two sample index cards from the collection:

[Figure: two sample index cards from the Hearst Newsreel Collection]



The second phase of the project, following the creation of the online database, will link the digitized newsreels to the database. UCLA will transfer the newsreels, which consist of over 27 million feet of film, to high-quality video masters. IBM DB2 Content Manager software will be used to link the film footage to the online database. Once this has been accomplished, the astonishing historical record that is the Hearst Metrotone Newsreel Collection will be easily accessible to the public via the Internet.

The first software prototype was recently released for testing. It is now being refined for the project, which is scheduled for completion later this year. The dedicated team of Haifa scientists includes Dr. Yaakov Navor, Dr. Eugene Walach, Asaf Tzadok, Avichai Giat, and Dr. Ehud Karnin. In recognition of the project's significance, IBM is contributing the servers for the database as well as the necessary software.

 
 
