
Preprocessing

Preliminary Processing

Preliminary processing includes image quality control and image enhancement. For example:

  • De-skewing
  • Binarization
  • Noise removal
  • Removal of the scan margins
  • Detection of missing pages (double feed)
  • Identification of reverse feed (upside-down forms)
  • Detection of low-quality images (unsuitable for further processing)
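
As an illustration of two of the steps above, the following is a minimal sketch of binarization and scan-margin removal on a tiny grayscale grid. It is not the system's actual method: the fixed threshold of 128, the `Image` type, and the function names `binarize` and `crop_margins` are all assumptions made for the example, and the margin is assumed to be empty background around the inked area.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixels: 0 = black, 255 = white


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Return 1 (ink) for pixels darker than the threshold, else 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop_margins(binary: Image) -> Image:
    """Drop the empty rows and columns that surround the inked area."""
    if not binary:
        return []
    rows = [i for i, row in enumerate(binary) if any(row)]
    cols = [j for j in range(len(binary[0])) if any(row[j] for row in binary)]
    if not rows or not cols:
        return []  # blank page: nothing to keep
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    return [row[left:right + 1] for row in binary[top:bottom + 1]]


# Hypothetical 4x4 scan with a white border around three dark pixels.
page = [
    [255, 255, 255, 255],
    [255,  10, 200, 255],
    [255,  30,  20, 255],
    [255, 255, 255, 255],
]
cropped = crop_margins(binarize(page))
print(cropped)
```

A production pipeline would normally pick the threshold adaptively and would also have to handle dark scanner borders, which this sketch does not attempt.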

Form Learning

Forms can be divided into three categories:

  • Fixed forms - a rigid structure
  • Logical forms - a non-rigid, yet similar structure
  • Unstructured forms - no predefined structure (e.g., invoices, receipts)

Fixed Forms
An application that handles fixed forms first scans the blank forms (commonly referred to as templates) and stores the extracted information for future reference.

An automatic process analyzes the respective form layouts and creates characteristic form descriptors to allow automatic recognition of each specific form type when processing filled-in forms. In typical cases, form training may be as simple as scanning a new form image. However, in some cases, manual intervention is applied to optimize the form recognition process.

Form recognition is based on horizontal and vertical lines, along with the analysis of text lines, barcodes, and (where necessary) thumbnail images of special fiducial areas. These approaches are combined so that, for each application, both recognition results and processing time are optimized.

Logical Forms
Logical structured forms (Uform) are forms that share the same functionality but are printed in slightly different ways. To cope with these variations, form templates are represented by a collection of logical descriptors rather than by an image. These descriptors can describe textual elements, such as a certain string (including its location, letter case, etc.), or graphical elements, such as lines (including location, size, orientation, etc.). Matching these descriptors to the image simultaneously identifies the appropriate template and registers the template to the image. Local feature matching creates a grid of matched points, which allows both global and local deformations of the image to be handled. Once this is done, data fields can be extracted from the image.

The use of logical descriptors is robust and enables the system to handle variations of the same logical form.
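
To make the idea of matching logical descriptors more concrete, here is a simplified sketch that handles textual descriptors only. It assumes the words on the scanned image are already available with positions (for example, from OCR), and it estimates a single global shift rather than the grid of locally matched points described above. The classes `TextDescriptor` and `Word`, the functions `match_template` and `identify`, and the tolerance value are hypothetical names and parameters chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TextDescriptor:
    text: str  # expected string, compared case-insensitively here
    x: float   # expected position on the template
    y: float


@dataclass(frozen=True)
class Word:
    text: str  # string found on the scanned image (e.g. by OCR)
    x: float
    y: float


def match_template(descriptors: List[TextDescriptor], words: List[Word],
                   tolerance: float = 50.0) -> Tuple[int, Tuple[float, float]]:
    """Count matched descriptors and estimate the mean (dx, dy) shift."""
    offsets = []
    for d in descriptors:
        near = [w for w in words
                if w.text.lower() == d.text.lower()
                and abs(w.x - d.x) <= tolerance
                and abs(w.y - d.y) <= tolerance]
        if near:
            best = min(near, key=lambda w: abs(w.x - d.x) + abs(w.y - d.y))
            offsets.append((best.x - d.x, best.y - d.y))
    if not offsets:
        return 0, (0.0, 0.0)
    n = len(offsets)
    return n, (sum(o[0] for o in offsets) / n, sum(o[1] for o in offsets) / n)


def identify(templates: Dict[str, List[TextDescriptor]],
             words: List[Word]) -> Optional[str]:
    """Return the template whose descriptors are matched in the largest proportion."""
    best_name, best_score = None, 0.0
    for name, descriptors in templates.items():
        count, _ = match_template(descriptors, words)
        score = count / len(descriptors) if descriptors else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical templates and scanned words.
templates = {
    "tax": [TextDescriptor("Name", 10, 10), TextDescriptor("Income", 10, 50)],
    "claim": [TextDescriptor("Name", 10, 10), TextDescriptor("Policy", 10, 50)],
}
words = [Word("NAME", 14, 12), Word("income", 14, 52), Word("Smith", 60, 12)]
print(identify(templates, words))
print(match_template(templates["tax"], words))
```

The same pass both selects the template and yields a registration estimate, which mirrors the "identify and register simultaneously" property described above, although only for a pure translation.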

Unstructured Forms
The system analyzes the image structure to locate predefined fields of interest. This can be done according to keywords, topological structure, or syntax (e.g., dates).
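
A minimal sketch of keyword-based and syntax-based field location follows. It assumes the image has already been converted to text lines, and it looks for two hypothetical fields of interest: a date, found purely by its syntax, and a total amount, found by the keyword "total" on the same line. The patterns and the function name `find_fields` are assumptions for the example, and topological analysis is not covered.

```python
import re
from typing import Dict, List, Optional

# Syntax rule: dates such as 12/03/2021, 12.03.2021 or 12-03-2021.
DATE_RE = re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{4}\b")
# Syntax rule: amounts with two decimals and optional thousands separators.
AMOUNT_RE = re.compile(r"\d+(?:,\d{3})*\.\d{2}")
# Keyword rule: the whole word "total", so "Subtotal" does not qualify.
TOTAL_RE = re.compile(r"\btotal\b", re.IGNORECASE)


def find_fields(lines: List[str]) -> Dict[str, Optional[str]]:
    """Return the first date and the first amount on a 'total' line."""
    fields: Dict[str, Optional[str]] = {"date": None, "total": None}
    for line in lines:
        if fields["date"] is None:
            match = DATE_RE.search(line)
            if match:
                fields["date"] = match.group(0)
        if fields["total"] is None and TOTAL_RE.search(line):
            match = AMOUNT_RE.search(line)
            if match:
                fields["total"] = match.group(0)
    return fields


# Hypothetical text lines from a recognized invoice.
invoice = [
    "ACME Store",
    "Invoice date: 12/03/2021",
    "Subtotal 1,180.00",
    "Total due: 1,234.50",
]
print(find_fields(invoice))
```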

Form Recognition

When a new filled-in form enters the system, it is scanned and its structure is analyzed. Upon completion, the template associated with the completed form is recognized. Prior knowledge of the template enables location and classification of the content of the form.

Form Dropout

The Form Dropout technique removes the recognized template from the completed form, separating template data from filled-in data. As a result, Form Dropout improves the recognition rate of OCR technologies. Furthermore, Form Dropout enables efficient compression of filled forms, using only 5-10% of the space required by conventional techniques.
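
The core idea can be sketched as a pixel-wise subtraction on binary images. This is only an illustration under strong assumptions: the filled form is taken to be perfectly registered to the template, ink is represented as 1 and background as 0, and the function names `drop_out` and `ink_count` are hypothetical. A real implementation would also need to tolerate small misalignments and writing that crosses template lines.

```python
from typing import List

Image = List[List[int]]  # binary rows: 1 = ink, 0 = background


def drop_out(filled: Image, template: Image) -> Image:
    """Keep only ink present in the filled form but absent from the template."""
    return [[f & (1 - t) for f, t in zip(filled_row, template_row)]
            for filled_row, template_row in zip(filled, template)]


def ink_count(image: Image) -> int:
    """Number of ink pixels, a rough proxy for how much must be stored."""
    return sum(sum(row) for row in image)


# Hypothetical template: an empty box. The filled form adds one mark inside it.
template = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
filled = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
]
data = drop_out(filled, template)
print(data)
print(ink_count(filled), ink_count(data))
```

Because the template is stored once and shared by every filled copy, only the much sparser filled-in layer has to be kept per form, which is the intuition behind the compression benefit.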

This technology originated in the IBM Haifa Research Lab. We retain a number of key patents and technological advantages in this area.

Form Reconstruction
Once the above processes are complete, the user can retrieve the compressed and stored information and superimpose the originating template over it. The system is designed with safeguards to prevent the loss of information, regardless of its location on the form. Moreover, even if the form itself is altered (e.g., by applying whiteout to part of the form), these changes are detected and preserved.
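
One way to picture how alterations could survive the round trip is to store two layers per form: the ink that was added and the template ink that was erased. The sketch below is an assumption about how such a scheme might look, not a description of the actual system; it works on perfectly registered binary rows, and the names `encode` and `reconstruct` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary rows: 1 = ink, 0 = background


def encode(filled: Image, template: Image) -> Tuple[Image, Image]:
    """Split a filled form into added ink and erased template ink."""
    added = [[f & (1 - t) for f, t in zip(fr, tr)]
             for fr, tr in zip(filled, template)]
    erased = [[t & (1 - f) for f, t in zip(fr, tr)]
              for fr, tr in zip(filled, template)]
    return added, erased


def reconstruct(template: Image, added: Image, erased: Image) -> Image:
    """Superimpose the template on the added ink, then remove erased pixels."""
    return [[(t | a) & (1 - e) for t, a, e in zip(tr, ar, er)]
            for tr, ar, er in zip(template, added, erased)]


# Hypothetical one-row form: whiteout hides the template pixel at index 1,
# and new writing appears at index 4.
template_row = [[1, 1, 1, 1, 0, 0]]
filled_row = [[1, 0, 1, 1, 1, 0]]

added, erased = encode(filled_row, template_row)
restored = reconstruct(template_row, added, erased)
print(added, erased, restored)
```

A non-empty `erased` layer is what signals that the form itself was altered, so the change is both detected and reproduced when the form is rebuilt.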