SmartLogic - Advanced Post-OCR Logic
SmartLogic, developed by IBM Haifa, exploits the content of a document and the information associated with it, including the recognized characters, their confidence levels, their locations, etc. In addition, the tool uses external information such as dictionaries, postal and phone directories, business process databases, and more.
Thus, when OCR results are validated against this information, the location of the most likely error is determined and the error is corrected.
The following table demonstrates OCR correction using the address and phone directory. The left column shows the initial OCR output, which contains recognition errors in the street name, the ZIP code, and the telephone number. The right column shows the same text with those characters corrected.
| Initial OCR results with errors | Corrected OCR results (automatically corrected using address and phone directory lookup) |
|---|---|
| UBS AG, Stamford Branch, UBS AG Collateral Management 677 W**e**shington Boul**aue**rd, Stamford, CT, 0690**7** Attention: Margin Specialist Telephone: (**6**03) 719-611**8**; Telecopier: (203) 719-4955 | UBS AG, Stamford Branch, UBS AG Collateral Management 677 W**a**shington Boul**eva**rd, Stamford, CT, 0690**1** Attention: Margin Specialist Telephone: (**2**03) 719-611**6**; Telecopier: (203) 719-4955 |
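
The directory lookup in the table can be illustrated with a minimal sketch. It is not SmartLogic's actual algorithm, which the text does not describe. It assumes a simple fuzzy string match (Python's standard `difflib`) against small, hypothetical in-memory directories. The names `STREETS`, `PHONES`, and `correct_field` are invented for this illustration.

```python
import difflib

# Hypothetical directory contents; a real system would query postal and
# phone directories rather than hard-coded lists.
STREETS = ["Washington Boulevard", "Atlantic Street", "Summer Street"]
PHONES = ["(203) 719-6116", "(203) 719-4955"]

def correct_field(ocr_text, directory, cutoff=0.75):
    """Return the closest directory entry to ocr_text.

    If no entry reaches the similarity cutoff, the OCR text is returned
    unchanged, so unrelated fields are not "corrected" by accident.
    """
    matches = difflib.get_close_matches(ocr_text, directory, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_text

street = correct_field("Weshington Boulauerd", STREETS)
phone = correct_field("(603) 719-6118", PHONES)
```

Under these assumptions, the misread street name and telephone number from the left column each resolve to the nearest directory entry. The cutoff is a tuning choice: a lower value corrects more aggressively but raises the risk of replacing a correct reading with an unrelated entry.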
Violation of Logic Rules
SmartLogic can detect violations of logical rules that are known in advance, assess the statistical significance of various alternative solutions, and suggest the most likely solution that complies with the required logic.
SmartLogic can be readily applied to tax forms. Figure 17 illustrates how a tax form can be corrected using SmartLogic. At first, " 680.00" is recognized as "0680.00". However, the error is discovered on the next line, which holds the sum of the two lines above it. Using SmartLogic, the error is corrected.

Figure 17 - Exploiting logical rules for better OCR

Figure 18 - SmartLogic application for tax form
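
The sum rule used in the tax form example can be sketched as follows. This is only an illustration under stated assumptions, not IBM's implementation. It assumes that the OCR engine supplies several alternative readings per field, each with a confidence. It also assumes that the confidences are independent, so a combination can be scored by their product. The function name `best_consistent` and all amounts and confidences are hypothetical. Amounts are held in integer cents to avoid floating-point comparison issues.

```python
from itertools import product

def best_consistent(line1, line2, total):
    """Pick the most likely readings that satisfy line1 + line2 == total.

    Each argument is a list of (amount_in_cents, confidence) alternatives
    for one field. Returns the winning (line1, line2, total) triple, or
    None if no combination obeys the rule.
    """
    best, best_score = None, 0.0
    for (a, pa), (b, pb), (t, pt) in product(line1, line2, total):
        if a + b != t:
            continue  # this combination violates the sum rule
        score = pa * pb * pt  # assumes independent confidences
        if score > best_score:
            best, best_score = (a, b, t), score
    return best

# Hypothetical readings: the top-ranked reading of the first line (880.00)
# breaks the sum rule, while its lower-ranked alternative (680.00) does not.
result = best_consistent(
    line1=[(88000, 0.6), (68000, 0.4)],
    line2=[(12000, 0.9)],
    total=[(80000, 0.9)],
)
```

Here the rule overrides the per-field ranking: the less confident reading of the first line is selected because it is the only one consistent with the total.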