OCR Technologies
Character Separation Technology
Character separation is a process that precedes OCR, in which a field is separated into its individual characters. The field is first segmented into connected components, and each connected component is then separated into its individual characters. Optionally, a preprocessing pass over the whole page can learn the specific characteristics of the writer, such as nominal pen width, character size, typical distance between characters, and more.
Character separation is carried out using the following steps:
- Estimate the number of characters in each connected component.
- For each connected component, find candidate points for separation.
- Determine at which point to perform the split.
- Split into two sub-components.
This process is carried out recursively for each sub-component, until all the characters of the original field are separated. Broken characters are then reconstructed by combining different connected components.
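A minimal sketch of this procedure is shown below, assuming a binary field image (1 = ink) and a nominal character width learned in the preprocessing pass. The 20-pixel default, the projection-based cut search, and its window are illustrative assumptions, and the reconstruction of broken characters is omitted.

```python
import numpy as np
from scipy import ndimage

def separate_characters(field, nominal_char_width=20):
    """Split a binary field image (1 = ink) into per-character images.

    nominal_char_width stands in for the writer statistics learned in the
    optional preprocessing pass; the default is illustrative only.
    """
    characters = []
    # Label connected components and process each one separately.
    labels, n = ndimage.label(field)
    for i in range(1, n + 1):
        _, xs = np.nonzero(labels == i)
        component = (labels[:, xs.min():xs.max() + 1] == i).astype(np.uint8)
        characters.extend(_split(component, nominal_char_width))
    return characters  # recombination of broken characters is not shown

def _split(component, nominal_char_width):
    """Recursively split one connected component into characters."""
    width = component.shape[1]
    # Step 1: estimate the number of characters from the component width.
    n_chars = max(1, round(width / nominal_char_width))
    if n_chars == 1:
        return [component]
    # Step 2: candidate separation points are columns with little vertical ink,
    # searched near the expected boundary of the first character.
    profile = component.sum(axis=0)
    lo = int(0.6 * nominal_char_width)
    hi = min(width - 1, int(1.4 * nominal_char_width))
    # Step 3: choose the candidate column with the least ink.
    cut = lo + int(np.argmin(profile[lo:hi]))
    # Step 4: split into two sub-components and process each one the same way.
    return (_split(component[:, :cut], nominal_char_width) +
            _split(component[:, cut:], nominal_char_width))
```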

Figure 7 - Handwritten character separation
Intelligent Character Recognition (ICR)
ICR recognizes characters using the following steps:
- Training phase (offline):
- Find topology
- Calculate features
- For each topology, train a neural network
- Recognition phase:
- Find topology
- Calculate features
- Use a neural network that was trained on this topology, and produce a probability for each class
Our ICR uses two methods to define topologies:
- Horizontal/vertical lines:
Each character is segmented into horizontal and vertical lines.

- Primitive shapes:
Each character is segmented into loops, lines, curves, etc.

For each method, there is a different set of neural networks, and voting is carried out between the results of the two methods.
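The recognition phase and the vote between the two methods can be sketched as follows. The class, the interfaces, and the simple probability-averaging vote are assumptions made for illustration, not the production design.

```python
from typing import Callable, Dict, Hashable, Sequence

class TopologyMethod:
    """One topology method: a topology finder, a feature extractor, and one
    trained network per topology (hypothetical interfaces)."""

    def __init__(self,
                 find_topology: Callable[[object], Hashable],
                 extract_features: Callable[[object, Hashable], Sequence[float]],
                 nets: Dict[Hashable, Callable[[Sequence[float]], Dict[str, float]]]):
        self.find_topology = find_topology
        self.extract_features = extract_features
        self.nets = nets  # one trained network per topology

    def classify(self, char_image) -> Dict[str, float]:
        topology = self.find_topology(char_image)        # e.g. H/V line decomposition
        features = self.extract_features(char_image, topology)
        return self.nets[topology](features)             # per-class probabilities

def recognize(char_image, hv_method: TopologyMethod, primitives_method: TopologyMethod):
    """Vote between the horizontal/vertical-line method and the primitive-shapes method."""
    a = hv_method.classify(char_image)
    b = primitives_method.classify(char_image)
    # Simple vote: average the two probability estimates for each class.
    return {c: 0.5 * (a.get(c, 0.0) + b.get(c, 0.0)) for c in set(a) | set(b)}
```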
Algorithm Scheme

Figure 8 - ICR scheme
Example
The following are samples of recognized images from the NIST database.

Figure 8 - The input of handwritten digits is on the upper line, and ICR recognition is on the bottom line
Printed Character Recognition (PCR)
The printed character recognition system consists of two main processing units: a character separator and an isolated character classifier.
Character separation (frequently called segmentation) can work in two modes:
- Fixed (constrained) spacing mode, where character size is known in advance and therefore segmentation can be very robust
- Variable (arbitrary) spacing, where no a priori information can be assumed
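The two modes can be sketched as follows for a binary line image (1 = ink). Touching characters, which require the splitting logic described earlier, are not handled in this sketch.

```python
import numpy as np

def segment_fixed_pitch(line, pitch):
    """Constrained mode: the character width (pitch, in pixels) is known in
    advance, so the line is simply cut at multiples of the pitch."""
    return [line[:, i:i + pitch] for i in range(0, line.shape[1], pitch)]

def segment_variable_pitch(line):
    """Variable mode: no a priori spacing is assumed; the line is cut wherever
    ink-free columns separate two character blobs."""
    profile = line.sum(axis=0)          # vertical projection of the ink
    in_char, start, chars = False, 0, []
    for x, ink in enumerate(profile):
        if ink > 0 and not in_char:
            in_char, start = True, x
        elif ink == 0 and in_char:
            in_char = False
            chars.append(line[:, start:x])
    if in_char:
        chars.append(line[:, start:])
    return chars

# Example: two 3x8 "characters" printed at a fixed pitch of 4 pixels.
line = np.array([[1, 1, 0, 0, 0, 1, 1, 0],
                 [1, 0, 1, 0, 0, 1, 0, 1],
                 [1, 1, 0, 0, 0, 1, 1, 0]])
print(len(segment_fixed_pitch(line, 4)), len(segment_variable_pitch(line)))  # -> 2 2
```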
Isolated Character Classifier
As input, the recognition module receives an extracted, size-normalized image representing the character to recognize.
As output, the module produces an ordered list of the most probable classification candidates, together with their confidence values.
The task is performed by matching the raster sample with template masks representing different characters. The masks are prepared in an offline training phase. A mask can be considered a raster image containing three types of pixels: black, white, and undefined (gray).
Initially, template masks are built per font. In a single font set of masks, every character is represented by exactly one mask. Images representing template masks built for the Courier font are presented below.



Figure 9 - Sample of a template mask
In practice, the font of a character is often unknown a priori. Hence, templates representing the most prevalent fonts are prepared and combined.
The Omnifont recognizer, containing a number of masks per character, is shown below. An input image is correlated with all the masks stored in the recognizer. The mask with the highest correlation score is taken as the primary result of the recognition.
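The matching step can be sketched as follows. The mask encoding (1 = black, 0 = white, -1 = undefined) and the simple agreement score that stands in for the actual correlation measure are assumptions made for illustration.

```python
import numpy as np

def mask_score(image, mask):
    """Fraction of the defined (non-gray) mask pixels that agree with the
    size-normalized binary character image."""
    defined = mask >= 0                      # gray (undefined) pixels are ignored
    matches = (image == mask) & defined
    return matches.sum() / max(defined.sum(), 1)

def recognize_omnifont(image, masks):
    """masks: list of (character, mask) pairs, with several masks per
    character (one per font). Returns candidates ordered by confidence."""
    best = {}
    for char, mask in masks:
        best[char] = max(best.get(char, 0.0), mask_score(image, mask))
    # Ordered list of the most probable candidates with confidence values.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

# Tiny example: a 2x2 mask with one undefined pixel matches the image exactly.
mask_t = np.array([[1, 1], [-1, 0]])
img    = np.array([[1, 1], [ 1, 0]])
print(mask_score(img, mask_t))   # -> 1.0
```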
Example

Figure 10 - Recognition probabilities chart. The 'B' with probability 99% will be matched with the 'B' on the right.
Constrained Printing Recognition
When printing is constrained so that each character occupies its own field, character spacing is fixed. In this case, segmentation is possible even when fields are distorted, as illustrated below.

Figure 11 - The bottom line presents the recognition results for the top line
Unconstrained Printing Recognition
In the following example, we can see the main steps of the recognition process for unconstrained printing. The input image used in the example was extracted from a fax coversheet.
- The possible slant is estimated and compensated for (to cope with italic and backslanted fonts).
- The top and bottom base lines are detected. The base lines are shown in red and blue in the following picture.
- The whole image is divided into horizontally separated "words." Two such words are detected in our example. They are separated by a vertical green line.
- Each word is processed separately and decomposed into connected components. The following is the decomposition of the first word:
- The connected components undergo further analysis. Some of them are decomposed into smaller parts (called atoms).
- Thus, the problem of character separation is reduced to the problem of correctly partitioning an ordered sequence of atoms. In other words, we need to combine the atoms into molecules, which can be done in a variety of ways. The choice among them is made using the recognition confidence values produced by the character classification kernel described above (a sketch of this partitioning appears after this list).
- All the molecules are recognized separately. The average of the recognition probabilities obtained for the corresponding molecules provides an estimate of the confidence of the entire word.
This process enables the successful recognition of broken characters, connected characters, and dot-matrix printing.
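The atom-to-molecule partitioning can be sketched with dynamic programming, assuming a hypothetical recognize callback that returns the best character and its confidence for a merged group of atoms. Maximizing the total confidence and reporting the average as the word confidence is a simplification for illustration, not necessarily the engine's actual selection rule.

```python
def best_partition(atoms, recognize, max_atoms_per_char=3):
    """Partition an ordered sequence of atoms into molecules (characters).

    recognize(atom_group) is assumed to return (character, confidence) for the
    image obtained by merging the group of atoms.
    """
    n = len(atoms)
    # best[i] = (total confidence, [(char, conf), ...]) for the prefix atoms[:i]
    best = [(0.0, [])] + [None] * n
    for i in range(1, n + 1):
        for k in range(1, min(max_atoms_per_char, i) + 1):
            prev_total, prev_molecules = best[i - k]
            char, conf = recognize(atoms[i - k:i])
            candidate = (prev_total + conf, prev_molecules + [(char, conf)])
            if best[i] is None or candidate[0] > best[i][0]:
                best[i] = candidate
    total, molecules = best[n]
    word = "".join(char for char, _ in molecules)
    # The average molecule confidence estimates the confidence of the word.
    return word, (total / len(molecules) if molecules else 0.0)
```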
Recognition of printed characters includes:
- Monofont
- Omnifont
- Dot matrix
- Italic fonts
In addition, recognition of special fonts such as CMC7, Farrington, OCR-A, and OCR-B is also possible.
Barcode Detection and Recognition
We see barcodes almost everywhere. These images represent a very fast and efficient way to code and retrieve information about an item. Our system receives a picture containing barcodes as input, and the goal is to locate the barcodes and decode them as fast as possible.
There are many types of barcodes that use various methods to encode information. Each one has its own algorithms for encoding, decoding, verification, and even error recovery.
All the pictures below are real examples that were successfully decoded by the system.



Figure 12 - Barcodes on parcels
Frequently, barcode location presents an additional challenge. A parcel may be very large, or many barcode-like graphics may exist in the same picture (as shown in Figure 13).

Figure 13 - Parcel with multiple barcodes
The team at IBM Haifa developed a system that handles both of these problems: it finds and decodes barcodes on extremely difficult parcels at high speed. This is done at a very low resolution (140-170 dots per inch) compared with laser barcode readers (about 15,000 dots per inch). Moreover, a laser barcode reader does not work if the barcode is damaged, whereas our system can decode barcodes even when image quality is poor.
Character Recognition Technology Features
The character recognition technology includes the following features:
- OCR engines for both printed and handwritten characters are trainable and can be adjusted to the specific characteristics of the OCR application.
- Field-level recognition is supported, including line separation and character segmentation.
- Character segmentation includes separation of touching characters and combination of broken characters, for both printed and handwritten text.
- Preprocessing options can:
- Remove lines (vertical and/or horizontal)
- De-skew tilted text
- Remove speckles
- Fill small holes
- Extract fields from a page via coordinates, with automatic adjustments of extracted text via connected component analysis
- Field segmentation is supported for both boxed and non-boxed fields.
- Software threshold sliders adjust the working point of the reject-to-substitution trade-off (see the sketch after this list).
- Multi-pass OCR can be used with various image preprocessing filters.
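The reject-to-substitution working point can be viewed as a cut-off on the classifier's top confidence, as in the following sketch; the function name and the threshold value are illustrative only.

```python
def apply_reject_threshold(candidates, threshold=0.90):
    """Turn an ordered candidate list [(char, confidence), ...] into a decision.

    Raising the threshold moves the working point toward fewer substitutions
    (wrong accepts) at the cost of more rejects, and vice versa.
    """
    if not candidates:
        return None                                       # nothing recognized: reject
    char, confidence = candidates[0]
    return char if confidence >= threshold else None      # None = reject

# A 0.99-confidence 'B' is accepted; a 0.62-confidence '8' is rejected.
print(apply_reject_threshold([('B', 0.99), ('8', 0.62)]))   # -> B
print(apply_reject_threshold([('8', 0.62), ('B', 0.31)]))   # -> None
```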
Additional software features include:
- API toolkit and automatic and/or manual resource management for the OCR engines. The resources include neural networks, font libraries, statistical tables, and more.
- Ability to train the OCR resources for specific data types.
- Configuration file used to create form processing applications.
- Interactive GUI to create configuration files.