
Character Separation Technology
This demo shows the IBM character separation technology. Character separation is a process that precedes OCR, in which a field is separated into its individual characters.
The field is first segmented into individual connected components. Then, each connected component is separated into its individual characters. There is an option to use a preprocessing pass over the whole page to learn the specific characteristics of the writer, such as nominal values of pen width, character size, typical distance between characters, etc.
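The first stage, segmenting a field into connected components, can be sketched as follows. This is a minimal illustration and not the IBM implementation: it assumes the field is a binary image stored as a list of rows (nonzero means ink) and uses 8-connectivity with a breadth-first search. The name `connected_components` is hypothetical.

```python
from collections import deque


def connected_components(image):
    """Label 8-connected ink regions in a binary image (assumed input format).

    `image` is a list of equal-length rows; nonzero means ink.
    Returns a list of components, each a sorted list of (row, col) pixels,
    ordered left to right by their leftmost column.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first search from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(sorted(pixels))
    components.sort(key=lambda comp: min(x for _, x in comp))
    return components


# A tiny field with two separate blobs of ink.
field = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1],
]
parts = connected_components(field)
print(len(parts))
```

Each returned component would then be passed on to the separation stage, since a single component may still contain several touching characters.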
Character separation is carried out by the following steps:
- Estimate the number of characters in each connected component
- Find candidate points on each connected component where separation should be carried out
- Determine at what point the split should be carried out
- Carry out the split into two sub-components
This process is carried out iteratively for each sub-component until all the characters of the original field are separated. Finally, broken characters are reconstructed by combining different connected components.
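The four steps, repeated on each sub-component, can be sketched as a small loop. The page does not say how the count is estimated or how candidate points are scored, so everything below is an assumption made for illustration: the count is the component width divided by a nominal character width, candidates are local minima of the vertical ink profile, and the chosen point minimizes ink plus distance from the expected boundary. All function names are hypothetical.

```python
def column_profile(image, lo, hi):
    """Ink count per column for columns lo..hi-1 of a binary image."""
    return [sum(row[x] for row in image) for x in range(lo, hi)]


def estimate_count(lo, hi, char_width):
    """Step 1 (assumed heuristic): width divided by nominal character width."""
    return max(1, round((hi - lo) / char_width))


def candidate_points(profile):
    """Step 2 (assumed heuristic): interior columns that are local ink minima."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] <= profile[i - 1] and profile[i] <= profile[i + 1]]


def choose_split(profile, candidates, expected):
    """Step 3 (assumed cost): little ink, close to the expected boundary."""
    return min(candidates, key=lambda i: profile[i] + abs(i - expected))


def separate(image, char_width):
    """Split one component into column ranges, one per estimated character."""
    pending = [(0, len(image[0]))]
    done = []
    while pending:
        lo, hi = pending.pop()
        count = estimate_count(lo, hi, char_width)
        profile = column_profile(image, lo, hi)
        candidates = candidate_points(profile)
        if count == 1 or not candidates:
            done.append((lo, hi))
            continue
        cut = lo + choose_split(profile, candidates, (hi - lo) / count)
        # Step 4: split into two sub-components and process each again.
        pending.append((lo, cut))
        pending.append((cut, hi))
    return sorted(done)


# Two blocks of ink joined by a thin stroke, nominal character width of 4.
touching = [
    [1, 1, 1, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 1],
]
print(separate(touching, char_width=4))
```

The nominal character width is the kind of writer-specific value that the optional preprocessing pass over the whole page could supply. The sketch cuts along straight vertical columns only; it does not attempt the final reconstruction of broken characters.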
Use the Next and Back buttons to page forward and backward through the demo images.