IBM Israel

OCR Technologies



Character Separation Technology
This demo shows the IBM character separation technology. Character separation is a process that precedes OCR, in which a field is separated into its individual characters.

The field is first segmented into individual connected components. Then, each connected component is separated into its individual characters. An optional preprocessing pass over the whole page can learn characteristics specific to the writer, such as nominal pen width, character size, and typical distance between characters.
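As an illustration of the first stage only, the following is a minimal sketch of segmenting a binarized field into connected components. It assumes the field is a list of rows in which nonzero values are ink, and it uses 8-connectivity; the function name `connected_components` and the sample `FIELD` are hypothetical and are not taken from the IBM implementation.

```python
from collections import deque


def connected_components(image):
    """Group ink pixels of a binary image into 8-connected components.

    `image` is a list of rows; any nonzero value counts as ink.
    Returns a list of components, each a list of (row, col) pixels,
    ordered left to right by the leftmost column of each component.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    components.sort(key=lambda px: min(x for _, x in px))
    return components


# Hypothetical 3x7 field holding two separate strokes.
FIELD = [
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 1],
]

if __name__ == "__main__":
    for index, pixels in enumerate(connected_components(FIELD)):
        print(index, len(pixels))
```

Each component that comes out of this stage may still contain several touching characters, which is why a separation stage follows.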

Character separation is carried out by the following steps:
  • Estimate the number of characters in each connected component
  • Find candidate points on each connected component where separation should be carried out
  • Determine at what point the split should be carried out
  • Carry out the split into two sub-components

This process is carried out iteratively for each sub-component until all the characters of the original field are separated. Finally, broken characters are reconstructed by combining different connected components.
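The four steps and their repetition on each sub-component can be sketched as follows. This is only an illustration under stated assumptions: the page does not say how IBM estimates the character count, finds candidate points, or scores them, so the sketch substitutes simple stand-ins (width divided by a nominal character width, local minima of the vertical ink profile, and the lowest-ink candidate nearest an ideal position). All function names and the `char_width` parameter are hypothetical. Reconstruction of broken characters is not covered.

```python
def column_ink(image):
    """Vertical projection profile: number of ink pixels in each column."""
    return [sum(column) for column in zip(*image)]


def estimate_count(width, char_width):
    """Step 1 (assumed heuristic): characters ~ width / nominal width."""
    return max(1, round(width / char_width))


def candidate_cuts(profile):
    """Step 2 (assumed heuristic): interior columns that are local ink minima."""
    return [x for x in range(1, len(profile) - 1)
            if profile[x] <= profile[x - 1] and profile[x] <= profile[x + 1]]


def choose_cut(profile, candidates, count):
    """Step 3 (assumed heuristic): least ink, then closest to an ideal boundary."""
    ideal = len(profile) * (count // 2) / count
    return min(candidates, key=lambda x: (profile[x], abs(x - ideal)))


def separate(image, char_width):
    """Split a component image into per-character images, left to right."""
    width = len(image[0])
    count = estimate_count(width, char_width)
    if count < 2:
        return [image]  # Looks like a single character: stop splitting.
    profile = column_ink(image)
    candidates = candidate_cuts(profile)
    if not candidates:
        return [image]  # No plausible separation point was found.
    cut = choose_cut(profile, candidates, count)
    # Step 4: split into two sub-components at the chosen column.
    left = [row[:cut] for row in image]
    right = [row[cut:] for row in image]
    # Repeat the same procedure on each sub-component.
    return separate(left, char_width) + separate(right, char_width)


# Hypothetical component: two 4-pixel-wide blobs joined by a thin stroke.
TOUCHING = [
    [1, 1, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 1, 1],
]

if __name__ == "__main__":
    pieces = separate(TOUCHING, char_width=4)
    print([len(piece[0]) for piece in pieces])
```

In this sketch `char_width` plays the role of the writer-specific nominal character size that the optional preprocessing pass would supply. The cut column itself is assigned to the right-hand piece, which is an arbitrary choice made for simplicity.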

