Natural Scene Text Recognition

Detecting and recognizing text in natural settings is essential for image and video indexing, automatic text-to-speech generation, automatic recognition of events, locations and products, and more.

Powered by robust deep learning models, the Natural Scene Text (NST) tool recognizes words in different fonts, environments, and angles, all of which are typical of natural scene text.

Given an input image, the NST tool first uses a deep neural network to locate candidate text regions in the natural scene. It then combines multiple text recognition models to read the located words.
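
The two-stage flow described above can be sketched roughly as follows. This is an illustrative outline only, not the NST implementation: the names `read_scene_text`, `Word`, `fake_detector`, and the three stand-in recognizers are hypothetical, and majority voting is just one assumed way of combining several recognition models (the actual combination method is not described here).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels
Image = List[List[int]]          # placeholder image: rows of pixel values


@dataclass
class Word:
    box: Box
    text: str


def read_scene_text(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognizers: Sequence[Callable[[Image, Box], str]],
) -> List[Word]:
    """Stage 1 locates candidate text regions; stage 2 reads each region."""
    words = []
    for box in detect(image):
        # Every recognizer proposes a reading and the most common one wins.
        # Majority voting is an assumption made for this sketch.
        votes = Counter(recognize(image, box) for recognize in recognizers)
        text, _ = votes.most_common(1)[0]
        words.append(Word(box=box, text=text))
    return words


# Stand-in models so the sketch runs end to end.
# In a real system these would be trained neural networks.
def fake_detector(image: Image) -> List[Box]:
    return [(10, 20, 80, 30)]


def recognizer_a(image: Image, box: Box) -> str:
    return "stop"


def recognizer_b(image: Image, box: Box) -> str:
    return "stop"


def recognizer_c(image: Image, box: Box) -> str:
    return "step"


result = read_scene_text(
    [[0]], fake_detector, [recognizer_a, recognizer_b, recognizer_c]
)
print([word.text for word in result])
```

Keeping detection and recognition behind separate callables means either stage can be swapped out independently, which is one plausible reason for a pipeline of this shape.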

The main goal of NST is to detect and recognize English text in natural settings.
Arbitrary sequences of characters are usually not recognized.

NST is not an OCR engine. It can, however, sometimes read words from scanned documents, though without preserving punctuation marks, paragraph structure, styling, or letter case.


The NST tool has been released by Watson.
You can try out NST for yourself: simply drag an image onto the web page and you'll get the recognized text in response. The image below shows the demo web page.


Daniel Rotman, Video AI Technologies, IBM Research - Haifa