
Segmentation Camera

With the simple addition of about $2 worth of parts, an ordinary video camera can be transformed into a range-based segmenter. This has a number of uses in human-computer interfaces, as described below. The device works by flashing an invisible infrared LED on and off and looking for the pixels that get brighter while it is lit. Because diffuse reflection of the LED's light falls off with the inverse square of distance, only objects relatively close to the camera change noticeably in intensity. There are other ways to segment objects from the background (e.g. stereo, motion, color), but none of them has such a simple hardware configuration or needs so little processing (just frame subtraction).
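The frame subtraction step can be sketched in a few lines. This is only an illustration, not the code of the demo system: the function name, the threshold value, and the toy pixel values are assumptions chosen for the example. To see why a single threshold can work, note that for equal reflectance a surface 0.5 m from the LED receives (3 / 0.5)^2 = 36 times the flash irradiance of a surface 3 m away (both distances are made up for illustration).

```python
def segment_by_flash(frame_off, frame_on, threshold=20):
    """Return a binary mask of pixels that brightened when the LED was lit.

    frame_off, frame_on: equal-sized 2D lists of 8-bit grey levels.
    threshold: hypothetical minimum brightness gain (not from the source).
    """
    mask = []
    for row_off, row_on in zip(frame_off, frame_on):
        mask.append([1 if on - off > threshold else 0
                     for off, on in zip(row_off, row_on)])
    return mask


# Toy 3x4 scene: the two middle columns of the top rows hold a nearby hand;
# everything else is distant background that barely responds to the flash.
led_off = [[50, 60, 62, 50],
           [52, 61, 60, 51],
           [50, 50, 50, 50]]
led_on = [[52, 140, 150, 51],
          [53, 145, 138, 52],
          [51, 52, 51, 50]]

mask = segment_by_flash(led_off, led_on)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

Only the pixels whose brightness rose by more than the threshold are marked, so the background drops out without any model of what it looks like.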

[Video: hand segmentation, MPEG movie (1.5MB)]
In its basic mode the system provides clean outlines that are suitable for hand pose decoding, fingertip tracking, and gesture recognition. This could be combined with limited speech recognition to perform tasks like Web browsing (scrolling and going back by motion; link selection by speech) or to serve as the interface to a VRML viewer. Rapid background motion can sometimes mimic a ranging flash, but these anomalies can be readily rejected by suitable tracking and temporal averaging.
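One simple form of temporal averaging is a per-pixel running average of the binary masks, so that a pixel counts as foreground only if it has been flagged consistently. The sketch below is an assumption about how this could be done, not the method used in the system; the blending factor `alpha` and the acceptance `level` are hypothetical values.

```python
def update_confidence(confidence, mask, alpha=0.5):
    """Blend the newest binary mask into a running per-pixel average."""
    return [[(1 - alpha) * c + alpha * m for c, m in zip(c_row, m_row)]
            for c_row, m_row in zip(confidence, mask)]


def stable_mask(confidence, level=0.6):
    """Keep only pixels whose averaged response reaches the given level."""
    return [[1 if c >= level else 0 for c in row] for row in confidence]


# One-row, two-pixel example over four frames.
# Pixel 0 is a real hand pixel (flagged in every frame).
# Pixel 1 is a one-frame glitch caused by background motion.
masks = [[[1, 0]], [[1, 1]], [[1, 0]], [[1, 0]]]

confidence = [[0.0, 0.0]]
for m in masks:
    confidence = update_confidence(confidence, m)

result = stable_mask(confidence)
print(confidence, result)
```

The glitch never raises its pixel's average above 0.5, so it stays below the acceptance level, while the persistent pixel climbs toward 1. The price is a short delay before a newly arrived object is accepted.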
[Video: head segmentation, MPEG movie (1.2MB)]
The spatial smoothing parameter can also be set higher to look for large objects such as heads. This could be useful for presence detection or for counting the number of people viewing a kiosk. It could also be used to aid compression for videoconferencing, where image quality should be kept highest in the detected face region. The original application for this system was to find heads so that a high-zoom, narrow-angle camera could image the user's iris (the colored portion of the eye) for biometric identification.
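The effect of the smoothing parameter can be illustrated on a single scanline of the binary mask. This is a hedged sketch: the moving-average filter, the window sizes, and the 0.8 acceptance level are assumptions made for the example, not parameters taken from the system.

```python
def smooth_row(row, window):
    """Centered moving average of a 1D list; pixels past the edges count as 0."""
    half = window // 2
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(row[lo:hi]) / window)
    return out


def detect(row, window, level=0.8):
    """Mark pixels whose smoothed response reaches the acceptance level."""
    return [1 if v >= level else 0 for v in smooth_row(row, window)]


# One scanline: a 2-pixel finger at columns 2-3 and a 9-pixel head at 8-16.
scanline = [0, 0, 1, 1, 0, 0, 0, 0] + [1] * 9 + [0, 0, 0]

fine = detect(scanline, window=1)    # no smoothing: finger and head both kept
coarse = detect(scanline, window=7)  # wide window: only the head survives

hits = [i for i, v in enumerate(coarse) if v]
head_center = sum(hits) / len(hits)
print(fine, coarse, head_center)
```

With the wide window the finger can never fill enough of the window to pass the level, so only the core of the head remains, and the mean of the surviving columns gives a rough head position.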

The demo system we have built uses an off-the-shelf Radio Shack surveillance camera with built-in IR emitters. For the system to work it is important that the camera NOT have an IR-cut filter. Such a filter is an extra expense anyhow and is usually not included on commercial camcorders (try videotaping your TV remote control to check). To complete the system we modify the basic camera to put the IR LEDs under computer control, and add a visible-cutoff filter to increase the signal-to-noise ratio (somewhat surprisingly, the basic video looks the same). IBM's CVM frame grabber & DSP board is used to process 256 by 240 pixel images at 7.5 Hz without intervention by the host processor. This includes a local averaging step (4 arithmetic operations per pixel) to clean up the raw difference image.
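The text does not say how the four operations per pixel are spent. One plausible reading, offered here purely as an assumption, is a separable box filter built from running sums: each pass adds the sample entering the window and subtracts the one leaving it, so a horizontal pass plus a vertical pass costs about four operations per pixel regardless of window size. The function names and the trailing (rather than centered) window are choices made for this sketch.

```python
def running_sum_1d(values, window):
    """Sliding-window sums using one add and one subtract per sample.

    The window is trailing: out[i] covers values[i-window+1 .. i],
    with positions before the start of the list counted as 0.
    """
    out = []
    total = 0
    for i, v in enumerate(values):
        total += v                       # sample entering the window
        if i >= window:
            total -= values[i - window]  # sample leaving the window
        out.append(total)
    return out


def box_sum(image, window):
    """Window-by-window box sums via a horizontal and then a vertical pass."""
    rows = [running_sum_1d(r, window) for r in image]
    cols = [running_sum_1d(list(c), window) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row order


ones = [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]]
sums = box_sum(ones, window=2)
print(sums)

# Throughput implied by the figures in the text: 256 x 240 pixels,
# 7.5 frames per second, 4 operations per pixel.
ops_per_second = 256 * 240 * 7.5 * 4
print(ops_per_second)
```

Dividing each sum by the window area gives the local average; when the result is only compared against a threshold, the division can be folded into the threshold instead. The implied load for the averaging step is about 1.8 million operations per second.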

We have also ported the system to use the main CPU of a workstation or laptop computer at full frame rate. This version uses a modified camera which automatically flashes a number of infrared LEDs and returns a standard NTSC signal which is digitized by a USB adapter. The camera is especially suited to automotive applications because:
  • it can image in the dark (without bothering the driver)
  • it supplies its own illumination so lighting variations can be minimized
  • it provides depth-based figure-ground separation essentially for free
We are currently looking at using the segmentation camera (SegCam) in audio-visual speech recognition (sound + lip reading) to combat the high ambient noise levels in a car. We are also investigating facial feature monitoring (head nodding, yawning, blink rate) to automatically determine the operator's fatigue level.

 

Related patents:

US05631976 Flashing Light Segmentation System

 
Contact: Jon Connell
Last updated: 6/12/02
 