
Stereo Line Counting

Counting people at a checkout counter is important. Customers do not like to stand in long queues to pay for their purchases, especially if they only have a few items. If a computer system could automatically count the number of people in a line, it could call in more cashiers during peak times. Over the longer term, the data could be used to identify times of heavy traffic and thus to schedule the number of employees on duty each day.

[Figure: right image, left image, confidence map, disparity map]


To count people at a checkout stand, we mount a stereo pair of cameras on a post near the end of the checkout counter. The images above show the output of the two cameras (if you cross your eyes, you can see the depth). The cameras are mounted at roughly human eye level, so they do not interfere with movement in the bagging area. Generally, the cameras are displaced to one side of the line, which helps keep one person's head from occluding the head of the next person in line. From these images we compute the normalized correlation of small image patches over a range of horizontal displacements. This yields a disparity map (right), which encodes the distance to each person, and a correlation score (left), which tells us where to trust the map.
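The correlation step can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the original system: images are lists of rows of gray levels, the left image is the reference (so a point at column `c` is searched for at column `c - d` in the right image, which approximates the verged pair as parallel), and each pixel takes the shift with the highest normalized correlation. That best score doubles as the confidence value. The names `ncc`, `patch`, and `stereo_ncc` and the parameters `half` and `max_disp` are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized correlation of two equal-length lists of gray levels."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # flat patches carry no information

def patch(img, row, col, half):
    """Flatten the (2*half+1) x (2*half+1) window centred on (row, col)."""
    return [img[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]

def stereo_ncc(left, right, max_disp, half=2):
    """Return (disparity, confidence) maps, using the left image as reference.

    Border pixels, where the window does not fit, keep disparity 0 and
    confidence 0.0. All parameters here are illustrative assumptions.
    """
    rows, cols = len(left), len(left[0])
    disparity = [[0] * cols for _ in range(rows)]
    confidence = [[0.0] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            ref = patch(left, r, c, half)
            best_d, best_score = 0, -1.0
            for d in range(max_disp + 1):
                if c - d - half < 0:  # candidate window would leave the right image
                    break
                score = ncc(ref, patch(right, r, c - d, half))
                if score > best_score:
                    best_d, best_score = d, score
            disparity[r][c] = best_d
            confidence[r][c] = best_score
    return disparity, confidence
```

If the right image is the left image shifted three pixels, interior pixels come back with disparity 3 and a correlation near 1.0. A production system would use an optimized matcher; the loops here only make the technique explicit.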

[Figure: disparity histogram; front, middle, and back segments; rest (background)]


Next we create a histogram of all the valid readings in the depth map to obtain a graph like the one shown above (left). It plots the number of pixels at each depth (disparity) value. For a camera pair verged at a distant point, larger shifts correspond to closer objects. Thus, far-away objects appear as peaks toward the left (shift = 0), whereas closer objects appear farther to the right (shift = 20). In this example there is one large peak at a low disparity and three smaller peaks at higher disparities. The large peak corresponds to the background, while each of the smaller peaks corresponds to one of the people in line. The portion of the original left image corresponding to each of these peaks is shown alongside the histogram.
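A minimal sketch of the histogram step, assuming integer disparity and confidence maps like those produced by the correlation stage described above. "Valid" is taken here to mean a correlation score at or above `min_conf`, a hypothetical cutoff; the page does not say how validity is decided.

```python
def disparity_histogram(disparity, confidence, max_disp, min_conf=0.5):
    """Count pixels at each integer disparity 0..max_disp.

    Pixels whose correlation score is below min_conf (a hypothetical
    cutoff) are treated as invalid and skipped.
    """
    hist = [0] * (max_disp + 1)
    for d_row, c_row in zip(disparity, confidence):
        for d, score in zip(d_row, c_row):
            if score >= min_conf and 0 <= d <= max_disp:
                hist[d] += 1
    return hist
```

Index 0 holds the farthest readings and higher indices hold closer ones, matching the left-to-right layout of the graph.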

The fact that these segments are somewhat ragged does not matter: in most applications, all we care about is counting the number of peaks (people). Reliably finding these peaks involves several steps. First, there may be extraneous objects in the scene at the same distance as the people in line; stores often have magazine racks and candy displays along the aisles, for instance. To eliminate these items, we use a reference depth histogram taken when no one is standing in line. We chose a histogram rather than a depth map as the reference because it is much more robust to small camera movements. This reference histogram is subtracted from every subsequent histogram, which also removes the large background peak in the graph above. We then threshold the difference histogram at a chosen level to obtain discrete peak regions. However, when customers stand very close together, their peaks may still be partially merged, so as a final step we insert an extra boundary at any significant remaining dip.
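The three peak-finding steps (reference subtraction, thresholding, and splitting at dips) might be sketched as follows. The function name `segment_people`, the `threshold` value, and the `dip_ratio` rule for what counts as a "significant" dip are assumptions for illustration; the page does not give the exact criteria.

```python
def segment_people(hist, reference, threshold, dip_ratio=0.5):
    """Split a disparity histogram into per-person ranges (start, end_exclusive).

    threshold and dip_ratio are hypothetical tuning parameters.
    """
    # 1. Subtract the empty-line reference; negative differences become 0.
    diff = [max(h - ref, 0) for h, ref in zip(hist, reference)]

    # 2. Threshold: collect maximal runs of bins above the threshold.
    runs, start = [], None
    for i, v in enumerate(diff):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(diff)))

    # 3. Split a run at any interior local minimum that falls below
    #    dip_ratio times the smaller of the peaks on either side
    #    (the left peak is measured since the last cut).
    people = []
    for run_start, run_end in runs:
        cut = run_start
        for i in range(run_start + 1, run_end - 1):
            is_local_min = diff[i] < diff[i - 1] and diff[i] <= diff[i + 1]
            left_peak = max(diff[cut:i])
            right_peak = max(diff[i + 1:run_end])
            if is_local_min and diff[i] < dip_ratio * min(left_peak, right_peak):
                people.append((cut, i))
                cut = i
        people.append((cut, run_end))
    return people
```

The number of people in line is then `len(segment_people(hist, reference, threshold))`. In a histogram where two customers' peaks overlap above the threshold, step 3 separates them at the dip between them, so three people are counted instead of two.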


Related patents:

US05581625 Stereo Line Counting

Contact: Jon Connell
Last updated: 6/12/02