Counting people at a checkout counter is important. Customers do not
like to stand in long queues to pay for their purchases, especially if
they only have a few items. If a computer system could automatically
count the number of people in a line, it could call in more cashiers
during peak times. On a longer-term
basis, the data could be used to determine times of high traffic flow
and thus to schedule the number of employees on duty each day.
To count people at a checkout stand, we mount a stereo pair of cameras on a post near the end of the checkout counter. The images above show the output of the two cameras (if you cross your eyes, you can see the depth). The cameras are at a height corresponding to human eye level and thus do not interfere with movement in the bagging area. Generally, the cameras are displaced to one side of the line, which helps avoid the problem of one person's head occluding the head of the next person in line. From these images we compute the normalized correlation of small image patches over a range of horizontal displacements. Taking the best-matching displacement at each pixel yields a disparity map (right), which encodes the distance to each person, and the corresponding correlation score (left), which tells us where to trust the map.
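The patch-matching step can be sketched in plain Python. This is a minimal illustration, not the system's actual code. It assumes a rectified image pair stored as lists of rows of grey values, a square window of side `2*half+1`, and that a feature at column `c` in the left image appears at column `c - d` in the right image, so that distant points have disparity 0. The names `ncc`, `patch`, `disparity_map`, `max_disp` and `half` are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized correlation of two equal-length lists of pixel values."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # If either patch is flat (zero variance), report 0.0 ("no match").
    return num / den if den > 0 else 0.0


def patch(img, r, c, half):
    """Flatten the (2*half+1) x (2*half+1) window centred on (r, c)."""
    return [img[i][j]
            for i in range(r - half, r + half + 1)
            for j in range(c - half, c + half + 1)]


def disparity_map(left, right, max_disp, half=1):
    """Return (disparity, score) maps for a rectified stereo pair.

    Assumed convention: for each left-image pixel, try horizontal shifts
    d = 0..max_disp and keep the one whose right-image patch at column
    c - d correlates best. Border pixels that cannot be matched keep
    disparity 0 and score 0.0.
    """
    rows, cols = len(left), len(left[0])
    disp = [[0] * cols for _ in range(rows)]
    score = [[0.0] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half + max_disp, cols - half):
            ref = patch(left, r, c, half)
            best_d, best_s = 0, -2.0  # NCC always lies in [-1, 1]
            for d in range(max_disp + 1):
                s = ncc(ref, patch(right, r, c - d, half))
                if s > best_s:
                    best_d, best_s = d, s
            disp[r][c] = best_d
            score[r][c] = best_s
    return disp, score
```

Pixels with a low score (for example, textureless walls or clothing) would then be ignored; the cutoff is a tuning choice, not something stated in the original description.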
Next, we build a histogram of all the valid readings in the depth map, giving the graph shown above (left). Here we plot the number of pixels at each depth (disparity) value. For a system verged at a distant point, larger shifts correspond to closer objects. Thus distant objects appear as peaks toward the left (shift = 0), whereas closer objects show up farther to the right (shift = 20). In this example there is one big peak at a low disparity and three smaller peaks at higher disparities. The big peak corresponds to the background, whereas each of the smaller peaks corresponds to one of the people in line. The portion of the original left image corresponding to each of these peaks is shown on the left-hand side of the figure. The fact that these segments are somewhat ragged is irrelevant: in most applications, all we care about is counting the number of peaks (people).

Reliably finding these peaks involves several steps. First, there may be extraneous objects in the scene at the same distance as the people in line; stores often have magazine racks and candy displays along the aisles, for instance. To eliminate these items, we take a reference depth histogram when no one is standing in line. We chose a histogram rather than a depth map as the reference because it is much more robust to small camera changes. This reference histogram is subtracted from every subsequent histogram, which also gets rid of the large background peak in the graph above. We then threshold the resulting difference graph at a chosen level to obtain discrete peak regions. However, if customers are standing very close together, their peaks may still be partially merged. Thus, as a final step, we insert an extra boundary at any significant remaining dip.
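The histogram and peak-finding steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the original implementation: the function names `disparity_histogram`, `split_at_dips` and `find_people` and the parameters `min_score`, `threshold` and `dip_ratio` are hypothetical. Negative values after subtracting the reference histogram are clamped to zero, and a "significant" dip is assumed to be a local minimum lower than `dip_ratio` times the smaller of the peaks on either side; the original text does not say how significance is judged.

```python
def disparity_histogram(disp, score, max_disp, min_score=0.5):
    """Count trusted pixels at each disparity value 0..max_disp."""
    hist = [0] * (max_disp + 1)
    for disp_row, score_row in zip(disp, score):
        for d, s in zip(disp_row, score_row):
            if s >= min_score:  # skip low-correlation (untrusted) pixels
                hist[d] += 1
    return hist


def split_at_dips(diff, lo, hi, dip_ratio):
    """Cut the bin range [lo, hi] at local minima that are much lower than
    the peaks on both sides; return a list of (start, end) bin ranges."""
    segments = []
    seg_start = lo
    for i in range(lo + 1, hi):
        if diff[i] < diff[i - 1] and diff[i] <= diff[i + 1]:  # local minimum
            left_peak = max(diff[seg_start:i], default=0)
            right_peak = max(diff[i + 1:hi + 1])
            if diff[i] < dip_ratio * min(left_peak, right_peak):
                segments.append((seg_start, i))  # dip bin closes the left part
                seg_start = i + 1
    segments.append((seg_start, hi))
    return segments


def find_people(hist, background, threshold, dip_ratio=0.5):
    """Return one (start, end) disparity range per detected person."""
    # 1. Subtract the empty-queue reference; clamp negative bins to 0.
    diff = [max(0, h - b) for h, b in zip(hist, background)]
    # 2. Threshold into contiguous runs of bins above the threshold.
    regions, start = [], None
    for i, v in enumerate(diff):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:  # a run that reaches the last bin
        regions.append((start, len(diff) - 1))
    # 3. Split runs where two people's peaks are still partially merged.
    people = []
    for lo, hi in regions:
        people.extend(split_at_dips(diff, lo, hi, dip_ratio))
    return people
```

The number of people in line is then `len(find_people(...))`. Working on a 1-D histogram keeps this stage cheap compared with the stereo matching that precedes it.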
Contact: Jon Connell | Last updated: 6/12/02