|
Pietà Project - Technical Details [ Home | Overview | Project Team | Technical Details | Results | References ]
The process of building a digital model of 3D objects (shape and appearance) usually involves the acquisition of multiple, overlapping surface scans; these individual scans are then registered together and remeshed to obtain one seamless geometric model. Color and detail information can be represented as images textured over the geometric representation. A schematic of our 3D capture methodology is shown in Figure 1. In the following sections, we describe the process in some more detail (see references for a complete explanation).
[ Scanning | Registration | Meshing | Details & Color ] Scanning (back to top)Our 3D scanner is based on a multi-baseline stereo system, supplemented by a photometric system. The scanner is a customized version of the Virtuoso ShapeCamera from Visual Interface Inc. A photographic flash projects a pattern of vertical stripes on the subject. At the same time, six b/w digital cameras photograph the illuminated area from different angles. An additional color digital picture, registered with the b/w cameras, provides a texture image. A multiview stereo algorithm (part of the software suite that comes with the Virtuoso system) computes a triangle mesh approximating the scanned area. Each scan typically covered a 20 cm by 20 cm area and comprised on average about 10,000 measured points. The typical intersample distance for these scans is about 2 mm. In tests on calibrated targets in the laboratory, we have measured an accuracy in depth of +/- 0.1 mm for a single scan. We augmented the Virtuoso scanner with a photometric system (Figure 2(a)) consisting of 5 light sources and the built-in color camera, plus some control electronics. For each camera pose, we take 5 additional color pictures, each with one of the 5 light sources, while all other lights are turned off. We also used low-power laser sources to project red light dots onto the statue (the lasers are shown mounted by the camera in Figure 2(b)). The laser projectors that we used generate an 11 x 11 grid of rays. Used at a distance of about 1 meter, they produce an irregular pattern of red dots on the statue, with an average spacing of 2 to 4 cm between dots. We took an addtional picture of the dots to help in the alignment of overlapping meshes. The color pictures have a resolution of 1280 x 960 pixels, with 24-bit RGB per pixel. Typically we have a 0.5 mm intersample distance in the color images. We can therefore compute reflectance and surface normals from these pictures at a resolution about 4 times greater than the underlying geometry.
We did a preliminary scan of the statue (without the photometric system) in February 1998. We repeated the scan in June 1998 and completed it in July 1999. The total time spent doing the final scanning was about 90 hours over 14 days, including the equipment setup which had to be repeated each day. It took about 800 scans to cover the whole statue.
Registration (back to top)The acquired raw data consists of 6 b/w stripe images plus 6 color images per camera pose. We used the Virtuoso Developer software to compute triangle meshes for each single scan from the 6 stripe images. From this point on, we applied a number of algorithms to the data to arrive at the final model. Figure 3 illustrates the sequence of steps involved.
We start with a pairwise, manual, rough alignment of the meshes by loading each overlapping pair in the Virtuoso VI Studio application, and interactively selecting three or more matching features on the two scans. During the scanning, we moved the camera in a regular pattern, first scanning a horizontal stripe, then moving the camera upward to scan another stripe overlapping with the previous one. We recorded the scan ID-numbers and approximate positions on paper sheets with a schematic drawing of the statue. We used these notes to identify sequences of overlapping scans to align with one another. The goal was to form a tree of pairwise alignments that would span the whole set of scans. This procedure could probably be automated, but we did not investigate this research problem.
For each scan, we do a simple, automatic search for red dots in the images taken with the laser projectors on. We can map these image points back onto the scan geometry since the color camera is calibrated to the rest of the scanner. Given the initial manual aligment, we can search in the neighborhood of each laser point to check for matching points in overlapping scans. Adding additional consistency constraints to prune false matches, we arrive at a system of pairwise point matches with which we improve the registration by minimizing the sum of square distances between matching points using Besl and McKay's method [Besl 92]. Starting with this initial estimate of the scan alignment, we run several iterations of an Iterated Closest Point (ICP) algorithm to reduce the registration error. We use our own version of a n-scan ICP algorithm. We also attempt to reduce scanner line-of-sight error by computing more accurate estimates of true surface points from multiple overlapping scans, while filtering out small high frequency components (which are better captured by the photometric system). We call this process conformance smoothing. An example of the successive alignment steps on three test scans of Nicodemus' face is illustrated in Figure 4. Experiments showed that the registration error can be improved if the line-of-sight error is accounted for during the optimization. We are experimenting with alternating iterations of ICP and conformance smoothing to arrive at a better final registration. We are also working on making use of the higher-resolution image data we have acquired together with the geometry to further refine the registration.
Meshing (back to top)The result of the alignment and conformance processing described above is a large set of estimated surface samples. This point-set has nonuniform density because the number of overlapping scans varies from one part of the surface to another; and because the density within each scan varies locally depending on the angle at which the scanner saw the surface. However, except for areas that the scanner could not see, the sampling density is usually larger than strictly necessary to recover the shape of the surface with reasonable approximation. We designed our system to acquire geometry with an intersample distance of 2 mm. Note that this spatial resolution is independent of the accuracy in measuring point position. The scanner we used has a precision of +/- 0.1 mm in computing depth for each sample point. Also note that the photometric method provides additional surface detail in the form of a normal map (and a reflectance map) at a higher resolution, in a more compact, easier-to-use format. Most algorithms used to reconstruct a surface from a set of overlapping scans approximate the set of sample point, estimating a consensus surface at the same time as a representation of the surface is built. We believe that there are advantages in separating the two distinct problems of estimating true surface samples from overlapping scans and creating a surface from those samples. In our method, we estimated surface samples in the conformance step mentioned above.
The Ball-Pivoting Algorithm (BPA), diagrammed in Figure 5, computes a triangle mesh interpolating a given point cloud. The principle of the BPA is very simple: Three points form a triangle if a ball of a user-specified radius r touches them without containing any other point. Starting with a seed triangle, the ball pivots around an edge (i.e. it revolves around the edge while keeping in contact with the edge's endpoints) until it touches another point, forming another triangle. The process continues until all reachable edges have been tried, and then starts from another seed triangle, until all points have been considered. The set of triangles formed while the ball ``walks'' on the surface constitutes the interpolating mesh. The BPA is closely related to alpha-shapes [Edelsbrunner 94]. In fact, the BPA computes a subset of the 2-faces of the r-shape of the point set. These faces are also a subset of the 2-skeleton of the three-dimensional Delaunay triangulation of the point set. Notice that as a ball pivots around an edge, its center describes a circle c centered at the edge midpoint and lying on a plane perpendicular to the edge. Also observe that if the ball hits a data point during the pivoting, its center at the moment of collision can be computed by intersecting a ball of radius r centered at the hit point with the circle c. In practice, the algorithm works as follows: Input to the algorithm is the points and associated surface normals (estimated from the original scans). The points are kept in a regular grid for fast point-location queries. A seed triangle is found by looking at triples of points in a small neighborhood and checking if a ball of radius r can be placed in contact with the three points without touching any other point. Once a seed is found, the three edges are inserted into a queue of active edges which represents the boundary of the meshed region. Edges are removed from the queue one-by-one and considered for a pivoting operation. To pivot, we gather all points in a small neighborhood of the edge, and compute the intersection of balls of radius r centered at each point with the circle described by the center of the pivoting ball. The intersections are sorted in the order that each point is hit by the ball starting from its initial position. The first point hit (if any) forms a new triangle with the pivot edge. The boundary of the meshed region is adjusted accordingly while the newly-created edges are added to the active edge queue, and the process continues. The algorithm can be implemented in a form that uses a small amount of memory, even when meshing large datasets. Observe first that the mesh that is incrementally built does not need to be stored in memory: only the active edge front is required. Second, if each scan covers only a small portion of the entire model, we can process the set of scans in slices. The slice is moved from the bottom of the model up to the top in several steps. For each slice position, scans are loaded into (resp., discarded from) memory when they first (resp., no longer) intersect the active slice. Another simple extension allows the algorithm to produce good results with nonuniform samplings. We do a first pass of ball-pivoting with a small-radius ball to capture small features in areas of high sampling density. Of course, this fine meshing will leave holes in areas with sparser sampling. We then re-start the pivoting from the final active front of the first pass, but now use a larger ball radius; and repeat the process as many times as necessary to fill all holes up to a user-specified size. The Pietà data consists of 770 scans containing a total of 7.2M points. We use a slice height of 10 cm, and ball radii of 1.5, 3 and 6 mm. The BPA runs in 30 minutes on Pentium II PC, using 180MB of memory, and outputs a 14M triangle mesh. For details see [Bernardini 99].
Details and color (back to top)
The mesh produced using the Virtuoso camera has a spatial resolution of approximately 2 mm. This is more than adequate for studying the proportions of the statue from various viewpoints. However, it is not adequate for studying the small scale tool marks, shown in Figure 6. To capture data at a higher spatial resolution, we exploited the fact that the Virtuoso includes a color camera that produces images with a resolution on the order of 0.5 mm per pixel. We computed detail at this pixel resolution using a photometric stereo system built around the Virtuoso. The principle of photometric stereo, originally introduced by Woodham [Woodham 80], is illustrated in Figure 7(a). For an approximately diffuse (i.e. Lambertian) surface, the light reflected from a surface is proportional only to the cosine of the angle theta between the surface normal and the direction to the light source. Given three images of a surface as lit by three different light souces in known positions, a set of simultaneous equations can be solved for the surface normals corresponding to the points visible at each pixel in the image. Furthermore, given the normal at each pixel, the relative reflectance, or albedo, at each pixel for the red, green and blue bands can be computed.
Our photometric system is shown in Figure 2. We used 5 light sources rather than 3 because in any given image a point may be in shadow or a specular highlight. Such points can not be used in the normals calculation [Epstein 96], and must be discarded. While five lights do not guarantee that a normal can be computed at each pixel, the additional light sources increase the number of valid normals that can be computed for each camera position. Four typical images obtained at single camera position are shown in Figure 7(b). Further detail of the physical design of the system is given in [Rushmeier 98]. Photometric systems are very sensitive to precise light source positions and radiometric characteristics. Changes in electrical power levels, bulb-to-bulb variations, and directional variations in light source intensities can result in significant errors in the calculation of both normals and colors.
To compensate for possible errors in the photometric normals calculations, we use data from the 2 mm resolution mesh to compute the direction and relative distance to each point visible in each image. We also used the underlying mesh to estimate the relative light source intensity in the neighborhood of each pixel. This allows adjustments for possible bulb-to-bulb, or directional intensity variations. The details of these adjustments are given in [Rushmeier 99]. The normals computed for the images shown in Figure 7(b) are shown in Figure 8 (as illuminated from a virtual light source in front of and to the right of the surface.) We ensured color consistency across all of the color albedo maps with a two step process. First, we found corresponding points in all overlapping color albedo maps, as shown in Figure 9(a). A least-squares solution for scaling factors for each of the color channels in each of the images matches colors on corresponding points. In a second step, we combine the individual maps using weight maps to produce the final color albedo map. The weight map for each image is the product of a map of where photometric normals were computed for that image and the inverse relative area of the geometry projected onto each pixel (Figure 9(b)). Additional details of the color adjustments can be found in [Rushmeier 99].
Figure 10 summarizes the calculation of photometric normals and colors for a set of 53 camera positions. Initially, on the far left, we have many individual color images. The normals from each image are calculated and combined into a single map, as shown in the second image from the left. The unlit color albedo maps are computed and adjusted, shown in the third image. Finally the normals and colors can be combined to produce an image of the statue, as shown on the far right. To reduce alignment errors, we use the high-resolution image data captured during the scanning process [Bernardini 00]. Normal and albedo maps are a useful form for storing detail, since they do not need to be recomputed as the underlying mesh is simplified for interactive display. Furthermore, new generations of graphics hardware can be used to allow the interactive recalculation of local illumination using normals and color albedo maps [Heidrich 99].
References (back to top)
[ Home | Overview | Project Team | Technical Details | Results | References ] |
||||||||||||||||||||||||||