Image mosaicing is performed by "reprojecting" data taken from one camera pose (position and orientation) onto the image plane of another image, acquired with a different camera pose. As long as there is some overlap between reprojected images, it is possible to create one aggregate image, simulating a camera with a much wider field of view. The main tasks in this process are determining the correspondence between the images, calculating the reprojection transform to map between images, and resampling and compositing the reprojected image. For this project, all the correspondence information is given by the user. This section describes the approach for the other tasks.
One interesting aspect of this process is that, in limited cases, the reprojection can be done correctly without any knowledge of the camera pose. This project concerns two such cases:
- Fixed viewpoint; varying orientation; world is non-planar
- Varying viewpoint; varying orientation; world is planar
In these cases, the necessary transformation of the image to be reprojected can be expressed with a single matrix multiplication, which operates on pixel locations in homogeneous coordinates.
Suppose the user wants to transform Image #1 to the space of Image #0, which is the "reference" image. This can be accomplished by extending the image plane of Image #0, then mapping pixel coordinates in the extended plane back to their corresponding location in Image #1. Since this location will in general be non-integral, bilinear interpolation is performed to create the final image values. So although Image #1 is being projected onto Image #0, the implementation needs a way to transform locations in the opposite direction, from Image #0 to Image #1.
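The resampling step can be illustrated with a minimal sketch of bilinear interpolation. The function name `bilinear_sample` and the grayscale list-of-rows image representation are assumptions made for illustration, not the project's actual code.

```python
def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows, at least 2x2) at a
    non-integer location (x, y).

    Returns None when (x, y) lies outside the image, so the caller can
    treat the pixel as not covered by this image.
    """
    height, width = len(image), len(image[0])
    if x < 0 or y < 0 or x > width - 1 or y > height - 1:
        return None
    # Top-left corner of the 2x2 neighbourhood, clamped so that the
    # neighbours at +1 stay inside the image.
    x0 = min(int(x), width - 2)
    y0 = min(int(y), height - 2)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two rows, then along y between them.
    top = image[y0][x0] * (1 - fx) + image[y0][x0 + 1] * fx
    bottom = image[y0 + 1][x0] * (1 - fx) + image[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy
```

For a color image the same interpolation would be applied to each channel separately.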
This is where the projective transform of homogeneous coordinates is used. A pixel location p0 in Image #0 is transformed to a location p1 in Image #1 by multiplication with a matrix M:
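One way to write this out is sketched here, with p0 = (x_0, y_0) and p1 = (x_1, y_1). The coefficient names m_0 through m_7 and their row-major ordering are assumptions, and the final element of M is shown already set to 1.

```latex
\[
\begin{bmatrix} x_1 w \\ y_1 w \\ w \end{bmatrix}
=
\begin{bmatrix}
m_0 & m_1 & m_2 \\
m_3 & m_4 & m_5 \\
m_6 & m_7 & 1
\end{bmatrix}
\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}
\]
\[
x_1 = \frac{m_0 x_0 + m_1 y_0 + m_2}{m_6 x_0 + m_7 y_0 + 1},
\qquad
y_1 = \frac{m_3 x_0 + m_4 y_0 + m_5}{m_6 x_0 + m_7 y_0 + 1}
\]
```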
Because of the renormalization inherent in using homogeneous coordinates, there is an extra degree of freedom in the scaling of the coefficients of a general 3x3 matrix. This can be eliminated by fixing the final element of the matrix M (above) to be 1.
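The effect of the renormalization can be checked with a small sketch. The helper name `apply_projective` and the nested-list representation of M are assumptions chosen for illustration.

```python
def apply_projective(M, x0, y0):
    """Map (x0, y0) through the 3x3 matrix M (nested lists) and
    renormalize the homogeneous result back to pixel coordinates."""
    xh = M[0][0] * x0 + M[0][1] * y0 + M[0][2]
    yh = M[1][0] * x0 + M[1][1] * y0 + M[1][2]
    w = M[2][0] * x0 + M[2][1] * y0 + M[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return xh / w, yh / w


# A pure translation by (2, 3), with the final element equal to 1.
M = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 3.0],
     [0.0, 0.0, 1.0]]
# The same matrix with every coefficient doubled.
M_scaled = [[2.0 * value for value in row] for row in M]

# Both matrices map a point to the same location, which is why the
# overall scale can be fixed without losing generality.
same = apply_projective(M, 1.0, 1.0) == apply_projective(M_scaled, 1.0, 1.0)
```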
Once one assumes that the relevant class of transforms can be expressed with this matrix transform, the reprojection problem of image mosaicing is reduced to determining the matrix M for a given pair of images. For this project, the user specifies a list of pairs of correspondence points between the images. Each pair contains a point p0 in Image #0 and a point p1 in Image #1. The task is to find the matrix M which maps the points in Image #0 to their corresponding points in Image #1.
Because M has eight degrees of freedom, and because each correspondence point pair imposes at most two constraints (one in X, one in Y), at least four correspondence point pairs are required to determine M. With exactly four pairs, the points must be chosen very accurately, since there is no redundancy in the information they represent. Using more points can lead to a better match; M is then solved as the matrix that most closely maps between the correspondence points, in the least-squares sense.
This is facilitated by rearranging the image transformation equations above so that the m_i values are a vector quantity to be solved for:
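A possible form of the rearranged equations for a single correspondence pair is sketched here. It assumes p0 = (x_0, y_0), p1 = (x_1, y_1), coefficients m_0 through m_7 numbered in row-major order, and the final element of M fixed to 1; the numbering is an assumption. It follows from multiplying each coordinate equation through by the denominator m_6 x_0 + m_7 y_0 + 1 and moving the product terms to the right.

```latex
\[
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}
=
\begin{bmatrix}
x_0 & y_0 & 1 & 0 & 0 & 0 & -x_0 x_1 & -y_0 x_1 \\
0 & 0 & 0 & x_0 & y_0 & 1 & -x_0 y_1 & -y_0 y_1
\end{bmatrix}
\begin{bmatrix}
m_0 \\ m_1 \\ m_2 \\ m_3 \\ m_4 \\ m_5 \\ m_6 \\ m_7
\end{bmatrix}
\]
```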
The 2x8 matrix on the right side (above) encapsulates the information about the location of the correspondence points p0 and p1. Each additional point pair contributes two more rows to this matrix, and two more elements to the column vector at left. This matrix is called C:
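Stacking the rows for all point pairs might look like the following NumPy sketch. The function name `build_system`, the row-major coefficient ordering m_0 through m_7, and the name `b` for the column vector of Image #1 coordinates are assumptions made for illustration.

```python
import numpy as np


def build_system(points0, points1):
    """Build the (2P x 8) matrix C and the length-2P vector b from P
    correspondence pairs, so that b = C @ m for the unknown coefficients m.

    points0 and points1 are equal-length sequences of (x, y) tuples in
    Image #0 and Image #1 respectively.
    """
    rows, rhs = [], []
    for (x0, y0), (x1, y1) in zip(points0, points1):
        # Constraint from the X coordinate of this pair.
        rows.append([x0, y0, 1, 0, 0, 0, -x0 * x1, -y0 * x1])
        # Constraint from the Y coordinate of this pair.
        rows.append([0, 0, 0, x0, y0, 1, -x0 * y1, -y0 * y1])
        rhs.extend([x1, y1])
    return np.array(rows, dtype=float), np.array(rhs, dtype=float)
```

With four pairs C is 8 x 8; every extra pair adds two rows and leaves the column count at eight.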
To solve for the m_i, one would like to "invert" C. However, since C is not in general a square matrix, Singular Value Decomposition (SVD) is used to calculate the "pseudo-inverse" of C. Singular value decomposition of C breaks it into U, W, and V:
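In the usual notation, and assuming the convention in which V appears transposed in the product, the decomposition reads:

```latex
\[
C = U \, W \, V^{T}
\]
```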
If P is the number of correspondence pairs, then U is a (2*P) x 8 matrix of orthonormal column vectors, W is an 8 x 8 diagonal matrix of singular values, and V is an 8 x 8 matrix of orthonormal column vectors. Then the pseudo-inverse is:
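Written out under the same assumed convention (C = U W V^T), and provided that no singular value is zero, the pseudo-inverse takes the standard form:

```latex
\[
C^{+} = V \, W^{-1} \, U^{T}
\]
```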
W^-1 is the diagonal matrix composed of the reciprocals of the singular values. The vector of projective transformation matrix coefficients m_i is computed by multiplying the pseudo-inverse by the vector of correspondence point coordinates in Image #1.
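The solve step can be sketched with `numpy.linalg.svd`. The function name `solve_projective` is hypothetical, and the code assumes that C and the vector b of Image #1 coordinates have already been assembled, that C has at least eight rows, and that all singular values are safely non-zero.

```python
import numpy as np


def solve_projective(C, b):
    """Solve b = C @ m for the eight coefficients via the SVD
    pseudo-inverse, then return the full 3x3 matrix M with its final
    element fixed to 1.
    """
    # Reduced SVD: U is (2P x 8), s holds the 8 singular values, and
    # Vt is the transpose of V (8 x 8).
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # m = V @ W^-1 @ U^T @ b; dividing by s applies W^-1 without
    # building the diagonal matrix explicitly.
    m = Vt.T @ ((U.T @ b) / s)
    return np.append(m, 1.0).reshape(3, 3)
```

When more than four pairs are supplied, this yields the least-squares solution of the stacked linear equations.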
One benefit of SVD is that the singular values indicate whether there is really enough information to determine M completely. If the chosen correspondence points are all collinear, for instance, then there is insufficient information. This kind of pathology can be detected easily by looking for very small singular values, prior to calculating W^-1 and the pseudo-inverse.
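A check of this kind might be sketched as follows. The function name `is_well_determined` and the relative threshold of 1e-8 are assumptions chosen for illustration; a suitable threshold depends on the coordinate ranges and on how noisy the points are.

```python
import numpy as np


def is_well_determined(C, rel_tol=1e-8):
    """Return True when the smallest singular value of C is not
    negligible compared with the largest one."""
    # Only the singular values are needed; they come back sorted in
    # descending order.
    s = np.linalg.svd(C, compute_uv=False)
    return bool(s[-1] > rel_tol * s[0])
```

If the check fails, the user can be asked for different or additional correspondence points before the pseudo-inverse is formed.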
So far the correspondence, reprojection, and resampling tasks of mosaicing have been addressed. The final issue is that of compositing: how to blend and adjust the overlapping images in a smooth way. There was not sufficient time to experiment with methods for brightness and color adjustment to better match images in the region of their overlap.
A simple scheme for smooth blending was created. This is based on a weighting function which determines a "weight" for each location inside the image. This weight should be high at points near the center of the image, and very low at the boundary of the image. Also, because second-order discontinuities in brightness are surprisingly visible, it is nice to have the function be C2 continuous at its boundary. Such a function was created in a completely ad hoc manner, based on a linear distance to the nearest edge, but then mapped through a quarter of a sinusoid so as to create C2 edges. Here is an image of the chosen weighting function, indicated with brightness:
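A weighting function of this general kind can be sketched as below. The text does not say which quarter of the sinusoid was used or how the distance was normalized, so this sketch assumes the mapping 1 - cos(pi * d / 2), which has zero value and zero slope at the image edge, and normalizes the edge distance by half of the smaller image dimension. The name `blend_weight` is hypothetical.

```python
import math


def blend_weight(x, y, width, height):
    """Weight for location (x, y) in a width x height image: 0 on the
    boundary, rising smoothly to 1 at the largest edge distance."""
    # Linear distance to the nearest of the four image edges.
    edge = min(x, width - 1 - x, y, height - 1 - y)
    if edge <= 0:
        return 0.0
    # Normalize to [0, 1] using half of the smaller image dimension
    # (an assumed choice of normalization).
    half = (min(width, height) - 1) / 2.0
    d = min(edge / half, 1.0)
    # Assumed quarter-sinusoid mapping: flat at the boundary (d = 0).
    return 1.0 - math.cos(0.5 * math.pi * d)
```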
The fact that there are non-C2 edges inside the weighting function (along the lines where the nearest image edge changes) did not seem to be a problem.
The compositing procedure is as follows. For each pixel in the image plane:
- If there are no projected images, the color is black
- If there is only one projected image, the color is determined solely by bilinear interpolation inside its image plane
- If there is more than one projected image overlapping at this pixel, compute the weighting function for all of them. Normalize these weights so that they sum to 1. Use these weights to combine the values interpolated from each image.
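The three cases above can be sketched for a single grayscale pixel as follows. The function name `composite_pixel` and the list of (value, weight) pairs are assumptions for illustration; the fallback to a plain average when all weights are zero is also an assumption, since the text does not cover that case.

```python
def composite_pixel(samples):
    """Combine the samples that land on one mosaic pixel.

    samples is a list of (value, weight) pairs, one per projected image
    covering this pixel, where value is the bilinearly interpolated
    intensity and weight comes from the blending weight function.
    """
    if not samples:
        return 0.0  # no projected image here: black
    if len(samples) == 1:
        return samples[0][0]  # a single image: use its value directly
    total = sum(weight for _, weight in samples)
    if total <= 0.0:
        # Assumed fallback: every sample sits exactly on an image
        # boundary, so use an unweighted average.
        return sum(value for value, _ in samples) / len(samples)
    # Normalize the weights so they sum to 1, then blend.
    return sum(value * (weight / total) for value, weight in samples)
```

For color images the same blend would be applied to each channel with the same weights.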