Gordon Kindlmann

CS 6964
Project 1
Image Mosaicing


For ease and speed of implementation, I used my nrrd library for reading in the images, as well as the correspondence point data. Images are either two-dimensional (grayscale) or three-dimensional (color) arrays, so nrrd is a natural fit for them. Putting the correspondence point data into array form is also relatively straightforward, as will be discussed below. The nrrd library relies in turn on biff (for error reporting) and on air (basic utility functions). The source for these libraries is in /home/gk/usr/local/src; the #includes are in /home/gk/usr/local/include, and the libraries themselves are in /home/gk/usr/local/lib.

This description of the functionality will start at the lowest level and work its way up to the high-level calls which perform useful work. As described in the previous section, an SVD calculation is needed to determine the warp between two overlapping images. The SVD code comes from Numerical Recipes. Because its SVD call (svdcmp) uses two-dimensional arrays which are 1-based instead of 0-based, I had to write a wrapper function (mossSVD) which allocates and initializes the 1-based arrays from the 0-based input data. Another wrapper around this (mossPseudoInverse) calculates the pseudo-inverse based on the SVD result. This is all in svdcmp.c, with some supporting code in pythag.c and nrutil.c, both largely ripped off from the Numerical Recipes code base.

As described in the previous section, the SVD is used to convert information about correspondence points into a transformation matrix which maps between image planes. The software follows exactly the procedure described already; this functionality is contained in the function _mossCalcPTs of corresp.c. The transformation itself is stored as a 3x3 array of floats, in a struct (mossCorresp) which stores all the correspondence points and the matrices together. This is defined in ally.h. All the pseudo-methods to create and destroy this and other structs are in methods.c.

But this program operates on more than just image pairs. The user can specify an arbitrarily large set of images, and as long as every image is tied to at least one other image, then they form a cohesive whole, and the mosaic can be calculated. Still, one particular image is specified (by the user) to be the reference image, which defines the plane onto which all the other images are projected. Thus, in order to perform resampling, the mosaicing tool needs to be able to determine the transformation which maps from the reference image to any of the other images.

To organize the required information about mapping between images into a single place, I used the notion of a "matrix of transformations": if N images are being mosaiced, then the matrix of transformations is an N by N matrix, each element of which is a transformation (which, somewhat confusingly, is itself a matrix). The transformation at row i and column j records how to transform pixels in image j onto the plane of image i. This matrix is similar to an adjacency matrix used to represent a graph (nodes connected by edges), but instead of a binary number at each entry, we have the transformation. Like an adjacency matrix, we know that the images form a connected graph (mosaic) if the row for each node (image) contains more than just one element-- this means that every image is tied to at least one other image. The user's correspondence information has to have this property, or else moss complains. Populating the matrix of transformations with the transformations derived directly from the user's correspondences is done by _mossLearnPoints() in corresp.c.

The next step is filling the whole matrix with all the intermediate transformations which relate two images which weren't explicitly tied together by the user. Assuming that the mosaic isn't over-constrained (that is, there is no loop in the adjacency graph), figuring out the intermediate transformations is essentially a graph problem: to find a way of transforming one image to the other, a path must be found through the adjacency graph. Between each pair of images along the path, the matrices associated with each image pair are cumulatively composited to produce a new transformation, which tells how to transform between the images at the endpoints of the path. Starting with the correspondences the user defines, the whole matrix of transformations can be filled out, even though only one whole row (or column) needs to be fleshed out. This process is done by various functions (_mossFindPathTo, _mossCalculateInter, mossSetMatrix) in corresp.c.

Now the process of creating the output image actually starts. Given the reference image, and the transformations which take all the other images into its plane, we can determine the bounding box of the final image and allocate space for it. Then comes the pixel-by-pixel creation of the mosaic: for every pixel in the output image, find the images whose projections cover that pixel, use the calculated transformations to find the pre-image of the target pixel in each source image, and use bilinear interpolation to find the pixel value. All the interpolated pixel values for all the overlapping images are averaged according to the weighting scheme described in the previous section. This functionality is all in image.c. The function which starts the whole process off is mossDoit in mosaic.c, which also contains functions to check the validity of all the user input parameters (such as making sure at least four points were specified to stitch two images together). The only job for main.c is to read in all the input data and call mossDoit.

As was said before, the nrrd library handles the reading and writing of PGM and PPM files. It also handles the reading in of the user-specified correspondence data. Using nrrd for this was more a matter of programming convenience than of user convenience. The user records all the correspondence points by creating an ASCII nrrd file which defines a 3-dimensional array of floating point numbers. The first axis of this array always has only 2 elements, for the X and Y coordinates of the correspondence points. The second axis has as many elements as there are images to be stitched together, and the third axis has as many elements as there are correspondence points. Written as an ASCII file, there is the nrrd header, which is fairly straightforward, and then one line per correspondence point. Each correspondence point is recorded as a coordinate pair in two different images, and each image has its own column. Since nrrd can't deal with incomplete data, the user just fills in -1 for the coordinates in all the images not involved in a given correspondence point. An example may clarify things:

dimension: 3
type: float
encoding: ascii
sizes: 2 4 18
#image 0    image 1   image 2      image 3
 264 105    134  10     -1 -1      -1 -1 
 369  84    231   8     -1 -1      -1 -1 
 248 385    130 233     -1 -1      -1 -1 
 361 384    232 231     -1 -1      -1 -1 
 530 197    353 109     -1 -1      -1 -1 
 509 382    340 230     -1 -1      -1 -1 
  -1 -1     434   6     81  72     -1 -1 
  -1 -1     451 136    106 227     -1 -1 
  -1 -1     542 222    212 328     -1 -1 
  -1 -1     402 295     63 418     -1 -1 
  -1 -1     563 267    243 383     -1 -1 
  -1 -1     566 306    248 429     -1 -1 
  -1 -1     256 164    -1 -1       428 107
  -1 -1     313 165    -1 -1       489 109
  -1 -1     218 246    -1 -1       382 197
  -1 -1      16 362    -1 -1       155 317
  -1 -1       3 249    -1 -1       140 192
  -1 -1       2 340    -1 -1       139 293
Here, there are 4 different images, and 18 correspondence points in total. For example, from the first line of data we can see that the first point ties location (264,105) in image #0 to (134,10) in image #1. I found that a very convenient way of finding the correspondence points between a pair of images is to run two copies of xv, one on each image. Middle clicking inside the image causes coordinate and color information to be written at the top or bottom of the image, but it also causes said information to be copied into X's crude cut/paste buffer, so middle clicking inside an emacs window pastes the same information in a line like:
 346, 185 =  35, 35,143  #23238f  (240  75  56 HSV)  [    0,    0]
Keyboard macros can then be used to massage the point data into the necessary format. Note that these coordinates need not be integers-- floating point values are valid. Also note that negative coordinates are valid, if for some bizarre reason you know that the proper location for a correspondence point is outside the image. So using the coordinate (-1,-1) as a placeholder and sentinel meaning "no data for this point in this image" is really a hack, but it sure simplified the programming.

Although I did create a separate library (libmoss.a) for this project, there is only one stand-alone program to run which accomplishes the work, called moss. Using it is just a matter of setting up your correspondence point file correctly and making sure all the input images are either PGMs or PPMs. Here is the usage information it prints:

usage: moss <imgOut> <points> <which> <img0In> <img1In> ...
The ordering of the images on the command line is very important, as it must exactly match the ordering of the coordinates in the columns of the correspondence point file. Obviously, the number of images given to be stitched together must be compatible with the data in the correspondence point file.

As the program runs, it will spit out a lot of information about matrices being computed and intermediate transformations being determined. Until I learn to use a debugger I'll keep to printf.