CS 7960 - Project 4

Active Shape Models (ASM)

Name: Jonathan Bronson
Date: March 23, 2010

Table of Contents

  1. Introduction
  2. Active Shape Models
  3. Reconstruction
  4. Experiments
  5. Conclusion

1. Introduction

Overview
The goal of this project is to implement a statistical shape model based on the point distribution model (PDM) developed by Cootes and Taylor. The following sections show how these point models, coupled with PCA, allow us not only to describe the relationship between existing shapes, but also to generate new plausible shapes from that family which are not found in the original data set. We apply this analysis to several sets of data, including a simple family of synthetic shapes and a more complicated set of real anatomical data (segmentations of the human corpus callosum).

Design Choices

For this project and future projects in this course, I continue to build a comprehensive framework application. This application allows me to easily open and save images using a GUI, and to quickly access all previous projects through their respective toolboxes. Using an MDI display model allows me or another user to have any number of images open at one time and to compare the results of different methods more easily.

My implementation is coded in C++, using Qt for the GUI and VISPack for image handling. I use CMake to automate the build process, and except for some limitations of VISPack, the application is completely cross-platform. The software has been successfully built on OpenSUSE, Ubuntu, and Windows 7.



2. Active Shape Models

The idea of Active Shape Models was developed by Cootes and Taylor, building on their previous work with the Point Distribution Model (PDM). In such a model, a training shape (usually a contour) is sampled and represented as a series of landmark points. These landmarks are chosen to be distinguishable enough that a deformed version of the contour would still contain them, which ensures a one-to-one correspondence among all shapes in a dataset. Additionally, further samples may be added by interpolating between these landmark points. We refer to the full sampling of a contour as a shape vector:

$\mathbf{x} = (x_1, y_1, x_2, y_2, \ldots, x_n, y_n)^T$
where n is the number of samples of the shape. In this form, there is little information to explain the relationship between shapes of the same family.
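To make this representation concrete, the following minimal C++ sketch (the Point2D struct and helper name are illustrative, not taken from my project code) flattens a list of sampled landmark points into a single 2n-dimensional shape vector using the interleaved (x, y) ordering assumed throughout this report:

// Illustrative helper: flatten n landmark points into the 2n-dimensional
// shape vector x = (x_1, y_1, x_2, y_2, ..., x_n, y_n)^T.
#include <vector>

struct Point2D { double x, y; };

std::vector<double> toShapeVector(const std::vector<Point2D>& landmarks) {
    std::vector<double> shape;
    shape.reserve(2 * landmarks.size());
    for (const Point2D& p : landmarks) {
        shape.push_back(p.x);   // x component of landmark i
        shape.push_back(p.y);   // y component of landmark i
    }
    return shape;
}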

Principal Component Analysis (PCA) is the mechanism that allows us to model shape variation in a more compact and meaningful way. Since we ensured a one-to-one correspondence between sample points, it follows that similar shapes will have high correlation between corresponding samples. Similarly, a deformable shape will have high correlation in connected regions across the whole set of samples. PCA provides a transformation to a parameter space, where each axis represents a mode of variation pulled directly from the correlation and covariance of the samples.

For a family of shapes, we can obtain the covariance matrix Σ of the mean-centered data by the formula:

$\Sigma = \frac{1}{N} \sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T$
where $\mathbf{x}_i$ is the ith shape vector, $\bar{\mathbf{x}}$ is the mean shape vector, and N is the number of shapes in the family. A singular value decomposition provides us with the eigenvalues and eigenvectors of this matrix, in the form:

$\Sigma = V W V^T$
where W is a diagonal matrix of the eigenvalues and V is a column matrix of the corresponding eigenvectors. We will utilize V as a transformation to a basis that maximizes variance along each dimension. In other words, deforming a shape vector along the first eigenvector deforms that shape by the most prominent mode of variation across the training set; the second eigenvector captures the second most prominent mode, and so forth. By examining the eigenvalues, we can ignore the modes (eigenvectors) that contribute little or no importance. A good scheme is to accept only the eigenvectors corresponding to 90% or 95% of the variation. This cutoff is easily found by summing the eigenvalues, largest first, until their sum reaches the desired proportion $f_v$ of the total variance:

$\frac{\sum_{i=1}^{t} \lambda_i}{\sum_{i=1}^{2n} \lambda_i} \geq f_v, \qquad f_v = 0.90 \text{ or } 0.95$
By multiplying shape vectors by this reduced matrix, which contains only the t most important eigenvectors, we obtain a compressed and more meaningful representation of the data:

$\mathbf{b} = V_t^T (\mathbf{x} - \bar{\mathbf{x}})$

where $V_t$ denotes the matrix whose columns are the t retained eigenvectors.
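As a concrete illustration of this pipeline, the sketch below builds the model with the Eigen linear algebra library rather than the VISPack-based code actually used in this project; the struct, function names, and the 95% default threshold are assumptions for illustration. Each row of the input matrix is one shape vector.

// Minimal PCA sketch using Eigen (illustrative, not the project's VISPack code).
// Rows of `shapes` are shape vectors of length 2n; N is the number of shapes.
#include <Eigen/Dense>

struct ShapeModel {
    Eigen::VectorXd mean;        // mean shape vector, x-bar
    Eigen::MatrixXd modes;       // columns are the retained eigenvectors V_t
    Eigen::VectorXd eigenvalues; // retained eigenvalues (variances lambda_i)
};

ShapeModel buildModel(const Eigen::MatrixXd& shapes, double varThreshold = 0.95) {
    const int N = static_cast<int>(shapes.rows());

    // Mean-center the data and form the covariance matrix Sigma.
    Eigen::VectorXd mean = shapes.colwise().mean().transpose();
    Eigen::MatrixXd centered = shapes.rowwise() - mean.transpose();
    Eigen::MatrixXd cov = (centered.transpose() * centered) / double(N);

    // SVD of the (symmetric) covariance gives eigenvalues W and eigenvectors V.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(cov, Eigen::ComputeFullV);
    Eigen::VectorXd evals = svd.singularValues();   // sorted in decreasing order
    Eigen::MatrixXd evecs = svd.matrixV();

    // Keep the leading modes that explain varThreshold of the total variance.
    double total = evals.sum();
    double running = 0.0;
    int t = 0;
    while (t < evals.size() && running / total < varThreshold)
        running += evals(t++);

    ShapeModel model;
    model.mean = mean;
    model.modes = evecs.leftCols(t);
    model.eigenvalues = evals.head(t);
    return model;
}

// Projection into the reduced parameter space: b = V_t^T (x - x-bar).
Eigen::VectorXd project(const ShapeModel& m, const Eigen::VectorXd& shape) {
    return m.modes.transpose() * (shape - m.mean);
}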

Finally, to help visualize the actual correlation between landmark points, we can compute the correlation matrix and display it as a color-coded image. The correlation matrix can be derived from the covariance matrix as follows:

$R_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\,\Sigma_{jj}}}$
Since each diagonal entry is the correlation of a component with itself, the diagonal of this matrix is composed entirely of 1's. Because correlation values range from -1 to 1, we use a transformed color scale to visualize them. For the images in this report, colors are scaled such that black, gray, and white correspond to -1, 0, and 1, respectively.
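The sketch below (again using Eigen; the function names are mine, not the project's) shows both the conversion from covariance to correlation and the gray-scale mapping used for the images in this report.

// Illustrative helpers: correlation matrix from covariance, and a
// gray-scale mapping of [-1, 1] onto 8-bit values (black = -1, white = +1).
#include <Eigen/Dense>
#include <cmath>

Eigen::MatrixXd correlationMatrix(const Eigen::MatrixXd& cov) {
    Eigen::VectorXd stddev = cov.diagonal().cwiseSqrt();
    Eigen::MatrixXd corr = cov;
    for (int i = 0; i < cov.rows(); ++i)
        for (int j = 0; j < cov.cols(); ++j)
            corr(i, j) = cov(i, j) / (stddev(i) * stddev(j));
    return corr;   // diagonal entries are exactly 1
}

unsigned char toGray(double correlation) {
    // Scale [-1, 1] linearly onto [0, 255].
    return static_cast<unsigned char>(std::lround((correlation + 1.0) * 127.5));
}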



3. Reconstruction

The explicit representation of the modes of variation by the matrix V allows a much more powerful approach to shape modeling. Given that a family of shapes resides within some high-dimensional ellipsoid, even though we only have a finite set of examples, we can reconstruct any plausible shape within that ellipsoid as:

$\mathbf{x} = \bar{\mathbf{x}} + \sum_{i=1}^{t} \alpha_i\,\sigma_i\,\mathbf{v}_i$
with

$\sigma_i = \sqrt{\lambda_i}, \qquad -3 \le \alpha_i \le 3$

where $\alpha_i$ is a scaling factor for eigen-axis i, $\mathbf{v}_i$ is the ith eigenvector, and $\sigma_i$ is the standard deviation of the ith mode among the training data set. This is an ideal way of viewing the reconstruction, since over 95% of the training data falls within two standard deviations and over 99% lies within three standard deviations.
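A minimal sketch of this reconstruction, reusing the assumed ShapeModel struct from the PCA sketch above, is:

// Illustrative helper: generate a plausible shape by scaling each retained
// mode within a few standard deviations of the mean,
// x = x-bar + sum_i alpha_i * sqrt(lambda_i) * v_i.
#include <Eigen/Dense>
#include <cmath>

struct ShapeModel {
    Eigen::VectorXd mean;        // mean shape vector
    Eigen::MatrixXd modes;       // columns are the retained eigenvectors v_i
    Eigen::VectorXd eigenvalues; // retained eigenvalues (variances lambda_i)
};

Eigen::VectorXd reconstruct(const ShapeModel& m, const Eigen::VectorXd& alpha) {
    Eigen::VectorXd shape = m.mean;
    for (int i = 0; i < m.modes.cols(); ++i) {
        double sigma = std::sqrt(m.eigenvalues(i));   // std. dev. of mode i
        shape += alpha(i) * sigma * m.modes.col(i);   // deform along mode i
    }
    return shape;
}

For example, an alpha vector of (2, 0, 0, ...) displays the first mode at two standard deviations from the mean shape, which by the argument above still lies within the region occupied by roughly 95% of the training shapes.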



4. Experiments

To analyze the effectiveness of Active Shape Models, we test their ability to represent both synthetic families of shapes, as well as a family of shapes obtained from real-world data. The synthetic dataset consists of unit squares centered at the origin, randomly scaled and/or translated. Additionally, we test the effect of adding 10% random noise to the landmark points.
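The generator below is a simplified sketch of how such a synthetic family can be produced (the landmark layout, translation and scale ranges, and noise model are illustrative choices, not necessarily those used to create the figures below):

// Illustrative generator for families of randomly transformed unit squares.
#include <random>
#include <vector>

struct Point2D { double x, y; };

// Landmarks of a unit square centered at the origin (corners plus edge midpoints).
std::vector<Point2D> unitSquare() {
    return { { 0.5,  0.5}, { 0.0,  0.5}, {-0.5,  0.5}, {-0.5,  0.0},
             {-0.5, -0.5}, { 0.0, -0.5}, { 0.5, -0.5}, { 0.5,  0.0} };
}

std::vector<std::vector<Point2D>> makeSquares(int count, bool translate,
                                              bool scale, double noiseLevel,
                                              unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> shift(-1.0, 1.0);   // translation range
    std::uniform_real_distribution<double> factor(0.5, 2.0);   // scale range
    std::uniform_real_distribution<double> noise(-noiseLevel, noiseLevel);

    std::vector<std::vector<Point2D>> shapes;
    for (int i = 0; i < count; ++i) {
        double tx = translate ? shift(rng) : 0.0;
        double ty = translate ? shift(rng) : 0.0;
        double s  = scale     ? factor(rng) : 1.0;

        std::vector<Point2D> square = unitSquare();
        for (Point2D& p : square) {
            p.x = s * p.x + tx;                // apply scale and translation
            p.y = s * p.y + ty;
            if (noiseLevel > 0.0) {            // optional landmark noise (e.g. 10%)
                p.x += noise(rng);
                p.y += noise(rng);
            }
        }
        shapes.push_back(square);
    }
    return shapes;
}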

Synthetic Data: Squares

(a) Random Translation:

[Figures: Sample Shapes | Correlation Matrix | Sample Shapes (10% Noise) | Correlation Matrix (10% Noise)]
[Figures: 1st Mode | 2nd Mode | 1st Mode (10% Noise) | 2nd Mode (10% Noise)]

Examining the correlation matrix image, we see all landmarks are highly correlated within their axis: the x components are all highly correlated with the other x components, and the y components with the other y components. This makes perfect sense, since translations along the x and y axes are independent. Further, the high correlation within an axis makes sense because we are performing rigid translations, so all landmarks move in synchrony. Unsurprisingly, the high correlation within each axis leads the major modes of variation to coincide with the major axes.

[Figures: Translation Eigenvalues | Translation Eigenvalues (10% Noise)]

Comparing the raw data with the 10% noisy data, there are several clear differences. First, the correlation matrix, while generally the same, shows fluctuating shades of gray rather than consistent correlation. This is a direct result of the noise and should be expected. A less intuitive side effect of the noise is the rotation of the major axes of variation: rather than being aligned with the x and y axes, they align more closely with the y = x and y = -x lines. Additionally, there is a small shearing effect present, particularly in the second mode of variation.

(b) Random Scale:

[Figures: Sample Shapes | Correlation Matrix | Sample Shapes (10% Random Noise) | Correlation Matrix (10% Random Noise)]
[Figures: 1st Mode | 1st Mode (10% Noise) | 2nd Mode (10% Noise)]

The pattern found in the correlation matrix for the raw randomly scaled squares is less intuitive than for the translated shapes. The black and white regions show there is still strong, but mixed, correlation. The grouping of these correlations reflects which quadrant each landmark falls in. In the 1st quadrant, the x and y components are both positive, and thus positively correlated. Similarly, in the 3rd quadrant, the x and y components are both negative and, again, positively correlated. Conversely, the 2nd and 4th quadrants have mixed signs and therefore produce a checkerboard pattern between x and y components, but continuous color for adjacent landmark points.

[Figures: Scale Eigenvalues | Scale Eigenvalues (10% Noise)]

For the noise-free samples, the only non-zero eigenvalue is the first one, which entirely captures the scaling found in the sample set. Once noise is introduced, the 2nd and 3rd modes begin to show subtle variation. As seen in the figure above, these latter modes of variation are shearing effects, capturing the noise present in the data.

(c) Random Translation & Scale:

[Figures: Sample Shapes | Correlation Matrix | Sample Shapes (10% Noise) | Correlation Matrix (10% Noise)]
[Figures: 1st Mode | 2nd Mode | 3rd Mode | 1st Mode (10% Noise) | 2nd Mode (10% Noise) | 3rd Mode (10% Noise)]

Combining both random translation and random scale into a single family of shapes proves quite robust, as these truly are linearly independent operations. This can be seen in that the 1st dominant mode provides scaling, while the 2nd and 3rd provide translation along the x and y axes, respectively. The correlation matrix is, however, much more difficult to read; it is in fact a superposition of the two correlation matrices from experiments (a) and (b). Surprisingly, the 10% noise plays a less significant role in the 2nd and 3rd modes of variation found by PCA. The axes are only slightly off, and the 2nd and 3rd modes have switched order. This second effect is most likely due not to the noise, but rather to the inability of a random sampling to favor translations in both dimensions evenly.

Translation & Scale Eigenvalues Translation & Scale Eigenvalues (10% Noise)


(d) Random Rotation:

[Figures: Sample Shapes | Correlation Matrix | Sample Shapes (10% Noise) | Correlation Matrix (10% Noise)]
[Figures: 1st Mode | 2nd Mode | 1st Mode (10% Noise) | 2nd Mode (10% Noise)]

Clearly rotation is more difficult to handle than scaling and translation. If not immediately apparent, the problem becomes clearer when examining the captured modes of variation. At first glance, the square appears to scale and slightly rotate simultaneously. If, however, we follow the path of a single landmark during deformation, we see it follows a linear path. Ultimately we are trying to represent a non-linear operation with a linear decomposition, and we simply cannot expect to properly capture the deformation using PCA. Higher-dimensional techniques might be worth trying for rotation-based deformations.

[Figures: Rotation Eigenvalues | Rotation Eigenvalues (10% Noise)]


Real Data: Corpus Callosum

The data used for this experiment comes from segmentations of the human corpus callosum in medical images. Each sample shape is represented with 12 landmarks. Ideally, we would use cubic interpolating splines to display these shapes; however, in the interest of time, linear interpolation is used. The sample set contains 30 example shapes, aligned with anatomical AC-PC coordinates.
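For completeness, the linear interpolation used for display can be sketched as follows (a hypothetical helper, not the project's actual drawing code); a cubic spline would simply replace the straight-line blend between consecutive landmarks:

// Illustrative helper: densify a closed 12-landmark contour with linear
// interpolation for display purposes.
#include <vector>

struct Point2D { double x, y; };

std::vector<Point2D> interpolateClosedContour(const std::vector<Point2D>& pts,
                                              int samplesPerSegment) {
    std::vector<Point2D> dense;
    const size_t n = pts.size();
    for (size_t i = 0; i < n; ++i) {
        const Point2D& a = pts[i];
        const Point2D& b = pts[(i + 1) % n];   // wrap around to close the contour
        for (int k = 0; k < samplesPerSegment; ++k) {
            double t = double(k) / samplesPerSegment;
            dense.push_back({a.x + t * (b.x - a.x), a.y + t * (b.y - a.y)});
        }
    }
    return dense;
}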

[Figures: Sample Shapes | Correlation Matrix]

It is clear from the correlation matrix image that these shapes are structured in a very specific way. In particular, notice the strong positive and negative correlation along the tridiagonal of the matrix. This makes intuitive sense, as our landmark points are ordered around the contour of the corpus callosum, and nearby landmarks should be correlated if the shape is to remain intact through deformation. Similarly, another such diagonal is present orthogonal to the first. This set of correlations implies that landmarks on one side of the shape are correlated with landmarks on the other side, which also makes intuitive sense, since the shape undergoes little shearing. For the reconstructions below, the bounding box of the sample shapes was adjusted to be closer to the origin.

[Figures: 1st Mode | 2nd Mode | 3rd Mode | 4th Mode]

Examining the modes of variation for the corpus callosum dataset doesn't offer the kind of clear insight it did for the synthetic examples. However, we can still see some basic patterns. The first mode is a kind of rotation, while the second is an expansion along the x-axis. Similarly, the third mode also seems to be a type of rotation, while the fourth looks like an expansion along the y-axis.

[Figures: Corpus Callosum Eigenvalues | Corpus Callosum Eigenvalue CDF]

As one might expect from real-world data, the contributing eigenvectors extend much further than for the synthetic data. The first 5 modes capture over 90% of the variation, and by the 15th mode there is virtually nothing left to contribute.



5. Conclusion

The biggest benefit of the Active Shape Model (ASM) approach is the ability to take a model generated from a training set and, through an optimization, deform it to match a similar but different shape in an image. We've seen from the experiments above that the method is quite robust to noise and compact for linear data. Conversely, non-linear deformations require a further expansion of the basis to properly capture the modes of shape deformation. This should only be seen as a problem if the shape requires a large number of landmarks correlated in highly non-linear ways.

Examining the eigenmodes of variation proved insightful for linear operations, but less so for non-linear modes such as rotation. For medical data it seems unlikely that sample shapes will vary linearly in any way except scaling and translation; however, variation in translation (and rotation, for that matter) is usually removed prior to use anyway. If ASMs are applied to rigid mechanical parts, on the other hand, examining the modes of variation could prove incredibly insightful, though being man-made objects, manufacturers are probably already aware of such variation. Still, the power of ASMs is quite clear.
