WARNING:
This is a final report  for an older project done for a Berkeley class, so it is a rather low level description.
This project is not currently active, but I might continue it some day.
You may want to skip to the  result section  to see if this is of any interest to you.
 
 
 

High Quality Chroma Key

Introduction.

The main goal of this project is to write a high quality chroma key function based on processing of foreground and/or background frames before combining them into an output image. Such preprocessing is needed to fight some common chroma key problems: hard edges around foreground objects and poor treatment of colors close to the key color. Additionally, a high quality chroma key algorithm is intended to serve as an example of a fairly complex real-world algorithm which would be interesting to implement using Intel MMX or other multimedia enhanced architecture. Taking into account large amount of effort required for assembly language implementation of any algorithm and little time available for the project, every effort was made to come up with a most simple algorithm which would still produce good results.

The algorithm.

A chroma key function is implemented based on a modified version of the algorithm described in [1] . The algorithm works in (Y, Cb, Cr) color space which is MPEG's native color space. In this way, incorporation of the chroma key function into an existing system will not require extra color space conversion. Schematic description of the algorithm is shown below. The fundamental problem for a chroma key function is, first, to decide which pixels in the foreground image belong to the blue backing and which - to needed foreground objects and, second, how to identify and process boundary pixels. Image value at such pixels is due to contributions from both an object and the blue backing: pixel_value = k*Object_color + (1-k)*Blue_backing_color. One would like to solve this equation for two unknowns, k and Object_color having only one independent measurement, pixel_value. So, as we can see, this problem is fundamentally underdefined.

If we come up with some reasonable estimate for the contribution of the blue backing to the pixel value, the next step would be to subtract this contribution from the pixel value and replace it with a scaled version of the background image. In other words, the overall operation performed by chrome key is CK_result = (fg_pixel_value - blue_backing_contribution) + Kbg*bg_pixel_value (eq 1). The problem now is to provide a concrete algorithm for determining Kbg and blue_backing_contribution from the image data.

One such an algorithm is presented below.

First, we rotate (Cb, Cr) coordinate system by an angle defined by the key color to obtain (X,Z) coordinate system. Second, we use a parameter alfa (30 to 60 degrees were used for different images) to divide the color space into two regions, one where the processing will be applied and the one where foreground will not be changed (where Kbg = 0 and blue_backing_contrubution = 0 in eq.1 above). Then we "suppress" pixel value (if it is inside the working region) by applying the transformation Z' = Z X' = Z/tg(alfa). This corresponds to projecting the pixel value vector along X axis onto the line dividing the two regions. In this way, we subtract more from a pixel if it is further away from the dividing line. All pixels on the X axis are projected into zero chromaticity point. Amount of this suppression, Kfg = X - abs(Z)/tg(alfa) is used to suppress the third channel, Y' = Y - y*Kfg, where y is such that Y' = 0 for the key color. Next, we calculate Kbg by interpolating between Kbg = 1 on the border of the area shown black on the picture ("beyond" the key color Kbg is also set to 1) and Kbg = 0 on the line separating the working region from the rest of the color space. Lines shown on the picture are lines of constant value of Kbg. Up to this point, all transformations applied to the image are continuous, so the result of applying eq.1 with blue_backing_contribution and Kbg calculated in the described way is guaranteed not to have any hard edged not present in the initial image. Unfortunately, real images contain noise which becomes very noticeable after chroma key processing due to the fact that noise in calculated Kgb gets multiplied by the pixel value in the background image. Probably the simplest way to treat the noise is used here - a circle of a certain radius around the key color is treated as if it is the key color: Kbg = 1 within this circle. This can introduce discontinuities on the circle boundary.

Although multiple enhancements to the algorithm described here are possible (more sophisticated noise treatment is an obvious example), simplicity of the algorithm is of great value both by itself and with regard to MMX implementation which has to be done in assembly language. The algorithm makes use of only two parameters (other that the key color) which need to be adjusted for visual quality of the result - noise level and processing angle and their meaning (what will happen if one increases or decreases parameter value), is clear.

From the implementation point of view, the chroma keyer has the following structure:

Results and discussion.

Typical raw foreground image to be used in chroma key is a set of objects/people shot against very smooth blue backing. Quality of the image of this backing, as it will be clear from the examples presented below, is crucial for the quality of the overall process (as it is for any other chroma key implementation). Unfortunately, it's hard to find "real" images (or frame sequences) intended to be used for chroma keying, so the testing has been done on images which only very remotely can be treated as a good material. Nevertheless, the results are quite encouraging.

In examples below, please click on the image to see a large picture.

Example 1. Noise and color nearness. Foreground is an example of colors in wanted objects being close to the key color and some substantial noise in the "backing".

Original background and foreground images, respectively:

            
Result of applying chroma key with some default parameter settings (assuming no noise). Results of noise in the foreground are apparent:

This image, in fact, represents the worst case - large area of constant color in the background on which even small noise is very noticeable. The same combined scene after adjusting the noise parameter looks nicer but, as noted before, some hard edge appears around the barb wire:

Example 2. Fine detail treatment and intermediate results. This is an example of a transparent object in the foreground scene (a cloud) which would be lost with any standard chroma key. Common processing would also have trouble with grass/sky and tree/sky border lines due to some very fine details present there.

Original background and foreground images, respectively:

          
Output of foreground suppresser and backgroung*Kbg, respectively:
          
Combined image (sum of the two above):
The image in the last example was processed with noise level set to zero which allowed the cloud above the tree to stay visible. Compare this image with the output of a chroma key function which performs hard switch between foreground and background:
As a second part of the project, it was planed to re-implement the chroma key algorithm on MMX architecture as an example of a fairly complex real-world application. One problem with this is that the algorithm used is essentially a floating point one. It was not immediately clear to me that its integer re-implementation will be able to provide the same final quality. The algorithm was re-implemented using a combination of regular integer and fixed point arithmetic. Images produced with integer version of the chroma key were visually identical to those created by a floating point one. The price was to reduce the working angle parameter range so that its tangent fit into reasonable fixed precision representation - this disallowed small and large angles and although did not present a problem for the test images, can be a potential problem. Second problem is that the original version uses division and there is no pdiv operation in the MMX instruction set. This was possible to overcome by sacrificing some flexibility of the algorithm - for example, at this stage noise circle was chosen instead of more involved treatment which also was considered. Performance of the two versions (integer and floating point) were compared by measuring time required to perform chroma key on several images on a 180 MHz Pentium processor and taking an average of the results. Performance measurements is a field of its own and I'm far from claiming that absolute numbers of these very rough measurements have any far reaching meaning, but they do show that integer implementation is about three times faster than the original floating point one. This is not unexpected taking into account larger latency of floating point instructions compared with integer ones and apparent inability of gcc compiler (optimization flag O2) to achieve perfect instruction scheduling. Size of the assembly code is also smaller for integer version - 346 vs. 396 instructions inside the main loop.

Another implementation is currently being written using MMX emulating functions provided by Intel with "The complete guide to MMX technology" book [2] and software MMX registers. Currently, foreground suppresser and key generator (see the chroma keyer structure diagram above) are written and tested. This implementation should be 1-to-1 convertible into "real" MMX by anyone who would like to do it - simply replace a function call by a corresponding assembly language instruction. Most annoying problem I encounter so far while implementing the algorithm in MMX is the different representation for Cb, Cr and Y components (signed vs. unsigned) which has to be explicitly addressed. There are certain things which MMX architecture would have benefited from if they were present. First, a parallel division (16-bit number divided by 8-bit number) would extent applicability of MMX. In the current study, it was possible to avoid division by changing the algorithm but this might be not an option for other cases. Second, images have a tendency to be clustered, which means that chroma key processing should not be applied to large areas. With current architecture, we essentially apply the processing to all points in the image and then mask out the part which should not have been processed. Adding an instruction which does a conditional branch if the content of a 64-bit MMX register is zero would avoid applying a several hundred instruction worth of processing to a piece of data in none of the eight (or four) pixels needs it. Even if a penalty for such instruction is several cycles per point, it can easily save half of the total processing time for typical images.

Conclusion and future work.

A relatively simple high quality chroma key algorithm was developed based on [1] . It has been implemented using both integer and floating point arithmetic and performance results of the two were roughly compared. Because people expressed enough interest to it during the poster session, I intend to finish the implementation with simulation of MMX functions by Intel's regular C routines and software MMX registers.  I'd also like to do "real"  chroma-keying (i.e. on video sequences) - video might behave differently from a single image but I suspect that at worst it will be just like a hard - switching chroma key algorithm.

Acknowledgment.

I would like to thank Professor Rowe for suggesting this project to me and for helpful discussion during the midway project progress presentation.

References.

[1]. Keith Jack, "Video Demystified", Independent Pub Group (Computer), 1996

[2]. Intel Corporation, "The Complete Guide to MMX Technology", McGraw Hill Text, 1997.

Recent additions:

Promised MMX-ready version of the chroma keyer is now available here along with mdefs.h file. This is a MMX re-implementation of integer only chroma key. Please read the comments before using either of these.