Rapid, Realistic Image Generation

Mark Schmelzenbach with Charles Hansen, Brian Smits
Department of Computer Science
University of Utah, Salt Lake City, UT 84112.


1  Abstract

Conventional ray tracing has yielded some of the most realistic computer generated imagery produced to date. However, even with increasingly faster machines, ray tracing is still commonly regarded as useful only for non-interactive rendering jobs due to its high per-pixel cost. This paper presents a method of increasing the rendering frame rate of a ray tracing engine by augmenting it with pixel reprojection techniques. Pixel reprojection enables the engine to effectively exploit the frame-to-frame coherence inherent in interactive environments.

2  Introduction

Graphics research has typically approached the problem of image generation from two separate directions: realism and interactivity. Photo-realistic approaches are primarily concerned with simulating increasingly detailed physical properties, requiring an increase in both computational resources and time required per image. Interactive applications tend to rely on existing graphics hardware and algorithmic simplifications to obtain high frame rates.

One time-honored technique for generating photo-realistic imagery is ray tracing. Ray tracers have yielded some of the most realistic images produced to date. Ray tracing provides a simple and elegant method for generating complex effects, naturally incorporating hidden surface removal, shadows, reflection and refraction, which can be difficult to produce using other methods. This simplicity comes at a much higher per-pixel resource cost than other techniques, such as scan-line rendering, and so ray tracing is commonly regarded as useful only for non-interactive rendering jobs. As machines continue to grow faster, this perception is changing; however, ray tracing is still not a ubiquitous technique for rendering scenes at interactive speeds.

The image quality of interactive systems tends to be much lower than that of images generated by offline image synthesis, due primarily to the reliance upon existing graphics hardware. Graphics hardware can remove much of the computational burden from the host CPU, but the application is then dependent on the capabilities of that hardware. Graphics hardware is becoming increasingly sophisticated, yet it still has difficulty producing some effects, such as refraction. It is not a coincidence that when interactivity is not a constraint, many rendering applications forgo hardware-accelerated techniques for more realistic methods.

Ideally, users would have the best of both worlds: photo-realistic imagery at fully interactive rates. This goal, however, is still a few years off. Many current design tools are therefore divided into two parts: an interactive design session and an off-line rendering session. This approach often requires several iterations, as problems that were not visible during the design phase become apparent in the rendering phase.

This research seeks to strike a balance between the two approaches to rendering. By using a ray tracing engine as the base rendering module, the user obtains access to flexible shading models and realistic imagery. Further, by utilizing pixel reprojection, it becomes possible to inexpensively reuse this data between frames. This paper presents a rendering architecture that rapidly generates realistic imagery by exploiting frame-to-frame coherence. The benefits can be applied directly to areas where both interactivity and fidelity are desired, such as design work or virtual reality environments.

3  Brief Summary of Previous Work

3.1  Ray Tracing

The basic algorithms used in ray tracing have remained relatively unchanged since the technique was introduced to computer graphics by Whitted [17]. Early research in this area focused on developing methods of accelerating the ray casting process. Arvo and Kirk provide a succinct summary of many of these methods [4]. Advances in image quality have been made by using Monte Carlo techniques such as those described by Cook [6], and Kajiya [8]. Incorporating these techniques allows such effects as motion blur, depth of field, anti-aliasing and soft shadows, although at the cost of dramatically increasing the number of rays cast per pixel.

The ray tracing algorithm lends itself well to parallelization. Since each initial ray cast into a scene is essentially independent, calculation of each pixel does not depend on any previous results. Parallelization has therefore been a common method of increasing the speed of ray tracing engines, and many parallel ray tracer implementations have been created. In fact, ray tracing has reached interactive speeds on high-end machines using this technique. Parker et al. [14] discuss many trade-offs to consider when moving a ray tracer into a heavily parallel setting.

3.2  Pixel Reprojection

Pixel reprojection is a specific case of a broad class of rendering algorithms known as Image Based Rendering (IBR). Although pixel reprojection has seen use in specialized applications, particularly in the viewing of volumetric data [16] and in the generation of stereoscopic images [2], it has not been widely used for more general rendering applications until recently. However, as IBR is becoming increasingly common, reprojection methods are gaining a more visible profile as a rendering technique. The most appealing aspect of IBR is that rendering time is no longer governed by geometric complexity, but rather by the size of the frame buffer. In addition, images can be generated not only from geometric models but also from measurements and photographs taken from the real world. There are many IBR methods besides pixel reprojection. Kang [9] surveys several image-based rendering techniques and identifies four basic categories distinguished by the scheme used to modify pixels between frames: in addition to geometric pixel reprojection, he includes non-physically based morphing, mosaicking, and dense sample interpolation. However, it is the area of geometric pixel reprojection that is most relevant to this research.

Pixel reprojection can be performed on any image that has an associated depth buffer. Mark et al. [11] discuss geometric pixel reprojection in detail, which they refer to as a 3D warp. By using both the image coordinates and the associated depth value of each pixel in a source frame, it is possible to accurately compute both translation and rotation transforms, placing the pixel into a destination frame. Although every pixel that is reprojected will be correctly placed, the resultant image will not necessarily be complete. There will usually be several areas of the image that have no information. These areas appear as holes in object surfaces. These holes are symptoms of two major problems: occluders and sub-sampling artifacts.

There are several methods for reducing or eliminating the occluder and sub-sampling problems. Most methods work by obtaining more information about the scene, utilizing such methods as combining multiple viewpoints of the same scene [11], or augmenting a single reference image with multiple depth layers (LDIs) as described by Gortler et al. [7], [15]. Other common approaches to filling holes involve the use of interpolation techniques such as splatting [13] or mesh stretching [10].

3.3  Combined Techniques

Badt wrote one of the first papers to combine ray tracing and pixel reprojection to exploit the frame-to-frame coherence created by general camera movement [5]. He presented two algorithms for accelerating ray tracing using information from previous (and future) frames, using both image space and pixel reprojection techniques. The major problem with these algorithms is that they are not guaranteed to produce correct images; there may be missing objects, gaps, and unoccluded areas that should be hidden. Badt's solution to correct these errors was to have a human operator manually touch up the animation by selecting areas of each frame that needed to be retraced. Adelson and Hodges further developed Badt's pixel reprojection algorithm, identifying and removing sources of error in order to produce stereoscopic images [1], [2]. In a later paper [3], they applied their modifications to Badt's original problem: generating animations.

4  The Architecture

When generating successive frames in an animation or interactive environment, it is most common to simply calculate each frame independently. Although some global information may be retained (such as view-independent scene preprocessing used for bounding box hierarchies or radiosity calculations), these tend to be the exceptions rather than the rule. This research seeks to increase the amount of information shared between frames by leveraging frame-to-frame coherence.

This work is most similar to that done by Badt, Adelson and Hodges. The primary distinction between this and earlier projects is that this framework provides an interactive environment. Interactivity imposes new constraints and adds issues that were not concerns in the previous works. Badt had access to the entire animation, so he could rely on making retroactive corrections to images as errors were detected in the course of rendering new frames. Also, in both Badt's work and Adelson and Hodges' refinements, large amounts of data are saved to disk for each frame. Using the disk as a storage device is not desirable in an interactive environment. Mobile objects must also be handled in a new manner: mobile objects are not addressed in Adelson and Hodges' work, while Badt's algorithms made assumptions based on the ability to access the entire animation.

The following sections describe the architecture proposed for the new hybrid rendering framework.

4.1  The Rendering Pipeline

The rendering pipeline is divided into two major stages. Pixel reprojection occurs during the first stage, while the second stage performs image refinement. Although these stages must be performed sequentially, internally each can be run in parallel. Figure 1 outlines the complete rendering pipeline. One master thread is responsible for processing user input, updating moving objects, generating the workload for the worker threads, and updating the display. The worker threads are responsible for calculating both stages of the rendering process.

Rendering Pipeline
Figure 1: The Rendering Pipeline: Rounded boxes denote tasks performed by worker threads, while square boxes denote tasks performed by the master thread.

Computation is performed on an enhanced image buffer. In addition to color information, additional data is stored at each pixel location. Figure 2 enumerates the data that is tracked for each entry in the image buffer. This record provides information needed by the rendering process in order to efficiently reuse calculations from previous frames.

Pixel Record
Figure 2: A Pixel Record
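
A minimal sketch of such a record in C++ follows. The field names and sizes are illustrative assumptions; the text specifies only that each entry carries color, world position, pixel type, a time-to-live counter, and the normal, material, and object identifiers needed for deferred shading and moving-object handling.

// Hypothetical layout of one enhanced image-buffer entry (names are illustrative).
enum PixelType { NORMAL, EMPTY, DEFERRED };

struct PixelRecord {
    float     color[3];   // shaded RGB color, reused between frames
    float     world[3];   // world-space hit point, used by the pixel warp
    float     normal[3];  // surface normal, kept for deferred reshading
    int       material;   // material/shader identifier for deferred shading
    int       objectId;   // object identifier, distinguishes moving objects
    PixelType type;       // see Figure 3
    int       ttl;        // time-to-live: frames remaining before a retrace
};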

In order to perform pixel reprojection, there must exist a source image buffer and a destination image buffer. The destination buffer is always assumed to be empty. The source buffer provides the information that will be remapped into the destination buffer. Since this is not an in-place mapping, frameless rendering techniques cannot be applied to stage one of the rendering process.

4.1.1  Stage One: Pixel Reprojection

Stage one of the rendering process is performed only if pixels in the image buffer should be moved for the next frame. There are two possible reasons that pixels might move: either due to viewpoint motion, or due to object motion within the rendering environment.

Viewpoint motion requires that every pixel in the current frame be changed. This is done by transforming the world coordinates of each pixel in the source buffer by the new viewport matrix. The result is a set of coordinates that either fall outside the destination buffer or that can be reused in the new frame.

Basic pixel reprojection simply maps each pixel in the source buffer to a single pixel positioned in the destination buffer. If this transformation results in multiple source pixels mapping to a single location, the pixel closest to the viewer will be displayed. This one-to-one mapping can leave a significant number of holes in the destination buffer. This problem is handled by explicitly ray tracing these areas during the second stage of rendering.

When a pixel is moved from the source buffer into the destination buffer, its time-to-live counter is decremented. A pixel with a zero time-to-live counter will not be reprojected, leaving a hole that stage two will fill.
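
The sketch below illustrates this stage-one warp for a single source pixel, using the pixel record above. The Camera type and its project() method are assumed interfaces, not RRIG's actual API; the depth test and time-to-live handling follow the description in the text.

// Sketch of the stage-one warp for one source pixel (Camera::project is assumed).
// The new camera maps stored world coordinates directly to destination image
// coordinates plus a depth value; no inverse perspective matrix is needed.
void warpPixel(const PixelRecord& src, const Camera& newCam,
               PixelRecord* dst, float* zbuf, int width, int height)
{
    if (src.type == EMPTY || src.ttl == 0)
        return;                         // expired pixels leave holes for stage two

    float x, y, z;                      // destination image coordinates and depth
    if (!newCam.project(src.world, &x, &y, &z))
        return;                         // behind the viewer

    int ix = int(x), iy = int(y);
    if (ix < 0 || ix >= width || iy < 0 || iy >= height)
        return;                         // falls outside the destination buffer

    int i = iy * width + ix;
    if (z < zbuf[i]) {                  // keep the pixel closest to the viewer
        zbuf[i] = z;
        dst[i]  = src;
        dst[i].ttl--;                   // aging eventually forces a retrace
    }
}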

Pixel movement due to object motion is a more complicated problem and is discussed in greater detail in section 4.2.

4.1.2  Stage Two: Image Refinement

Stage two of the rendering process is responsible for converting the destination buffer into a displayable image. Its primary task is to reduce overall error in the image, either by filling in holes left by stage one or by refreshing data in old pixels.

During this stage, each pixel in the destination buffer is examined. The function performed on each pixel varies depending on the pixel type. See figure 3 for a list of all possible pixel types. A pixel can be flagged empty either as a result of its time-to-live count expiring or as a hole left from stage one. Every pixel is considered empty for the first frame. This provides the application with its initial base frame, from which all successive frames are then derived. When an empty pixel is filled, its time-to-live count is set to a user-defined base value plus a random jitter. This jitter spreads the cost of retracing pixels over several frames, so the frame rate does not suddenly drop a fixed number of frames after a substantial image update. The time-to-live count can additionally be modified based on material type or object identifier. This ability is helpful in rendering moving objects (see section 4.2 for more detail). Deferred pixels are pixels that require deferred shading in order to incorporate view-dependent effects caused by specular or reflective surfaces. In this case, the surface normal and material type are passed to the shading routine, and the color for the pixel is recalculated. Although this is more expensive than simply reusing the original color, it is more cost effective than having to retrace the pixel.

Pixel Type    Action
Normal        None
Empty         Retrace pixel, set initial time-to-live
Deferred      Reshade pixel using deferred shading

Figure 3: Possible pixel types
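
The following sketch shows the per-pixel dispatch of Figure 3 over one tile. The Tile layout, tracePixel(), and shadeDeferred() are assumed stand-ins for RRIG's ray tracer and deferred shader, and the jitter parameter is assumed to be positive.

#include <cstdlib>   // std::rand

struct Tile { int x0, y0, x1, y1; };   // half-open tile bounds (assumed layout)

// Sketch of the stage-two refinement pass over one tile.
void refineTile(PixelRecord* buf, int width, const Tile& t,
                int baseTTL, int jitter)
{
    for (int y = t.y0; y < t.y1; ++y)
        for (int x = t.x0; x < t.x1; ++x) {
            PixelRecord& p = buf[y * width + x];
            switch (p.type) {
            case EMPTY:                       // hole from stage one or expired TTL
                tracePixel(x, y, &p);         // cast a (jittered) primary ray
                p.ttl  = baseTTL + std::rand() % jitter;  // spread future retraces
                p.type = NORMAL;
                break;
            case DEFERRED:                    // specular or reflective surface
                shadeDeferred(&p);            // recompute color from normal and material
                break;                        // cheaper than retracing the pixel
            case NORMAL:                      // reuse the reprojected result
                break;
            }
        }
}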

Once a tile has been processed, it may optionally be anti-aliased. Since anti-aliasing is a view-dependent process, it is only performed at display time; otherwise the pixel information reused between frames would contain greater error. To this end, the tile is reduced in size and placed into a separate pending display buffer. This means that a larger image is maintained internally than is displayed to the user, effectively performing super-sampling.
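
A possible form of this reduction step is sketched below, assuming a 2x super-sampling factor and a simple box filter (both assumptions; the paper does not specify the filter). Colors are assumed to be stored in [0,1].

// Sketch of the reduction into the pending display buffer: each 2x2 block of
// the internal buffer is box-filtered into one display pixel.
void reduceTile(const PixelRecord* buf, int bufWidth,
                unsigned char* display, int dispWidth,
                int x0, int y0, int x1, int y1)   // tile bounds in buffer space
{
    for (int y = y0; y < y1; y += 2)
        for (int x = x0; x < x1; x += 2) {
            unsigned char* out = &display[((y / 2) * dispWidth + (x / 2)) * 3];
            for (int c = 0; c < 3; ++c) {
                float sum = buf[ y      * bufWidth + x    ].color[c]
                          + buf[ y      * bufWidth + x + 1].color[c]
                          + buf[(y + 1) * bufWidth + x    ].color[c]
                          + buf[(y + 1) * bufWidth + x + 1].color[c];
                out[c] = (unsigned char)(sum * 0.25f * 255.0f);
            }
        }
}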

4.1.3  Additional Refinement

Once the workers have completed the rendering stages, the master displays the result, processes user input, moves objects, and performs the process again. If the camera position has not changed and there are no moving objects, the master begins a refining process. This refining process begins by reducing the time-to-live count for every reprojected pixel, allowing the displayed image to eventually reach its ray traced ideal. Finally, the workers begin to anti-alias the image by super-sampling. This super-sampling process is updated via frameless rendering, writing directly to the image on screen. Since this anti-aliasing technique is view-dependent, the information gained in this manner will be lost when the next pixel warp is performed.

4.2  Moving Objects

Introducing mobile objects into the environment presents difficulties for both modules of the rendering engine. In the ray tracing module, moving objects increase the expense of determining object/ray intersections since each ray must be tested not only against the static bounding box hierarchy, but against the full list of moving objects as well. This cost increases dramatically as the number of moving objects increases.

Mobile objects present an interesting challenge to the pixel reprojection module. Basic image reprojection techniques assume that the only motion in a scene is caused by movement of the viewer. This allows the transformation matrix to be applied equally to every pixel in the frame buffer. However, if a pixel belongs to a moving object, the transformation generates incorrect images. The simple solution is to force pixels derived from moving objects to refresh every frame. Alternatively it is possible to associate a unique transform matrix with each object and attach this information to the pixel record.
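
Both options can be expressed as a small adjustment applied before the ordinary viewpoint warp, as in the sketch below. The Scene and Object interfaces here are assumptions for illustration, not RRIG's actual data structures.

// Sketch of handling a source pixel that belongs to a moving object.
void applyObjectMotion(PixelRecord& p, const Scene& scene, bool forceRefresh)
{
    const Object& obj = scene.object(p.objectId);
    if (!obj.moving)
        return;                              // static geometry: nothing to do
    if (forceRefresh) {
        p.type = EMPTY;                      // option 1: retrace every frame
        return;
    }
    // Option 2: advance the stored world point by the object's frame-to-frame
    // transform, then let the ordinary viewpoint warp place it on screen.
    obj.deltaTransform.transformPoint(p.world);
}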

5  Implementation Details

A program adhering to the architecture outlined in the previous section has been implemented. This program was dubbed RRIG (Rapid, Realistic Image Generator). The following sections describe various design decisions, specific programming details and discoveries made along the way.

5.1  Parallel Design Issues

One of the most important aspects of getting RRIG to perform at interactive frame rates is the parallelization of the architecture. The following sections describe how this was done for each stage in the rendering pipeline.

5.1.1  Stage One: Pixel Warping

Originally, RRIG did not perform pixel warping in parallel. At relatively small image sizes (256x256), pixel warping could be performed by a single processor without incurring a noticeable delay. However, as image size increased, it became apparent that pixel warping was a bottleneck.

In order to perform pixel warping in a parallel fashion, RRIG divides the image buffer into n horizontal slices, where n is the number of worker threads. Each thread then performs the pixel reprojection as outlined in section 3.2. Since reprojection has a linear cost with respect to the number of pixels to be computed, each thread will perform roughly the same amount of work. This means that worker load balance is not a problem in stage one. Notice that each thread writes to a common destination buffer. Although pixels reprojected within a thread are properly ordered, pixels from different slices that map to the same location in the destination buffer are not guaranteed to resolve in the correct depth order. In practice, however, warped pixels are recalculated when their time-to-live counter expires, so this artifact is not a major issue.
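
A sketch of the per-thread work split follows, reusing the warpPixel routine sketched in section 4.1.1. The slice boundaries and thread interface are assumptions; as in the text, all threads write into the shared destination buffer without locking.

// Sketch of the stage-one split: n contiguous horizontal slices of the source
// buffer, one per worker thread, each warped independently.
void warpSlice(int worker, int nWorkers, int width, int height,
               const PixelRecord* src, PixelRecord* dst, float* zbuf,
               const Camera& newCam)
{
    int rows = height / nWorkers;
    int y0   = worker * rows;
    int y1   = (worker == nWorkers - 1) ? height : y0 + rows;  // last slice takes the remainder
    for (int y = y0; y < y1; ++y)
        for (int x = 0; x < width; ++x)
            warpPixel(src[y * width + x], newCam, dst, zbuf, width, height);
}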

5.1.2  Stage Two: Image Refinement

The image refinement stage is primarily composed of ray tracing. Although ray tracing parallelizes trivially, the time to cast each individual ray can vary greatly. This makes load balancing a very real issue. RRIG approaches this problem by dividing the destination buffer computed by stage one into a series of m x n tiles, where m and n are user-defined variables set at run-time. For each frame, the master thread creates m*n tiles. These tiles are then processed in a linear fashion by the worker threads using the method outlined in section 4.1.2. Each worker obtains a set of tiles using one of two methods.

5.1.3  Image Refinement: Method One

The first load-balancing method implemented in RRIG is a form of dynamic load balancing, referred to in this paper as reduced dynamic load balancing. In this method, workers initially withdraw a number of tiles from the work queue, process the tiles, then request more tiles. The number of tiles a worker thread obtains on each request varies, depending on the total number of tiles in the scene, the number of worker threads, and the number of requests that particular worker has made up to that point.

Initially all threads will obtain init tiles, where:

init = (total number of tiles) / (2 * total number of worker threads)        (1)

Upon each successive request, the worker will ask for half the number of tiles as the previous request, unless this would cause the worker to request zero tiles. In this case, the worker will request a single tile, and continue doing so until all the remaining tiles are exhausted. This method of load balancing allows dynamic requests without incurring a large amount of interprocess communication overhead.
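
The request schedule can be summarized in a few lines, as sketched below (the function name is illustrative): the first grab follows equation (1), each later grab asks for half as many tiles, and the size never drops below one tile.

// Sketch of the request size for reduced dynamic load balancing.
int nextRequestSize(int totalTiles, int nWorkers, int requestsSoFar)
{
    int size = totalTiles / (2 * nWorkers);   // equation (1): the initial grab
    for (int i = 0; i < requestsSoFar; ++i)
        size /= 2;                            // halve on every subsequent request
    return size > 0 ? size : 1;               // never ask for zero tiles
}

For example, with the 4096 tiles of the Figure 4 configuration (a 512x512 display divided into 16x4 pixel tiles) and four workers, this schedule yields requests of 512, 256, 128, and so on, matching the one-eighth and one-sixteenth allocations described below.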

Figure 4 shows a map of this algorithm in action. In this example, four worker threads are processing a 512x512 pixel display. The scene being processed is a close-up of the Mayan pyramid described later in section 6. The tiles are processed from the bottom of the figure to the top. Initially, each thread is allocated one-eighth of the screen. When finished, each worker requests a new block covering one-sixteenth of the screen, and so on. Notice that at the end, the scene is essentially being processed one tile at a time, but primarily only by two workers. The remaining workers are busy processing earlier, more time-consuming tiles.

Load Map
Figure 4: A map indicating screen areas processed by each worker thread using the dynamic reduction load balancing method. Four workers were utilized on a 512x512 pixel display, where individual tiles are 16x4 pixels.

5.1.4  Image Refinement: Method Two

The second load-balancing method implemented in RRIG is a task-stealing scheme. In this scheme, the workers are considered to be in a ring. Each worker is assigned up to max tiles, where:

max = (total number of tiles + total number of worker threads) / (total number of worker threads)        (2)

When a worker has completed its allocated tile set, it peeks at the tile set of the worker next in the ring. If it finds uncompleted tiles in its neighbor's tile set, the worker ``steals'' half of the remaining tiles. However, if the adjacent worker has already completed its work, then the original worker continues to the next worker in the ring. If a worker travels completely around the ring without finding new work, it goes to sleep until the next frame.
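
The stealing step itself is small, as the sketch below suggests. The per-worker TileQueue type and its locking interface are assumptions; the ring traversal and the half-of-the-remainder grab follow the description above.

// Sketch of the ring-based task-stealing step.  Returns false when a full trip
// around the ring finds no remaining work, in which case the worker sleeps
// until the next frame.
bool stealWork(int self, int nWorkers, TileQueue* queues)
{
    for (int hop = 1; hop < nWorkers; ++hop) {
        TileQueue& victim = queues[(self + hop) % nWorkers];   // next in the ring
        victim.lock();
        int remaining = victim.size();
        if (remaining > 0) {
            int grab = (remaining + 1) / 2;       // steal half of what is left
            victim.moveTo(queues[self], grab);
            victim.unlock();
            return true;
        }
        victim.unlock();
    }
    return false;
}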

Task Stealing
Figure 5: Task Stealing: Rounded boxes denote worker threads joined in a ring formation.

A simple example of this process is outlined in Figure 5. In this figure, Worker A has just completed its tile set. It peeks at Worker B's load, but there are no tiles available. Worker A then examines Worker C's load and discovers that there are four available tiles. It then takes two of these tiles for itself.

Load Map
Figure 6: A map indicating screen areas processed by each worker thread using the task stealing load balancing method. Four workers were utilized on a 512x512 pixel display, where individual tiles are 16x4 pixels.

Figure 6 shows a map of the task stealing algorithm at work. This example is processing the identical scene outlined for Figure 4, and again is using four workers. Initially, each worker is allocated tiles composing one-quarter of the screen. This tile set is processed from the bottom to the top. As a worker finishes its tile set, it attempts to steal work from other workers. This can be seen most clearly in the area being processed by the worker delineated in black. Sections originally allocated to this worker are eventually processed by every worker in the scene.

Task-stealing provides a method of load balancing that is similar to other dynamic request methods. However, instead of requiring the workers to all lock a single master queue when requesting new work, the locks are distributed among the workers. This distinction becomes important when there are a large number of tiles in a scene.

5.2  Data Structures

The data structure that has the most significant impact in RRIG's rendering time is the pixel record, as outlined in Figure 2. During pixel warping, the fields describing pixel type, world position and color are accessed. During image refinement, any of the fields may be accessed.

Early on, the decision to keep world coordinates in the pixel record was made. The alternative is to use an inverse perspective matrix when performing pixel warping. Storing the world coordinates provides two major advantages: first, positional errors do not accumulate as rapidly when warping pixels; second, it allows the ray tracer to cast jittered rays into the scene, rather than relying on a ray passing through the center of each pixel. This reduces texture artifacting to some degree.

Originally, the color records were kept in a separate, contiguous memory location. This allowed the image to be easily copied directly from memory to the screen using a standard OpenGL call. Unfortunately, this incurred heavy cache thrashing during the rendering stages.

To reduce this problem, the color field was integrated back into the pixel record. To allow OpenGL to display the image, an additional marshalling step is now required. When a worker finishes refining a tile, it copies the color fields into a separate display buffer. Anti-aliasing, if desired, is performed during this marshalling process.

5.3  Paths Not Taken

Extending the pixel reprojection stage with a splatting technique was explored. However, several issues arose that made splatting an awkward fit into the rendering architecture. The primary difficulty is that in order to reduce visual artifacts, alpha blending must be used to soften the edges of the splat. If performed in software, this would require a much greater overhead in the pixel mapping routines, since an ordered z-buffer would need to be kept. Alternatively, it would be possible to use OpenGL calls to perform the compositing in hardware. However, in order to generate the successive frame, RRIG needs to know the color of each pixel from the previous frame. While it is possible to read the OpenGL display buffer, it is very expensive, and would also require a further step of writing the color information to each pixel record. It was felt that the performance cost of splatting in stage one of the rendering pipeline would eliminate the benefits obtained for stage two.

A Moving Shadow
Figure 7: A moving shadow

In addition to displaying the moving objects themselves, RRIG must also be concerned with secondary visual effects, such as moving shadows. Doing this in an efficient manner is problematic. Figure 7 shows a shadow cast by a ball moving across a surface. The shadow in frame one consists of the regions labeled A and B. The shadow in frame two consists of the regions labeled B and C. It is fairly trivial to mark a pixel as ``in moving shadow'' when it is initially traced. When the object advances, pixels marked as being in shadow can be recalculated using deferred shading. This will correctly remove region A from beneath the shadow. The difficulty arises in correctly updating region C. In order to do this, RRIG must somehow determine that pixels in region C need to be updated. This becomes quite problematic. A few possible solutions might be to construct shadow volumes (eliminating many of the advantages that ray tracing lends to rendering), tag every surface upon which any moving shadow falls as ``dirty'' (giving incorrect results if the shadow crosses surfaces), or simply to lower the time-to-live value of the entire scene. RRIG currently handles this situation by ignoring shadows cast by moving objects.

6  Results and Conclusion

A number of experiments were run to determine the performance of RRIG. These experiments were run on a lightly loaded machine with 32 MIPS R10000 (IP27) processors running at 250 MHz and 8192 megabytes of main memory. Run times were measured in wall-clock time and converted into frames per second.

The test scene consisted of a Mayan pyramid with two L-system generated trees. The static scene was composed of 80,991 triangles. Added to this was a 4032-triangle teapot which moved along a linear path on top of the pyramid. The pyramid pillars, comprising the majority of the geometry in the scene, were shaded using a marble solid texture based on a Perlin noise model. The base of the pyramid was shaded using a solid checker pattern. Illumination for the scene was provided by two shadow-casting point light sources.

Although RRIG is usually used interactively, with the user controlling the camera with the mouse, it can be run in a batch manner for generating specific timings. A camera path exploring the scene was recorded while RRIG utilized 24 workers (25 processors total). This path consisted of 135 data points. The path was then rendered while varying the frame size, load-balancing method, number of processors, and anti-aliasing. Results were measured for 2, 4, 8, 16 and 24 worker threads, which were migrated to separate processors. Each experiment also utilized an additional processor for the master thread. When rendering in batch mode, RRIG does not use future knowledge (i.e., future camera and object positions), but continues to run dynamically, simply reading input from a file rather than from the mouse. When rendering, RRIG attempts to normalize the path. This causes experiments with slower rendering times than the recorded path to skip frames. So, for instance, running the scene with only two processors may cause RRIG to display only 20 frames along the path rather than each of the possible 135 frames.

Table 1 outlines the results using a frame size of 512x512 pixels and the dynamic reduction load-balancing scheme. The display size of the anti-aliased run is 256x256. Internally, RRIG computes a 512x512 display, then reduces this in size to perform smoothing.

Worker  |      Image Warp      |      Ray Trace       |     Oversampled
Threads |  Min    Max    Avg   |  Min    Max    Avg   |  Min    Max    Avg
   2    | 0.456   2.02   0.952 | 0.406   1.28   0.712 | 0.451   1.84   0.953
   4    | 0.894   2.95   1.8   | 0.772   2.29   1.31  | 0.844   3.22   1.78
   8    | 1.69    5.08   3.28  | 1.53    4.09   2.34  | 1.7     5.3    3.28
  16    | 2.68    7.42   5.21  | 2.96    7.53   4.45  | 2.88    7.48   5.45
  24    | 3.77    7.9    6.53  | 3.84    9.97   6.37  | 4.06    8.1    6.62

Table 1: 512x512 display, dynamic reduction load balancing (measured in frames per second)

Table 2 summarizes the results given by using the same parameters as used for Table 1, but task stealing is used as the load balancing scheme.

Worker  |      Image Warp      |      Ray Trace       |     Oversampled
Threads |  Min    Max    Avg   |  Min    Max    Avg   |  Min    Max    Avg
   2    | 0.416   1.84   0.954 | 0.425   1.46   0.736 | 0.412   1.94   0.944
   4    | 0.701   3.37   1.89  | 0.73    2.32   1.36  | 0.686   3.19   1.86
   8    | 1.28    5.59   3.45  | 1.33    4.42   2.58  | 1.3     5.48   3.44
  16    | 2.42    8.07   5.78  | 2.32    8.05   4.81  | 2.4     8.27   5.75
  24    | 2.9     8.39   6.94  | 3.06    10.2   6.88  | 3.22    8.59   7.06

Table 2: 512x512, task stealing (measured in frames per second)

Table 3 outlines the results using a frame size of 768x768 pixels and the dynamic reduction load-balancing scheme. Again, the reduction step performed by the anti-aliasing routine results in a display size of 384x384 for the anti-aliased run.

Worker  |      Image Warp      |      Ray Trace       |     Oversampled
Threads |  Min    Max    Avg   |  Min    Max    Avg   |  Min    Max    Avg
   2    | 0.182   0.689  0.33  | 0.157   0.564  0.275 | 0.201   1.32   0.393
   4    | 0.404   1.54   0.745 | 0.356   0.938  0.591 | 0.405   1.58   0.75
   8    | 0.768   2.47   1.47  | 0.693   1.88   1.09  | 0.778   2.39   1.43
  16    | 1.35    3.62   2.5   | 1.37    3.61   2.07  | 1.06    3.85   2.44
  24    | 1.83    4.26   3.11  | 1.98    5.08   2.94  | 1.83    4.28   3.16

Table 3: 768x768, dynamic reduction load balancing (measured in frames per second)

Finally, Table 4 outlines the results using a frame size of 768x768 pixels and the task stealing load-balancing scheme.

Worker  |      Image Warp      |      Ray Trace       |     Oversampled
Threads |  Min    Max    Avg   |  Min    Max    Avg   |  Min    Max    Avg
   2    | 0.186   1.25   0.438 | 0.161   0.459  0.283 | 0.187   1.26   0.438
   4    | 0.3     1.46   0.778 | 0.326   1.06   0.613 | 0.289   1.61   0.771
   8    | 0.599   2.69   1.56  | 0.62    2.03   1.2   | 0.591   2.75   1.57
  16    | 1.11    3.93   2.65  | 1.18    3.97   2.3   | 1.13    4.0    2.66
  24    | 1.57    4.4    3.3   | 1.57    5.51   3.26  | 1.53    4.53   3.35

Table 4: 768x768, task stealing (measured in frames per second)

Figure 8 records the average amount of time spent by each worker in the program subsections. The timings were recorded using the same camera path as the experiments detailed above. The scene was rendered using a 512x512 pixel display, no anti-aliasing, and dynamic reduction as the load balancing scheme.

Time per Worker
Figure 8: Average time per worker in program subsections

There are some interesting trends that can be drawn from the data. First of all, it is clear that load balancing via task stealing is more efficient than reduced dynamic load balancing. This is particularly true when a larger number of processors is used, and is due to the single locked queue used in the latter method versus the locks distributed among the workers in the former.

It is interesting to note that the anti-aliased (oversampled) frames are often faster than their unprocessed counterparts. This is particularly evident at the 768x768 resolution. This may be a result of having the master thread update the display, forcing it to complete the OpenGL display call before generating new work.

Although the augmented ray tracing engine always outperformed the straight ray tracing engine in these experiments, it did not scale as well. The reason for this can be seen in figure 8. Notice that although the image refinement stage (composed mostly of ray tracing new pixels) scaled fairly linearly, the pixel warping stage did not. In fact, it took almost as long to remap the pixel buffer with 8 processors as it did with 24 processors. This effect is significant enough that, during development of RRIG on 60 nodes of a 195 MHz MIPS R10000 machine, the straight ray tracing engine outperformed the augmented engine by up to 47 percent. Clearly, in order to improve performance when using large numbers of processors, the pixel warping stage needs to be reexamined.

Although not clearly visible in the data, subjectively RRIG has a better immediate response time than ray tracing alone. When users are actively running the system, the pixel reprojection routines tend to provide a more responsive feel. In a local area with small movements, the ray tracing engine running alone requires approximately the same amount of time to render each frame; in the same circumstance, RRIG can generate a displayable image much faster than retracing the entire scene.

As it stands, the current implementation of RRIG performs most efficiently when it utilizes 4 to 8 processors. Multiprocessor desktop machines are becoming increasingly popular. With the recent introduction of Intel's quad-Xeon chipset and AMD's multiprocessor Athlon architecture, it is quite possible that the RRIG rendering framework could be on the desktop within nine to twelve months.

7  Future Work

The most important future work necessary to increase the speed of RRIG in a highly parallel environment is the development of a more scalable pixel reprojection stage. One possible approach is to allocate a separate destination buffer for each worker thread. This would necessitate an additional compositing stage to merge the per-worker destination buffers into the single buffer required by the image refinement stage. This approach would help alleviate the read/write contention that currently occurs. However, the overhead of two barrier waits and accessing a large amount of non-sequential memory may negate the advantages offered, at least when a small number of worker threads is being utilized.

Another approach to increasing RRIG's scalability might be to extend McMillan's list-priority mapping technique [12]. McMillan's method guarantees that pixels will arrive in back-to-front order by processing them in a specific order, without having to perform a Z-buffer test. Parallelizing this algorithm requires classifying the image buffer into two distinct regions. The first class contains all areas that can be processed in an arbitrary order. The second class contains potential ``overlap areas'' and must be processed after the first-class regions are completed. In order to classify the second-class regions, the maximum vertical distance d by which a given pixel can be displaced must be determined. This can be found by reprojecting either the pixel with the minimum range value in the source buffer or a pixel on the hither clipping plane, and finding the difference from its original position. Once d is known, the overlap areas can be found at the borders of adjoining tiles as shown in figure 9, where four workers divide up the source buffer. First, each worker processes the class one regions (white and red) using McMillan's method. After every worker has completed this step, the class two regions (yellow) are processed, resulting in potential overlap with the red areas. This algorithm assumes that d is smaller than the height of the class one tiles. If this is not the case, the algorithm should either reduce the number of worker threads (increasing the size of each tile) or utilize a different algorithm.

McMillan Warp
Figure 9: Parallelizing McMillan's warp ordering method: e is the center of projection for the next frame, shaded areas denote overlap areas
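
As a rough illustration of the minimum-range variant, the sketch below bounds d by reprojecting the source pixel closest to the previous viewpoint and measuring its vertical shift. The Camera interfaces and the distance() helper are assumptions for illustration only.

#include <cmath>

// Sketch of bounding the maximum vertical displacement d used to size the
// class-two overlap bands.
int maxVerticalDisplacement(const PixelRecord* src, int width, int height,
                            const Camera& oldCam, const Camera& newCam)
{
    // Find the source pixel with the minimum range (closest to the old viewpoint).
    int best = 0;
    float bestRange = distance(oldCam.eye(), src[0].world);
    for (int i = 1; i < width * height; ++i) {
        float r = distance(oldCam.eye(), src[i].world);
        if (r < bestRange) { bestRange = r; best = i; }
    }
    // Reproject that pixel into the next frame and measure its vertical shift.
    float x, y, z;
    newCam.project(src[best].world, &x, &y, &z);
    float dy = std::fabs(y - float(best / width));
    return int(std::ceil(dy));       // class-one tiles must be taller than this
}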

A second major area for future work is the efficient incorporation of secondary visual effects from moving objects. One possible method might be to track regions likely to require updates. This would require an image processing stage that notes pixels that are no longer in shadow or no longer reflect a moving object. These pixels could then be used as a base to predict which new areas may be affected. Determining a viable method of performing this task in a real-time system promises to be an interesting area of exploration.

There are several other avenues for investigation that would have a lesser impact on system performance. One of these is an examination of the effects of tile size during image refinement. The current tile size was determined in an ad hoc manner, and a more rigorous method may result in a slight performance increase. Another parameter to test is a pixel's time-to-live value. Increasing this value should result in a potential performance gain, but at the cost of image accuracy. Finally, augmenting the enriched buffer with a limited layered depth image of 2-3 layers may improve responsiveness but incur a much higher expense in memory and frame calculation time.

References

[1]
S. J. Adelson and L. F. Hodges, Visible surface ray-tracing of stereoscopic images, in Proc. SE ACM, 1992, pp. 148-157.

[2]
S. J. Adelson and L. F. Hodges, Stereoscopic ray-tracing, The Visual Computer, 10 (1993), pp. 127-144.

[3]
S. J. Adelson and L. F. Hodges, Generating exact ray-traced animation frames by reprojection, IEEE Computer Graphics and Applications, 15 (1995), pp. 43-52.

[4]
J. Arvo and D. Kirk, A survey of ray tracing acceleration techniques, in An Introduction to Ray Tracing, A. S. Glassner, ed., Academic Press, 1989.

[5]
S. Badt, Jr., Two algorithms for taking advantage of temporal coherence in ray tracing, The Visual Computer, 4 (1988), pp. 123-132.

[6]
R. L. Cook, T. Porter, and L. Carpenter, Distributed ray tracing, in Computer Graphics (SIGGRAPH '84 Proceedings), vol. 18, July 1984, pp. 137-145. Monte Carlo distribution of rays to get gloss, translucency, penumbras, depth of field, motion blur.

[7]
S. J. Gortler, L.-W. He, and M. F. Cohen, Rendering layered depth images, technical report, Microsoft Research, Mar. 1997.

[8]
J. T. Kajiya, The rendering equation, in Computer Graphics (SIGGRAPH '86 Proceedings), D. C. Evans and R. J. Athay, eds., vol. 20, Aug. 1986, pp. 143-150.

[9]
S. B. Kang, A survey of image-based rendering techniques, technical report, Digital Equipment Corp., Cambridge Research Laboratory, Aug. 1997.

[10]
W. R. Mark and G. Bishop, Efficient reconstruction techniques for post-rendering 3d image warping, technical report, University of North Carolina at Chapel Hill, Dept of Computer Science, Mar. 1998.

[11]
W. R. Mark, L. McMillan, and G. Bishop, Post-rendering 3d warping, in 1997 Symposium on Interactive 3D Graphics, P. Hanrahan and J. Winget, eds., ACM SIGGRAPH, Apr. 1997, pp. 7-16. ISBN 0-89791-884-3.

[12]
L. McMillan, A list-priority rendering algorithm for redisplaying projected surfaces, technical report, University of North Carolina at Chapel Hill, Dept of Computer Science, Aug. 1995.

[13]
L. McMillan and G. Bishop, Head-tracked stereoscopic display using image warping, in Proceedings SPIE, vol. 2409, S. Fisher, J. Merritt, and B. Bolas, eds., 1995, pp. 21-30. San Jose, CA, Feb. 1995.

[14]
S. Parker, W. Martin, P.-P. Sloan, P. Shirley, B. Smits, and C. Hansen, Interactive ray tracing, in I3D, April 1999. Accepted for publication.

[15]
J. Shade, S. Gortler, L.-W. He, and R. Szeliski, Layered depth images, in SIGGRAPH 98 Conference Proceedings, Annual Conference Series, ACM SIGGRAPH, Addison Wesley, July 1998, pp. 231-242.

[16]
L. Westover, Footprint evaluation for volume rendering, in Computer Graphics (SIGGRAPH '90 Proceedings), D. C. Evans and R. J. Athay, eds., vol. 24, Aug. 1990, pp. 367-376.

[17]
T. Whitted, An improved illumination model for shaded display, Communications of the ACM, 23 (1980), pp. 343-349.