Mike Stark

Homework #5

Efficiency Structures for Ray-Tracing

Overview

For this assignment we were asked to implement one of two data structures for ray-object intersections. I implemented a bounded volume hierarchy structure, in which a complete binary tree stores a hierarchy of bounding boxes, with each node containing a box that bounds the boxes of its children.

Results

The results tabulated below are times as reported by the standard UNIX time command. The computations were done on "jacoby" in the grad lab, which I understand has a MIPS R4400 processor and 128 MB of main memory. (Thanks to Tushar for this tidbit of information.) My ray-tracer was compiled with gcc using second-level optimizations. (Remarkably enough, this seemed to beat the native CC compiler.)

The times below are for simple ray-traced images without any reflections or shadow calculations. Some images.

Scene              Preprocessing   Ray-Tracing   Total
1 sphere           0.001 s         0.700 s       0.701 s
10 spheres         0.001 s         1.443 s       1.444 s
100 spheres        0.002 s         4.153 s       4.155 s
1,000 spheres      0.011 s         8.869 s       8.880 s
10,000 spheres     0.123 s         22.78 s       22.90 s
100,000 spheres    1.430 s         45.46 s       46.89 s
500,000 spheres    9.168 s         74.23 s       83.40 s

Notice I only went to 500,000 spheres. I would have liked to do a computation with 1,000,000, but the machine didn't seem to want to let me do it. Even if it had, I suspect there would have been so much thrashing the times would have been way off.

How it works

The bounded-volume hierarchy

I implemented what is probably a very naive bounded-volume structure. The algorithm can be summarized briefly as follows. N is the number of objects, and it is assumed each has an axis-aligned bounding box already computed.
  1. Find the smallest k such that 2^k >= N
  2. Partition the objects so that the 2^(k - 1) objects with the smallest x coordinates occupy the left portion of the array, where each object is ranked by the smaller of its bounding box's two x coordinates.
  3. Do the same for the two partitions, except use the smaller y coordinate, and repeat this process, cycling through the three coordinates, until the partition sizes reach 1. The objects will then be the leaves of a complete binary tree.
  4. Compute the bounding box of each pair of objects, then of each pair of the resulting interior nodes, and so on until the root is reached.
The key to making this algorithm fast is being able to partition each sub-array quickly. There are a variety of methods for doing this; I used a recursive method based on a "median of three" approximation to the median. It runs in expected linear time, which makes the expected running time of the whole algorithm O(N lg N). In practice the partitioning was very fast, and certainly much faster than sorting the array at each stage.
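
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the timings: the names union_box, select, build and build_hierarchy are made up here, a box is assumed to be a pair (min_xyz, max_xyz), the selection loop is an iterative Lomuto-style variant rather than the recursive method described, and the handling of a half-empty subtree is a guess.

```python
def union_box(b1, b2):
    """Smallest axis-aligned box containing both boxes; a box is (min_xyz, max_xyz)."""
    return (tuple(map(min, b1[0], b2[0])), tuple(map(max, b1[1], b2[1])))

def select(a, lo, hi, m, key):
    """Rearrange a[lo..hi] in place so that items left of index m have keys
    <= key(a[m]) and items right of it have keys >= key(a[m])."""
    while lo < hi:
        mid = (lo + hi) // 2
        # Median-of-three pivot: index of the middle key among first, middle, last.
        p = sorted((lo, mid, hi), key=lambda i: key(a[i]))[1]
        a[p], a[hi] = a[hi], a[p]
        pivot = key(a[hi])
        store = lo
        for i in range(lo, hi):
            if key(a[i]) < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        if store == m:
            return
        if store < m:
            lo = store + 1
        else:
            hi = store - 1

def build(boxes, lo, hi, cap, axis):
    """Build the subtree over boxes[lo:hi], which has room for cap leaves."""
    if hi - lo == 1:
        return {"box": boxes[lo], "obj": boxes[lo]}
    half = cap // 2
    if hi - lo <= half:
        # Assumption: if the right half would be empty, skip this level.
        return build(boxes, lo, hi, half, axis)
    mid = lo + half
    # Step 2/3: the `half` boxes with the smallest minimum coordinate go left.
    select(boxes, lo, hi - 1, mid, key=lambda b: b[0][axis])
    nxt = (axis + 1) % 3  # cycle x -> y -> z -> x
    left = build(boxes, lo, mid, half, nxt)
    right = build(boxes, mid, hi, half, nxt)
    # Step 4: an interior node bounds the boxes of its two children.
    return {"box": union_box(left["box"], right["box"]), "left": left, "right": right}

def build_hierarchy(boxes):
    """Step 1: find the smallest power of two >= N, then build from the root."""
    cap = 1
    while cap < len(boxes):
        cap *= 2
    return build(boxes, 0, len(boxes), cap, 0)
```

Each level of the recursion does expected linear work in total, and there are about lg N levels, which is where the O(N lg N) expected bound comes from.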

Streamlining

Although the partitioning method works very well for the larger chunks of the array, it becomes inefficient for small ones. So the natural thing to do is switch to a sorting algorithm with low overhead, such as insertion sort, for the small chunks. Empirically, the best breakpoint seemed to be a chunk size of 8.
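
One way to picture the switch-over is the sketch below. The names CUTOFF, insertion_sort and select_with_cutoff are hypothetical, and the partition loop is a generic Lomuto-style stand-in rather than the routine actually timed; only the cutoff value of 8 comes from the text.

```python
CUTOFF = 8  # breakpoint reported above; likely machine-dependent

def insertion_sort(a, lo, hi, key):
    """Sort a[lo..hi] in place by key."""
    for i in range(lo + 1, hi + 1):
        item = a[i]
        k = key(item)
        j = i - 1
        while j >= lo and key(a[j]) > k:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = item

def select_with_cutoff(a, lo, hi, m, key):
    """Place the item of rank m at a[m], with smaller keys to its left and
    larger keys to its right, using partitioning only on large chunks."""
    while hi - lo + 1 > CUTOFF:
        mid = (lo + hi) // 2
        # Median-of-three pivot choice.
        p = sorted((lo, mid, hi), key=lambda i: key(a[i]))[1]
        a[p], a[hi] = a[hi], a[p]
        pivot = key(a[hi])
        store = lo
        for i in range(lo, hi):
            if key(a[i]) < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        if store == m:
            return
        if store < m:
            lo = store + 1
        else:
            hi = store - 1
    # The remaining chunk still contains index m; sorting it finishes the job.
    insertion_sort(a, lo, hi, key)
```

Sorting the last small chunk does slightly more than a partition requires, but on a handful of elements the simple inner loop tends to cost less than another round of pivot selection.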

Tracing Rays

The ray-tracing part of the code was relatively simple compared to the partitioning scheme: traversing the hierarchy is just a recursive function, which stops when a leaf is reached and the sphere test is done. I suspect this could be improved somewhat by writing a non-recursive algorithm, though that could be a little tricky.
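
For comparison, here is a sketch of the recursive traversal next to one possible non-recursive version that uses an explicit stack. The node layout (dictionaries with "box", "left", "right" and "obj" keys) and the hits_box callback are assumptions made for this illustration; the leaf's sphere test is left to the caller.

```python
def traverse_recursive(node, hits_box, out):
    """Append to out every leaf object whose chain of boxes the ray hits."""
    if not hits_box(node["box"]):
        return  # the ray misses this box, so it misses everything inside it
    if "obj" in node:
        out.append(node["obj"])  # leaf: the caller would run the sphere test here
        return
    traverse_recursive(node["left"], hits_box, out)
    traverse_recursive(node["right"], hits_box, out)

def traverse_iterative(root, hits_box):
    """Same visit order as the recursive version, with an explicit stack."""
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not hits_box(node["box"]):
            continue
        if "obj" in node:
            out.append(node["obj"])
        else:
            # Push right first so that left is popped, and visited, first.
            stack.append(node["right"])
            stack.append(node["left"])
    return out
```

Because the tree is complete, the stack never needs more than about lg N + 1 entries, so in a C implementation it could be a small fixed-size array.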

Bounding-Box intersection test

The bounding-box intersection test is probably the bottleneck here. I used my own function for this, which I wrote to minimize the number of branches at the cost of extra arithmetic, a trade that makes sense on deeply-pipelined super-scalar architectures. I tried Brian's code in place of mine, and mine seemed to do a little better, but not by enough to get excited about.
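
The function itself is not reproduced here. As an illustration of the general idea, the sketch below uses the common "slab" formulation, in which per-axis early-exit comparisons are replaced by min/max arithmetic and a single comparison at the end. The name ray_hits_box and the box layout (min_xyz, max_xyz) are assumptions, and Python cannot show the pipeline effect; only the shape of the arithmetic carries over to C.

```python
import math

def ray_hits_box(origin, direction, box, t_max=math.inf):
    """Return True if the ray origin + t*direction, 0 <= t <= t_max, meets the box."""
    lo, hi = box
    t_near, t_far = 0.0, t_max
    for o, d, b0, b1 in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray parallel to this pair of planes: it must start between them.
            # (Handled explicitly here because Python raises on division by zero.)
            if o < b0 or o > b1:
                return False
            continue
        inv = 1.0 / d
        t0 = (b0 - o) * inv
        t1 = (b1 - o) * inv
        # Intersect the running interval with this axis's entry/exit interval.
        t_near = max(t_near, min(t0, t1))
        t_far = min(t_far, max(t0, t1))
    return t_near <= t_far
```

In a compiled version the reciprocal of each direction component would normally be computed once per ray rather than once per box.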

Improvements--or, what I should have done instead!

When I started this assignment I thought that the preprocessing would be the dominant portion of the running time, so I spent a lot of time trying to optimize that code. But as the numbers show, I was dead wrong. After thinking about it some today, I realized that my partitioning method is probably not the best approach, because it really doesn't do anything to ensure that the boxes in the hierarchy are in any way minimal. In fact, if the number of objects is far from a power of two, there will be many long, thin boxes on the right side of the tree. These of course are very undesirable, as they will "catch" a lot of rays that miss the contents outright.

I did actually re-write the code so that instead of routinely cycling through the coordinates at each subdivision, each box was "cut" along its longest dimension. It improved the ray-tracing times somewhat, but the increased preprocessing time made the total time slightly worse. If I had more time, I suspect it could be improved further.
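
A minimal sketch of choosing the cut direction this way is given below. The helper names bounds_of and longest_axis are hypothetical, and a box is again assumed to be a pair (min_xyz, max_xyz).

```python
def bounds_of(boxes):
    """Bounding box of a non-empty chunk of boxes."""
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return (lo, hi)

def longest_axis(box):
    """Index (0 = x, 1 = y, 2 = z) of the box's longest side; ties go to the lowest index."""
    lo, hi = box
    extents = [hi[i] - lo[i] for i in range(3)]
    return extents.index(max(extents))
```

In this form the chunk's bounds must be known before it can be split, which means an extra pass over the chunk at every level; that may account for some of the added preprocessing time.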

Some images of the hierarchy

For ray-tracings such as the ones for this assignment, where there is no reflection, and only one sample per pixel, it is probably not all that useful to spend a lot of time in pre-processing the object structure. However, for more "realistic" ray-tracing problems, a good structure (that is, better than the one I used) could certainly pay off many times over.