Mike Stark
Homework #5
Efficiency Structures for Ray-Tracing
Overview
For this assignment we were asked to implement one of two data structures
for ray-object intersections. I implemented a bounded volume hierarchy
structure, in which a complete binary tree stores a hierarchy of
bounding boxes: each node contains a box that bounds the boxes of its
children.
Results
The results tabulated below are times as reported by the standard
UNIX time command. The computations were done on "jacoby" in the grad
lab, which I understand has a MIPS R4400 processor and is equipped
with 128 MB of main memory. (Thanks to Tushar for this tidbit of information.)
My ray-tracer was compiled with gcc using second-level optimizations.
(Remarkably enough, this seemed to beat the native CC compiler.)
The times below are for simple ray-traced images without any reflections
or shadow calculations.
Some images.
| Scene | Preprocessing | Ray-Tracing | Total |
| --- | --- | --- | --- |
| 1 sphere | 0.001 s | 0.700 s | 0.701 s |
| 10 spheres | 0.001 s | 1.443 s | 1.444 s |
| 100 spheres | 0.002 s | 4.153 s | 4.155 s |
| 1,000 spheres | 0.011 s | 8.869 s | 8.880 s |
| 10,000 spheres | 0.123 s | 22.78 s | 22.90 s |
| 100,000 spheres | 1.430 s | 45.46 s | 46.89 s |
| 500,000 spheres | 9.168 s | 74.23 s | 83.40 s |
Notice I only went to 500,000 spheres. I would have liked to do a
computation with 1,000,000, but the machine didn't seem to want to let
me do it. Even if it had, I suspect there would have been so much
thrashing the times would have been way off.
How it works
The bounded-volume hierarchy
I implemented what is probably a very naive bounded-volume structure.
The algorithm can be summarized briefly as follows. N is the number
of objects, and it is assumed each has an axis-aligned bounding box
already computed.
- Find the smallest k such that 2^k >= N
- Partition the objects so that the 2^(k - 1) objects with the smallest
minimum x coordinates (the smaller x coordinate of each bounding box)
occupy the left portion of the array.
- Do the same for the two partitions, except use the smaller y
coordinate, and repeat this process, cycling through the three coordinates,
until the partition sizes reach 1. The objects will then be the leaves
of a complete binary tree.
- Compute the bounding box of each pair of sibling leaves, then of each
pair of the resulting interior nodes, and so on up the tree until the root
is reached.
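The steps above can be sketched as follows. This is a minimal illustration in Python, not the original compiled code: the names `smallest_power_exponent`, `union` and `build`, the tuple layout for boxes and nodes, and the use of `sorted` as a stand-in for a faster partition are all assumptions made for the example.

```python
def smallest_power_exponent(n):
    """Smallest k with 2**k >= n, for n >= 1."""
    k = 0
    while (1 << k) < n:
        k += 1
    return k


def union(a, b):
    """Axis-aligned box enclosing boxes a and b; a box is (min_xyz, max_xyz)."""
    (alo, ahi), (blo, bhi) = a, b
    return (tuple(min(p, q) for p, q in zip(alo, blo)),
            tuple(max(p, q) for p, q in zip(ahi, bhi)))


def build(boxes, axis=0):
    """Return a node (box, left, right); leaves have left = right = None."""
    n = len(boxes)
    if n == 1:
        return (boxes[0], None, None)
    half = 1 << (smallest_power_exponent(n) - 1)
    # Stand-in for the partition step: a full sort by the minimum coordinate
    # on this axis. It gives the same left/right split, only more slowly.
    ordered = sorted(boxes, key=lambda b: b[0][axis])
    next_axis = (axis + 1) % 3  # cycle x -> y -> z -> x
    left = build(ordered[:half], next_axis)
    right = build(ordered[half:], next_axis)
    # The parent's box is computed on the way back up the recursion.
    return (union(left[0], right[0]), left, right)
```

With five boxes, for instance, the left subtree receives four of them and the right subtree only one, which shows how the split depends on the nearest power of two rather than on the geometry.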
The key to making this algorithm fast is being able to partition each sub-array
quickly. There are a variety of methods for doing this; I used a recursive
method based on a "median of three" approximation to the median. It runs in
expected linear time, so the expected running time of the whole algorithm is
O(N lg N). In practice the partitioning was very fast, and certainly much
faster than sorting the array at each stage.
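For illustration, here is one common way to do such a partition: a quickselect with a median-of-three pivot. It is a sketch under assumptions, not the code used for the timings. The function names are invented for the example, it works on a plain list of keys rather than on objects, and it uses a loop where the text describes a recursive method.

```python
def median_of_three(keys, lo, hi):
    """Index of the median among the first, middle and last keys of keys[lo:hi]."""
    mid = (lo + hi - 1) // 2
    candidates = sorted([lo, mid, hi - 1], key=lambda i: keys[i])
    return candidates[1]


def partition_smallest(keys, lo, hi, count):
    """Rearrange keys[lo:hi] in place so that its `count` smallest come first."""
    while hi - lo > 1 and 0 < count < hi - lo:
        # Move the pivot to the end, then sweep smaller keys to the front.
        p = median_of_three(keys, lo, hi)
        keys[p], keys[hi - 1] = keys[hi - 1], keys[p]
        pivot = keys[hi - 1]
        store = lo
        for i in range(lo, hi - 1):
            if keys[i] < pivot:
                keys[i], keys[store] = keys[store], keys[i]
                store += 1
        keys[store], keys[hi - 1] = keys[hi - 1], keys[store]
        # Now keys[lo:store] < pivot == keys[store] <= keys[store + 1:hi].
        smaller = store - lo
        if count <= smaller:
            hi = store                  # boundary lies in the left part
        elif count == smaller + 1:
            return                      # pivot is the last of the smallest
        else:
            count -= smaller + 1        # boundary lies in the right part
            lo = store + 1
```

Only one side of each split is revisited, which is why the expected cost is linear in the chunk size, whereas a full sort would work on both sides.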
Streamlining
Although the partitioning method works very well for the larger chunks of the
array, it becomes inefficient for small arrays. So the natural thing to do is
to switch to a sorting algorithm that is fast on small inputs, such as
insertion sort, for the small arrays. Empirically, the best breakpoint seemed
to be a chunk size of 8.
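A sketch of such a cutoff is shown below. The names are hypothetical, and `select` stands for whatever partition routine is used on the larger chunks; it is passed in as a parameter only to keep the example self-contained.

```python
CUTOFF = 8  # breakpoint reported above; the best value is machine dependent


def insertion_sort_range(keys, lo, hi):
    """Sort keys[lo:hi] in place by insertion."""
    for i in range(lo + 1, hi):
        item = keys[i]
        j = i
        while j > lo and keys[j - 1] > item:
            keys[j] = keys[j - 1]
            j -= 1
        keys[j] = item


def place_smallest(keys, lo, hi, count, select):
    """Put the `count` smallest keys of keys[lo:hi] first.

    Small chunks are sorted outright, since a sorted chunk has its smallest
    keys first for any `count`. Larger chunks are handed to `select`, a
    caller-supplied partition routine (an assumption of this sketch).
    """
    if hi - lo <= CUTOFF:
        insertion_sort_range(keys, lo, hi)
    else:
        select(keys, lo, hi, count)
```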
Tracing Rays
The ray-tracing part of the code was relatively simple compared to the
partitioning scheme: traversing the hierarchy is just a recursive function
that descends into a node only if the ray hits its box, and stops when a leaf
is reached, where the sphere test is done. I suspect this could be improved
somewhat by writing a non-recursive algorithm. This, however, could be a
little tricky.
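To make the comparison concrete, here is a sketch of both variants, the second using an explicit stack in place of recursion. The node layout `(box, left, right, obj)` and the two callbacks are assumptions of this example: `hits_box(box)` reports whether the current ray hits a box, and `intersect(obj)` returns the hit distance or `None`.

```python
def trace_recursive(node, hits_box, intersect):
    """Return (t, obj) for the nearest hit below `node`, or None."""
    box, left, right, obj = node
    if not hits_box(box):
        return None                      # the whole subtree is skipped
    if obj is not None:                  # leaf: run the actual object test
        t = intersect(obj)
        return None if t is None else (t, obj)
    hits = [h for h in (trace_recursive(left, hits_box, intersect),
                        trace_recursive(right, hits_box, intersect))
            if h is not None]
    return min(hits, key=lambda h: h[0]) if hits else None


def trace_iterative(root, hits_box, intersect):
    """Same result as trace_recursive, using an explicit stack."""
    best = None
    stack = [root]
    while stack:
        box, left, right, obj = stack.pop()
        if not hits_box(box):
            continue
        if obj is None:                  # interior node: visit both children
            stack.append(right)
            stack.append(left)
        else:
            t = intersect(obj)
            if t is not None and (best is None or t < best[0]):
                best = (t, obj)
    return best
```

Whether the stack version is actually faster would depend on the language and compiler; the sketch only shows that the control flow carries over directly.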
Bounding-Box intersection test
The bounding-box intersection test is probably the bottleneck here. I used my
own function for this, which I wrote to minimize the number of branches at
the expense of extra arithmetic, a trade that makes sense on deeply pipelined
superscalar architectures. I tried Brian's code in place of mine, and mine
seemed to do a little better, but not by enough to get excited about.
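One well-known test in this spirit is the "slab" method, which replaces most comparisons with min/max arithmetic. The sketch below is that general technique, not the function described above; `safe_inverse` and `ray_hits_box` are names chosen for the example, and the direction is assumed to be supplied as precomputed reciprocals.

```python
import math


def safe_inverse(d):
    """1/d, with a signed infinity standing in for division by zero."""
    return 1.0 / d if d != 0.0 else math.copysign(math.inf, d)


def ray_hits_box(origin, inv_dir, box, t_max=math.inf):
    """Slab test: does origin + t * dir, with 0 <= t <= t_max, meet the box?

    `inv_dir` holds the per-axis reciprocals of the ray direction. A ray that
    lies exactly in a face plane while parallel to it produces NaN here; this
    sketch does not handle that case specially.
    """
    lo, hi = box
    t_near, t_far = 0.0, t_max
    for o, inv, a, b in zip(origin, inv_dir, lo, hi):
        t1 = (a - o) * inv
        t2 = (b - o) * inv
        # Intersect the running interval with this axis's entry/exit interval.
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far
```

The only comparison left is the final one; in a compiled language the min and max calls can typically be done without branching.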
Improvements--or, what I should have done instead!
When I started this assignment I thought that the preprocessing would be the
dominant portion of the running time, so I spent a lot of time trying to
optimize that code. But as the numbers show, I was dead wrong. After
thinking about it some today, I realized that my partitioning method is
probably not the best approach, because it really doesn't do anything to
ensure that the boxes in the hierarchy are in any way minimal. In fact, if
the number of objects is far from a power of two, there will be many long,
thin boxes on the right side of the tree. These of course are very
undesirable, as they will "catch" a lot of rays that miss the contents
outright.
I did actually re-write the code so that instead of routinely
cycling through the coordinates at each subdivision, each box was "cut" along
its longest dimension. It improved the ray-tracing times somewhat, but
the increased preprocessing time made the total time slightly worse. With
more time, I suspect it could be improved further.
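Choosing the cut axis this way might look like the following sketch. The helper names and the `(min_xyz, max_xyz)` box layout are assumptions for the example, and ties are broken in favour of the lower-numbered axis.

```python
def chunk_bounds(boxes):
    """Box enclosing every box in a chunk; a box is (min_xyz, max_xyz)."""
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return (lo, hi)


def longest_axis(box):
    """Index (0 = x, 1 = y, 2 = z) of the box's longest side."""
    lo, hi = box
    extents = [h - l for l, h in zip(lo, hi)]
    return extents.index(max(extents))
```

In this form the bounds of each chunk have to be computed before it is split, which costs an extra pass over its objects at every level of the build.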
Some images of the hierarchy
For ray-tracings such as the ones for this assignment, where there is
no reflection, and only one sample per pixel, it is probably not all that
useful to spend a lot of time in pre-processing the object structure.
However, for more "realistic" ray-tracing problems, a good structure
(that is, better than the one I used) could certainly pay off many times
over.