James Bigler: CS6620 Homework 3
Implement a bounding volume hierarchy for your raytracer and render
1 million spheres.
All images are 500x500 and include shadows. Timing was done on an Origin 3k with R12k
(400 MHz) processors. I timed scenes with 10^3, 10^4, 10^5,
and 10^6 spheres to get an idea of how the algorithms scale. Most of
these runs used 10 processors, with pthreads implementing the
parallelism. The work time is the total time used by all the
processors added up. The parallel scaling is near linear, so these totals
should be close to single-processor times.
Best results are achieved by creating the smallest bounding volumes.
Simply sorting the objects once, along a single axis, would leave
many thin slices of data in each bounding volume. Instead, I rotated
the sort axis at each level of the hierarchy: X on the
first pass, then Y, then Z, and then repeating the pattern. This
heuristic proved to perform well.
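The axis rotation described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the timings: the `Sphere`, `Box`, and `Node` types and the `enclose` and `build` functions are hypothetical names, and the sketch assumes one sphere per leaf and a split at the middle index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical types for illustration only.
struct Sphere {
    float center[3];
    float radius;
};

struct Box {
    float lo[3];
    float hi[3];
};

struct Node {
    Box bounds;
    std::unique_ptr<Node> left, right;  // both null for a leaf
    const Sphere* prim = nullptr;       // set only on leaves
};

// Smallest axis-aligned box enclosing s[first, last).
Box enclose(const std::vector<Sphere>& s, std::size_t first, std::size_t last) {
    assert(first < last && last <= s.size());
    Box b;
    for (int a = 0; a < 3; ++a) {
        b.lo[a] = s[first].center[a] - s[first].radius;
        b.hi[a] = s[first].center[a] + s[first].radius;
    }
    for (std::size_t i = first + 1; i < last; ++i) {
        for (int a = 0; a < 3; ++a) {
            b.lo[a] = std::min(b.lo[a], s[i].center[a] - s[i].radius);
            b.hi[a] = std::max(b.hi[a], s[i].center[a] + s[i].radius);
        }
    }
    return b;
}

// Build a hierarchy over s[first, last). The sort axis cycles X, Y, Z, X, ...
// with depth, and each range is split at its middle index.
std::unique_ptr<Node> build(std::vector<Sphere>& s, std::size_t first,
                            std::size_t last, int depth = 0) {
    assert(first < last && last <= s.size());
    auto node = std::make_unique<Node>();
    node->bounds = enclose(s, first, last);
    if (last - first == 1) {
        node->prim = &s[first];
        return node;
    }
    const int axis = depth % 3;
    std::sort(s.begin() + first, s.begin() + last,
              [axis](const Sphere& a, const Sphere& b) {
                  return a.center[axis] < b.center[axis];
              });
    const std::size_t mid = first + (last - first) / 2;
    node->left = build(s, first, mid, depth + 1);
    node->right = build(s, mid, last, depth + 1);
    return node;
}
```

Calling `build(spheres, 0, spheres.size())` on a non-empty vector returns the root. Because each level re-sorts its whole range, the sorting work repeats at every depth of the tree.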
This produced decent rendering results, but I was troubled by the
scene creation time. It was growing superlinearly.
| num prims | Building time | Rendertime |
| 1000 | 0.0227 | 9.45 |
| 10000 | 0.371 | 14.3 |
| 100000 | 6.5 | 37.8 |
| 1000000 | 110 | 123 |
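One quick way to check the growth rate is to take the base-10 log of the ratio between adjacent build times, since each row has ten times as many primitives as the one before it. The helper below is a hypothetical sketch for that check, not part of the raytracer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// For timings taken at N, 10N, 100N, ..., return the local exponent k in
// time ~ N^k between each adjacent pair, i.e. log10(t[i+1] / t[i]).
// A value of 1.0 would mean linear growth over that step.
std::vector<double> growth_exponents(const std::vector<double>& times) {
    assert(times.size() >= 2);
    std::vector<double> k;
    for (std::size_t i = 0; i + 1 < times.size(); ++i) {
        assert(times[i] > 0.0);
        k.push_back(std::log10(times[i + 1] / times[i]));
    }
    return k;
}
```

Fed the building times from the table above (0.0227, 0.371, 6.5, 110), this gives exponents of roughly 1.2 for every step. That means each tenfold increase in spheres costs about 16 to 18 times more build time, rather than the 10 times that linear growth would give.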
This was kind of slow, so I optimized a few things and I got faster
times, but still the same poor scaling.
| num prims | Building time | Rendertime |
| 1000 | 0.00572 | 4.89 |
| 10000 | 0.127 | 10.9 |
| 100000 | 3.0 | 26.5 |
| 1000000 | 54 | 71.0 |
I knew there must be a better way to do this. After reading another
classmate's page about his qsplit, I came to the same conclusion: it
doesn't matter whether all the elements in the array are sorted, just that
every element in the first half is no larger than every element in the second half. That is when Brian Budge pointed me to the nth_element function in the STL. This
function does exactly what I needed (elements in the first half less
than or equal to elements in the second half), but runs in O(N) time on average per call. Thus I could get
near-linear scaling in build time while keeping the same render performance. This is nice! Here
are some timing results. I could use some additional optimizations,
but the numbers scale. Also note that the MIPSpro compiler (CC) did a
better job of optimizing than g++.
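A minimal sketch of the split step using `std::nth_element` might look like this. The `Sphere` type and the `median_split` helper are hypothetical names for illustration; only `std::nth_element` itself comes from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sphere type for illustration only.
struct Sphere {
    float center[3];
    float radius;
};

// Rearrange s[first, last) so that the element at the middle index is the one
// a full sort on `axis` would put there, everything before it compares no
// greater, and everything after it compares no less. Returns the middle index.
std::size_t median_split(std::vector<Sphere>& s, std::size_t first,
                         std::size_t last, int axis) {
    assert(first < last && last <= s.size());
    assert(axis >= 0 && axis < 3);
    const std::size_t mid = first + (last - first) / 2;
    std::nth_element(s.begin() + first, s.begin() + mid, s.begin() + last,
                     [axis](const Sphere& a, const Sphere& b) {
                         return a.center[axis] < b.center[axis];
                     });
    return mid;
}
```

The two halves `[first, mid)` and `[mid, last)` can then be handed to the recursive build exactly as with a full sort. The elements inside each half are left in unspecified order, which is fine here because each half gets partitioned again on the next axis.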
Using CC -Ofast with std::nth_element
| num prims | Building time | Rendertime |
| 1000 | 0.00457 | 3.72273 |
| 10000 | 0.057749 | 9.22949 |
| 100000 | 1.07958 | 23.9744 |
| 1000000 | 14.91 | 61.9529 |
Using CC -Ofast with std::sort
| num prims | Building time | Rendertime |
| 1000 | 0.006191 | 3.81053 |
| 10000 | 0.099931 | 9.2021 |
| 100000 | 2.97277 | 24.2589 |
| 1000000 | 54.1093 | 63.8467 |
Using g++ -O3 with std::nth_element
| num prims | Building time | Rendertime |
| 1000 | 0.007505 | 7.0683 |
| 10000 | 0.099532 | 17.0937 |
| 100000 | 1.49093 | 41.997 |
| 1000000 | 19.3613 | 97.7004 |
Using g++ -O3 with std::sort
| num prims | Building time | Rendertime |
| 1000 | 0.010329 | 7.05071 |
| 10000 | 0.156536 | 17.0274 |
| 100000 | 3.96311 | 41.9395 |
| 1000000 | 66.4503 | 97.4918 |
Here are some images where you can see the bounding boxes. Notice
that they are the same for either sorting method.
| std::nth_element | std::sort |
[Images: bounding-box renderings for each method]
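The boxes match because both methods put the same set of values in each half; only the order inside a half differs. The sketch below checks this in one dimension with plain floats. The two helper functions are hypothetical names. One caveat: if several objects share the same coordinate at the split point, the two methods may assign the tied objects to different halves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Extent (min, max) of the lower half of `keys` after a full sort.
std::pair<float, float> lower_half_range_sorted(std::vector<float> keys) {
    assert(keys.size() >= 2);
    const std::size_t mid = keys.size() / 2;
    std::sort(keys.begin(), keys.end());
    return {keys.front(), keys[mid - 1]};
}

// Extent (min, max) of the lower half of `keys` after std::nth_element.
// The half is unordered, so its extent has to be scanned for.
std::pair<float, float> lower_half_range_partitioned(std::vector<float> keys) {
    assert(keys.size() >= 2);
    const std::size_t mid = keys.size() / 2;
    std::nth_element(keys.begin(), keys.begin() + mid, keys.end());
    const auto mm = std::minmax_element(keys.begin(), keys.begin() + mid);
    return {*mm.first, *mm.second};
}
```

Both functions take their input by value, so the caller's vector is left untouched and the two results can be compared directly.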
All these images were created using 16 jittered samples per pixel.
1,000 Spheres
10,000 Spheres
100,000 Spheres
1,000,000 Spheres