Ray tracing lends itself naturally to parallel implementation. The computation for each pixel is independent of all other pixels, and the data structures used for casting rays are usually read-only. These properties have resulted in many parallel ray tracers, as discussed in Section 5. The simplest parallel shared-memory implementation with reasonable performance uses Master/Slave demand-driven scheduling, as follows:
Master Task
    initialize model
    initialize ray tracing slaves on each free CPU
    loop
        update viewing information
        lock queue
        place all primary rays in queue
        unlock queue
        when the queue is empty redraw screen and handle user input
    end loop
The ray tracing slaves are simple programs that grab primary rays from the queue and compute pixel RGB values:
Slave Task
    initialize memory
    loop
        if queue is not empty
            lock queue
            pop ray request
            unlock queue
            compute RGB for pixel
            write RGB into frame buffer pixel
        end if
    end loop
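The two tasks above can be sketched in Python as a rough illustration of the synchronization structure only. Everything named here is an assumption and not part of the original system: `shade` is a hypothetical stand-in for tracing a primary ray, the image size is tiny, and the `pending` counter is added so the master can tell when all in-flight pixels are finished rather than only when the queue is empty. CPython threads will not yield a real speedup for this workload.

```python
import threading
import time
from collections import deque

WIDTH, HEIGHT = 16, 8  # tiny illustrative image size (assumption)

def shade(x, y):
    """Hypothetical stand-in for tracing the primary ray through pixel (x, y)."""
    return (x * 255 // (WIDTH - 1), y * 255 // (HEIGHT - 1), 0)

queue = deque()                                # shared queue of primary-ray requests
queue_lock = threading.Lock()                  # guards the queue and `pending`
frame_done = threading.Condition(queue_lock)   # signalled when a frame completes
frame_buffer = [[None] * WIDTH for _ in range(HEIGHT)]
pending = 0                                    # rays queued or still being traced
shutdown = threading.Event()

def worker():
    """Worker task: pop one ray request at a time and write its RGB value."""
    global pending
    while not shutdown.is_set():
        with queue_lock:
            # Test for emptiness and pop under the same lock.
            request = queue.popleft() if queue else None
        if request is None:
            time.sleep(0.001)  # nothing to do; back off briefly
            continue
        x, y = request
        frame_buffer[y][x] = shade(x, y)  # each pixel is written by one worker
        with frame_done:
            pending -= 1
            if pending == 0:
                frame_done.notify_all()

def render_frame():
    """Master step: enqueue every primary ray, then wait for the frame."""
    global pending
    with frame_done:
        for y in range(HEIGHT):
            for x in range(WIDTH):
                queue.append((x, y))
        pending = WIDTH * HEIGHT
        while pending:
            frame_done.wait()  # releases the lock while waiting

def run(num_workers=4):
    """Start the workers, render one frame, then shut the workers down."""
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    render_frame()
    shutdown.set()
    for t in threads:
        t.join()
    return frame_buffer
```

Calling `run()` renders a single frame and returns the filled frame buffer. Unlike the pseudocode, this sketch checks for an empty queue while holding the lock, so two workers cannot both decide to pop the last request.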
This per-pixel scheme would work, but its synchronization overhead would be excessive: because each pixel is an independent task, every pixel costs a lock and unlock of the shared queue. The actual implementation uses a larger basic task size and runs in conventional or frameless mode, as discussed in the next two sections.
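To see why a larger task size helps, the following minimal sketch counts how many queue operations a frame needs when pixels are grouped into contiguous runs. The function name `make_tasks`, the 512 x 512 image, and the batch size of 64 are illustrative assumptions; the task size actually used is the subject of the following sections.

```python
def make_tasks(width, height, batch):
    """Split pixel indices 0 .. width*height-1 into half-open runs of at most `batch` pixels."""
    total = width * height
    return [(start, min(start + batch, total)) for start in range(0, total, batch)]

# Each task costs one lock/unlock of the shared queue when it is popped.
per_pixel = make_tasks(512, 512, 1)    # one task per pixel
batched = make_tasks(512, 512, 64)     # hypothetical batch of 64 pixels per task
print(len(per_pixel), len(batched))    # 262144 4096
```

Under these assumptions the number of queue pops per frame falls by a factor of 64, at the cost of coarser load balancing near the end of a frame.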