

| Outline                                                                                                                                                                                                                                                                 | What is an Execution Model?                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <ul> <li>Execution Model</li> <li>Host Synchronization</li> <li>Single Instruction Multiple Data (SIMD)</li> <li>Multithreading</li> <li>Scheduling instructions for SIMD, multithreaded multiprocessor</li> <li>How it all comes together</li> <li>Reading:</li> </ul> | <ul> <li>Parallel programming model <ul> <li>Software technology for <i>expressing parallel algorithms</i> that target parallel hardware</li> <li>Consists of programming languages, libraries, and annotations</li> <li>Defines the semantics of software constructs running on parallel hardware</li> </ul> </li> <li>Parallel execution model <ul> <li>Exposes an abstract view of <i>hardware execution</i>, generalized to a class of architectures</li> <li>Answers the broad question of how to structure and name data and instructions, and how to interrelate the two</li> <li>Allows humans to reason about harnessing, distributing, and controlling concurrency</li> </ul> </li> </ul> |
| Ch 3 in Kirk and Hwu,<br><u>http://courses.cce.illinois.edu/ece498/al/textbook/Chapter3-CudaThreadingModel.pdf</u><br>Ch 4 in NVIDIA CUDA 3.2 Programming Guide | <ul> <li>Today's lecture will help you reason about the target architecture while you are developing your code</li> <li>How will code constructs be mapped to the hardware?</li> </ul> |
| CS6963 L2: Hardware Overview | |
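
One way to make the mapping question concrete is a minimal CUDA sketch in which each thread names the data element it owns through its block and thread indices. The kernel `scale`, the host helper `scale_on_gpu`, and the block size of 256 are illustrative assumptions and do not come from the lecture.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each thread scales exactly one element.
__global__ void scale(float *data, float factor, int n) {
    // The execution model "names" data through the thread's coordinates.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may have threads past the end
        data[i] *= factor;
    }
}

// Hypothetical host helper: copies data to the GPU, scales it, copies it back.
std::vector<float> scale_on_gpu(const std::vector<float> &in, float factor) {
    assert(!in.empty());
    const int n = static_cast<int>(in.size());
    const size_t bytes = n * sizeof(float);

    float *d_data = nullptr;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, in.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                           // assumed block size
    const int blocks = (n + threads - 1) / threads;    // round up to cover n
    scale<<<blocks, threads>>>(d_data, factor, n);

    // A blocking cudaMemcpy on the default stream waits for the kernel above.
    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return out;
}
```

The loop over elements that a sequential program would write has disappeared from the kernel; the grid of threads plays that role, and the index arithmetic is where code and data are related to each other.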

UNIVERSITY OF UTAH







| Host Blocking: Common Examples |
|---|
| <ul> <li>How do you guarantee the GPU is done and results are ready?</li> <li>Timing example (excerpt from simpleStreams in CUDA SDK):<pre>cudaEvent_t start_event, stop_event;<br>cudaEventCreate(&amp;start_event);<br>cudaEventCreate(&amp;stop_event);<br>cudaEventRecord(start_event, 0);<br>init_array&lt;&lt;&lt; ... &gt;&gt;&gt;( ... );<br>cudaEventRecord(stop_event, 0);<br>cudaEventSynchronize(stop_event);  // host blocks until stop_event has completed</pre></li> <li>Several runs in a row (excerpt from transpose in CUDA SDK):<pre>for (int i = 0; i &lt; numIterations; ++i) {<br>    transpose&lt;&lt;&lt; grid, threads &gt;&gt;&gt;(d_odata, d_idata, size_x, size_y);<br>}<br>cudaThreadSynchronize();  // host blocks until all launches have finished</pre></li> </ul> |
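
The two blocking patterns above can be combined into one self-contained sketch: several kernel launches are queued without waiting, and the host blocks once on an event before reading the elapsed time. The kernel `fill` and the helper `time_fill` are hypothetical stand-ins for the SDK code, which is only excerpted on the slide. The sketch uses `cudaEventSynchronize`; in later CUDA releases `cudaThreadSynchronize` was deprecated in favor of `cudaDeviceSynchronize`, which could be used in its place if no timing is needed.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the SDK kernels on the slide.
__global__ void fill(int *a, int value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = value;
}

// Hypothetical helper: returns the GPU time in milliseconds taken by
// `launches` back-to-back kernel launches, or -1 if allocation fails.
float time_fill(int n, int launches) {
    assert(n > 0 && launches > 0);
    int *d_a = nullptr;
    if (cudaMalloc(&d_a, n * sizeof(int)) != cudaSuccess) return -1.0f;

    cudaEvent_t start_event, stop_event;
    cudaEventCreate(&start_event);
    cudaEventCreate(&stop_event);

    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start_event, 0);
    for (int i = 0; i < launches; ++i) {
        // Each launch returns to the host immediately; work is only queued.
        fill<<<blocks, threads>>>(d_a, i, n);
    }
    cudaEventRecord(stop_event, 0);

    // Host blocks here until every queued launch and stop_event are done.
    cudaEventSynchronize(stop_event);

    float elapsed_ms = 0.0f;
    cudaEventElapsedTime(&elapsed_ms, start_event, stop_event);

    cudaEventDestroy(start_event);
    cudaEventDestroy(stop_event);
    cudaFree(d_a);
    return elapsed_ms;
}
```

Without the call to `cudaEventSynchronize`, the host could reach `cudaEventElapsedTime` before the GPU has recorded `stop_event`, so the elapsed time would not be available yet.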

| Predominant Control Mechanisms: Some definitions | | |
|---|---|---|
| Name | Meaning | Examples |
| Single Instruction, Multiple Data (SIMD) | A single thread of control; the same computation is applied across "vector" elements | Array notation as in Fortran 95:<br><code>A[1:n] = A[1:n] + B[1:n]</code><br>Kernel functions within a block:<br><code>compute&lt;&lt;&lt;gs,bs,msize&gt;&gt;&gt;</code> |
| Multiple Instruction, Multiple Data (MIMD) | Multiple threads of control; processors periodically synchronize | OpenMP parallel loop:<br><code>forall (i=0; i&lt;n; i++)</code><br>Kernel functions across blocks:<br><code>compute&lt;&lt;&lt;gs,bs,msize&gt;&gt;&gt;</code> |
| Single Program, Multiple Data (SPMD) | Multiple threads of control, but each processor executes the same code | Processor-specific code:<br><code>if ($threadIdx.x == 0) { }</code> |
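
The three control mechanisms in the table can show up together in a single kernel. The following is a minimal sketch under stated assumptions: the kernel `block_sums`, the helper `block_sums_on_gpu`, and the constant `kBlockSize` are hypothetical names, and the serial sum done by thread 0 is chosen for clarity rather than speed.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 128;  // assumed number of threads per block

// Hypothetical kernel: writes one partial sum per block.
__global__ void block_sums(const int *in, int *out, int n) {
    __shared__ int partial[kBlockSize];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // SIMD-style: every thread in the block performs the same load.
    partial[threadIdx.x] = (i < n) ? in[i] : 0;
    __syncthreads();  // threads within a block wait for each other

    // SPMD-style: same program everywhere, but only thread 0 takes this path.
    if (threadIdx.x == 0) {
        int sum = 0;
        for (int t = 0; t < blockDim.x; ++t) sum += partial[t];
        // MIMD-style: blocks run independently and never wait on each other.
        out[blockIdx.x] = sum;
    }
}

// Hypothetical host helper: returns the per-block sums of `in`.
std::vector<int> block_sums_on_gpu(const std::vector<int> &in) {
    assert(!in.empty());
    const int n = static_cast<int>(in.size());
    const int blocks = (n + kBlockSize - 1) / kBlockSize;

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    block_sums<<<blocks, kBlockSize>>>(d_in, d_out, n);

    // The blocking copy waits for the kernel to finish before reading d_out.
    std::vector<int> out(blocks);
    cudaMemcpy(out.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

For an input of 300 ones, this launches three blocks: the first two each cover 128 elements and the last covers the remaining 44, with its out-of-range threads contributing zero.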











































