## Lecture 13: LRC & Interconnection Networks

• Topics: LRC implementation, interconnection characteristics

#### Shared Virtual Memory



#### Eager Release Consistency

 Invalidates/Updates are sent out to the list of sharers when a processor executes a release



### Lazy Release Consistency

 Invalidates/Updates are sought when a processor executes an acquire – fewer messages, higher implementation complexity



- Acquires and releases pertain to specific lock variables
- When a process executes an acquire, it should receive all updates that were seen before the corresponding release by the releasing processor
- Therefore, each process must keep track of all write notices (modifications to each shared page) that were applied at every synchronization point

# Example

| P1<br>A1 | P2       | P3 | P4 |
|----------|----------|----|----|
|          |          |    | A4 |
| R1       |          |    | R4 |
| A2       | A1       |    |    |
|          | D4       |    | A3 |
|          | R1       |    | R3 |
| R2       | A3       |    |    |
|          | R3       | A5 |    |
|          |          | R5 |    |
|          | A1<br>R1 |    |    |
|          |          | A1 |    |

6

## Example



7

### Implementation

- Each pair of synch operations in a process defines an *interval*
- A partial order is defined on intervals based on releaseacquire pairs
- For each interval, a process maintains a vector timestamp of "preceding" intervals: the vector stores the last preceding interval for each process
- On an acquire, the acquiring process sends its vector timestamp to the releasing process – the releasing process sends all write notices that have not been seen by acquirer

- LRC can reduce traffic by more than a factor of two for many applications (compared to ERC)
- Programmers have to think harder (causality!)
- High memory overheads at each node (keep track of vector timestamps, write notices) – garbage collection helps significantly
- Memory overheads can be reduced by eagerly propagating write notices to processors or a home node – will change the memory model again!

- Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies, hypercubes
- Consider a k-ary d-cube: a d-dimension array with k elements in each dimension, there are links between elements that differ in one dimension by 1 (mod k)
- Number of nodes  $N = k^d$

| Number of switches : |   | Ν   |
|----------------------|---|-----|
| Switch degree :      |   | 2d  |
| Number of links      | • | Nd  |
| Pins per node        | : | 2wd |

| Avg. routing distance: | d(k-1)/2           |
|------------------------|--------------------|
| Diameter :             | d(k-1)             |
| Bisection bandwidth :  | 2wk <sup>d-1</sup> |

Should we minimize or maximize dimension?

### Dimension

- For a fixed machine size N, low-dimension networks have significantly higher latencies for a packet scalable machines should employ high dimensionality (high cost!)
- For a fixed number of pins, message latency decreases at first, then increases (as we increase dimensionality)
- What if we keep constant bisection bandwidth?
- Lower dimensions also reduce wire length

| Number of switches | : | Ν   |
|--------------------|---|-----|
| Switch degree      | : | 2d  |
| Number of links    | : | Nd  |
| Pins per node      | : | 2wd |

Avg. routing distance:d(k-1)/2Diameter:d(k-1)Bisection bandwidth: $2wk^{d-1}$ 

# Title

• Bullet