New Directions in Memory Architecture

June 12, 2014

Bob Brennan,
Senior Vice President
Memory Solutions Lab
Bob.Brennan@Samsung.com
This presentation is intended to provide information concerning memory industry trends. We do our best to make sure that information presented is accurate and fully up-to-date. However, the presentation may be subject to technical inaccuracies, information that is not up-to-date or typographical errors. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of information provided in this presentation. Samsung reserves the right to make improvements, corrections and/or changes to this presentation at any time.

The information in this presentation or accompanying oral statements may include forward-looking statements. These forward-looking statements include all matters that are not historical facts, statements regarding Samsung Electronics' intentions, beliefs or current expectations concerning, among other things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward-looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this presentation or in the accompanying oral statements. In addition, even if the information contained herein or the oral statements are shown to be accurate, those developments may not be indicative of developments in future periods.
Agenda

» Environment – BW & Capacity growth

» DRAM – BW & Capacity -> Tiering

» Flash – Scales, Gets Intelligent, Tiers

» New “Persistent Performance”
2012: Mobile connected devices exceeded the world's population
Environment: Datacenter Infrastructure

More applications for data

Billions of Devices!

Data traffic: 78% CAGR

* 1000 PB = 1 EB ($10^{18}$ bytes)

Source: Cisco Visual Networking Index

More video is uploaded to YouTube in one month than the 3 major US networks created in 60 years

What about Exabytes?

5 EB: Total data created between the dawn of civilization and 2003

© Samsung
Environment: Escalating Demand for DRAM and Storage

In-Memory Analytics for Big Data

Growing x86 Server Virtualization Density

Escalating Memory-Intensive Workloads

Data Center Processor Growth

- HPC
- Graphics
- Financial
- Gaming
- Big Data

EXABYTES

VMs per Host

Source: EMC and IDC

Source: Gartner and IDC

Source: Intel

Environment – Bandwidth Demand

Memory Bandwidth Requirements

[Chart: memory bandwidth requirements. Axis labels: Now, 2018, Peta-flops. Values shown: 10~20GB/s, 7.5x (~100GB/s), 100x (~1.4TB/s); 400~600 Mbps, 12.5x (~5.3Gbps).]

[Source: “Memory systems for PetaFlop to ExaFlop class machines” by IBM, 2007 & 2010]

Mobile: Display/GFX/Camera
Exponential Bandwidth Demand

Server: Core Scaling
Linear to Exponential Bandwidth Demand

| Display         | Camera | Video | N-screen |
|-----------------|--------|-------|----------|
| FHD (1920x1080) | 13MP   | 1080p | F-HD     |
| UD (3840x2160)  | 20+MP  | 4K    | UHD      |

Environment – Capacity Demand

Memory Capacity Requirements

|                        | Peta-flops | 20 Peta-flops | Exa-flops    |
|------------------------|------------|---------------|--------------|
| Memory Capacity/System | 100~200TB  | >5x (~750TB)  | >70x (~10PB) |
| Memory Capacity/Node   | 2~4GB      | >4x (~16GB)   | >32x (~128GB)|

Time axis: Now to 2018

[Source: “Memory systems for PetaFlop to ExaFlop class machines” by IBM, 2007 & 2010]

Mobile:
Display/GFX/Camera

~Linear Capacity Demand

Server:
Core Scaling

Linear - Exponential Capacity Demand
Agenda

» Environment – BW & Capacity growth

» DRAM – BW & Capacity -> Tiering

» Flash – Scales, Gets Intelligent, Tiers

» New “Persistent Performance”
The “Trade-off Triangles”

DRAM: Bandwidth, Power, Latency, Capacity

Non-Volatile: IOPs, Power, Endurance, Capacity

DRAM: Bandwidth Scaling

Bandwidth vs. [Mbps]

- 1333, 1600, 1866, 2133, 2400/2667, 3200

Multi-Drop Bus Challenge:
- Higher BW, Lower VDD

DDR4

DDR5 (?) & New I/F (?)

DDR Wall?

Optical (?)

Subject to cost/energy efficiency, scaling, ...

New Solution Needed
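
To put the per-pin data rates listed above in channel terms, the following minimal sketch converts them into peak channel bandwidth. It assumes a standard 64-bit (non-ECC) DDR channel, which the slide does not state; the function name is illustrative only.

```python
# Peak bandwidth of one DDR channel from its per-pin data rate.
# Assumption (not from the slide): 64 data bits per channel.

def peak_channel_bandwidth_gbs(data_rate_mbps: float, bus_width_bits: int = 64) -> float:
    """Return peak bandwidth in GB/s (10^9 bytes/s)."""
    bits_per_second = data_rate_mbps * 1e6 * bus_width_bits
    return bits_per_second / 8 / 1e9

# Data rates taken from the slide, in Mbps per pin.
for rate in (1333, 1600, 1866, 2133, 2400, 2667, 3200):
    print(f"{rate} Mbps/pin -> {peak_channel_bandwidth_gbs(rate):.1f} GB/s per channel")
```

Under that assumption, doubling the pin rate from 1600 to 3200 Mbps only doubles channel bandwidth, which is modest next to the 7.5x to 100x growth cited in the bandwidth demand slide.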
DRAM: Scaling Challenges

- **Refresh**
  - High-aspect-ratio cell capacitors are difficult to build, so cell capacitance is decreasing
  - Leakage current of cell access transistors is increasing

- **tWR** (write recovery time)
  - Contact resistance between the cell capacitor and access transistor is increasing
  - On-current of the cell access transistor is decreasing
  - Bit-line resistance is increasing

- **VRT** (variable retention time)
  - Becomes more frequent as cell capacitance shrinks
DRAM: Latency Challenge

Subject to cost/energy efficiency, scaling, ...

~Constant

Low Latency Needed

Disruptive Solution Needed
**DRAM: “Go Wide” for Bandwidth**

| ITEM         | Mobile WIO2 | HBM (High B/W Memory)   |
|--------------|-------------|-------------------------|
| Diagram labels | DRAM, WIO2, AP | Base die + DRAM, Si Interposer, GPU, HBM, Base |
| Bottom die   | N/A         | Buffering & Signal re-routing |
| BW (GB/s)    | 25.6~51.2   | 128~256                 |
| Pin Speed    | 0.4~0.8 Gbps| 1~2 Gbps                |
| Pin # I/O    | 512         | 1,024                   |
| #Bump Logic  | 1~2K        | 6K~8K                   |
| #Bump DRAM   | 1~2K        | ~3K                     |
| Cube (GB)    | 1 / 2       | 1 / 2 / 4               |
| # TSV stack  | 1 / 2 / 4   | 1 / 2 / 4               |
| DRAM density | 8Gb         | 8Gb                     |
| Application  | GFX card    | ○                       |
|              | ULT         | ○                       |
|              | HPC         | -                       |
|              | Server      | ○ (Cache)               |
|              | Mobile      | ○                       |

**Good BW & Latency – Still Need Capacity**
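
The bandwidth row in the table above follows from the pin count and pin speed rows. A minimal sketch of that arithmetic, with an illustrative function name that is not from the presentation:

```python
# Peak bandwidth of a wide-I/O DRAM interface: I/O pin count x per-pin rate.

def wide_io_bandwidth_gbs(io_pins: int, pin_speed_gbps: float) -> float:
    """Return peak bandwidth in GB/s for a given pin count and per-pin rate in Gbps."""
    return io_pins * pin_speed_gbps / 8  # 8 bits per byte

# Pin counts and pin speeds as listed in the table.
print("WIO2:", wide_io_bandwidth_gbs(512, 0.4), "to", wide_io_bandwidth_gbs(512, 0.8), "GB/s")
print("HBM: ", wide_io_bandwidth_gbs(1024, 1.0), "to", wide_io_bandwidth_gbs(1024, 2.0), "GB/s")
```

This reproduces the 25.6~51.2 GB/s and 128~256 GB/s ranges: the bandwidth comes from interface width, at pin speeds well below those of DDR4.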
DRAM: Hybrid Memory Systems

[Diagram, Mobile: SOC with CPU and Tiered Memory Controller; WIO? devices labeled High BW DRAM.]

[Diagram, Server: CPU with Tiered Memory Controller; DDR4, HBM?, and SCM devices.]

[Tier labels: High Bandwidth Tier, High Capacity Tier.]

Tiered Capacity, Tiered Latency, TL-DRAM?
1st Step: System Tiering DRAM

[Diagram showing high performance and high capacity tiers connected to DDR4 DRAM.]
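
One way to picture a tiered memory system is as a small fast tier in front of a large capacity tier. The toy model below is a sketch only: the presentation does not describe a placement policy, so the promote-on-access and least-recently-used demotion rules, and all class and method names, are assumptions made for illustration.

```python
from collections import OrderedDict

class TwoTierMemory:
    """Toy model: a small fast tier backed by a large capacity tier.

    Hypothetical policy: a page is promoted into the fast tier when accessed;
    when the fast tier is full, its least recently used page is demoted.
    """

    def __init__(self, fast_capacity_pages: int):
        if fast_capacity_pages < 1:
            raise ValueError("fast tier must hold at least one page")
        self.fast_capacity_pages = fast_capacity_pages
        self.fast = OrderedDict()  # page id -> None, ordered oldest to newest
        self.fast_hits = 0
        self.accesses = 0

    def access(self, page: int) -> str:
        """Access a page and return the name of the tier that served it."""
        self.accesses += 1
        if page in self.fast:
            self.fast.move_to_end(page)       # mark as most recently used
            self.fast_hits += 1
            return "fast"
        # Served from the capacity tier, then promoted into the fast tier.
        if len(self.fast) >= self.fast_capacity_pages:
            self.fast.popitem(last=False)     # demote the least recently used page
        self.fast[page] = None
        return "capacity"

mem = TwoTierMemory(fast_capacity_pages=2)
served = [mem.access(page) for page in (1, 2, 1, 3, 1, 2)]
print(served)
print(f"fast-tier hit rate = {mem.fast_hits / mem.accesses:.2f}")
```

In a real system the policy would live in the tiered memory controller or in software, and the tier sizes and latencies would determine whether such promotion pays off.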
Agenda

» Environment – BW & Capacity growth

» DRAM – BW & Capacity -> Tiering

» Flash – Scales, Becomes Intelligent, Tiers

» New “Persistent Performance”
Flash: Capacity Scaling

Scaling Becomes Difficult – Need a New Solution

Breakthrough: 128Gb V-NAND

- Vertical-NAND Technology
- Chip Size
  : $133\text{mm}^2 \rightarrow 0.96\text{Gb/mm}^2$
- 24-WL Stacked Layers
- 64Gb Array $\times$ 2-Plane
- One-sided Page Buffer
  : $(8\text{KB} \times 2)$ Page Size
- Asynchronous DDR Interface
  : Wave-pipeline datapath
  : 667Mbps at Mono Die
  : 533Mbps at 8-stacked Dies

World’s 1st 3D V-NAND Mass Production Flash
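
The areal density figure above is the die capacity divided by the chip size. A quick check, using an illustrative function name:

```python
# Areal density = die capacity / die area, using the figures from the slide.

def areal_density_gb_per_mm2(capacity_gb: float, area_mm2: float) -> float:
    """Return storage density in Gb per square millimetre."""
    return capacity_gb / area_mm2

density = areal_density_gb_per_mm2(capacity_gb=128, area_mm2=133)
print(f"{density:.2f} Gb/mm^2")
```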
**V-NAND Array Structure**

- Advanced V-NAND Technology with Damascened Metal Gate
  - Cell: All-around Gate Structure + Charge Trap Flash
  - String: 24-WL + 2-DWL + 2-Select WL
  - Block: 8 Strings with Shared BL (8KB)
# V-NAND Features

<table>
<thead>
<tr>
<th>Feature</th>
<th>Details</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bits per Cell</td>
<td>2</td>
</tr>
<tr>
<td>Density</td>
<td>128Gb</td>
</tr>
<tr>
<td>Technology</td>
<td>Three Dimensional Vertical NAND, 3-metals</td>
</tr>
<tr>
<td>Organization</td>
<td>8KB $\times$ 384 pages $\times$ 5464 blocks $\times$ 8</td>
</tr>
<tr>
<td>Program Performance</td>
<td>50MB/s for Embedded App., 36MB/s for Enterprise SSD</td>
</tr>
<tr>
<td>Data Interface Speed</td>
<td>667Mbps@Mono, 533Mbps@8-stack</td>
</tr>
<tr>
<td>Power Supply</td>
<td>Vcc=3.3V / Vccq=1.8V</td>
</tr>
</tbody>
</table>
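
The organization row can be checked against the stated density. The sketch below assumes that 8KB means 8 x 1024 bytes and that the trailing "x 8" is bits per byte; neither is spelled out in the table, so treat both as assumptions.

```python
# Raw array size implied by "8KB x 384 pages x 5464 blocks x 8".
PAGE_BYTES = 8 * 1024      # assumption: 1KB = 1024 bytes
PAGES_PER_BLOCK = 384
BLOCKS = 5464
BITS_PER_BYTE = 8          # assumption: the trailing "x 8" converts bytes to bits

def array_bits() -> int:
    """Return the total number of bits in the array."""
    return PAGE_BYTES * PAGES_PER_BLOCK * BLOCKS * BITS_PER_BYTE

total_bits = array_bits()
print(f"{total_bits} bits = {total_bits / 2**30:.4f} Gib")
```

Under these assumptions the product comes to about 128.06 Gib, slightly above the 128Gb density in the table, which would be consistent with a few blocks beyond the nominal capacity.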
✓ Over 50% lower energy is achieved → overall SSD performance can be increased by using 8-way interleaved NAND operation
### Enterprise SSD Comparison

<table>
<thead>
<tr>
<th>SSD Type</th>
<th>Sequential Write (MB/s)</th>
<th>Random Write (IOPS)</th>
<th>Power (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Planar NAND SSD (8-ch, 8-way)</td>
<td>22% Faster</td>
<td>20% Faster</td>
<td>27% Lower</td>
</tr>
<tr>
<td>3D V-NAND SSD (8-ch, 4-way)</td>
<td></td>
<td></td>
<td>45% Lower</td>
</tr>
</tbody>
</table>

- **Smaller Real Estate**
- **Higher Performance**

Flash: Scaling Continues

Design Rule (nm)

2D Planar

3D V-NAND / No Patterning Limitation

- '03: 16Gb
- '09: 24 stack
- '11: 128Gb
- '13: 128Gb
- '15: 1Tb

Capacity, Endurance, Power

Flash: MLC Endurance

☑️ 36MB/s + 35K Endurance for Data-center & Enterprise SSD Applications
☑️ 50MB/s + 3K Endurance for Mobile Applications

- Planar 1xnm NAND after 3K cycle
- 3D V-NAND after 35K cycle

Vth (a.u)

- Avg. tPROG=0.45ms (36MB/s)
- Avg. tPROG=0.33ms (50MB/s)

Normalized (a.u.)

Time [us]
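
The throughput figures quoted with each tPROG value can be approximated as page size divided by program time. The sketch below assumes the (8KB x 2) page from the 128Gb V-NAND slide is programmed in one tPROG, takes 1KB as 1024 bytes, and ignores data transfer time; the function name is illustrative.

```python
# Program throughput of one die, approximated as page size / average program time.

def program_throughput_mbs(page_bytes: int, t_prog_ms: float) -> float:
    """Return program throughput in MB/s (10^6 bytes/s)."""
    return page_bytes / (t_prog_ms * 1e-3) / 1e6

PAGE_BYTES = 2 * 8 * 1024  # (8KB x 2) page size; assumes 1KB = 1024 bytes

for t_prog_ms in (0.45, 0.33):
    rate = program_throughput_mbs(PAGE_BYTES, t_prog_ms)
    print(f"tPROG = {t_prog_ms} ms -> {rate:.1f} MB/s")
```

This gives roughly 36.4 MB/s and 49.6 MB/s, in line with the 36MB/s and 50MB/s quoted for the two operating points.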
Flash: Performance

Latency & IOPS

- Rotational Latency
- AVG Seek
- IOPS

- 7.2K RPM: 14 ms
- 15K RPM: <0.3 ms
- SSD: <100x

Interface & Performance

- MB/s: PCIe x4 > SAS > SATA
- Power Capacity
- Endurance
- Interface Unlocks Bandwidth: PCIe G2 -> G3 -> G4

Solution needs to scale: Controllers, Algorithms, & Flash Organization

Increasing Intelligence & Sophistication
2nd Step: System Tiering Flash/HDDs

- **High Performance Tier**
  - DRAM

- **High Capacity Tier**
  - DDR

- **Intelligent Flash Tier**
  - Flash

- **HDD**
Today’s Rack Scaling

Acknowledgement: Krishna Malladi.
Disclaimer: conceptual model only. CPU data on different scale.

Flash Significantly Improves the DRAM-Disk Gap
Agenda

» Environment – BW & Capacity growth

» DRAM – BW & Capacity -> Tiering

» Flash – Scales, Becomes Intelligent, Tiers

» New “Persistent Performance”
Opportunity for New Technology

- HDD
- Flash
- Persistent Performance
- DRAM
- LLC
STT-MRAM

STT-MRAM Cell Structure

Promising Technology, Not Mature Yet
3rd Step: New possibilities

Persistent Tiered Caching

- High Performance Tier
  - DRAM

- Higher Performance Tier
  - DRAM

- High Capacity Tier
  - HDD

- Intelligent Flash Tier
  - SAS

- Intelligent Flash Tier (PCIe)

- Intelligent Flash Tier (SAS)

- Persistent Performance, Byte addressable

Future Rack Scaling Vision

Acknowledgement: Krishna Malladi. Disclaimer: conceptual model only.

Thank you!

Questions: Bob.Brennan@Samsung.com