Computer Architecture: SIMD and GPUs (Part III) (and briefly VLIW, DAE, Systolic Arrays)
A Note on This Lecture
Last Lecture
Today
Approaches to (Instruction-Level) Concurrency
Graphics Processing Units: SIMD Not Exposed to Programmer (SIMT)
Review: High-Level View of a GPU
Review: Concept of “Thread Warps” and SIMT
Review: Loop Iterations as Threads
Review: SIMT Memory Access
Review: Sample GPU SIMT Code (Simplified)
Review: Sample GPU Program (Less Simplified)
Review: Latency Hiding with “Thread Warps”
Review: Warp-based SIMD vs. Traditional SIMD
Review: SPMD
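The review slides on loop iterations as threads and SIMT memory access can be made concrete with a small model: each thread runs one loop iteration, picks its data element from its thread id, and the warp's loads are grouped into memory segments. This is a minimal sketch under loudly stated assumptions: `WARP_SIZE = 32`, 4-byte elements, 128-byte aligned segments, and the helper `segments_touched` are illustrative choices, not the exact coalescing rules of any particular GPU.

```python
"""How many memory segments does one warp's load touch?

Hypothetical sketch: WARP_SIZE, ELEM_BYTES, SEGMENT_BYTES and the mapping
thread id -> element (base + i * stride) are assumptions for illustration.
"""
WARP_SIZE = 32       # assumed warp width
ELEM_BYTES = 4       # assumed element size (e.g. a 32-bit float)
SEGMENT_BYTES = 128  # assumed aligned memory segment size


def segments_touched(warp_id: int, stride: int, base_addr: int = 0) -> int:
    """Each thread handles loop iteration i = warp_id * WARP_SIZE + lane and
    loads element i * stride; return how many distinct segments the warp hits."""
    segments = set()
    for lane in range(WARP_SIZE):
        i = warp_id * WARP_SIZE + lane            # loop iteration == thread id
        addr = base_addr + i * stride * ELEM_BYTES
        segments.add(addr // SEGMENT_BYTES)       # segment this lane falls in
    return len(segments)
```

With unit stride the 32 lanes fall into one segment, stride 2 needs two, and stride 32 needs one segment per lane, which is why consecutive threads should access consecutive elements.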
Branch Divergence Problem in Warp-based SIMD
Control Flow Problem in GPUs/SIMD
Branch Divergence Handling (I)
Branch Divergence Handling (II)
Dynamic Warp Formation
Dynamic Warp Formation/Merging
Dynamic Warp Formation Example
What About Memory Divergence?
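The branch divergence slides describe what happens when lanes of one warp disagree on a branch: the warp executes both paths, masking off the lanes that did not take each one, and reconverges afterwards (hardware commonly tracks the masks with a reconvergence stack). The sketch below models a single warp running one if/else under an active mask and reports SIMD utilization. It is a toy under stated assumptions: `WARP_SIZE = 8`, the function `run_warp`, and counting each path as one issued instruction are hypothetical simplifications.

```python
"""Toy model of one warp executing an if/else under an active mask.

Hypothetical sketch: WARP_SIZE, run_warp and the one-instruction-per-path
cost accounting are assumptions, not a real GPU's microarchitecture.
"""
from typing import List, Tuple

WARP_SIZE = 8  # assumed for readability; NVIDIA warps have 32 threads


def run_warp(a: List[int]) -> Tuple[List[int], float]:
    """Lane i runs loop iteration i: b[i] = 2*a[i] if a[i] > 0 else -a[i].

    Returns the results and the fraction of issued lane-slots doing useful work.
    """
    assert len(a) == WARP_SIZE
    b = [0] * WARP_SIZE
    taken = [x > 0 for x in a]  # branch condition evaluated in lockstep
    issued_slots = 0            # WARP_SIZE slots per path the warp executes
    useful_slots = 0            # slots whose lane was active (mask bit set)

    if any(taken):              # taken path runs if any lane needs it
        issued_slots += WARP_SIZE
        for lane in range(WARP_SIZE):
            if taken[lane]:
                b[lane] = 2 * a[lane]
                useful_slots += 1
    if not all(taken):          # not-taken path runs with the inverted mask
        issued_slots += WARP_SIZE
        for lane in range(WARP_SIZE):
            if not taken[lane]:
                b[lane] = -a[lane]
                useful_slots += 1
    # Reconvergence point: all lanes are active again after the if/else.
    return b, useful_slots / issued_slots
```

When all lanes agree only one path is issued and utilization stays at 1.0; with alternating signs both paths are issued and utilization drops to 0.5. Dynamic warp formation attacks exactly this loss by regrouping threads that are at the same PC.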
NVIDIA GeForce GTX 285
NVIDIA GeForce GTX 285 “core”
VLIW and DAE
Remember: SIMD/MIMD Classification of Computers
SISD Parallelism Extraction Techniques
VLIW
VLIW (Very Long Instruction Word)
VLIW Concept
SIMD Array Processing vs. VLIW
VLIW Philosophy
VLIW Philosophy (II)
Commercial VLIW Machines
VLIW Tradeoffs
VLIW Summary
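The VLIW philosophy puts the burden on the compiler to find independent operations and pack them into each long instruction; loop unrolling is a standard way to expose more of them. The sketch below greedily packs a SAXPY-like loop body (y[i] = a*x[i] + y[i]) into bundles of fixed width. Assumptions, all hypothetical: unit latency for every operation, renamed registers in each unrolled copy (so only true dependences matter), no memory aliasing between iterations, and a simple program-order greedy placement rather than a production list scheduler.

```python
"""Greedy VLIW bundle packing, showing why loop unrolling helps.

Hypothetical sketch: the op format, unit latency, and program-order
greedy placement are illustrative assumptions.
"""
from typing import Dict, List, Tuple

Op = Tuple[str, Tuple[str, ...]]  # (destination name, source names)


def saxpy_body(unroll: int) -> List[Op]:
    """Ops for `unroll` iterations of y[i] = a*x[i] + y[i], renamed per copy."""
    ops: List[Op] = []
    for i in range(unroll):
        ops += [
            (f"x{i}", ()),                  # load x[i]
            (f"y{i}", ()),                  # load y[i]
            (f"m{i}", (f"x{i}",)),          # m = a * x[i] (a is loop-invariant)
            (f"s{i}", (f"m{i}", f"y{i}")),  # s = m + y[i]
            (f"st{i}", (f"s{i}",)),         # store s to y[i]
        ]
    return ops


def pack_bundles(ops: List[Op], width: int) -> List[List[str]]:
    """Put each op, in program order, into the earliest bundle that comes
    after all of its producers and still has a free slot."""
    bundles: List[List[str]] = []
    cycle_of: Dict[str, int] = {}
    for dest, srcs in ops:
        c = max((cycle_of[s] + 1 for s in srcs), default=0)
        while c < len(bundles) and len(bundles[c]) >= width:
            c += 1  # this long instruction is full; try the next one
        while len(bundles) <= c:
            bundles.append([])
        bundles[c].append(dest)
        cycle_of[dest] = c
    return bundles
```

With a width of 4, one iteration needs 4 long instructions for 5 operations (mostly empty slots, because of the load-multiply-add-store chain); two unrolled iterations still fit in 4, and four unrolled iterations fit in 7 (20 of 28 slots filled).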
DAE
Decoupled Access/Execute
Decoupled Access/Execute (II)
Decoupled Access/Execute (III)
Astronautics ZS-1
Astronautics ZS-1 Instruction Scheduling
Loop Unrolling
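Decoupled access/execute splits a loop into an access stream that issues loads early and an execute stream that consumes the loaded data from a hardware queue. The timing sketch below compares that against a single in-order stream that stalls on every load. It is a hypothetical model, not the ZS-1's actual pipeline: each iteration is assumed to need one load with a fixed `mem_latency` and one 1-cycle execute operation, the access side issues at most one load per cycle, and a queue entry stays occupied from load issue until the execute side consumes it.

```python
"""Timing sketch of decoupled access/execute vs. a coupled in-order loop.

Hypothetical model: fixed memory latency, one load and one 1-cycle
execute op per iteration, FIFO of queue_depth entries between the sides.
"""
from typing import List


def coupled_cycles(n: int, mem_latency: int) -> int:
    """Single in-order stream: each iteration stalls for its load, then computes."""
    return n * (mem_latency + 1)


def decoupled_cycles(n: int, mem_latency: int, queue_depth: int) -> int:
    issue: List[int] = []  # cycle at which the access side issues load i
    done: List[int] = []   # cycle at which the execute side consumes element i
    for i in range(n):
        t = issue[-1] + 1 if issue else 0  # at most one load issued per cycle
        if i >= queue_depth:
            # Queue full: wait until element i - queue_depth has been consumed.
            t = max(t, done[i - queue_depth] + 1)
        issue.append(t)
        ready = t + mem_latency            # data arrives in the queue
        e = max(ready, done[-1] + 1 if done else 0)  # execute drains in order
        done.append(e)
    return done[-1] + 1 if done else 0
```

With 100 iterations and a 10-cycle latency, the coupled loop takes 1100 cycles; a queue at least 11 entries deep lets the access side run far enough ahead that the decoupled version takes 110, while a one-entry queue falls back to 1100. This is the same latency-tolerance goal the ZS-1 pursued, with loop unrolling helping the access stream stay ahead.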
Systolic Arrays
Why Systolic Architectures?
Systolic Architectures
Systolic Computation Example
Systolic Computation Example: Convolution
More Programmability
Pipeline Parallelism
File Compression Example
Systolic Array
The WARP Computer
Systolic Arrays vs. SIMD
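The systolic convolution example computes y[i] = w[0]*x[i] + w[1]*x[i+1] + ... with data pumped rhythmically through a chain of identical cells. The cycle-level sketch below simulates one of several possible dataflows: weights stay in the cells (stored in reverse order), inputs x move right through two registers per cell, and partial sums y move right through one register per cell, so each partial sum meets exactly the inputs it needs and every cell does one multiply-accumulate per cycle. The function name and register names are hypothetical, and this is not claimed to be the specific design drawn on the slides.

```python
"""Cycle-level sketch of a 1D systolic array computing a convolution.

Hypothetical design: weights stationary (reversed), x delayed 2 cycles
per cell, partial sums delayed 1 cycle per cell.
"""
from typing import List


def systolic_convolve(x: List[int], w: List[int]) -> List[int]:
    """Return y[i] = sum_k w[k] * x[i + k] for i in 0..len(x) - len(w)."""
    n, k = len(x), len(w)
    cell_w = w[::-1]   # cell j holds w[k - 1 - j]
    xr1 = [0] * k      # first x delay register in each cell
    xr2 = [0] * k      # second x delay register in each cell
    yr = [0] * k       # partial-sum output register in each cell
    y = [0] * max(n - k + 1, 0)
    for t in range(n + k - 1):
        # Cell 0 reads the external streams; every other cell reads its
        # left neighbour's registers.
        x_in = [x[t] if t < n else 0] + xr2[:-1]
        y_in = [0] + yr[:-1]
        # All cells perform one multiply-accumulate in parallel.
        y_out = [y_in[j] + cell_w[j] * x_in[j] for j in range(k)]
        # Clock edge: every register latches simultaneously.
        xr2, xr1, yr = xr1, x_in, y_out
        i = t - (2 * k - 2)  # output index leaving the last cell this cycle
        if 0 <= i <= n - k:
            y[i] = yr[-1]
    return y
```

For example, `systolic_convolve([1, 2, 3, 4, 5], [1, 10, 100])` yields `[321, 432, 543]` after n + k - 1 = 7 cycles. Unlike SIMD, no cell ever fetches operands from memory: each value is read once at the array boundary and reused as it flows through the cells.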
Some More Recommended Readings