Advanced Computer Architecture - Lecture 30: Memory hierarchy design


DOCUMENT INFORMATION

Structure

  • CS 704 Advanced Computer Architecture

  • Today’s Topics

  • Recap: Improving Cache Performance

  • Recap: Reducing Miss Penalty

  • Slide 5

  • Slide 6

  • Slide 7

  • Slide 8

  • Slide 9

  • Cache Misses

  • Cache Misses - Classification

  • Slide 12

  • Slide 13

  • Slide 14

  • Reducing Miss Rate

  • 1: Larger Block Size

  • Slide 17

  • Slide 18

  • 1: Larger Block Size

  • 1: Larger Block Size: Solution

  • Slide 21

  • Slide 22

  • Slide 23

  • 2: Large Cache Size

  • 3: Higher Associativity

  • Slide 26

  • Slide 27

  • 4: Way Prediction and Pseudo-associativity

  • Slide 29

  • Slide 30

  • Slide 31

  • Slide 32

  • Slide 33

  • 5: Compiler Optimization

  • Slide 35

  • Slide 36

  • Slide 37

  • Slide 38

  • 5: Compiler Optimization: Loop Interchange

  • 5: Compiler Optimization: Using Loop Interchange

  • Slide 41

  • 5: Compiler Optimization: Using Blocking

  • Slide 43

  • Slide 44

  • Slide 45

  • Slide 46

  • Slide 47

  • Summary

  • Slide 49

  • Example: Avg. Memory Access Time vs. Miss Rate

  • Slide 51

Content

Advanced Computer Architecture - Lecture 30: Memory hierarchy design. This lecture will cover the following: cache performance enhancement; reducing miss rate; classification of cache misses; reducing cache miss rate; way prediction and pseudo-associativity; compiler optimization;...

CS 704 Advanced Computer Architecture
Lecture 30
Memory Hierarchy Design
Cache Performance Enhancement (Reducing Miss Rate)
Prof Dr M Ashraf Chughtai

Today’s Topics
  – Recap: Reducing Miss Penalty
  – Classification of Cache Misses
  – Reducing Cache Miss Rate
  – Summary

MAC/VU-Advanced Computer Architecture Lecture 30 Memory Hierarchy (6)

Recap: Improving Cache Performance
  – The miss penalty
  – The miss rate
  – The miss penalty or miss rate via parallelism
  – The time to hit in the cache

Recap: Reducing Miss Penalty
  – Multilevel Caches
  – Critical Word First and Early Restart
  – Priority to Read Misses over Writes
  – Merging Write Buffers
  – Victim Caches

Recap: Reducing Miss Penalty
  – ‘Multilevel caches’: ‘The more the merrier’

Recap: Reducing Miss Penalty
  – ‘Critical Word First and Early Restart’: intolerance; reduces miss penalty

Recap: Reducing Miss Penalty
  – ‘Priority to read miss over the write miss’: favoritism

Recap: Reducing Miss Penalty
  – ‘Merging write buffer’: acquaintance
  – Victim cache: salvage

Recap: Reducing Miss Penalty
  – Reduces miss penalty
  – Multilevel caches
  – Reduces miss rate
  – Cache misses
  – Methods to reduce the miss rate

Cache Misses
  – Compulsory Misses (cold start or first reference misses)
  – Capacity Misses
  – Conflict Misses (collision or interference misses)

5: Compiler Optimization
  – Data misses are reduced
  – Spatial locality
  – Temporal locality
  – Array calculation
5: Compiler Optimization
  – Loop interchange
  – Blocking

5: Compiler Optimization: Loop Interchange
  • A program with nested loops that accesses data in non-sequential order for j (0 to 100) and in sequential order for i (0 to 5000)

5: Compiler Optimization: Using Loop Interchange
First version:

    for (k = 0; k < 100; k = k+1)
        for (j = 0; j < 100; j = j+1)
            for (i = 0; i < 5000; i = i+1)
                x[i][j] = 2 * x[i][j];

5: Compiler Optimization
Reordered version:

    for (k = 0; k < 100; k = k+1)
        for (i = 0; i < 5000; i = i+1)
            for (j = 0; j < 100; j = j+1)
                x[i][j] = 2 * x[i][j];

5: Compiler Optimization: Using Blocking
Example of improving temporal locality: a program to perform matrix multiplication

5: Compiler Optimization: Using Blocking
  – ‘Row major order’ (row-by-row)
  – ‘Column major order’
  – Iteration for matrix multiplication

5: Compiler Optimization: Using Blocking

    /* Initial version of matrix multiplication code */
    for (i = 0; i < N; i = i+1)
        for (j = 0; j < N; j = j+1) {
            r = 0;
            for (k = 0; k < N; k = k+1) {
                r = r + y[i][k]*z[k][j];
            }
            x[i][j] = r;
        }

If the cache is too small to hold all three matrices but can hold one N x N matrix and one row of N elements, then the full matrix z and the i-th row of y can stay in the cache. With less capacity than that, misses may occur for both the x and z arrays.

5: Compiler Optimization: Using Blocking
B is chosen such that one row of B elements and one B x B submatrix can fit in the cache. This ensures that the y and z blocks are resident in the cache. In the modified code, the two inner loops compute in steps of size B (the blocking factor) rather than over the full N x N size of arrays x and z.

Summary
  • Large block size to reduce compulsory misses
  • Large cache size to reduce capacity misses
  • Higher associativity to reduce conflict misses

Summary
The way-prediction technique checks a section of the cache for a hit and, on a miss, checks the rest of the cache.
The final technique, loop interchange and blocking, is a software approach to optimizing cache performance.
Next time we will talk about ways to enhance performance by having the processor and memory operate in parallel. Till then.

Example: Avg. Memory Access Time vs. Miss Rate
Example: assume CCT (clock cycle time) = 1.10 for 2-way, 1.12 for 4-way, 1.14 for 8-way vs. CCT of direct mapped

    Cache Size          Associativity
    (KB)        1-way   2-way   4-way   8-way
      1         2.33    2.15    2.07    2.01
      2         1.98    1.86    1.76    1.68
      4         1.72    1.67    1.61    1.53
      8         1.46    1.48    1.47    1.43
     16         1.29    1.32    1.32    1.32
     32         1.20    1.24    1.25    1.27
     64         1.14    1.20    1.21    1.23
    128         1.10    1.17    1.18    1.20

Allah Hafiz

... cycles

1: Larger Block Size: Solution
Copy table 5.18 pp 428 ...

... time)

3: Higher Associativity
Absolute Miss Rate ...
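The blocking slides describe the modified code without listing it. The following is a minimal sketch of what a blocked version of the slide's matrix multiplication could look like, assuming the standard strip-by-strip form of the technique. The dimension `N = 64`, the function names, the `bsize` parameter (standing in for the blocking factor B), and the `max_abs_diff` helper are illustrative assumptions and do not come from the lecture.

```c
#include <assert.h>
#include <stddef.h>

#define N 64  /* hypothetical matrix dimension, chosen only for illustration */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Unblocked version, as in the initial code on the slide: x = y * z. */
void matmul_naive(double x[N][N], double y[N][N], double z[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double r = 0.0;
            for (size_t k = 0; k < N; k++)
                r += y[i][k] * z[k][j];
            x[i][j] = r;
        }
}

/* Blocked version: same result, but the two inner loops only sweep a
   bsize-wide strip of j and k, so a bsize x bsize block of z and a
   bsize-long piece of a row of y are reused while they are still cached.
   bsize is an assumed name for the blocking factor B. */
void matmul_blocked(double x[N][N], double y[N][N], double z[N][N],
                    size_t bsize) {
    assert(bsize > 0);

    /* x accumulates partial sums across kk strips, so it starts at zero. */
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            x[i][j] = 0.0;

    for (size_t jj = 0; jj < N; jj += bsize)
        for (size_t kk = 0; kk < N; kk += bsize)
            for (size_t i = 0; i < N; i++)
                for (size_t j = jj; j < min_sz(jj + bsize, N); j++) {
                    double r = 0.0;
                    for (size_t k = kk; k < min_sz(kk + bsize, N); k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;
                }
}

/* Largest absolute element-wise difference between two matrices,
   useful for checking that both versions agree. */
double max_abs_diff(double a[N][N], double b[N][N]) {
    double worst = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double d = a[i][j] - b[i][j];
            if (d < 0.0) d = -d;
            if (d > worst) worst = d;
        }
    return worst;
}
```

In this sketch the `min_sz` bounds let `bsize` be any positive value, including one that does not divide `N`. In practice `bsize` would be tuned to the cache size, in line with the slide's condition that one row of B elements and one B x B block fit in the cache.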
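The associativity example above can be read against the usual definition of average memory access time (AMAT). The following is a sketch under the assumption that the hit time scales with the clock cycle time factor quoted in the example. The symbols $f_a$ (cycle time factor for $a$-way associativity), $m_a$ (miss rate), and $P$ (miss penalty) are introduced here for illustration and do not appear on the slide.

```latex
\begin{align*}
\text{AMAT} &= \text{Hit time} + \text{Miss rate} \times \text{Miss penalty} \\
\text{AMAT}_{a\text{-way}} &= f_a \cdot \text{Hit time}_{1\text{-way}} + m_a \cdot P,
  \qquad f_1 = 1.00,\ f_2 = 1.10,\ f_4 = 1.12,\ f_8 = 1.14 \\
\text{AMAT}_{a\text{-way}} < \text{AMAT}_{1\text{-way}}
  &\iff (m_1 - m_a)\, P > (f_a - 1) \cdot \text{Hit time}_{1\text{-way}}
\end{align*}
```

Under this reading, higher associativity pays off only while the miss-rate reduction outweighs the longer clock cycle. The figures in the example are consistent with that: at 1 KB the 8-way value (2.01) is lower than the direct-mapped value (2.33), while at 16 KB and above the direct-mapped column has the lowest value in each row.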

Date posted: 05/07/2022, 11:55