Advanced Computer Architecture - Lecture 29: Memory hierarchy design. This lecture will cover the following: cache performance enhancement by reducing cache miss penalty; cache performance; reducing miss penalty; CPU execution time equation; improving cache performance;...
CS 704 Advanced Computer Architecture
Lecture 29: Memory Hierarchy Design
Cache Performance Enhancement by: Reducing Cache Miss Penalty
Prof Dr M Ashraf Chughtai

Today’s Topics
- Recap: Cache Design
- Cache Performance
- Reducing Miss Penalty
- Summary

MAC/VU-Advanced Computer Architecture Lecture 29 Memory Hierarchy (5)

Recap: Memory Hierarchy Designer’s Concerns
- Block placement: Where can a block be placed in the upper level?
- Block identification: How is a block found if it is in the upper level?
- Block replacement: Which block should be replaced on a miss?
- Write strategy: What happens on a write?

Recap: Write Buffer for Write Through
- Cache write strategies: write back and write through
- Use of a write buffer
A level-2 cache is introduced between the level-1 cache and the DRAM main memory
- Write Allocate and
- No-Write Allocate

Recap: Write Miss Policies
- Write Allocate: A block is allocated in the cache on a write miss, i.e., the block to be written is made available in the cache
- No-Write Allocate: The blocks stay out of the cache until the program tries to read them; i.e., the block is modified only in the lower-level memory

Impact of Caches on CPU Performance
CPU execution time equation:
CPU time = (CPU execution clock cycles + Memory stall cycles) x Clock cycle time

Impact of Caches on CPU Performance: Example
Assumptions:
- The cache miss penalty is 100 clock cycles
- All instructions normally take 1 clock cycle
- Average miss rate is 2%
- Average memory references per instruction = 1.5
- Average number of cache misses per 1000 instructions = 30
Find the impact of the cache on CPU performance, considering both the misses per instruction and the miss rate.

CPU time = (CPU execution clock cycles + Memory stall cycles) x Clock cycle time

CPU time with cache (using misses per instruction)
= IC x (1.0 + (30/1000 x 100)) x Clock cycle time
= IC x 4.00 x Clock cycle time

CPU time with cache (using miss rate)
= IC x (1.0 + (1.5 x 2% x 100)) x Clock cycle time
= IC x 4.00 x Clock cycle time

Here IC is the instruction count. Both forms agree because 1.5 references per instruction x 2% miss rate = 0.03 misses per instruction = 30 misses per 1000 instructions.

Cache Performance (Review)
Cache performance depends on:
- Number of misses, or miss rate
- Cost per miss, or miss penalty
Memory stall clock cycles are equal to the sum of:
- IC x Reads per instruction x Read miss rate x Read miss penalty; and
- IC x Writes per instruction x Write miss rate x Write miss penalty

4: Merging Write Buffer
However, the problem here, particularly in write-through caches, is that a small write buffer may end up stalling the processor if it fills up; the processor then needs to wait until the writes are committed to memory.
This problem is resolved by merging cache-block entries in the write buffer, because:
- Multiword writes are usually faster than writes performed one word at a time
- Writes usually modify one word in a block
Thus, if the write buffer already contains some words from the given data block, we merge the newly modified word with the parts of the block already in the buffer.
That is, if the buffer contains other modified blocks, the address of the new data is checked against the addresses of the valid write-buffer entries. If there is a match, the new data are combined with the existing entry; this is called Write Merge.
Note that here the CPU continues to work while the write buffer prepares to write the word to memory.
This technique therefore reduces the number of stalls due to the write buffer being full; hence it reduces the miss penalty by improving the efficiency of the write buffer.

5: Victim Caches: Reducing Miss Penalty
Another way to reduce the miss penalty is to remember what was discarded, as it may be needed again.
This method reduces the miss penalty because the discarded data has already been fetched, so it can be used again at small cost.
The victim cache contains only blocks that were discarded from the cache because of some earlier miss; these are checked on another miss to see if they have the desired data before going to the next lower-level memory.
If the desired data (or instruction) is found, the victim block and the cache block are swapped.
This recycling requires a small fully associative cache between a cache and its refill path, called the victim cache.
[Figure: Placement of victim cache in memory hierarchy]

Summary
- The first approach, ‘multilevel caches’, is ‘the more the merrier’: extra people are welcome to come along.
- The second technique, ‘critical word first and early restart’, is intolerance.
- The third method, ‘priority to read miss over the write miss’, is favoritism.
- The fourth technique, ‘merging write buffer’, is acquaintance: combining sequential writes into a single block for fast memory transfer.
- The fifth technique, ‘victim cache’, is salvage.
All these methods help reduce the miss penalty; however, the first one, multilevel caches, is the most important and efficient.
However, miss rate and hit time are also important metrics, and reducing them also improves memory hierarchy performance. We will take up these metrics next time. Till then,

Assalam-o-Alaikum and Allah Hafiz
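
The worked example under "Impact of Caches on CPU Performance: Example" can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function and parameter names are assumptions made for illustration and do not come from the lecture. Each function returns the effective clock cycles per instruction, i.e. the factor that multiplies IC x clock cycle time.

```python
def cpi_from_misses_per_1000(base_cpi, misses_per_1000_inst, miss_penalty):
    """Effective CPI when misses are given per 1000 instructions."""
    return base_cpi + (misses_per_1000_inst / 1000) * miss_penalty


def cpi_from_miss_rate(base_cpi, refs_per_inst, miss_rate, miss_penalty):
    """Effective CPI when the miss rate is given per memory reference."""
    return base_cpi + refs_per_inst * miss_rate * miss_penalty


# Values taken from the lecture example.
cpi_a = cpi_from_misses_per_1000(1.0, 30, 100)
cpi_b = cpi_from_miss_rate(1.0, 1.5, 0.02, 100)
print(f"{cpi_a:.2f} {cpi_b:.2f}")  # prints 4.00 4.00
```

Both routes give a factor of 4.00, compared with 1.0 for a cache that never misses.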
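
The write-merge check described under "4: Merging Write Buffer" can be sketched as a toy model. The class name, the block size of 4 words, and the 4-entry buffer are all assumptions chosen for illustration; the lecture does not give these values, and real hardware would also drain entries to memory in the background, which this sketch leaves out.

```python
WORDS_PER_BLOCK = 4   # assumed block size in words
BUFFER_ENTRIES = 4    # assumed number of write-buffer entries


class MergingWriteBuffer:
    """Toy write buffer: one entry per block, words merged by block address."""

    def __init__(self, entries=BUFFER_ENTRIES, words_per_block=WORDS_PER_BLOCK):
        self.capacity = entries
        self.words_per_block = words_per_block
        self.entries = {}  # block address -> {word offset: value}

    def write(self, word_addr, value):
        """Buffer one word write; return False if the CPU would have to stall."""
        block_addr, offset = divmod(word_addr, self.words_per_block)
        if block_addr in self.entries:
            # Address matches a valid entry: combine with it (write merge).
            self.entries[block_addr][offset] = value
            return True
        if len(self.entries) >= self.capacity:
            return False  # buffer full and no matching entry to merge into
        self.entries[block_addr] = {offset: value}
        return True


buf = MergingWriteBuffer()
# Four sequential word writes fall in the same block and share one entry.
results = [buf.write(addr, addr * 10) for addr in (100, 101, 102, 103)]
print(results, len(buf.entries))  # prints [True, True, True, True] 1
```

Without merging, the same four writes would occupy all four entries, and a fifth write would stall the processor.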
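
The swap behaviour described under "5: Victim Caches" can also be sketched in Python. This is a simplified model under stated assumptions: the main cache is taken to be direct mapped, the victim cache uses FIFO replacement, only block addresses are tracked (no data), and the class name and sizes are hypothetical. None of these choices are specified in the lecture.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Toy direct-mapped cache with a small fully associative victim cache."""

    def __init__(self, num_sets=4, victim_entries=2):
        self.num_sets = num_sets
        self.lines = [None] * num_sets   # block address held by each set
        self.victims = OrderedDict()     # discarded block addresses, oldest first
        self.victim_entries = victim_entries

    def access(self, block_addr):
        """Return 'hit', 'victim hit', or 'miss' for one block access."""
        index = block_addr % self.num_sets
        resident = self.lines[index]
        if resident == block_addr:
            return "hit"
        if block_addr in self.victims:
            # Swap: the victim block moves into the cache and the resident
            # block takes its place in the victim cache.
            del self.victims[block_addr]
            outcome = "victim hit"
        else:
            outcome = "miss"  # would go to the next lower-level memory
        self.lines[index] = block_addr
        if resident is not None:
            self._discard(resident)
        return outcome

    def _discard(self, block_addr):
        if len(self.victims) >= self.victim_entries:
            self.victims.popitem(last=False)  # assumed FIFO replacement
        self.victims[block_addr] = True


cache = DirectMappedWithVictim()
# Blocks 0 and 4 map to the same set and keep evicting each other.
trace = [cache.access(addr) for addr in (0, 4, 0, 4)]
print(trace)  # prints ['miss', 'miss', 'victim hit', 'victim hit']
```

After the two initial misses, the conflicting blocks are served from the victim cache instead of the next lower level, which is where the reduction in miss penalty comes from.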