Advanced Computer Architecture - Lecture 28: Memory hierarchy design. This lecture will cover the following: cache design and policies; placement and replacement policies; cache write strategy; cache performance enhancement; memory hierarchy designer’s concerns; block placement policy;...
CS 704 Advanced Computer Architecture
Lecture 28: Memory Hierarchy Design (Cache Design and Policies)
Prof. Dr. M. Ashraf Chughtai

Today’s Topics
- Recap: Cache Addressing Techniques
- Placement and Replacement Policies
- Cache Write Strategy
- Cache Performance Enhancement
- Summary

MAC/VU-Advanced Computer Architecture Lecture 28 Memory Hierarchy (4)

Recap: Block Size Trade-off
Impact of block size on cache performance and the categories of cache design. The trade-off of block size versus miss rate, miss penalty, and average access time, the basic CPU performance metrics:
- A larger block size reduces the miss rate, but if the block size is too big relative to the cache size, the miss rate goes up; and
- The miss penalty goes up as the block size increases; and
- Combining these two parameters gives the third parameter, the average access time.

Recap: Cache Organizations
Under block placement policy, we studied three cache organizations:
- Direct mapped, where each block has only one place it can appear in the cache (hence conflict misses);
- Fully associative, where any block of main memory can be placed anywhere in the cache; and
- Set associative, which allows a block to be placed in a set of places in the cache.

Memory Hierarchy Designer’s Concerns
- Block placement: Where can a block be placed in the upper level?
- Block identification: How is a block found if it is in the upper level?
- Block replacement: Which block should be replaced on a miss?
- Write strategy: What happens on a write?
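For the first concern, block placement, the three organizations recapped above differ only in how many places a block may occupy. The following is a minimal Python sketch, not part of the lecture: the helper name `candidate_frames`, the 8-block cache size, and the layout that numbers the frames of a set contiguously are all assumptions made for illustration. It uses the usual (block number) MOD (number of sets) mapping.

```python
def candidate_frames(block_number, num_cache_blocks, associativity):
    """Return the cache block frames where a memory block may be placed.

    Hypothetical helper for illustration only.
    associativity == 1                 -> direct mapped
    associativity == num_cache_blocks  -> fully associative
    anything in between                -> set associative
    """
    if associativity < 1 or num_cache_blocks % associativity != 0:
        raise ValueError("associativity must divide the number of cache blocks")
    num_sets = num_cache_blocks // associativity
    set_index = block_number % num_sets      # (block number) MOD (number of sets)
    first = set_index * associativity        # assumed layout: sets are contiguous
    return list(range(first, first + associativity))


if __name__ == "__main__":
    # Assumed 8-block cache, memory block 12.
    print("direct mapped    :", candidate_frames(12, 8, 1))
    print("2-way set assoc. :", candidate_frames(12, 8, 2))
    print("fully associative:", candidate_frames(12, 8, 8))
```

Treating direct mapped and fully associative as the two extremes of set associativity keeps the sketch to a single function; a real cache would do the same selection with address bits rather than a modulo operation.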
Block Placement Policy
Fully associative: a block can be placed anywhere in the upper level (cache). E.g., block 12 of main memory can be placed at block 2, or at any other block location in the cache.

Set associative: a block can be placed anywhere within one set in the upper level (cache). The set number is given by:
    (block number) MOD (number of sets)
E.g., an 8-block, 2-way set associative cache has four sets [0-3] of two blocks each; therefore block 12 or block 16 of main memory can go anywhere in set #0, as (12 MOD 4 = 0) and (16 MOD 4 = 0). Similarly, block 14 can be placed at either location in set #2 (14 MOD 4 = 2).

Direct mapped (1-way associative): a block can be placed at only one specific location in the upper level (cache). The location is given by:
    (block number) MOD (number of cache blocks)
E.g., assuming the same 8-block cache, block 12 or block 20 can be placed only at location 4, as 12 MOD 8 = 20 MOD 8 = 4.

Write Buffer Saturation
In that case, it does NOT matter how big you make the write buffer; it will still overflow, because you are simply feeding things into it faster than you can empty it. There are two solutions to this problem. The first solution is to get rid of the write buffer and replace the write-through cache with a write-back cache.

[Figure: Write Buffer Saturation. Components shown: Processor, Cache, Write Buffer, L2 Cache, DRAM.]

Write-Miss Policy
On a write miss, two options are used:
- Write allocate: a block is allocated on a write miss, followed by the write-hit action.
- No-write allocate: write misses usually do not affect the cache; rather, the block is modified only in the lower-level memory.

With no-write allocate, blocks stay out of the cache until the program tries to read them; with write allocate, even blocks that are only written will be in the cache.

Let us discuss this with the help of an example. Let’s look at our 1KB direct mapped cache again.

Write-Miss Policy: Example
Assume we do a 16-bit write to memory location 0x000000 and it causes a cache miss in our 1KB direct mapped cache with 32-byte blocks.

[Figure: 1KB direct mapped cache with 32-byte blocks. The address is split into Cache Tag (example: 0x00), Cache Index (ex: 0x00), and Byte Select (ex: 0x00). Each of the 32 cache entries (0 to 31) holds a Valid Bit, a Cache Tag, and Cache Data; entry 0 holds Byte 0 to Byte 31, entry 1 holds Byte 32 to Byte 63, and entry 31 holds Byte 992 to Byte 1023.]

After we write the cache tag into the cache and write the 16-bit data into Byte 0 and Byte 1, do we have to read the rest of the block (Byte 2, Byte 3, ..., Byte 31) from memory?
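To make the question concrete, the address fields for this cache can be worked out in a short Python sketch. This is an illustration under stated assumptions, not the lecture's own code: the function names `split_address` and `untouched_offsets` are hypothetical, the address width is left open (the tag is simply whatever remains above the index), and the write is assumed not to cross a block boundary.

```python
CACHE_SIZE = 1024                            # 1KB cache, from the example
BLOCK_SIZE = 32                              # 32-byte blocks, from the example
NUM_BLOCKS = CACHE_SIZE // BLOCK_SIZE        # 32 entries in a direct mapped cache

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1    # 5 bits of byte select
INDEX_BITS = NUM_BLOCKS.bit_length() - 1     # 5 bits of cache index


def split_address(addr):
    """Split a byte address into (tag, index, byte_select)."""
    byte_select = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, byte_select


def untouched_offsets(addr, write_bytes):
    """Byte offsets in the block that the write does NOT cover.

    These are the bytes a write-allocate cache would have to fetch from memory.
    Assumes the write stays inside one block.
    """
    _, _, start = split_address(addr)
    written = set(range(start, start + write_bytes))
    return [b for b in range(BLOCK_SIZE) if b not in written]


if __name__ == "__main__":
    print(split_address(0x000000))               # tag, index, byte select of the example write
    print(len(untouched_offsets(0x000000, 2)))   # bytes left to fetch after a 16-bit write
```

For the 16-bit write in the example, the sketch reports Byte 2 through Byte 31 as untouched, which is exactly the part of the block the question asks about.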
If we read the rest of the block in, it is called write allocate. The principle of spatial locality implies that we are likely to access those bytes soon, but the type of access is likely to be another write. So even if we read in the data, we may end up overwriting it anyway, which is why it is common practice NOT to read in the rest of the block on a write miss. If you don’t bring in the rest of the block (or, to use the more technical term, write not allocate), you had better have some way to tell the processor that the rest of the block is no longer valid.

No-write allocate versus write allocate: Example
Let us consider a fully associative write-back cache with cache entries that start empty. Consider the following sequence of five memory operations and find the number of hits and misses when using no-write allocate versus write allocate:
    Write Mem[100]
    Write Mem[100]
    Read Mem[200]
    Write Mem[200]
    Write Mem[100]

For no-write allocate, address [100] is not in the cache (i.e., its tag is not in the cache), so the first two writes result in misses. Address [200] is also not in the cache, so the read is also a miss. The subsequent write to [200] is a hit. The last write to [100] is still a miss. The result is 4 misses and 1 hit.

For write allocate, the first accesses to [100] and [200] are misses; the rest are hits, as [100] and [200] are both found in the cache. The result is 2 misses and 3 hits.

No-write allocate versus write allocate: Conclusion
Either write-miss policy could be used with write-through or write-back. Normally, write-back caches use write allocate, hoping that subsequent writes to the block will be captured by the cache. Write-through caches often use no-write allocate; the reason is that even if there are subsequent writes to the block, those writes must still go to the lower-level memory.

Allah Hafiz And Aslam-U-Alacum

... not modified (clean) - Reduces memory-bandwidth requirements, and hence the memory power requirements ...
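The five-operation example above can be replayed with a small Python sketch. It is an illustration only: the function name `count_hits_misses` is hypothetical, each address is assumed to fall in its own block, and the fully associative cache is assumed large enough that nothing is ever evicted.

```python
def count_hits_misses(ops, write_allocate):
    """Count (hits, misses) for a sequence of ("read" | "write", address) operations.

    Simplified model: a fully associative cache that starts empty and never
    evicts, with one address per block (both are assumptions of this sketch).
    """
    cached = set()
    hits = misses = 0
    for kind, addr in ops:
        if addr in cached:
            hits += 1
            continue
        misses += 1
        # A read miss always brings the block in; a write miss brings it in
        # only under the write-allocate policy.
        if kind == "read" or write_allocate:
            cached.add(addr)
    return hits, misses


# The sequence from the example.
OPS = [
    ("write", 100),
    ("write", 100),
    ("read", 200),
    ("write", 200),
    ("write", 100),
]

if __name__ == "__main__":
    print("no-write allocate (hits, misses):", count_hits_misses(OPS, write_allocate=False))
    print("write allocate    (hits, misses):", count_hits_misses(OPS, write_allocate=True))
```

The only difference between the two policies in this model is the single condition that decides whether a write miss installs the block, which is why the counts diverge only on repeated writes to [100].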