Advanced Computer Architecture, Trần Ngọc Thịnh, Lecture 04: Cache (SinhVienZone.com)

dce 2011
ADVANCED COMPUTER ARCHITECTURE
BK TP.HCM
Faculty of Computer Science and Engineering
Department of Computer Engineering
Trần Ngọc Thịnh
http://www.cse.hcmut.edu.vn/~tnthinh
©2013, dce

Memory Hierarchy Design

Since 1980, CPU has outpaced DRAM
• A four-issue 2 GHz superscalar accessing 100 ns DRAM could execute 800 instructions during the time for one memory access!
• [Figure: performance (1/latency) versus year, 1980 to 2000]
  – CPU: 60% per year (2X in 1.5 years)
  – DRAM: 9% per year (2X in 10 years)
  – Gap grew 50% per year

Processor-DRAM Performance Gap Impact
• To illustrate the performance impact, assume a single-issue pipelined CPU with CPI = 1 using non-ideal memory
• Ignoring other factors, the minimum cost of a full memory access in terms of number of wasted CPU cycles:

Year   CPU speed (MHz)   CPU cycle (ns)   Memory access (ns)   Minimum CPU memory stall cycles or instructions wasted
1986   8                 125              190                  190/125 - 1 = 0.5
1989   33                30               165                  165/30 - 1 = 4.5
1992   60                16.6             120                  120/16.6 - 1 = 6.2
1996   200               5                110                  110/5 - 1 = 21
1998   300               3.33             100                  100/3.33 - 1 = 29
2000   1000              1                90                   90/1 - 1 = 89
2002   2000              0.5              80                   80/0.5 - 1 = 159
2004   3000              0.333            60                   60/0.333 - 1 = 179

Levels of the Memory Hierarchy
• Upper level: CPU registers, capacity 100s of bytes
...

... more conflicts
• Can waste bandwidth

• Easy for Direct Mapped
• Set Associative or Fully Associative:
  – Random
  – Least Recently Used (LRU)
    • LRU cache state must be updated on every access
    • true implementation only feasible for small sets (2-way, 4-way)
    • pseudo-LRU binary tree often used for 4-8 way
  – First In, First Out (FIFO), a.k.a. Round-Robin
    • used in highly associative caches
• Replacement policy has a second-order effect since replacement only happens on misses

Q4: What happens on a write?
• Cache hit:
  – write through: write both cache & memory
    • generally higher traffic but simplifies cache coherence
  – write back: write cache only (memory is written only when the entry is evicted)
    • a dirty bit per block can further reduce the traffic
• Cache miss:
  – no write allocate: only write to main memory
  – write allocate (a.k.a. fetch on write): fetch into cache
• Common combinations:
  – write through and no write allocate (below example)
  – write back with write allocate (above example)

Reading assignment
• Cache coherence problem in multicore systems
  – Identify the problem
  – Algorithms for multicore architectures
• Reference
  – eecs.wsu.edu/~cs460/cs550/cachecoherence.pdf
  – … More on internet

Reading assignment
• Cache performance
  – Replacement policy (algorithms)
  – Optimization (miss rate, penalty, …)
• Reference
  – Hennessy - Patterson - Computer Architecture A Quantitative
  – www2.lns.mit.edu/~avinatan/research/cache.pdf
  – … More on internet

... of cache
[Figure: Processor, CACHE, and Main Memory connected by address and data lines]

Simple view of cache
• The processor accesses the cache first
• Cache hit: just use the data
• Cache ... block in cache by a block from main memory, use the data
• The data transferred between cache and main memory is in blocks, and controlled by independent hardware
• Hit rate: fraction of cache ...

W-way Set-associative Cache
• Balancing: Direct mapped cache vs Fully associative cache
• Cache has 2^k sets
• Each set has 2^w lines
• Block M is mapped ...
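
The set-associative organization, LRU replacement, and the two common write-policy combinations from Q4 can be tried out in a small simulation. This is only a sketch under stated assumptions, not the lecture's own example: the class name `SetAssociativeCache`, its parameters, the 16-byte block size, and the counters are all hypothetical, and because the slide text is cut off at "Block M is mapped", the code assumes the standard mapping of block M to set M mod 2^k.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy model (hypothetical): 2**k sets, `ways` lines per set, LRU replacement."""

    def __init__(self, k, ways, block_bytes=16, write_back=True, write_allocate=True):
        self.num_sets = 2 ** k
        self.ways = ways
        self.block_bytes = block_bytes
        self.write_back = write_back
        self.write_allocate = write_allocate
        # One OrderedDict per set: tag -> dirty flag, least recently used first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = self.misses = self.mem_writes = 0

    def _locate(self, addr):
        # Assumed mapping: block M goes to set M mod 2**k; the rest is the tag.
        block = addr // self.block_bytes
        return block % self.num_sets, block // self.num_sets

    def _fill(self, lines, tag):
        # Bring a block into the set, evicting the LRU line if the set is full.
        if len(lines) >= self.ways:
            _, dirty = lines.popitem(last=False)
            if dirty:
                self.mem_writes += 1  # write back the dirty victim
        lines[tag] = False

    def read(self, addr):
        index, tag = self._locate(addr)
        lines = self.sets[index]
        if tag in lines:
            self.hits += 1
            lines.move_to_end(tag)  # LRU state is updated on every access
        else:
            self.misses += 1
            self._fill(lines, tag)

    def write(self, addr):
        index, tag = self._locate(addr)
        lines = self.sets[index]
        if tag in lines:
            self.hits += 1
            lines.move_to_end(tag)
        else:
            self.misses += 1
            if not self.write_allocate:
                self.mem_writes += 1  # no write allocate: write memory only
                return
            self._fill(lines, tag)  # write allocate: fetch into cache first
        if self.write_back:
            lines[tag] = True  # memory is written only when the line is evicted
        else:
            self.mem_writes += 1  # write through: write both cache and memory


# Write back with write allocate, 2 sets x 2 ways.
wb = SetAssociativeCache(k=1, ways=2)
for addr in (0, 0):
    wb.write(addr)
for addr in (32, 64, 0):
    wb.read(addr)
print(wb.hits, wb.misses, wb.mem_writes)
```

Repeated writes to the same block cost a memory write each time under write through, while write back pays once, at eviction. That difference is the traffic reduction the dirty bit provides.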

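The arithmetic behind the processor-DRAM gap figures earlier in the lecture can be recomputed with a few lines. This is a minimal sketch: the function names are hypothetical, and it assumes, as the stall-cycle table does, that an ideal memory access would take exactly one CPU cycle.

```python
def stall_cycles(mem_access_ns, cpu_cycle_ns):
    """Minimum wasted cycles: access time in cycles minus the one ideal cycle."""
    return mem_access_ns / cpu_cycle_ns - 1


def instructions_lost(issue_width, clock_ghz, latency_ns):
    """Instructions a superscalar CPU could have issued during one memory access."""
    return issue_width * clock_ghz * latency_ns


# (year, CPU cycle in ns, memory access in ns), taken from the table in the slides.
rows = [
    (1986, 125, 190), (1989, 30, 165), (1992, 16.6, 120), (1996, 5, 110),
    (1998, 3.33, 100), (2000, 1, 90), (2002, 0.5, 80), (2004, 0.333, 60),
]
for year, cycle_ns, mem_ns in rows:
    print(year, round(stall_cycles(mem_ns, cycle_ns), 1))

# Four-issue, 2 GHz, 100 ns DRAM, as in the opening slide.
print(instructions_lost(4, 2, 100))
```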