Advanced Computer Architecture - Lecture 36: Multiprocessors. This lecture will cover the following: cache coherence problem; example of invalidation scheme; coherence in distributed memory architecture; performance of cache coherence schemes; implementation complications; snooping cache contention; directory-based protocol; distributed shared memory;...
CS 704 Advanced Computer Architecture
Lecture 36: Multiprocessors (Cache Coherence Problem … Cont'd)
Prof Dr M Ashraf Chughtai

Today's Topics
- Recap: Example of Invalidation Scheme
- Coherence in Distributed Memory Architecture
- Performance of Cache Coherence Schemes
- Summary

MAC/VU-Advanced Computer Architecture Lec 36 Multiprocessor (3)

Recap: Cache Coherence Problem
Last time we discussed the sharing of caches for multiprocessing in the symmetric shared-memory architecture, wherein each processor has the same relationship to the single memory.
Here we distinguished between private data and shared data, i.e., data used by a single processor and data replicated in the caches of multiple processors for their simultaneous use.
Then we discussed the cache coherence problem in symmetric shared memory, which results from inconsistency or conflict in the caching of shared data that is read by multiple processors simultaneously.
We studied the cache coherence problem with the help of a typical shared-memory architecture in which each processor has a write-back cache.
In write-back caches, the value written back to memory depends on which cache flushes or writes back the value, and when.
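To make the write-back problem concrete, here is a minimal sketch in Python. It rests on assumptions that are not in the lecture: a hypothetical WriteBackCache class, a plain dictionary standing in for main memory, and no coherence protocol at all. It only illustrates that what memory and the other processor see depends on which cache writes back, and when.

```python
class WriteBackCache:
    """Hypothetical private write-back cache with no coherence support."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory (a plain dict)
        self.lines = {}        # addr -> value held in this cache
        self.dirty = set()     # addrs modified but not yet written back

    def read(self, addr):
        if addr not in self.lines:          # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value            # write-back: memory is not updated
        self.dirty.add(addr)

    def flush(self, addr):
        if addr in self.dirty:              # write the dirty value to memory
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)


memory = {"u": 5}
p1, p2 = WriteBackCache(memory), WriteBackCache(memory)

first = p1.read("u")          # P1 caches u
p2.read("u")                  # P2 caches u
p2.write("u", 7)              # P2 updates only its own cache
stale = p1.read("u")          # P1 hits on its old copy
before_flush = memory["u"]    # memory still holds the old value
p2.flush("u")
after_flush = memory["u"]     # memory is updated only once P2 flushes
still_stale = p1.read("u")    # P1's cached copy is never refreshed
print(first, stale, before_flush, after_flush, still_stale)
```

Even after P2 flushes, P1 keeps reading its old copy, which is the inconsistency a coherence protocol has to remove.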
We noticed that the cache coherency problem exists even on uniprocessors, due to the interaction between caches and I/O devices. In multiprocessors, however, the problem is performance-critical, and the order among multiple processes is crucial.

Recap: Order among multiple processes
For a single shared memory with no caches, a serial (total) order is imposed on operations to a location. For a single shared memory with caches, the serial order must be consistent, i.e., all processors must see writes to the location in the same order.
Considering this, we can say that in a coherent system:
- the operations issued by any particular process occur in the order issued by that process, and
- the value returned by a read is the value written by the last write to that location in the serial order.
Then we talked about write propagation and write serialization as the two features of a coherent system.

Recap: Multiprocessor Cache Coherence
We also noticed that, to implement cache coherence, multiprocessors extend both the bus transactions and the state transitions. The cache controller snoops on bus events (write transactions) and invalidates or updates its cache.
Then we discussed the cache coherence protocols, which use different techniques to track the sharing status and maintain coherence for multiprocessors.

Recap: Coherency Solutions
The two fundamental classes of coherence protocols are:
- Snooping protocols: all cache controllers monitor or snoop (spy) on the bus to determine whether or not they have a copy of the block that is requested on the bus.
- Directory-based protocols: the sharing status of a block of physical memory is kept in one location, called the directory.
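The snooping idea recapped here can be sketched in a few lines of Python. This is an illustration only, under assumptions that are not in the lecture: the Bus and SnoopingCache classes are hypothetical, memory is written through on every write to keep the sketch short, and the single-threaded call order stands in for the serial order that the bus imposes on writes.

```python
class Bus:
    """Hypothetical shared bus that broadcasts every write to all caches."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, source, addr, value):
        self.memory[addr] = value            # assumption: write-through memory
        for cache in self.caches:
            if cache is not source:          # every other controller snoops
                cache.snoop_write(addr)


class SnoopingCache:
    """Hypothetical cache whose controller invalidates on snooped writes."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                      # addr -> cached value
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr):
        self.lines.pop(addr, None)           # invalidate our copy, if any


memory = {"u": 5}
bus = Bus(memory)
p1, p2 = SnoopingCache(bus), SnoopingCache(bus)
p1.read("u")
p2.read("u")
p2.write("u", 7)                   # P1 snoops the write and drops its copy
p1_has_copy = "u" in p1.lines
value_seen_by_p1 = p1.read("u")    # P1 misses and fetches the new value
print(p1_has_copy, value_seen_by_p1)
```

Because P1's copy is invalidated when it snoops the write, its next read misses and returns the value written by P2, which is the write propagation property.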
Recap: Basic Snooping Protocols
Snooping protocols are implemented using two techniques: write invalidate and write broadcast.
The write invalidate method ensures that a processor has exclusive access to a data item before it writes that item; all other cached copies are invalidated on a write.
The write broadcast approach, on the other hand, updates all the cached copies of a data item when that item is written.

Directory Protocol Messages
Message type | Source | Destination | Msg content | Function
Data value reply | Home directory | Local cache | Data | Return a data value from the home memory (read miss response)
Data write-back | Remote cache | Home directory | A, Data | Write back a data value for address A (invalidate response)

State Transition Diagram for an Individual Cache Block in a Directory-Based System
- The states are identical to the snoopy case; the transactions are very similar.
- Transitions are caused by read misses, write misses, invalidates and data fetch requests.
- The cache generates read miss and write miss messages to the home directory.
- Write misses that were broadcast on the bus for snooping result in explicit invalidate and data fetch requests.
- Note: a cache block is bigger than the item being written, so on a write the full cache block must be read.

CPU-Cache State Machine
State machine for CPU requests for each memory block; a block is in the Invalid state if it is only in memory.
- Invalid, CPU read: send Read Miss message; go to Shared (read only).
- Invalid, CPU write: send Write Miss message to home directory; go to Exclusive (read/write).
- Shared, CPU read hit: stay in Shared.
- Shared, Invalidate or miss due to address conflict: go to Invalid.
- Shared, CPU write: send Write Miss message to home directory; go to Exclusive.
- Exclusive, CPU read hit or CPU write hit: stay in Exclusive.
- Exclusive, Fetch: send Data Write Back message to home directory; go to Shared.
- Exclusive, Fetch/Invalidate or miss due to address conflict: send Data Write Back message to home directory; go to Invalid.

State Transition Diagram for the Directory
The directory uses the same states and structure as the transition diagram for an individual cache. Two actions are performed: (1) update the directory state, and (2) send messages to satisfy requests. The controller tracks all copies of a memory block; each transition also indicates an action that updates the sharing set, called Sharers, as well as sending a message.

Directory State Machine
State machine for directory requests for each memory block; a block is in the Uncached state if it is only in memory. P is the requesting processor.
- Uncached, read miss: Sharers = {P}; send Data Value Reply; go to Shared (read only).
- Uncached, write miss: Sharers = {P}; send Data Value Reply msg; go to Exclusive (read/write).
- Shared, read miss: Sharers += {P}; send Data Value Reply; stay in Shared.
- Shared, write miss: send Invalidate to Sharers; then Sharers = {P}; send Data Value Reply msg; go to Exclusive.
- Exclusive, read miss: Sharers += {P}; send Fetch; send Data Value Reply msg to remote cache (write back block); go to Shared.
- Exclusive, data write-back: Sharers = {} (write back block); go to Uncached.
- Exclusive, write miss: Sharers = {P}; send Fetch/Invalidate; send Data Value Reply msg to remote cache; stay in Exclusive.

Example Directory Protocol
A message sent to the directory causes two actions:
- update the directory;
- send more messages to satisfy the request.

Block is in the Uncached state: the copy in memory is the current value, so the only possible requests for that block are:
- Read miss: the requesting processor is sent the data from memory and the requestor is made the only sharing node; the state of the block is made Shared.
- Write miss: the requesting processor is sent the value and becomes the sharing node. The block is made Exclusive to indicate that the only valid copy is cached. Sharers indicates the identity of the owner.

Block is in the Shared state: the memory value is up to date; the read miss and write miss activities are:
- Read miss: the requesting processor is sent the data from memory and is added to the sharing set.
- Write miss: the requesting processor is sent the value. All processors in the set Sharers are sent invalidate messages, and Sharers is set to the identity of the requesting processor. The state of the block is made Exclusive.

Block is in the Exclusive state: the current value of the block is held in the cache of the processor identified by the set Sharers (the owner), so there are three possible directory requests:
- Read miss: the owner processor is sent a data fetch message, which causes the state of the block in the owner's cache to transition to Shared and causes the owner to send the data to the directory, where it is written to memory and sent back to the requesting processor. The identity of the requesting processor is added to the set Sharers, which still contains the identity of the processor that was the owner (since it still has a readable copy). The state is Shared.
- Data write-back: the owner processor is replacing the block and hence must write it back, making the memory copy up to date. The block is now Uncached, and the Sharers set is empty.
- Write miss: the block has a new owner. A message is sent to the old owner, causing its cache to send the value of the block to the directory, from which it is sent to the requesting processor, which becomes the new owner. Sharers is set to the identity of the new owner, and the state of the block is made Exclusive.

Summary
- Caches contain all information on the state of cached memory blocks.
- Snooping and directory protocols are similar; however, a bus makes snooping easier because of broadcast.
- A directory has an extra data structure to keep track of the state of all cache blocks.

Thanks and Allah Hafiz

[Example: Working of Finite State Machine Controller. Only fragments of these slides survive: "... exclusive takes place", "... from invalid to Shared", "... to A1 with value A1".]
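The directory-side transitions walked through in the Example Directory Protocol slides can be sketched as a small state machine in Python. This is a sketch under stated assumptions, not the lecture's implementation: the DirectoryEntry class and its sent log are hypothetical, messages are only recorded rather than delivered, the data transfer itself is not modelled, and on a write miss to a Shared block the requester is assumed not to receive an invalidate for its own copy.

```python
from enum import Enum


class State(Enum):
    UNCACHED = "Uncached"
    SHARED = "Shared"
    EXCLUSIVE = "Exclusive"


class DirectoryEntry:
    """Directory state for one memory block (hypothetical helper class)."""

    def __init__(self):
        self.state = State.UNCACHED
        self.sharers = set()     # the set called Sharers in the lecture
        self.sent = []           # log of (message, destination) pairs

    def _send(self, message, dest):
        self.sent.append((message, dest))

    def read_miss(self, p):
        if self.state is State.EXCLUSIVE:
            (owner,) = self.sharers          # exactly one owner
            self._send("Fetch", owner)       # owner writes the block back
        self.sharers.add(p)                  # old owner, if any, stays a sharer
        self.state = State.SHARED
        self._send("Data Value Reply", p)

    def write_miss(self, p):
        if self.state is State.SHARED:
            # assumption: the requester itself is skipped
            for q in sorted(self.sharers - {p}):
                self._send("Invalidate", q)
        elif self.state is State.EXCLUSIVE:
            (owner,) = self.sharers
            self._send("Fetch/Invalidate", owner)
        self.sharers = {p}                   # p becomes the owner
        self.state = State.EXCLUSIVE
        self._send("Data Value Reply", p)

    def data_write_back(self, p):
        if self.state is not State.EXCLUSIVE or self.sharers != {p}:
            raise ValueError("only the owner can write the block back")
        self.sharers = set()
        self.state = State.UNCACHED


entry = DirectoryEntry()
entry.read_miss("P1")            # Uncached -> Shared
entry.read_miss("P2")            # Shared -> Shared, two sharers
entry.write_miss("P3")           # invalidate P1 and P2, P3 becomes owner
state_after_write = entry.state
entry.read_miss("P1")            # fetch from P3, block becomes Shared again
print(entry.state, sorted(entry.sharers))
print(entry.sent)
```

The sent log makes it easy to check that each request updates the directory state and also triggers the messages listed for that transition.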