Advanced Computer Architecture - Lecture 35: Multiprocessors

DOCUMENT INFORMATION

Basic information

Title	Multiprocessors (Cache Coherence Problem)
Instructor	Prof. Dr. M. Ashraf Chughtai
School	MAC/VU
Major	Advanced Computer Architecture
Type	Lecture

Format
Pages	55
File size	1.46 MB

Content

Advanced Computer Architecture - Lecture 35: Multiprocessors. This lecture covers the following: the cache coherence problem; multiprocessor cache coherence; enforcing coherence in symmetric shared-memory and distributed-memory architectures; performance of cache coherence schemes; ...

CS 704 Advanced Computer Architecture
Lecture 35: Multiprocessors (Cache Coherence Problem)
Prof. Dr. M. Ashraf Chughtai
MAC/VU-Advanced Computer Architecture Lec 35 Multiprocessor (2)

Today’s Topics
Recap
Multiprocessor Cache Coherence
Enforcing Coherence in:
– Symmetric Shared Memory Architecture
– Distributed Memory Architecture
Performance of Cache Coherence Schemes
Summary

Recap: Parallel Processing Architecture
Last time we introduced the concept of parallel processing to improve computer performance. A parallel architecture is a collection of processing elements that cooperate and communicate to solve large problems fast. We discussed Flynn’s four categories of computers, which form the basis for implementing the programming and communication models for parallel computing.

Recap: Parallel Computer Categories
These categories are:
– SISD (Single Instruction Single Data)
– SIMD (Single Instruction Multiple Data)
– MISD (Multiple Instruction Single Data)
– MIMD (Multiple Instruction Multiple Data)
The MIMD machines implement the parallel processing architecture.

Recap: MIMD Classification
We noticed that, based on the memory organization and interconnect strategy, the MIMD machines are classified as:
– Centralized Shared Memory Architecture: here, the subsystems share the same physical centralized memory, connected by a bus. The key architectural property of this design is Uniform Memory Access (UMA); i.e., the access time to all memory from all the processors is the same.
– Distributed Memory Architecture: it consists of a number of individual nodes, each containing a processor, some memory, I/O, and an interface to an interconnection network that connects all the nodes. The distributed memory provides more memory bandwidth and lower latency for accesses to local memory.

Recap: Framework for Parallel Processing
Last time we also studied a framework for parallel architecture. The framework defines the programming and communication models for centralized shared-memory and distributed-memory parallel processing architectures. These models present address space sharing and message passing in parallel architecture.

Here, we noticed that the shared-memory communication model is compatible with SMP hardware and offers ease of programming when communication patterns are complex or vary dynamically during execution. The message-passing communication model, in contrast, has explicit communication, which is simple to understand, and it makes sender-initiated communication easier to use.

Multiprocessor Cache Sharing
Today, we will look into the sharing of caches for multiprocessing in the symmetric shared-memory architecture. The symmetric shared-memory architecture is one where each processor has the same relationship to the single memory. Small-scale shared-memory machines usually support caching of both private data and shared data.

The private data is used by a single processor, while the shared data is replicated in the caches of multiple processors for their simultaneous use. The program behavior for caching of private data is identical to that of a uniprocessor, since no other processor uses the same data; i.e., no other processor's cache has a copy of it.

...

An Example Snooping Protocol
Each block of memory is in one of three states:
– (Shared) Clean in all caches and up-to-date in memory
– OR (Exclusive) Dirty in exactly one cache
– OR Not in any cache

Each cache block is in one of three states (the cache tracks these):
– Shared: the block can be read
– OR Exclusive: the cache has the only copy; it is writable and dirty
– OR Invalid: the block contains no data
Read misses cause all caches to snoop the bus. Writes to a clean line are treated as misses.

Finite State Machine for Write Invalidation Protocol and Write-Back Caches
Now let us discuss the finite-state transitions for a single cache block using a write invalidation protocol and write-back caches. The state machine has three states:
– Invalid
– Shared (read only) and
– Exclusive (read/write)

Here, the cache states are shown in circles, with the accesses the CPU may perform without a state transition shown in parentheses. The stimulus causing a state transition is shown on the transition arc in yellow, and the bus action generated as part of the state transition is shown in orange. The state in each cache node represents the state of the selected cache block specified by the processor or bus request.

In reality there is only one state-transition diagram, but for simplicity the states of the protocol are duplicated here to represent:
– Transitions based on the CPU request
– Transitions based on the bus request
Now let us discuss the state transitions based on the actions of the CPU associated with the cache, shown as state machine-I.

Snoopy-Cache State Machine-I: for CPU requests for each cache block
– Invalid to Shared (read only): CPU read; place read miss on bus
– Invalid to Exclusive (read/write): CPU write; place write miss on bus
– Shared stays Shared: CPU read hit
– Shared stays Shared: CPU read miss; place read miss on bus
– Shared to Exclusive: CPU write; place write miss on bus
– Exclusive stays Exclusive: CPU read hit; CPU write hit
– Exclusive to Shared: CPU read miss; write back block; place read miss on bus
– Exclusive stays Exclusive: CPU write miss; write back cache block; place write miss on bus

Finite State Machine for CPU requests for each cache block
Note that a read miss in the exclusive or shared state, and a write miss in the exclusive state, occur when the address requested by the CPU does not match the address in the cache block. Further, an attempt to write a block in the shared state always generates a miss, even if the block is present in the cache, since the block must be made exclusive.

In the case of a read hit, the shared and exclusive states read the data in the cache. In the case of a read miss, the invalid state places the read miss on the bus, while the shared and exclusive states are handling an address conflict miss. For a write hit, the exclusive state writes the data in the cache, and the shared state places a write miss on the bus.

In the case of a write miss, the invalid state places the write miss on the bus. The shared and exclusive states are again handling an address conflict miss: the shared state places the write miss on the bus, while the exclusive state writes back the block and then places the write miss on the bus.

Snoopy-Cache State Machine-II: for bus requests for each cache block
– Shared (read only) to Invalid: write miss for this block
– Exclusive (read/write) to Invalid: write miss for this block; write back block (abort memory access)
– Exclusive to Shared: read miss for this block; write back block (abort memory access)

Finite State Machine for Write Invalidation Protocol and Write-Back Caches
Now let us discuss the state transitions based on the actions of bus requests associated with the cache, shown as state machine-II. Here, whenever a bus transaction occurs, all caches that contain the cache block specified in the bus transaction take the action shown in this state machine. The protocol assumes that memory provides the data on a read miss for a block that is clean in all caches.

Note that on a read miss, the shared state takes no action and allows the memory to service the read miss, whereas the exclusive state, seeing an attempt to share the data, places the cache block on the bus and changes its state to shared.

For a write miss, the shared state sees an attempt to write a shared block and invalidates the block, whereas the exclusive state sees an attempt to write a block that is exclusive elsewhere; it writes back the cache block and makes its state invalid.

Summary
Today, we talked about the sharing of caches for multiprocessing in the symmetric shared-memory architecture. We studied the cache coherence problem and two methods to resolve it: the write invalidation and write broadcasting schemes. At the end we discussed the finite state machine for the implementation of the snooping algorithm.
– We will further explain the snooping protocol with the help of an example next time; till then …

Thanks and Allah Hafiz

... typical shared memory architecture shown here!
Multiprocessor Cache Coherence ...
... Read miss, in case of: write-through, memory is always up-to-date, so no problem; write-back, snoop in the caches to find the most recent copy ...
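The two state machines described in this lecture (CPU requests and bus requests) can be sketched together in Python as a small simulation. This is only a minimal sketch under simplifying assumptions that are not in the lecture: each cache holds a single block frame, a block is one word, and a dirty owner that snoops a miss simply writes the block back so that memory can supply it (standing in for "abort memory access"). The names `State`, `Bus`, `Cache`, `cpu_read`, `cpu_write`, `snoop_read_miss` and `snoop_write_miss` are hypothetical and chosen for illustration.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    SHARED = "Shared"        # read only, clean
    EXCLUSIVE = "Exclusive"  # read/write, dirty, only cached copy


class Bus:
    """Shared bus that broadcasts misses to every other cache (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value (one word per block)
        self.caches = []

    def read_miss(self, requester, addr):
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_read_miss(addr)
        # Simplification: a dirty owner has already written the block back,
        # so memory now holds the most recent copy.
        return self.memory[addr]

    def write_miss(self, requester, addr):
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_write_miss(addr)


class Cache:
    """A cache reduced to one block frame, so any other address is a conflict."""

    def __init__(self, name, bus):
        self.name, self.bus = name, bus
        self.state, self.addr, self.data = State.INVALID, None, None
        bus.caches.append(self)

    def _write_back_if_dirty(self):
        if self.state is State.EXCLUSIVE:
            self.bus.memory[self.addr] = self.data

    # State machine I: CPU requests
    def cpu_read(self, addr):
        hit = self.state is not State.INVALID and self.addr == addr
        if not hit:
            self._write_back_if_dirty()            # Exclusive: write back block
            self.data = self.bus.read_miss(self, addr)
            self.addr, self.state = addr, State.SHARED
        return self.data

    def cpu_write(self, addr, value):
        # A write to a Shared block counts as a miss: it must become Exclusive.
        hit = self.state is State.EXCLUSIVE and self.addr == addr
        if not hit:
            self._write_back_if_dirty()
            self.bus.write_miss(self, addr)
            self.addr, self.state = addr, State.EXCLUSIVE
        self.data = value

    # State machine II: bus requests
    def snoop_read_miss(self, addr):
        if self.state is State.EXCLUSIVE and self.addr == addr:
            self._write_back_if_dirty()
            self.state = State.SHARED

    def snoop_write_miss(self, addr):
        if self.state is State.INVALID or self.addr != addr:
            return
        self._write_back_if_dirty()                # only acts if Exclusive
        self.state = State.INVALID


memory = {0x10: 1, 0x20: 2}
bus = Bus(memory)
p0, p1 = Cache("P0", bus), Cache("P1", bus)
p0.cpu_read(0x10)
p1.cpu_read(0x10)                       # both copies are now Shared
p0.cpu_write(0x10, 99)                  # P1's copy is invalidated
print(p0.state.value, p1.state.value)
print(p1.cpu_read(0x10))                # P0 writes back, then both are Shared
```

In this run, the write by P0 leaves memory stale until P1's read miss forces the write-back, which is the behavior the write-invalidate, write-back protocol is meant to show. A fuller model would track many blocks per cache and let the owning cache supply the data directly on the bus.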

Date posted: 05/07/2022, 11:57