8 PARALLEL COMPUTER ARCHITECTURES

CuuDuongThanCong.com https://fb.com/tailieudientucntt

Figure 8-1. (a) A multiprocessor with 16 CPUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different CPU.

Figure 8-2. (a) A multicomputer with 16 CPUs, each with its own private memory, connected by a message-passing interconnection network. (b) The bit-map image of Fig. 8-1 split up among the 16 memories.

Figure 8-3. Various layers where shared memory can be implemented. (a) The hardware. (b) The operating system. (c) The language run-time system.
Layers shown for each machine, top to bottom: application, language run-time system, operating system, hardware.

Figure 8-4. Various topologies. The heavy dots represent switches. The CPUs and memories are not shown. (a) A star. (b) A complete interconnect. (c) A tree. (d) A ring. (e) A grid. (f) A double torus. (g) A cube. (h) A 4D hypercube.

Figure 8-5. An interconnection network in the form of a four-switch square grid. Only two of the CPUs are shown.
Labels: four-port switches A, B, C, D; input port; output port; front, middle, and end of packet.

Figure 8-6. Store-and-forward packet switching. Panels (a), (b), and (c) show the entire packet at successive positions among the four-port switches A, B, C, D.

Figure 8-7. Deadlock in a circuit-switched interconnection network.
Labels: four-port switches A, B, C, D; input port; output buffer.

Figure 8-8. Real programs achieve less than the perfect speedup indicated by the dotted line.
Axes: speedup (0 to 60) against number of CPUs (0 to 60). Curves: linear speedup, N-body problem, Awari, skyline matrix inversion.

Figure 8-9. (a) A program has a sequential part and a parallelizable part. (b) Effect of running part of the program in parallel.
Labels: the inherently sequential part is the fraction f and the potentially parallelizable part is 1 − f of the total time T. With n CPUs active, the sequential part takes fT and the parallel part takes (1 − f)T/n.

Figure 8-28. An omega switching network.
Labels: CPUs and memories numbered 000 to 111; three stages of switches 1A to 1D, 2A to 2D, 3A to 3D; two paths marked a and b.

Figure 8-29. A NUMA machine based on two levels of buses. The Cm* was the first multiprocessor to use this design.

Figure 8-30. (a) A 256-node directory-based multiprocessor. (b) Division of a 32-bit memory address into fields. (c) The directory at node 36.
Address fields: Node, Block, Offset. Directory entries run up to 2^18 − 1.

Figure 8-31. (a) The DASH architecture. (b) A DASH directory.
Labels: clusters of CPUs with caches on a local (snooping) bus, each with memory and a directory (D); intercluster interface; intercluster bus (nonsnooping). Directory annotations: "This is the directory for cluster 13"; each bit tells whether a cluster has a block of the memory homed here in any of its caches. States: uncached, shared, modified.

Figure 8-32. The NUMA-Q multiprocessor.
Labels: quad board with Pentium Pros and RAM; snooping bus interface; directory controller; 32-MB cache RAM; directory; data pump; IQ board; SCI ring.

Figure 8-33. SCI chains all the holders of a given cache line together in a doubly-linked list. In this example, a line is shown cached at three nodes.
Labels: local memory table at the home node; node cache directories with Fwd, Back, State, and Tag fields.

Figure 8-34. A generic multicomputer.
Labels: nodes with CPUs, memory, local interconnect, disk and I/O, and a communication processor; high-performance interconnection network.

Figure 8-35. The Cray Research T3E.
Labels: nodes with an Alpha, memory, control + E registers, and a communication processor; full-duplex 3D torus; GigaRing connecting disk, tape, and network.

Figure 8-36. The Intel/Sandia Option Red system. (a) The kestrel board. (b) The interconnection network.

Figure 8-37. Scheduling a COW. (a) FIFO. (b) Without head-of-line blocking. (c) Tiling. The shaded areas indicate idle CPUs.

Figure 8-38. (a) Three computers on an Ethernet. (b) An Ethernet switch.

Figure 8-39. Sixteen CPUs connected by four ATM switches. Two virtual circuits are shown.

Figure 8-40. A virtual address space consisting of 16 pages spread over four nodes of a multicomputer. (a) The initial situation. (b) After a CPU references page 10. (c) After a CPU references page 10, here assumed to be a read-only page.

Figure 8-41. Three Linda tuples.

    ("abc", 2, 5)
    ("matrix-1", 1, 6, 3.14)
    ("family", "is sister", Carolyn, Elinor)

Figure 8-42. A simplified ORCA stack object, with internal data and two operations.

    Object implementation stack;
      top: integer;                                # storage for the stack
      stack: array [integer 0..N-1] of integer;

      operation push(item: integer);               # function returning nothing
      begin
        stack[top] := item;                        # push item onto the stack
        top := top + 1;                            # increment the stack pointer
      end;

      operation pop( ): integer;                   # function returning an integer
      begin
        guard top > 0 do                           # suspend if the stack is empty
          top := top - 1;                          # decrement the stack pointer
          return stack[top];                       # return the top item
        od;
      end;

    begin
      top := 0;                                    # initialization
    end;

Figure 8-13. Flynn's taxonomy of parallel computers.
Surviving table entries: ... Arguably none; Multiprocessor, multicomputer.

Figure 8-14. A taxonomy of parallel computers.
Surviving labels: Parallel computer architectures; SISD; SIMD; MISD; (Von...; Multiprocessors; COMA; NUMA; CC-NUMA; NC-NUMA; Switched; Multicomputers; MPP; COW; Grid; Hypercube; Shared memory; Message passing.

Figure 8-17. Steps in a floating-point subtraction.
Values at successive steps (... result):
    1.082 × 10^12 − 9.212 × 10^11
    1.082 × 10^12 − 0.9212 × 10^12
    0.1608 × 10^12
    1.608 × 10^11
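
The values in Figure 8-17 can be reproduced with a short sketch. This is only an illustration under stated assumptions: the function name `subtract_decimal_floats`, the (mantissa, exponent) pair representation, and the step labels "aligned", "difference", and "normalized" are invented here, since the figure's own step names are truncated in the source. Exact fractions are used so that the decimal digits match the figure rather than a binary approximation.

```python
from fractions import Fraction


def subtract_decimal_floats(a, b):
    """Subtract b from a, where each operand is a (mantissa, exponent)
    pair meaning mantissa * 10**exponent. Returns the intermediate values.

    Hypothetical helper for illustration; not taken from the source.
    """
    (ma, ea), (mb, eb) = a, b

    # Align both operands to the larger exponent by scaling down the
    # mantissa of the operand with the smaller exponent.
    exp = max(ea, eb)
    ma_aligned = ma / Fraction(10) ** (exp - ea)
    mb_aligned = mb / Fraction(10) ** (exp - eb)

    # Subtract the aligned mantissas; the exponent is unchanged.
    diff = ma_aligned - mb_aligned

    # Normalize so that 1 <= |mantissa| < 10 (a zero result is left alone).
    mant, norm_exp = diff, exp
    while mant != 0 and abs(mant) < 1:
        mant *= 10
        norm_exp -= 1
    while abs(mant) >= 10:
        mant /= 10
        norm_exp += 1

    return {
        "aligned": ((ma_aligned, exp), (mb_aligned, exp)),
        "difference": (diff, exp),
        "normalized": (mant, norm_exp),
    }


# Operands from Figure 8-17: 1.082 x 10^12 and 9.212 x 10^11.
steps = subtract_decimal_floats((Fraction("1.082"), 12), (Fraction("9.212"), 11))
mantissa, exponent = steps["normalized"]
print(f"{float(mantissa)} x 10^{exponent}")
```

With the figure's operands, the aligned step gives 1.082 and 0.9212 at exponent 12, the difference is 0.1608 at exponent 12, and normalization yields a mantissa of 1.608 at exponent 11, matching the last row of the figure.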