Parallel Computing (Thoai Nam), Parallel Processing 13: Matrix Multiplication, sinhvienzone.com

Matrix Multiplication
Thoai Nam
Faculty of Information Technology, Ho Chi Minh City University of Technology (Khoa Công Nghệ Thông Tin, Đại Học Bách Khoa Tp.HCM)
SinhVienZone.com, https://fb.com/sinhvienzonevn

Outline
Sequential matrix multiplication
Algorithms for processor arrays
– Matrix multiplication on 2-D mesh SIMD model
– Matrix multiplication on hypercube SIMD model
Matrix multiplication on UMA multiprocessors
Matrix multiplication on multicomputers

Sequential Matrix Multiplication
    Global a[0..l-1,0..m-1], b[0..m-1,0..n-1],  {Matrices to be multiplied}
           c[0..l-1,0..n-1],                    {Product matrix}
           t,                                   {Accumulates dot product}
           i, j, k;
    Begin
      for i:=0 to l-1
        for j:=0 to n-1
          t:=0;
          for k:=0 to m-1
            t:=t+a[i,k]*b[k,j];
          endfor k;
          c[i,j]:=t;
        endfor j;
      endfor i;
    End

Algorithms for Processor Arrays
Matrix multiplication on 2-D mesh SIMD model
Matrix multiplication on hypercube SIMD model

Matrix Multiplication on 2D-Mesh SIMD Model
Gentleman (1978) has shown that multiplication of two n*n matrices on the 2-D mesh SIMD model requires O(n) routing steps.
We will consider a multiplication algorithm on a 2-D mesh SIMD model with wraparound connections.

For simplicity, we suppose that
– Size of the mesh is n*n
– Size of each matrix (A and B) is n*n
– Each processor P(i,j) in the mesh (located at row i, column j) contains a(i,j) and b(i,j)
At the end of the algorithm, P(i,j) will hold the element c(i,j) of the product matrix.
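As a baseline for checking the parallel algorithms in the rest of the deck, the sequential triple loop shown earlier can be sketched in Python. This is only an illustration: the function name `matmul_sequential` and the list-of-lists matrix representation are assumptions, not part of the original slides.

```python
def matmul_sequential(a, b):
    """Multiply an l x m matrix by an m x n matrix with the triple loop."""
    l, m, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(l)]
    for i in range(l):
        for j in range(n):
            t = 0  # accumulates the dot product of row i of a and column j of b
            for k in range(m):
                t += a[i][k] * b[k][j]
            c[i][j] = t
    return c
```

The loop nest performs l*m*n multiply-add operations, which is the amount of work the parallel versions try to divide among processors.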
Matrix Multiplication on 2D-Mesh SIMD Model (cont'd)
Major phases

(a) Initial distribution of matrices A and B
    a0,0 b0,0    a0,1 b0,1    a0,2 b0,2    a0,3 b0,3
    a1,0 b1,0    a1,1 b1,1    a1,2 b1,2    a1,3 b1,3
    a2,0 b2,0    a2,1 b2,1    a2,2 b2,2    a2,3 b2,3
    a3,0 b3,0    a3,1 b3,1    a3,2 b3,2    a3,3 b3,3

(b) Staggering all A's elements in row i to the left by i positions and all B's elements in column j upwards by j positions
    [Figure: the same elements drawn with row i of A pushed i positions to the left and column j of B pushed j positions upwards, before wraparound is applied]

(c) Distribution of matrices A and B after staggering in a 2-D mesh with wraparound connections
    a0,0 b0,0    a0,1 b1,1    a0,2 b2,2    a0,3 b3,3
    a1,1 b1,0    a1,2 b2,1    a1,3 b3,2    a1,0 b0,3
    a2,2 b2,0    a2,3 b3,1    a2,0 b0,2    a2,1 b1,3
    a3,3 b3,0    a3,0 b0,1    a3,1 b1,2    a3,2 b2,3

After staggering, each processor P(i,j) has a pair of elements to multiply, a(i,k) and b(k,j), with k = (i+j) mod n.

The remaining steps of the algorithm from the viewpoint of processor P(1,2)
(a) First scalar multiplication step: P(1,2) holds a1,3 and b3,2
(b) Second scalar multiplication step, after elements of A are cycled to the left and elements of B are cycled upwards: P(1,2) holds a1,0 and b0,2
(c) Third scalar multiplication step, after the second cycle step: P(1,2) holds a1,1 and b1,2
(d) Fourth scalar multiplication step, after the third cycle step: P(1,2) holds a1,2 and b2,2
At this point processor P(1,2) has computed the dot product c1,2.

Detailed Algorithm
    Global n,        {Dimension of matrices}
           k;
    Local  a, b, c;
    {fromleft(a) is read here as: a is cycled one position to the left, so each
     processor receives the a value of its right neighbour. fromdown(b): b is
     cycled one position upwards, so each processor receives the b value of the
     neighbour below.}
    Begin
      {Stagger matrices a[0..n-1,0..n-1] and b[0..n-1,0..n-1]}
      for k:=1 to n-1
        forall P(i,j) where 0 ≤ i,j < n
          if i ≥ k then a:=fromleft(a);
          if j ≥ k then b:=fromdown(b);
        end forall;
      endfor k;

      {Compute dot product}
      forall P(i,j) where 0 ≤ i,j < n
        c:=a*b;
      end forall;
      for k:=1 to n-1
        forall P(i,j) where 0 ≤ i,j < n
          a:=fromleft(a);
          b:=fromdown(b);
          c:=c+a*b;
        end forall;
      endfor k;
    End

Can we implement the above-mentioned algorithm on a 2-D mesh SIMD model without wraparound connections?
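The stagger and compute phases of the mesh algorithm above can be simulated sequentially. The sketch below is a minimal illustration, assuming a square list-of-lists representation in which cell `[i][j]` stands in for processor P(i,j); the names `mesh_matmul`, `shift_left` and `shift_up` are hypothetical helpers, not taken from the slides. It models the wraparound mesh only and does not answer the question about a mesh without wraparound.

```python
def mesh_matmul(a, b):
    """Simulate the stagger-and-cycle algorithm on an n x n wraparound mesh."""
    n = len(a)
    # Work on copies: cell [i][j] plays the role of processor P(i,j).
    a = [row[:] for row in a]
    b = [row[:] for row in b]

    def shift_left(mat, rows):
        # Cycle each listed row one position to the left (wraparound).
        for i in rows:
            mat[i] = mat[i][1:] + mat[i][:1]

    def shift_up(mat, cols):
        # Cycle each listed column one position upwards (wraparound).
        for j in cols:
            col = [mat[i][j] for i in range(n)]
            col = col[1:] + col[:1]
            for i in range(n):
                mat[i][j] = col[i]

    # Stagger phase: row i of A moves left i times, column j of B moves up j times.
    for k in range(1, n):
        shift_left(a, range(k, n))
        shift_up(b, range(k, n))

    # Compute phase: one multiply, then n-1 rounds of cycle and multiply-add.
    c = [[a[i][j] * b[i][j] for j in range(n)] for i in range(n)]
    for _ in range(1, n):
        shift_left(a, range(n))
        shift_up(b, range(n))
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][j] * b[i][j]
    return c
```

Each simulated processor only ever touches its own `a`, `b` and `c` cells plus the values shifted in from a neighbour, which mirrors the Local/Global split in the pseudocode.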
Matrix Multiplication Algorithm for Multiprocessors
Design strategy
– If load balancing is not a problem, maximize grain size
Grain size: the amount of work performed between processor interactions
Things to be considered
– Parallelizing the outermost loop of the sequential algorithm is a good choice since the attained grain size (O(n^3/p)) is the biggest
– Resolving memory contention as much as possible

Matrix Multiplication Algorithm for UMA Multiprocessors
Algorithm using p processors
    Global n,                            {Dimension of matrices}
           a[1..n,1..n], b[1..n,1..n],   {Two input matrices}
           c[1..n,1..n];                 {Product matrix}
    Local  i, j, k, t;
    Begin
      forall Pm where 1 ≤ m ≤ p
        for i:=m to n step p
          for j:=1 to n
            t:=0;
            for k:=1 to n
              t:=t+a[i,k]*b[k,j];
            endfor k;
            c[i,j]:=t;
          endfor j;
        endfor i;
      end forall;
    End

Matrix Multiplication Algorithm for NUMA Multiprocessors
Things to be considered
– Try to resolve memory contention as much as possible
– Increase the locality of memory references to reduce memory access time
Design strategy
– Reduce average memory latency time by increasing locality
The block matrix multiplication algorithm is a reasonable choice in this situation
– Section 7.3, p.187, Parallel Computing: Theory and Practice

Matrix Multiplication Algorithm for Multicomputers
We will study two algorithms on multicomputers
– Row-Column-Oriented Algorithm
– Block-Oriented Algorithm
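The row-interleaved work split used by the UMA algorithm (processor m takes rows m, m+p, m+2p, ...) can be sketched with Python threads. This is an illustration of the partitioning only, under stated assumptions: indices are 0-based here rather than 1-based, the function name `matmul_row_interleaved` is hypothetical, and CPython threads will generally not give a real speedup for this pure-Python loop.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_row_interleaved(a, b, p):
    """Worker m computes rows m, m+p, m+2p, ... of the n x n product (0-based)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]

    def worker(m):
        for i in range(m, n, p):
            for j in range(n):
                t = 0
                for k in range(n):
                    t += a[i][k] * b[k][j]
                c[i][j] = t  # each row of c has exactly one writer

    with ThreadPoolExecutor(max_workers=p) as pool:
        # list(...) waits for all workers and re-raises any worker exception.
        list(pool.map(worker, range(p)))
    return c
```

Because every worker writes a disjoint set of rows of `c` and only reads `a` and `b`, no locking is needed; the workers interact only at the end, which is what makes the grain size about n^3/p operations.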
Row-Column-Oriented Algorithm The processes are organized as a ring Si nh Vi en Zo ne C – Step 1: Initially, each process is given row of the matrix A and column of the matrix B – Step 2: Each process uses vector multiplication to get element of the product matrix C – Step 3: After a process has used its column of matrix B, it fetches the next column of B from its successor in the ring – Step 4: If all rows of B have already been processed, quit Otherwise, go to step https://fb.com/sinhvienzonevn Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM SinhVienZone.com -18- om Row-Column-Oriented Algorithm (cont’d) ne C Why we have to organize processes as a ring and make them use B’s rows in turn? Design strategy 7: Si nh Vi en Zo – Eliminate contention for shared resources by changing the order of data access https://fb.com/sinhvienzonevn Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM SinhVienZone.com -19- om Row-Column-Oriented Algorithm (cont’d) C ne B A B C A B C nh Vi en Zo A C Example: Use processes to mutliply two matrices A4*4 and B4*4 1st iteration C B Si A https://fb.com/sinhvienzonevn Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM SinhVienZone.com -20- om Row-Column-Oriented Algorithm (cont’d) C ne B A B C A B C nh Vi en Zo A C Example: Use processes to mutliply two matrices A4*4 and B4*4 2nd iteration C B Si A https://fb.com/sinhvienzonevn Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM SinhVienZone.com -21- om Row-Column-Oriented Algorithm (cont’d) C ne B A B C A B C nh Vi en Zo A C Example: Use processes to mutliply two matrices A4*4 and B4*4 3rd iteration C B Si A https://fb.com/sinhvienzonevn Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM SinhVienZone.com -22- om Row-Column-Oriented Algorithm (cont’d) C ne B A B C A B C C B 4th iteration (the last) Si A nh Vi en Zo A C Example: Use processes to mutliply two matrices A4*4 and B4*4 https://fb.com/sinhvienzonevn Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM SinhVienZone.com -23- 
