Fast parallel integration for three dimensional Discontinuous Petrov Galerkin method

Procedia Computer Science 101, 2016, Pages – 17
YSC 2016. 5th International Young Scientist Conference on Computational Science

Maciej Woźniak¹, Marcin Łoś¹, Maciej Paszyński¹, and Leszek Demkowicz²
¹ AGH University of Science and Technology, Kraków, Poland
macwozni@agh.edu.pl, los@agh.edu.pl, paszynsk@agh.edu.pl
² The University of Texas at Austin, Austin, Texas, U.S.A.
leszek@ices.utexas.edu

Abstract
The Finite Element Method comes with the challenge of constructing test functions that provide better stability. The Discontinuous Petrov-Galerkin method constructs optimal test functions "on the fly". However, this method comes with a relatively high computational cost. In this paper we show a parallelization method that reduces the computation time.

Keywords: Finite Element Method, Discontinuous Petrov-Galerkin, parallel, shared memory

Introduction
In this paper we present a parallelization of the algorithm for generation of the element matrices for the Discontinuous Petrov Galerkin (DPG) method [3, 4, 5]. The DPG method is a new, rapidly growing method for solving numerical problems. It enables automatic control of the stability of numerical formulations. We have parallelized the element routines of the hp3d DPG code developed by the group of Prof. Demkowicz. We are aware of other parallel FEM packages supporting adaptive computations for DPG, including CAMELIA [14] and DUNE-DPG [10]. However, the hp3d framework is unique in the following ways:
• It supports hexahedral, tetrahedral, prism and pyramid elements in 3D. To the best of our knowledge, CAMELIA supports only hexahedral elements, and DUNE supports triangular elements only.
• It enables parallel anisotropic refinements over a computational domain discretized with different kinds of finite elements, including tetrahedra, hexahedra, prisms and pyramids.
The CAMELIA and DUNE packages do not allow for anisotropic refinements, and thus exponential convergence of the numerical solution is not possible there.
• CAMELIA and DUNE do not support complex Hcurl discretization.
Our framework will enable parallel automatic hp-adaptive computations for different classes of problems, including H1, Hdiv, and Hcurl. Our preliminary work presented in this paper concerns the parallelization of the generation of element matrices for the elliptic problem. Our future work will involve parallelization of the Hdiv and Hcurl element routines.

doi:10.1016/j.procs.2016.11.003
Peer-review under responsibility of organizing committee of the scientific committee of the 5th International Young Scientist Conference on Computational Science
© 2016 The Authors. Published by Elsevier B.V.
Fast parallel integration for 3D DPG method. Woźniak, Łoś, Paszyński and Demkowicz

Let us focus on the model elliptic problem. In the Sobolev space
$$H_0^1(\Omega) = \{u \in L^2(\Omega) : D^\alpha u \in L^2(\Omega),\ |\alpha| \le 1,\ \operatorname{tr} u = 0 \text{ on } \partial\Omega\} \tag{1}$$
we introduce the classical weak formulation of the Poisson problem. We seek $u \in H_0^1(\Omega)$ such that
$$\int_\Omega \nabla u \cdot \nabla v \, dx = \int_\Omega f v \, dx \quad \forall v \in H_0^1(\Omega) \tag{2}$$
We may also express the above problem in abstract notation:
$$b(u, v) = l(v) \tag{3}$$
where in our model problem we have
$$b(u, v) = \int_\Omega \nabla u \cdot \nabla v \, dx \tag{4}$$
$$l(v) = \int_\Omega f v \, dx \tag{5}$$
We project the weak problem onto the finite dimensional subspace $V_h \subset H_0^1(\Omega)$:
$$\int_\Omega \nabla u_h \cdot \nabla v_h \, dx = \int_\Omega f v_h \, dx \quad \forall v_h \in V_h \subset H_0^1(\Omega) \tag{6}$$
The mathematical theory concerning the stability of the numerical method for the general weak formulation (2) is based on the famous "Babuška-Brezzi condition" (BBC), developed in the years 1971-1974 at the same time by Ivo Babuška and Franco Brezzi [13, 12, 9]. The condition states that the problem (2) is stable when
$$\sup_{v \in V} \frac{|b(u, v)|}{\|v\|_V} \ge \gamma \|u\|_U \quad \forall u \in U \tag{7}$$
However, the inf-sup condition in the above form concerns the abstract formulation, where we consider all the test functions $v \in V$ and look for a solution $u \in U$ (e.g. $U = V$). The above condition is
satisfied also if we restrict to the space of trial functions $u_h \in U_h$:
$$\sup_{v \in V} \frac{|b(u_h, v)|}{\|v\|_V} \ge \gamma \|u_h\|_{U_h} \tag{8}$$
However, if we use test functions from the finite dimensional test space $V_h = \operatorname{span}\{v_h\}$,
$$\sup_{v_h \in V_h} \frac{|b(u_h, v_h)|}{\|v_h\|_{V_h}} \ge \gamma_h \|u_h\|_{U_h} \tag{9}$$
we do not have a guarantee that the supremum in (9) will be equal to the original supremum in (7), since we have restricted $V$ to $V_h$. The optimality of the method depends on the quality of the polynomial test functions defining the space $V_h = \operatorname{span}\{v_h\}$ and on how far they are from the supremum realized in (7).

Many scientists spent their lives on constructing test functions providing better stability of the method for a given class of problems [11, 7, 8, 1]. They have developed several techniques for stabilizing different kinds of problems. In 2010 the DPG method was proposed, with a modern summary of the method given in [2]. The key idea of the DPG method is to construct the optimal test functions "on the fly", element by element. The DPG method automatically guarantees the numerical stability of difficult computational problems, thanks to the automatic selection of the optimal basis functions.

Derivation of the DPG method
The DPG method can be derived using one of the following three methods [2, 6]:
a) Minimum residuum method
b) Petrov-Galerkin with optimal test functions
c) Special mixed methods
The minimum residuum method is good for illustrating the idea of the DPG method; however, it results in an optimal test function problem that is as expensive as the original problem itself. The construction of the DPG method with the Petrov-Galerkin approach is relatively difficult, so we will only illustrate it here as a tool for making the optimization problem local over each finite element. The special mixed method formulation of the DPG method is the most suitable for efficient implementation, since it results in a modification of the classical finite element method. We will present this
method to discuss the implementation issues.

Ad a) Let us start from the derivation of the DPG method using the minimum residuum method. For our weak problem (2) we construct the operator
$$B : U \to V' \tag{10}$$
such that
$$\langle Bu, v \rangle_{V' \times V} = b(u, v) \tag{11}$$
so we can reformulate the problem as
$$Bu - l = 0 \tag{12}$$
We wish to minimize the residual
$$u_h = \arg\min_{w_h \in U_h} \|Bw_h - l\|_{V'}^2 \tag{13}$$
We introduce the Riesz operator, which is an isometric isomorphism
$$R_V : V \ni v \mapsto (v, \cdot) \in V' \tag{14}$$
We can project the problem back to $V$:
$$u_h = \arg\min_{w_h \in U_h} \|R_V^{-1}(Bw_h - l)\|_V^2 \tag{15}$$
The minimum is attained at $u_h$ when the Gâteaux derivative is equal to 0 in all directions:
$$\left(R_V^{-1}(Bu_h - l),\ R_V^{-1}(B\,\delta u_h)\right)_V = 0 \quad \forall\, \delta u_h \in U_h \tag{16}$$
Using again the definition of the Riesz map we get
$$\langle Bu_h - l,\ R_V^{-1}(B\,\delta u_h) \rangle = 0 \quad \forall\, \delta u_h \in U_h \tag{17}$$
which is equivalent to our original residuum problem
$$\langle Bu_h - l,\ v_{\delta u_h} \rangle = 0 \quad \forall\, \delta u_h \in U_h \tag{18}$$
with optimal test functions
$$v_{\delta u_h} = R_V^{-1} B\,\delta u_h \quad \text{for each trial function } \delta u_h \tag{19}$$
In other words, with the help of the Riesz operator, it is possible to construct the optimal test functions [2]. However, with the traditional weak formulation, the numerical solution of this optimization problem is as expensive as the solution of the original problem itself [2]. In order to make the basis function optimization problem local over particular finite elements, we need to break the test spaces and reformulate the weak formulation.

Ad b) In other words, we now switch to the Petrov-Galerkin method, where we use the original trial functions and broken test functions. In order to allow for a local element-wise solution of the problem of the optimization of the test functions, we derive the weak formulation with broken test spaces. We introduce the space of broken test functions
$$H^1(\Omega_h) = \{u \in L^2(\Omega) : u|_K \in H^1(K)\ \ \forall K \in T_h\} \tag{20}$$
and we introduce the weak formulation with broken test functions. We seek $u \in H_0^1(\Omega)$ as well as fluxes $t \in \operatorname{tr}_{\Gamma_h} H(\mathrm{div}, \Omega)$ such that
$$\sum_K \int_K \nabla u \cdot \nabla v \, dx + \sum_K \int_{\partial K} t v \, ds = \int_\Omega f v \, dx \quad \text{for all } v \in H^1(\Omega_h) \tag{21}$$
We denote $\langle t, v \rangle = \sum_K \int_{\partial K} t v \, ds$. Everything is now computed element-wise and summed up. In particular, the term $\langle t, v \rangle$ is also computed element-wise:
$$\langle v, t \rangle = \sum_K \int_{\partial K} t v \, ds = \sum_f \int_f v \cdot n_f \, [v] \, ds \tag{22}$$
where $f$ denotes the faces of an element, $n_f$ denotes the face normal vector, and
$$[v] = \begin{cases} v & \text{for } f \subset \partial\Omega \\ v^{[+]} - v^{[-]} & \text{for } f \subset \Omega \setminus \partial\Omega \end{cases} \tag{23}$$
(for element faces located on the boundary of the domain we just take the normal component of $v$, but for element faces located inside the domain we consider the difference of the normal components $v^{[+]}$ and $v^{[-]}$ resulting from the two elements sharing the common face).

Over a single element, we now have the following contributions: element frontal matrices $\int_K \nabla u \cdot \nabla v \, dx + \int_{\partial K} t v \, ds$ and right-hand sides $\int_K f v \, dx$. Notice that the first term comes from the integration inside the element, and the second term results from the integration on the boundary of the element. Again, we express the above weak formulation with broken basis functions in the following abstract form
$$b((u, t), v) = l(v) \tag{24}$$
where
$$b((u, t), v) = b(u, v) + \langle t, v \rangle \tag{25}$$
When we compare the standard weak formulation and the formulation with broken spaces, we can see that in the second one we have more unknowns, namely $(u, t)$. For such a weak formulation with broken test functions, we can compute the optimal test functions automatically. This is the main idea of the DPG (Discontinuous Petrov-Galerkin) method: to construct on the fly, element by element, the optimal test functions that will ensure numerical stability of the method. However, the implementation of this Petrov-Galerkin method is relatively difficult, and we will rather switch to the formulation of the DPG with the special mixed method.

Ad c) We will now focus on the formulation of the DPG method by using the special mixed method, with the error representation function [6]. The error representation function given by $\Psi = R_V^{-1}(Bu_h - l)$
(26) allows us to develop an alternative formulation of the DPG method: Find $\Psi \in V$, $u_h \in U_h$, $t \in \operatorname{tr}_{\Gamma_h} H(\mathrm{div}, K)$ such that
$$(\Psi, \delta\Psi) - b(u, \delta\Psi) - \langle t, \delta\Psi \rangle = -l(\delta\Psi) \quad \text{for all } \delta\Psi \tag{27}$$
$$b(\delta u, \Psi) = 0 \quad \text{for all } \delta u \tag{28}$$
$$\langle \delta t, \Psi \rangle = 0 \quad \text{for all } \delta t \tag{29}$$
Based on the above formulation, we can now construct the element matrices for this problem:
$$\begin{bmatrix} G & -B_1 & -B_2 \\ B_1^T & 0 & 0 \\ B_2^T & 0 & 0 \end{bmatrix} \begin{bmatrix} \Psi \\ u \\ t \end{bmatrix} = \begin{bmatrix} -l \\ 0 \\ 0 \end{bmatrix} \tag{30}$$

Computational cost of generation of element matrices in DPG method
The formulation of the DPG with the special mixed method results in the structure of the element local matrices presented in equation (30) for the case of approximation in $H^1(K)$ spaces. The analogous formulation for $H(\mathrm{div}, K)$ and $H(\mathrm{curl}, K)$ approximations also results in a similar structure of the element local matrix, but with a different distribution of basis functions and degrees of freedom over element vertices, edges, faces and interiors.

In the following estimates we assume three dimensional hexahedral finite elements with 8 vertex, 12 edge, 6 face and 1 interior nodes. Notice that $u \approx \sum_i u_i e_i$ is approximated over the element with polynomials of order $p$ from the space $u_h \in U_h \subset H^1(K)$. This means that we have one degree of freedom per each vertex node, $p - 1$ degrees of freedom per each edge node, $(p - 1)^2$ degrees of freedom per each face node and $(p - 1)^3$ degrees of freedom per each interior node.

Traces $t \approx \sum_{i=1,\dots,O(p^2)} t_i f_i$ are approximated with polynomials of order $p$ from the space $\operatorname{tr}_{\Gamma_h} H(\mathrm{div}, K)$, on the boundary of elements only. This means that we have one degree of freedom per each vertex node, $p - 1$ degrees of freedom per each edge node, and $(p - 1)^2$ degrees of freedom per each face node.

The error representation function $\Psi \approx \sum_{i=1,\dots,O((p+\Delta p)^3)} \Psi_i e_i$ is approximated with polynomials of order $p + \Delta p$ (from the enriched space), also forming a subspace of $H^1(K)$. This means that we have one degree of freedom per each vertex node, $p + \Delta p - 1$ degrees of freedom per each
edge node, $(p + \Delta p - 1)^2$ degrees of freedom per each face node and $(p + \Delta p - 1)^3$ degrees of freedom per each interior node.

Summing up:
• $G$ is the Gram matrix, and it is a block-diagonal matrix.
• $u \approx \sum_{i=1,\dots,O(p^3)} u_i e_i$ is approximated with polynomials of order $p$, which means that over the 3D element there are $O(p^3)$ unknowns related to $u$, since they are defined over element vertices, edges, faces and interiors.
• $t \approx \sum_{i=1,\dots,O(p^2)} t_i f_i$ is approximated with polynomials of order $p$, which means that over the 3D element there are $O(p^2)$ unknowns related to $t$, since the fluxes are defined over element edges and faces.
• $\Psi \approx \sum_{i=1,\dots,O((p+\Delta p)^3)} \Psi_i e_i$ is approximated with polynomials of order $p + \Delta p$, which means that over the 3D element there are $O((p + \Delta p)^3)$ unknowns related to $\Psi$.
Thus, the Gram matrix is square, of size $O((p + \Delta p)^3 \times (p + \Delta p)^3)$. Following the ordering of unknowns in (30), the matrix $B_1$, which couples $\Psi$ with $u$, is rectangular, of size $O((p + \Delta p)^3 \times p^3)$, and the matrix $B_2$, which couples $\Psi$ with $t$, is rectangular, of size $O((p + \Delta p)^3 \times p^2)$. The cost of generation of the Gram matrix is of the order of $O(p^9 + \Delta p^9)$ using Gaussian quadratures.

Parallel OpenMP implementation
In general, the generation of the DPG matrices involves nested loops, starting from the Gauss integration points, through test basis functions, to trial basis functions. The generation of the matrices involves the Gram matrix and the so-called extended HH matrices.

      use omp_lib
c     loop over integration points
!$OMP PARALLEL DO
!$OMP& DEFAULT(SHARED)
!$OMP& PRIVATE(l,xi,wa,weight,k1,v1,dv1,k2,v2,dv2,k,gradH,
!$OMP&   nrdofH,shapH,x,dxdxi,zfval,iflag,rjac,dxidx)
!$OMP& FIRSTPRIVATE(nrdofHH)
!$OMP& REDUCTION(+:BLOADH)
!$OMP& REDUCTION(+:AP)
!$OMP& REDUCTION(+:STIFFHH)
      do l=1,nint
c       Prepare common data (e.g. integration points, weights)
c
c       first loop through enriched H1 test functions
        do k1=1,nrdofHH
c         compute the RHS entry
          BLOADH(k1) = BLOADH(k1) + zfval*v1*weight
c         loop through H1 trial functions
          do k2=1,nrdofHH
            dv2(:) = gradHH(1,k2)*dxidx(1,:)
     .             + gradHH(2,k2)*dxidx(2,:)
     .             + gradHH(3,k2)*dxidx(3,:)
c           -- GRAM MATRIX -- (stored in triangular format)
c           determine index in triangular format
            k = nk(k1,k2)
            AP(k) = AP(k)
     .            + (dv1(1)*dv2(1)+dv1(2)*dv2(2)+dv1(3)*dv2(3)
     .            + v1*v2)*weight
          enddo
c         loop through H1 trial functions
          do k2=1,nrdofH
            v2 = shapH(k2)
            dv2(:) = gradH(1,k2)*dxidx(1,:)
     .             + gradH(2,k2)*dxidx(2,:)
     .             + gradH(3,k2)*dxidx(3,:)
c           -- EXTENDED HH STIFFNESS MATRIX --
c           Poisson equation
            STIFFHH(k1,k2) = STIFFHH(k1,k2)
     .        + (dv1(1)*dv2(1)+dv1(2)*dv2(2)+dv1(3)*dv2(3))*weight
          enddo
        enddo
      enddo
!$OMP END PARALLEL DO

We have parallelized the two external loops, through Gauss integration points and through rows (test functions).

Figure 1: Execution time of the parallel integration algorithm for a single DPG element, when increasing the number of cores. 3D hexahedral element with cubic polynomials.
Figure 2: Parallel efficiency of the parallel integration algorithm for a single DPG element. 3D hexahedral element with cubic polynomials.
Figure 3: Parallel speedup of the parallel integration algorithm for a single DPG element. 3D hexahedral element with cubic polynomials.

Numerical results
We conclude the paper with numerical results for the parallel implementation of the DPG element matrices generator. We focused on the simple Poisson problem, with the pseudo-code described in the previous section. The numerical experiments have been performed for hexahedral DPG elements with cubic polynomials, integrated over a Linux cluster node equipped with 14
cores. We observe the parallel efficiency decreasing from 70%, through 60% on 11 cores, down to 50% on 14 cores. The corresponding speedup reaches up to 6.5 on 10 cores (see Figures 1-3). All computations were performed on a × × elements mesh.

Conclusions
In this paper we presented the scalability of the parallel OpenMP integration of DPG matrices. The parallel integrator has been obtained through OpenMP parallelization of the sequential 3D DPG code developed for the model Poisson problem by the group of Prof. Demkowicz. We observe a speedup of up to 6.5 on 10 cores. Future work will include domain decomposition based parallelization of the DPG code, with hybrid parallelism including MPI on the level of particular elements and OpenMP on the level of matrices.

Acknowledgements
The work of MW was supported by Dean's grant no. 15.11.230.270.

References
[1] Franco Brezzi, Marie-Odile Bristeau, Leopoldo P. Franca, Michel Mallet, and Gilbert Rogé. A relationship between stabilized finite element methods and the Galerkin method with bubble functions. Computer Methods in Applied Mechanics and Engineering, 96:117–129, 1992.
[2] Leszek Demkowicz and Jay Gopalakrishnan. An overview of the DPG method. In Recent Developments in Discontinuous Galerkin Finite Element Methods for Partial Differential Equations, IMA Volumes in Mathematics and its Applications, 157:149–180, 2014.
[3] Leszek Demkowicz and Jay Gopalakrishnan. A class of discontinuous Petrov-Galerkin methods. Part I: The transport equation. 199:1558–1572, 2010.
[4] Leszek Demkowicz and Jay Gopalakrishnan. A class of discontinuous Petrov-Galerkin methods. Part II: Optimal test functions. Numerical Methods for Partial Differential Equations, 27:70–105, 2011.
[5] Leszek Demkowicz, Jay Gopalakrishnan, and Antti Niemi. A class of discontinuous Petrov-Galerkin methods. Part III: Adaptivity. 62:396–427, 2012.
[6] Tim Ellis, Leszek Demkowicz, and Jesse Chan. Locally conservative discontinuous Petrov-Galerkin finite elements for fluid problems. 68:1530–1549, 2014.
[7] Leopoldo P. Franca,
Sérgio L. Frey, and Thomas J.R. Hughes. Stabilized finite element methods: I. Application to the advective-diffusive model. Computer Methods in Applied Mechanics and Engineering, 95:253–276, 1992.
[8] Leopoldo P. Franca and Sérgio L. Frey. Stabilized finite element methods: II. The incompressible Navier-Stokes equations. Computer Methods in Applied Mechanics and Engineering, 99:209–233, 1992.
[9] Franco Brezzi. On the existence, uniqueness and approximation of saddle-point problems arising from Lagrange multipliers. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, 8:129–151, 1974.
[10] F. Gruber, A. Klewinghaus, and O. Mula. The DUNE-DPG library for solving PDEs with discontinuous Petrov-Galerkin finite elements. http://arxiv.org/abs/1602.08338, 2016.
[11] Thomas J.R. Hughes, Guglielmo Scovazzi, and Tayfun Tezduyar. Stabilized methods for compressible flows. Journal of Scientific Computing, 43:343–368, 2010.
[12] Ivo Babuška. Error bounds for finite element method. 16:322–333, 1971.
[13] Leszek Demkowicz. Babuška ↔ Brezzi. Technical Report 0608, The University of Texas at Austin, 2006.
[14] Nathan Roberts. Camelia: A software framework for discontinuous Petrov-Galerkin methods. https://github.com/CamelliaDPG/Camellia, 2016.
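
The degree-of-freedom counts given in the section on computational cost can be made concrete with a short worked example. The sketch below takes $p = 3$ (the cubic polynomials used in the numerical experiments) and assumes an enrichment $\Delta p = 1$, a value chosen here purely for illustration and not stated in the text. The symbols $N_u$, $N_t$ and $N_\Psi$ are introduced only for this example and denote the numbers of unknowns related to $u$, $t$ and $\Psi$ on one hexahedral element (8 vertices, 12 edges, 6 faces, 1 interior).

```latex
\begin{align*}
N_u    &= 8 + 12(p-1) + 6(p-1)^2 + (p-1)^3
        = 8 + 24 + 24 + 8 = 64 = (p+1)^3, \\
N_t    &= 8 + 12(p-1) + 6(p-1)^2
        = 8 + 24 + 24 = 56, \\
N_\Psi &= 8 + 12(p+\Delta p-1) + 6(p+\Delta p-1)^2 + (p+\Delta p-1)^3
        = 8 + 36 + 54 + 27 = 125 = (p+\Delta p+1)^3 .
\end{align*}
```

Under these assumptions the Gram matrix $G$ is $125 \times 125$, so its triangular storage holds $125 \cdot 126 / 2 = 7875$ entries, the block coupling $\Psi$ with $u$ is $125 \times 64$, and the block coupling $\Psi$ with $t$ is $125 \times 56$.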

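
The reduction pattern used in the OpenMP listing can be tried out on a much smaller problem. The following free-form Fortran program is a minimal sketch and not the hp3d routine: it assembles the $H^1$ Gram matrix of the monomials $1, x, x^2$ on $[-1, 1]$ with a 3-point Gauss-Legendre rule, storing the upper triangle in packed form. All names (gram_openmp, ngauss, ndof, ap, and the packed index formula standing in for nk) are hypothetical and chosen for illustration; unlike the listing, the inner loop here starts at k1 so that each packed entry is visited once.

```fortran
! Toy analogue of the element loop: H1 Gram matrix of 1, x, x^2 on [-1,1],
! accumulated over Gauss points with an OpenMP array reduction.
! All names are illustrative and are not taken from hp3d.
program gram_openmp
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ngauss = 3          ! number of Gauss points
  integer, parameter :: ndof = 3            ! number of basis functions
  real(dp) :: xi(ngauss), wa(ngauss)        ! quadrature points and weights
  real(dp) :: ap(ndof*(ndof+1)/2)           ! packed upper triangle of G
  real(dp) :: v(ndof), dv(ndof), x, weight
  integer  :: l, k1, k2, k

  ! 3-point Gauss-Legendre rule, exact for polynomials up to degree 5
  xi = (/ -sqrt(0.6_dp), 0.0_dp, sqrt(0.6_dp) /)
  wa = (/ 5.0_dp/9.0_dp, 8.0_dp/9.0_dp, 5.0_dp/9.0_dp /)

  ap = 0.0_dp
  ! Each thread accumulates into its own copy of ap; the copies are
  ! summed when the loop ends, so no two threads write the same entry.
  !$omp parallel do default(shared) &
  !$omp private(l, x, weight, v, dv, k1, k2, k) &
  !$omp reduction(+:ap)
  do l = 1, ngauss
    x = xi(l)
    weight = wa(l)
    v  = (/ 1.0_dp, x, x*x /)               ! basis values at x
    dv = (/ 0.0_dp, 1.0_dp, 2.0_dp*x /)     ! basis derivatives at x
    do k1 = 1, ndof
      do k2 = k1, ndof
        k = k1 + k2*(k2-1)/2                ! packed index for k1 <= k2
        ap(k) = ap(k) + (dv(k1)*dv(k2) + v(k1)*v(k2))*weight
      end do
    end do
  end do
  !$omp end parallel do

  do k = 1, size(ap)
    print '(a,i0,a,f10.6)', 'ap(', k, ') = ', ap(k)
  end do
end program gram_openmp
```

Compiled without OpenMP support, the directives are treated as comments and the program runs serially, which gives a simple way to compare results. Up to rounding, the packed entries should match the exact integrals 2, 0, 8/3, 2/3, 0 and 46/15. The reduction clause trades memory (one private copy of the array per thread) for the absence of synchronization inside the loop, which is the same trade-off the listing makes for BLOADH, AP and STIFFHH.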