7 SOLUTION OF LARGE SYSTEMS OF EQUATIONS

As we saw from the preceding sections, both the straightforward spatial discretization of a steady-state problem and the implicit time discretization of a transient problem yield a large system of coupled equations of the form

K · u = f .    (7.1)

There are two basic approaches to the solution of this problem: (a) directly, by some form of Gaussian elimination; or (b) iteratively. We will consider both here, as any solver requires one of these two, if not both.

7.1 Direct solvers

The rapid increase in computer memory, and their suitability for shared-memory multiprocessing computing environments, has led to a revival of direct solvers (see, e.g., Giles et al. (1985)). Three-dimensional problems once considered unmanageable due to their size are now being solved routinely by direct solvers (Wigton et al. (1985), Nguyen et al. (1990), Dutto et al. (1994), Luo et al. (1994c)). This section reviews the direct solvers most commonly used.

7.1.1 GAUSSIAN ELIMINATION

This is the classic direct solver. The idea is to add (subtract) appropriately scaled rows in the system of equations in order to arrive at an upper triangular matrix (see Figure 7.1(a)):

K · u = f  →  U · u = f′ .    (7.2)

To see how this is done in more detail, and to obtain an estimate of the work involved, we rewrite (7.1) as

K^{ij} u_j = f^i .    (7.3)

Suppose that the objective is to obtain vanishing entries for all matrix elements located in the jth column below the diagonal K^{jj} entry. This can be achieved by adding to the kth row (k > j) an appropriate fraction of the jth row, resulting in

(K^{kl} + α_k K^{jl}) u_l = f^k + α_k f^j ,   k > j .    (7.4)

Figure 7.1 Direct solvers: (a) Gaussian elimination; (b) Crout decomposition; (c) Cholesky.

Such an addition of rows will not change the final result for u and is therefore allowable. For the elements located in the jth column below the diagonal K^{jj} entry to vanish, we must have

α_k = − K^{kj} / K^{jj} .    (7.5)

The process is started with the first column and repeated for all remaining ones. Once an upper triangular form has been obtained, the solution becomes trivial. Starting from the bottom right entry, the unknowns u are obtained recursively, each one from those already computed below it:

u_i = (U^{ii})^{-1} ( f^i − U^{ij} u_j ) ,   j > i .    (7.6)

The work required to solve a system of equations in this way is as follows. For a full matrix, the matrix triangularization requires O(N) multiplications for each column, i.e. O(N²) operations for all columns. As this has to be repeated for each row, the total estimate is O(N³). The solution phase requires O(N) operations for each row, i.e. O(N²) operations for all unknowns. If the matrix has a banded structure with bandwidth N_ba, these estimates reduce to O(N N_ba²) for the matrix triangularization and O(N N_ba) for the solution phase. Gaussian elimination is seldom used in practice, as the transformation of the matrix changes the RHS vector, thereby rendering it inefficient for systems with multiple RHS.
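As an illustration of (7.4)–(7.6), the following Python/NumPy sketch performs the elimination and the back substitution for a small dense matrix. It is a didactic sketch only (no pivoting, no exploitation of a banded structure), and the function and variable names are illustrative rather than taken from the text.

```python
import numpy as np

def gauss_solve(K, f):
    """Solve K u = f by Gaussian elimination, following (7.4)-(7.6)."""
    K = K.astype(float).copy()     # the triangularization modifies the matrix
    f = f.astype(float).copy()     # ... and the right-hand side
    n = len(f)
    for j in range(n - 1):                       # zero column j below K[j,j]
        for k in range(j + 1, n):
            alpha = -K[k, j] / K[j, j]           # equation (7.5)
            K[k, j:] += alpha * K[j, j:]         # equation (7.4)
            f[k] += alpha * f[j]
    u = np.zeros(n)
    for i in range(n - 1, -1, -1):               # back substitution, (7.6)
        u[i] = (f[i] - K[i, i + 1:] @ u[i + 1:]) / K[i, i]
    return u

# small check against a library solver
K = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
f = np.array([1., 2., 3.])
print(gauss_solve(K, f), np.linalg.solve(K, f))
```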
7.1.2 CROUT ELIMINATION

In this case, the matrix is decomposed into an upper and a lower triangular portion (see Figure 7.1(b)):

K · u = L · U · u = f .    (7.7)

One can choose the diagonal entries of either the upper or the lower triangular matrix to be of unit value. Suppose that the diagonal entries of the upper triangular matrix are so chosen, and that the matrix has been decomposed up to entry (i−1, i−1), i.e. the entries 1 : i−1, 1 : i−1 of L and U are known. The entries along row i are given by

K^{ij} = \sum_{k=1}^{j} L^{ik} U^{kj} .    (7.8)

Given that U^{kk} = 1, we obtain, recursively,

L^{ij} = K^{ij} − \sum_{k=1}^{j−1} L^{ik} U^{kj} .    (7.9)

The entries along column i are given by

K^{ji} = \sum_{k=1}^{j} L^{jk} U^{ki} .    (7.10)

This allows the recursive calculation of U^{ji} from

U^{ji} = ( K^{ji} − \sum_{k=1}^{j−1} L^{jk} U^{ki} ) / L^{jj} .    (7.11)

The value for the diagonal entry L^{ii} is then obtained from

L^{ii} = K^{ii} − \sum_{k=1}^{i−1} L^{ik} U^{ki} .    (7.12)

This completes the decomposition of the ith row and column. The process is started with the first row and column, and repeated for all remaining ones. Once the decomposition is complete, the system is solved in two steps:

- Forward substitution: L · v = f, followed by
- Backward substitution: U · u = v.

Observe that the RHS is not affected by the decomposition process. This allows the simple solution of multiple RHS.

7.1.3 CHOLESKY ELIMINATION

This special decomposition is only applicable to symmetric matrices. The algorithm is almost the same as the Crout decomposition, except that square roots are taken for the diagonal elements. This seemingly innocuous change has a very beneficial effect on rounding errors (Zurmühl (1964)).

All direct solvers have a storage and operation count as follows: Operations: O(N_eq N_ba²); Storage: O(N_eq N_ba), where N_ba is the bandwidth of the matrix. As the bandwidth increases, so do the possibilities for vectorization and parallelization. Very efficient direct solvers for multi-processor vector machines have been reported (Nguyen et al. (1990), Dutto et al. (1994)). As the number of equations N_eq is fixed, the most important issue when trying to optimize direct solvers is the reduction of the bandwidth N_ba. This is an optimization problem that is NP-complete, i.e. many heuristic solutions can be obtained that give the same or nearly the same cost function (bandwidth in this case), but the optimum solution is practically impossible to obtain. Moreover, the optimum solution may not be unique. As an example, consider a square domain discretized by N × N quadrilateral elements. Suppose further that Poisson's equation with Dirichlet boundary conditions is to be solved numerically, and that the spatial discretization consists of bilinear finite elements. Starting any numbering in the same way from each of the four corners will give the same bandwidth, storage and CPU requirements, and hence the same cost function.

Bandwidth reduction implies a renumbering of the nodes, with the aim of bringing all matrix entries closer to the diagonal. The main techniques used to accomplish this are (Piessanetzky (1984)):

- Cuthill–McKee (CMK), and reverse CMK, which order the points according to lowest connectivity with surrounding points at each level of the corresponding graph (Cuthill and McKee (1969));
- wavefront, whereby the mesh is renumbered according to an advancing front; and
- nested dissection, where the argument of bandwidth reduction due to recursive subdivision of domains is employed (George and Liu (1981)).

The first two approaches have been used extensively in structural finite element analysis. Many variations have been reported, particularly for the 'non-exact' parameters such as starting point, depth of search and trees, data structures, etc. Renumbering strategies reappear when trying to minimize cache-misses, and are considered in more depth in Chapter 15.
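To make the CMK idea concrete, the following Python sketch renumbers the points of a mesh graph by a breadth-first traversal that visits neighbours in order of increasing connectivity, and reverses the result (reverse CMK). It is a minimal illustration: the choice of starting point and the tie-breaking rules are exactly the 'non-exact' parameters mentioned above, and the names used here are not from the text.

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """Bandwidth-reducing renumbering (reverse CMK).
    adj: list of neighbour lists, one entry per point (the matrix/mesh graph).
    Returns the points in their new order."""
    n = len(adj)
    degree = [len(a) for a in adj]
    visited = [False] * n
    order = []
    # start each connected component from a point of lowest connectivity
    for start in sorted(range(n), key=lambda i: degree[i]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            p = queue.popleft()
            order.append(p)
            # number the unvisited neighbours by increasing connectivity
            for q in sorted(adj[p], key=lambda i: degree[i]):
                if not visited[q]:
                    visited[q] = True
                    queue.append(q)
    return order[::-1]                    # reverse CMK

# usage: a badly numbered chain 0-3-1-4-2 is renumbered consecutively
adj = [[3], [3, 4], [4], [0, 1], [1, 2]]
print(reverse_cuthill_mckee(adj))
```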
7.2 Iterative solvers

When (7.1) is solved iteratively, the matrix K is not inverted directly, but the original problem is replaced by a sequence of solutions of the form

K̃ · (u^{n+1} − u^n) = K̃ · Δu = τ r = τ (f − K · u^n) .    (7.13)

The vector r is called the residual vector, and K̃ the preconditioning matrix. The case K̃ = K corresponds to a direct solution, and the sequence of solutions stops after one iteration. The aim is to approximate K by some low-cost, yet 'good' K̃. 'Good' in this context means that:

(a) K̃ is inexpensive to decompose or solve for;
(b) K̃ contains relevant information (eigenvalues, eigenvectors) about K.

Unfortunately, these requirements are contradictory. What tends to happen is that the low-eigenvalue (i.e. long-wavelength eigenmode) information is lost when K is approximated by K̃. To circumvent this deficiency, most practical iterative solvers employ a 'globalization' procedure that counteracts this loss of low-eigenvalue information. Both the approximation and the globalization algorithms employed may be grouped into three families: operator-based, grid-based and matrix-based. We just mention some examples here.

Formation of K̃:

F1 Operator-based: approximate factorization of implicit FDM CFD codes (Briley and McDonald (1977), Beam and Warming (1978), MacCormack (1982));
F2 Grid-based: point/element-by-element/red–black/line/zebra/snake/linelet/plane relaxation/Jacobi/Gauss–Seidel/etc. (Wesseling (2004));
F3 Matrix-based: incomplete lower-upper (LU), drop-tolerances, etc. (Nicolaides (1987), Saad (2003), van der Vorst (2003)).

Globalization or acceleration:

G1 Operator-based: Tchebichev, supersteps, etc.;
G2 Grid-based: projection (one coarser grid), multigrid (n coarser grids);
G3 Matrix-based: dominant eigenvalue extrapolation, conjugate gradient (CG), generalized minimal residuals (GMRES), algebraic multigrid (AMG).

7.2.1 MATRIX PRECONDITIONING

In order to be more specific about the different techniques, we rewrite the matrix K either as a sum of lower, diagonal and upper parts

K = L + D + U ,    (7.14)

or as the product of a number of submatrices

K = \prod_{j=1}^{m} K_j .    (7.15)

One can then classify the different preconditioners by the degree of discrepancy (or neglect) between K and K̃.

7.2.1.1 Diagonal preconditioning

The simplest preconditioners are obtained by neglecting all off-diagonal matrix entries, resulting in diagonal preconditioning

K̃ = D .    (7.16)

The physical implication of this simplification is that any transfer of information between points or elements can only be accomplished on the RHS during the iterations (equation (7.13)). This implies that information can only travel one element per iteration, and is similar to explicit timestepping with local timesteps. A minimal sketch of the resulting iteration is given after the block-diagonal variant below.

7.2.1.2 Block-diagonal preconditioning

For systems of equations it is often necessary to revert to block-diagonal preconditioning. All the matrix entries that correspond to edges (i, j), i ≠ j, are still neglected, but the entries coupling the unknowns at each point i are kept. The result is a set of blocks of size neqns*neqns along the diagonal. For the Euler equations, this results in 5 × 5 blocks. For the Navier–Stokes equations with a k–ε turbulence model, 7 × 7 blocks are obtained. As before, the propagation of information between gridpoints can only occur on the RHS during the iterations, at a maximum speed of one element per iteration. The advantage of block-diagonal preconditioning is that it removes the stiffness that may result from equations with different time scales. A typical class of problems for which block-diagonal preconditioning is commonly used is chemically reacting flows, where the time scales of chemical reactions may be orders of magnitude smaller than the (physically interesting) advection time scale of the fluid.
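The following sketch spells out the scheme (7.13) with the diagonal preconditioner (7.16); the block-diagonal variant would only replace the division by the diagonal with small per-point block solves. A Python/NumPy illustration under assumed names:

```python
import numpy as np

def jacobi_iterations(K, f, tau=0.8, niter=200):
    """Preconditioned iterations (7.13) with K~ = D, i.e. diagonal preconditioning (7.16)."""
    u = np.zeros(K.shape[0])
    d = np.diag(K)                    # the preconditioner: diagonal of K
    for _ in range(niter):
        r = f - K @ u                 # residual, evaluated on the RHS
        u += tau * r / d              # K~ du = tau r   =>   du = tau D^-1 r
    return u

# usage: 1-D Laplacian; information crawls one point per iteration
n = 20
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
u = jacobi_iterations(K, f)
print(np.linalg.norm(f - K @ u))
```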
7.2.1.3 LU preconditioning

Although point preconditioners are extremely fast, all the inter-element propagation of information occurs on the RHS, resulting in slow convergence rates. Faster information transfer can only be achieved by neglecting fewer entries of K in K̃, albeit at higher CPU and storage costs. If we recall that the solution of a lower (or upper) triangular system by itself is simple, a natural way to obtain better preconditioners is to attempt a preconditioner of the form

K̃ = K̃_L · K̃_U .    (7.17)

Given a residual r = f − K · u, the new increments are obtained from

K̃_L · Δu* = r ,   K̃_U · Δu = Δu* .    (7.18)

Consider the physical implication of K̃_L: the unknowns corresponding to point i take into account the new values already obtained for points i−1, i−2, i−3, ..., 1. Likewise, for K̃_U, the unknowns corresponding to point i take into account the new values already obtained for points i+1, i+2, i+3, ..., n. This implies that the information flows with the numbering of the points. In order to propagate the information evenly, K̃_L and K̃_U are invoked alternately, but in some cases it may be advisable to have more than one numbering for the points to achieve consistent convergence rates. If the numerical flow of information can be matched to the physical flow of information, a very good preconditioner is achieved. Cases where this is possible are supersonic flows, where the numbering of the points matches the streamlines.

Gauss–Seidel variants

Perhaps the simplest LU preconditioner is given by the choice

K̃_L = L + D ,   K̃_U = D + U .    (7.19)

The resulting matrix K̃ is then

K̃ = K̃_L · K̃_U = (L + D) · (D + U) = L · D + L · U + D · D + D · U .    (7.20)

Comparing this last expression to (7.14), we see that this form of LU decomposition does not approximate the original matrix K well: an extra diagonal term has appeared. This may be remedied by interposing the inverse of the diagonal between the lower and upper matrices, resulting in

K̃ = K̃_L · D^{-1} · K̃_U = (L + D) · D^{-1} · (D + U) = K + L · D^{-1} · U .    (7.21)

The error may also be mitigated by adding, for subsequent iterations, a correction with the latest information of the unknowns. This leads to two commonly used schemes:

- Gauss–Seidel (GS)

(L + D) · Δu^1 = r − U · Δu^0 ,   (D + U) · Δu = r − L · Δu^1 ,    (7.22)

which is equivalent to

K · Δu = (L + D + U) · Δu = r + L · (Δu − Δu^1) ;    (7.23)

- lower-upper symmetric GS (LU-SGS)

(L + D) · D^{-1} · (D + U) · Δu = r + L · D^{-1} · U · Δu^0 ,

which is equivalent to

K · Δu = (L + D + U) · Δu = r + L · D^{-1} · U · (Δu^0 − Δu) .    (7.24)

In most cases Δu^0 = 0. GS and LU-SGS have been used extensively in CFD, both as solvers and as preconditioners. In this context, very elaborate techniques that combine physical insight, local eigenvalue decompositions and scheme switching have produced very fast and robust preconditioners (Sharov and Nakahashi (1998), Luo et al. (1998), Sharov et al. (2000a), Luo et al. (2001)).
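As a minimal illustration of the symmetric sweep, the routine below applies the preconditioner K̃ = (L + D) · D^{-1} · (D + U) of (7.21) to a residual by one forward and one backward triangular solve, and then uses it inside the basic iteration (7.13). This is a dense-storage Python/NumPy/SciPy sketch with assumed names, not the edge-based implementation an actual flow solver would use.

```python
import numpy as np
from scipy.linalg import solve_triangular

def sgs_apply(K, r):
    """Solve K~ du = r with K~ = (L + D) D^-1 (D + U), equation (7.21)."""
    LpD = np.tril(K)                                          # L + D
    DpU = np.triu(K)                                          # D + U
    y = solve_triangular(LpD, r, lower=True)                  # forward sweep
    du = solve_triangular(DpU, np.diag(K) * y, lower=False)   # backward sweep
    return du

# usage: symmetric GS sweeps as a stand-alone solver for a 1-D Laplacian
n = 20
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
u = np.zeros(n)
for _ in range(100):
    u += sgs_apply(K, f - K @ u)
print(np.linalg.norm(f - K @ u))
```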
Diagonal+1 preconditioning

Consider a structured grid of m × n points. Furthermore, assume that a discretization of the Laplacian using the standard stencil

− u_{i−1,j} − u_{i,j−1} + 4 u_{i,j} − u_{i+1,j} − u_{i,j+1} = r_{i,j}    (7.25)

is being performed. The resulting matrix K for the numbering shown in Figure 7.2(a) is depicted in Figure 7.2(b). As one can see, K consists of a tridiagonal core D′ and regular off-diagonal 'bands'. If K is rewritten as

K = L′ + D′ + U′ ,    (7.26)

the diagonal+1 preconditioning is defined by

K̃ = D′ .    (7.27)

Figure 7.2 Matrix resulting from an m × n structured grid: (a) point numbering; (b) matrix structure.

In this case, the information will be propagated rapidly via the LHS between all the points that form a tridiagonal system of equations, and slowly on the RHS between all other points. As before, the ordering of the points guides the propagation of information during the iterative procedure. Tridiagonal or block-tridiagonal systems result from point orderings that form 'lines' or 'snakes' when the points are renumbered. For example, the ordering shown in Figure 7.2 results in a considerable number of 'lines', leading to an equal number of tridiagonal systems. Given that the fastest information flow is according to the point numbering, the renumbering of points should form 'lines' that are in the direction of maximum stiffness. In this way, the highest possible similarity between K and K̃ is achieved (Hassan et al. (1990), Martin and Löhner (1992), Mavriplis (1995), Soto et al. (2003)). For cases where no discernible spatial direction for stiffness exists, several point renumberings should be employed, with the aim of covering as many (i, j), i ≠ j entries as possible.

Diagonal+1 Gauss–Seidel

As before, the unknowns already obtained during the solution of K̃ = D′ can be re-used with minor additional effort, resulting in diagonal+1 GS preconditioning. For structured grids, this type of preconditioning is referred to as line GS relaxation. The resulting preconditioning matrices are of the form

K̃_L = L′ + D′ ,   K̃_U = D′ + U′ .    (7.28)
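The tridiagonal systems behind K̃ = D′ can be solved in O(N) operations with the classical tridiagonal (Thomas) algorithm; in a line or linelet preconditioner one such solve is performed per 'line' of points. A Python sketch with assumed names:

```python
import numpy as np

def tridiag_solve(a, b, c, r):
    """Solve one tridiagonal 'line' system: a = sub-, b = main, c = super-diagonal.
    a[0] and c[-1] are not used."""
    n = len(b)
    b = b.astype(float).copy()
    r = r.astype(float).copy()
    for i in range(1, n):                 # forward elimination
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        r[i] -= w * r[i - 1]
    u = np.zeros(n)
    u[-1] = r[-1] / b[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        u[i] = (r[i] - c[i] * u[i + 1]) / b[i]
    return u

# usage: one line of the 2-D stencil (7.25), coefficients (-1, 4, -1)
n = 10
print(tridiag_solve(-np.ones(n), 4 * np.ones(n), -np.ones(n), np.ones(n)))
```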
7.2.1.4 Incomplete lower-upper preconditioning

All the preconditioners described so far avoided the large operation count and storage requirements of a direct inversion of K by staying close to the diagonal when operating with K̃. For incomplete lower-upper (ILU) preconditioning, the product decomposition of the Crout solver

K = L · U    (7.29)

is used, but the fill-in that occurs for all the off-diagonal entries within the band is neglected. This rejection of fill-in can be based on integer logic (i.e. allowing NFILR fill-ins per column) or on some drop-tolerance (i.e. neglecting all entries whose magnitudes are below a threshold). The resulting preconditioning matrix is of the form

K̃ = L̃ · Ũ .    (7.30)

If K is tridiagonal, then K̃ = K, implying perfect preconditioning. The observation often made is that the quality of K̃ depends strongly on the bandwidth, which in turn depends on the point numbering (Duff and Meurant (1989), Venkatakrishnan and Mavriplis (1993, 1995)). The smaller the bandwidth, the closer K̃ is to K, and the better the preconditioning. This is to be expected for problems with no discernible stiffness direction. If, on the other hand, a predominant stiffness direction exists, the point numbering should be aligned with it. This may or may not result in small bandwidths (see Figure 7.3 for a counterexample), but is certainly the most advisable way to renumber the points.

Figure 7.3 Counterexample.

Before going on, the reader should consider the storage requirements of ILU preconditioners. Assuming the lower bound of no allowed fill-in (nfilr=0), a discretization of space using linear tetrahedra and neqns unknowns per point, we require nstor=2*neqns*neqns*nedge storage locations for L̃, Ũ, which for the Euler or laminar Navier–Stokes equations with neqns=5 and on a typical mesh with nedge=7*npoin translates into nstor=350*npoin storage locations. Given that a typical explicit Euler solver on the same type of grid only requires nstor=90*npoin storage locations, it is not difficult to see why even one more layer of fill-in is seldom used.
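The following Python/NumPy sketch shows a zero-fill-in factorization (the nfilr=0 case) on a dense-stored matrix: updates are only applied where K itself has non-zero entries, and all other fill-in is dropped. Note that, unlike the Crout convention used in (7.8)–(7.12), this sketch follows the common convention of a unit diagonal in L̃; names and storage layout are illustrative only.

```python
import numpy as np

def ilu0(K):
    """ILU(0): L~ and U~ share the sparsity pattern of K (no fill-in).
    L~ (unit diagonal, stored strictly below) and U~ are returned in one array."""
    n = K.shape[0]
    A = K.astype(float).copy()
    nz = K != 0.0                          # the pattern that is kept
    for i in range(1, n):
        for k in range(i):
            if not nz[i, k]:
                continue
            A[i, k] /= A[k, k]             # multiplier, becomes L~[i,k]
            for j in range(k + 1, n):
                if nz[i, j]:               # drop updates outside the pattern
                    A[i, j] -= A[i, k] * A[k, j]
    return A

def ilu0_apply(A, r):
    """Preconditioner application: solve L~ U~ du = r by forward/back substitution."""
    n = len(r)
    y = r.astype(float).copy()
    for i in range(n):                     # L~ has a unit diagonal
        y[i] -= A[i, :i] @ y[:i]
    du = np.zeros(n)
    for i in range(n - 1, -1, -1):
        du[i] = (y[i] - A[i, i + 1:] @ du[i + 1:]) / A[i, i]
    return du

# for a tridiagonal K there is no fill-in, so the preconditioner is exact
n = 8
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = ilu0(K)
print(np.linalg.norm(ilu0_apply(A, np.ones(n)) - np.linalg.solve(K, np.ones(n))))
```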
7.2.1.5 Block methods

Considering that the cost of direct solvers scales with the square of the bandwidth, another possibility is to decompose K into blocks. These blocks are then solved for directly. The reduction in cost is a result of neglecting all matrix entries outside the block, leading to lower bandwidths. With the notation of Figure 7.4, we may decompose K additively as

K = L_b + D_b + U_b ,    (7.31)

or as a product of block matrices:

K = \prod_{j=1}^{m} K_j .    (7.32)

Figure 7.4 Decomposition of a matrix.

For the additive decomposition, one can either operate without reusing the unknowns at the solution stage, i.e. just on the diagonal level,

K̃ = D_b ,    (7.33)

or, analogous to Gauss–Seidel, by reusing the unknowns at the solution stage,

K̃_L = L_b + D_b ,   K̃_U = U_b + D_b .    (7.34)

For the product decomposition, the preconditioner is of the form

K̃ = D^{1/2} \prod_{j=1}^{m} (I + E_b^j) D^{1/2} ,    (7.35)

where I denotes the identity matrix, and E_b^j contains the off-diagonal block entries, scaled by D. As before, the propagation of information is determined by the numbering of the blocks. Typical examples of this type of preconditioning are element-by-element (Hughes et al. (1983a,c)) or group-by-group (Tezduyar and Liou (1989), Tezduyar et al. (1992a), Liou and Tezduyar (1992)) techniques.

7.2.2 GLOBALIZATION PROCEDURES

As seen from the previous section, any form of preconditioning neglects some information from the original matrix K. The result is that, after an initially fast convergence, a very slow rate of convergence sets in. In order to avert this behaviour, a number of acceleration or globalization procedures have been devised. The description that follows starts with the analytical ones, and then proceeds to matrix-based and grid-based acceleration. Let us recall the basic iterative scheme:

K̃ · (u^{n+1} − u^n) = K̃ · Δu = τ · r = τ · (f − K · u^n) .    (7.36)

7.2.2.1 Tchebichev acceleration

This type of acceleration procedure may best be explained by considering the matrix system that results from the discretization of the Laplace operator on a grid of linear elements of constant size h_x, h_y, h_z. If the usual 3/5/7-star approximation to the Laplacian

∇²u = 0    (7.37)

is employed, the resulting discretization at node i, j, k for the Jacobi iterations with K̃ = D and τ = Δt takes the form

4 (1 + a² + b²) Δu_{i,j,k} = Δt [ (u_{i−1,j,k} − 2u_{i,j,k} + u_{i+1,j,k}) + a² (u_{i,j−1,k} − 2u_{i,j,k} + u_{i,j+1,k}) + b² (u_{i,j,k−1} − 2u_{i,j,k} + u_{i,j,k+1}) ] ,    (7.38)

with a = h_x/h_y, b = h_x/h_z. Inserting the Fourier mode

u = g^p_{m,n,l} exp( iπx / (m h_x) ) exp( iπy / (n h_y) ) exp( iπz / (l h_z) )    (7.39)

yields a decay factor g_{m,n,l} between iterations of the form

g_{m,n,l} = 1 − Δt f(a, b, m, n, l) ,    (7.40)

with

f(a, b, m, n, l) = [ (1 − cos(π/m)) + a² (1 − cos(π/n)) + b² (1 − cos(π/l)) ] / [ 2 (1 + a² + b²) ] .    (7.41)

Note that we have lumped the constant portions of the grid and the mode into the function f(a, b, m, n, l). After p timesteps with varying Δt, the decay factor will be given by

g^p_{m,n,l} = \prod_{q=1}^{p} [ 1 − Δt_q f(a, b, m, n, l) ] .    (7.42)

The objective is to choose a sequence of timesteps so that as many eigenmodes as possible are reduced. The following two sequences have been used with success to date:

(a) Tchebichev sequence (Löhner and Morgan (1987)):

Δt_q = 2 / ( 1 + cos[ π (q − 1) / p ] ) ,   q = 1, ..., p ;    (7.43a)

(b) superstep sequence (Gentzsch and Schlüter (1978), Gentzsch (1980)):

Δt_q = 2 / ( 1 + R/p² + cos[ π (2q − 1) / (2p) ] ) ,   q = 1, ..., p ,   R = 2.5 .    (7.43b)

Observe that in both cases the maximum timestep is of order Δt = O(p²), which is outside the stability range. The overall procedure nevertheless remains stable, as the smaller timesteps 'rescue' the stability. Figure 7.5 compares the performance of the two formulas for the 1-D case with that of uniform timesteps. The improvement in residual reduction achieved by the use of non-uniform timesteps is clearly apparent.

Figure 7.5 Damping curves for the Laplacian: damping factor versus mode phase angle φ for the sequences (7.43a), (7.43b) and a uniform timestep τ = 0.8.

Returning to (7.42), let us determine the magnitude of Δt required to eliminate a certain mode. For any given mode g_{m,n,l}, the mode can be eliminated by choosing a timestep of magnitude

Δt = 2 (1 + c²) / [ (1 − cos(π/m)) + a² (1 − cos(π/n)) + b² (1 − cos(π/l)) ] ,    (7.44)

with c² = a² + b². The timesteps required to eliminate the three highest modes have been summarized in Table 7.1. Two important trends may be discerned immediately.

(a) The magnitude of Δt or, equivalently, the number of iterations required to eliminate the three highest modes, increases with the dimensionality of the problem. This is the case even for uniform grids (a = b = 1).

(b) For non-uniform grids, the magnitude of Δt necessary to eliminate modes increases quadratically with the aspect ratio of the elements. This is a reflection of the second-order derivatives that characterize the Laplace operator. For uniform timesteps/overrelaxation, the number of iterations would increase quadratically with the aspect ratio, whereas for varying Δt it increases linearly.

Table 7.1 Timestep required to eliminate mode (π/m, π/n, π/l)

ndimn   Modes               Δt            Modes                     Δt              Modes                     Δt
1       π                   1             π/2                       2               π/3                       4
2       π,π / π,0           1 / 1+a²      π/2,π/2 / π/2,0           2 / 2(1+a²)     π/3,π/3 / π/3,0           4 / 4(1+a²)
3       π,π,π / π,0,0       1 / 1+c²      π/2,π/2,π/2 / π/2,0,0     2 / 2(1+c²)     π/3,π/3,π/3 / π/3,0,0     4 / 4(1+c²)

The timestepping/overrelaxation sequence outlined above for the Laplace equation on uniform grids can also be applied to general, unstructured grids.
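The effect of a non-uniform timestep sequence is easy to reproduce numerically. The short Python script below evaluates the cumulative damping factor (7.42) for the 1-D Laplacian (a = b = 0 in (7.41)) over a range of mode angles, comparing a uniform timestep with a Tchebichev-type sequence of the form (7.43a) as reconstructed above; the constants are therefore illustrative rather than authoritative, and the script only mimics the comparison shown in Figure 7.5.

```python
import numpy as np

def damping(phis, dts):
    """Cumulative damping factor (7.42) for 1-D Jacobi on the Laplacian:
    g(phi) = prod_q [1 - dt_q f(phi)],  with  f(phi) = (1 - cos phi) / 2."""
    f = (1.0 - np.cos(phis)) / 2.0
    g = np.ones_like(phis)
    for dt in dts:
        g = g * (1.0 - dt * f)
    return g

p = 5
phis = np.linspace(np.pi / p, np.pi, 200)        # modes targeted by the sequence
uniform = [0.8] * p                              # constant timestep, for reference
tcheb = [2.0 / (1.0 + np.cos(np.pi * (q - 1) / p)) for q in range(1, p + 1)]
for name, seq in (("uniform tau=0.8", uniform), ("Tchebichev-type", tcheb)):
    print(name, "max |damping| =", np.abs(damping(phis, seq)).max())
```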
7.2.2.2 Dominant eigenvalue acceleration

Consider the preconditioned iterative scheme

K̃ · (u^{n+1} − u^n) = K̃ · Δu = τ · r = τ · (f − K · u^n) .    (7.45)

This scheme may be interpreted as an explicit timestepping scheme of the form

K̃ · du/dt + K · u = f ,    (7.46)

or, setting A = K̃^{-1} · K and g = K̃^{-1} · f,

du/dt + A · u = g .    (7.47)

Assuming A, g constant, the solution to this system of ODEs is given by

u = u_∞ + e^{−At} (u_0 − u_∞) ,    (7.48)

where u_0, u_∞ denote the starting and steady-state values of u, respectively. Assembling all eigenvalue equations

A · y_i = λ_i y_i    (7.49)

into one large matrix equation results in

A · Y = Y · Λ ,   Λ = diag(λ_min, ..., λ_max) ,   Y = [y_1, y_2, ...] ,    (7.50)

or

A = Y · Λ · Y^{-1} .    (7.51)

We can now expand e^{−At} as a series,

e^{−At} = \sum_{n=0}^{∞} (−At)^n / n! .    (7.52)

Close inspection of a typical term in this series reveals that

(−At)^n / n! = [Y · Λ · Y^{-1}]^n (−t)^n / n! = Y ( (−Λt)^n / n! ) Y^{-1} ,    (7.53)

implying that

e^{−At} = Y e^{−Λt} Y^{-1} .    (7.54)

We can therefore rewrite (7.48) as

u = u_∞ + Y e^{−Λt} Y^{-1} (u_0 − u_∞) .    (7.55)

As time, or equivalently the number of iterations in (7.45), increases, the lowest eigenvalues begin to dominate the convergence to the steady state, or equivalently the solution. As the solution reaches a steady state (t → ∞), the solution approaches

u − u_∞ = a e^{−λ_min t} ,   a = u_0 − u_∞ .    (7.56)

If a way can be found to somehow detect this eigenvalue and its associated eigenvector, an acceleration procedure may be devised. By differentiating (7.56) we obtain

Δu = −λ_min Δt e^{−λ_min t} a = −λ_min Δt (u − u_∞) ,    (7.57)

implying that

u_∞ = u + Δu / (λ_min Δt) = u + α Δu ,    (7.58)

with

α = 1 / (λ_min Δt) .    (7.59)

Given that α is a scalar, and we are trying to infer information from a vector, many possibilities exist to determine it. The most natural choice is to differentiate (7.57) once more to obtain the second differences

Δ(Δu) = −λ_min Δt Δu ,    (7.60)

and then dot (i.e. weigh) this equation with either Δu or Δ(Δu) to obtain

α = 1 / (λ_min Δt) = − ( Δu · Δu ) / ( Δu · Δ(Δu) ) ,    (7.61a)

α = 1 / (λ_min Δt) = − ( Δ(Δu) · Δu ) / ( Δ(Δu) · Δ(Δu) ) .    (7.61b)

Either of these two procedures can be used. The application of an overrelaxation factor α to augment the basic iterative procedure is only useful once the low-frequency modes start to dominate. This implies that the value of α should not change appreciably between iterations. Typically, this dominant-eigenvalue acceleration is applied if two consecutive values of α do not differ by more than a small amount, i.e.

| Δα / α | < c ,    (7.62)

where, typically, c ≤ 0.05 (Zienkiewicz and Löhner (1985)). The effect of applying this simple procedure for positive definite matrix systems stemming from second-order operators like the Laplacian is a reduction of the asymptotic number of iterations from O(N_g²) to O(N_g), where N_g is the graph depth associated with the spatial discretization of the problem. An extension of this technique to non-symmetric matrix systems was proposed by Hafez et al. (1985).
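A minimal sketch of this extrapolation, wrapped around a Jacobi-preconditioned iteration: α is estimated from (7.61b) and applied once two successive estimates agree to within the tolerance (7.62). The restart logic and the parameter values are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def iterate_with_eigenvalue_acceleration(K, f, tau=0.8, c=0.05, niter=300):
    d = np.diag(K)
    u = np.zeros_like(f, dtype=float)
    du_old = alpha_old = None
    for _ in range(niter):
        du = tau * (f - K @ u) / d                    # basic increment (7.45)
        if du_old is not None:
            ddu = du - du_old                         # second difference (7.60)
            denom = ddu @ ddu
            if denom > 0.0:
                alpha = -(ddu @ du) / denom           # equation (7.61b)
                if alpha_old is not None and abs(alpha - alpha_old) < c * abs(alpha):
                    u += alpha * du                   # extrapolation (7.58)
                    du_old = alpha_old = None         # restart the estimate
                    continue
                alpha_old = alpha
        u += du
        du_old = du
    return u

n = 50
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
u = iterate_with_eigenvalue_acceleration(K, f)
print(np.linalg.norm(f - K @ u))
```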
7.2.2.3 Conjugate gradients

For symmetric positive definite systems, conjugate gradient algorithms (Hestenes and Stiefel (1952)) are commonly used, as they offer the advantages of easy programming, low memory overhead, and excellent vectorization and parallelization properties. Rewriting the original system of equations

K · u = f    (7.63)

leads to the basic iterative step given by

Δu^k = α_k ( −r^{k−1} + e_{k−1} Δu^{k−1} ) = α_k v^k ,    (7.64)

where Δu^k, r^{k−1} denote the increment and residual (r = f − K · u) vectors, respectively, and α_k, e_{k−1} are scaling factors. Observe that the biggest difference between this procedure and the simple iterative scheme given by (7.45) is the appearance of a second search direction and a corresponding scaling factor e_{k−1}. The choice of e_{k−1} is performed in such a way that the increments of two successive steps are orthogonal with respect to some norm. The natural norm for (7.64) is the matrix norm K, i.e.

Δu^{k−1} · K · Δu^k = 0 .    (7.65)

This immediately leads to

e_{k−1} = ( r^{k−1} · K · Δu^{k−1} ) / ( Δu^{k−1} · K · Δu^{k−1} ) .    (7.66)

This expression may be simplified further by observing that

r^{k−1} − r^{k−2} = −K · Δu^{k−1} ,    (7.67)

yielding

e_{k−1} = ( r^{k−1} · (r^{k−1} − r^{k−2}) ) / ( Δu^{k−1} · (r^{k−1} − r^{k−2}) ) .    (7.68)

The scaling factor α_k is obtained similarly to (7.66), i.e. by forcing

K · (u^{k−1} + Δu^k) = f    (7.69)

in a 'matrix-weighted' sense by multiplication with Δu^k:

Δu^k · K · Δu^k = Δu^k · r^{k−1} .    (7.70)

Upon insertion of (7.64) into (7.70) we obtain

α_k = ( v^k · r^{k−1} ) / ( v^k · K · v^k ) .    (7.71)

The amount of work required during each iteration consists of several scalar products and one matrix–vector multiplication, which is equivalent to a RHS evaluation. Theoretically, the conjugate gradient algorithm will converge to the exact solution in at most N_eq iterations, where N_eq is the number of equations to be solved. However, conjugate gradient algorithms are only of interest because they usually converge much faster than this pessimistic estimate. On the other hand, the transfer of information between the unknowns only occurs on the RHS, and therefore, for a problem with graph depth N_g, the minimum algorithmic complexity of the conjugate gradient algorithm is of O(N_g · N_eq). The only way to reduce this complexity is through preconditioning procedures that go beyond nearest-neighbour information.

7.2.2.4 Generalized minimal residuals

For unsymmetric matrices, such as those that arise commonly when discretizing the Euler and Navier–Stokes equations, the conjugate gradient algorithm will fail to produce acceptable results. The main reason for this failure is that, with only two free parameters α_k, e_k, no complex eigenvectors can be treated properly. In order to be able to treat such problems, the space in which the new increment Δu^k is sought has to be widened beyond the two vectors r^{k−1}, Δu^{k−1}. This vector subspace is called a Krylov space. We will denote the vectors spanning it by v^k, k = 1, ..., m. In order to construct an orthonormal basis in this space, the following Gram–Schmidt procedure is employed:

(a) starting vector (= residual)

v^1 = r^n / |r^n| ,   r^n = f − K · u^n ;    (7.72)

(b) for j = 1, 2, ..., m − 1 take

w^{j+1} = K · v^j − \sum_{i=1}^{j} h_{ij} v^i ,   h_{ij} = v^i · K · v^j ;    (7.73)

v^{j+1} = w^{j+1} / |w^{j+1}| .    (7.74)

Observe that, like all other iterative procedures, the first vector lies in the direction of the residual. Moreover,

v^l · v^{j+1} = (1 / |w^{j+1}|) [ v^l · K · v^j − \sum_{i=1}^{j} (v^i · K · v^j) (v^l · v^i) ] .    (7.75)

Assuming that the first j vectors are orthonormal, the only inner product left in the last expression is the one for which i = l, whence

v^l · v^{j+1} = (1 / |w^{j+1}|) [ v^l · K · v^j − v^l · K · v^j ] = 0 ,    (7.76)

which proves the orthogonalization procedure. When the set of search vectors v^k, k = 1, ..., m, has been constructed, the increment in the solution is sought as a linear combination

Δu = v^k a_k ,    (7.77)

in such a way that the residual is minimized in a least-squares sense (hence the name generalized minimal residuals (Saad and Schultz (1986), Wigton et al. (1985), Venkatakrishnan (1988), Saad (1989), Venkatakrishnan and Mavriplis (1993))):

| K · (u^n + Δu) − f |² → min ,    (7.78)

or

| K · (v^k a_k) − r^n |² → min .    (7.79)

The solution to this minimization problem leads to the matrix problem

A_{kl} a_l = (K · v^k) · (K · v^l) a_l = (K · v^k) · r^n = b_k ,    (7.80)

or

A · a = b .    (7.81)

The amount of work required for each GMRES iteration with m search directions consists of O(m²) scalar products and O(m) matrix–vector multiplications, each of which is equivalent to a RHS evaluation. As with the conjugate gradient method, the transfer of information between the unknowns only occurs on the RHS, and therefore, for a model problem with graph depth N_g, the minimum algorithmic complexity of the GMRES algorithm is of O(N_g · N_eq). The only way to reduce this complexity is through preconditioning procedures that go beyond nearest-neighbour information (e.g. using LU-SGS, see Luo et al. (1998)).
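The following Python/NumPy sketch follows the formulation (7.72)–(7.81) directly: it builds the orthonormal Krylov basis by Gram–Schmidt and then solves the small system (7.80) for the coefficients a_k. It is an unpreconditioned, dense illustration (production GMRES codes usually solve the minimization through a Hessenberg least-squares problem rather than the normal equations); the names are assumptions.

```python
import numpy as np

def gmres_correction(K, u, f, m=10):
    """One GMRES(m) update of u following (7.72)-(7.81)."""
    r = f - K @ u
    V = [r / np.linalg.norm(r)]              # (7.72): first vector = residual
    for j in range(m - 1):
        Kv = K @ V[j]
        w = Kv.copy()
        for vi in V:                          # (7.73): h_ij = v_i . K v_j
            w -= (vi @ Kv) * vi
        V.append(w / np.linalg.norm(w))       # (7.74)
    V = np.array(V)                           # rows are v_1 ... v_m
    KV = V @ K.T                              # rows are K v_k
    A = KV @ KV.T                             # (7.80): A_kl = (K v_k).(K v_l)
    b = KV @ r                                #          b_k  = (K v_k).r
    a = np.linalg.solve(A, b)
    return u + V.T @ a                        # (7.77): du = sum_k a_k v_k

# usage: restarted GMRES(10) on a small unsymmetric system
n = 40
K = 2 * np.eye(n) - np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
f = np.ones(n)
u = np.zeros(n)
for _ in range(5):
    u = gmres_correction(K, u, f)
print(np.linalg.norm(f - K @ u))
```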
7.3 Multigrid methods

Among the various ways devised to solve efficiently a large system of equations of the form

K · u = f ,    (7.82)

multigrid solvers are among the most efficient. Their theoretical algorithmic complexity is only of O(N_eq log N_eq), which is far better than direct solvers, GMRES or conjugate gradient iterative solvers, or any other solver for that matter. The concept of the algorithm, which is grid-based but may be extended to a more general matrix-based technique called algebraic multigrid (AMG), dates back to the 1960s (Fedorenko (1962, 1964)). The first successful application of multigrid techniques in CFD was for the solution of potential flow problems given by the Laplace or full potential equations (Jameson and Caughey (1977), Jameson (1979)). The concept was later extended to the Euler equations (Jameson et al. (1981), Jameson and Yoon (1985), Mavriplis and Jameson (1987), Mavriplis (1991b)) and developed further for the RANS equations (Rhie (1986), Martinelli and Jameson (1988), Alonso et al. (1995), Mavriplis (1995, 1996), Mavriplis et al. (2005)). The present section is intended as an introduction to the concept. For a thorough discussion, see Hackbusch and Trottenberg (1982), Brand (1983), Ruge and Stüben (1985), Trottenberg et al. (2001) and Wesseling (2004). Some example cases that show the versatility of the technique are also included.

7.3.1 THE MULTIGRID CONCEPT

The basic concept of any multigrid solver may be summarized as follows. Given a system of equations like (7.82) to be solved on a fine grid denoted by the superscript fg,

K^fg · u^fg = f^fg ,    (7.83)

solve iteratively for u^fg until the residual

r^fg = f^fg − K^fg · u^fg    (7.84)

is smooth. This is usually achieved after a few (