Advances in Computational Mathematics 10 (1999) 115–133

Runge–Kutta–Nyström-type parallel block predictor–corrector methods*

Nguyen Huu Cong (a), Karl Strehmel (b), Rüdiger Weiner (b) and Helmut Podhaisky (b)

(a) Faculty of Mathematics, Mechanics and Informatics, Hanoi University of Sciences, 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam
(b) FB Mathematik und Informatik, Martin-Luther-Universität Halle-Wittenberg, Theodor-Lieser-Str. 5, D-06120 Halle, Germany

Received July 1997; revised November 1998
Communicated by K. Burrage

This paper describes the construction of block predictor–corrector methods based on Runge–Kutta–Nyström correctors. Our approach is to apply the predictor–corrector method not only with stepsize h, but, in addition (and simultaneously), with stepsizes a_i h, i = 1, …, r. In this way, at each step, a whole block of approximations to the exact solution at off-step points is computed. In the next step, these approximations are used to obtain a high-order predictor formula using Lagrange or Hermite interpolation. Since the block approximations at the off-step points can be computed in parallel, the sequential costs of these block predictor–corrector methods are comparable with those of a conventional predictor–corrector method. Furthermore, by using Runge–Kutta–Nyström corrector methods, the computation of the approximation at each off-step point is also highly parallel. Numerical comparisons on a shared memory computer show the efficiency of the methods for problems with expensive function evaluations.

Keywords: Runge–Kutta–Nyström methods, predictor–corrector methods, stability, parallelism

AMS subject classification: 65M12, 65M20

* This work was supported by a three-month DAAD research grant.
© J.C. Baltzer AG, Science Publishers

1. Introduction

Consider the numerical solution of nonstiff initial value problems (IVPs) for systems of special second-order, ordinary differential equations (ODEs)

    y''(t) = f(t, y(t)),   t_0 <= t <= T,   y(t_0) = y_0,   y'(t_0) = y_0',          (1.1)

where y : R -> R^d, f : R × R^d -> R^d. Problems of the form (1.1) are encountered in, e.g., celestial mechanics. A (simple) approach for solving this problem is to convert it into a system of first-order ODEs of double dimension and to apply, e.g., a (parallel) Runge–Kutta-type method (RK-type method), ignoring the special form of (1.1) (the indirect approach). However, taking into account the fact that f does not depend on the first derivative, the use of a direct method tuned to the special form of (1.1) is usually more efficient (the direct approach). Such direct methods are generally known as Runge–Kutta–Nyström-type methods (RKN-type methods). Sequential explicit RKN methods of order up to 10 can be found in [9–11,14]. The performance of the tenth-order explicit RK method requiring 17 sequential f-evaluations in [12], compared with the tenth-order explicit RKN method requiring only 11 sequential f-evaluations in [14], is an example showing the advantage of the direct approach for sequential explicit RK and RKN methods. It is highly likely that in the class of parallel methods, the direct approach also leads to improved efficiency. In the literature, several classes of parallel explicit RKN-type methods have been investigated in [2–5,17]. A common challenge in these papers is to reduce, for a given order, the required number of sequential f-evaluations per step by using parallel processors.

In the present paper, we investigate a particular class of explicit RKN-type block predictor–corrector methods (PC methods) for use on parallel computers. Our approach consists of applying the PC method not only at step points, but also at off-step points (block points), so that, in each step, a whole block of approximations to the exact solution is computed. This approach was first used in [8] for increasing reliability in explicit RK methods. It was also successfully applied in [19] for improving the efficiency of RK-type PC methods. We shall use this approach to construct PC methods of RKN type requiring a small number of sequential f-evaluations per step and possessing acceptable stability properties. In this case, as in [19], the block of approximations is used to obtain a highly accurate predictor formula in the next step by means of Lagrange or Hermite interpolation. The precise location of the off-step points can be used for minimizing the interpolation errors and also for developing various cheap strategies for stepsize control. Since the approximations to the exact solution at the off-step points to be computed in each step can be obtained in parallel, the sequential costs of the resulting RKN-type block PC methods are equal to those of conventional PC methods. Furthermore, by using Runge–Kutta–Nyström corrector methods, the PC iteration computing the approximation to the exact solution at each off-step point is itself also highly parallel (cf. [3,17]). The parallel RKN-type PC methods investigated in this paper may be considered as block versions of the parallel-iterated RKN methods (PIRKN methods) in [3,17], and will therefore be termed block PIRKN methods (BPIRKN methods). Moreover, by using direct RKN correctors, we obtain BPIRKN methods possessing both faster convergence and smaller truncation errors, resulting in better efficiency than with indirect RKN correctors (cf., e.g., [3]).

Starting with section 2, where the definition of Runge–Kutta–Nyström methods is given, we formulate the BPIRKN methods in section 3. Furthermore, we consider order conditions for the predictor, convergence and stability boundaries, and the choice of block abscissas. In section 4 we report the numerical results obtained by the BPIRKN methods of orders 5, 7 and 9 and by the highly efficient sequential code ODEX2 [15]. In the following sections, for the sake of notational simplicity, we assume that the IVP (1.1) is a scalar problem. However, all considerations below can be straightforwardly extended to systems of ODEs, and therefore also to nonautonomous equations.
2. RKN methods

The starting point is the following two classes of RKN methods. The first is the class of direct and indirect RKN methods; the second is the class of parallel explicit RKN methods.

2.1. Direct and indirect RKN methods

A general s-stage RKN method for numerically solving the scalar problem (1.1) is defined by (see, e.g., [21,17], and also [1, p. 272])

    U_n = u_n e + h u_n' c + h^2 A f(U_n),
    u_{n+1} = u_n + h u_n' + h^2 b^T f(U_n),                                          (2.1)
    u_{n+1}' = u_n' + h d^T f(U_n),

where u_n ≈ y(t_n), u_n' ≈ y'(t_n), h is the stepsize, the s×s matrix A and the s-dimensional vectors b, c, d are the method parameter matrix and vectors, and e is the s-dimensional vector with unit entries (in the following, we will use the notation e for any vector with unit entries and e_j for any jth unit vector; the dimension will always be clear from the context). The vector U_n denotes the stage vector representing numerical approximations to the exact solution vector y(t_n e + c h) at the nth step. Furthermore, in (2.1), we use for any vector v = (v_1, …, v_s)^T and any scalar function f the notation f(v) := (f(v_1), …, f(v_s))^T.

Similarly to an RK method, the RKN method (2.1) is conveniently represented by its Butcher array (see, e.g., [1, p. 272; 17]):

    c | A
      | b^T
      | d^T

This RKN method will be referred to as the corrector method. We distinguish two types of RKN methods: direct and indirect. Indirect RKN methods are derived from RK methods for first-order ODEs. Writing (1.1) in first-order form and applying an RK method with Butcher array

    c | A_RK
      | b_RK^T

yields the indirect RKN method defined by

    c | (A_RK)^2
      | b_RK^T A_RK
      | b_RK^T

If the originating implicit RK method is of collocation type, then the resulting indirect implicit RKN method will be termed an indirect collocation implicit RKN method. Direct implicit RKN methods are constructed directly for second-order ODEs of the form (1.1). A first family of these direct implicit RKN methods is obtained by means of collocation techniques (see [21]) and will accordingly be called direct collocation implicit RKN methods. In this paper, we confine our considerations to high-order collocation implicit RKN methods, that is, the Gauss–Legendre and Radau IIA methods (briefly called direct or indirect Gauss–Legendre and Radau IIA methods). This class contains methods of arbitrarily high order. Indirect collocation unconditionally stable methods can be found in [13]; direct collocation conditionally stable methods were investigated in [3,21].
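As an illustration of the indirect construction, the following minimal NumPy sketch (not taken from the paper; names are illustrative) builds the indirect RKN tableau from a given RK tableau exactly as in the Butcher array above, using the standard 2-stage Gauss–Legendre coefficients as an example:

    import numpy as np

    def indirect_rkn_tableau(A_rk, b_rk):
        """Indirect RKN tableau from an RK tableau:
        A = (A_RK)^2, b^T = b_RK^T A_RK, d = b_RK."""
        A = A_rk @ A_rk        # stage matrix of the induced RKN method
        b = A_rk.T @ b_rk      # weights for the solution update
        d = b_rk.copy()        # weights for the derivative update
        return A, b, d

    # Example: 2-stage Gauss-Legendre RK method (order 4)
    s3 = np.sqrt(3.0)
    c    = np.array([0.5 - s3 / 6, 0.5 + s3 / 6])
    A_rk = np.array([[0.25,           0.25 - s3 / 6],
                     [0.25 + s3 / 6,  0.25         ]])
    b_rk = np.array([0.5, 0.5])
    A, b, d = indirect_rkn_tableau(A_rk, b_rk)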
2.2. Parallel explicit RKN methods

A first class of parallel explicit RKN methods, called PIRKN methods, was considered in [17]. These PIRKN methods are closely related to the block PIRKN methods to be considered in the next section. Using an indirect collocation implicit RKN method of the form (2.1) as corrector, a general PIRKN method of [17] assumes the following form (see also [3]):

    U_n^(0) = y_n e + h y_n' c,
    U_n^(j) = y_n e + h y_n' c + h^2 A f(U_n^(j-1)),   j = 1, …, m,                   (2.2)
    y_{n+1} = y_n + h y_n' + h^2 b^T f(U_n^(m)),
    y_{n+1}' = y_n' + h d^T f(U_n^(m)),

where y_n ≈ y(t_n), y_n' ≈ y'(t_n). Let p be the order of the corrector (2.1). By setting m = [(p − 1)/2], where [·] denotes the integer part, the PIRKN method (2.2) is in fact an explicit RKN method of order p with the Butcher array (cf. [3,17])

    c | O
    c | A    O
    c | O    A    O
    ⋮ |                ⋱
    c | O    …    O    A    O
      | 0^T  …    0^T  0^T  b^T                                                       (2.3)
      | 0^T  …    0^T  0^T  d^T

where O and 0 denote the s×s matrix and the s-dimensional vector with zero entries, respectively. From the Butcher array (2.3), it is clear that the PIRKN method (2.2) has (m + 1)·s stages. However, in each iteration, the s components of the vector f(U_n^(j−1)) can be evaluated in parallel, provided that an s-processor computer is available. Consequently, the number of sequential f-evaluations equals s* = m + 1. This class also contains methods of arbitrarily high order and belongs to the set of efficient parallel methods for nonstiff problems of the form (1.1). For the detailed performance of these PIRKN methods, we refer to [17]. Further developments of parallel RKN-type methods were considered in, e.g., [2–5]. Notice that, apart from parallelism across the method and across the problem, a PIRKN method does not possess any further parallelism.
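A minimal sketch of one PIRKN step (2.2) for a scalar problem may make the iteration structure concrete; it is not the NYRA code, and the inner list comprehensions are written sequentially where an implementation would distribute the s stage evaluations over s processors:

    import numpy as np

    def pirkn_step(f, t_n, y_n, yp_n, h, c, A, b, d, m):
        """One step of the PIRKN method (2.2): trivial predictor U^(0)
        followed by m fixed-point corrections; s* = m + 1 sequential
        f-evaluation rounds in total."""
        U = y_n + h * yp_n * c                                # U^(0)
        for _ in range(m):
            FU = np.array([f(t_n + ci * h, Ui) for ci, Ui in zip(c, U)])  # parallel over stages
            U = y_n + h * yp_n * c + h**2 * (A @ FU)          # U^(j)
        FU = np.array([f(t_n + ci * h, Ui) for ci, Ui in zip(c, U)])      # f(U^(m))
        y_next  = y_n + h * yp_n + h**2 * (b @ FU)
        yp_next = yp_n + h * (d @ FU)
        return y_next, yp_next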
3. Block PIRKN methods

Applying the RKN method (2.1) at t_n with r distinct stepsizes a_i h, where i = 1, …, r and a_1 = 1, we have

    U_{n,i} = u_n e + a_i h u_n' c + a_i^2 h^2 A f(U_{n,i}),
    u_{n+1,i} = u_n + a_i h u_n' + a_i^2 h^2 b^T f(U_{n,i}),                          (3.1)
    u_{n+1,i}' = u_n' + a_i h d^T f(U_{n,i}),   i = 1, …, r.

Let us suppose that at the (n−1)th step, a block of predictions U_{n−1,i}^(0), i = 1, …, r, and the approximations y_{n−1} ≈ y(t_{n−1}), y_{n−1}' ≈ y'(t_{n−1}) are given. We shall compute r approximations y_{n,i} to the exact solution values y(t_{n−1} + a_i h), i = 1, …, r, defined by

    U_{n−1,i}^(j) = y_{n−1} e + a_i h y_{n−1}' c + a_i^2 h^2 A f(U_{n−1,i}^(j−1)),   j = 1, …, m,
    y_{n,i} = y_{n−1} + a_i h y_{n−1}' + a_i^2 h^2 b^T f(U_{n−1,i}^(m)),
    y_{n,i}' = y_{n−1}' + a_i h d^T f(U_{n−1,i}^(m)),   i = 1, …, r.

In the next step, these r approximations are used to create high-order predictors. Denoting

    Y_n := (y_{n,1}, …, y_{n,r})^T,   Y_n' := (y_{n,1}', …, y_{n,r}')^T,   y_{n,1} = y_n,   y_{n,1}' = y_n',     (3.2)

we can construct the following predictor formulas:

    U_{n,i}^(0) = V_i Y_n,                                                            (3.3a)
    U_{n,i}^(0) = V_i Y_n + h W_i Y_n',                                               (3.3b)
    U_{n,i}^(0) = V_i Y_n + h W_i Y_n' + h^2 Λ_i f(Y_n),   i = 1, …, r,               (3.3c)

where V_i, W_i and Λ_i are s×r extrapolation matrices which will be determined by order conditions (see section 3.1). The predictors (3.3a), (3.3b) and (3.3c) are referred to as Lagrange, Hermite-I and Hermite-II predictors, respectively. Apart from (3.3), predictors of other types can be constructed, e.g., of Adams type (cf. [19]). Regarding (3.1) as block corrector methods and (3.3) as block predictor methods for the stage vectors, we leave the class of one-step methods and arrive at the block PC method

    U_{n,i}^(0) = V_i Y_n + h θ^2 W_i Y_n' + h^2 [θ^2(1 − θ)/2] Λ_i f(Y_n),           (3.4a)
    U_{n,i}^(j) = e e_1^T Y_n + a_i h c e_1^T Y_n' + a_i^2 h^2 A f(U_{n,i}^(j−1)),   j = 1, …, m,
    y_{n+1,i} = e_1^T Y_n + a_i h e_1^T Y_n' + a_i^2 h^2 b^T f(U_{n,i}^(m)),          (3.4b)
    y_{n+1,i}' = e_1^T Y_n' + a_i h d^T f(U_{n,i}^(m)),   i = 1, …, r,

with θ ∈ {0, 1, −1}. Notice that, for a general presentation, the three different predictor formulas (3.3) have been combined in (3.4) into the common formula (3.4a), where θ = 0, 1 and −1 indicate the Lagrange, Hermite-I and Hermite-II predictor, respectively. With θ = −1 the block PC method (3.4) is in P E(CE)^m E mode; with θ = 0 or 1, the P E(CE)^m E mode reduces to P (CE)^m E mode. The block PC method (3.4) consists of a block of PIRKN-type corrections using a block of predictions at the off-step points (block points) (cf. section 2.2). Therefore, we call the method (3.4) an r-dimensional block PIRKN method (BPIRKN method) (cf. [19]). In the case of the Hermite-I predictor (3.3b) with r = 1, the BPIRKN method (3.4) reduces to a PIRKN method of the form (2.2) studied in [3,17].

Once the vectors Y_n and Y_n' are given, the r values y_{n,i} can be computed in parallel and, on a second level, the components of the ith stage vector iterate U_{n,i}^(j) can also be evaluated in parallel (cf. also section 2.2). Hence, the r-dimensional BPIRKN methods (3.4) based on s-stage RKN correctors can be implemented on a computer possessing r·s parallel processors. The number of sequential f-evaluations per step of length h on each processor equals s* = m + θ^2(1 − θ)/2 + 1, where θ ∈ {0, 1, −1}. A sketch of one such step for the Hermite-I case is given below.
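The following Python sketch of one BPIRKN step (3.4) with Hermite-I predictor (θ = 1) shows the two-level parallel structure; the predictor matrices V[i], W[i] of (3.3b) are assumed to be given, and both the i-loop and each stage evaluation inside it would run in parallel on r·s processors:

    import numpy as np

    def bpirkn_step(f, t_n, Y, Yp, h, a, c, A, b, d, V, W, m):
        """One BPIRKN step (3.4), scalar problem, theta = 1.
        Y, Yp: block values y_{n,i}, y'_{n,i} (first component = step point).
        V, W: arrays of shape (r, s, r) holding the predictor matrices."""
        r = len(a)
        Y_new, Yp_new = np.empty(r), np.empty(r)
        for i in range(r):                        # parallel over the r block points
            U = V[i] @ Y + h * (W[i] @ Yp)        # Hermite-I predictor (3.3b)
            for _ in range(m):                    # corrections (3.4b)
                FU = np.array([f(t_n + a[i] * ck * h, Uk) for ck, Uk in zip(c, U)])
                U = Y[0] + a[i] * h * Yp[0] * c + (a[i] * h)**2 * (A @ FU)
            FU = np.array([f(t_n + a[i] * ck * h, Uk) for ck, Uk in zip(c, U)])
            Y_new[i]  = Y[0] + a[i] * h * Yp[0] + (a[i] * h)**2 * (b @ FU)
            Yp_new[i] = Yp[0] + a[i] * h * (d @ FU)
        return Y_new, Yp_new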
3.1. Order conditions for the predictor

In this section we consider the variable stepsize BPIRKN methods

    U_{n,i}^(0) = V_i^n Y_n + h_n θ^2 W_i^n Y_n' + h_n^2 [θ^2(1 − θ)/2] Λ_i^n f(Y_n),              (3.5a)
    U_{n,i}^(j) = e e_1^T Y_n + a_i h_n c e_1^T Y_n' + a_i^2 h_n^2 A f(U_{n,i}^(j−1)),   j = 1, …, m,
    y_{n+1,i} = e_1^T Y_n + a_i h_n e_1^T Y_n' + a_i^2 h_n^2 b^T f(U_{n,i}^(m)),                   (3.5b)
    y_{n+1,i}' = e_1^T Y_n' + a_i h_n d^T f(U_{n,i}^(m)),   i = 1, …, r,

θ ∈ {0, 1, −1}, where h_n = t_{n+1} − t_n, and where V_i^n, W_i^n and Λ_i^n are the predictor matrices, to be determined by the order conditions below as variable matrices depending on the stepsize ratio h_n/h_{n−1}. The order conditions for (3.5a) can be obtained by replacing U_{n,i}^(0) and Y_n in (3.5a) by the exact solution values y(t_n e + a_i h_n c) and y(t_{n−1} e + h_{n−1} a) = y(t_n e + h_{n−1}(a − e)), respectively. The substitution of these exact values into (3.5a) leads us to the relations for predictors of order q:

    y(t_n e + a_i h_n c) − V_i^n y(t_n e + h_{n−1}(a − e)) − θ^2 h_n W_i^n y'(t_n e + h_{n−1}(a − e))
        − [θ^2(1 − θ)/2] h_n^2 Λ_i^n y''(t_n e + h_{n−1}(a − e)) = O(h_n^{q+1}),   i = 1, …, r.    (3.6)

Let us suppose that the stepsize ratio ξ_n = h_n/h_{n−1} is bounded from above. Then, using Taylor expansions, we can expand the left-hand side of (3.6) in powers of h_n and obtain the following qth-order conditions for determining the variable predictor matrices:

    (a_i ξ_n c)^j − V_i^n (a − e)^j − j θ^2 W_i^n ξ_n (a − e)^{j−1}
        − j(j − 1) [θ^2(1 − θ)/2] Λ_i^n ξ_n^2 (a − e)^{j−2} = 0,   j = 0, …, q,   i = 1, …, r.     (3.7)

The conditions (3.7) imply that

    U_{n,i} − U_{n,i}^(0) = O(h_n^{q+1}),   i = 1, …, r,

and, therefore, the following order relations hold:

    U_{n,i} − U_{n,i}^(m) = O(h_n^{2m+q+1}),
    u_{n+1,i} − y_{n+1,i} = a_i^2 h_n^2 b^T [f(U_{n,i}) − f(U_{n,i}^(m))] = O(h_n^{2m+q+3}),
    u_{n+1,i}' − y_{n+1,i}' = a_i h_n d^T [f(U_{n,i}) − f(U_{n,i}^(m))] = O(h_n^{2m+q+2}),   i = 1, …, r.

Furthermore, for the local truncation error of the BPIRKN method (3.4), we may also write

    y(t_{n+1}) − y_{n+1} = [y(t_{n+1}) − u_{n+1}] + [u_{n+1} − y_{n+1}] = O(h_n^{p+1}) + O(h_n^{2m+q+3}),
    y'(t_{n+1}) − y_{n+1}' = [y'(t_{n+1}) − u_{n+1}'] + [u_{n+1}' − y_{n+1}'] = O(h_n^{p+1}) + O(h_n^{2m+q+2}),

where p is the order of the generating RKN corrector (2.1). Thus, we have the following theorem:

Theorem 3.1. Suppose that the stepsize ratio ξ_n is bounded from above. If the conditions (3.7) are satisfied and if the generating RKN corrector (2.1) has order p, then the variable stepsize BPIRKN method (3.5) has order p* = min{p, p_iter}, where p_iter = 2m + q + 1.
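Before turning to the closed-form expressions below, note that for each i and each stage row the conditions (3.7) form a square linear system in the entries of the predictor matrices, so they can also be solved numerically. A sketch for the Hermite-I case (θ = 1, hence q = 2r − 1), under the assumption that the abscissas a_ℓ are distinct so the confluent Vandermonde system is nonsingular; with ξ_n = 1 it yields the fixed stepsize matrices:

    import numpy as np

    def hermite1_predictor_matrices(a, c, xi):
        """Solve (3.7) for theta = 1, q = 2r - 1: returns V, W with
        V[i], W[i] the s x r predictor matrices for block point i.
        a: block abscissas (a[0] = 1), c: collocation points, xi: stepsize ratio."""
        r, s = len(a), len(c)
        G = np.zeros((2 * r, 2 * r))
        for j in range(2 * r):
            G[j, :r] = (a - 1.0) ** j                      # V-columns: (a - e)^j
            if j >= 1:
                G[j, r:] = j * xi * (a - 1.0) ** (j - 1)   # W-columns: j xi (a - e)^(j-1)
        V = np.empty((r, s, r)); W = np.empty((r, s, r))
        for i in range(r):
            # right-hand sides (a_i xi c_k)^j for all stage rows k at once
            P = np.array([(a[i] * xi * c) ** j for j in range(2 * r)])
            X = np.linalg.solve(G, P)                      # stacked [V_i^T; W_i^T]
            V[i], W[i] = X[:r].T, X[r:].T
        return V, W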
In order to express V_i^n, W_i^n and Λ_i^n explicitly in terms of the vectors a, c and the stepsize ratio ξ_n, we suppose that q = [1 + θ^2 + θ^2(1 − θ)/2] r − 1, θ ∈ {0, 1, −1}, and, for i = 1, …, r, define the matrices

    P_{i,n}  := (e, (a_i ξ_n c), (a_i ξ_n c)^2, …, (a_i ξ_n c)^{r−1}),
    Q_n      := (e, (a − e), (a − e)^2, …, (a − e)^{r−1}),
    R_n      := (0, ξ_n e, 2ξ_n (a − e), 3ξ_n (a − e)^2, …, (r − 1)ξ_n (a − e)^{r−2}),
    S_n      := (0, 0, 2ξ_n^2 e, 6ξ_n^2 (a − e), 12ξ_n^2 (a − e)^2, …, (r − 1)(r − 2)ξ_n^2 (a − e)^{r−3}),
    P*_{i,n} := ((a_i ξ_n c)^r, (a_i ξ_n c)^{r+1}, …, (a_i ξ_n c)^{2r−1}),
    Q*_n     := ((a − e)^r, (a − e)^{r+1}, …, (a − e)^{2r−1}),
    R*_n     := (r ξ_n (a − e)^{r−1}, (r + 1) ξ_n (a − e)^r, …, (2r − 1) ξ_n (a − e)^{2r−2}),
    S*_n     := (r(r − 1) ξ_n^2 (a − e)^{r−2}, (r + 1)r ξ_n^2 (a − e)^{r−1}, …, (2r − 1)(2r − 2) ξ_n^2 (a − e)^{2r−3}),
    P**_{i,n}:= ((a_i ξ_n c)^{2r}, (a_i ξ_n c)^{2r+1}, …, (a_i ξ_n c)^q),
    Q**_n    := ((a − e)^{2r}, (a − e)^{2r+1}, …, (a − e)^q),
    R**_n    := (2r ξ_n (a − e)^{2r−1}, (2r + 1) ξ_n (a − e)^{2r}, …, q ξ_n (a − e)^{q−1}),
    S**_n    := (2r(2r − 1) ξ_n^2 (a − e)^{2r−2}, (2r + 1)2r ξ_n^2 (a − e)^{2r−1}, …, q(q − 1) ξ_n^2 (a − e)^{q−2}),

where, for θ = 0, the matrices P*_{i,n}, Q*_n, R*_n, S*_n, P**_{i,n}, Q**_n, R**_n, S**_n are assumed to be zero, and for θ = 1, only P**_{i,n}, Q**_n, R**_n, S**_n are assumed to be zero matrices. The order conditions (3.7) can then be written in the form

    P_{i,n}  − V_i^n Q_n  − θ^2 W_i^n R_n  − [θ^2(1 − θ)/2] Λ_i^n S_n  = O,
    P*_{i,n} − V_i^n Q*_n − θ^2 W_i^n R*_n − [θ^2(1 − θ)/2] Λ_i^n S*_n = O,            (3.8)
    P**_{i,n} − V_i^n Q**_n − θ^2 W_i^n R**_n − [θ^2(1 − θ)/2] Λ_i^n S**_n = O.

Since the components a_i are assumed to be distinct, the matrix Q_n is nonsingular, and from (3.8), for i = 1, …, r, we may write

    V_i^n = [P_{i,n} − θ^2 W_i^n R_n − [θ^2(1 − θ)/2] Λ_i^n S_n] Q_n^{−1},

    θ^2 W_i^n = {[θ^2(1 − θ)/2] Λ_i^n [S*_n − S_n Q_n^{−1} Q*_n] + P_{i,n} Q_n^{−1} Q*_n − P*_{i,n}}
                × [R_n Q_n^{−1} Q*_n − R*_n]^{−1},                                     (3.9)

    [θ^2(1 − θ)/2] Λ_i^n = {P**_{i,n} − P_{i,n} Q_n^{−1} Q**_n
                + [P_{i,n} Q_n^{−1} Q*_n − P*_{i,n}] [R_n Q_n^{−1} Q*_n − R*_n]^{−1} [R_n Q_n^{−1} Q**_n − R**_n]}
                × {[S_n Q_n^{−1} Q*_n − S*_n] [R_n Q_n^{−1} Q*_n − R*_n]^{−1} [R_n Q_n^{−1} Q**_n − R**_n]
                + [S**_n − S_n Q_n^{−1} Q**_n]}^{−1},

where the matrices [S_n Q_n^{−1} Q*_n − S*_n][R_n Q_n^{−1} Q*_n − R*_n]^{−1}[R_n Q_n^{−1} Q**_n − R**_n] + [S**_n − S_n Q_n^{−1} Q**_n] and R_n Q_n^{−1} Q*_n − R*_n are assumed to be nonsingular. In view of theorem 3.1 and the explicit expressions (3.9) for the predictor matrices V_i^n, W_i^n and Λ_i^n, we have the following theorem:

Theorem 3.2. If q = [1 + θ^2 + θ^2(1 − θ)/2] r − 1 and the predictor matrices V_i^n, W_i^n, Λ_i^n, i = 1, …, r, satisfy the relations (3.9), then for the variable stepsize BPIRKN methods (3.5), p_iter = [1 + θ^2 + θ^2(1 − θ)/2] r + 2m, p* = min{p, p_iter} and s* = m + θ^2(1 − θ)/2 + 1, θ ∈ {0, 1, −1}.

In the application of BPIRKN methods, there are some natural combinations of the predictors (3.4a) with Gauss–Legendre and Radau IIA correctors. Using Lagrange and Hermite-I predictors has the advantage of requiring no additional f-evaluations in the predictor. An important disadvantage of the Lagrange predictor is that, for a given order q of the predictor formulas, its block dimension is twice as large as that of the Hermite-I predictor, doubling the number of processors needed for the implementation of BPIRKN methods. In this paper, we therefore concentrate our considerations on the Hermite-I predictors.

3.2. Convergence boundaries

In an actual implementation of BPIRKN methods, the number of iterations m is determined by some iteration strategy, rather than by order conditions using the minimal number of iterations needed to reach the order of the corrector. Therefore, it is of interest to know how the integration step affects the rate of convergence; the stepsize should be such that a reasonable convergence speed is achieved. As in, e.g., [3], we determine the rate of convergence by using the model test equation y''(t) = λ y(t), where λ runs through the spectrum of the Jacobian matrix ∂f/∂y. For this equation, we obtain the iteration error equation

    U_{n,i}^(j) − U_{n,i} = a_i^2 z A [U_{n,i}^(j−1) − U_{n,i}],   z := h^2 λ,   j = 1, …, m.      (3.10)

Hence, with respect to the model test equation, the convergence factor is determined by the spectral radius ρ(a_i^2 z A) of the iteration matrix a_i^2 z A, i = 1, …, r. Requiring ρ(a_i^2 z A) < 1 leads us to the convergence condition

    a_i^2 |z| < 1/ρ(A)   or   a_i^2 h^2 < 1/[ρ(∂f/∂y) ρ(A)].                           (3.11)

We shall call 1/ρ(A) the convergence boundary. In actual computation, the integration stepsize h should be substantially smaller than allowed by condition (3.11). By requiring that ρ(a_i^2 z A) be less than a given damping factor α (α ≤ 1), we are led to the condition

    a_i^2 |z| ≤ γ(α)   or   a_i^2 h^2 ≤ γ(α)/ρ(∂f/∂y),   γ(α) = α/ρ(A),                (3.12)

where γ(α) is the boundary of convergence with damping factor α of the method. Table 1 lists the convergence boundaries γ(α) of the BPIRKN methods based on indirect and direct collocation Gauss–Legendre and Radau IIA RKN correctors of orders up to 10 (cf., e.g., [3,21] and also section 2.1).

Table 1
Convergence boundaries γ(α) for various BPIRKN methods based on p-order correctors.

    Correctors                p=3      p=4      p=5      p=6      p=7      p=8      p=9      p=10
    Indirect Gauss–Legendre            12.04α            21.73α            37.03α            52.63α
    Direct Gauss–Legendre              20.83α            34.48α            55.54α            76.92α
    Indirect Radau IIA        5.89α             13.15α            25.64α            40.0α
    Direct Radau IIA          10.41α            20.40α            37.03α            55.55α

Notice that, for a given stepsize h, the maximal damping factor is defined by

    α = a_i^2 h^2 ρ(∂f/∂y) / γ(1),

so we can conclude that the direct collocation RKN correctors reported in table 1 give rise to faster convergence. This conclusion leads us to restrict our considerations to the BPIRKN methods based on direct collocation RKN correctors (cf. [3]). These direct RKN methods are not A-stable (see [21]), but their stability regions are sufficiently large for nonstiff problems.
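In practice, condition (3.12) translates directly into a bound on the stepsize. A small sketch (the estimate rho_J of the Jacobian spectral radius is assumed to be supplied by the user):

    import numpy as np

    def max_stepsize(A, a, rho_J, alpha):
        """Largest h allowed by the damped convergence condition (3.12):
        a_i^2 h^2 <= gamma(alpha) / rho(df/dy), gamma(alpha) = alpha / rho(A)."""
        rho_A = max(abs(np.linalg.eigvals(A)))   # spectral radius of the stage matrix
        gamma = alpha / rho_A                    # convergence boundary with damping alpha
        a_max = max(abs(np.asarray(a)))          # the widest block point dominates
        return np.sqrt(gamma / rho_J) / a_max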
3.3. The choice of block abscissas

In this section we consider fixed stepsize BPIRKN methods. The accuracy of Hermite-I interpolation formulas improves if the interpolation abscissas are more narrowly spaced. However, this increases the magnitude of the entries of the matrices V_i and W_i, causing serious round-off errors. There are several ways to reduce this round-off effect, which were discussed in [19, section 2.1] for Lagrange interpolation formulas. Also in [8], where Hermite interpolation formulas were used for deriving reliable error estimates for defect control, it was found that on a machine with 15 digits of precision the interpolation abscissas should be separated by 0.2 in order to suppress rounding errors.

In order to derive a further criterion for the choice of suitable values of the abscissas a_i, we need insight into the propagation of a perturbation ε of the block vectors Y_n and Y_n' within a single step (a similar analysis was given in [19]). We shall study this for the model test equation y''(t) = λ y(t). First we express y_{n+1,i} and h y_{n+1,i}' in terms of Y_n and h Y_n'. Since

    U_{n,i} = (I − a_i^2 z A)^{−1} [e e_1^T Y_n + a_i c e_1^T h Y_n'],
    U_{n,i}^(0) − U_{n,i} = [V_i − (I − a_i^2 z A)^{−1} e e_1^T] Y_n + [W_i − (I − a_i^2 z A)^{−1} a_i c e_1^T] h Y_n',

applying (3.4) and (3.10) to the model equation for a given number m, we obtain

    y_{n+1,i} = e_1^T Y_n + a_i h e_1^T Y_n' + a_i^2 z b^T [U_{n,i}^(m) − U_{n,i}] + a_i^2 z b^T U_{n,i}
        = [e_1^T + a_i^2 z b^T (I − a_i^2 z A)^{−1} e e_1^T
           + a_i^2 z b^T (a_i^2 z A)^m (V_i − (I − a_i^2 z A)^{−1} e e_1^T)] Y_n
          + [a_i e_1^T + a_i^2 z b^T (I − a_i^2 z A)^{−1} a_i c e_1^T
           + a_i^2 z b^T (a_i^2 z A)^m (W_i − (I − a_i^2 z A)^{−1} a_i c e_1^T)] h Y_n',           (3.13a)

    h y_{n+1,i}' = e_1^T h Y_n' + a_i z d^T [U_{n,i}^(m) − U_{n,i}] + a_i z d^T U_{n,i}
        = [a_i z d^T (I − a_i^2 z A)^{−1} e e_1^T
           + a_i z d^T (a_i^2 z A)^m (V_i − (I − a_i^2 z A)^{−1} e e_1^T)] Y_n
          + [e_1^T + a_i z d^T (I − a_i^2 z A)^{−1} a_i c e_1^T
           + a_i z d^T (a_i^2 z A)^m (W_i − (I − a_i^2 z A)^{−1} a_i c e_1^T)] h Y_n'.             (3.13b)

Let us now replace Y_n by Y_n* = Y_n + ε and Y_n' by Y_n'* = Y_n' + ε'. Then, from (3.13), the perturbed values y*_{n+1,i} and y'*_{n+1,i} of y_{n+1,i} and y_{n+1,i}' are given by

    y*_{n+1,i} = y_{n+1,i} + [e_1^T + a_i^2 z b^T (I − a_i^2 z A)^{−1} e e_1^T
           + a_i^2 z b^T (a_i^2 z A)^m (V_i − (I − a_i^2 z A)^{−1} e e_1^T)] ε
          + [a_i e_1^T + a_i^2 z b^T (I − a_i^2 z A)^{−1} a_i c e_1^T
           + a_i^2 z b^T (a_i^2 z A)^m (W_i − (I − a_i^2 z A)^{−1} a_i c e_1^T)] h ε',             (3.14a)

    h y'*_{n+1,i} = h y_{n+1,i}' + [a_i z d^T (I − a_i^2 z A)^{−1} e e_1^T
           + a_i z d^T (a_i^2 z A)^m (V_i − (I − a_i^2 z A)^{−1} e e_1^T)] ε
          + [e_1^T + a_i z d^T (I − a_i^2 z A)^{−1} a_i c e_1^T
           + a_i z d^T (a_i^2 z A)^m (W_i − (I − a_i^2 z A)^{−1} a_i c e_1^T)] h ε'.               (3.14b)

These relations show that the first component of the perturbation ε is amplified by a factor O(1) for both Y_n and Y_n', whereas all other components are amplified by factors O(h^{2m+2}) and O(h^{2m+1}) for Y_n and Y_n', respectively.

For a 2r-times continuously differentiable function y(t), the r-point Hermite interpolation formula (for our Hermite-I case) can be written in the form (see, e.g., [15, p. 261; 18, p. 52])

    y(t_{n−1} + τh) = Σ_{i=1}^r v_i(τ) y(t_{n−1} + a_i h) + h Σ_{i=1}^r w_i(τ) y'(t_{n−1} + a_i h)
                      + C^{(2r)}(τ) h^{2r} (d/dt)^{2r} y(t*_τ),                                     (3.15)

where v_i(τ), w_i(τ) and C^{(2r)}(τ) are the scalar polynomials defined by

    v_i(τ) = l_i^2(τ) [1 − 2 l_i'(a_i)(τ − a_i)],   l_i(τ) = Π_{j=1, j≠i}^r (τ − a_j)/(a_i − a_j),
    w_i(τ) = l_i^2(τ)(τ − a_i),   C^{(2r)}(τ) = (1/(2r)!) Π_{j=1}^r (τ − a_j)^2,                    (3.16)

and t*_τ is a suitably chosen point in the interval [t_{n−1}, t_{n−1} + τh].
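The quantities in (3.16) are straightforward to evaluate numerically; the following sketch uses the standard identity l_i'(a_i) = Σ_{j≠i} 1/(a_i − a_j) and is an illustration rather than part of the method definition:

    import numpy as np
    from math import factorial

    def hermite_basis(tau, a):
        """Evaluate the Hermite-I basis polynomials v_i(tau), w_i(tau) and
        the error coefficient C^(2r)(tau) of (3.15)-(3.16)."""
        a = np.asarray(a, dtype=float)
        r = len(a)
        v, w = np.empty(r), np.empty(r)
        for i in range(r):
            others = np.delete(a, i)
            li  = np.prod((tau - others) / (a[i] - others))   # l_i(tau)
            dli = np.sum(1.0 / (a[i] - others))               # l_i'(a_i)
            v[i] = li**2 * (1.0 - 2.0 * dli * (tau - a[i]))
            w[i] = li**2 * (tau - a[i])
        C = np.prod((tau - a) ** 2) / factorial(2 * r)        # C^(2r)(tau)
        return v, w, C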
Hence, we have

    y(t_n + a_i c_k h) = y(t_{n−1} + (1 + a_i c_k)h)
        = Σ_{j=1}^r v_j(1 + a_i c_k) y(t_{n−1} + a_j h) + h Σ_{j=1}^r w_j(1 + a_i c_k) y'(t_{n−1} + a_j h)
          + C^{(2r)}(1 + a_i c_k) h^{2r} (d/dt)^{2r} y(t*_{ik}),   k = 1, …, s,   i = 1, …, r,      (3.17)

where t*_{ik} is also a suitably chosen point in the interval [t_{n−1}, t_{n−1} + (1 + a_i c_k)h]. Referring to the approach used in [19], this leads us to a choice of the values a_i such that the maximum norm of the principal error vector C^{(2r)} = C^{(2r)}(e + a_1 c) is minimized, i.e., we are led to minimize the magnitude of the values

    C^{(2r)}(1 + c_k) = (1/(2r)!) Π_{j=1}^r (1 + c_k − a_j)^2,   k = 1, …, s.                       (3.18)

Confining our considerations to the block dimension r = s + 1, we set

    a_1 = 1,   a_i = 1 + c_{i−1},   i = 2, …, s + 1,                                                (3.19a)

resulting in predictors of (local) order 2s + 1. By this choice, the principal error vector C^{(2r)}(e + c) vanishes (i.e., ‖C^{(2r)}‖_∞ = ‖C^{(2r)}(e + c)‖_∞ = 0), so that all inaccuracies introduced by the predictor formula are damped by a factor of O(h^{2m+2}) for Y_n and by a factor of O(h^{2m+1}) for Y_n' (cf. (3.14)). However, for high-order methods, the choice (3.19a) can violate the limitation on the minimal spacing of 0.2 found in [8] with respect to the abscissas a_1 and a_2. Moreover, large block dimensions often cause a drop in stability. Therefore, limiting the block dimension to r = s, we propose the following second choice of abscissas a_i:

    a_1 = 1,   a_i = 1 + c_i,   i = 2, …, s,                                                        (3.19b)

resulting in predictors of (local) order 2s − 1. With the choice (3.19b), the maximum norm of the principal error vector C^{(2r)}, evaluated as

    ‖C^{(2r)}‖_∞ = C^{(2r)}(1 + c_1) = (1/(2r)!) Π_{i=1}^r (1 + c_1 − a_i)^2,

is significantly small (see table 2), and the limitation on the minimal spacing is satisfied. We also expect the stability of the resulting class of BPIRKN methods to be improved (see section 3.4). Finally, we remark that BPIRKN methods defined by the choice of abscissas given by (3.19) attain the order of the correctors for every m ≥ 0 (cf. theorem 3.2).

Table 2
Values of ‖C^{(2r)}‖_∞ for various pth-order BPIRKN methods based on abscissas defined by (3.19b).

    p              3        4        5        6        7        8        9        10
    ‖C^{(2r)}‖_∞   2.5E−2   7.4E−3   6.9E−4   1.9E−4   1.4E−5   3.6E−6   2.1E−7   5.5E−8

3.4. Stability boundaries

The linear stability of the BPIRKN methods (3.4) is investigated by again using the model test equation y''(t) = λ y(t), where λ is assumed to be negative. From (3.13) we are led to the recursion

    (Y_{n+1}, h Y_{n+1}')^T = M_m(z) (Y_n, h Y_n')^T,   M_m(z) = M_m'(z) + M_m''(z),                (3.20a)

where M_m'(z) and M_m''(z) are the 2r × 2r matrices defined row-wise, for i = 1, …, r, by

    M_m'(z), row i:
        ( e_1^T + a_i^2 z b^T (I − a_i^2 z A)^{−1} e e_1^T
          + a_i^2 z b^T (a_i^2 z A)^m [V_i − (I − a_i^2 z A)^{−1} e e_1^T],   0^T ),
    M_m'(z), row r + i:
        ( a_i z d^T (I − a_i^2 z A)^{−1} e e_1^T
          + a_i z d^T (a_i^2 z A)^m [V_i − (I − a_i^2 z A)^{−1} e e_1^T],   0^T ),                  (3.20b)

    M_m''(z), row i:
        ( 0^T,   a_i e_1^T + a_i^2 z b^T (I − a_i^2 z A)^{−1} a_i c e_1^T
          + a_i^2 z b^T (a_i^2 z A)^m [W_i − (I − a_i^2 z A)^{−1} a_i c e_1^T] ),
    M_m''(z), row r + i:
        ( 0^T,   e_1^T + a_i z d^T (I − a_i^2 z A)^{−1} a_i c e_1^T
          + a_i z d^T (a_i^2 z A)^m [W_i − (I − a_i^2 z A)^{−1} a_i c e_1^T] ).                     (3.20c)

The 2r × 2r matrix M_m(z) defined by (3.20), which determines the stability of the BPIRKN methods, will be called the amplification matrix, and its spectral radius ρ(M_m(z)) the stability function.
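The boundaries β(m) reported below were computed numerically. A generic way to estimate such a boundary is a grid scan of the stability function along the negative z-axis, sketched here under the assumption that a routine amp_matrix assembling M_m(z) from (3.20) for the chosen corrector, predictor matrices and iteration number m is available (it is not constructed in this sketch):

    import numpy as np

    def stability_boundary(amp_matrix, z_max=10.0, dz=1e-3):
        """Estimate beta(m) such that rho(M_m(z)) < 1 for all z in (-beta(m), 0)
        by scanning the negative z-axis on a grid of spacing dz."""
        z = -dz
        while z > -z_max:
            rho = max(abs(np.linalg.eigvals(amp_matrix(z))))
            if rho >= 1.0:
                return abs(z) - dz      # last grid point with rho < 1
            z -= dz
        return z_max                    # stable on the whole scanned interval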
For a given number m, the stability interval of a BPIRKN method is defined as the largest interval (−β(m), 0) such that

    ρ(M_m(z)) < 1   for all z ∈ (−β(m), 0).

It is evident from (3.20) that if z satisfies the convergence condition (3.11), then the stability function ρ(M_m(z)) of the BPIRKN method converges to the stability function of the RKN corrector method as m → ∞ (cf., e.g., [1, p. 273; 21]). Hence, the asymptotic stability interval for m → ∞, (−β(∞), 0), is the intersection on the negative z-axis of the stability interval (−β_corr, 0) of the RKN corrector and its convergence region defined by (3.11).

We numerically calculated the values of β(m) for the various resulting BPIRKN methods. Tables 3 and 4 list these stability boundaries β(m) for the two classes of so-called BPIRKN-A and BPIRKN-B methods defined by the relations (3.19a) and (3.19b), respectively. From these tables we observe that the stability boundaries show a rather irregular behaviour, especially for the BPIRKN methods based on direct Gauss–Legendre correctors. Moreover, the results reported in these two tables also show the better stability behaviour of the class of BPIRKN-B methods defined by the choice of abscissas (3.19b). The stability regions of this class are also sufficiently large for nonstiff problems. From table 4, we can select a whole set of BPIRKN-B methods of order p up to 10 (except for p = 4) using only one correction and having acceptable stability regions for nonstiff problems.

Table 3
Stability boundaries β(m) for various pth-order BPIRKN-A methods based on abscissas defined by (3.19a).

    BPIRKN-A          p=3     p=4     p=5     p=6     p=7     p=8     p=9     p=10
    β(0) (s* = 1)    0.067   0.122   0.006   0.014   0.001   0.007   0.000   0.001
    β(1) (s* = 2)    0.285   0.015   0.226   1.103   0.393   0.234   0.298   0.241
    β(2) (s* = 3)    0.847   0.104   0.443   0.150   0.319   1.294   0.289   2.055
    β(3) (s* = 4)    0.935   0.678   0.915   1.816   0.615   0.871   0.452   1.343
    β(4) (s* = 5)    1.096   0.437   1.547   0.809   1.104   2.015   0.806   1.617
    β(5) (s* = 6)    1.996   0.668   1.686   1.413   1.792   2.180   1.331   2.272
    β(6) (s* = 7)    1.400   1.042   1.877   4.279   3.465   3.969   2.035   3.164

Table 4
Stability boundaries β(m) for various pth-order BPIRKN-B methods based on abscissas defined by (3.19b).

    BPIRKN-B          p=3     p=4     p=5     p=6     p=7     p=8     p=9     p=10
    β(0) (s* = 1)    0.488   0.840   0.006   0.020   0.016   0.188   0.003   0.054
    β(1) (s* = 2)    1.065   0.013   0.609   1.058   0.188   0.213   0.312   0.424
    β(2) (s* = 3)    1.642   0.738   0.877   0.142   0.585   0.932   0.523   3.210
    β(3) (s* = 4)    1.837   0.670   1.554   1.360   0.969   2.216   0.702   1.408
    β(4) (s* = 5)    1.969   0.355   2.446   0.750   1.570   1.140   1.140   2.265
    β(5) (s* = 6)    2.008   0.785   2.359   1.367   2.370   3.818   1.764   2.954
    β(6) (s* = 7)    2.131   1.027   2.502   5.868   4.413   5.031   2.567   3.985

4. Numerical results

In this section we report numerical results obtained with the BPIRKN methods (3.5). As mentioned in the previous sections, we confine our considerations to the BPIRKN methods based on Hermite-I predictor and direct Radau IIA corrector pairs with block points defined by (3.19b) (cf. sections 3.2 and 3.3). First numerical comparisons of fixed stepsize BPIRKN-B methods with the parallel PIRKN methods and with efficient sequential Runge–Kutta–Nyström methods, by means of the numbers of f-evaluations required for a given accuracy, were reported in [7]. These comparisons showed a very high efficiency of the BPIRKN-B methods and therefore encouraged us to implement these methods with variable stepsizes on a real parallel machine as described below. The FORTRAN 77 code NYRA can be downloaded from a WWW site [16].
4.1. Implementation

We have implemented the BPIRKN-B methods based on the 3-, 4- and 5-stage Radau IIA RKN correctors (3-, 4- and 5-stage BPIRKN-B methods) with abscissas a_i chosen according to (3.19b). Hence our methods are of order p = 2s − 1. For optimal parallel performance we should employ a number of processors nproc = s × s. However, we used a shared memory HP/Convex X-Class computer on which we were restricted to 16 processors. Therefore we chose nproc = 9, 16 and 13 for the 3-, 4- and 5-stage methods, respectively (see table 5). Parallelism is expressed by means of the compiler options for loop parallelization.

Table 5
BPIRKN-B methods used for the implementation.

    Name      Stages   Order   Emb. order   nproc
    3-stage     3        5         3           9
    4-stage     4        7         4          16
    5-stage     5        9         5          13

In table 5, "Emb. order" denotes the order of an embedded method used for stepsize control. For the stepsize selection we compute an approximation to the local error,

    err = sqrt( (1/d) Σ_{i=1}^d [ ((y_{n+1,1})_i − (ŷ_{n+1,1})_i) / (ATOL + RTOL |(y_{n+1,1})_i|) ]^2 ).   (4.1)

The embedded solution ŷ_{n+1,1} in (4.1) is, analogously to (3.4b), computed by

    ŷ_{n+1,1} = e_1^T Y_n + h_n e_1^T Y_n' + h_n^2 b̂^T f(U_{n,1}^(m)).

Here b̂^T = (0, b̃^T), where b̃ contains the weights of the direct collocation RKN method based on the collocation points c_2, …, c_s, and y_{n+1,1} − ŷ_{n+1,1} = O(h_n^{s+1}) (cf. [6]). The new stepsize is determined by h_{n+1} = ξ_{n+1} h_n with

    ξ_{n+1} = 1 for α ∈ [0.9, 2],   ξ_{n+1} = α otherwise,   α = min{3, max{0.3, 0.8 · err^{−1/(s+1)}}}.

In the correction process of (3.4b), we use m = 2 if a stepsize change occurs and m = 1 otherwise. Note that the case m = 1 implies that a (fully) parallelized BPIRKN method needs only two sequential f-evaluations per step. A sketch of this controller is given below.
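The error estimate (4.1) and the stepsize update can be condensed into a few lines; the guard against err = 0 is an implementation detail added in this sketch, not taken from the paper:

    import numpy as np

    def new_stepsize(y, y_emb, h, s, atol, rtol):
        """Stepsize selection of section 4.1: weighted RMS error (4.1),
        ratio limited to [0.3, 3], frozen to 1 inside the deadzone [0.9, 2]."""
        err = np.sqrt(np.mean(((y - y_emb) / (atol + rtol * np.abs(y))) ** 2))
        err = max(err, 1e-14)                       # guard against err = 0 (assumption)
        alpha = min(3.0, max(0.3, 0.8 * err ** (-1.0 / (s + 1))))
        xi = 1.0 if 0.9 <= alpha <= 2.0 else alpha  # deadzone avoids needless changes
        return xi * h, err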
We compare the BPIRKN-B methods listed in table 5 with the highly efficient extrapolation code ODEX2 [15] with respect to the computing time in seconds and the global error ERR at the endpoint of the integration interval,

    ERR = sqrt( (1/d) Σ_{i=1}^d [ ((y_{n,1})_i − (y_{n,ref})_i) / (1 + |(y_{n,ref})_i|) ]^2 ).

The reference solutions y_{n,ref} were obtained by ODEX2 with ATOL = RTOL = 10^{−14}. The code ODEX2 was used with its default settings (version of September 30, 1995).

4.2. Test problems

As a first test example we take the seven-body problem Plei in two spatial dimensions from Hairer et al. [15]. This problem models the gravitational forces between seven stars, leading to a second-order ODE system of dimension 14. Since this problem is too small for testing the parallel implementation (the communication overhead would dominate the total computing time), we enlarge it by a factor scaling = 500 and obtain the new system

    1l ⊗ y'' = 1l ⊗ f(t, y(t)),   1l ∈ R^{scaling},

where 1l denotes the vector with unit entries. The second celestial mechanics example Moon is obtained in a similar way: 101 bodies in 2D space (for details see [16]). Here no scaling was necessary, because the right-hand side is already very expensive.

The third example Wave is a semidiscretized 1D hyperbolic equation [20]:

    u_tt = g d(x) ∂²u/∂x² + λ(x, u),   0 ≤ x ≤ b,   0 ≤ t ≤ 10,
    ∂u/∂x (t, 0) = ∂u/∂x (t, b) = 0,
    u(0, x) = sin(πx/b),   u_t(0, x) = −(π/b) cos(πx/b),

with d(x) = 10 + cos(2πx/b), λ(x, u) = 10^{−4} g |u|/d(x), g = 9.81 and b = 100. Using the method of lines with second-order central differences in x on 40 inner grid points leads to a nonstiff ODE system. In order to make the f-evaluations more expensive, we use scaling = 100. A sketch of the semidiscretization is given below.
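A sketch of the right-hand side of the semidiscretized Wave problem (the scaling replication is omitted); the simple first-order mirrored-neighbour treatment of the Neumann boundary conditions is an assumption of this sketch, as is the exact form of the λ term, which follows the reconstruction above:

    import numpy as np

    def wave_rhs(t, u, b=100.0, g=9.81, N=40):
        """u_tt = g d(x) u_xx + lambda(x, u) on N inner grid points,
        with u_x = 0 at x = 0 and x = b built in via mirrored neighbours."""
        dx = b / (N + 1)
        x = dx * np.arange(1, N + 1)                  # inner grid points
        d = 10.0 + np.cos(2.0 * np.pi * x / b)
        lam = 1e-4 * g * np.abs(u) / d                # assumed coefficient
        uxx = np.empty(N)
        uxx[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
        uxx[0]  = (u[1]  - u[0])  / dx**2             # mirrored left neighbour (Neumann)
        uxx[-1] = (u[-2] - u[-1]) / dx**2             # mirrored right neighbour
        return g * d * uxx + lam                      # this is u_tt, i.e., f(t, u)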
4.3. Results and discussion

The work-precision diagrams for the three test problems above, obtained with ATOL = RTOL = 10^{−2}, …, 10^{−8}, are given in figures 1–3. The speedup of the parallel 4-stage BPIRKN-B method over the sequential code ODEX2 for mild tolerances is between … and …. Due to the variable order in the ODEX2 code, this speedup becomes smaller for very stringent tolerances. The 3-stage BPIRKN-B method is competitive for crude tolerances only and gives for these tolerances a speedup of about … over ODEX2. We remark that the 5-stage BPIRKN-B method was run with only 13 processors; the optimal number of processors would be 25.

Figure 1. Work-precision diagram for Plei.
Figure 2. Work-precision diagram for Moon.
Figure 3. Work-precision diagram for Wave.

The performance of a parallel implementation of an integration method depends heavily on the machine, the size of the problem and the costs of the function evaluations. Hence, we have run our BPIRKN code NYRA with different numbers of processors. The speedup and the parallel efficiency, defined by

    speedup(nproc) = (computing time with 1 processor) / (computing time with nproc processors),
    efficiency(nproc) = speedup(nproc) / nproc,

are shown in figure 4. The good speedup for the test problem Moon is due to the high cost of calculating 100 gravity forces for each particle. The superlinear speedup with nproc = … is caused by cache-size effects.

Figure 4. Speedup and efficiency for the 4-stage BPIRKN-B method.

5. Concluding remarks

This paper described an algorithm for constructing Runge–Kutta–Nyström-type parallel block PC methods (BPIRKN methods) requiring (almost always) two sequential f-evaluations per step for any order of accuracy. Theoretical investigations show that, among these methods, especially the BPIRKN-B methods have favourable properties. Numerical tests on a shared memory computer show that, for problems with expensive right-hand side functions, the speedup is nearly optimal. Comparisons with the highly efficient code ODEX2 show the advantage of the BPIRKN methods for such expensive problems.

References

[1] K. Burrage, Parallel and Sequential Methods for Ordinary Differential Equations (Clarendon Press, Oxford, 1995).
[2] N.H. Cong, An improvement for parallel-iterated Runge–Kutta–Nyström methods, Acta Math. Vietnam. 18 (1993) 295–308.
[3] N.H. Cong, Note on the performance of direct and indirect Runge–Kutta–Nyström methods, J. Comput. Appl. Math. 45 (1993) 347–355.
[4] N.H. Cong, Explicit symmetric Runge–Kutta–Nyström methods for parallel computers, Comput. Math. Appl. 31 (1996) 111–122.
[5] N.H. Cong, Explicit parallel two-step Runge–Kutta–Nyström methods, Comput. Math. Appl. 32 (1996) 119–130.
[6] N.H. Cong, Explicit pseudo two-step RKN methods with stepsize control, submitted for publication.
[7] N.H. Cong, K. Strehmel and R. Weiner, Runge–Kutta–Nyström-type parallel block predictor–corrector methods, Technical Report 29, University Halle (1997).
[8] W.H. Enright and D.J. Higham, Parallel defect control, BIT 31 (1991) 647–663.
[9] E. Fehlberg, Klassische Runge–Kutta–Nyström-Formeln mit Schrittweitenkontrolle für Differentialgleichungen x'' = f(t, x), Computing 10 (1972) 305–315.
[10] E. Fehlberg, Eine Runge–Kutta–Nyström-Formel 9-ter Ordnung mit Schrittweitenkontrolle für Differentialgleichungen x'' = f(t, x), Z. Angew. Math. Mech. 61 (1981) 477–485.
[11] E. Hairer, Méthodes de Nyström pour l'équation différentielle y''(t) = f(t, y), Numer. Math. 27 (1977) 283–300.
[12] E. Hairer, A Runge–Kutta method of order 10, J. Inst. Math. Appl. 21 (1978) 47–59.
[13] E. Hairer, Unconditionally stable methods for second order differential equations, Numer. Math. 32 (1979) 373–379.
[14] E. Hairer, A one-step method of order 10 for y''(t) = f(t, y), IMA J. Numer. Anal. 2 (1982) 83–94.
[15] E. Hairer, S.P. Nørsett and G. Wanner, Solving Ordinary Differential Equations I. Nonstiff Problems, 2nd revised edition (Springer, Berlin, 1993).
[16] NYRA – a FORTRAN 77 implementation of a BPIRKN method (1998), available from http://www.mathematik.uni-halle.de/institute/numerik/software.
[17] B.P. Sommeijer, Explicit, high-order Runge–Kutta–Nyström methods for parallel computers, Appl. Numer. Math. 13 (1993) 221–240.
[18] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis (Springer, New York, 1983).
[19] P.J. van der Houwen and N.H. Cong, Parallel block predictor–corrector methods of Runge–Kutta type, Appl. Numer. Math. 13 (1993) 109–123.
[20] P.J. van der Houwen and B.P. Sommeijer, Explicit Runge–Kutta(–Nyström) methods with reduced phase errors for computing oscillating solutions, Technical Report, CWI, Amsterdam (1985).
[21] P.J. van der Houwen, B.P. Sommeijer and N.H. Cong, Stability of collocation-based Runge–Kutta–Nyström methods, BIT 31 (1991) 469–481.