26 COMPUTER BASED NUMERICAL AND STATISTICAL TECHNIQUES

The basic operations specified by IEEE arithmetic are first and foremost addition, subtraction, multiplication, and division. Square roots and remainders are also included. The default rounding for these operations is “to nearest even”. This means that the floating point result fl(a op b) of the exact operation (a op b) is the nearest floating point number to (a op b), breaking ties by rounding to the floating point number whose bottom bit is zero (the “even” one). It is also possible to round up, round down, or truncate (round towards zero). Rounding up and down are useful for interval arithmetic, which can provide guaranteed error bounds; unfortunately, most languages and compilers provide no access to the status flag that selects the rounding direction. When the result of a floating point operation is not representable as a normalized floating point number, an exception occurs.

1.8 FLOATING POINT ARITHMETIC AND THEIR COMPUTATION

The computer performs four basic arithmetic operations: addition, subtraction, multiplication and division. Decimal numbers are converted to machine numbers before they are operated on. A machine number is ultimately stored using only the digits 0 and 1, and the base in which it is organized depends on the computer. If the base is two, the number system is called the binary number system; if the base is eight, it is called the octal number system; and if the base is sixteen, it is called the hexadecimal number system. The decimal number system has the base 10.

In numerical computation there are mainly two types of arithmetic operations: (a) integer arithmetic, which deals with integer operands, and (b) real or floating-point arithmetic, which deals with operands that have a fractional part. Computers carry out most scientific calculations in floating point arithmetic, to avoid the difficulty of keeping every number less than 1 in magnitude during computation.
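The four rounding directions described above can be tried out on decimal numbers with Python's standard `decimal` module. This is only an illustrative sketch: it rounds decimal strings to a chosen number of fractional digits rather than operating on IEEE binary hardware, and the helper name `round_to_places` and the mode labels are assumptions introduced here.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN

# Labels follow the prose above; the constants are the decimal module's own.
MODES = {
    "to nearest even": ROUND_HALF_EVEN,  # ties go to the even last digit
    "round up": ROUND_CEILING,           # toward +infinity
    "round down": ROUND_FLOOR,           # toward -infinity
    "truncate": ROUND_DOWN,              # toward zero
}

def round_to_places(value: str, places: int, mode: str) -> Decimal:
    """Round a decimal string to `places` fractional digits under the named mode."""
    quantum = Decimal(1).scaleb(-places)  # places=1 gives Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=MODES[mode])

if __name__ == "__main__":
    for text in ("0.25", "0.35", "-0.25"):
        results = {name: str(round_to_places(text, 1, name)) for name in MODES}
        print(text, results)
```

The ties 0.25 and 0.35 go to 0.2 and 0.4 respectively under "to nearest even", while the directed modes differ only in how they treat the sign of the number.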
A floating point number is characterized by three parameters: the base $b$, the number of digits $n$, and the exponent range $(m, M)$. An n-digit floating-point number with base $b$ has the form

$$x = \pm (0.d_1 d_2 \ldots d_n)_b \times b^e$$

where $d_1, d_2, d_3, \ldots, d_n$ are integers satisfying $0 \le d_i < b$, and the exponent $e$ is such that $m \le e < M$. Also, $(0.d_1 d_2 d_3 \ldots d_n)_b$ is a b-fraction called the mantissa, and it lies between –1 and +1. The number 0 is written as $+0.000\ldots0 \times b^e$.

The floating-point number is said to be normalized if $d_1 \neq 0$, or else $d_1 = d_2 = \cdots = d_n = 0$. If $d_1 \neq 0$ and $d_n \neq 0$, the number is said to have $n$ significant digits.

There are two commonly used ways to translate a given real number $x$ into an n-digit base-$b$ floating-point number $f_p(x)$: rounding and chopping. A floating-point number $x = \pm(0.d_1 d_2 \ldots d_n)_b \times b^e$ is in n-digit mantissa standard form if it is normalized and its mantissa consists of exactly $n$ digits. If a number $x$ is represented by $x = \pm(0.d_1 d_2 d_3 \ldots d_n d_{n+1} \ldots)_b \times b^e$, then the floating-point number in chopping form is

$$f_p(x) = \pm(0.d_1 d_2 d_3 \ldots d_n)_b \times b^e$$

and the floating-point number in rounding form is obtained from

$$f_p(x) = \pm\left[(0.d_1 d_2 \ldots d_n d_{n+1} \ldots)_b + \tfrac{1}{2}\, b^{-n}\right] \times b^e$$

where only the first $n$ digits are used to write the floating-point number.

ERRORS AND FLOATING POINT 27

Example 1. Find the normalized floating-point form of $\frac{2}{3}$ to seven digits.
Sol. $f_p(x) = f_p\left(\frac{2}{3}\right) = 0.6666667$; result after rounding.
$f_p(x) = f_p\left(\frac{2}{3}\right) = 0.6666666$; result after chopping.

In computers, each memory location, called a word, stores only a finite number of digits. Suppose that each location stores 6 digits together with one or more signs. To represent a real number, the computer then assumes a fixed position for the decimal point, and all numbers are stored after appropriate shifting relative to this assumed decimal point. With this convention, the largest number that can be stored is 9999.99 and the smallest is 0000.01.
These maximum and minimum limits refer to magnitude. Because this range is restrictive, a different representation is used, one that preserves the maximum number of significant digits in a real number and increases the range of values that can be stored. This type of representation is called the normalized floating-point mode.

Example 2. The number 58.72 × 10^5 is represented as 0.5872 × 10^7, or 0.5872e7.
Sol. Here the mantissa is 0.5872 and the exponent is 7. Shifting the mantissa to the left until its most significant digit is nonzero is called normalization.

1.8.1 Arithmetic Operations on Floating Point Numbers

There are four basic arithmetic operations: addition, subtraction, multiplication and division. These operations are applied to floating point numbers as follows.

Example 3. Add the floating-point numbers 0.4546e3 and 0.5433e7.
Sol. The exponents are unequal. To add the numbers, keep the operand with the larger exponent and shift the other: 0.5433e7 + 0.0000e7 = 0.5433e7 (because 0.4546e3 becomes 0.0000e7 when written with exponent 7 and a four-digit mantissa).

Example 4. Add the floating-point numbers 0.6434e3 and 0.4845e3.
Sol. The exponents are equal, but adding the mantissas gives 1.1279e3. The mantissa now has 5 digits and is greater than 1, so it is shifted right one place. Hence the result is 0.1127e4.

Example 5. Add the floating-point numbers 0.6434e99 and 0.4845e99.
Sol. In this example the mantissa is shifted right and the exponent is increased by 1 (because the sum of the mantissas exceeds 1), resulting in a value of 100 for the exponent. This is called an overflow condition, because the exponent cannot store more than two digits.

Example 6. Find the sum of 0.123e3 and 0.456e2 and write the result in three-digit mantissa form.
Sol. Sum = 0.123e3 + 0.456e2 = 0.123e3 + 0.0456e3 = 0.168e3; result after chopping.
Sum = 0.123e3 + 0.456e2 = 0.123e3 + 0.0456e3 = 0.169e3; result after rounding.

Examples 3 to 6 above show the addition of floating point numbers in different situations.

Example 7. Subtract the floating-point number 0.36132346 × 10^7 from 0.36143447 × 10^7.
Sol. Subtracting 0.36132346 × 10^7 from 0.36143447 × 10^7 gives 0.00011101 × 10^7. On shifting the fractional part three places to the left we have 0.11101 × 10^4, which is a normalized floating-point number. The value 0.00011101 × 10^7 is also a floating-point number, but it is not in normalized form.

Example 8. Subtract the following floating-point numbers:
1. 0.5424e–99 from 0.5452e–99
2. 0.3862e–7 from 0.9682e–7
Sol. 1. On subtracting we get 0.0028e–99. This is a floating-point number, but it is not in normalized form. To normalize it, shift the mantissa two places to the left, which gives 0.2800e–101. The exponent is now smaller than –99 and cannot be stored in two digits. This is called an underflow condition.
2. Similarly, the second subtraction gives 0.5820e–7.

Examples 7 and 8 above show the subtraction of floating point numbers, including the underflow condition. In summary, when two numbers are represented in normalized floating-point notation, addition and subtraction require the exponents to be equal; if they are not, they are made equal by shifting the mantissa appropriately.

Example 9. Multiply the following floating point numbers:
1. 0.1111e74 and 0.2000e80
2. 0.1234e–49 and 0.1111e–54
Sol. 1. Multiplying 0.1111e74 × 0.2000e80 gives 0.2222e153. This is an overflow condition for normalized floating-point numbers.
2. Similarly, the second multiplication gives 0.1370e–104, which is an underflow condition.
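The alignment, chopping and normalization steps used in Examples 3 to 9 can be sketched as a toy decimal floating-point model. This is a minimal illustration under stated assumptions, not a description of any real machine: a number 0.dddd × 10^e is stored as an integer mantissa `dddd` and exponent `e`, digits are chopped rather than rounded, the exponent is limited to two digits as in Example 5, and the names `normalize`, `add`, `sub`, `mul` and `UnderflowError` are introduced here for illustration.

```python
from typing import Tuple

DIGITS = 4    # mantissa digits, as in the worked examples
EXP_MAX = 99  # two-digit exponent, as assumed in Example 5

Fp = Tuple[int, int]  # (mantissa, exponent): value = mantissa * 10**(exponent - DIGITS)

class UnderflowError(ArithmeticError):
    """Raised when a normalized result needs an exponent below -EXP_MAX."""

def chop_shift(m: int, places: int) -> int:
    """Shift a mantissa right by `places` digits, discarding them (toward zero)."""
    sign = -1 if m < 0 else 1
    return sign * (abs(m) // 10 ** places)

def normalize(m: int, e: int) -> Fp:
    """Bring the mantissa to exactly DIGITS digits with a nonzero leading digit."""
    if m == 0:
        return 0, 0
    sign = -1 if m < 0 else 1
    m = abs(m)
    while m >= 10 ** DIGITS:       # too many digits: shift right, chopping
        m //= 10
        e += 1
    while m < 10 ** (DIGITS - 1):  # leading zero: shift left
        m *= 10
        e -= 1
    if e > EXP_MAX:
        raise OverflowError(f"exponent {e} exceeds {EXP_MAX}")
    if e < -EXP_MAX:
        raise UnderflowError(f"exponent {e} is below {-EXP_MAX}")
    return sign * m, e

def add(x: Fp, y: Fp) -> Fp:
    """Align the operand with the smaller exponent, then add the mantissas."""
    if x[1] < y[1]:
        x, y = y, x
    (mx, ex), (my, ey) = x, y
    return normalize(mx + chop_shift(my, ex - ey), ex)

def sub(x: Fp, y: Fp) -> Fp:
    """Subtract y from x by adding the negated mantissa."""
    return add(x, (-y[0], y[1]))

def mul(x: Fp, y: Fp) -> Fp:
    """Multiply the mantissas and add the exponents, then normalize."""
    return normalize(x[0] * y[0], x[1] + y[1] - DIGITS)

if __name__ == "__main__":
    print(add((4546, 3), (5433, 7)))    # Example 3
    print(add((6434, 3), (4845, 3)))    # Example 4
    print(sub((9682, -7), (3862, -7)))  # Example 8, part 2
```

Calling `add((6434, 99), (4845, 99))` raises `OverflowError`, and `sub((5452, -99), (5424, -99))` raises `UnderflowError`, mirroring Examples 5 and 8.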
This example shows that two normalized floating-point numbers are multiplied by multiplying their mantissas and adding their exponents. Similarly, division is carried out by dividing the mantissa of the numerator by that of the denominator and subtracting the denominator exponent from the numerator exponent. The quotient mantissa is then normalized as before, and the exponent is adjusted accordingly.

Example 10. Calculate the sum of the given floating-point numbers:
1. 0.4546e5 and 0.5433e7
2. 0.4546e5 and 0.5433e5
Sol. 1. When the exponents are not equal, the operand with the larger exponent is kept and the other is shifted. That is, 0.5433e7 + 0.0045e7 = 0.5478e7.
2. Here the mantissas are added directly because the exponents are equal. That is, 0.4546e5 + 0.5433e5 = 0.9979e5.

Example 11. Subtract the floating-point number 0.5424e3 from 0.5452e3.
Sol. Subtracting 0.5424e3 from 0.5452e3 gives 0.0028e3. In normalized floating point representation this is written as 0.2800e1, because the mantissa must be greater than or equal to 0.1.

Example 12. Calculate the value of $e^x$ when x = 0.5250e1 and e = 2.7183. The expression for $e^x$ is

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

Sol. We have $e^x = e^{0.5250e1} = e^5 \times e^{0.25}$, where
$e^5$ = (0.2718e1) × (0.2718e1) × (0.2718e1) × (0.2718e1) × (0.2718e1) = 0.1484e3.
Also,
$e^{0.25} = 1 + (0.25) + \frac{(0.25)^2}{2!} + \frac{(0.25)^3}{3!}$ = 1.25 + 0.03125 + 0.002604 = 0.1284e1.
Hence $e^{0.5250e1}$ = (0.1484e3) × (0.1284e1) = 0.1905e3.

Example 13. Compute the middle value of the numbers a = 4.568 and b = 6.762 using four-digit arithmetic, and compare the result with that obtained from $c = a + \frac{b - a}{2}$.
Sol. Since a = 0.4568e1 and b = 0.6762e1, and c is the middle value of a and b,

$$c = \frac{a + b}{2} = \frac{0.4568e1 + 0.6762e1}{0.2000e1} = \frac{0.1133e2}{0.2000e1} = 0.5665e1.$$

If we use the formula $c = a + \frac{b - a}{2}$, we get

$$c = 0.4568e1 + \frac{0.6762e1 - 0.4568e1}{0.2000e1} = 0.4568e1 + 0.1097e1 = 0.5665e1,$$

which is the same as the first result.
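The two midpoint formulas of Example 13 can be checked with Python's `decimal` module restricted to four significant digits. This is a sketch under assumptions: the context uses round-to-nearest-even (the text does not say which rounding Example 13 uses), the helper name `midpoints` is introduced here, and the second pair of inputs (6.762 and 6.764) is a hypothetical case chosen for illustration, not taken from the text.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN
from typing import Tuple

# Four significant digits; every operation result is rounded to this precision.
FOUR_DIGIT = Context(prec=4, rounding=ROUND_HALF_EVEN)
TWO = Decimal(2)

def midpoints(a: Decimal, b: Decimal, ctx: Context = FOUR_DIGIT) -> Tuple[Decimal, Decimal]:
    """Return ((a + b) / 2, a + (b - a) / 2), rounding after every operation."""
    by_mean = ctx.divide(ctx.add(a, b), TWO)
    by_offset = ctx.add(a, ctx.divide(ctx.subtract(b, a), TWO))
    return by_mean, by_offset

if __name__ == "__main__":
    # Values from Example 13: both formulas agree.
    print(midpoints(Decimal("4.568"), Decimal("6.762")))
    # Hypothetical close pair: the sum 13.526 is rounded to 13.53 before halving.
    print(midpoints(Decimal("6.762"), Decimal("6.764")))
```

For the values of Example 13 both formulas give 5.665. For the hypothetical close pair, (a + b)/2 evaluates to 6.765, which lies outside the interval [a, b], while a + (b – a)/2 gives 6.763; this is one reason the second form is often preferred when the endpoints are close together.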
Example 14. Evaluate 1 – cos x at x = 0.1396 radian. Assume cos(0.1396) = 0.9903, and compare the result with the value obtained by evaluating $2\sin^2\frac{x}{2}$. Also assume sin(0.0698) = 0.6974e–1.
Sol. Since x = 0.1396,
1 – cos(0.1396) = 0.1000e1 – 0.9903e0 = 0.1000e1 – 0.0990e1 = 0.1000e–1.
Now $\sin\frac{x}{2}$ = sin(0.0698) = 0.6974e–1, so
$2\sin^2\frac{x}{2}$ = (0.2000e1) × (0.6974e–1) × (0.6974e–1) = 0.9727e–2.
The value obtained by the alternate formula is close to the true value 0.9728e–2.

Example 15. Evaluate the following floating-point expressions:
1. 0.5334e9 × 0.1132e–25
2. 0.1111e10 × 0.1234e15
3. 0.9998e–5 ÷ 0.1000e98
4. 0.1111e51 × 0.4444e50
5. 0.1000e5 ÷ 0.9999e3
6. 0.5543e12 × 0.4111e–15
7. 0.9998e1 ÷ 0.1000e–99
Sol. 1. 0.5334e9 × 0.1132e–25 = 0.6038e–17. The exponent lies within the two-digit range, so neither overflow nor underflow occurs.
2. 0.1111e10 × 0.1234e15 = 0.1370e24
3. 0.9998e–5 ÷ 0.1000e98 = 0.9998e–102. This result shows the underflow condition of floating point numbers.
4. 0.1111e51 × 0.4444e50 = 0.4937e100. Hence the result shows an overflow condition.
5. 0.1000e5 ÷ 0.9999e3 = 0.1000e2
6. 0.5543e12 × 0.4111e–15 = 0.2278e–3
7. 0.9998e1 ÷ 0.1000e–99 = 0.9998e101. This shows an overflow condition of floating point numbers.

Example 16. For x = 0.4845 and y = 0.4800, calculate the value of $\frac{x^2 - y^2}{x + y}$ using normalized floating point arithmetic. Compare this with the value of (x – y).
Sol. Since x = 0.4845 and y = 0.4800, we have x + y = 0.4845e0 + 0.4800e0 = 0.9645e0.
Again, $x^2$ = (0.4845e0) × (0.4845e0) = 0.2347e0
$y^2$ = (0.4800e0) × (0.4800e0) = 0.2304e0
$x^2 - y^2$ = 0.2347e0 – 0.2304e0 = 0.0043e0
Therefore,

$$\frac{x^2 - y^2}{x + y} = \frac{0.0043e0}{0.9645e0} = 0.4458e{-2}$$

Also, x – y = 0.4845e0 – 0.4800e0 = 0.4500e–2.

Example 17. Find the solution of the equation $x^2 - 1000x + 25 = 0$ using floating-point arithmetic with a 4-digit mantissa.
Sol. Given that $x^2 - 1000x + 25 = 0$,

$$x = \frac{1000 \pm \sqrt{10^6 - 10^2}}{2}$$

Now $10^6$ = 0.1000e7 and $10^2$ = 0.1000e3. Therefore $10^6 - 10^2$ = 0.1000e7, so that $\sqrt{10^6 - 10^2}$ = 0.1000e4. Hence the roots are

$$\frac{0.1000e4 + 0.1000e4}{2} \quad \text{and} \quad \frac{0.1000e4 - 0.1000e4}{2},$$

which are 0.1000e4 and 0.0000e4 respectively. One of the roots becomes zero because of the limited precision allowed in the computation. In the quadratic equation $ax^2 + bx + c = 0$ the product of the roots is $\frac{c}{a}$, so the smaller root may be obtained by dividing (c/a) by the larger root. Therefore the first root is 0.1000e4 and the second root is

$$\frac{25}{0.1000e4} = \frac{0.2500e2}{0.1000e4} = 0.2500e{-1}.$$

Example 18. Associative and distributive laws are not always valid in the case of normalized floating-point representation. Give an example to prove this statement.
Sol. As a consequence of the normalized floating-point representation, the associative and distributive laws of arithmetic are not always valid. The following example proves the statement.
Let a = 0.5555e1, b = 0.4545e1, c = 0.4535e1. Then
(b – c) = 0.0010e1 = 0.1000e–1
a(b – c) = (0.5555e1) × (0.1000e–1) = 0.0555e0 = 0.5550e–1
ab = (0.5555e1) × (0.4545e1) = 0.2524e2
ac = (0.5555e1) × (0.4535e1) = 0.2519e2
Therefore ab – ac = 0.0005e2 = 0.5000e–1.
Thus a(b – c) ≠ ab – ac. This proves the non-distributivity of arithmetic.
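The failure of the distributive law can be reproduced with the `decimal` module. This is a sketch under assumptions: the context below keeps four significant digits and chops (rounds toward zero) after every operation, and the function name `distributive_sides` is introduced here. The library normalizes the product a(b – c) before chopping, so it keeps 0.5555e–1 where the hand calculation above keeps 0.5550e–1; the two sides still disagree.

```python
from decimal import Context, Decimal, ROUND_DOWN
from typing import Tuple

# Four significant digits, chopping (rounding toward zero) after each operation.
CHOP4 = Context(prec=4, rounding=ROUND_DOWN)

def distributive_sides(a: Decimal, b: Decimal, c: Decimal,
                       ctx: Context = CHOP4) -> Tuple[Decimal, Decimal]:
    """Return (a*(b - c), a*b - a*c), each evaluated in the given context."""
    left = ctx.multiply(a, ctx.subtract(b, c))
    right = ctx.subtract(ctx.multiply(a, b), ctx.multiply(a, c))
    return left, right

if __name__ == "__main__":
    a, b, c = Decimal("5.555"), Decimal("4.545"), Decimal("4.535")
    left, right = distributive_sides(a, b, c)
    print("a(b - c) =", left)
    print("ab - ac  =", right)
    print("equal?", left == right)
```

The products ab and ac each lose digits when chopped to 25.24 and 25.19, and the subtraction then exposes that loss, which is why the right-hand side is the less accurate of the two.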
Again, let a = 0.5665e1, b = 0.5556e–1, c = 0.5644e1. Then
a + b = 0.5665e1 + 0.5556e–1 = 0.5665e1 + 0.0055e1 = 0.5720e1
(a + b) – c = 0.5720e1 – 0.5644e1 = 0.0076e1 = 0.7600e–1
a – c = 0.5665e1 – 0.5644e1 = 0.0021e1 = 0.2100e–1
(a – c) + b = 0.2100e–1 + 0.5556e–1 = 0.7656e–1
Thus (a + b) – c ≠ (a – c) + b. This proves the non-associativity of arithmetic.

Example 19. Calculate the smaller root of the equation $x^2 - 400x + 1 = 0$ using 4-digit arithmetic.
Sol. Writing the equation as $ax^2 - bx + c = 0$, its roots are

$$x_1 = \frac{b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad x_2 = \frac{b - \sqrt{b^2 - 4ac}}{2a}$$

Here $b^2 \gg |4ac|$ and the product of the roots is $\frac{c}{a}$. Therefore the smaller root is

$$\frac{c/a}{\dfrac{b + \sqrt{b^2 - 4ac}}{2a}} \quad \text{or} \quad \frac{2c}{b + \sqrt{b^2 - 4ac}}$$

According to the given equation, a = 1 = 0.1000e1, b = 400 = 0.4000e3, c = 1 = 0.1000e1. Therefore $b^2 - 4ac$ = 0.1600e6 – 0.4000e1 = 0.1600e6, so $\sqrt{b^2 - 4ac}$ = 0.4000e3. Hence the smaller root is

$$\frac{2 \times (0.1000e1)}{0.4000e3 + 0.4000e3} = \frac{0.2000e1}{0.8000e3} = 0.2500e{-2} = 0.0025.$$

PROBLEM SET 1.2

1. Round off the following numbers to four significant figures: 38.46235, 0.70029, 0.0022218, 19.235101 [Ans. 38.46, 0.7003, 0.002222, 19.24]
2. Round off the following numbers to two decimal places: 48.21416, 2.385, 52.275, 81.255, 2.3742 [Ans. 48.21, 2.39, 52.28, 81.26, 2.37]
3. Obtain the range of values within which the exact value of $\frac{1.265(10.21 - 7.54)}{47}$ lies, if all the numerical quantities are rounded off. [Hint. Take $e_a$ < 1%] [Ans. 0.06186 < x < 0.8186]
4. Calculate the value of $\sqrt{102} - \sqrt{101}$ correct to four significant figures. [Ans. 0.04963]
5. Represent 44.85 × 10^6 in normalized floating-point mode. [Ans. 0.4485e8]
6. Explain machine epsilon in floating point arithmetic.
7. Calculate the values of $x^2 + 2x - 2$ and $(2x - 2) + x^2$, where x = 0.7320e0, using normalized floating-point arithmetic, and prove that they are not the same. Compare with the value of $(x^2 - 2) + 2x$. [Ans. –0.1000e–2, –0.2000e–3]
8. Find the value of $\sin x \approx x - \frac{x^3}{3!} + \frac{x^5}{5!}$ for x = 0.2000e0 using normalized floating point arithmetic with a 4-digit mantissa. [Ans. 0.1987e0 (taking $e_a$ = 0.005)]
9. The following numbers are given in a decimal computer with a four-digit normalized mantissa: (a) 0.4523e–4, (b) 0.2115e–3, (c) 0.2583e1. Perform the following operations, and indicate the error in the result, assuming symmetric rounding:
1. (a) + (b) + (c) 2. (a) – (b) – (c) 3. (a)/(c) 4. (a)(b)/(c) 5. (a) – (b) 6. (b)/(c)(a)
[Ans. 1. 0.2585e1 2. 0.2581e1 3. 1.7511e–8 4. 0.3717e–8 5. –0.1663e–3 6. 0.1823e3]
10. Give examples to show that most of the laws of arithmetic fail to hold for floating-point arithmetic.
11. Find the root of smaller magnitude of the equation $x^2 + 0.4002e0\,x + 0.8e{-4} = 0$. Work in floating-point arithmetic using a four decimal place mantissa. [Ans. –0.2e–3]
12. Give the normalized floating-point representation of the following:
1. 22/7 2. –22.75 3. 0.01 4. $9\frac{3}{8}$ 5. $-\frac{3}{64}$ 6. 3/6
[Ans. 1. 0.3143e1 2. –0.2275e2 3. 0.1000e–1 4. 0.9375e1 5. –0.4688e–1 6. 0.5000e0]
13. Using 5-digit arithmetic with rounding, calculate the sum of the two numbers x = 0.78596e–2 and y = 0.786327e1. [Ans. 0.78712e1]
14. Compute 403000 × 0.197 by 3-digit arithmetic with rounding. [Ans. 0.7939e5]
15. Evaluate $f(x) = \frac{1 - \cos x}{x}$ for x = 0.01, using five-digit decimal arithmetic. [Ans. 0.1e–1]
16. Calculate the value of the polynomial $P_3(x) = 2.75x^3 - 2.95x^2 + 3.16x - 4.67$ for x = 1.07 using both chopping and rounding off to three digits, proceeding through the polynomial term by term from left to right. [Ans. –0.133e1]

CHAPTER 2
Algebraic and Transcendental Equation

2.1 INTRODUCTION

We have seen that an expression of the form

$$f(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n$$

where the a's are constants ($a_0 \neq 0$) and n is a positive integer, is called a polynomial in x of degree n, and the equation f(x) = 0 is called an algebraic equation of degree n.
If f(x) contains other functions such as exponential, trigonometric or logarithmic functions, then f(x) = 0 is called a transcendental equation. For example, $x^3 - 3x + 6 = 0$ and $x^5 - 7x^4 + 3x^2 + 36x - 7 = 0$ are algebraic equations of third and fifth degree, whereas $x^2 - 3\cos x + 1 = 0$, $xe^x - 2 = 0$, $x \log_{10} x = 1.2$, etc., are transcendental equations. In both cases, if the coefficients are pure numbers, the equations are called numerical equations.

In this chapter, we shall describe some numerical methods for the solution of f(x) = 0, where f(x) is algebraic or transcendental or both.

2.2 METHODS FOR FINDING THE ROOT OF AN EQUATION

Methods for finding the root of an equation can be classified into the following two groups: (1) Direct methods (2) Iterative methods.

2.2.1 Direct Methods

In some cases, roots can be found by using direct analytical methods. For example, for a quadratic equation $ax^2 + bx + c = 0$, the roots are given by

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$$

These are called closed form solutions. Similar formulae are also available for cubic and biquadratic polynomial equations, but we rarely remember them. For higher order polynomial equations and non-polynomial equations, it is difficult, and in many cases impossible, to get closed form solutions. Besides this, when numbers are substituted in the available closed form solutions, rounding errors reduce their accuracy.

2.2.2 Iterative Methods

These methods, also known as trial and error methods, are based on the idea of successive approximations: starting with one or more initial approximations to the value of the root, we obtain a sequence of approximations by repeating a fixed sequence of steps over and over again until we get the solution with reasonable accuracy. These methods generally give only one root at a time.
For the human problem solver, these methods are very cumbersome and time consuming, but on the other hand they are more natural for use on computers, for the following reasons:
(1) These methods can be concisely expressed as computational algorithms.
(2) It is possible to formulate algorithms which can handle a class of similar problems. For example, algorithms to solve polynomial equations of degree n may be written.
(3) Rounding errors are negligible as compared to methods based on closed form solutions.

2.3 ORDER (OR RATE) OF CONVERGENCE OF ITERATIVE METHODS

Convergence of an iterative method is judged by the order at which the error between successive approximations to the root decreases. An iterative method is said to be kth order convergent if k is the largest positive real number such that

$$\lim_{i \to \infty} \left| \frac{e_{i+1}}{e_i^{\,k}} \right| \le A$$

where A is a non-zero finite number called the asymptotic error constant, which depends on the derivatives of f(x) at the approximate root x, and $e_i$ and $e_{i+1}$ are the errors in successive approximations. In other words, the error in any step is proportional to the kth power of the error in the previous step. Physically, kth order convergence means that in each iteration the number of significant digits in the approximation increases k times.

2.4 BISECTION (OR BOLZANO) METHOD

This is one of the simplest iterative methods and is strongly based on the property of intervals. To find a root using this method, let the function f(x) be continuous between a and b. For definiteness, let f(a) be negative and f(b) be positive. Then there is a root of f(x) = 0 lying between a and b. Let the first approximation be $x_1 = \frac{1}{2}(a + b)$ (i.e., the average of the ends of the range). Now if $f(x_1) = 0$, then $x_1$ is a root of f(x) = 0. Otherwise, the root lies between a and $x_1$ or between $x_1$ and b, depending upon whether $f(x_1)$ is positive or negative.
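The bisection steps just described can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the stopping rule (halve until the bracket is shorter than `tol`, with at most `max_iter` steps) and the function name `bisect_root` are assumptions introduced here, since the text so far only describes how the interval is halved. The test equation $x^3 - 3x + 6 = 0$ is the algebraic example from Section 2.1, bracketed on [–3, –2], where f(–3) = –12 and f(–2) = 4.

```python
from typing import Callable

def bisect_root(f: Callable[[float], float], a: float, b: float,
                tol: float = 1e-6, max_iter: int = 100) -> float:
    """Approximate a root of f in [a, b], assuming f is continuous and
    f(a), f(b) have opposite signs. `tol` and `max_iter` are assumed stopping rules."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = a + (b - a) / 2      # average of the ends of the range
        fm = f(mid)
        if fm == 0 or (b - a) / 2 < tol:
            return mid
        if fa * fm < 0:            # sign change in [a, mid]: root lies there
            b = mid
        else:                      # otherwise the root lies in [mid, b]
            a, fa = mid, fm
    return a + (b - a) / 2

def cubic(x: float) -> float:
    """f(x) = x^3 - 3x + 6, the algebraic example from Section 2.1."""
    return x ** 3 - 3 * x + 6

if __name__ == "__main__":
    root = bisect_root(cubic, -3.0, -2.0)
    print(root, cubic(root))
```

Each pass halves the bracketing interval, so the error bound shrinks by a factor of two per iteration regardless of the shape of f, as long as the sign change is preserved.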