Errors and Floating Point
1.1 INTRODUCTION
Numerical techniques are widely used by scientists and engineers to solve their problems. A major advantage of numerical techniques is that a numerical answer can be obtained even when a problem has no analytical solution. However, the result of a numerical analysis is, in general, an approximation, which can be made as accurate as desired. The reliability of the numerical result depends on an error estimate or bound; therefore the analysis of errors, and of the sources of error in numerical methods, is a critically important part of the study of numerical techniques.
(i) Exact Number: Numbers with which no uncertainty is associated, and for which no approximation is taken, are known as exact numbers, e.g., 5, 21/6, 12/3, etc. are exact numbers.
(ii) Approximate Number: These are numbers that are not exact, e.g., √2 ≈ 1.41421 and e ≈ 2.7183 are not exact numbers, since their exact values contain infinitely many non-recurring digits. The numbers obtained by retaining only a few digits are therefore called approximate numbers, e.g., 3.142 and 2.718 are approximate values of π and e.
(iii) Significant Figures: The significant figures are the digits used to express a number. The digits 1, 2, 3, 4, 5, 6, 7, 8, 9 are significant digits; '0' is also a significant figure except when it is used to fix the decimal point or to fill the places of unknown or discarded digits.
For example, each of the numbers 5879, 3.487 and 0.4762 contains four significant figures, while the numbers 0.00486, 0.000382 and 0.0000376 contain only three significant figures, since their zeros only help to fix the position of the decimal point.
Similarly, in the number 0.0002070 the first four '0's are not significant figures, since they serve only to fix the position of the decimal point and indicate the place values of the other digits. The other two '0's are significant.
As a further example, the number 2.0683 contains five significant figures.
(iv) Round-off Number: If we divide 2 by 7, we get 0.285714..., a quotient which is a non-terminating decimal fraction. To use such a number in practical computation, it has to be cut off to a manageable size such as 0.29, 0.286, 0.2857, etc.
COMPUTER BASED NUMERICAL AND STATISTICAL TECHNIQUES
The process of cutting off superfluous digits and retaining as many digits as desired is known as rounding off a number; in other words, the process of dropping unwanted digits is called rounding off. Numbers are rounded off according to the following rules.
To round off a number to n significant figures, discard all digits to the right of the nth digit, and if the discarded number is
(1) less than 5 in the (n + 1)th place (less than half a unit in the nth place), leave the nth digit unaltered, e.g., 8.893 becomes 8.89;
(2) greater than 5 in the (n + 1)th place (more than half a unit in the nth place), increase the nth digit by unity, e.g., 5.3456 becomes 5.346;
(3) exactly 5 in the (n + 1)th place (exactly half a unit), increase the nth digit by unity if it is odd, otherwise leave it unchanged, e.g., 11.675 becomes 11.68 and 11.685 becomes 11.68.
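As a rough check of the significant-figure rules in (iii), the Python sketch below counts the significant figures of a number written as a decimal string. The helper name `count_significant_figures` is hypothetical, exponent notation is not handled, and whole numbers written without a decimal point (such as 7326853000) are assumed to end in place-holder zeros, a case the rules above leave ambiguous.

```python
def count_significant_figures(number: str) -> int:
    """Count significant figures in a plain decimal string such as '0.0002070'.

    Assumptions (hypothetical helper): no exponent notation, and trailing
    zeros of a whole number without a decimal point are place holders.
    """
    text = number.strip().lstrip("+-")
    digits = text.replace(".", "")
    # Leading zeros only fix the position of the decimal point.
    significant = digits.lstrip("0")
    if "." not in text:
        # Trailing zeros of a whole number may only fill discarded places.
        significant = significant.rstrip("0")
    return len(significant)


for n in ["5879", "3.487", "0.4762", "0.00486", "0.0002070", "2.0683"]:
    print(n, count_significant_figures(n))
```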
Example 1. Round off the following numbers correct to four significant figures: 58.3643, 979.267, 7.7265, 56.395, 0.065738 and 7326853000.
Sol. After retaining the first four significant figures, we have:
(i) 58.3643 becomes 58.36
(ii) 979.267 becomes 979.3
(iii) 7.7265 becomes 7.726 (the digit in the fourth place is even)
(iv) 56.395 becomes 56.40 (the digit in the fourth place is odd)
(v) 0.065738 becomes 0.06574 (the zeros on the left are not significant)
(vi) 7326853000 becomes $7327 \times 10^{6}$
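The rounding rules (1) to (3) and Example 1 can be reproduced with Python's standard `decimal` module. In this sketch the function name `round_to_significant` is hypothetical; `ROUND_HALF_EVEN` implements rule (3), and the number is passed as a string so that it is never converted to a binary float first.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_to_significant(value: str, n: int) -> Decimal:
    """Round a decimal string to n significant figures.

    Ties (a discarded part of exactly 5) go to the even digit, as in rule (3).
    """
    d = Decimal(value)
    if d.is_zero():
        return d
    # adjusted() is the exponent of the leading digit, e.g. 1 for 58.3643.
    exponent = d.adjusted() - n + 1
    return d.quantize(Decimal(f"1e{exponent}"), rounding=ROUND_HALF_EVEN)


for x in ["58.3643", "979.267", "7.7265", "56.395", "0.065738", "7326853000"]:
    print(x, "->", round_to_significant(x, 4))
```

The last result comes back in exponent form, equal to 7327 × 10^6. Python's built-in `round()` also sends ties to the even digit, but it works on binary floats, so a literal such as 2.675 is not exactly halfway; the Python documentation notes that `round(2.675, 2)` gives 2.67. Working on decimal strings avoids that surprise.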
Error = True value – Approximate value
A computer has a finite word length, so only a fixed number of digits are stored and used during computation. This means that even in storing an exact decimal number in its converted form in the computer memory, an error is introduced. This error is machine dependent; its relative size is bounded by the machine epsilon, the gap between 1 and the next larger number representable on the machine. After the computation is over, the result in machine form (with base b) is converted back to a decimal form understandable to the user, and some more error may be introduced at this stage. In general, Error = True value − Approximate value. The errors may be divided into the following types:
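The storage error described above can be observed directly. The Python sketch below assumes IEEE 754 double-precision floats (what CPython uses on common platforms); it estimates the machine epsilon by repeated halving, measures Error = True value − Approximate value when the exact decimal 0.1 is stored in binary, and shows such errors accumulating in a sum. The helper name `machine_epsilon` is hypothetical; `sys.float_info.epsilon` and `math.fsum` are standard-library names.

```python
import math
import sys
from decimal import Decimal


def machine_epsilon() -> float:
    """Smallest power of two eps for which 1.0 + eps > 1.0 in float arithmetic."""
    eps = 1.0
    while 1.0 + eps / 2 > 1.0:
        eps /= 2
    return eps


eps = machine_epsilon()
print("computed eps:", eps, "  sys.float_info.epsilon:", sys.float_info.epsilon)

# Error = True value - Approximate value when 0.1 is stored as a binary float.
true_value = Decimal("0.1")
stored_value = Decimal(0.1)  # exact value of the double nearest to 0.1
representation_error = true_value - stored_value
print("representation error of 0.1:", representation_error)

# Small representation errors accumulate over repeated additions.
values = [0.1] * 10
print("sum(values):", sum(values), "  math.fsum(values):", math.fsum(values))
```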
1. Inherent Error: The inherent error is the error already present in the statement of the problem before its solution. It arises either from the simplifying assumptions made in the mathematical formulation of the problem or from errors in the physical measurement of the parameters of the problem.
Inherent error can be minimized by obtaining better data, by using high-precision computing aids and by correcting obvious errors in the data.
2. Round-off Error: The round-off error is the error that arises from the process of rounding off numbers. It is sometimes also called numerical error. Equivalently, the round-off error is the quantity that must be added to the finite representation of a computed number to make it the true representation of that number. The round-off error can be reduced by carrying the computation to more significant figures at each step: at each step of the computation, retain at least one more significant figure than that given in the data, perform the last operation, and then round off.
3. Truncation Error: These are the errors caused by using approximate formulae in computation, or by replacing an infinite process with a finite one. For example, when a function f(x) is evaluated from an infinite series for x after 'truncating' it at a certain stage, we have this type of error. The study of this type of error is usually associated with the problem of convergence.
4. Absolute Error: Absolute error is the numerical difference between the true value of a quantity and its approximate value. Thus if $x'$ is the approximate value of a quantity $x$, then $|x - x'|$ is called the absolute error and is denoted by $E_a$. Therefore $E_a = |x - x'|$. The absolute error is expressed in the same units as the exact (or approximate) value.
5. Relative Error: The relative error $E_r$ is defined by
$$E_r = \frac{|x - x'|}{|x|} = \frac{E_a}{|\text{True value}|},$$
where $x'$ is the approximate value of the quantity $x$. The relative error is independent of units.
6. Percentage Error: The percentage error in $x'$, the approximate value of $x$, is given by
$$E_p = 100 \times E_r = 100 \times \frac{|x - x'|}{|x|}.$$
The percentage error is also independent of units.
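The definitions in 4 to 6 translate directly into code, and the same measures can be used to watch the other error types. The Python sketch below (all function names are hypothetical) measures the error of the approximation 3.142 for π, checks that the relative error does not change with units, shows the truncation error of the series $e^x = \sum x^k/k!$ shrinking as more terms are kept, and illustrates the round-off advice in 2 by adding 1/3 three times with the `decimal` module: first keeping two figures at every step, then carrying one extra figure and rounding only at the end.

```python
import math
from decimal import Decimal, localcontext


def absolute_error(true_value, approx):
    """E_a = |x - x'|."""
    return abs(true_value - approx)


def relative_error(true_value, approx):
    """E_r = |x - x'| / |x|."""
    return absolute_error(true_value, approx) / abs(true_value)


def percentage_error(true_value, approx):
    """E_p = 100 * E_r."""
    return 100 * relative_error(true_value, approx)


# Approximate number from (ii): 3.142 for pi.
print(absolute_error(math.pi, 3.142), relative_error(math.pi, 3.142),
      percentage_error(math.pi, 3.142))

# Same measurement in metres and in centimetres: equal relative errors.
print(relative_error(1.5, 1.49), relative_error(150, 149))


def exp_series(x: float, n_terms: int) -> float:
    """Truncated series: sum of x**k / k! for k = 0 .. n_terms - 1."""
    term, total = 1.0, 0.0
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)
    return total


# Truncation error shrinks as more terms of the series for e are kept.
for n in (2, 4, 8, 12):
    print(n, "terms:", percentage_error(math.e, exp_series(1.0, n)), "%")


def sum_of_thirds(digits: int) -> Decimal:
    """Add 1/3 three times, rounding every step to `digits` significant figures."""
    with localcontext() as ctx:
        ctx.prec = digits
        third = Decimal(1) / Decimal(3)
        return third + third + third


def round_to(value: Decimal, digits: int) -> Decimal:
    """Round a Decimal to `digits` significant figures."""
    with localcontext() as ctx:
        ctx.prec = digits
        return +value  # unary plus applies the context precision


print(sum_of_thirds(2))               # 0.99: two figures kept at every step
print(round_to(sum_of_thirds(3), 2))  # 1.0: one extra figure, rounded at the end
```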
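Let $X = f(x_1, x_2, \ldots, x_n)$ be a function of $n$ variables. To determine the error $\delta X$ in $X$ due to the errors $\delta x_1, \delta x_2, \ldots, \delta x_n$ in $x_1, x_2, \ldots, x_n$ respectively, we write
$$X + \delta X = f(x_1 + \delta x_1, x_2 + \delta x_2, \ldots, x_n + \delta x_n).$$
Using Taylor's series for a function of several variables to expand the R.H.S. of the above, we get
$$X + \delta X = f(x_1, x_2, \ldots, x_n) + \left(\delta x_1 \frac{\partial}{\partial x_1} + \cdots + \delta x_n \frac{\partial}{\partial x_n}\right) f + \frac{1}{2!}\left(\delta x_1 \frac{\partial}{\partial x_1} + \cdots + \delta x_n \frac{\partial}{\partial x_n}\right)^{2} f + \cdots$$
The errors $\delta x_1, \delta x_2, \ldots, \delta x_n$ are all small, so the terms containing $(\delta x_1)^2, (\delta x_2)^2, \ldots, (\delta x_n)^2$, their products, and higher powers of $\delta x_1, \delta x_2, \ldots, \delta x_n$ are neglected. Therefore (writing $\partial X/\partial x_i$ for $\partial f/\partial x_i$)
$$X + \delta X = f(x_1, x_2, \ldots, x_n) + \frac{\partial X}{\partial x_1}\delta x_1 + \frac{\partial X}{\partial x_2}\delta x_2 + \cdots + \frac{\partial X}{\partial x_n}\delta x_n.$$
Because $X = f(x_1, x_2, \ldots, x_n)$, this gives
$$\delta X = \frac{\partial X}{\partial x_1}\delta x_1 + \frac{\partial X}{\partial x_2}\delta x_2 + \cdots + \frac{\partial X}{\partial x_n}\delta x_n. \tag{2}$$
Equation (2) is the general formula for errors. Dividing equation (2) by $X$, we get the relative error
$$E_r = \frac{\delta X}{X} = \frac{\partial X}{\partial x_1}\frac{\delta x_1}{X} + \frac{\partial X}{\partial x_2}\frac{\delta x_2}{X} + \cdots + \frac{\partial X}{\partial x_n}\frac{\delta x_n}{X}.$$
Taking the modulus of both sides, we get the maximum relative error
$$\left|\frac{\delta X}{X}\right| \le \left|\frac{\partial X}{\partial x_1}\frac{\delta x_1}{X}\right| + \left|\frac{\partial X}{\partial x_2}\frac{\delta x_2}{X}\right| + \cdots + \left|\frac{\partial X}{\partial x_n}\frac{\delta x_n}{X}\right|.$$
Also, taking the modulus in equation (2), we get the maximum absolute error
$$|\delta X| \le \left|\frac{\partial X}{\partial x_1}\delta x_1\right| + \left|\frac{\partial X}{\partial x_2}\delta x_2\right| + \cdots + \left|\frac{\partial X}{\partial x_n}\delta x_n\right|.$$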
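The general formula (2) can also be applied numerically. The Python sketch below is an illustration under stated assumptions rather than the text's own procedure: it estimates each partial derivative $\partial X/\partial x_i$ by a central difference (the step size `h` is an arbitrary choice) and then forms $\delta X$, the maximum absolute error and the maximum relative error. The function name `propagate_error` and the sample numbers are hypothetical.

```python
from typing import Callable, Sequence


def propagate_error(f: Callable[..., float], x: Sequence[float],
                    dx: Sequence[float], h: float = 1e-6):
    """First-order error propagation, equation (2), for X = f(x1, ..., xn).

    Partial derivatives are approximated by central differences (an
    assumption of this sketch). Returns (delta_X, max_abs, max_rel).
    """
    X = f(*x)
    partials = []
    for i in range(len(x)):
        step = h * max(1.0, abs(x[i]))
        plus, minus = list(x), list(x)
        plus[i] += step
        minus[i] -= step
        partials.append((f(*plus) - f(*minus)) / (plus[i] - minus[i]))
    delta_X = sum(p * d for p, d in zip(partials, dx))
    max_abs = sum(abs(p * d) for p, d in zip(partials, dx))
    return delta_X, max_abs, max_abs / abs(X)


# Example: X = x1 * x2 with x1 = 2, x2 = 3 and errors 0.01, -0.02.
delta_X, max_abs, max_rel = propagate_error(lambda a, b: a * b, [2.0, 3.0], [0.01, -0.02])
print(delta_X, max_abs, max_rel)
```

Here $\partial X/\partial x_1 = 3$ and $\partial X/\partial x_2 = 2$, so the sketch should give $\delta X \approx 0.03 - 0.04 = -0.01$, a maximum absolute error of about 0.07 and a maximum relative error of about 0.07/6.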
1.4.1 Error in Addition of Numbers
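Let $X = x_1 + x_2 + \cdots + x_n$. Then
$$X + \delta X = (x_1 + \delta x_1) + (x_2 + \delta x_2) + \cdots + (x_n + \delta x_n) = (x_1 + x_2 + \cdots + x_n) + (\delta x_1 + \delta x_2 + \cdots + \delta x_n).$$
Therefore
$$\delta X = \delta x_1 + \delta x_2 + \cdots + \delta x_n,$$
which is the absolute error. Dividing by $X$, we get
$$\frac{\delta X}{X} = \frac{\delta x_1}{X} + \frac{\delta x_2}{X} + \cdots + \frac{\delta x_n}{X},$$
which is the relative error. Again, taking moduli,
$$|\delta X| \le |\delta x_1| + |\delta x_2| + \cdots + |\delta x_n| \quad\text{and}\quad \left|\frac{\delta X}{X}\right| \le \left|\frac{\delta x_1}{X}\right| + \left|\frac{\delta x_2}{X}\right| + \cdots + \left|\frac{\delta x_n}{X}\right|,$$
which give the maximum absolute error and the maximum relative error. This shows that when numbers are added, the maximum absolute error in the result is the sum of the magnitudes of the absolute errors in the numbers.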
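A quick numerical check of the addition result, using π, e and √2 rounded to two decimal places (so each $|\delta x_i| \le 0.005$). The variable names are illustrative. The actual error of the sum should stay within the sum of the individual error magnitudes, which in turn stays within the worst case $3 \times 0.005$ implied by the stated accuracy alone.

```python
import math

# True values and approximations correct to two decimal places (|dx_i| <= 0.005).
true_values = [math.pi, math.e, math.sqrt(2)]
approximations = [3.14, 2.72, 1.41]

errors = [t - a for t, a in zip(true_values, approximations)]  # Error = True - Approx
delta_X = sum(true_values) - sum(approximations)                # error of the sum
max_abs = sum(abs(e) for e in errors)                           # |dx1| + ... + |dxn|
worst_case = 0.005 * len(approximations)                        # from the stated accuracy only

print(delta_X, max_abs, worst_case)
```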
1.4.2 Error in Subtraction of Numbers
Let $X = x_1 - x_2$. Then we have
$$X + \delta X = (x_1 + \delta x_1) - (x_2 + \delta x_2) = (x_1 - x_2) + (\delta x_1 - \delta x_2).$$
Hence the absolute error is
$$\delta X = \delta x_1 - \delta x_2,$$
and the relative error is
$$\frac{\delta X}{X} = \frac{\delta x_1}{X} - \frac{\delta x_2}{X}.$$
Taking the modulus of the absolute and relative errors to get their maximum values, we have
$$|\delta X| \le |\delta x_1| + |\delta x_2|,$$
which is the maximum absolute error, and
$$\left|\frac{\delta X}{X}\right| \le \left|\frac{\delta x_1}{X}\right| + \left|\frac{\delta x_2}{X}\right|,$$
which gives the maximum relative error in the subtraction of numbers.
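The subtraction formulas show why subtracting nearly equal numbers is risky: the absolute error bound stays small, but the relative error bound is divided by a small $X$. The Python sketch below uses the standard `fractions` module for exact arithmetic; the helper name `max_errors_difference` and the sample numbers are illustrative assumptions.

```python
from fractions import Fraction


def max_errors_difference(x1: str, x2: str, dx1: str, dx2: str):
    """Maximum absolute and relative error of X = x1 - x2, in exact fractions."""
    a, b = Fraction(x1), Fraction(x2)
    X = a - b
    max_abs = abs(Fraction(dx1)) + abs(Fraction(dx2))
    return X, max_abs, max_abs / abs(X)


# Two nearly equal numbers, each correct to two decimal places.
X, max_abs, max_rel = max_errors_difference("12.34", "12.33", "0.005", "0.005")
print(X, max_abs, max_rel)  # 1/100 1/100 1

# The same absolute bound relative to the sum 12.34 + 12.33 instead.
sum_rel = max_abs / (Fraction("12.34") + Fraction("12.33"))
print(sum_rel)
```

The absolute error bound is only 0.01, but since the difference itself is 0.01 the relative error may be as large as 100%. For the sum 12.34 + 12.33 the same absolute bound gives a relative error of only 1/2467, about 0.04%.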
1.4.3 Error in Product of Numbers
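Let $X = x_1 x_2 x_3 \cdots x_n$. Since $\dfrac{\partial X}{\partial x_i} = \dfrac{X}{x_i}$ for each $i$ (assuming no $x_i$ is zero), the general formula for errors gives
$$\delta X = \frac{X}{x_1}\delta x_1 + \frac{X}{x_2}\delta x_2 + \cdots + \frac{X}{x_n}\delta x_n.$$
Dividing by $X$,
$$\frac{\delta X}{X} = \frac{\delta x_1}{x_1} + \frac{\delta x_2}{x_2} + \cdots + \frac{\delta x_n}{x_n}.$$
Therefore the maximum relative and absolute errors are given by
$$\text{Relative error} = \left|\frac{\delta X}{X}\right| \le \left|\frac{\delta x_1}{x_1}\right| + \left|\frac{\delta x_2}{x_2}\right| + \cdots + \left|\frac{\delta x_n}{x_n}\right|,$$
$$\text{Absolute error} = \delta X = \frac{\delta X}{X} \times (x_1 x_2 \cdots x_n).$$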
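For a product, the maximum relative error is the sum of the relative errors of the factors. The Python sketch below (hypothetical helper `max_errors_product`, illustrative numbers, exact `Fraction` arithmetic) compares the first-order bound with the exact change in 2 × 3 × 4 when every factor is increased by 0.01; the two should differ only by the neglected second-order terms.

```python
from fractions import Fraction
from math import prod


def max_errors_product(xs, dxs):
    """Maximum relative and absolute error of X = x1 * x2 * ... * xn."""
    X = prod(xs)
    max_rel = sum(abs(d / x) for x, d in zip(xs, dxs))
    return X, max_rel, max_rel * abs(X)


xs = [Fraction(2), Fraction(3), Fraction(4)]
dxs = [Fraction(1, 100)] * 3
X, max_rel, max_abs = max_errors_product(xs, dxs)
# Exact change in the product when every factor is 0.01 too large.
actual = prod(x + d for x, d in zip(xs, dxs)) - X
print(X, max_rel, max_abs, float(actual))
```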
1.4.4 Error in Division of Numbers
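Let $X = \dfrac{x_1}{x_2}$. Then, again using the general formula for errors,
$$\delta X = \frac{\partial X}{\partial x_1}\delta x_1 + \frac{\partial X}{\partial x_2}\delta x_2 = \frac{\delta x_1}{x_2} - \frac{x_1}{x_2^{2}}\delta x_2.$$
Dividing by $X = x_1/x_2$,
$$\frac{\delta X}{X} = \frac{\delta x_1}{x_1} - \frac{\delta x_2}{x_2}.$$
Therefore
$$\left|\frac{\delta X}{X}\right| \le \left|\frac{\delta x_1}{x_1}\right| + \left|\frac{\delta x_2}{x_2}\right|, \quad\text{i.e.,}\quad \text{Relative error} \le \left|\frac{\delta x_1}{x_1}\right| + \left|\frac{\delta x_2}{x_2}\right|,$$
and
$$\text{Absolute error} = \delta X = \frac{\delta X}{X} \times X.$$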
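A similar check for division, with $x_1 = 4.50$ and $x_2 = 1.50$ each correct to two decimal places. The helper name `max_errors_quotient` and the numbers are illustrative assumptions. The first-order bound $(\delta X/X) \times X$ is compared with the change obtained when the numerator is too large and the denominator too small, which is the worst combination of signs.

```python
from fractions import Fraction


def max_errors_quotient(x1, x2, dx1, dx2):
    """Maximum relative and absolute error of X = x1 / x2."""
    X = x1 / x2
    max_rel = abs(dx1 / x1) + abs(dx2 / x2)
    return X, max_rel, max_rel * abs(X)


x1, x2 = Fraction("4.50"), Fraction("1.50")
dx = Fraction("0.005")
X, max_rel, max_abs = max_errors_quotient(x1, x2, dx, dx)
# Worst combination: numerator too large, denominator too small.
actual = (x1 + dx) / (x2 - dx) - X
print(X, max_rel, max_abs, float(actual))
```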
1.4.5 Inverse Problem
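The inverse problem is to find the errors $\delta x_1, \delta x_2, \ldots, \delta x_n$ allowable in $x_1, x_2, \ldots, x_n$ so that the function $X = f(x_1, x_2, \ldots, x_n)$ has a desired accuracy. We have
$$\delta X = \frac{\partial X}{\partial x_1}\delta x_1 + \frac{\partial X}{\partial x_2}\delta x_2 + \cdots + \frac{\partial X}{\partial x_n}\delta x_n.$$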
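The section breaks off before stating how the allowable errors are chosen. One standard approach, assumed here and not necessarily the one the author goes on to use, is the principle of equal effects: each term $\left|\frac{\partial X}{\partial x_i}\delta x_i\right|$ is given the same share $\delta X/n$, so $\delta x_i = \dfrac{\delta X}{n\,|\partial X/\partial x_i|}$. The Python sketch below (hypothetical helper `equal_effects`, illustrative numbers) applies this to $X = x_1 x_2$ at $x_1 = 2$, $x_2 = 3$ with a desired accuracy $|\delta X| \le 0.01$.

```python
from fractions import Fraction


def equal_effects(partials, target_error):
    """Allowable errors dx_i so that |dX| <= target_error.

    Principle of equal effects (an assumption of this sketch): every term
    |dX/dx_i * dx_i| of the general error formula gets target_error / n.
    """
    n = len(partials)
    return [target_error / (n * abs(p)) for p in partials]


# X = x1 * x2 at x1 = 2, x2 = 3, so dX/dx1 = 3 and dX/dx2 = 2.
partials = [Fraction(3), Fraction(2)]
target = Fraction(1, 100)
allowed = equal_effects(partials, target)
print([float(a) for a in allowed])
```

Each allowed error then contributes 0.005 to $|\delta X|$, so together they use up exactly the permitted 0.01.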