The Research on Interpolation Methods and Fitting Models for the Lorenz Curve

Li Zhang (1),(2), Kien Nguyen The (3),(*), Youjian Qi (4), Lizhen Wei (5), Khac Lich Hoang (3), Manh Hung Le (6)

(1) School of Mathematics and Statistics, Changshu Institute of Technology, Suzhou, China
(2) Business School, Nanjing Normal University, Nanjing, China
(3) VNU University of Economics and Business, Vietnam National University, Hanoi, Vietnam
(4) Yangzhou High School of Jiangsu Province, Jiangsu, China
(5) Jiangsu Province Changshu Vocational Educational Center School, Suzhou, China
(6) Ministry of Education & Training, Hanoi, Vietnam
(*) Correspondence: thekien.edu@gmail.com

Abstract: The Lorenz curve is an important tool for showing the income distribution of a country. Building on the work of other scholars, this paper first discusses some existing interpolation methods and points out their shortcomings. To overcome the requirement in Lagrange interpolation that higher-order derivatives exist, the paper sets up a new method and analyzes it. It then puts forward a new family of Lorenz curves and discusses its properties. On income data collected for some regions of China, statistical indexes show better results than those of other existing curves.

Keywords: Interpolation Method, Lorenz curve

Introduction

Since the reform and opening-up of the last century, the continuous development of the economy has greatly improved China's overall strength and economic conditions. Rural poverty nevertheless persists in remote areas of some provinces. The party and the government attach great importance to the poverty problem and have responded with special economic development policies, fiscal subsidies and other measures (Bai and Cao 2007; Guo 2007). These efforts have achieved obvious results. In the spring of 2014, Premier Li Keqiang, delivering the government work report on behalf of the Central People's Government, pointed out that in the previous year it had been possible
to reduce the rural poor population by 16.5 million, and that the income gap between urban and rural residents continued to shrink. At the same time, the United Nations has acknowledged the results of China's recent poverty alleviation work. According to the Credit Suisse wealth report of October 2014, the unequal distribution of wealth has intensified and the Gini coefficient is rising around the world. China's Gini coefficient has risen gradually and is approaching the warning line (Mei and Fan 2005; Wang and Fan 2005; Hu 2004; Liu 2006). This problem deserves great attention, because it bears on stability and prosperity.

In economics, the Lorenz curve is a graphical representation of the cumulative distribution function of the empirical probability distribution of wealth. It was developed by Max O. Lorenz in 1905 to represent inequality in the distribution of wealth. The Lorenz curve is used to compare wealth inequality within one country across periods, or across countries in the same period, and it is widely used as a convenient graphical summary of income and wealth distribution. Through the Lorenz curve we can see at a glance how equal or unequal a nation's income distribution is. To draw it, take a rectangle whose height measures the percentage of social wealth and divide the height into five equal parts, each representing 20 percent of total social wealth. Along the base, 100 percent of families are arranged from the poorest on the left to the richest on the right and are likewise divided into five equal parts, the first of which represents the 20 percent of families with the lowest income.

This paper is organized as follows. In section 2, we discuss some interpolation methods and analyze their characteristics, and then advocate a kind of function that supplies derivatives of any order for Lagrange interpolation. In section 3, we describe the basic properties of the Lorenz curve and the Gini coefficient and review some recent literature. In
section 4, we put forward the new construction of Lorenz curves and give the expressions of the new family. In section 5, based on our new function for the Lorenz curve and on other functions, we compute statistical indexes such as MSE, MAE and MAS and report the results. In section 6, we close with some concluding remarks.

The Interpolation Methods About Lorenz Curve

A large number of engineering problems involve approximating an unknown function. Usually a set of observations is available, and an appropriate interpolation method then yields an approximation of the function. Error analysis, however, requires the function to be well behaved, with derivatives up to order $n$. In practical engineering problems the continuity of the function can usually be ensured, but it is very difficult to obtain the higher derivatives, or even to guarantee that they exist.

2.1 Analysis of Some Interpolation Methods about Lorenz Curve

Since the Lorenz curve was put forward in 1905, and since collected data are discrete, some scholars have paid attention to interpolation methods for it. Gastwirth gives the Hermite interpolation method for the Lorenz curve. Other scholars have put forward the linear interpolation method and the Newton interpolation method. In this part we give the concrete formulas and then discuss the advantages and shortcomings of these methods.

An interpolation method constructs the Lorenz curve from the collected income distribution data $(x_i, y_i)$, $i = 0, 1, 2, \ldots, n$. The simplest is linear interpolation, which connects two adjacent points $(x_k, y_k)$ and $(x_{k+1}, y_{k+1})$ by a line:

$$L_1(x) = l_0(x)\, y_k + l_1(x)\, y_{k+1},$$

where

$$l_0(x) = \frac{x - x_{k+1}}{x_k - x_{k+1}}, \qquad l_1(x) = \frac{x - x_k}{x_{k+1} - x_k}.$$

The basic idea of Hermite interpolation is to interpolate piecewise between the nodes while joining the pieces smoothly,
forming a smooth curve. The most commonly used form is cubic spline interpolation. For the nodes $(x_i, y_i)$, $i = 0, 1, 2, \ldots, n$, the concrete expression on the interval $[x_j, x_{j+1}]$ is

$$S_j(x) = A_j + B_j x + C_j x^2 + D_j x^3, \qquad j = 0, 1, \ldots, n-1.$$

The coefficients are then determined from the interpolation and smoothness conditions at the nodes.

Newton interpolation $N(x)$ is as follows:

$$N(x) = L(x) - R_n(x),$$

where

$$L(x) = L[x_0] + L[x_0, x_1](x - x_0) + \cdots + L[x_0, x_1, \ldots, x_n] \prod_{i=0}^{n-1} (x - x_i) + R_n(x),$$

$$L[x_0, x_1, \ldots, x_k] = \frac{L[x_1, \ldots, x_k] - L[x_0, x_1, \ldots, x_{k-1}]}{x_k - x_0},$$

so the error estimate is $R_n(x)$.

The piecewise linear interpolation method is very simple, but it cannot guarantee differentiability at the nodes, let alone second-order differentiability. For the Lorenz curve, which is convex, the piecewise linear interpolant lies on or above the curve, so it necessarily makes the Gini coefficient smaller. The usefulness of the method is therefore limited.

The Hermite interpolation method is used in some fields, but it has shortcomings. The first is that the computations become complicated when the number of nodes is large. The second concerns error estimation: the underlying curve is not given explicitly by the discrete data, and derivatives of order higher than 3 are unattainable, so the error estimate for Hermite interpolation is not very accurate.

2.2 Analysis of New Method for Lagrange Interpolation

The function $\varphi(x)$ is defined as follows:

$$\varphi(x) = \begin{cases} \dfrac{1}{\gamma}\, e^{-\frac{1}{a^2 - x^2}}, & |x| < a, \\[4pt] 0, & |x| \ge a, \end{cases} \qquad \text{where } \gamma = \int_{-a}^{a} e^{-\frac{1}{a^2 - x^2}}\, dx \tag{1}$$

is a constant and the constant $a$ is positive. It is obvious that $\varphi(x)$ is continuous and differentiable for $|x| \ne a$. Now consider the one-sided limits at $x = a$:

$$\lim_{x \to a^+} \varphi(x) = \lim_{x \to a^+} 0 = 0, \tag{2}$$

$$\lim_{x \to a^-} \varphi(x) = \lim_{x \to a^-} \frac{1}{\gamma}\, e^{-\frac{1}{a^2 - x^2}} = 0. \tag{3}$$

So the function is continuous at the point $x = a$, and similarly at the point $x = -a$. Hence $\varphi(x)$ is continuous on $R$ and differentiable in the open interval $(-a, a)$. On the other hand, the integral of the function over $R$ is 1:

$$\int_R \varphi(x)\, dx = \int_{|x| \ge a} \varphi(x)\, dx + \int_{-a}^{a} \varphi(x)\, dx = 0 + \frac{1}{\gamma} \cdot \gamma = 1. \tag{4}$$

Theorem 2.1 Let $\varphi(x)$ be as above and define

$$\varphi_m(x) = m\, \varphi(mx), \qquad m = 1, 2, \ldots \tag{5}$$

Then the following conclusions are true:

(1) $$\int_R \varphi_m(x)\, dx = 1; \tag{6}$$

(2) the compact support of the function $\varphi_m(x)$ is

$$\left[ -\frac{a}{m}, \frac{a}{m} \right]. \tag{7}$$

Proof. With the substitution $u = mx$,

$$\int_R \varphi_m(x)\, dx = \int_R m\, \varphi(mx)\, dx = \int_R \varphi(u)\, du = 1. \tag{8}$$

When $|x| \ge \frac{a}{m}$ we have $|mx| \ge a$, and thus $\varphi_m(x) = m\, \varphi(mx) = 0$.

Theorem 2.2 Let $f(x)$ be continuous on $R$, let $\varphi_m(x)$ be as before, and define

$$f_m(x) = \int_R f(y)\, \varphi_m(x - y)\, dy, \qquad m = 1, 2, \ldots \tag{9}$$

Then the following conclusions are true for any $x \in R$:

(1) $$\lim_{m \to \infty} f_m(x) = f(x); \tag{10}$$

(2) $$f_m(x) \in C^{\infty}(R), \qquad m = 1, 2, \ldots \tag{11}$$

Proof. We first prove conclusion (1). Set $y = v + x$. Since $\varphi_m$ is an even function,

$$f_m(x) = \int_R f(y)\, \varphi_m(x - y)\, dy = \int_R f(v + x)\, \varphi_m(-v)\, dv = \int_R f(v + x)\, \varphi_m(v)\, dv. \tag{12}$$

Because $\int_R \varphi_m(v)\, dv = 1$, the absolute difference between $f(x)$ and $f_m(x)$ satisfies

$$\begin{aligned} |f(x) - f_m(x)| &= \left| \int_R f(x + v)\, \varphi_m(v)\, dv - \int_R f(x)\, \varphi_m(v)\, dv \right| = \left| \int_R \big( f(x + v) - f(x) \big)\, \varphi_m(v)\, dv \right| \\ &\le \int_{|v| \le \frac{a}{m}} |f(x + v) - f(x)|\, \varphi_m(v)\, dv \\ &\le \sup_{|v| \le \frac{a}{m}} |f(x + v) - f(x)| \int_{|v| \le \frac{a}{m}} \varphi_m(v)\, dv = \sup_{|v| \le \frac{a}{m}} |f(x + v) - f(x)|. \end{aligned} \tag{13}$$

Because the function $f$ is continuous on $R$, the following is true:

$$\lim_{m \to \infty} \sup_{|v| \le \frac{a}{m}} |f(v + x) - f(x)| = 0. \tag{14}$$

Now we prove conclusion (2). It is known that $f(y)$ is continuous on $R$ and that $\varphi_m(x - y)$, as a function of $y$, vanishes outside the interval $\left( x - \frac{a}{m}, x + \frac{a}{m} \right)$. So we have

$$f_m(x) = \int_R f(y)\, \varphi_m(x - y)\, dy = \int_{x - \frac{a}{m}}^{x + \frac{a}{m}} f(y)\, \varphi_m(x - y)\, dy, \qquad m = 1, 2, \ldots \tag{15}$$

Every derivative of $\varphi$ tends to 0 as $|x| \to a$, so $\varphi_m$ is infinitely differentiable and we may differentiate under the integral sign. Taking the derivative of order $p$ of $f_m(x)$ gives

$$f_m^{(p)}(x) = \int_R f(y)\, \frac{\partial^p}{\partial x^p} \varphi_m(x - y)\, dy, \qquad p = 1, 2, \ldots \tag{16}$$

The functions $f_m(x)$ and $f(x)$ are defined as above. Now we consider the interpolation polynomial for $f_m(x)$. For the function $f_m(x)$ itself we have no observation points; there is only the one group of observation points $(x_i, y_i)$, $i = 0, 1, 2, \ldots, n$. By Theorem 2.2, for any $\varepsilon > 0$ there exists $N > 0$ such that, when $m > N$, for any $k \ge 0$,

$$|f_m(x_k) - f(x_k)| < \varepsilon, \qquad y_k^{(m)} = f_m(x_k), \quad y_k = f(x_k). \tag{17}$$

If $y_k^{(m)} = f_m(x_k)$ is replaced by $y_k = f(x_k)$, the resulting error is kept under control by the conclusion above. Define the function $R_{n,m}$ as follows:

$$R_{n,m}(x) = f_m(x) - P_n(x) = \frac{f_m^{(n+1)}(\xi)}{(n+1)!} \prod_{k=0}^{n} (x - x_k). \tag{18}$$

For any $x \in R$, as $m \to \infty$ we obtain

$$R_n(x) - R_{n,m}(x) = \big( f(x) - P_n(x) \big) - \big( f_m(x) - P_n(x) \big) = f(x) - f_m(x) \to 0. \tag{19}$$

The Research on Fitting Models for The Lorenz Curve

Income distribution bears on the standard of living of the broad mass of people, and the fairness of the distribution is a key concern for ordinary people. To measure the distribution of residents' income, we often adopt the Lorenz curve. Here the function $L(p)$ is the share of total income held by the poorest fraction $p$ of the population, and it is defined on the interval $[0, 1]$. The quantity $p = F(x)$ is the proportion of people earning less than or equal to $x$, where $F(x)$ is the distribution function of income and $f(x)$ is the corresponding density function.

3.1 Basic Theory and Related Research

In empirical analyses of income distribution, the income density curve is generally what is called positively skewed: the peak lies to the left and the right end drags a long
tail. The point $x_0$ is called the modal point, $m$ is the median, and $\mu$ is the average income. It is clear that $x_0 < m < \mu$ under this circumstance. From the above, $L(p)$ can be expressed as

$$L(p) = \frac{1}{\mu} \int_0^x t f(t)\, dt, \qquad p = F(x). \tag{20}$$

Differentiating (20) with respect to $p$, using $dp = f(x)\, dx$, gives the relation between the function $L(p)$ and the function $f(x)$:

$$L'(p) = \frac{x}{\mu}, \qquad f(x) = \frac{1}{\mu L''(p)}. \tag{21}$$

Because $p = F(x)$ has the inverse function $x = F^{-1}(p)$, the Lorenz curve $L(p)$ can also be expressed as

$$L(p) = \frac{1}{\mu} \int_0^p F^{-1}(q)\, dq. \tag{22}$$

The China Statistical Yearbook publishes so-called grouped data in the forms

$$\left( p_i, \frac{x_i}{\mu} \right), \qquad i = 1, 2, \ldots, n, \tag{23}$$

$$(p_i, L_i), \qquad i = 1, 2, \ldots, n. \tag{24}$$

Since $L'(p) = \frac{x}{\mu}$, (23) gives points on the curve $L'(p)$, while (24) gives points on the curve $L(p)$.

3.2 The Construction of New Lorenz Curve

It is vital to set up the conditions that define a Lorenz curve; a curve satisfying these sufficient conditions is a Lorenz curve. A curve that satisfies the following conditions is called a Lorenz curve:

(1) $$L(0) = 0, \qquad L(1) = 1; \tag{25}$$

(2) $$L(p) \ge 0, \qquad p \in [0, 1]; \tag{26}$$

(3) $L(p)$ is an increasing function of $p$, that is, $L'(p) \ge 0$: the greater the population share of the low-income group, the greater the share of total income that group holds;

(4) $L(p)$ is a convex function of $p$, that is, $L''(p) \ge 0$: as $p$ increases, $L(p)$ increases at a growing rate.

A number of parametric models that satisfy the basic properties of a Lorenz curve (LC) have been proposed in the literature. See, for example, Kakwani and Podder (1973, 1976), Rasche et al. (1980), Gupta (1984), Rossi (1985), Arnold (1986), Rao and Tam (1987), Basmann et al. (1990), Ortega et al. (1991), Chotikapanich (1993), Ogwang and Rao (1996, 2000), Sarabia (1997) and Wang et al. (2009). In those papers the scholars advocate different models for the LC. Some of the concrete
results are listed in Table 1.

Table 1. Some Models for a Lorenz Curve

Year   Authors          Model
1973   Kakwani          $L(p) = p^{\alpha} e^{-\beta (1-p)}$
1980   Rasche           $L(p) = [1 - (1-p)^{\alpha}]^{1/\beta}$
1991   Ortega et al.    $L(p) = p^{\alpha} [1 - (1-p)^{\beta}]$
1991   Ortega et al.    $L(p) = p^{\alpha} [(1 - (1-p)^{\beta})^{\eta}]$
1999   Sarabia et al.   $L(p) = p^{\alpha} [1 - (1-p)^{\beta}]$
2000   Ogwang           [expression not recoverable]
2009   Wang et al.      $L(p) = p^{\alpha} [1 - (1-p)^{\beta} e^{\lambda p}]$

Starting from the shape of an arch function, we set the function expression as follows:

$$I = A p^{\alpha} (1 - p)^{\beta}. \tag{27}$$

This kind of function shows different arches for different values of the parameters $\alpha$ and $\beta$. Its maximum lies at $p = \alpha / (\alpha + \beta)$, so when $\alpha > \beta$ the arch leans to the right, and when $\alpha < \beta$ the arch leans to the left. Subtracting the curve $I$ from the 45 degree line, we get the new function for Lorenz curves, again denoted $I$:

$$I = p - A p^{\alpha} (1 - p)^{\beta}, \tag{28}$$

where the parameters meet the following conditions:

$$A > 0, \qquad 0 < \alpha \le 1, \qquad \beta > 0. \tag{29}$$

From the definition of the Lorenz curve, we find that the function $I$ satisfies the following conditions:

$$L(0) = 0, \qquad L(1) = 1, \qquad L'(p) \ge 0, \qquad L''(p) \ge 0. \tag{30}$$

So the function $I$ is a new expression for a Lorenz curve. Compared with the work of others, this expression lies outside the usual GP model and is more general, which makes it better suited to the real world. Studying the model given by Sarabia et al. (1999), we put forward the new family of Lorenz curves shown in the following expression:

$$L(p) = p^{\gamma} \left[ p - A p^{\alpha} (1 - p)^{\beta} \right]. \tag{31}$$

Most models for the Lorenz curve depend on the classical Pareto curve, whose expression is built from powers of $(1 - p)$ and contains no term in $p$ itself. Based on the related theory, $L(p)$ is a Lorenz curve. Combining the requirements on the parameters in the function $L(p)$, the parameters must satisfy

$$A > 0, \qquad 0 < \alpha \le 1, \tag{32}$$

together with the corresponding restrictions on $\beta$ and $\gamma$.

3.3 Parameters Fitting and Model Comparison

We use collected data on the income
of a province in China, given in the following table.

Table 2. The Data

x_j        x_{j+1}     f_j      p_j      L_j
0.00       999.00      0.0780   0.0780   0.0590
1000.00    1499.00     0.0560   0.1340   0.0165
1500.00    1999.00     0.0420   0.1760   0.0276
2000.00    2499.00     0.0470   0.2230   0.0436
2500.00    2999.00     0.0420   0.2650   0.0611
3000.00    3499.00     0.0440   0.3090   0.0828
3500.00    3999.00     0.0410   0.3500   0.1061
4000.00    4999.00     0.0860   0.4360   0.1647
5000.00    5999.00     0.0920   0.5280   0.2413
6000.00    6999.00     0.0880   0.6160   0.3279
7000.00    7999.00     0.0800   0.6960   0.4188
8000.00    8999.00     0.0650   0.7610   0.5024
9000.00    9999.00     0.0520   0.8130   0.5772
10000.00   11999.00    0.0780   0.8910   0.7071
12000.00   14999.00    0.0560   0.9470   0.8216
15000.00   24999.00    0.0430   0.9900   0.9453
25000.00               0.0100   1.0000   1.0000

To fit the data, it is usual in economics to compute the values of the model parameters by the nonlinear least squares method, that is, to minimize

$$S(\theta) = \sum_{i=1}^{n} \big( L(p_i, \theta) - L_i \big)^2. \tag{33}$$

The parameter vector $\theta$ can be found with methods such as the LM method; the resulting value $\bar{\theta}$ is called the estimated parameter vector. Then the function

$$L(p, \bar{\theta}) \approx L(p, \theta) \tag{34}$$

is called the approximate function for the real Lorenz curve. To compute the parameter values from the data, this paper adopts the classical Levenberg-Marquardt algorithm. The concrete result is as follows:

$$\bar{\theta} = (\bar{\alpha}, \bar{\beta}, \bar{\gamma}, \bar{A}) = (0.3448, 0.5837, 0, 8156, 0, 0, 5628). \tag{35}$$

We then look at statistical indexes such as MSE, MAE and MAS, which are commonly used in economics and finance. MSE is the mean squared error:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \big( L(p_i, \bar{\theta}) - L_i \big)^2; \tag{36}$$

MAE is the mean absolute error:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \big| L(p_i, \bar{\theta}) - L_i \big|; \tag{37}$$

MAS is the maximum absolute error:

$$\mathrm{MAS} = \max_{1 \le i \le n} \big| L(p_i, \bar{\theta}) - L_i \big|. \tag{38}$$

The statistical indexes are as follows: MSE = 0.000021317; MAE =
0.0013; MAS = 0.0025 (39)

For comparison with the models of Rasche et al. (1980), Ortega, Sarabia and others, we carry out the same computations with the data. The details of the three indexes are given in Table 3.

Table 3. Some Models for a Lorenz Curve

Authors    MSE        MAE      MAS
PODDER     2.71E-04   0.0126   0.0435
KAKWANI    4.02E-06   0.0160   0.0038
RASCHE     1.82E-05   0.0035   0.0080
GUPTA      1.05E-04   0.0091   0.0176
ORTEGA     9.38E-06   0.0029   0.0052
SARABIA    1.42E-05   0.0034   0.0068
OGWANG     2.72E-04   0.0144   0.0356
WANG       0.85E-04   0.0220   0.0548
ZHANG      3.10E-06   0.0126   0.0051

Conclusions and discussion

In this paper we give the background of the Lorenz curve and the Gini coefficient, together with the basic relation between them. Some transformation formulas between the density function and the distribution function are then listed. Based on the graphical character of the arch function, we propose a new method for constructing a family of Lorenz curves and give some of its properties. With income data collected in China, we carry out parameter estimation and error analysis, and the results show that our models are better than other models.

References

Wang, Z.X. (2009). A new ordered family of Lorenz curves with an application to measuring inequality and poverty in rural China. China Economic Review, 20: 218-235.
Wang, Z.X. (2000). A famous economic evaluation index and construction method. Dongyue Review, 32: 31-36.
Bai, Y.H. (2007). Building a well-off society in an all-round way of the rural poverty problem. Agricultural economy, 1: 231-254.
Guo, Q.F. (2007). China's relatively poor peasant households out of poverty mechanism and policy choice. Graduate school of Chinese academy of social sciences, 1: 7898.
Wang, X.L., Fan, G. (2005). China's income gap analysis of the situation and influence factors. Economic research, 40: 12-18.
Wang, Y.J., Sarabia (2006). Lorenz transcribing curve model. Applied Mathematics, 10: 12-143.
Cao, Y.C. (2007). The influence factors of standard of safeguard of townsman lowest life in our
country and the effect of study. Contemporary economic science, 2: 320-356.
Foster, J.E. and Wolfson, M.C. (2009). Polarization and the decline of the middle class: Canada and the U.S. Economic Inequality, 8: 247-273.
Hu, D.M. (2004). The gini coefficient theory and empirical analysis. Of economic system reform, 4: 37-40.
Liu, L.S. (2006). The current situation of fiscal adjustment income distribution gap analysis. Economic science press, 92, 317-322.
Sozialstatistik (1994). Fitting parametric Lorenz curves to grouped income distributions - a critical note. Empirical Economics, 19: 361-370.
Mei, J. and Fan, J. (2005). Zhejiang urban residents income gap and consumption behavior difference. Management review, 10: 57-62.

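As a numerical companion to Theorems 2.1 and 2.2, the following Python sketch builds the bump function of Section 2.2, read here as phi(x) = exp(-1/(a^2 - x^2))/gamma for |x| < a and 0 otherwise. It checks that the scaled kernel phi_m(x) = m phi(mx) integrates to one and evaluates f_m for the continuous but non-differentiable test function f(x) = |x|. The choice a = 1, the test function, the helper names (`phi`, `phi_m`, `mollify`, `trapezoid`), the number of integration steps and the use of a composite trapezoidal rule are all assumptions made for illustration; none of them is taken from the paper.

```python
import math

A = 1.0  # half-width a of the support; an assumed value for illustration


def phi(x, a=A, gamma=1.0):
    """Bump function exp(-1/(a^2 - x^2)) / gamma on |x| < a, and 0 elsewhere."""
    if abs(x) >= a:
        return 0.0
    return math.exp(-1.0 / (a * a - x * x)) / gamma


def trapezoid(g, lo, hi, steps=4000):
    """Composite trapezoidal rule for g on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, steps):
        total += g(lo + i * h)
    return total * h


# Normalising constant gamma: integral of the unnormalised bump over [-a, a].
GAMMA = trapezoid(lambda x: phi(x, A), -A, A)


def phi_m(x, m):
    """Scaled kernel phi_m(x) = m * phi(m x), supported on [-a/m, a/m]."""
    return m * phi(m * x, A, GAMMA)


def mollify(f, x, m):
    """f_m(x): integral of f(y) * phi_m(x - y) over [x - a/m, x + a/m]."""
    r = A / m
    return trapezoid(lambda y: f(y) * phi_m(x - y, m), x - r, x + r)


if __name__ == "__main__":
    f = abs  # continuous on R, not differentiable at 0
    for m in (1, 4, 16, 64):
        mass = trapezoid(lambda x: phi_m(x, m), -A / m, A / m)
        print(m, round(mass, 6), mollify(f, 0.0, m), mollify(f, 0.5, m))
```

Under these assumptions the printed mass stays at 1 for every m, while f_m(0) shrinks in proportion to 1/m toward f(0) = 0, which is the pointwise convergence stated in (10).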
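The remark in Section 2.1 that piecewise linear interpolation makes the Gini coefficient smaller, and the indexes defined in (36) to (38), can be illustrated with a short Python sketch. It uses the standard identity that the Gini coefficient equals one minus twice the area under the Lorenz curve. The sample points are taken from the hypothetical Lorenz curve L(p) = p^2, whose Gini coefficient is exactly 1/3; they are not the paper's data. The function names and the comparison model p^1.9 are likewise assumptions chosen only for illustration.

```python
def gini_piecewise_linear(points):
    """Gini coefficient of the polygon through (p_i, L_i), sorted by p from (0, 0) to (1, 1)."""
    area = 0.0
    for (p0, l0), (p1, l1) in zip(points, points[1:]):
        # Trapezoid under the chord joining two adjacent grouped-data points.
        area += 0.5 * (p1 - p0) * (l0 + l1)
    return 1.0 - 2.0 * area


def error_indexes(model, points):
    """Return (MSE, MAE, MAS) of a fitted curve `model` against the points (p_i, L_i)."""
    residuals = [abs(model(p) - l) for p, l in points]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    mae = sum(residuals) / n
    mas = max(residuals)
    return mse, mae, mas


# Hypothetical grouped data: six points on L(p) = p^2 (true Gini = 1/3).
POINTS = [(i / 5, (i / 5) ** 2) for i in range(6)]

if __name__ == "__main__":
    print("piecewise linear Gini:", gini_piecewise_linear(POINTS))
    print("MSE, MAE, MAS:", error_indexes(lambda p: p ** 1.9, POINTS))
```

Because the chords of a convex curve lie on or above it, the polygon encloses less area between itself and the 45 degree line, so the value returned by `gini_piecewise_linear` falls below 1/3 for this sample.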