(1)FUNCTIONAL FORMS
Truong Dang Thuy truong@dangthuy.net
(2)Linear model
Consider a linear regression function
$Y = \beta_0 + \beta_1 X$
$\beta_1$: change in Y when X increases by one unit
Sometimes the relationship is not linear
Common functional forms:
Log-linear
Log-lin
Lin-log
Reciprocal
Polynomial
(3)Functional forms
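[Figure: four panels plotting Y against X, one per functional form]
Linear model: $Y = \beta_0 + \beta_1 X$
Log-linear: $\ln Y = \beta_0 + \beta_1 \ln X$
Lin-log: $Y = \beta_0 + \beta_1 \ln X$
Log-lin: $\ln Y = \beta_0 + \beta_1 X$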
(4)Functional forms
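[Figure: two panels of the reciprocal model $Y = \beta_0 + \beta_1 \frac{1}{X}$, titled "Reciprocal (negative beta)" and "Reciprocal (positive beta)", each with asymptote $\beta_0$]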
(5)Example dataset
Viet Nam provincial data (file ‘gdpprov.xlsx’)
gdp: provincial GDP (mil VND)
labfo: number of laborers of provinces (1000 persons)
(6)Record of commands
Record of results
Variables (data)
Commands
Taskbar
(7)Import data
Copy from Excel
(8)Data description
(9)Linear function
(10)LOG-LINEAR MODEL
The Cobb-Douglas Production Function:
$Q_i = B_1 L_i^{B_2} K_i^{B_3}$
can be transformed into a linear model by taking natural logs of both sides:
$\ln Q_i = \ln B_1 + B_2 \ln L_i + B_3 \ln K_i$
The slope coefficients can be interpreted as elasticities
If (B2 + B3) = 1, we have constant returns to scale
If (B2 + B3) > 1, we have increasing returns to scale
If (B2 + B3) < 1, we have decreasing returns to scale
(11)Log-linear model
gen lgdp = ln(rgdp)
(10 missing values generated)
gen llabor = ln(labfo)
gen linvest = ln(rinvest)
(17 missing values generated)
reg lgdp llabor linvest

      Source |       SS         df       MS          Number of obs =     271
             |                                       F(  2,   268) =  477.42
       Model |  175.619057       2  87.8095284       Prob > F      =  0.0000
    Residual |  49.2915017     268  .183923514       R-squared     =  0.7808
             |                                       Adj R-squared =  0.7792
       Total |  224.910559     270  .833002069       Root MSE      =  .42886

        lgdp |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
      llabor |   .508612   .0643267     7.91   0.000     .381962     .635262
     linvest |   .644785   .0405325    15.91   0.000    .5649824    .7245876
       _cons |   3.06333   .4515804     6.78   0.000    2.174233    3.952426
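As a rough check of returns to scale, the two slope estimates reported in the output (llabor = .508612, linvest = .644785) can be summed. The sketch below uses Python rather than the software used in the slides, and the function name `returns_to_scale` and its tolerance are illustrative assumptions. It compares point estimates only and is not a formal test of B2 + B3 = 1.

```python
def returns_to_scale(b2, b3, tol=1e-9):
    """Classify returns to scale from the sum of the two log-linear slopes."""
    total = b2 + b3
    if abs(total - 1.0) <= tol:
        return total, "constant"
    return total, "increasing" if total > 1.0 else "decreasing"


# Point estimates read from the regression output: llabor and linvest.
total, kind = returns_to_scale(0.508612, 0.644785)
print(f"B2 + B3 = {total:.6f} -> {kind} returns to scale")
```

Because the sum exceeds 1 only as a point estimate, a conclusion about increasing returns would still need a test that accounts for the sampling error of both coefficients.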
(12)LOG-LIN OR GROWTH MODELS
The rate of growth of real GDP:
$RGDP_t = RGDP_0 (1 + r)^t$
can be transformed into a linear model by taking natural logs of both sides:
$\ln RGDP_t = \ln RGDP_0 + t \ln(1 + r)$
Letting $B_1 = \ln RGDP_0$ and $B_2 = \ln(1 + r)$, this can be rewritten as:
$\ln RGDP_t = B_1 + B_2 t$
B2 is considered a semi-elasticity or an instantaneous growth rate
The compound growth rate (r) is equal to $e^{B_2} - 1$
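The conversion from the instantaneous rate B2 to the compound rate r can be illustrated with a short Python sketch. The slope value 0.05 is a hypothetical number chosen for illustration, not an estimate from the slides, and `compound_growth_rate` is an illustrative name.

```python
import math


def compound_growth_rate(b2):
    """Convert the log-lin slope B2 = ln(1 + r) into the compound rate r."""
    return math.exp(b2) - 1.0


b2 = 0.05  # hypothetical slope, NOT an estimate from the slides
r = compound_growth_rate(b2)
print(f"instantaneous rate: {b2:.4f}, compound rate: {r:.4f}")
```

The compound rate is always slightly larger than the instantaneous rate when B2 is positive, and the two are close when B2 is small.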
(13)LOG-LIN MODEL
sum t

    Variable |     Obs        Mean    Std. Dev.       Min       Max
           t |     290                1.416658                    5
(14)LOG-LIN MODEL
(15)LIN-LOG MODELS
Lin-log models follow this general form:
$Y_i = B_1 + B_2 \ln X_i + u_i$
Note that B2 is the absolute change in Y responding to a percentage (or relative) change in X
If X increases by 1%, predicted Y changes by approximately B2/100 units (the approximation holds for small relative changes in X)
(16)Exercise – lin-log model
Data: from VHLSS 2010
income: individual annual income (1000 VND)
healthcost: individual annual cost for health care (1000 VND)
Use the data in ‘healthcost.dta’ to run the regression
$hcshare = \beta_0 + \beta_1 \ln(income)$
where hcshare is the share of health cost in income
(17)Health cost with Lin-log model
gen lincome = ln(income)
reg hcshare lincome

      Source |       SS         df       MS          Number of obs =    3475
             |                                       F(  1,  3473) =  135.35
       Model |  2.84335206       1  2.84335206       Prob > F      =  0.0000
    Residual |  72.9563097    3473  .021006712       R-squared     =  0.0375
             |                                       Adj R-squared =  0.0372
       Total |  75.7996618    3474  .021819131       Root MSE      =  .14494

     hcshare |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
     lincome | -.0341629   .0029364   -11.63   0.000   -.0399202   -.0284056
       _cons |   .421608   .0322026    13.09   0.000      .35847     .484746
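To read the lincome coefficient, it may help to compute the implied change in hcshare for a given percentage change in income. The Python sketch below uses the reported coefficient -.0341629. The helper name `change_in_y` is an illustrative assumption, and the exact change is taken to be B2 × ln(1 + p/100), which follows from the lin-log form.

```python
import math

B2 = -0.0341629  # lincome coefficient read from the regression output


def change_in_y(b2, pct_change_in_x):
    """Exact change in Y when X rises by pct_change_in_x percent."""
    return b2 * math.log(1.0 + pct_change_in_x / 100.0)


one_pct = change_in_y(B2, 1.0)     # close to B2 / 100
doubling = change_in_y(B2, 100.0)  # equals B2 * ln(2)
print(f"+1% income:   {one_pct:.6f}")
print(f"+100% income: {doubling:.6f}")
```

For a 1% rise the result is nearly B2/100, while for large changes such as a doubling the logarithm matters and the change is B2 × ln 2.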
(18)RECIPROCAL MODELS
Reciprocal models follow this general form:
$Y_i = B_1 + B_2 \left(\frac{1}{X_i}\right) + u_i$
Note that:
As X increases indefinitely, the term $B_2 \left(\frac{1}{X_i}\right)$ approaches zero and Y approaches the limiting or asymptotic value B1
The slope is:
$\frac{dY}{dX} = -B_2 \left(\frac{1}{X^2}\right)$
Therefore, if B2 is positive, the slope is negative throughout, and if B2 is negative, the slope is positive throughout
(19)Exercise – Reciprocal model
Use the data in ‘healthcost.dta’ to run the regression
$hcshare = \beta_0 + \beta_1 \left(\frac{1}{income}\right)$
(20)Exercise – Reciprocal model
reg hcshare invincome

      Source |       SS         df       MS          Number of obs =    3475
             |                                       F(  1,  3473) =  133.21
       Model |  2.79994649       1  2.79994649       Prob > F      =  0.0000
    Residual |  72.9997153    3473   .02101921       R-squared     =  0.0369
             |                                       Adj R-squared =  0.0367
       Total |  75.7996618    3474  .021819131       Root MSE      =  .14498

     hcshare |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
   invincome |  942.4843   81.65964    11.54   0.000    782.3786     1102.59
       _cons |   .023971   .0032251     7.43   0.000    .0176478    .0302943
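The fitted reciprocal curve can be explored with a small Python sketch. It assumes that invincome is 1/income, as the exercise equation suggests, and uses the reported estimates (_cons = .023971, invincome = 942.4843). The income levels in the loop are hypothetical values in 1000 VND, and the function names are illustrative.

```python
B1 = 0.023971  # _cons from the regression output (asymptotic share)
B2 = 942.4843  # invincome coefficient from the regression output


def predicted_share(income):
    """Fitted health-cost share: B1 + B2 * (1 / income)."""
    return B1 + B2 / income


def slope(income):
    """Derivative of the fitted share with respect to income: -B2 / income^2."""
    return -B2 / income ** 2


for income in (5_000, 20_000, 100_000):  # hypothetical incomes, 1000 VND
    print(income, round(predicted_share(income), 4), slope(income))
```

With a positive B2 the fitted share falls as income rises and flattens out toward B1, which matches the sign rule for the slope of a reciprocal model.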
(21)POLYNOMIAL REGRESSION MODELS
The following regression predicting GDP is an example of a quadratic function, or more generally, a second-degree polynomial in the variable time:
$RGDP_t = A_1 + A_2\, time + A_3\, time^2 + u_t$
The slope is not constant; it depends on time and is equal to:
$\frac{dRGDP}{dtime} = A_2 + 2 A_3\, time$
Exercise: run the above model with ‘gdpprov.dta’
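How the slope of the quadratic model varies with time can be shown with a short Python sketch. The coefficients below are hypothetical placeholders, not estimates from ‘gdpprov.dta’, and `quadratic_slope` is an illustrative name.

```python
def quadratic_slope(a2, a3, time):
    """Slope of A1 + A2*time + A3*time^2 with respect to time."""
    return a2 + 2.0 * a3 * time


A2, A3 = 2.0, 0.5  # hypothetical coefficients, for illustration only
for t in range(1, 6):
    print(f"time = {t}: slope = {quadratic_slope(A2, A3, t):.1f}")
```

With a positive A3 the slope grows by 2 × A3 for each extra period, so the effect of time on RGDP is larger in later periods.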
(22)SUMMARY OF FUNCTIONAL FORMS
MODEL        FORM                   SLOPE (dY/dX)    ELASTICITY (dY/dX · X/Y)
Linear       Y = B1 + B2 X          B2               B2 (X/Y)
Log-linear   ln Y = B1 + B2 ln X    B2 (Y/X)         B2
Log-lin      ln Y = B1 + B2 X       B2 (Y)           B2 (X)
Lin-log      Y = B1 + B2 ln X       B2 (1/X)         B2 (1/Y)
Reciprocal   Y = B1 + B2 (1/X)      -B2 (1/X^2)      -B2 (1/(XY))
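The slope and elasticity formulas for the five functional forms can be checked numerically. The Python sketch below compares each analytic slope with a central finite difference at one point. The values B1 = 2.0 and B2 = 0.5, the evaluation point X = 3, and all names are hypothetical choices made for illustration.

```python
import math

B1, B2 = 2.0, 0.5  # hypothetical coefficients, for illustration only

# name: (Y as a function of X, analytic slope, analytic elasticity)
MODELS = {
    "linear": (lambda x: B1 + B2 * x,
               lambda x, y: B2, lambda x, y: B2 * x / y),
    "log-linear": (lambda x: math.exp(B1 + B2 * math.log(x)),
                   lambda x, y: B2 * y / x, lambda x, y: B2),
    "log-lin": (lambda x: math.exp(B1 + B2 * x),
                lambda x, y: B2 * y, lambda x, y: B2 * x),
    "lin-log": (lambda x: B1 + B2 * math.log(x),
                lambda x, y: B2 / x, lambda x, y: B2 / y),
    "reciprocal": (lambda x: B1 + B2 / x,
                   lambda x, y: -B2 / x ** 2, lambda x, y: -B2 / (x * y)),
}


def numeric_slope(f, x, h=1e-6):
    """Central finite-difference approximation of dY/dX."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def check_table(x=3.0):
    """Return {model: (analytic slope, numeric slope, elasticity)} at X = x."""
    results = {}
    for name, (f, slope, elasticity) in MODELS.items():
        y = f(x)
        results[name] = (slope(x, y), numeric_slope(f, x), elasticity(x, y))
    return results


for name, (analytic, numeric, elasticity) in check_table().items():
    print(f"{name:<11} slope={analytic:.6f} numeric={numeric:.6f} "
          f"elasticity={elasticity:.6f}")
```

The analytic and numeric slopes should agree up to finite-difference error, and the log-linear elasticity equals B2 at any X, which is why that form is read directly as a constant elasticity.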
(23)COMPARING ON BASIS OF R2
We cannot directly compare two models that have different dependent variables
We can transform the models as follows and compare RSS:
Step 1: Compute the geometric mean (GM) of the dependent variable, call it Y*
Step 2: Divide Yi by Y* to obtain: $\tilde{Y}_i = Y_i / Y^*$
Step 3: Estimate the equation with lnYi as the dependent variable, using $\ln \tilde{Y}_i$ in lieu of lnYi as the dependent variable
Step 4: Estimate the equation with Yi as the dependent variable, using $\tilde{Y}_i$ as the dependent variable instead of Yi
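Steps 1 and 2 can be sketched in Python as follows. The Y values are hypothetical positive numbers, not data from the slides, and the function names are illustrative. The regressions in Steps 3 and 4 are not run here; the sketch only builds the two rescaled dependent variables.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values: exp of the mean of their logs."""
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


def scale_by_geometric_mean(values):
    """Divide every value by the geometric mean of the whole list."""
    gm = geometric_mean(values)
    return [v / gm for v in values]


y = [12.0, 30.0, 45.0, 80.0]  # hypothetical values of the dependent variable
y_tilde = scale_by_geometric_mean(y)           # dependent variable for Step 4
ln_y_tilde = [math.log(v) for v in y_tilde]    # dependent variable for Step 3
print("GM of Y:", round(geometric_mean(y), 4))
print("sum of ln(Y~):", round(math.fsum(ln_y_tilde), 12))
```

After rescaling, the geometric mean of the new variable is 1 and its logs sum to zero, which puts the linear and log specifications on a common scale so that their RSS values can be compared.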
(24)MEASURES OF GOODNESS OF FIT
R2: Measures the proportion of the variation in the regressand explained by the regressors
Adjusted R2: Denoted as $\bar{R}^2$, it takes degrees of freedom into account:
$\bar{R}^2 = 1 - (1 - R^2) \frac{n - 1}{n - k}$
Akaike’s Information Criterion (AIC): Imposes a harsher penalty than adjusted R2 for adding more variables to the model, defined as:
$\ln AIC = \frac{2k}{n} + \ln\left(\frac{RSS}{n}\right)$
The model with the lowest AIC is usually chosen
Schwarz’s Information Criterion (SIC): Alternative to the AIC criterion, expressed as:
$\ln SIC = \frac{k}{n} \ln n + \ln\left(\frac{RSS}{n}\right)$
The penalty factor here is harsher than that of AIC
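The three measures can be computed by hand from a regression's sums of squares. The Python sketch below uses the figures reported for the log-linear regression of lgdp on llabor and linvest (n = 271, residual SS = 49.2915017, total SS = 224.910559) and assumes k = 3, counting the intercept as a parameter. The function names are illustrative, and the formulas are the ln AIC and ln SIC definitions given on this slide, which may differ from the information criteria reported by statistical packages.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


def ln_aic(rss, n, k):
    """ln AIC = 2k/n + ln(RSS/n)."""
    return 2.0 * k / n + math.log(rss / n)


def ln_sic(rss, n, k):
    """ln SIC = (k/n) ln n + ln(RSS/n)."""
    return k / n * math.log(n) + math.log(rss / n)


n, k = 271, 3                       # k = 3 assumes the intercept is counted
rss, tss = 49.2915017, 224.910559   # residual and total SS from the output
r2 = 1.0 - rss / tss

print(f"R2          = {r2:.4f}")
print(f"adjusted R2 = {adjusted_r2(r2, n, k):.4f}")
print(f"ln AIC      = {ln_aic(rss, n, k):.4f}")
print(f"ln SIC      = {ln_sic(rss, n, k):.4f}")
```

The R2 and adjusted R2 computed this way reproduce the 0.7808 and 0.7792 shown in the regression output, and ln SIC exceeds ln AIC because ln n is greater than 2 for this sample size.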