A Comparative Study of Returns to Education and the Importance of Genetic and Environmental Factors:
Evidence from Different Twins Data
YUNG Chor-Wing Linda
A Thesis Submitted in Partial Fulfilment of the Requirements for the Degree of Doctor of Philosophy in Economics at The Chinese University of Hong Kong, May 2000
The Chinese University of Hong Kong holds the copyright of this thesis. Any person(s) intending to use a part or whole of the materials in the thesis in a proposed publication must seek copyright release from the Dean of the Graduate School.
UMI Number: 9984690
Copyright 2000 by Yung, Chor-Wing Linda. All rights reserved.
UMI Microform 9984690
Copyright 2000 by Bell & Howell Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.
Bell & Howell Information and Learning Company, 300 North Zeeb Road
Abstract
There have been numerous twin-based studies that control for unobservable variables in their analyses. However, these studies have often yielded contradictory results, with no consensus on some important issues and no definite conclusions. The present study attempts to explain some of the differences among past studies and to re-examine three issues: the returns to education, the importance of genetic and environmental factors, and the significance of measurement errors.
This study applies four twin-based models to three available US twin data sets. The empirical results show that (1) while some differences among existing studies are caused by the different data used, other differences are due to the different models used; (2) the “true” returns to schooling are mostly less than the “overall” returns to schooling, indicating a positive omitted-variable bias; (3) the omitted variables account for a significant portion, viz. approximately 40%, of the “overall” returns to schooling; (4) when these omitted variables are divided into genetic and environmental factors, environmental factors generally have a stronger effect than genetic factors; (5) models with two schooling variables are more prone to measurement error, which biases their estimates more significantly; (6) measurement error in the schooling variable biases the estimates, and the magnitude of the bias depends on the data used; and (7) results for males and
Acknowledgement
This dissertation could not have been completed without the kind encouragement and advice of many people. First and foremost, I would like to thank my advisor, Dr Junsen Zhang, who introduced me to the interesting topic of this study. Through many interactions over several years, I have gained valuable advice from Dr Zhang on how to carry out research in this discipline. His unfailing support, guidance and encouragement during all these years are most appreciated and treasured. This dissertation would not have been completed without his constant care and help.
I would also like to thank my Committee members, Dr Pak-wai Liu and Dr Terence Chong, for their constructive suggestions and kind encouragement, as well as for spending their valuable time reviewing this dissertation.
I am grateful to my friends, students and colleagues at the Chinese University, who have supported me in many ways, from mental to technical, throughout these years. Their support has made the past few years at CUHK an enjoyable period of my life.
My deepest gratitude goes to my family. None of this would have been possible without the constant love and support of my parents, Kung-Hing and Yuet-
Table of Contents
1 Introduction
2 Literature Review
2.1 What are Twins?
2.2 Models
2.2.1 The Fixed Effect Model
The Basic Fixed Effect Model
Fixed Effect Model with Instrumental Variables
2.2.2 The Selection Effect Model
Variation of the Selection Effect Model
2.2.3 The Behavioral Genetics Model
Basic Behavioral Genetics Model
Behavioral Genetics Model with Instrumental Variables
2.2.4 The Variance-Components Model
Variance-Components Model with Family-specific Endowment Effect
2.3 Comparison of the Selected Models
2.4 Chronology of Selected Studies with Twin Data
2.5 Summary of Estimated Returns to Schooling by Models
2.6 Summary of Other Findings
2.7 Brief Summary and Issues under Debate
2.7.1 …
2.7.2 Genetic and Environmental Factors
2.7.3 …
3 Data Sets
3.1 Minnesota Twins Data
3.2 NAS-NRC Twins Data
4 Research Questions and Strategy
4.1 Research Questions
4.2 Research Strategy
4.3 The Matching of Data and Models
5 Empirical Results
5.1 Initial Results Based on Individual Level Data (the Nature of Twin Data Has Not Been Utilized)
5.2 The Fixed Effect Model
5.2.1 Environmental and Genetic Factors
5.2.2 Instrumental Variable Analysis of the Fixed Effect Model
5.3 The Selection Effect Model
5.4 The Behavioral Genetics Model
5.4.1 Instrumental Variable Analysis of the Behavioral Genetics Model
5.5 The Variance-Components Model
5.6 Significance of Individual Variables
5.7 Comparison of Results by Data
5.8 Discussion of the Causes of Different Estimates
5.8.1 Differences due to Different Models
5.8.2 Differences due to Different Data
5.8.3 Differences between Selected Studies
6 Summary and Conclusions
References
List of Tables
The Fixed Effect Model
The Selection Effect Model
The Fixed Effect Model with Instrumental Variables
The Behavioral Genetics Model
The Variance-Components Model
Minnesota Male Data Summary Statistics
NAS-NRC Data Summary Statistics
1991 Twinsburg Data Summary Statistics
Combined Twinsburg Data Summary Statistics
Estimable Models with Respective Data
Minnesota Data
NAS-NRC Data
Twinsburg Data
Schooling and Earnings (Full Data)
Earnings Function with Age and Age Square (Full Data)
Earnings Function with Individual Characteristics (Full Data)
Earnings Function with Individual and Family Variables (Full Data)
Schooling and Earnings for Males
Male Earnings Function with Age and Age Square
Male Earnings Function with Individual Variables
Male Earnings Function with Individual and Family Variables
Schooling and Earnings for Females
Female Earnings Function with Individual Variables
The Basic Fixed Effect Earnings Estimation for Minnesota DZ Females
Results of the Estimated Females' Schooling Coefficient on Earnings
Decomposition of the Schooling Coefficient (%)
Fixed Effect Model with Sibling's Report of Schooling as Instruments (MZ Twins)
Fixed Effect Model with Sibling's Report of Schooling as Instruments (MZ Males)
Fixed Effect Model with Sibling's Report of Schooling as Instruments (MZ Females)
Minnesota Fixed Effect Model with Spouse Characteristics as Instrumental Variables
NAS-NRC Fixed Effect Model with Spouse Characteristics as Instrumental Variables
GLS Regression for MZ Twins
GLS Regression for MZ Males
GLS Regression for MZ White Males
GLS Regression for MZ Females
Behavioral Genetics Model - Full Minnesota Data
Behavioral Genetics Model by Gender
Behavioral Genetics Model with Instruments
Variance Components Model for MZ Twins
Variance Components Model for DZ Twins
Difference between DZ and MZ Twins
Summary of Returns to Schooling by Models and by Data Sets
The Omitted Variable Bias Estimated from Various Twin-based Models by Data
Effects of Genetics and Environment by Models and by Data Sets
Summary of Effects of Schooling, Genetics and Environment
Summary of Schooling Coefficient of Fixed Effect Model
1 Introduction
Throughout history, income inequality has been a principal source of social and political unrest, which often reduces productivity as a whole. Inequality in earnings in particular is especially detrimental to societies. Policy makers and economists have devoted considerable effort to studying the various sources of this universal inequality of earnings. Their aim is to identify these sources and, in turn, to formulate appropriate policies to narrow the gap between the earnings of individuals.
The studies of Mincer (1974) and Heckman and Polachek (1974) in the early 1970s suggested that there is a high correlation between schooling and earnings. As a result of these early studies, many countries have adopted the policy of increasing the proportion of national spending on public education. Over the past twenty years, most developed countries have allocated over 10% of the government budget to education.¹ For example, in 1992, the United States spent 14.1% on education. Such huge government investment in education is intended to increase the earnings and productivity of citizens and, eventually, the overall output of the country. However, it has become clear that the reasons for earnings inequality are more complicated than schooling alone.
¹ UNESCO Statistical Yearbook (various issues). Total education expenditure, as a percent of
More recent studies have suggested that, besides schooling, other important factors may affect earnings. These factors include personal characteristics (such as sex, race, marital status, religion and inborn abilities), family characteristics (for example, parents' education, family income, parents' socioeconomic status and number of siblings), and other unobservable environmental factors (for instance, personal luck and other exogenous factors).
Thus, with these factors affecting earnings, expanding education may not be the most cost-effective way for a government to increase the earnings of its citizens. The effects of the other factors have to be taken into account in formulating and implementing government policies.
The effects of personal characteristics are described in the studies by Korenman and Neumark (1991) and Behrman and Deolalikar (1995). Methodologies used in this type of study have included the use of “proxies”,² for example the popular Intelligence Quotient (IQ) test and the Armed Forces Qualification Test (AFQT), to investigate “unobservable” personal traits, e.g. the inborn ability of an individual. In addition to the IQ and AFQT tests, other cognitive ability tests have been deployed in these studies. These tests have then been applied to investigate other observable family characteristics. In general, the results of these studies have shown that family characteristics do have an effect on earnings. Altonji and Dunn (1996) showed that both family income and parents' education have a positive effect, to varying degrees, on their children's earnings. These observations probably underlie the instinctive desire of most parents to provide a “good environment” for their children,³ believing that such an environment will have a positive effect on their children's earnings.
While some economists investigated the effects of personal characteristics, unobservable genetic endowments and other environmental factors on individuals' earnings, Herrnstein and Murray (1994) claimed that it was “inborn ability” that predetermined an individual's earnings. This was a controversial claim, as it implied that efforts to modify (or enhance) other “modifiable” factors that could increase personal incomes would be futile.
Thus, given the contradictory findings above on the factors affecting earnings, there is a need to clarify the roles played by schooling, inborn capabilities and environmental factors in determining earnings as a whole. In view of the fact that many governments now invest huge amounts of public money in education, there is an urgent need to find the “true” rate of return to schooling. Moreover, in order to plan future spending to enhance personal incomes, it is vital to determine the proportions that the various factors, viz. schooling, inborn ability, environment and other factors, contribute to an individual's income.
³ This belief dates back to before the use of scientific reasoning. An example is the story of Mencius, one of the famous Chinese philosophers. When he was a boy, his mother moved
In order to obtain an “unbiased estimate” of the returns to schooling, investigators have used various methods. One is the use of proxies for the unobservable ability variable,⁴ so that the estimated returns to schooling are not seriously biased by the excluded ability variable, which is correlated with an included variable (in this case, schooling). To control for unobservable family characteristics, data on siblings from the same family have also been used, so that family background variables can be held constant. Although these procedures have improved the estimates, it is recognized that the best way to control for these unobserved characteristics is to use data on identical (MZ) twins,⁵ which appear to have advantages over regular data (see below). There are different models for analyzing twin data to arrive at a “true” return to schooling. These models assume that the unobservable genetic endowment and family characteristics have the same effects on both twins. In the 1970s, Behrman, Taubman and Wales were the first to put forth a series of empirical analyses of twins using the veterans' registry. Their findings showed that the overall returns to schooling (obtained from conventional individual-level OLS estimation) could be divided into genetic factors, common family environmental factors and the true returns to schooling. They also found that the overall returns were divided roughly evenly among these three components. As these were the first twin-based estimates of the returns to schooling, they were well received in the field. It was generally presumed that if one failed to control for genetic and environmental factors when estimating the schooling coefficient, the estimate would be biased upward, with about two-thirds of it reflecting these omitted factors rather than the true return.

⁴ See Hause (1972).
⁵ See Chamberlain (1977); Chamberlain and Griliches (1977).
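To make the logic of twin-based estimation concrete, the sketch below simulates hypothetical MZ twin pairs who share an unobserved endowment that raises both schooling and earnings. Everything in it is an illustrative assumption: the function names, the true return of 0.06, the endowment effect of 0.08 and the functional forms in the comments are not taken from any of the studies cited. Pooled OLS on all individuals picks up the shared endowment, while differencing within pairs cancels it, which is the idea behind the fixed effect model reviewed in Chapter 2.

```python
import random


def simulate_mz_pairs(n_pairs, true_return=0.06, endow_effect=0.08, seed=1):
    """Simulate hypothetical MZ twin pairs sharing an unobserved endowment.

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        a = rng.gauss(0.0, 1.0)  # endowment shared by both twins (genes + family)
        pair = []
        for _ in range(2):
            # Schooling rises with the endowment, plus an individual-specific shock.
            s = 12.0 + 1.5 * a + rng.gauss(0.0, 1.5)
            # Log earnings depend on schooling and, directly, on the endowment.
            ln_y = 2.0 + true_return * s + endow_effect * a + rng.gauss(0.0, 0.2)
            pair.append((s, ln_y))
        pairs.append(pair)
    return pairs


def ols_slope(xs, ys):
    """Slope from an OLS regression of ys on xs with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pooled_and_within(pairs):
    """Return (pooled OLS, within-twin) estimates of the schooling coefficient."""
    s_all = [s for pair in pairs for s, _ in pair]
    y_all = [y for pair in pairs for _, y in pair]
    pooled = ols_slope(s_all, y_all)
    # Differencing within a pair cancels the shared endowment term.
    ds = [p[0][0] - p[1][0] for p in pairs]
    dy = [p[0][1] - p[1][1] for p in pairs]
    # Regression through the origin, since the ordering of twins is arbitrary.
    within = sum(d * e for d, e in zip(ds, dy)) / sum(d * d for d in ds)
    return pooled, within


pooled, within = pooled_and_within(simulate_mz_pairs(5000))
print(f"pooled OLS: {pooled:.3f}, within-twin: {within:.3f}")
```

With these assumed parameters, Var(S) = 4.5 and Cov(S, A) = 1.5, so the pooled estimate should land near 0.06 + 0.08 × 1.5/4.5 ≈ 0.087 (upward-biased), while the within-twin estimate should land near the assumed 0.06; exact values depend on the random seed.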
This remained the general presumption about the returns to schooling until 1994, when Ashenfelter and Krueger collected and analyzed a new set of twin data at the Twinsburg Festival. In their 1994 study, Ashenfelter and Krueger reported that the omitted variables, viz. the genetic and environmental factors, did not bias the return to schooling upward; instead, they biased it downward. The publication of this controversial 1994 paper was followed by a series of twin-based earnings studies, including both discussions and empirical studies. Twin-based discussions included those by Card (1995) and Bound and Solon (1999). Empirical studies included those by Miller, Mulvey and Martin (1995, 1996 and 1997) using Australian twin data, Ashenfelter and Rouse (1998) and Rouse (1999) using additional Twinsburg data collected at the festivals in subsequent years, Isacsson (1997) using the Swedish Twins Registry, and Behrman, Rosenzweig and Taubman (1994 and 1996) and Behrman and Rosenzweig (1999) using the Minnesota Twin Registry.
There are three major disagreements among these studies. First, the studies differed on both the sign and the size of the bias in conventional OLS estimates. The study by Behrman, Hrubec, Taubman and Wales (1980) showed that conventional estimates were positively biased by the omitted variables, while Ashenfelter and Krueger (1994) found that they were negatively biased instead. Studies such as those by Blackburn and Neumark (1993a), Miller, Mulvey and Martin (1995), Behrman, Rosenzweig and Taubman (1996), Isacsson (1997) and Ashenfelter and Rouse (1998) supported the notion that conventional estimates were biased upward, while Blackburn and Neumark (1993b) found little or no support for the view that omitted genetic ability biases the return to schooling upward.
The second disagreement concerned the effects of genetics and family environment. Behrman, Hrubec, Taubman and Wales (1980) showed that these factors were as important as schooling. Ashenfelter and Krueger (1994), however, suggested that family effects (both genetic and environmental) were negative instead. In a more recent study, Miller, Mulvey and Martin (1996) reported only a small role for genetic ability and family effects.
error produced a significant downward bias in conventional estimates of the return to schooling. Behrman, Rosenzweig and Taubman (1994) also found measurement error in the schooling variable of the NAS-NRC data, which generated a small downward bias, but they reported that it was not statistically significant and thus “not sufficient to lead to incorrect conclusion”.
This dissertation attempts to resolve the disagreements over these three major issues.
The Scope of this Study
Using several twin data sets, this study will investigate the three major issues in twin-based studies. The twin data sets will be examined with four different models, viz. the fixed effect model, the selection effect model, the behavioral genetics model and the variance-components model. In particular, the present study will examine:
i) whether the omitted-variable bias is upward, as reported by Behrman, Taubman and Wales, or downward, as reported by Ashenfelter and Krueger. The personal characteristics are held constant across the different data sets, so as to make comparisons of results across data sets more informative and convincing.
ii) whether the genetic and environmental factors have positive and significant effects on earnings. If the effect is strong and positive, how significant is it when compared across different data sets? Are these results data-specific? Can these results be replicated when another model is used?
will be applied so that the effect of measurement error can be examined in depth.
iv) whether the effects differ by gender. In the literature,⁷ most studies examine either males and females together, as in Ashenfelter and Krueger (1994), Miller, Mulvey and Martin (1995), and Ashenfelter and Rouse (1998), or males only, as in Behrman, Hrubec, Taubman and Wales (1980) and Behrman, Rosenzweig and Taubman (1994), or females only, as in Behrman, Rosenzweig and Taubman (1996). It is probably not scientifically prudent to generalize these results to the population at large. This study will therefore also examine the three major issues by gender. Whenever the data allow, the sign and significance of the omitted-variable bias, the effects of genetics and environment on earnings, and the measurement error problem in the schooling variable will be examined with the four models, by gender.
The main task of the present research is not to follow the conventional methodology of finding new data, analyzing them with a new or modified model and comparing the new findings with those in the literature. Instead, this study is designed to fill the gaps in the literature by applying different models to the same data set, and the same model to different data sets.
Organization of the dissertation
The following chapter surveys the earnings models commonly used in twin-based studies. The benefits and shortcomings of each model are explained after its description. Results of selected twin studies in the literature are listed for comparison, and the issues under debate (which this study attempts to resolve) are summarized. Chapter 3 describes the three data sets used for estimation in this dissertation and what makes each of them unique. The strategy used to analyze the three data sets is outlined in Chapter 4, and the empirical results of these analyses are presented in Chapter 5, which also compares them with the findings reported in the literature. Summary and conclusions are presented in
2 Literature Review
Becker (1964) introduced Human Capital Theory, and since then people have expected that the more human capital one accumulates, the more one can earn. Human capital accumulation was basically treated as formal education. This led to the belief that if one wanted higher earnings, one should acquire more education or schooling. This belief can be traced even further back to ancient China, where people believed that the only way to escape poverty or to gain a higher living standard was to receive more education.⁸
Although schooling does increase earnings, it is not the only factor. Many economists have tried to explain earnings using other variables. In 1972, Hause examined the effects of ‘ability’ together with ‘schooling’ on earnings. Using four different data sets, he drew conclusions on how IQ test scores and other cognitive ability test scores could be used as proxies for the ‘ability’ that affects one's earnings. Even though these test scores might be good proxies, there remained the problem of whether different ability indices measured the same thing and whether they had the same effect on earnings. Since the effects of these indices on earnings were not known or proven, studies using these test scores as proxies for ability were questionable.
⁸ There was a common Chinese saying that “from books you get a house full of gold, from books you
In 1972, Griliches and Mason also examined whether ability itself affected schooling and whether the estimated effect of schooling on income was biased. They used data on post-World War II veterans and relied on scores from the AFQT (Armed Forces Qualification Test) instead of the IQ test used in most studies. They estimated a semilog “income-generating” function, with income defined as gross weekly earnings in dollars. Their results showed that most of the effects of heredity (such as father's schooling, father's occupation, region before and region now⁹) were indirect, and that the bias in the return to schooling was not significant. The increment in “explained” variance due to these “heredity-associated” variables was about one-fifth of the total “explainable” variance in income. Analyzing the data with structural equation estimation and then a two-stage instrumental variable procedure showed little effect of the omitted variable (i.e. the unobservable heredity variable). These results appeared to contrast with those of other studies¹⁰ in which the omitted ability variable was found to be positively related to earnings. Consequently, this study was criticized for analyzing a biased sample (since the subjects were all veterans) and for using the AFQT to measure ability (as there was no obvious correlation between AFQT scores and individuals' incomes¹¹). It should be pointed out that using these cognitive test results as proxies for ability involves a basic flaw common to all such studies: it is doubtful whether these tests, which mainly test problem solving, reflect the qualities that raise one's productivity and earnings. Individuals with high test scores might not have highly marketable qualities, while others might have marketable qualities, such as drive and determination, that these tests do not reveal. All these marketable qualities might lead to higher earnings. These proxies, therefore, might not be as good as they appeared.

⁹ Region or location was used as a success variable, which intervenes between schooling and income.
¹⁰ Such as Hansen, Weisbrod and Scanlon (1970), cited in Griliches and Mason (1972).
In 1974, Mincer showed that both education and experience determine earnings. Experience can be regarded as informal education: earnings increase with experience, but at a decreasing rate. The conventional earnings function is semi-logarithmic, with log earnings on the left-hand side and the explanatory variables on the right-hand side:

$$\ln \text{Earnings} = a_0 + a_1\,\text{Schooling} + a_2\,\text{Experience} + a_3\,\text{Experience}^2$$

where the schooling coefficient $a_1$ approximates the proportional (percentage) change in earnings from one more year of schooling, and the effect of one more year of experience at a given level of experience is $a_2 + 2a_3\,\text{Experience}$. Where data on experience are not collected, studies often use age, or age − 6 − years of schooling, as a proxy for experience. Numerous studies after Mincer's estimated this earnings function and verified that average earnings were higher for more educated persons. The function became so widely used that it was named after Mincer: “the Mincer earnings function”. These studies covered almost every country and region in the world. Even though the empirical evidence from these studies showed that the function fits the data significantly, economists still tended not to conclude that wage differentials were due only to education and experience.
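As an illustration only, the following minimal sketch estimates the semi-log earnings function above by OLS on simulated data, building potential experience as age − 6 − years of schooling. The helper names and the “true” coefficients used to generate the data are hypothetical, chosen so that the estimates can be checked against known values; OLS is computed from the normal equations in plain Python to keep the example dependency-free.

```python
import random


def potential_experience(age, schooling):
    """Mincer-style proxy for experience: years since leaving school."""
    return max(age - 6 - schooling, 0)


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mincer(sample):
    """OLS of ln Y on [1, S, Exp, Exp^2] via the normal equations X'X b = X'y.

    sample: iterable of (age, schooling, log_earnings) tuples.
    Returns [intercept, schooling, experience, experience squared].
    """
    rows, y = [], []
    for age, s, ln_y in sample:
        exp = potential_experience(age, s)
        rows.append([1.0, float(s), float(exp), float(exp * exp)])
        y.append(ln_y)
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Simulated data with assumed "true" values 0.08 (schooling),
# 0.04 (experience) and -0.0007 (experience squared).
rng = random.Random(0)
sample = []
for _ in range(20000):
    age, s = rng.randint(25, 64), rng.randint(8, 18)
    exp = potential_experience(age, s)
    ln_y = 1.5 + 0.08 * s + 0.04 * exp - 0.0007 * exp ** 2 + rng.gauss(0.0, 0.3)
    sample.append((age, s, ln_y))

a0, a1, a2, a3 = fit_mincer(sample)
print(f"schooling: {a1:.3f}, experience: {a2:.4f}, experience^2: {a3:.5f}")
```

The recovered coefficients should lie close to the assumed values; with real data, the schooling coefficient from this regression is the “overall” return that the twin-based models later try to decompose.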
The Mincer earnings function basically expresses earnings as a function of schooling, experience and experience squared. That is,

$$\ln Y_i = \beta_0 + \beta_1 S_i + \beta_2 Exp_i + \beta_3 Exp_i^2 + \varepsilon_i \tag{1}$$

where $\ln Y_i$ is log earnings, $S_i$ is years of schooling, $Exp_i$ is experience and $Exp_i^2$ is experience squared for individual $i$. The $\beta_j$, for $j = 1, 2, 3$, are the coefficients of the variables, $\beta_0$ is the intercept, and $\varepsilon_i$ is the random error. After many years of refinement, the Mincer earnings function came to include other characteristics, both individual and environmental, each of which helps to explain some differences in earnings. Individual or personal characteristics were generally composed of genetic and ability components specific to each individual. Environmental characteristics, on the other hand, were generally thought of as family background and other external factors, such as the place where one was raised, which are not unique to an individual. The modified, expanded version of the function became

$$\ln Y_i = \beta_0 + \beta_1 S_i + G_i'\beta_G + F_i'\beta_F + \varepsilon_i \tag{2}$$

where $G_i$ and $F_i$ are vectors of genetic and family variables and $\beta_G$ and $\beta_F$ are the corresponding coefficient vectors. Since both $G_i$ and $F_i$ were unobservable, only the earnings function without them could be estimated:

$$\ln Y_i = \beta_0 + \beta_1 S_i + u_i, \qquad u_i = G_i'\beta_G + F_i'\beta_F + \varepsilon_i \tag{3}$$

and the estimated $\beta_1$, the return to schooling, from Equation (3) was likely to be biased, because the omitted variables $G_i$ and $F_i$ enter the error term $u_i$ and are likely to be correlated with schooling.
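The direction of this bias can be illustrated with the textbook omitted-variable formula. Under the simplifying assumption (not made in the text) that the genetic and family variables collapse into a single unobserved endowment $A$ with coefficient $\beta_A$, the OLS estimate of $\beta_1$ from Equation (3) converges to $\beta_1 + \beta_A\,\mathrm{Cov}(S, A)/\mathrm{Var}(S)$. The sketch below checks this limit against a simulation; the data-generating process, parameter values and function names are hypothetical.

```python
import random


def ovb_limit(beta1, beta_a, cov_sa, var_s):
    """Probability limit of the short-regression schooling coefficient."""
    return beta1 + beta_a * cov_sa / var_s


def short_regression_slope(n=50000, beta1=0.06, beta_a=0.10, gamma=1.0, seed=2):
    """Regress ln Y on S alone when an endowment A is omitted.

    Assumed data-generating process (for illustration only):
        A ~ N(0, 1);  S = 12 + gamma*A + N(0, 2^2);
        ln Y = 1 + beta1*S + beta_a*A + N(0, 0.3^2)
    """
    rng = random.Random(seed)
    s_vals, y_vals = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        s = 12.0 + gamma * a + rng.gauss(0.0, 2.0)
        s_vals.append(s)
        y_vals.append(1.0 + beta1 * s + beta_a * a + rng.gauss(0.0, 0.3))
    ms, my = sum(s_vals) / n, sum(y_vals) / n
    cov = sum((s - ms) * (y - my) for s, y in zip(s_vals, y_vals)) / n
    var = sum((s - ms) ** 2 for s in s_vals) / n
    return cov / var


# With gamma = 1: Cov(S, A) = 1 and Var(S) = 1 + 4 = 5.
predicted = ovb_limit(0.06, 0.10, 1.0, 5.0)  # 0.06 + 0.10 * 1/5 = 0.08
estimated = short_regression_slope()
print(f"predicted {predicted:.4f}, simulated {estimated:.4f}, true 0.06")
```

A positive $\beta_A$ combined with a positive covariance between schooling and the endowment produces an upward bias; reversing the sign of either one produces a downward bias, which is why the sign of the bias is an empirical question.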
Economists were aware of this problem and tried to solve it in many ways. The best way was, of course, to find a data set containing all of the necessary variables. However, many of the omitted variables were difficult to obtain and, even when available, many of them were not reliable, e.g. the IQ or AFQT scores mentioned earlier. For the analysis of family effects, data on siblings from the same family should provide good controls. Studies such as those by Chamberlain and Griliches (1977) and Olneck (1977) indeed exploited this property of siblings. Nevertheless, the genetic components of siblings, though related, are not identical. Besides unobservable genetic differences, there are also other differences when using siblings' data, such as birth order,¹² sex¹³ and identification problems.¹⁴
Behrman and Taubman (1986) studied the effect of birth order on schooling and earnings using data on twins and their children. The major assumption of this study was that the average family environment influences a child's intelligence. To control the family environment better, these investigators used data on individuals with at least one parent who had a twin brother or sister. Based on this assumption, Behrman and Taubman hypothesized that as the number of children increased, the time parents spent with each child would decrease, which would in turn affect schooling and earnings. Their results showed that birth order had a significant effect on schooling. However, there was no significant birth-order effect on earnings once total family size had been held constant. This kind of study has not been widely carried out because of the scarcity of suitable data (data on both twin parents and their children are required).

¹² See Behrman and Taubman (1986).
¹³ See Kaestner (1997).
2.1 What are twins?
In a family setting, twins are two children born to the same mother at the same birth. The study of twins can provide much information for the estimation of earnings functions,¹⁵ as one twin can serve as the control for the other. Twins are divided into two types: (a) identical, or monozygotic (MZ), twins and (b) fraternal, or dizygotic (DZ), twins. MZ twins result from the splitting of a single fertilized ovum (egg) into two embryos, while DZ twins result from two different fertilized eggs developing into two embryos. The genetic components of a fertilized egg come from a random matching of the parents' genes; these components eventually determine the genetic traits of the resulting individual. As a result of this random matching, each pair of genes in a fertilized egg consists of one gene from the father and one from the mother. Assuming that genes are additive,¹⁶ each individual's genetic endowment is thus made up half of the father's genes and half of the mother's. Since each fertilization is a random matching of the parents' genes, it is difficult to know exactly the genetic component of a fertilized egg. However, MZ twins, the result of splitting one fertilized ovum, have the same genetic makeup as each other, whereas DZ twins, the result of two fertilized ova, do not. Thus, MZ twins are of the same sex,¹⁷ while DZ twins can be of opposite sexes. Genetically, DZ twins are comparable to other “ordinary” siblings of the same family.

¹⁵ Twins provide good control experiments for many kinds of estimation and prediction, especially in fields such as medicine, sociology and psychology. Our main concern is the earnings function.
¹⁶ In Taubman (1976), this assumption implies that each offspring is composed of half of the genes of each parent.
Although DZ twins are not genetically identical, they share the same womb at the same time during gestation. This physiological fact, in essence, distinguishes twins from their other siblings. In most cases, twins are born sequentially within a very short interval, although some are born as much as a few days apart. Thus, if they are brought up together (the norm in most cases), they will be exposed to the same environmental influences. Because they are born at the same time, MZ and DZ twins are better control groups than any other siblings: they are exposed to the same environmental factors for the same period of time. These characteristics make twins better controls than ordinary siblings.
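The claim that MZ twins are genetically identical while DZ twins resemble ordinary siblings can be made concrete with a small simulation. This is a minimal sketch under a hypothetical additive-gene model (two alleles per locus per parent, independent standard-normal allele effects, one allele drawn at random from each parent per fertilization). None of these parameters come from the thesis. Under these assumptions the genetic correlation is 1 for MZ pairs and about one half for DZ pairs.

```python
import numpy as np

rng = np.random.default_rng(6)
n_families, n_loci = 10_000, 100

# Hypothetical additive model: each parent carries two alleles per locus, and
# each allele has an independent standard-normal additive effect.
father = rng.normal(size=(n_families, n_loci, 2))
mother = rng.normal(size=(n_families, n_loci, 2))


def fertilize():
    """One fertilized egg: at every locus, take one randomly chosen allele from each parent."""
    pick_f = rng.integers(0, 2, size=(n_families, n_loci, 1))
    pick_m = rng.integers(0, 2, size=(n_families, n_loci, 1))
    from_f = np.take_along_axis(father, pick_f, axis=2)[..., 0]
    from_m = np.take_along_axis(mother, pick_m, axis=2)[..., 0]
    return (from_f + from_m).sum(axis=1)  # additive genetic value G of the child


g_mz1 = fertilize()
g_mz2 = g_mz1.copy()                     # MZ: one egg splits, so the genotype is identical
g_dz1, g_dz2 = fertilize(), fertilize()  # DZ: two separate eggs, like ordinary siblings

r_mz = np.corrcoef(g_mz1, g_mz2)[0, 1]
r_dz = np.corrcoef(g_dz1, g_dz2)[0, 1]
print(f"genetic correlation: MZ = {r_mz:.3f}, DZ = {r_dz:.3f}")
```

The DZ value is close to 0.5 because, at each locus, two siblings inherit the same allele from a given parent with probability one half.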
2.2 Models
Four kinds of models that use twin data to estimate the schooling coefficient have been described in the literature: the fixed effect model, the selection effect model, the behavioral genetics model and the variance-components model. The following sections review these models one by one, examining (a) how each model estimates the returns to schooling, (b) how the effects of ability and environmental factors on earnings can be identified in each model, and (c) how measurement error in schooling affects the estimates.

Footnote: David Card (1995) divided the "twin-based" models in the literature into two different kinds, namely
2.2.1 The Fixed Effect Model
The earnings function depends on many variables. In the twins approach, these variables are divided mainly into (a) individual and (b) common variables. The advantage of twin data is that, besides the individual variables, other variables such as family background and environmental factors can be considered common to both twins, assuming the twins are raised together. The problem of not being able to observe the common variables can be resolved by within-group estimation, in which the earnings functions of individuals from the same family are differenced, so that all variables having the same effect on each individual in the family are differenced away. When this method is applied, the problem of gathering these important but unobservable common variables vanishes.
The Basic Fixed Effect Model
The basic earnings function is a semilog function that depends on schooling, inborn ability and family background:

$$\ln Y_i = \beta_0 + \beta_1 S_i + \beta_2 G_i + \beta_3 F_i + \varepsilon_i \tag{4}$$

where $Y_i$ is earnings, $S_i$ schooling, $G_i$ genetic components (inborn ability), $F_i$ family background and $\varepsilon_i$ the error term. Each coefficient denotes (approximately) the percentage change in earnings due to a unit change in the corresponding variable. Genetic components are exactly the same for MZ twins, while DZ twins are assumed to share, on average, half of their genes. Family background variables include parents' income, education and beliefs (religion), race, neighborhood, school and friends. All these variables can affect an individual's earnings either directly or indirectly. Although some of these background variables, such as parents' income and education, are relatively easy to obtain, they may not have the same effect on each child if the children are not raised at the same time. For example, a brother who is 10 years older than his younger brother may not have enjoyed as high a family income at a given age as the younger brother did at that same age, or vice versa. The difficulty of gathering data that control for genetic components and family background can be overcome by applying the fixed effect model to twin data.
For twin data, the earnings function for each twin of the same family can be written as:

$$\ln Y_{ij} = \beta_0 + \beta_1 S_{ij} + \beta_2 G_{ij} + \beta_3 F_{ij} + \varepsilon_{ij} \tag{5}$$

where $i$ now refers to the family and $j = 1, 2$ denotes twin one and twin two in each family $i$.
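For MZ twins, taking the within-family difference of the twins' earnings functions leads to

$$\ln Y_{i1}^{MZ} - \ln Y_{i2}^{MZ} = \beta_1^{MZ}(S_{i1}^{MZ} - S_{i2}^{MZ}) + \beta_2^{MZ}(G_{i1}^{MZ} - G_{i2}^{MZ}) + \beta_3^{MZ}(F_{i1}^{MZ} - F_{i2}^{MZ}) + (\varepsilon_{i1}^{MZ} - \varepsilon_{i2}^{MZ}) \tag{6}$$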
where $\beta_1^{MZ}$, $\beta_2^{MZ}$ and $\beta_3^{MZ}$ are the new coefficients of the differenced twin earnings function. When MZ twins are raised in the same family, they have the same external environmental characteristics (family background), and because they are genetically identical, they also have the same individual characteristics (genetic components or inborn abilities). Therefore, taking the difference of the two twins' earnings functions cancels out the unobservable variables ($G$ and $F$) and leads to the following formula:
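$$\ln Y_{i1}^{MZ} - \ln Y_{i2}^{MZ} = \beta_1^{MZ}(S_{i1}^{MZ} - S_{i2}^{MZ}) + (\varepsilon_{i1}^{MZ} - \varepsilon_{i2}^{MZ}) \tag{7}$$

or simply,

$$\Delta \ln Y_i^{MZ} = \beta_1^{MZ}\,\Delta S_i^{MZ} + \mu_i \tag{8}$$

where $\mu_i = \varepsilon_{i1}^{MZ} - \varepsilon_{i2}^{MZ}$.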
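The logic of Equations (4) to (8) can be checked with simulated MZ pairs. This is a minimal sketch with hypothetical parameter values (not estimates from the thesis): schooling is assumed to depend on the shared $G$ and $F$ plus a twin-specific shock, so OLS on individual data (Equation (4) with $G$ and $F$ omitted) is biased upward, while the within-pair regression of Equation (8) is not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 200_000
beta0, beta1, beta2, beta3 = 1.0, 0.08, 0.05, 0.04  # hypothetical coefficients

# Pair-level unobservables: for MZ twins both G and F are shared.
G = rng.normal(size=n_pairs)
F = rng.normal(size=n_pairs)

# Schooling depends on G and F (source of omitted-variable bias) plus a
# twin-specific shock that makes schooling differ within the pair.
S1 = 12 + G + F + rng.normal(size=n_pairs)
S2 = 12 + G + F + rng.normal(size=n_pairs)


def log_earnings(S):
    """Equation (5) with a fresh individual error for each twin."""
    eps = rng.normal(scale=0.3, size=n_pairs)
    return beta0 + beta1 * S + beta2 * G + beta3 * F + eps


lnY1, lnY2 = log_earnings(S1), log_earnings(S2)

# Individual-level OLS of ln Y on S (G and F omitted), pooling both twins.
S_all = np.concatenate([S1, S2])
lnY_all = np.concatenate([lnY1, lnY2])
X = np.column_stack([np.ones_like(S_all), S_all])
b_ols = np.linalg.lstsq(X, lnY_all, rcond=None)[0][1]

# Within-pair difference, Equation (8): G and F cancel, no intercept needed.
dS, dlnY = S1 - S2, lnY1 - lnY2
b_mz = (dS @ dlnY) / (dS @ dS)

print(f"OLS levels: {b_ols:.4f}  MZ difference: {b_mz:.4f}  true: {beta1}")
```

With these assumed values the levels estimate converges to $\beta_1 + (\beta_2 + \beta_3)/3 = 0.11$, while the MZ difference estimate stays near the true 0.08.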
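For DZ twins, differencing the earnings functions is similar to the MZ case:

$$\ln Y_{i1}^{DZ} - \ln Y_{i2}^{DZ} = \beta_1^{DZ}(S_{i1}^{DZ} - S_{i2}^{DZ}) + \beta_2^{DZ}(G_{i1}^{DZ} - G_{i2}^{DZ}) + \beta_3^{DZ}(F_{i1}^{DZ} - F_{i2}^{DZ}) + (\varepsilon_{i1}^{DZ} - \varepsilon_{i2}^{DZ}) \tag{9}$$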
When DZ twins are assumed to be raised in the same family, they also have the same environmental characteristics (family background), but they are not genetically identical because they come from two different fertilized eggs. Thus, for DZ twins only the family variables can be eliminated through within-family differencing; the genetic/ability variables cannot. Given that genetic/ability variables are unobservable and that proxies such as test scores are unreliable, it is difficult to use the true within-family differenced earnings function to estimate the returns to schooling. Therefore, in most cases only

$$\ln Y_{i1}^{DZ} - \ln Y_{i2}^{DZ} = \beta_1^{DZ}(S_{i1}^{DZ} - S_{i2}^{DZ}) + (\varepsilon_{i1}^{DZ} - \varepsilon_{i2}^{DZ}) \tag{10}$$

that is,

$$\Delta \ln Y_i^{DZ} = \beta_1^{DZ}\,\Delta S_i^{DZ} + \mu_i \tag{11}$$

is estimated. The true within-family differenced earnings function for DZ twins is

$$\Delta \ln Y_i^{DZ} = \beta_1^{DZ}\,\Delta S_i^{DZ} + \beta_2^{DZ}\,\Delta G_i^{DZ} + u_i \tag{12}$$
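The estimated $\beta_1^{DZ}$ from Equation (11) will generally be biased because the ability difference ($\Delta G$) is omitted. Its probability limit is

$$\operatorname{plim}\hat\beta_1^{DZ} = \beta_1^{DZ} + \gamma\,\frac{\operatorname{cov}(\Delta S_i, \Delta G_i)}{\operatorname{var}(\Delta S_i)} = \beta_1^{DZ} + \gamma\, b_{\Delta G\cdot\Delta S} \tag{13}$$

where $\beta_1^{DZ}$ is the true return to schooling, $\gamma$ (that is, $\beta_2^{DZ}$ in Equation (12)) is the effect of ability on earnings, and $b_{\Delta G\cdot\Delta S}$ is the coefficient on $\Delta S$ in the auxiliary regression of $\Delta G$ on $\Delta S$. The bias will be upward if (a) ability has a positive effect on earnings, i.e. $\gamma > 0$, (b) the excluded ability variable and the included schooling variable are positively related, i.e. $b_{\Delta G\cdot\Delta S} > 0$, and (c) ability is the only omitted variable. More generally, with ability as the only omitted variable, the bias is upward whenever $\gamma\, b_{\Delta G\cdot\Delta S} > 0$.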
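Equation (13) is the standard omitted-variable-bias formula, and it can be verified numerically. This is a minimal sketch on simulated DZ within-pair differences. The values of $\beta_1^{DZ}$ and $\gamma$ and the assumed positive link between $\Delta S$ and $\Delta G$ are hypothetical. The short regression of Equation (11) matches $\beta_1^{DZ} + \gamma\, b_{\Delta G\cdot\Delta S}$, with $b_{\Delta G\cdot\Delta S}$ taken from the auxiliary regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta1_dz, gamma = 0.08, 0.05   # hypothetical true return and ability effect

# Within-pair differences for DZ twins: ability differs (dG != 0) and is
# positively related to the schooling difference (assumed here).
dG = rng.normal(size=n)
dS = 0.5 * dG + rng.normal(size=n)
dlnY = beta1_dz * dS + gamma * dG + rng.normal(scale=0.3, size=n)  # Equation (12)

# Equation (11): regress dlnY on dS alone (dG omitted); no intercept.
b_hat = (dS @ dlnY) / (dS @ dS)

# Auxiliary regression of dG on dS gives b_{dG.dS}; Equation (13) predicts
# b_hat = beta1 + gamma * b_{dG.dS} up to sampling noise.
b_aux = (dS @ dG) / (dS @ dS)
predicted = beta1_dz + gamma * b_aux

print(f"short regression: {b_hat:.4f}  beta1 + gamma*b_aux: {predicted:.4f}")
```

Here $b_{\Delta G\cdot\Delta S} \approx 0.5/1.25 = 0.4$, so the DZ estimate converges to about 0.10 rather than the true 0.08.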
Assuming the original earnings function is correctly specified, the schooling coefficient $\beta_1$ estimated from Equation (4) on individual data, where $G$ and $F$ are unobserved, will be biased by the omitted ability ($G$) and family ($F$) variables. By comparing these estimates, one can approximate the effects of genetic components and family background variables on earnings. The properties of the estimates are summarized below:

Estimate | Property | Omitted variables | Equation
$\hat\beta_1$ | Biased | Genetic and environmental variables | (4)
$\hat\beta_1^{MZ}$ | Unbiased | None | (8)
$\hat\beta_1^{DZ}$ | Biased | Genetic variables | (11)
With these estimates, the effect of ability (genetic factors) on earnings, as reflected in the schooling coefficient, can be estimated by $\hat\beta_1^{DZ} - \hat\beta_1^{MZ}$, while the effect of environmental factors can be estimated by $\hat\beta_1 - \hat\beta_1^{DZ}$. Besides giving an "unbiased" estimate of the returns to schooling, this method, which exploits the different genetic properties of MZ and DZ twins, can also separate the effects of genetics and environment on earnings.
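The decomposition above is simple arithmetic on three coefficients. This is a minimal sketch. The function name `decompose_returns` and the input values are hypothetical, chosen only to show how the genetic and environmental components add up to the total omitted-variable bias.

```python
def decompose_returns(b_ols, b_dz, b_mz):
    """Split the individual-level schooling coefficient into a 'true' return
    and the genetic and environmental parts of the omitted-variable bias."""
    genetic = b_dz - b_mz          # DZ estimate is biased by genetic factors only
    environmental = b_ols - b_dz   # bias removed once shared family factors are differenced out
    total_bias = b_ols - b_mz      # equals genetic + environmental
    return {
        "true_return": b_mz,
        "genetic": genetic,
        "environmental": environmental,
        "total_bias": total_bias,
        "bias_share_of_ols": total_bias / b_ols,
    }


# Hypothetical coefficients, for illustration only.
parts = decompose_returns(b_ols=0.10, b_dz=0.085, b_mz=0.07)
for name, value in parts.items():
    print(f"{name:>18}: {value:.3f}")
```

Because the two components are differences of adjacent estimates, they always sum to the total bias $\hat\beta_1 - \hat\beta_1^{MZ}$. The split is only as reliable as the assumption that each estimate is biased exactly as the summary table states.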
Note that if interaction terms such as schooling × ability and schooling × family characteristics are included in the earnings function, the individual earnings function becomes:

$$\ln Y_{ij} = \alpha_0 + \alpha_1 S_{ij} + \alpha_2 S_{ij} G_i + \alpha_3 G_i + \alpha_4 S_{ij} F_i + \alpha_5 F_i + v_{ij} \tag{14}$$
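When the above individual earnings function is differenced within twin pairs, the fixed effect model becomes:

$$\Delta \ln Y_i = (\alpha_1 + \alpha_2 G_i + \alpha_4 F_i)\,\Delta S_i + \Delta v_i \tag{15}$$

where the coefficient on the differenced schooling variable becomes a function, $\alpha_1 + \alpha_2 G_i + \alpha_4 F_i$, instead of a constant. If genetic components and family characteristics are not used in the estimation, the estimated return to schooling will generally still be a biased estimate of $\alpha_1$.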
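Equation (15) can be illustrated with simulated MZ pairs. This is a minimal sketch. The values of $\alpha_1$, $\alpha_2$ and $\alpha_4$, the nonzero means of $G_i$ and $F_i$, and the assumption that $\Delta S_i$ is independent of $G_i$ and $F_i$ are all hypothetical. A plain within-pair regression returns a single average slope of about $\alpha_1 + \alpha_2 E[G] + \alpha_4 E[F]$ rather than $\alpha_1$. Adding $\Delta S \cdot G$ and $\Delta S \cdot F$ recovers the separate coefficients, but only if $G$ and $F$ were actually observed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
a1, a2, a4 = 0.06, 0.02, 0.01   # hypothetical alpha_1, alpha_2, alpha_4

G = rng.normal(loc=1.0, size=n)   # pair-level ability (common to MZ twins)
F = rng.normal(loc=0.5, size=n)   # pair-level family background
dS = rng.normal(size=n)           # within-pair schooling difference
dv = rng.normal(scale=0.3, size=n)

# Equation (15): the slope on dS varies across families with G_i and F_i.
dlnY = (a1 + a2 * G + a4 * F) * dS + dv

# Plain fixed effect regression (G and F not used): one common slope.
b_plain = (dS @ dlnY) / (dS @ dS)

# Including dS*G and dS*F recovers the alphas when G and F are observable.
Z = np.column_stack([dS, dS * G, dS * F])
a_hat = np.linalg.lstsq(Z, dlnY, rcond=None)[0]

print(f"plain slope: {b_plain:.4f}  alphas: {np.round(a_hat, 4)}")
```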
Fixed Effect Model with Instrumental Variables
All data sets share the problem of measurement error. Even if the errors are small, their effect may be magnified when the estimated model includes more variables that are subject to this problem. The problem exists in most data collected from questionnaires, even when some variables have been cross-checked and verified against legal documents, as in the NAS-NRC twin data. People answer questions in the way they interpret them; therefore, measurement error is unavoidable. Measurement error can be readily detected in twin data sets because they contain variables that should not vary within a twin pair, such as family facts, so any within-pair discrepancy in these variables reveals reporting error. If errors are present in one variable, it is likely that other variables also contain errors.
Griliches (1977) was one of the first investigators to report the problem of measurement error in the schooling variable. Measurement error in the schooling variable of the conventional earnings function can cause substantial bias in the estimated return to schooling. For example, suppose the schooling variable used in the earnings function is $S_i$, where $S_i = S_i^* + v_i$, with $S_i^*$ and $v_i$ denoting the true schooling level and the error, respectively. The estimated return to schooling then becomes

$$\operatorname{plim}\hat\beta = \beta\left[1 - \frac{\operatorname{var}(v)}{\operatorname{var}(v) + \operatorname{var}(S^*)}\right] \tag{16}$$
where $\rho_S$ is the correlation across twins' reported schooling, thus leading to an even smaller estimate [see, for example, Griliches (1977), Ashenfelter and Krueger (1994) and Behrman and Rosenzweig (1999)].
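The attenuation in Equation (16), and the larger attenuation after within-pair differencing, can be reproduced by simulation. This is a minimal sketch under classical assumptions: errors are independent of true schooling and of each other, and the variances and twin correlation are hypothetical. For the differenced model it uses the standard result $\beta\,[1 - \operatorname{var}(v)/((\operatorname{var}(v) + \operatorname{var}(S))(1 - \rho_S))]$, where $\rho_S$ is the correlation of reported schooling. This formula is stated here as an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
beta = 0.08
var_S, var_v, rho_true = 4.0, 0.5, 0.75   # hypothetical variances and twin correlation

# True schooling of the two twins, correlated within the pair.
cov = var_S * np.array([[1.0, rho_true], [rho_true, 1.0]])
S_true = rng.multivariate_normal([12.0, 12.0], cov, size=n)
S_rep = S_true + rng.normal(scale=np.sqrt(var_v), size=(n, 2))  # classical error
lnY = 1.0 + beta * S_true + rng.normal(scale=0.3, size=(n, 2))

# Cross-section (levels), Equation (16).
s, y = S_rep[:, 0], lnY[:, 0]
b_levels = np.cov(s, y)[0, 1] / np.var(s, ddof=1)
pred_levels = beta * (1 - var_v / (var_v + var_S))

# Within-pair difference: attenuation depends on the correlation of REPORTED schooling.
rho_rep = rho_true * var_S / (var_S + var_v)
dS, dY = S_rep[:, 0] - S_rep[:, 1], lnY[:, 0] - lnY[:, 1]
b_diff = (dS @ dY) / (dS @ dS)
pred_diff = beta * (1 - var_v / ((var_v + var_S) * (1 - rho_rep)))

print(f"levels {b_levels:.4f} (pred {pred_levels:.4f}), "
      f"difference {b_diff:.4f} (pred {pred_diff:.4f}), true {beta}")
```

With these values differencing removes most of the true schooling variation (the twins are highly correlated) but none of the error variation. The attenuation factor therefore falls from about 0.89 in levels to about 0.67 in differences.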
One of the models Ashenfelter and Krueger (1994) used to estimate returns to schooling relied on instrumental variables. This was basically a fixed effect model with a measurement error problem, in which the measurement error was corrected by using each individual's schooling as reported by his/her twin. They assumed that reported schooling was $S_j^m = S_j + v_j^m$, where $m = 1, 2$ indexes the reporting twin, and $S_j$ and $v_j^m$ denote the true schooling level and the measurement error, respectively. As the independent variable they used the mean of the self-reported difference and the difference reported by the other twin, i.e.

$$\tfrac{1}{2}\left[(S_1^1 - S_2^2) + (S_1^2 - S_2^1)\right] \tag{19}$$
where the subscript denotes the twin whose schooling is reported and the superscript denotes the twin who reports it. As shown by Ashenfelter and Krueger (1994), the estimated return

$$\operatorname{plim}\hat\beta = \beta\left[1 - \frac{\operatorname{var}(v)/2}{[\operatorname{var}(v) + \operatorname{var}(S)](1 - \rho_S) - \operatorname{var}(v)/2}\right] \tag{20}$$

would give a smaller asymptotic bias than the basic fixed effect model.

Footnote: As long as $\rho_S > 0$, the estimated $\beta$ will be smaller than the "true" $\beta$, but the magnitude of the bias is not known. The bias can be so large that the estimated return is close to zero even when the true return is substantial.
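Using the co-twin's report can be illustrated with simulated data. This is a minimal sketch under hypothetical assumptions: four mutually independent reporting errors with equal variance and the same variances and twin correlation as in a basic attenuation example. It compares (a) the self-reported difference, (b) the averaged measure of Equation (19), and (c) one common instrumental-variable variant that instruments the self-reported difference with the co-twin-reported difference. It is not a reproduction of Ashenfelter and Krueger's estimation code.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
beta = 0.08
var_S, var_v, rho_true = 4.0, 0.5, 0.75   # hypothetical values

cov = var_S * np.array([[1.0, rho_true], [rho_true, 1.0]])
S = rng.multivariate_normal([12.0, 12.0], cov, size=n)   # true schooling S_1, S_2
lnY = 1.0 + beta * S + rng.normal(scale=0.3, size=(n, 2))
dY = lnY[:, 0] - lnY[:, 1]

# Reports S_j^m: schooling of twin j as reported by twin m (independent errors).
e = rng.normal(scale=np.sqrt(var_v), size=(n, 4))
S1_by1, S1_by2 = S[:, 0] + e[:, 0], S[:, 0] + e[:, 1]
S2_by1, S2_by2 = S[:, 1] + e[:, 2], S[:, 1] + e[:, 3]

d_self = S1_by1 - S2_by2           # each twin's own report
d_other = S1_by2 - S2_by1          # each twin as reported by the co-twin
d_avg = 0.5 * (d_self + d_other)   # Equation (19)


def slope(x, y):
    return (x @ y) / (x @ x)       # differenced model, no intercept


b_self = slope(d_self, dY)
b_avg = slope(d_avg, dY)
b_iv = (d_other @ dY) / (d_other @ d_self)   # IV: instrument d_self with d_other

rho_rep = rho_true * var_S / (var_S + var_v)
D = (var_v + var_S) * (1 - rho_rep)
pred_self = beta * (1 - var_v / D)
pred_avg = beta * (1 - (var_v / 2) / (D - var_v / 2))   # Equation (20) form

print(f"self {b_self:.4f}, averaged {b_avg:.4f}, IV {b_iv:.4f}, true {beta}")
```

Averaging halves the error variance in the regressor, so the bias shrinks but does not vanish. The IV estimate is consistent here only because the co-twin's reporting error is assumed uncorrelated with one's own.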
Miller, Mulvey and Martin (1997) also applied the fixed effect model with similar instruments to Australian twins, but obtained very different results.

2.2.2 The Selection Effect Model
Ashenfelter and Krueger (1994) proposed a selection effect model similar to the fixed effect model and applied it to MZ twin data. Their earnings function depended on (a) a set of observable common variables ($X_i$), (b) a set of variables that varied across the twins ($Z_{i1}$, $Z_{i2}$), (c) unobservable individual components ($\varepsilon_{i1}$, $\varepsilon_{i2}$) and (d) family components that were the same for both twins ($\mu_i$). For twin one and twin two, the earnings functions were defined as:
$$\ln Y_{i1} = \alpha X_i + \beta Z_{i1} + \mu_i + \varepsilon_{i1} \tag{21}$$

$$\ln Y_{i2} = \alpha X_i + \beta Z_{i2} + \mu_i + \varepsilon_{i2} \tag{22}$$
where $i$ denotes the family and subscripts 1 and 2 denote twin one and twin two. Both the common and the individual variables that determine earnings were divided into observable and unobservable components.
This relation was a general earnings equation, similar to the structural equation of the basic fixed effect model, except for an additional correlation between the family effect and the observables. It was formulated as:

$$\mu_i = \gamma Z_{i1} + \gamma Z_{i2} + \delta X_i + \omega_i \tag{23}$$

where $\gamma$ and $\delta$ denote the coefficients of the individual and common observable characteristics, respectively, and $\omega_i$ the error term. This relation states that the unobservable common family component depends on the observable individual characteristics of both twins and on the observable common variables.
The coefficient $\gamma$ measured the "selection effect", which relates the family unobservable to the observables, and $\beta$ measured the so-called structural effect of the observables on earnings. The family component was assumed to have the same effect on each twin, because the data they used were all MZ twins. Ashenfelter and Krueger (1994) defined the selection effect as follows: the selection effect is positive when families with high family wealth are more likely to educate their children, and negative when such families are less likely to do so. For this reason, the selection effect is also called the family effect, i.e. it includes both genetic effects and common environmental effects. Substituting the correlation equation into the individual earnings functions produces the reduced-form equations:

$$\ln Y_{i1} = (\alpha + \delta) X_i + (\beta + \gamma) Z_{i1} + \gamma Z_{i2} + \varepsilon'_{i1} \tag{24}$$

$$\ln Y_{i2} = (\alpha + \delta) X_i + \gamma Z_{i1} + (\beta + \gamma) Z_{i2} + \varepsilon'_{i2} \tag{25}$$

where $\varepsilon'_{i1} = \omega_i + \varepsilon_{i1}$ and $\varepsilon'_{i2} = \omega_i + \varepsilon_{i2}$.
The "true" schooling coefficient ($\beta$) can be estimated by subtracting the coefficient on the other twin's schooling ($\gamma$) from the coefficient on one's own schooling ($\beta + \gamma$), assuming that $\beta$ and $\gamma$ are the same for twin one and twin two (i.e. the returns to schooling and the family effects are the same for both twins). Given these properties of the model, Equations (24) and (25) can be estimated jointly. Equations (24) and (25) also make clear that the $\gamma$ coefficient, or selection effect, is exactly the omitted variable bias.
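The identification argument of Equations (21) to (25) can be sketched with simulated MZ pairs. All structural values are hypothetical, and the family effect is generated exactly as in Equation (23). For simplicity the sketch estimates the stacked reduced form by pooled OLS, with equal coefficients imposed across twins. This is consistent here but less efficient than the GLS approach discussed next. The return to schooling is recovered as the own-schooling coefficient minus the co-twin-schooling coefficient.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
alpha, beta, gamma, delta = 0.02, 0.07, 0.03, 0.01   # hypothetical structural values

X = rng.normal(size=n)                                 # common observable
Z1 = 12 + rng.normal(scale=2.0, size=n)                # twin 1 schooling
Z2 = 0.7 * Z1 + 3.6 + rng.normal(scale=1.0, size=n)    # twin 2 schooling, correlated with Z1
omega = rng.normal(scale=0.2, size=n)
mu = gamma * Z1 + gamma * Z2 + delta * X + omega       # Equation (23)
lnY1 = alpha * X + beta * Z1 + mu + rng.normal(scale=0.3, size=n)   # Equation (21)
lnY2 = alpha * X + beta * Z2 + mu + rng.normal(scale=0.3, size=n)   # Equation (22)

# Stack Equations (24) and (25): regressors are [1, X, own schooling, co-twin schooling].
ones = np.ones(n)
design = np.vstack([
    np.column_stack([ones, X, Z1, Z2]),   # twin 1: own = Z1, other = Z2
    np.column_stack([ones, X, Z2, Z1]),   # twin 2: own = Z2, other = Z1
])
y = np.concatenate([lnY1, lnY2])
c0, c_x, c_own, c_other = np.linalg.lstsq(design, y, rcond=None)[0]

beta_hat = c_own - c_other   # structural return to schooling
gamma_hat = c_other          # selection effect, i.e. the omitted-variable bias

print(f"beta_hat {beta_hat:.4f}  gamma_hat {gamma_hat:.4f}  alpha+delta {c_x:.4f}")
```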
Since twins are treated more equally than other siblings, their years of schooling are more likely to be the same or similar than those of any other two siblings. In the case of MZ twins in particular, the twins' education levels (years of schooling in this study) are likely to be much closer than those of other siblings or DZ twins, so the schooling variables are highly correlated within pairs and may give rise to correlated errors. In addition, the error terms of Equations (24) and (25) share the common component $\omega_i$ and are therefore correlated across the two equations. Given these correlated errors, the GLS estimation method, which takes this correlation into account, is preferable to OLS.
Similarly, the "within family differenced reduced form equation" removes both the observable and the unobservable family effects. The differenced equation becomes: