When data on some individuals are missing for some years, the data set is called an unbalanced panel. Different kinds of panels call for somewhat different estimation procedures. The mechanics of fixed effects estimation on an unbalanced panel are not much more difficult than on a balanced panel. Let $T_i$ denote the number of time periods observed for individual $i$. The total number of observations is then $T_1 + T_2 + \cdots + T_N$. As mentioned above, in a balanced panel time-demeaning uses up one degree of freedom for each cross-sectional individual. The same holds in an unbalanced panel, so the residual degrees of freedom are $T_1 + T_2 + \cdots + T_N - N - k$, where $k$ is the number of time-varying explanatory variables. This is the same count as in the dummy variable regression. Regression packages that perform fixed effects estimation make this degrees-of-freedom adjustment automatically and report the adjusted value.
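The degrees-of-freedom count above can be checked with a few lines of Python. This is a minimal sketch. The function name `fe_degrees_of_freedom` and the example panel are hypothetical, and `k` is assumed to count only the time-varying regressors (not the individual effects).

```python
def fe_degrees_of_freedom(periods_per_unit, k):
    """Residual degrees of freedom for fixed effects on a (possibly unbalanced) panel.

    periods_per_unit: list of T_i, the number of observed periods for each individual i
    k: number of time-varying explanatory variables (hypothetical helper, illustration only)
    """
    total_obs = sum(periods_per_unit)  # T_1 + T_2 + ... + T_N
    n_units = len(periods_per_unit)    # N: time-demeaning uses one df per individual
    return total_obs - n_units - k


# Hypothetical unbalanced panel: three individuals observed for 3, 2 and 4 periods, k = 2
print(fe_degrees_of_freedom([3, 2, 4], k=2))  # 9 - 3 - 2 = 4
```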
An individual observed for only a single time period plays no role in fixed effects estimation: its time-demeaned values are all zero, so it drops out of the estimation (it adds one observation but also uses up one degree of freedom). The more difficult issue is determining why the panel is unbalanced. If the reason for the missing data is uncorrelated with the idiosyncratic errors $u_{it}$, the unbalanced panel causes no problem.
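The minimal Python sketch below (using NumPy) applies time-demeaning to an unbalanced panel. It shows that an individual observed only once ends up with a demeaned value of zero. The function `time_demean` and the toy data are illustrative assumptions, not part of any package.

```python
import numpy as np


def time_demean(unit_ids, values):
    """Subtract each individual's own time average (the within / fixed effects transformation).

    Works for unbalanced panels: each individual is averaged over however many periods it has.
    """
    unit_ids = np.asarray(unit_ids)
    values = np.asarray(values, dtype=float)
    demeaned = np.empty_like(values)
    for uid in np.unique(unit_ids):
        mask = unit_ids == uid
        demeaned[mask] = values[mask] - values[mask].mean()
    return demeaned


# Hypothetical unbalanced panel: individual 2 is observed in only one period
ids = [1, 1, 2, 3, 3, 3]
y = [1.0, 3.0, 5.0, 2.0, 4.0, 6.0]
print(time_demean(ids, y).tolist())  # [-1.0, 1.0, 0.0, -2.0, 0.0, 2.0]
```

The single observation for individual 2 becomes 0, so it carries no information for the within estimator.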
3.2.3.3 Random Effects Models
Similar to the fixed effects method, the random effects method starts with the unobserved effects equation

$$y_{it} = \beta_0 + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + a_i + u_{it} \tag{3.7}$$

where an intercept is included, the unobserved effect $a_i$ is assumed to have zero mean, and time dummies are included among the explanatory variables. Fixed effects estimation removes $a_i$ from the equation because $a_i$ is thought to be correlated with one or more of the explanatory variables $x_{itj}$. However, if $a_i$ is uncorrelated with every explanatory variable in all time periods, eliminating it produces inefficient estimators.
Equation (3.7) becomes a random effects model when the unobserved effect is assumed to be uncorrelated with each explanatory variable in every time period:

$$\operatorname{Cov}(x_{itj}, a_i) = 0, \quad t = 1, 2, \ldots, T;\ j = 1, 2, \ldots, k \tag{3.8}$$

The random effects assumptions consist of all the fixed effects assumptions plus assumption (3.8). If (3.8) holds, the coefficients $\beta_j$ can be estimated consistently from a single cross section, so panel data are not strictly needed. Nevertheless, a single cross section discards the useful information contained in the other time periods. Under (3.8), pooled OLS with time dummies on all periods also yields consistent estimators of the coefficients in the random effects model. To see its shortcoming, write the composite error as $v_{it} = a_i + u_{it}$, so that equation (3.7) becomes
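$$y_{it} = \beta_0 + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + v_{it} \tag{3.9}$$

Because $a_i$ is part of the composite error in every time period, the $v_{it}$ are serially correlated across time. Under the random effects assumptions,

$$\operatorname{Corr}(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}, \quad t \neq s$$

where $\sigma_a^2 = \operatorname{Var}(a_i)$ and $\sigma_u^2 = \operatorname{Var}(u_{it})$.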
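As a rough check on the correlation formula, the sketch below compares it with a simulated composite error. For illustration only, it assumes normally distributed $a_i$ and $u_{it}$ with $\sigma_a^2 = 1$ and $\sigma_u^2 = 3$. The function name `composite_error_corr` is hypothetical.

```python
import numpy as np


def composite_error_corr(sigma_a2, sigma_u2):
    """Corr(v_it, v_is), t != s, for v_it = a_i + u_it.

    Assumes a_i and u_it are uncorrelated, u_it is serially uncorrelated with
    constant variance sigma_u2, and a_i has variance sigma_a2.
    """
    return sigma_a2 / (sigma_a2 + sigma_u2)


# Illustrative simulation: N individuals, two periods each
rng = np.random.default_rng(0)
N, T = 100_000, 2
a = rng.normal(0.0, 1.0, size=(N, 1))            # sigma_a^2 = 1, same a_i in every period
u = rng.normal(0.0, np.sqrt(3.0), size=(N, T))   # sigma_u^2 = 3, fresh draw each period
v = a + u                                        # composite error, broadcast over periods
sample_corr = np.corrcoef(v[:, 0], v[:, 1])[0, 1]

print(composite_error_corr(1.0, 3.0))  # 0.25
print(sample_corr)                     # close to 0.25
```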
This serial correlation in the composite error is necessarily positive, and it can be substantial. The usual pooled OLS standard errors ignore it, so they are incorrect, as are the usual test statistics.
In this situation, generalized least squares (GLS) can be used in place of pooled OLS to account for the serial correlation. The procedure has good properties when N is large relative to T. Although the estimator described above is derived for balanced panel data, it can also be applied to unbalanced panel data.
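One standard way to carry out this GLS estimation on a balanced panel is quasi-demeaning. Subtract a fraction $\lambda = 1 - [\sigma_u^2/(\sigma_u^2 + T\sigma_a^2)]^{1/2}$ of each individual's time average from $y_{it}$, from each $x_{itj}$ and from the intercept, and then run OLS. The sketch below assumes a balanced panel and treats $\sigma_a^2$ and $\sigma_u^2$ as known. In practice they are estimated, which gives feasible GLS. The function names and the simulated data are hypothetical.

```python
import numpy as np


def re_lambda(sigma_a2, sigma_u2, T):
    """Quasi-demeaning weight for random effects GLS (balanced panel, T periods)."""
    return 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))


def random_effects_gls(y, X, sigma_a2, sigma_u2):
    """RE GLS by quasi-demeaning, assuming known variance components (hypothetical helper).

    y: (N, T) array of outcomes; X: (N, T, k) array of regressors.
    Returns [beta_0, beta_1, ..., beta_k].
    """
    N, T = y.shape
    lam = re_lambda(sigma_a2, sigma_u2, T)
    y_star = y - lam * y.mean(axis=1, keepdims=True)   # y_it - lambda * ybar_i
    X_star = X - lam * X.mean(axis=1, keepdims=True)   # x_itj - lambda * xbar_ij
    const = np.full((N * T, 1), 1.0 - lam)             # intercept column becomes (1 - lambda)
    Z = np.hstack([const, X_star.reshape(N * T, -1)])
    coef, *_ = np.linalg.lstsq(Z, y_star.reshape(N * T), rcond=None)
    return coef


# Simulated balanced panel with a_i independent of X, so assumption (3.8) holds
rng = np.random.default_rng(1)
N, T, k = 2000, 4, 1
X = rng.normal(size=(N, T, k))
a = rng.normal(0.0, 1.0, size=(N, 1))  # sigma_a^2 = 1
u = rng.normal(0.0, 1.0, size=(N, T))  # sigma_u^2 = 1
y = 1.0 + 2.0 * X[:, :, 0] + a + u

coef = random_effects_gls(y, X, 1.0, 1.0)
print(re_lambda(1.0, 1.0, T))  # 1 - sqrt(1/5), about 0.553
print(coef)                    # close to [1.0, 2.0]
```

Setting $\lambda = 0$ reproduces pooled OLS and $\lambda = 1$ reproduces the fixed effects (within) estimator, so random effects lies between the two.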
3.2.4.4 Random Effects or Fixed Effects
Fixed effects estimation is widely considered more convincing than random effects estimation for estimating ceteris paribus effects. The reason is that FE allows arbitrary correlation between $a_i$ and the $x_{itj}$, whereas RE does not. Nevertheless, random effects estimation is still used in certain situations.
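The practical weight of this argument can be illustrated by simulation. In the hypothetical data-generating process below, $a_i$ is correlated with $x_{it}$, so assumption (3.8) fails. Pooled OLS, which like random effects relies on that assumption, is then inconsistent, while the fixed effects (within) estimator still recovers the true slope. The design (one regressor, normal errors, $\beta = 1$) and the function names are illustrative assumptions.

```python
import numpy as np


def pooled_ols_slope(y, x):
    """OLS slope of y on x with an intercept, pooling all (i, t) observations."""
    x = x.ravel()
    y = y.ravel()
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)


def fixed_effects_slope(y, x):
    """Within estimator: OLS on time-demeaned data (balanced panel, N x T arrays)."""
    x_dm = x - x.mean(axis=1, keepdims=True)
    y_dm = y - y.mean(axis=1, keepdims=True)
    return (x_dm * y_dm).sum() / (x_dm ** 2).sum()


rng = np.random.default_rng(42)
N, T, beta = 5000, 3, 1.0
c = rng.normal(size=(N, 1))       # time-constant component of x
x = c + rng.normal(size=(N, T))   # x_it = c_i + e_it, so Var(x_it) = 2
a = c                             # hypothetical: a_i = c_i, so Cov(x_it, a_i) = 1
u = rng.normal(size=(N, T))
y = beta * x + a + u

b_pooled = pooled_ols_slope(y, x)
b_fe = fixed_effects_slope(y, x)
print(b_pooled)  # near beta + Cov(x, a) / Var(x) = 1.5, i.e. biased
print(b_fe)      # near the true beta = 1.0
```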
Most obviously, when a key explanatory variable does not change over time, its effect cannot be estimated by fixed effects, because time-demeaning removes it; random effects can still estimate it. However, random effects is valid only under assumption (3.8), that is, when the unobserved effect is uncorrelated with all explanatory variables. When time-constant controls are included along with the other explanatory variables, random effects is generally more efficient than pooled OLS.
In practice, researchers often estimate both models and test formally for statistically significant differences between the RE and FE coefficients on the time-varying explanatory variables. Hausman (1978) proposed such a test. If the Hausman test fails to reject, either the RE and FE estimates are close enough that it does not matter which is used, or the sampling variation in the FE estimates is so large that practically significant differences cannot be shown to be statistically significant. In the latter case, researchers may question whether the data contain enough information to estimate the coefficients precisely. When the Hausman test rejects, the key RE assumption (3.8) is taken to be false, and the FE estimates are used.
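For a single coefficient, the Hausman statistic reduces to $H = (\hat\beta_{FE} - \hat\beta_{RE})^2 / [\widehat{\operatorname{Var}}(\hat\beta_{FE}) - \widehat{\operatorname{Var}}(\hat\beta_{RE})]$. Under the null that (3.8) holds, $H$ is approximately chi-square with one degree of freedom. The sketch below computes this single-coefficient case. The general test uses the full coefficient vector and covariance matrices. The function name and the input numbers are hypothetical, not estimates from real data.

```python
import math


def hausman_single(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient (hypothetical helper).

    Under H0 (the RE assumption holds) the statistic is approximately chi-square(1).
    Requires var_fe > var_re, which is expected when RE is efficient under H0.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    # Upper tail of chi-square(1): P(X > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical estimates for illustration only
h, p = hausman_single(b_fe=0.50, b_re=0.30, var_fe=0.02, var_re=0.01)
print(round(h, 4), round(p, 4))  # 4.0 0.0455
```

With these illustrative numbers the test rejects at the 5% level, so under the rule in the text the FE estimates would be used.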