
Environmental Justice Analysis: Theories, Methods, and Practice - Chapter 7

7 Analyzing Data with Statistical Methods

This chapter is not Statistics 101; rather, it reviews the potential use, actual use, and misuse of statistics in environmental justice analysis. Different statistical methods are applicable to different areas of environmental justice analysis; it is a matter of choosing the right method. Both descriptive and inferential statistics have been applied to environmental justice analysis, and it has been shown that the type of statistics affects the results.

7.1 DESCRIPTIVE STATISTICS

Descriptive statistics are procedures for organizing, summarizing, and describing observations or data from measurements. Different types of measurements lead to different types of data. The four types of measurements, or scales, in decreasing order of level are ratio, interval, ordinal, and nominal. A ratio scale has magnitude, equal intervals, and an absolute zero point. Some socioeconomic characteristics, such as income, have a ratio scale, and environmental data such as emissions and ambient concentrations are also ratio measures. An interval scale has magnitude and equal intervals but no absolute zero point, for example, temperature in degrees Fahrenheit. An ordinal scale has magnitude but neither equal intervals nor an absolute zero point; risk-ranking data, for example, are ordinal. A nominal scale is simply the classification of data into discrete groups that have no magnitude relationship to one another. For example, race can be classified into African American, Asian American/Pacific Islander, Native American, White, and Other Races.

A variable initially measured at a higher level, such as a ratio scale, may be reduced to a lower level of measure, such as ordinal, but the reverse cannot be done. For example, household income, initially a ratio measure, can be classified into low, middle, and high income. This conversion helps us grasp a large data set concisely but loses some of the information contained in the higher level of measure.

Descriptive statistics include the minimum, maximum, range, sum, mean, median, mode, quartiles, variance, standard deviation, coefficient of variation, skewness, and kurtosis. All of these statistics are applicable to ratio measures, while only some can be used to describe other measures. As can be found at the beginning of Statistics 101, a simple way to describe a univariate data set is to present summary measures of central tendency. Three representations of central tendency are:

• Arithmetic mean, the sum of all observations divided by the number of observations
• Median, the middle point in an ordered set of observations; in other words, the number above and below which there is an equal number of observations
• Mode, the observation that occurs most frequently in the data set

For a nominal variable such as race/ethnicity, the mode can be used but the mean and median cannot. For an ordinal variable, either the median or the mode can serve as a central tendency measure.
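As a small illustration of these summary measures, here is a sketch using Python's standard statistics module on a hypothetical, right-skewed set of household sizes (the values are invented for illustration):

```python
import statistics

# Hypothetical household sizes for a small area (right-skewed: a few large households)
household_size = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8]

print("mean  :", statistics.mean(household_size))    # 3.1, pulled upward by the value 8
print("median:", statistics.median(household_size))  # 2.5, the middle of the ordered data
print("mode  :", statistics.mode(household_size))    # 2, the most frequent observation
print("stdev :", statistics.stdev(household_size))   # sample standard deviation
```

Here mean > median > mode, the ordering typical of a right-skewed variable such as household size, which is discussed next.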
Central tendency statistics are affected by the underlying probability distributions. Typical distribution shapes include bell-shaped, triangular, uniform, J-shaped, reverse J-shaped, left-skewed, right-skewed, bimodal, and multimodal (Weiss 1999). The bell-shaped, triangular, and uniform distributions are symmetric; an asymmetric unimodal distribution is either left-skewed or right-skewed. A left-skewed distribution has a longer left tail than right tail, so that most observations have high values and a few have low values. A right-skewed distribution has a longer right tail than left tail, so that most observations have low values and a few have high values. For a symmetric unimodal distribution such as the bell-shaped normal distribution, the mean, median, and mode are identical. For a symmetric distribution with more than one mode, such as a bimodal distribution, the mean and median are the same, but the mode may differ from them. For a skewed distribution, the mean, median, and mode may all differ. The household size distribution in the U.S. is right-skewed. The proportion of minority residents, or of any specific race/ethnicity, does not follow a normal distribution but is considerably skewed. These variables are often bimodal, reflecting residential segregation in which most tracts are either all white or all minority (Bowen et al. 1995). Integrated tracts are still small in number in most places.

The mean is very sensitive to extreme values, and thus the median is often preferred for data that have extreme values. However, the median has its own limitations. We can perform all sorts of mathematical operations on a variable's mean but not necessarily on its median: mathematical operations such as addition, subtraction, multiplication, or division on medians may not generate another median. In the environmental justice literature, some researchers treat the median like the mean and commit what I call the Median Fallacy. This often happens when estimating the median of a variable at a higher geographic level from existing median data at a lower geographic level. For example, median household income is often used in equity analysis. The census reports income data at the census block-group level but does not report them at the census block level. In some cases, analysts need to aggregate a census geographic unit such as a block group or census tract into a larger unit. For example, to correct the border effect, researchers aggregate the census tracts or block groups surrounding the census tract or block group where a target facility is located, often using the population or the number of households in each census tract or block group as weights. In buffer analysis, the analyst has to aggregate the census units in the buffer and estimate buffer characteristics based on the characteristics of the census units (see Chapter 8). In these cases, most analysts use a proportionate weighting method for aggregation; the assumptions underlying this method are discussed in Chapter 8. While this method can approximate a simple variable such as population or average household income, it may generate wrong results for a median such as median household income.

The following example illustrates the median fallacy. Suppose we have two census block groups, each with five households, whose household incomes are listed in Table 7.1. The median household incomes for BG1 and BG2 are, respectively, $20,000 and $60,000. The weighted median household income for BG1 and BG2 together is $40,000, with the number of households in each block group as the weights. The true median household income of $60,000 is $20,000 higher than the estimate based on proportionate weighting.

TABLE 7.1 True Median Household Income Is Not Equal to the Weighted Estimate

  Household incomes, BG1: $20,000; $20,000; $20,000; $30,000; $60,000
  Household incomes, BG2: $60,000; $60,000; $60,000; $70,000; $70,000
  Median household income at the BG level: BG1 = $20,000; BG2 = $60,000
  Weighted estimate of median household income for BG1+BG2: $40,000
  True median household income for BG1+BG2: $60,000
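The fallacy is easy to reproduce. A short Python sketch using the Table 7.1 incomes (the proportionate estimate treats the block-group medians like means):

```python
import statistics

bg1 = [20_000, 20_000, 20_000, 30_000, 60_000]   # Table 7.1, block group 1
bg2 = [60_000, 60_000, 60_000, 70_000, 70_000]   # Table 7.1, block group 2

med1, med2 = statistics.median(bg1), statistics.median(bg2)   # 20,000 and 60,000

# Proportionate weighting treats the block-group medians like means
weighted_estimate = (len(bg1) * med1 + len(bg2) * med2) / (len(bg1) + len(bg2))
print(weighted_estimate)             # 40,000: the Median Fallacy estimate

# The correct value comes from pooling the households and taking the median
print(statistics.median(bg1 + bg2))  # 60,000: the true combined median
```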
In rare cases, such as a uniform distribution within each block group like those in Table 7.2, the estimate obtained via the proportionate weighting method is the same as the true median.

TABLE 7.2 True Median Household Income Is Equal to the Weighted Estimate

  Household incomes, BG1: $20,000; $20,000; $20,000; $20,000; $20,000
  Household incomes, BG2: $60,000; $60,000; $60,000; $60,000; $60,000
  Median household income at the BG level: BG1 = $20,000; BG2 = $60,000
  Weighted estimate of median household income for BG1+BG2: $40,000
  True median household income for BG1+BG2: $40,000

Besides the median value, household income data are also reported in the U.S. census as the number of households in each household income class. An approximation of the median household income for an aggregated study area can be calculated from the frequency distribution of household income through interpolation:

  $Md \approx L + I \, (n_1 / n_2)$

where Md is the median, L is the lower limit of the median class, $n_1$ is the number of observations that must be covered in the median class to reach the median, $n_2$ is the frequency of the median class, and I is the width of the median class. This approximation is based on the assumption that the observations in the median class are spread evenly throughout that class, which is adequate for a study with a large number of observations.
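A minimal sketch of this interpolation in Python, assuming the grouped data are supplied as (lower limit, class width, frequency) tuples; the income classes below are invented for illustration:

```python
def grouped_median(classes):
    """Approximate the median from grouped data: Md ~ L + I * (n1 / n2).

    classes: (lower_limit, class_width, frequency) tuples in ascending order.
    Assumes observations are spread evenly within the median class.
    """
    total = sum(freq for _, _, freq in classes)
    cumulative = 0.0
    for lower, width, freq in classes:
        if cumulative + freq >= total / 2:
            n1 = total / 2 - cumulative   # observations needed to reach the median
            return lower + width * (n1 / freq)
        cumulative += freq
    raise ValueError("empty frequency distribution")

# Hypothetical household income classes for an aggregated study area
income_classes = [(0, 25_000, 120), (25_000, 25_000, 200),
                  (50_000, 25_000, 150), (75_000, 25_000, 80)]
print(grouped_median(income_classes))   # the median class here is $25,000-$50,000
```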
Like other inquiries, environmental justice analysis deals with two types of data: population and sample. Population data consist of all observations for the entire set of subjects under study (which is called a population in the statistical sense). A sample is a subset of the population, often collected in order to generalize to the population. Descriptive statistics are used in the context of either a sample or a population. The mean of a variable x from a sample is called the sample mean, $\bar{x}$. A sample standard deviation measures the degree of variation around the sample mean and indicates how far, on average, individual observations are from the sample mean. A sample is often used to generalize to the population. Because of sampling errors, different samples from the same population may yield different values for the same statistic, and these values follow a distribution of their own. The Central Limit Theorem states that for a relatively large sample size, the sample mean of a variable is approximately normally distributed, regardless of the distribution of the variable under investigation (Weiss 1999).

As discussed in Chapter 5, some census variables are based on the complete count and thus are population data, while others are based on a sample. Data based on the entire population are free from sampling errors but have non-sampling errors such as undercount. Sample data have both sampling and non-sampling errors. The sampling error is the deviation of a sample estimate from the average of all possible samples (Bureau of the Census 1992a). Non-sampling errors are introduced in the operations used to collect and process census data; the major sources include undercount, respondent and enumerator error, processing error, and nonresponse. Non-sampling errors are either random or non-random. Random errors increase the variability of the data, while the non-random portion of non-sampling errors, such as consistent underreporting of household income, biases the data in one direction. During the collection and processing operations, the Bureau of the Census attempted to control non-sampling errors such as undercoverage; differential net undercounting is discussed in Chapter 5.

Sampling errors and the random portion of non-sampling errors can be captured in the standard error. Published census reports provide procedures and formulas for estimating standard errors and confidence intervals in their appendices (Bureau of the Census 1992a). The error estimation is based on the basic unadjusted standard error for a particular variable, the adjusting design factor for that variable, the number of persons or housing units in the tabulation area, and the percent of these in the sample (percent-in-sample). The unadjusted standard error is the error that would occur under a simple random sample design and estimation technique and does not vary by tabulation area. "The design factors reflect the effects of the actual sample design and complex ratio estimation procedure used" and vary by census variable and by the percent-in-sample, which varies by tabulation area (Bureau of the Census 1992a:C-2). In other words, the standard error estimate is the universal unadjusted standard error adjusted by an area- and variable-specific sampling factor that accounts for variability in actual sampling. The percent-in-sample data for persons and housing units are provided with census data, and design factors are available in an appendix of the printed census reports.

The unadjusted standard errors are available from census appendix tables or can be estimated from the following formulas. For an estimated total, the unadjusted standard error is

  $SE(Y) = \sqrt{5Y(1 - Y/N)}$

where N is the size of the area (total persons for a person characteristic or total housing units for a housing characteristic) and Y is the estimate of the characteristic total. For an estimated percentage, the unadjusted standard error is

  $SE(p) = \sqrt{(5/B)\,p\,(100 - p)}$

where B is the base of the estimated percentage and p is the estimated percentage. These procedures are designed for a sample estimate of an individual census variable and are not applicable to sums of and differences between two sample estimates. For the sum of or difference between a sample estimate and a 100% count value, the standard error is the same as that for the sample estimate. For the sum of or difference between two sample estimates, the standard error is approximately the square root of the sum of the two individual standard errors squared. This method gives approximate standard errors when the two samples are independent and will be biased when the two items are highly correlated. Census reports also provide estimation procedures for the ratio of two variables where the numerator is not a subset of the denominator and for the median of a variable (Bureau of the Census 1992a). For Census 2000, the complete-count variables include sex, age, relationship, Hispanic origin, race, and tenure. Other census variables, such as household income, are estimates based on a sample and thus are subject to both sampling and non-sampling errors.
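A minimal sketch of these unadjusted standard-error formulas and of the square-root rule for combining two independent sample estimates (the design-factor adjustment, which varies by variable and tabulation area, is omitted, and the tract counts are hypothetical):

```python
from math import sqrt

def se_total(y, n_area):
    """Unadjusted standard error of an estimated total y in an area of size n_area."""
    return sqrt(5 * y * (1 - y / n_area))

def se_percent(p, base):
    """Unadjusted standard error of an estimated percentage p (0-100) on base B."""
    return sqrt((5 / base) * p * (100 - p))

def se_sum_or_diff(se1, se2):
    """Approximate SE of the sum of or difference between two independent estimates."""
    return sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical tract: 4,000 persons, an estimated 600 of them (15%) below poverty
print(se_total(600, 4000))
print(se_percent(15.0, 4000))
print(se_sum_or_diff(se_percent(15.0, 4000), se_percent(12.0, 3500)))
```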
7.2 INFERENTIAL STATISTICS

Inferential statistics are methods used to make inferences about a population based on observations made on a sample. Univariate inferences estimate single-variable characteristics of the population from the corresponding characteristics of a sample. Bivariate and multivariate inferences evaluate the statistical significance of the relationship between two or more variables in the population.

As noted above, the purpose of using a sampling technique to collect data and derive estimates is to make assertions about the population from which the samples are taken. To make inferences about the population, we need to know how well a sample estimate represents the true population value. The standard deviation of the sample mean accounts for sampling errors, and the confidence interval tells us the accuracy of the sample mean estimate. The confidence interval is a range of dispersion around the sample mean at a given confidence level, indicating how confident we are that the true population mean lies in this interval. The length of the confidence interval indicates the accuracy of the estimate; the longer the interval, the poorer the estimate.

Like other inquiries, environmental justice analysis starts with a hypothesis. The null hypothesis is usually a uniform risk distribution: if risk is uniformly distributed, we should find no difference in (potential) exposure or risk among different population groups. In a proximity-based study, the percentage of a particular subpopulation, such as blacks, in the vicinity of a noxious facility would be the same as that far away. The alternative hypothesis is typically the specific risk distribution pattern that we wish to infer if we reject the null hypothesis. Usually, it is the alleged claim that minorities and the poor bear a disproportionate burden of real or potential exposure to environmental risks. Specifically, for a proximity-based study, we consider the alternative hypothesis that the percentage of a particular subpopulation, such as blacks, near a noxious facility is greater than that far away. Here, we are dealing with two populations, one close to a facility and the other far away.

A hypothesis test for one population mean concerns whether the population mean differs from a specified value. We use the sample mean to make an inference about the population mean. Different hypothesis-testing procedures make different assumptions about the distribution of a variable in the population, and we should choose the procedure designed for the distribution type under investigation. The z-test assumes that the variable has a normal distribution and that the population standard deviation is known, while the t-test assumes a normal distribution and an unknown population standard deviation. The Wilcoxon signed-rank test does not require the normality assumption and assumes only a symmetric distribution (Hollander and Wolfe 1973). Given that many variables do not follow a normal distribution, the Wilcoxon signed-rank test has a clear advantage over the z-test and t-test. Outliers and extreme values do not affect the Wilcoxon signed-rank test, unlike the z-test and t-test. However, when normality is met, the t-test is more powerful than the Wilcoxon signed-rank test and should be used.
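A minimal sketch of the one-sample t-test and Wilcoxon signed-rank test using SciPy, on simulated tract-level percent-minority values (the data and the hypothesized mean of 25 percent are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical percent-minority values for 40 tracts near a facility
pct_minority = rng.normal(loc=30, scale=8, size=40)
mu0 = 25.0   # hypothesized population mean under the null

# t-test: assumes normality, population standard deviation unknown
t_stat, t_p = stats.ttest_1samp(pct_minority, popmean=mu0)

# Wilcoxon signed-rank test: assumes only a symmetric distribution around mu0
w_stat, w_p = stats.wilcoxon(pct_minority - mu0)

print(f"t-test:   t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"Wilcoxon: W = {w_stat:.1f}, p = {w_p:.4f}")
```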
As noted above, most environmental justice analyses concern inferences about two population means or proportions. Here, we use two samples to make inferences about their respective populations. To compare two population means, we can collect two independent random samples, each from its corresponding population. Alternatively, we can use a paired sample, which consists of matched pairs from the two populations. For either sampling method, subjects are randomly and independently sampled; that is, each subject or pair is equally likely to be selected. The t-test again assumes a normal distribution, while the Wilcoxon rank-sum test (also known as the Mann-Whitney test) assumes only that the two distributions have the same shape, which does not have to be symmetric. Again, when normality is met, the t-test is more powerful than the Wilcoxon rank-sum test and should be used when you are reasonably sure that the two distributions are normal. For a paired sample, the paired t-test assumes a normal distribution for the paired-difference variable, while the paired Wilcoxon signed-rank test assumes that the paired difference has a symmetric but not necessarily normal shape (Weiss 1999). The Wilcoxon rank-sum test for two groups may be generalized to several independent groups; the most commonly used Kruskal-Wallis statistic is independent of any distributional assumptions.

A population proportion, p, is the percentage of a population that has a specified attribute, and its corresponding sample proportion is $\hat{p}$ (Weiss 1999). This type of measure is often used in environmental justice studies for the percentage of the population who are minority, belong to a particular race/ethnicity category, or are in poverty. The sampling distribution of the proportion, $\hat{p}$, is approximately normal for a large sample size, n. If p is near 0.5, the normal approximation is quite accurate even for a moderate sample size. As a rule of thumb, the normal approximation can be used when np and n(1 - p) are both no less than 5 (10 is also commonly used). For example, for a population proportion of 1%, the sample size needs to be at least 500 for a normal approximation. For a large sample, the one-sample z-test is used to test whether a population proportion is equal to a specified value. For testing the difference between two proportions for large and independent samples, we can use the two-sample z-test. The assumptions require that the samples are independent and that the number of members with the specified attribute, x, and its complement, n - x, in each sample is no less than 5.

In general, parametric tests such as the t-test require more stringent assumptions such as normality, whereas nonparametric tests such as the Wilcoxon rank-sum test do not. In addition, nonparametric tests can be conducted on ordinal or higher-scale data, while parametric tests require at least the interval scale. On the other hand, parametric tests are more efficient than nonparametric tests. Therefore, to choose appropriate methods, we need to examine the distribution of the sample data. Normal probability plots and histograms provide a visual display of a distribution, and boxplots are also useful; the data are approximately normally distributed if the normal probability plot is roughly linear. In addition to these plots, statistical packages such as SAS provide formal diagnostics for normality. For example, if the sample size is no more than 2,000, SAS computes the Shapiro-Wilk statistic, W, to test normality, and the Kolmogorov D statistic is used otherwise (SAS Institute 1990).
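A sketch of the two-sample comparisons and normality checks discussed above, using SciPy on simulated near-facility and far-from-facility tract values (all numbers are hypothetical; the two-proportion z-test is computed directly from its formula):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical percent-minority values: tracts near a facility vs. tracts farther away
near = rng.normal(loc=35, scale=10, size=30)
far = rng.normal(loc=28, scale=10, size=45)

# Normality check first (Shapiro-Wilk, analogous to the SAS W statistic)
w_near, p_near = stats.shapiro(near)
w_far, p_far = stats.shapiro(far)
print("Shapiro-Wilk p-values:", p_near, p_far)

# Parametric: two-sample t-test (assumes normal populations)
print("t-test:", stats.ttest_ind(near, far, alternative="greater"))

# Nonparametric: Wilcoxon rank-sum / Mann-Whitney test (assumes same-shaped distributions)
print("Mann-Whitney:", stats.mannwhitneyu(near, far, alternative="greater"))

# Large-sample z-test for the difference between two proportions,
# e.g., the share of tracts in each group that are majority-minority
x1, n1, x2, n2 = 18, 30, 15, 45            # attribute counts and sample sizes
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
print("two-proportion z:", z, "one-sided p:", 1 - stats.norm.cdf(z))
```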
Hypothesis testing entails errors. The null hypothesis is rejected when the test statistic falls into the rejection region. A Type I error occurs when we reject the null hypothesis when it is true. A Type II error occurs when we fail to reject the null hypothesis when it is false; that is, the test statistic falls in the nonrejection region when, in fact, the null hypothesis is false. The probability of making a Type I error is the significance level of a hypothesis test, α: the chance that the test statistic falls in the rejection region when in fact the null hypothesis is true. The probability of making a Type II error, β, depends on the true value of µ, the sample size, and the significance level. For a fixed sample size, the smaller the significance level α we specify, the larger the probability of making a Type II error, β. The power of a hypothesis test is the probability of not making a Type II error, that is, 1 - β. The P-value of a hypothesis test is the smallest significance level at which the null hypothesis can be rejected, and it can be used to assess the strength of the evidence against the null hypothesis. The smaller the P-value, the stronger the evidence. Generally, if the P-value ≤ 0.01, the evidence is very strong; for 0.01 < P ≤ 0.05, it is strong; and for 0.05 < P ≤ 0.10, it is weak or none (Weiss 1999).

7.3 CORRELATION AND REGRESSION

Correlation and regression are often used in environmental justice analysis to detect an association between environmental risk measures and the distribution of the population by race/ethnicity and income. The most commonly used linear correlation coefficient, r, also known as the Pearson product moment correlation coefficient, measures the degree of linear association between two variables. The correlation coefficient r has several nice properties with easy and interesting interpretations. It is always between -1 and 1; positive values imply positive correlation between the two variables, while negative values imply negative correlation, and the larger the absolute value, the higher the degree of correlation. Its square is the coefficient of determination, r². In multiple linear regression, r² represents the proportion of variation in the dependent variable that can be explained by the independent variables in the regression equation; it is an indicator of the predictive power of the regression equation, and the higher the r², the more powerful the model.

Of course, we also use sample data to estimate r, and we want to know whether this estimate really represents the true population correlation coefficient or is simply attributable to sampling errors. Usually, the t-test is used to determine the statistical significance of correlation coefficients. However, like regression, this t-test requires very strong assumptions such as linearity, equal standard deviations, normal populations, and independent observations. When the distribution is markedly skewed, it is more appropriate to use Spearman's rank-order correlation (Hollander and Wolfe 1973). This is a distribution-free test statistic, which requires only random samples and is applicable to ordinal and higher scales of measurement. Interval and ratio data have to be converted to ordinal data to use this test.
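A minimal sketch of these correlation measures with SciPy (Kendall's rank correlation, also shown here, is introduced just below); the tract-level variables are simulated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulated tract data: percent minority and a right-skewed emissions measure
pct_minority = rng.uniform(0, 100, size=60)
emissions = np.exp(0.02 * pct_minority + rng.normal(0, 0.8, size=60))

r, p_r = stats.pearsonr(pct_minority, emissions)        # linear (Pearson) correlation
rho, p_rho = stats.spearmanr(pct_minority, emissions)   # rank-order (Spearman) correlation
tau, p_tau = stats.kendalltau(pct_minority, emissions)  # Kendall's rank correlation

print(f"Pearson r    = {r:.3f} (r^2 = {r * r:.3f}), p = {p_r:.4f}")
print(f"Spearman rho = {rho:.3f}, p = {p_rho:.4f}")
print(f"Kendall tau  = {tau:.3f}, p = {p_tau:.4f}")
```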
Another nonparametric test statistic is Kendall's rank correlation coefficient, which is based on the concordance and discordance of two variables. Values of paired observations either vary together (in concord) or differently (in discord); that is, the pairs are concordant if $(X_i - X_j)(Y_i - Y_j) > 0$ and discordant if $(X_i - X_j)(Y_i - Y_j) < 0$. The sum K is then the difference between the number of concordant pairs and the number of discordant pairs, and Kendall's rank correlation coefficient represents the average agreement between the X and Y values.

The classical linear regression (CLR) model represents the dependent (response) variable as a linear function of independent (predictor or explanatory) variables and a disturbance (error) term. It has five basic assumptions (Kennedy 1992): linearity, zero expected value of the disturbance, homogeneity and independence of the disturbance, nonstochasticity of the independent variables, and an adequate number of observations relative to the number of variables with no multicollinearity (Table 7.3). If these five assumptions are met, the ordinary least squares (OLS) estimator of the CLR model is the best linear unbiased estimator (BLUE), a result often referred to as the Gauss-Markov Theorem. If we assume additionally that the disturbance term is normally distributed, then the OLS estimator is the best unbiased estimator among all unbiased estimators. Typically, normality is assumed for making inferences about the statistical significance of coefficient estimates on the basis of the t-test. However, the normality assumption can be relaxed, for example, to asymptotic normality, which can be reached in large samples. If there is serious doubt about the normality assumption for a data set, we can use three asymptotically equivalent tests: the likelihood ratio (LR) test, the Wald (W) test, and the Lagrange multiplier (LM) test. Inferential procedures are also quite robust to moderate violations of linearity and homogeneity. However, serious violations of the five basic assumptions may render the estimated model unreliable and useless.

Violations of the five assumptions can easily happen in the real world and take different forms (Tables 7.3 and 7.4). As shown in Chapter 4, the relationship between proximity to a noxious facility and the facility's impacts can be nonlinear, with the nearby area bearing the greatest brunt of the risks. Spatial association among observations, which will be discussed in detail later, can jeopardize the independence assumption. Indeed, in the environmental justice literature, we see frequent violations of some assumptions.

One common violation is misspecification, which includes omitting relevant variables, including irrelevant variables, specifying a linear relationship when it is nonlinear, and assuming a constant parameter when, in fact, it changes during the study period (Kennedy 1992). As a result of omitting relevant independent variables, the OLS coefficient estimates for the included variables are biased and any inference about the coefficients is inaccurate, unless the omitted variables are unrelated to the included variables.
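To make the omitted-variable problem concrete, here is a small simulation sketch in Python (the tract-level data are simulated and the variable names hypothetical; statsmodels is assumed to be available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
# Two correlated predictors (as race and income measures often are) and a risk outcome
pct_minority = rng.uniform(0, 100, n)
median_income = 60_000 - 300 * pct_minority + rng.normal(0, 8_000, n)
risk = 1.0 + 0.04 * pct_minority - 0.0001 * median_income + rng.normal(0, 2.0, n)

both = sm.OLS(risk, sm.add_constant(np.column_stack([pct_minority, median_income]))).fit()
omitted = sm.OLS(risk, sm.add_constant(pct_minority)).fit()   # income wrongly left out

print(both.params)     # both slopes close to the true values (0.04 and -0.0001)
print(omitted.params)  # the minority slope absorbs part of the omitted income effect
```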
In particular, the estimated variance of the error term [...]

TABLE 7.3 Assumptions Underlying the Classical Linear Regression Model

  Linearity. Definition: the dependent variable is a linear function of the independent variables. Examples of violations: wrong independent variables, nonlinearity, changing parameters.
  Zero expected disturbance. Definition: the expected value of the disturbance term is zero.
  Homogeneity and independence. Definition: the disturbance terms have the same variance and are independent of each other. Examples of violations: heteroskedasticity, autocorrelated errors.
  Nonstochasticity. Definition: the observations on the independent variables can be considered fixed in repeated samples. Examples of violations: errors in variables, autoregressions, simultaneous equation estimation.
  No multicollinearity. Definition: there is no exact linear relationship between the independent variables, and the number of observations should be larger than the number of independent variables. Example of violation: multicollinearity.

Source: Kennedy, P., A Guide to Econometrics, 3rd ed., MIT Press, 1992.

TABLE 7.4 Consequences, Diagnostics, and Remedies of Violating CLR Assumptions

  Omission of a relevant independent variable. Consequences: biased estimates for parameters,(1) inaccurate inference. Diagnostics: RESET, F and t tests, Hausman test. Remedies: theories, testing down from a general to a more specific model.
  Inclusion of an irrelevant variable. Consequences: the OLS estimator is not as efficient. Diagnostics: F and t tests. Remedies: theories, testing down from a general to a more specific model.
  Nonlinearity. Consequences: biased estimates of parameters, inaccurate inference. Diagnostics: RESET; recursive residuals; general functional forms such as the Box-Cox transformation; non-nested tests such as the non-nested F test; structural change tests such as the Chow test. Remedies: transformations such as the Box-Cox transformation.
  Inconstant parameters. Consequences: biased parameter estimates. Diagnostics: the Chow test. Remedies: separate models, maximum likelihood estimation.
  Heteroskedasticity. Consequences: biased estimates of variance, unreliable inference about parameters. Diagnostics: visual inspection of residuals, Goldfeld-Quandt test, Breusch-Pagan test, White test. Remedies: generalized least squares estimator, data transformation.
  Autocorrelated errors. Consequences: biased estimates of variance, unreliable inference about parameters. Diagnostics: visual inspection of residuals, Durbin-Watson test, Moran's I, Geary's c. Remedies: generalized least squares estimator.
  Measurement errors in independent variables. Consequences: biased even asymptotically. Diagnostics: Hausman test. Remedies: weighted regression, instrumental variables.
  Simultaneous equations. Consequences: biased even asymptotically. Diagnostics: Hausman test. Remedies: two-stage least squares, three-stage least squares, maximum likelihood, instrumental variables.
  Multicollinearity. Consequences: increased variance and unreliable inference about parameters, specification errors. Diagnostics: correlation coefficient matrix, variance inflation factors, condition index. Remedies: construct a principal component composite index of collinear variables, simultaneous equations.

(1) This is true unless the omitted variables are unrelated to the included variables.
Source: Kennedy, P., A Guide to Econometrics, 3rd ed., MIT Press, 1992.
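As an illustration of how a few of the diagnostics listed in Table 7.4 can be run in practice, here is a minimal sketch using statsmodels on simulated tract-level data (all variable names and values are hypothetical):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
n = 300
pct_minority = rng.uniform(0, 100, n)
median_income = 60_000 - 300 * pct_minority + rng.normal(0, 8_000, n)
X = sm.add_constant(np.column_stack([pct_minority, median_income]))
risk = X @ np.array([2.0, 0.03, -0.00005]) + rng.normal(0, 1.0, n)

res = sm.OLS(risk, X).fit()

# Heteroskedasticity: Breusch-Pagan test (a small p-value suggests non-constant variance)
print("Breusch-Pagan p-value:", het_breuschpagan(res.resid, X)[1])

# Autocorrelated errors: Durbin-Watson statistic (values near 2 suggest no autocorrelation)
print("Durbin-Watson:", durbin_watson(res.resid))

# Multicollinearity: variance inflation factors for the two predictors
print("VIFs:", [variance_inflation_factor(X, i) for i in range(1, X.shape[1])])
```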
[...] nominal dependent variable, logit and probit models are more appropriate. In the environmental justice literature, multivariate analyses have employed a variety of methods, including linear regression (Pollock and Vittes 1995; Brooks and Sethi 1997; Jerrett et al. 1997), the very popular logit models (Anderton et al. 1994; Anderton, Oakes, and Egan 1997; Been 1995; Boer et al. 1997; Sadd et al. 1999a), probit [...]

[...] distribution of an environmental impact, such as environmental risks from Superfund sites, and the distribution of disadvantaged subpopulations such as minorities and the poor. To this end, the analyst first estimates the two distributions and then identifies their associations. Chapters 4 and 5 presented the methods for measuring and modeling environmental impact and population distributions [...]

[...] autocorrelation problems (Martin 1974). When autocorrelation occurs, the estimated generalized least squares (EGLS) method is often used to estimate a regression model. Another approach is to filter out the spatial autocorrelation using the Getis-Ord statistics and then use OLS (Getis 1999).

7.6 APPLICATIONS OF STATISTICAL METHODS IN ENVIRONMENTAL JUSTICE STUDIES

In environmental justice analyses, the analyst [...] 1997), Tobit model (Hird 1993; Sadd et al. 1999a), discriminant analysis (Cutter, Holm, and Clark 1996), and others. Greenberg (1993) illustrates that different statistics can lead to different findings about equity (see Table 7.5). Different statistics have different assumptions, advantages, and disadvantages. Table 7.5 shows a comparison of three measures: proportion, arithmetic mean, and population-weighted [...]

[...] exposure and response. This approach was implemented using Markov-Chain Monte Carlo methods. The proposed methodology is particularly appealing because of its ability to account for uncertainty, which is prevalent in environmental justice issues. Most studies in the environmental justice literature report few diagnostics of the assumptions underlying their statistical methods, with a few exceptions. Jerrett and [...]

[...] facility, and factual conformity (Lau 1986). Multicollinearity, measurement errors, and spatial autocorrelation deserve greater attention in environmental justice analysis. It is well known that race and income are highly correlated. If these two variables are put in the same regression, multicollinearity may result. Although the OLS estimator remains BLUE, the variances of the OLS parameter [...] that race or income has no significant role in explaining environmental risk distribution. But the joint hypothesis that both race and income have zero parameters may be rejected, which means that one of them is relevant, but we do not know which.

7.4 PROBABILITY AND DISCRETE CHOICE MODELS

A large proportion of environmental justice studies have treated environmental impact as a dichotomous or trichotomous [...] with TRI facilities in adjacent tracts only, and "dirty" tracts with one or more TRI facilities (Bowen et al. 1995). The oft-cited UCC study classified 5-digit ZIP code areas into four groups according to the presence, type, and magnitude of TSDFs in residential ZIP code areas (see Chapter 4 for a discussion of the strengths and weaknesses of this proximity-based approach). These dependent variables are [...] heteroskedasticity, and outliers (Ben-Akiva and Lerman 1985). The maximum likelihood method is used to estimate logit models. The asymptotic t-test is used to test the statistical significance of the parameters (whether they are statistically significantly different from zero). Similar to the F-test in the CLR, the likelihood ratio test is used to test the joint hypothesis that all parameters are equal to zero. The goodness-of-fit [...]
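In the spirit of the logit studies cited in the excerpts above, here is a sketch of a logit model of facility presence estimated by maximum likelihood; the data are simulated and the variable names are hypothetical (statsmodels reports the asymptotic significance tests, the likelihood-ratio test, and a pseudo goodness-of-fit measure mentioned in the excerpt):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 400
# Hypothetical tract data: does the tract host a TSDF (1) or not (0)?
pct_minority = rng.uniform(0, 100, n)
median_income = 60_000 - 300 * pct_minority + rng.normal(0, 8_000, n)
X = sm.add_constant(np.column_stack([pct_minority, median_income]))
utility = -1.0 + 0.02 * pct_minority - 0.00002 * median_income
host = rng.binomial(1, 1 / (1 + np.exp(-utility)))   # simulated 0/1 outcome

logit_res = sm.Logit(host, X).fit()    # maximum likelihood estimation
print(logit_res.summary())             # asymptotic tests on each coefficient
print("LR test p-value (all slopes zero):", logit_res.llr_pvalue)
print("McFadden pseudo R-squared:", logit_res.prsquared)
```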
[...] is one form of autocorrelation, and another is temporal autocorrelation in time-series data. Autocorrelation leads to biased estimates of the parameters and of the variance-covariance matrix, and inference about the statistical significance of the parameter estimates becomes unreliable. The Durbin-Watson test is the most popular test for non-spatial autocorrelation, while Moran's I and Geary's c are the two most popular [...]
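A minimal sketch of Moran's I computed from its textbook formula, for a tiny hypothetical set of tracts and a contiguity weight matrix (real analyses typically use a dedicated spatial statistics package such as PySAL):

```python
import numpy as np

def morans_i(x, w):
    """Moran's I for values x and a spatial weight matrix w (zeros on the diagonal)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    s0 = w.sum()
    return (len(x) / s0) * (z @ w @ z) / (z @ z)

# Hypothetical 4-tract example with a symmetric contiguity (adjacency) matrix
risk = np.array([2.0, 1.8, 0.6, 0.5])          # similar values cluster among neighbors
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

print(morans_i(risk, w))   # a positive value indicates positive spatial autocorrelation
```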
