Transportation Systems Planning Methods and Applications 08



Transportation engineering and transportation planning are two sides of the same coin, aiming at the design of an efficient infrastructure and service to meet the growing needs for accessibility and mobility. Many well-designed transport systems that meet these needs are based on a solid understanding of human behavior. Since transportation systems are the backbone connecting the vital parts of a city, in-depth understanding of human nature is essential to the planning, design, and operational analysis of transportation systems. With contributions by transportation experts from around the world, Transportation Systems Planning: Methods and Applications compiles engineering data and methods for solving problems in the planning, design, construction, and operation of various transportation modes into one source. It is the first methodological transportation planning reference that illustrates analytical simulation methods that depict human behavior in a realistic way, and many of its chapters emphasize newly developed and previously unpublished simulation methods. The handbook demonstrates how urban and regional planning, geography, demography, economics, sociology, ecology, psychology, business, operations management, and engineering come together to help us plan for better futures that are human-centered.

8 Statistical and Econometric Data Analysis

Konstadinos G. Goulias
Pennsylvania State University

CONTENTS
8.1 Introduction
8.2 Models
8.3 Type of Data and Levels of Measurement
8.4 Categorical (Discrete) Data
Choice • Nonchoice • Counting Processes
8.5 Continuous Data
Other References and Journals • What Is Next?
References

8.1 Introduction

One can look at statistical analysis of data as a medium “for extracting information from observed data and dealing with uncertainty” (Rao, 1989, p. 98). Another way of saying the same thing is to consider statistics as a group of methods that are used to collect, analyze, present, and interpret data. From the myriad of methods available to us for data analysis (Snedecor and Cochran, 1980; Spanos, 1999), regression methods are one family that is comprehensive in its ability to address data issues, efficient in its ability to extract large amounts of information in a concise way, and widely available, because even spreadsheet software provides facilities to estimate simple regression models. Regression methods, particularly when one considers generalized regression models, can also be considered as the general family of models that contains analysis of variance and a variety of other methods for the analysis of experiments as special cases. Regression methods and models are also the techniques dominating econometrics, the art and science of analyzing economic data, which, judging by the leading textbooks on the subject, is nothing but the study of regression models (see Amemiya, 1985; Greene, 2000; Johnston and DiNardo, 1997; Pindyck and Rubinfeld, 1998; among many others).

In the previous chapters of this book different authors pointed out the richness and variety of data available for understanding and predicting travel behavior. In this chapter we set out to accomplish a few humble goals:

Give a short introduction on statistical and econometric methods and introduce a road map of transportation data
analysis, mentioning a few major milestones.

Provide an introduction to the next steps in data analysis methods and introduce the next three chapters.

Provide selective references to transportation planning books and articles where these methods were used successfully and where new information may be found in the future.

© 2003 CRC Press LLC

8.2 Models

Statistical methods are used in a wide variety of instances in transportation planning to help us identify, study, and solve many complex practical problems. For example, in the public involvement arena these methods enable decision makers, planners, and managers to make informed decisions, consistent with legislation, about the elements of their policies, plans, and programs. This is accomplished by collecting qualitative and quantitative data from a variety of persons and groups, using a wide variety of techniques, and combining those data to yield answers to specific policy and planning questions.

In another area, regional travel demand forecasting, regional models and statistics are developed to build large-scale simulation models of a region or even an entire state to help identify alternate urban and regional designs, economic activity locations, and new or improved infrastructure system components. After data are collected from individuals and their households, statistical models are estimated and their equations are used in a spreadsheet-like format or embedded into computer program code. Then other input data are provided to create predictions for each individual, household, or even geographical area, using these statistical models. In this way statistical models are at the heart of these simulation systems, and any errors, omissions, misrepresentations, or other approximations may be amplified and provide the wrong indications. This is the key reason we continuously look for better and more precise and accurate model-building methods. Figure 8.1 provides a pictorial representation of a linear sequential
version of this process. The feedback in the figure can be used to improve the models within a given project or to provide recommendations for improvements in a sequence of projects (lagged feedback). As expected, data analysis is needed to support actions and project development in these contexts.

Before moving into the details of data analysis, a digression to define a few terms is required. Decision makers take action based on knowledge about an issue. In the path from data to knowledge one can envision a sequence of transformations that lead to an increase in power and confidence for the decisions to be made using the data. The sequence starts with the transformation of data into information, that is, into something relevant to a specific decision problem. Then the information becomes a group of facts, when statements can be supported by the data at hand. Facts, in turn, become knowledge when they are used to complete the decision process. Finally, knowledge aids actions when there is an implementation plan. Statistical or econometric analyses and estimated models are the enabling devices (or vehicles) to move from data to knowledge. Therefore the statistical or econometric models play a very important role in this example too, because they summarize in a concise way the myriad pieces of information in a database. In a way similar to that for prediction and simulation, any errors, omissions, misrepresentations, or other approximations may be amplified by the decision process and lead to the wrong actions, which in turn may cause dramatic damage to humans and their environment. For this reason data analysis methods, together with operations research methods, have been considered of paramount importance in the decision sciences. Both data analysis and operations research are quantitative methods, and they have a long history of development. There are, however, other data collection and transformation devices and tools that have received very little systematic attention
in transportation planning (Goulias, 2001), but they are beyond the scope of this chapter.

[Figure 8.1: flow diagram with the boxes Data Collection; Model Estimation; Simulation/Prediction using Equations; and Verification, Validation, Interpretation, and Policy Definition; plus a feedback loop that is most times lagged.]

FIGURE 8.1 A sequential version of a system in which regression models are estimated and used.

8.3 Type of Data and Levels of Measurement

Classifications and taxonomies of data analysis methods abound (e.g., Judge et al., 1985; Jobson, 1991; Gelman et al., 1995), and they depend on the purpose of the reviews. Since we focus on regression methods and models, we will use one classification that is consistent with most textbooks and research in travel behavior. In typical transportation surveys information is collected using qualitative or quantitative data. Qualitative data, such as the color of your car, are not computable by arithmetic operations. The color is a label that informs us about a category, a group, a region, or any other classification in which a person or artifact falls. These are named categorical variables. On the other side of this classification we find data that are measured on the real line and take any value on it (e.g., a ratio or proportion). There are very few examples in transportation where the data can be considered (completely) continuous, because we consider either finite countable and integer quantities, such as trips, cars, sites to visit, and so forth, or variables that may be characterized by a limited range of variation (e.g., a proportion can take values between 0 and 1).

The presentation of the models available and resources to study and apply them is divided into two major groups: categorical (discrete) data and continuous data. Each of these groups contains a variety of other models, depending on the more specific nature of the variable whose variation we are trying to explain (the dependent, or to-be-explained, variable). The variation of this variable is explained by
explanatory variables and parameters that we need to estimate (the combination of which is named systematic variation) and a random variation that we cannot explain. It is also important to stress that the classification we use here is not based on the variables we use as explanatory (predictor) variables. They can be of any type, and there are ways to incorporate almost any type of explanatory variable in a regression model by converting it into some sort of numerical coding that can be handled by the software (Greene (2000) provides a discussion on this; Kennedy (1998) and Pindyck and Rubinfeld (1998) also provide a good discussion and examples).

Emphasis in this chapter is placed on cross-sectional data (data collected for individuals and households at one time point). Longitudinal data analysis methods are also starting to emerge in transportation planning, and within each section a short mention is made of this type of data analysis, emphasizing panel surveys (time series are excluded from this presentation entirely). In addition, emphasis is given to the single equation because the issues are similar when one considers each equation of a system of equations (Pendyala in Chapter of this book provides examples of multiple equations issues). Chapter 11 provides an overview of structural equations and models, which are the premier methods when one wants to consider multiple dependent variables jointly. Goulias also addresses the multiple equations issue, incorporating time and social levels, in Chapter 9, on multilevel models.

8.4 Categorical (Discrete) Data

For the sake of convenience, these can be further divided into choice models, nonchoice models, and models for counting processes. The models for counting processes can be further divided into event count and duration models. This is described in additional detail below, and key references are provided.

8.4.1 Choice

In transportation planning a typical example is mode choice, when for a trip a person
decides which mode to choose from among a finite set of modes. A typical model in discrete choice will be a model of the probability of choosing a mode as a nonlinear function of mode attributes, trip characteristics, and traveler demographics. The usual formulation takes the (indirect) utility of each mode as the function from which we depart; then, making certain assumptions about its stochastic nature, we derive the form of the choice probability. This enables us to use specialized algorithms for estimation of the parameters driving the function. Most transportation planning and modeling textbooks contain the basic theory and examples of mode choice (Ortuzar and Willumsen, 2001; Meyer and Miller, 2001). There is also a monograph dedicated to the theory, data collection, and experience with choice problems, edited by Gärling et al. (1998). Many milestones in the past literature and important developments are reviewed in these three books. However, classic references to discrete choice models are: for the logit model and initial formulations, Domencich and McFadden (1975); for the probit model and very good detail on estimation, Daganzo (1979); and for a comprehensive review with clear examples, Ben-Akiva and Lerman (1985). Many subsequent developments have improved the original algorithms in these books, and widely available software exists for model estimation (see the websites at the end of this chapter).

There are, however, many very important and more recent developments in model formulation and estimation that are expanding the scope of discrete choice models, making them far more flexible and usable than in the past. In a recent handbook, Bhat (2000a) and Koppelman and Sethi (2000) review some of these developments. Bhat in Chapter 10 of this book provides the latest review of developments and identifies many important issues that have been resolved. The analysis of repeated choices, however, has not received wide attention in discrete choice. It is
expected that with the increasing use of stated choice and preference data (Louviere et al., 2000), we may see new developments in the field.

8.4.2 Nonchoice

This area is also known as contingency table analysis and cross-classification categorical data analysis, and it contributes one of the richest groups of models, with immense flexibility and potential applications in transportation. Fienberg’s (1977) book is still one of the best presentations of the original methods. Agresti’s (1990) textbook is one of the most comprehensive surveys linking contingency table analysis methods to logit models for binary and multicategory data. Two somewhat newer expositions are Powers and Xie (1999) and Le (1998). One early application that follows Goodman’s way of looking at contingency data analysis is reported in Kitamura et al. (1990); extensive use of this method was made for the design of a microsimulator in Goulias (1991). More recently, repeated observations of the same individuals have been analyzed with contingency table methods that contain latent classes (Goulias, 1999), and a connection between latent class and discrete choice has inspired some very interesting model-building work that has become available only recently (see the discussion by Golledge and Gärling in Chapter in this book). Kitamura (2000) also reviews some of these models from a model formulation viewpoint.

8.4.3 Counting Processes

This group of models targets event counts, that is, the number of times an event occurs. In probability, this is the realization of a nonnegative integer random variable. Counts and durations seem to be the two sides of the same coin: An event may be thought of as the realization of a point process governed by some specified rate of occurrence of the event. The number of events may be characterized as the total number of such realizations over some unit of time. The dual of the event count is the interarrival time, defined as the length of the period between events (Cameron and Trivedi, 1998, p. 4). In
travel behavior there are many examples for both the count and duration regression models. In the case of duration models the study of activity episode durations (see Pendyala in Chapter 2) has received considerable attention in the past few years. The earliest examples are cited in Kim and Mannering (1997), and a review can be found in Bhat (2000b). Counts using Poisson and negative binomial models also abound for the number of trips, activities per day, number of departures, and so forth. Two of the earlier examples are Mannering (1989) and Monzon et al. (1989). Arentze and Timmermans (2000) and Ma and Goulias (1999) provide updates on count data models and more recent examples. A comprehensive review can also be found in Andersen et al. (1992).

One particular class of models emerges when the count is ordered, for example, the number of vehicles a household owns. In this case having three cars is more than having two cars, having two cars is more than having one, and so on. These models are particularly attractive because they allow use of certain estimation tricks. Greene (2000) provides an extensive discussion on ordered models. An earlier example of ordered regression is reported in Kitamura and Bunch (1990), in which repeated observations are also used. These models can also be used for attitudinal responses and judgments (Kim et al., 2001).

8.5 Continuous Data

Every introduction to regression and econometrics departs from a model with a continuously varying dependent variable. The usual treatment follows the same sequence: a discussion of the simple linear model, followed by the removal, one at a time, of a number of assumptions (very often referred to as the Gauss–Markov theorem assumptions). In this way, more and more complex and flexible models are built. Because the majority of time and effort in introductory econometrics courses and texts is dedicated to the linear regression model, and because its assumptions are consistently violated by transportation data, the
references about this model are limited here to a few key texts and emphasize transportation applications of the limited dependent variable variety. For linear regression models, Greene’s textbook is one of the best and most comprehensive references. The textbook also contains a very nice section on nonlinear regression models (Greene, 2000, Chap. 10, pp. 416–453). When the dependent variable is limited (e.g., cannot take values below or above a given value), special attention needs to be paid in computing its mean, and also in estimating the regression coefficients. Again, the standard textbook is Greene (2000), but a very good reference is also Maddala (1983). The typical example of a limited dependent variable model is the Tobit model (see Monzon et al., 1989). There are also simpler methods, as illustrated in the practical application in Goulias and Kitamura (1993).

8.5.1 Other References and Journals

Consistently through the past 20 years transportation researchers have utilized many of the new regression methods almost immediately after they were developed, and very often transportation problems have offered motivation for statisticians and econometricians to develop new methods. A notable example is D. McFadden, who won the Nobel Prize in 2000. The methods of the other person who won the Nobel Prize for econometric contributions in 2000, J. Heckman, are also used very often in transportation data analysis. Transportation journals and conference proceedings always contain papers and chapters that will either provide a review of new methods or apply a new method to a transportation problem. When seeking these new developments, one should examine the following:

Transportation Research Record: a journal of the Transportation Research Board

Transportation Research: a Pergamon international journal that is divided into parts, each dedicated to a specific focus

Transportation: a Kluwer international journal

The proceedings of the conferences mentioned in Chapter of this book are also very
good sources. There are also many websites with extensive treatment of statistical and econometric models. The two sites with the best and most up-to-date links for statistical and econometric software are:

http://www.feweb.vu.nl/econometriclinks/software.html

http://www.fas.harvard.edu/~stats/survey-soft/survey-soft.html

8.5.2 What Is Next?

Pendyala in Chapter provided a state-of-the-art presentation of more sophisticated and informative models in travel behavior with the stochastic frontier models, mixtures of discrete and continuous dependent variable models, and the duration models. In discrete choice, Bhat, in Chapter 10, discusses other directions focusing on microeconometric data. Goulias, in Chapter 9, illustrates extensions of the linear regression that incorporate multiple hierarchies in the data, multiple equations, and multiple ways to incorporate randomness. Golob’s review (Chapter 11) also provides another set of directions, along which we will see new advances. Finally, another direction of data analysis that we are starting to see develop is in the nonparametric data analysis methods, such as the example in Kharoufeh and Goulias (2002).

References

Agresti, A., Categorical Data Analysis, Wiley, New York, 1990.
Amemiya, T., Advanced Econometrics, Harvard University Press, Cambridge, MA, 1985.
Andersen, P.K. et al., Statistical Models Based on Counting Processes, Springer-Verlag, New York, 1992.
Arentze, T. and Timmermans, H., Albatross: A Learning Based Transportation Oriented Simulation System, European Institute of Retailing and Service Studies, Technical University of Eindhoven, Netherlands, 2000.
Ben-Akiva, M. and Lerman, S.R., Discrete Choice Analysis, MIT Press, Cambridge, MA, 1985.
Bhat, C.R., Flexible model structures for discrete choice analysis, in Handbook of Transport Modelling, Hensher, D.A. and Button, K.J., Eds., Pergamon, Amsterdam, 2000a, pp. 71–89.
Bhat, C.R., Duration modeling, in Handbook of Transport Modelling, Hensher, D.A. and
Button, K.J., Eds., Pergamon, Amsterdam, 2000b, pp. 91–110.
Cameron, A.C. and Trivedi, P.K., Regression Analysis of Count Data, Cambridge University Press, U.K., 1998.
Daganzo, C., Multinomial Probit: The Theory and Its Application to Demand Forecasting, Academic Press, New York, 1979.
Domencich, T. and McFadden, D., Urban Travel Demand: A Behavioral Analysis, Elsevier/North Holland, Amsterdam, 1975.
Fienberg, S.E., The Analysis of Cross-Classified Categorical Data, MIT Press, Cambridge, MA, 1977.
Gärling, T., Laitila, T., and Westin, K., Theoretical Foundations of Travel Choice Modeling, Elsevier, Amsterdam, 1998.
Gelman, A. et al., Bayesian Data Analysis, Chapman & Hall/CRC Press, Boca Raton, FL, 1995.
Goulias, K.G., Long-Term Forecasting with Dynamic Microsimulation, unpublished Ph.D. dissertation, University of California, Davis, 1991.
Goulias, K.G., Longitudinal analysis of activity and travel pattern dynamics using generalized mixed Markov latent class models, Transp. Res. B, 33, 535–557, 1999.
Goulias, K.G., On the role of qualitative methods in travel surveys, workshop report on qualitative methods Q-5, International Conference in Transport Survey Quality and Innovation, Kruger National Park, South Africa, August 5–10, CD-ROM, 2001.
Goulias, K.G. and Kitamura, R., Analysis of binary choice frequencies with limit cases: Comparison of alternative estimation methods and application to weekly household mode choice, Transp. Res. B Methodol., 27, 65–78, 1993.
Greene, W.H., Econometric Analysis, 4th ed., Prentice Hall, Upper Saddle River, NJ, 2000.
Jobson, J.D., Applied Multivariate Analysis, Vols. 1 and 2, Springer, New York, 1991.
Johnston, J. and DiNardo, J., Econometric Methods, 4th ed., McGraw-Hill, New York, 1997.
Judge, G.G. et al., The Theory and Practice of Econometrics, 2nd ed., Wiley, New York, 1985.
Kennedy, P., A Guide to Econometrics, 4th ed., MIT Press, Cambridge, MA, 1998.
Kharoufeh, J.P. and Goulias, K.G., Nonparametric identification of daily activity durations using kernel
density estimators, Transp. Res. B Methodol., 36, 59–82, 2002.
Kim, T., Koza, S.A., and Goulias, K.G., Analysis of the resident component in PennPlan’s public involvement survey: Survey overview and item nonresponse selectivity issues, paper preprint 01-2772, Transp. Res. Rec., 1780, 145–154, 2001.
Kim, S. and Mannering, F., Panel data and activity duration models: Econometric alternatives and applications, in Panels for Transportation Planning: Methods and Applications, Golob, T., Kitamura, R., and Long, L., Eds., Kluwer, Boston, 1997, pp. 349–373.
Kitamura, R., Longitudinal methods, in Handbook of Transport Modelling, Hensher, D.A. and Button, K.J., Eds., Pergamon, Amsterdam, 2000, pp. 113–128.
Kitamura, R. and Bunch, D.S., Heterogeneity and state dependence in household car-ownership: A panel analysis using ordered-response Probit models with error components, in Transportation and Traffic Theory, Koshi, M., Ed., Elsevier/North Holland, Amsterdam, 1990, pp. 477–496.
Kitamura, R., Nishii, K., and Goulias, K.G., Trip chaining behavior by central city commuters: A causal analysis of time–space constraints, in Developments in Dynamic and Activity-Based Approaches to Travel Analysis, Jones, P., Ed., Avebury, Aldershot, U.K., 1990, pp. 145–170.
Koppelman, F.S. and Sethi, V., Closed-form discrete-choice models, in Handbook of Transport Modelling, Hensher, D.A. and Button, K.J., Eds., Pergamon, Amsterdam, 2000, pp. 211–225.
Le, C.T., Applied Categorical Data Analysis, Wiley, New York, 1998.
Louviere, J.J., Hensher, D.A., and Swait, J.D., Stated Choice Methods: Analysis and Applications, Cambridge University Press, Cambridge, U.K., 2000.
Ma, J. and Goulias, K.G., Application of Poisson regression models to activity frequency analysis and prediction, Transp. Res. Rec., 1676, 86–94, 1999.
Maddala, G.S., Limited Dependent and Qualitative Variables in Econometrics, Cambridge University Press, U.K., 1983.
Mannering, F., Poisson analysis of commuter flexibility in changing route and
departure times, Transp. Res. B, 23, 53–60, 1989.
Meyer, M.D. and Miller, E.J., Urban Transportation Planning, 2nd ed., McGraw-Hill, Boston, 2001.
Monzon, J., Goulias, K.G., and Kitamura, R., Trip generation models for infrequent trips, Transp. Res. Rec., 1220, 40–46, 1989.
Ortuzar, J. de D. and Willumsen, L.G., Modelling Transport, 3rd ed., Wiley, Chichester, U.K., 2001.
Pindyck, R.S. and Rubinfeld, D.L., Econometric Models and Economic Forecasts, 4th ed., McGraw-Hill, Boston, 1998.
Powers, D.A. and Xie, Y., Statistical Methods for Categorical Data Analysis, Academic Press, New York, 1999.
Rao, C.R., Statistics and Truth: Putting Chance to Work, International Co-Operative Publishing House, Fairland, MD, 1989.
Snedecor, G.W. and Cochran, W.G., Statistical Methods, 7th ed., Iowa State University Press, Ames, 1980.
Spanos, A., Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, Cambridge University Press, U.K., 1999.
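
As a supplementary illustration of the choice models outlined in Section 8.4.1, the following minimal Python sketch computes the probability of choosing each mode as a nonlinear function of the systematic utilities. It assumes the standard multinomial logit form with utilities that are linear in travel time and cost. The function name, the coefficient values, the alternative-specific constants, and the trip attributes are all hypothetical; none of them is taken from this chapter or from the cited studies.

```python
import math


def logit_probabilities(utilities):
    """Return multinomial logit choice probabilities.

    `utilities` maps each alternative to its systematic utility.
    """
    # Subtracting the largest utility avoids overflow in exp() and
    # leaves the resulting probabilities unchanged.
    u_max = max(utilities.values())
    exp_u = {alt: math.exp(u - u_max) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: value / total for alt, value in exp_u.items()}


# Hypothetical coefficients (assumptions, not estimates from any study).
BETA_TIME = -0.05  # utility per minute of travel time
BETA_COST = -0.40  # utility per dollar of travel cost
ASC = {"car": 0.0, "bus": -0.5, "walk": -1.0}  # alternative-specific constants

# Hypothetical trip attributes: (travel time in minutes, cost in dollars).
trip = {"car": (20.0, 3.0), "bus": (35.0, 1.5), "walk": (60.0, 0.0)}

# Systematic utility of each mode: constant plus weighted attributes.
utilities = {
    mode: ASC[mode] + BETA_TIME * minutes + BETA_COST * dollars
    for mode, (minutes, dollars) in trip.items()
}

probabilities = logit_probabilities(utilities)
for mode, p in probabilities.items():
    print(f"{mode}: {p:.3f}")
```

In an applied study the coefficients would be estimated from survey data, for example by maximum likelihood, rather than fixed by hand as they are here.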

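The duality between counts and durations described in Section 8.4.3 can be sketched in the same way. The example below assumes a Poisson count model whose expected count is the exponential of a linear function of household covariates, a common specification for count regression. The function names, the coefficients, and the covariates are hypothetical and serve only as an illustration.

```python
import math


def poisson_rate(intercept, coefficients, covariates):
    """Expected count, assuming a log-linear (exponential) mean function."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return math.exp(linear)


def poisson_pmf(k, rate):
    """Probability of exactly k events when the expected count is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def prob_no_event_before(t, rate):
    """Duration view: probability that the first event occurs after time t,
    assuming a constant event rate (exponential interarrival times)."""
    return math.exp(-rate * t)


# Hypothetical coefficients (assumptions for illustration only).
INTERCEPT = 0.2
COEFFICIENTS = [0.15, 0.25]  # household size, number of cars
household = [3, 2]           # three persons, two cars

rate = poisson_rate(INTERCEPT, COEFFICIENTS, household)
print(f"expected trips per day: {rate:.2f}")
for k in range(5):
    print(f"P(trips = {k}) = {poisson_pmf(k, rate):.3f}")
```

Under these assumptions the probability of a zero count over one unit of time equals the probability that the first interarrival time exceeds that unit, which is the count and duration duality noted by Cameron and Trivedi (1998).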
Date posted: 05/05/2018, 09:29

Table of contents

    TRANSPORTATION SYSTEMS PLANNING: Methods and Applications

    PART II: Data Collection and Analysis

    Chapter 8: Statistical and Econometric Data Analysis

    8.3 Type of Data and Levels of Measurement

    8.5.1 Other References and Journals
