Research Methodology for Social Sciences, edited by Rajat Acharyya and Nandan Bhattacharya (Routledge, 2020)

RESEARCH METHODOLOGY FOR SOCIAL SCIENCES

Research Methodology for Social Sciences provides guidelines for designing and conducting evidence-based research in social sciences and interdisciplinary studies using both qualitative and quantitative data. Blending the particularity of different sub-disciplines and interdisciplinary nature of social sciences, this volume:

• Provides insights on epistemological issues and deliberates on debates over qualitative research methods;
• Covers different aspects of qualitative research techniques and evidence-based research techniques, including survey design, choice of sample, construction of indices, statistical inferences and data analysis;
• Discusses concepts, techniques and tools at different stages of research, beginning with the design of field surveys to collect raw data and then analyse it using statistical and econometric methods.

With illustrations, examples and a reader-friendly approach, this volume will serve as a key reference material for compulsory research methodology courses at doctoral levels across different disciplines, such as economics, sociology, women’s studies, education, anthropology, political science, international relations, philosophy, history and business management. This volume will also be indispensable for postgraduate courses dealing with quantitative techniques and data analysis.

Rajat Acharyya is Professor of Economics at Jadavpur University, Kolkata, and Director (additional charge) of UGC-Human Resource Development Centre, Jadavpur University, India. He was the former Dean, Faculty of Arts, Jadavpur University (2013–2016). Professor Acharyya received his MSc (Economics) degree from Calcutta University in 1990 and PhD (Economics) degree from Jadavpur University in 1996. He was a Ford Foundation postdoctoral fellow at Rochester University, New York, USA, during 1997–1998. Professor Acharyya has written five books and published more than 60 articles in journals and edited volumes. His recent books
include International Trade and Economic Development (co-authored with Saibal Kar, 2014) and International Economics: Theory and Policy (2013). He was awarded the EXIM Bank International Trade Research Award in 1997, Global Development Network (Washington D.C.) Research Medal in 2003, the Mahalanobis Memorial Medal in 2006 and Shikhsaratna (Best University Teacher) Award by the Government of West Bengal in 2016.

Nandan Bhattacharya is Assistant Director of the UGC-Human Resource Development Centre, Jadavpur University, India. Dr Bhattacharya received his MSc (Zoology) degree in 1992 and PhD (Zoology) degree in 2004 from Vidyasagar University, India. He has published several articles in different reputed journals and delivered lectures at different colleges and institutes of higher learning within and outside West Bengal. He has coordinated and designed course curriculums for orientation programmes, workshops and short-term courses specially conducted for college and university teachers/librarians under the UGC Guidelines. His areas of research interest include ecology, education and communication skill development, amongst many others.

Contemporary Issues in Social Science Research
Series editors: Rajat Acharyya and Nandan Bhattacharya, UGC-Human Resource Development Centre, Jadavpur University, India

Contemporary Issues in Social Science Research is a series dedicated to the advancement of academic research and practice on emerging 21st-century social and cultural themes. It explores fresh perspectives on a legion of interdisciplinary social science themes connecting subject areas that have hitherto been unexplored, underdeveloped or overlooked. This series aims to provide scholars, researchers and students a ready reference for the new and developing in social science academia which has come into the fore as focal points of debate and discussion today.

Research Methodology for Social Sciences
Edited by Rajat Acharyya and Nandan Bhattacharya

Peace and Conflict Studies Theory
and Practice
Edited by Shibashis Chatterjee and Anindya Jyoti Majumdar

For more information about this series, please visit www.routledge.com/Contemporary-Issues-in-Social-Science-Research/book-series/CISSC

RESEARCH METHODOLOGY FOR SOCIAL SCIENCES

Edited by Rajat Acharyya and Nandan Bhattacharya

First published 2020
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2020 selection and editorial matter, Human Resource Development Centre, Jadavpur University; individual chapters, the contributors

The right of Rajat Acharyya and Nandan Bhattacharya to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record has been requested for this book

ISBN: 978-1-138-39051-5 (hbk)
ISBN: 978-0-367-40984-5 (pbk)
ISBN: 978-0-367-81034-4 (ebk)

Typeset in Bembo by Apex CoVantage, LLC

CONTENTS

List of figures
List of tables
List of contributors
Foreword
Editors’ note

Introduction
  Rajat Acharyya

PART I
Epistemological issues

1 Methodological or epistemological issues in social research
  Achin Chakraborty

PART II
Debates in research methods

2 Towards a pragmatic centre: debates on qualitative methodology
  Samita Sen
3 Ethnographic fieldwork: the predicaments and possibilities
  Amites Mukhopadhyay
4 Diversity in economics: an examination and defence of heterodox approach
  Soumik Sarkar and Anjan Chakrabarti

PART III
Methods of conflict analysis and policy evaluation

5 Game theory: strategy design in conflict situations
  Swapnendu Banerjee
6 Impact evaluation: a simple need and a difficult choice of methodology
  Arijita Dutta
7 Construction of different types of indices in social science research: some numerical examples
  Sushil Kr Haldar

PART IV
Quantitative research methods and predictive analysis

8 Designing a primary survey-based research
  Tanmoyee Banerjee (Chatterjee)
9 Sampling methods: a survey
  Malabika Roy
10 An introduction to statistical inference
  Sugata Sen Roy
11 Problems of endogeneity in social science research
  Arpita Ghose
12 Quantitative methods for qualitative variables in social science: an introduction
  Ajitava Raychaudhuri

FIGURES

5.1 Payoffs in battle of sexes game
5.2 Payoffs in prisoners’ dilemma game
5.3 Payoffs in matching pennies game
5.4 Payoffs in stag hunt game
5.5 Dynamic battle of sexes game
6.1 Impact evaluation: a graphical presentation
6.2 Before-after comparison
7.1 Lorenz curve
7.2 Concentration curve
8.1 Box plot diagram
9.1 Different methods of sampling
9.2 Snowball sampling
11.1 Scatter diagram of quantity and price
11.2 Identification of demand function

TABLES

4.1 Snapshot view of difference in economic theories
7.1 Goalposts for the GDI
7.2 Gender-specific human development indicators
7.3 Income (in Rs.) earned by members per day of different social groups
7.4 Poverty amongst different social groups
7.5 Dimensions and indicators of multidimensional poverty at the household level
7.6 Multidimensional poverty at the household level
7.7 SAHS data by income
7.8 Computations for estimation of CI
7.9 Estimation of CI for grouped data
7.10 Parameters of human poverty across states, 2015–16
7.11 HPI across sixteen major states using Anand and Sen (1997) methodology
7.12 HPI (weighted and un-weighted) and the ranks of states
A.7.1 SPSS output of PCA
A.7.2 Factor loadings
9.1 Example of probability sampling
9.2 Example of SRSR
9.3 All possible samples
9.4 All possible samples with probability and mean
9.5 Samples under systematic sampling
9.6 Systematic sampling with non-integer N/n
A.9.1 Probability distribution of sample mean
A.9.2 Stratification
10.1 Some standard distributions and their parameter estimators
12.1 Elasticities of change in lowest and highest probabilities (the marginal effects)

CONTRIBUTORS

Swapnendu Banerjee is Professor of Economics, Jadavpur University, Kolkata, India. His areas of interest include microeconomic theory, game theory and economics of contracts. He completed his PhD from Jadavpur University and subsequently did his postdoctoral research from the National University of Singapore (2004–2005) and London School of Economics (2016). He has published extensively in reputed international journals and given presentations and invited lectures at places like Cornell, LSE, Birmingham, Nottingham, National University of Singapore, Indian Statistical Institute, Delhi School of Economics, IIM Kolkata, IIM Bangalore and Indira Gandhi Institute of Development Research, amongst others.

Tanmoyee Banerjee (Chatterjee) is Professor of Economics, Jadavpur University, Kolkata, India. Her areas of interest are industrial organization, microeconomic theory, micro econometrics, microfinance,
financial inclusion and gender analysis and economic growth. She has undertaken various empirical research projects funded by national agencies of India, such as ICSSR and UGC. She has published extensively in reputed international journals.

Anjan Chakrabarti completed his MSc in Economics from University of Calcutta and his PhD in Economics from University of California, Riverside. He is currently Professor of Economics, University of Calcutta, India. His interests span Marxian theory, development economics, Indian economics, financial economics, history of economic ideas and political philosophy. He has to his credit eight books and has published over 60 articles in peer-reviewed academic journals, edited books and handbooks. His latest co-authored book is The Indian Economy in Transition: Globalization, Capitalism and Development. He has published in journals such as Cambridge Journal of Economics, Rethinking Marxism, Economic and Political Weekly, Journal of Asset Management, Collegium Anthropologicum and Critical Sociology. He is presently

248 Arpita Ghose

From the reduced-form model one can estimate σ11, σ22 and σ12. Now consider

$$\frac{\sigma_{11}}{\sigma_{22}} = \frac{\sigma_2^2 + \sigma_1^2}{\beta_1^2\sigma_2^2 + \alpha_1^2\sigma_1^2} = \frac{1 + \sigma_1^2/\sigma_2^2}{\beta_1^2 + \alpha_1^2\,\sigma_1^2/\sigma_2^2} = \frac{1+\lambda}{\beta_1^2 + \alpha_1^2\lambda}, \qquad \text{if } \lambda = \frac{\sigma_1^2}{\sigma_2^2} \tag{A.9}$$

and

$$\frac{\sigma_{12}}{\sigma_{11}} = \frac{\beta_1\sigma_2^2 + \alpha_1\sigma_1^2}{\sigma_2^2 + \sigma_1^2} = \frac{\beta_1 + \alpha_1\,\sigma_1^2/\sigma_2^2}{1 + \sigma_1^2/\sigma_2^2} = \frac{\beta_1 + \alpha_1\lambda}{1+\lambda} \tag{A.10}$$

Since β1 has already been estimated without the variance-covariance restriction, (A.9) and (A.10) can be used to solve for α1 and λ; α1 is the slope of price in the supply equation. Next consider the expression

$$\pi_{20} = \frac{\alpha_1(\alpha_0 - \beta_0)}{\beta_1 - \alpha_1} + \alpha_0 \tag{A.11}$$

The value of π20 is obtained from the estimated reduced form, and the parameters α1, β0 and β1 have been estimated. Hence the intercept term of the supply equation, α0, can be obtained using (A.11). Finally consider

$$\pi_{22} = \frac{\alpha_1\alpha_2}{\beta_1 - \alpha_1} + \alpha_2 = \frac{\alpha_2\beta_1}{\beta_1 - \alpha_1} \tag{A.12}$$

The value of π22 is obtained from the estimated reduced form, and the parameters β1 and α1 have been estimated.
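The back-solving described in (A.9) to (A.12) can be sketched numerically. The Python below is only an illustrative sketch, not the chapter's own procedure. It assumes a demand equation q = β0 + β1 p + u1 and a supply equation q = α0 + α1 p + α2 R + u2 with uncorrelated errors, and it takes σ11, σ22 and σ12 to be the reduced-form moments on the left-hand sides of (A.9) and (A.10). The function names and all numerical values are hypothetical.

```python
def reduced_form_moments(beta0, beta1, alpha0, alpha1, alpha2, var1, var2):
    """Reduced-form quantities implied by hypothetical structural values.

    Assumes demand q = beta0 + beta1*p + u1 and supply
    q = alpha0 + alpha1*p + alpha2*R + u2, with Var(u1) = var1,
    Var(u2) = var2 and Cov(u1, u2) = 0.
    """
    d = beta1 - alpha1
    s11 = (var1 + var2) / d**2                          # numerator of (A.9)
    s22 = (beta1**2 * var2 + alpha1**2 * var1) / d**2   # denominator of (A.9)
    s12 = (beta1 * var2 + alpha1 * var1) / d**2         # numerator of (A.10)
    pi20 = alpha1 * (alpha0 - beta0) / d + alpha0       # (A.11)
    pi22 = alpha2 * beta1 / d                           # (A.12)
    return s11, s22, s12, pi20, pi22


def recover_supply_parameters(s11, s22, s12, pi20, pi22, beta0, beta1):
    """Back out (alpha0, alpha1, alpha2, lambda) from reduced-form estimates,
    treating the demand parameters beta0 and beta1 as already estimated."""
    r1 = s11 / s22   # left-hand side of (A.9)
    r2 = s12 / s11   # left-hand side of (A.10)
    # Eliminating alpha1 between (A.9) and (A.10) gives 1 + lambda in closed form.
    one_plus_lam = (beta1**2 - 2 * r2 * beta1 + 1 / r1) / (1 / r1 - r2**2)
    lam = one_plus_lam - 1
    alpha1 = (r2 * one_plus_lam - beta1) / lam                    # from (A.10)
    alpha0 = (pi20 * (beta1 - alpha1) + alpha1 * beta0) / beta1   # from (A.11)
    alpha2 = pi22 * (beta1 - alpha1) / beta1                      # from (A.12)
    return alpha0, alpha1, alpha2, lam


if __name__ == "__main__":
    # Hypothetical structural values, used only to check the round trip.
    moments = reduced_form_moments(10.0, -1.0, 1.0, 2.0, 3.0, 1.0, 4.0)
    print(recover_supply_parameters(*moments, beta0=10.0, beta1=-1.0))
```

The round trip starts from made-up structural values, computes what the reduced form would deliver, and then recovers the supply parameters from those quantities alone. This is the sense in which the covariance restriction identifies the supply equation.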
Thus, α2, the coefficient of rainfall in the supply equation, can be solved using (A.12). Hence, using the covariance restriction E(u1t u2t) = σ12 = 0, all the parameters of the supply equation are solved and the supply equation is identified.

Condition of identification of the model under general homogeneous linear restriction

Before discussing the condition of identification in the presence of general homogeneous linear restrictions, let us first discuss how one can represent a homogeneous linear restriction. Consider the structural form of the model at any time point t,

$$B Y_t + \Gamma X_t = e_t, \quad\text{or}\quad (B \;\; \Gamma)\begin{pmatrix} Y_t \\ X_t \end{pmatrix} = e_t, \quad\text{or}\quad A z_t = e_t,$$

where A = (B Γ) is the matrix of structural coefficients, of order G × (G + K), and

$$z_t = \begin{pmatrix} Y_t \\ X_t \end{pmatrix}$$

is the vector of all observations on the endogenous and exogenous variables at time t, of order (G + K) × 1. Thus the system of equations can be represented as

$$\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_G \end{pmatrix} z_t = e_t, \quad\text{where}\quad e_t = \begin{pmatrix} e_{1t} \\ e_{2t} \\ \vdots \\ e_{Gt} \end{pmatrix},$$

and αi is the i-th row of A.

Example 1: Suppose Y3 does not appear in the first equation; the restriction is represented as β13 = 0. This restriction is on the first row of the A matrix and hence can be represented as

$$(\beta_{11}\;\; \beta_{12}\;\; \beta_{13}\;\; \cdots\;\; \gamma_{11}\;\; \cdots\;\; \gamma_{1K}) \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = 0, \quad\text{or}\quad \alpha_1 \phi = 0, \quad\text{where}\quad \phi = (0, 0, 1, 0, \ldots, 0)', \quad \phi : (G + K) \times 1.$$

Example 2: Suppose the coefficients of Y1 and Y2 are equal, i.e. β11 = β12, or β11 – β12 = 0. This restriction is also on the elements of the first row of the A matrix and can be represented as

$$(\beta_{11}\;\; \beta_{12}\;\; \cdots\;\; \gamma_{11}\;\; \cdots\;\; \gamma_{1K}) \begin{pmatrix} 1 \\ -1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = 0, \quad\text{or}\quad \alpha_1 \phi = 0, \quad\text{where}\quad \phi = (1, -1, 0, \ldots, 0)', \quad \phi : (G + K) \times 1.$$

Example 3: If both the restrictions, [i] β13 = 0 and [ii] β11 – β12 = 0, are satisfied, they can be represented as

$$\alpha_1 \phi = (0 \;\; 0), \quad\text{where}\quad \phi = \begin{pmatrix} 0 & 1 \\ 0 & -1 \\ 1 & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{pmatrix}, \quad \phi : (G + K) \times 2.$$

Note: the number of columns in φ is equal to the number of prevailing restrictions and the number of rows in φ is equal to the total number of variables (endogenous and exogenous/predetermined).

Order condition: This is an algebraic consistency condition; it is necessary but not sufficient. It requires R ≥ G – 1, where R is the number of restrictions on the equation. It shows that the number of equations must be at least as great as the number of variables to solve.

Rank condition: rank(Aφ) = G – 1. This condition shows that the number of independent equations must be equal to the number of variables to solve. For proof of the rank and order conditions of identification see Johnston (1984, 3rd ed.) and Judge et al. (1982).

Rule of identification using general homogeneous linear restrictions

[1] R = (G – 1), rank(Aφ) = G – 1: the equation is just identified.
[2] R > (G – 1), rank(Aφ) = G – 1: the equation is over identified.
[3] R ≥ (G – 1), rank(Aφ) < G – 1: the equation is under identified.
[4] R < (G – 1), rank(Aφ) < G – 1: the equation is under identified.

Rule [1] says that the order condition holds: since R = G – 1, the number of equations is exactly equal to the number of variables to solve. The rank condition is also satisfied, i.e. rank(Aφ) = G – 1, so all the equations are independent and the equation is just identified. Rule [2] says that since R > (G – 1) the order condition holds, i.e. the number of equations is greater than the number of variables to solve, and since the rank condition holds, i.e. rank(Aφ) = G – 1, all the equations are independent and the equation is over identified. Rule [3] is interesting. It says that even if R ≥ (G – 1), so that the order condition holds, there is not a sufficient number of independent equations to solve for the variables, because the rank condition is not satisfied, i.e. rank(Aφ) < G – 1. Hence the equation is under identified. Rule [4] says that as R < (G – 1), there are not enough equations to solve for the number of variables and hence the equation is not identified.

Example: To test the identifiability of the model

$$\beta_{11} y_{1t} + \beta_{12} y_{2t} + \gamma_{11} x_{1t} + \gamma_{12} x_{2t} = u_{1t}$$
$$\beta_{21} y_{1t} + \beta_{22} y_{2t} + \gamma_{21} x_{1t} + \gamma_{22} x_{2t} = u_{2t}$$

under the following restrictions: γ11 = 0, γ12 = 0, γ22 = 0. In matrix notation the system can be expressed as

$$B y_t + \Gamma x_t = u_t, \quad\text{where}\quad B = \begin{pmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{pmatrix}, \quad \Gamma = \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix}, \quad u_t = \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix},$$

or

$$A z_t = u_t, \quad A = (B \;\; \Gamma) = \begin{pmatrix} \beta_{11} & \beta_{12} & \gamma_{11} & \gamma_{12} \\ \beta_{21} & \beta_{22} & \gamma_{21} & \gamma_{22} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}, \quad z_t = \begin{pmatrix} y_t \\ x_t \end{pmatrix},$$

where α1 = (β11 β12 γ11 γ12) and α2 = (β21 β22 γ21 γ22).

To check the identification condition of the two equations, note that the restrictions on the first structural equation are γ11 = 0 and γ12 = 0, which can be expressed as

$$(\beta_{11}\;\; \beta_{12}\;\; \gamma_{11}\;\; \gamma_{12}) \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = (0 \;\; 0), \quad\text{or}\quad \alpha_1 \phi = 0, \quad\text{where}\quad \phi = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

The restriction on the second equation, γ22 = 0, can be expressed as

$$(\beta_{21}\;\; \beta_{22}\;\; \gamma_{21}\;\; \gamma_{22}) \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} = 0, \quad\text{or}\quad \alpha_2 \phi = 0, \quad\text{where}\quad \phi = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$

Order condition: We need to show R ≥ G – 1.
Rank condition: We need to show rank(Aφ) = G – 1.

Identification of first equation: For the first equation R = 2, G = 2 and G – 1 = 1. Hence R > G – 1 and the order condition is satisfied. For the first equation Aφ can be expressed as

$$A\phi = \begin{pmatrix} \beta_{11} & \beta_{12} & \gamma_{11} & \gamma_{12} \\ \beta_{21} & \beta_{22} & \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ \gamma_{21} & 0 \end{pmatrix} \quad \text{(under the full restrictions of the model).}$$

∴ rank(Aφ) = 1 = G – 1 (provided γ21 ≠ 0), implying the rank condition is satisfied.
∴ For the first equation R > G – 1 and rank(Aφ) = G – 1; thus the equation is over identified.

Identification of second equation: Here R = 1 and G – 1 = 1. Thus R = G – 1, and the order condition holds. For the second equation, Aφ can be expressed as

$$A\phi = \begin{pmatrix} \beta_{11} & \beta_{12} & \gamma_{11} & \gamma_{12} \\ \beta_{21} & \beta_{22} & \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} \gamma_{12} \\ \gamma_{22} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \text{(under the full restrictions of the model).}$$

Hence rank(Aφ) = 0 < G – 1, showing that the rank condition is not satisfied. Thus for the second equation R = G – 1 and rank(Aφ) < G – 1, showing the equation is under identified.

How to make the second equation identified: Introduce the additional restriction β21 + γ21 = 0 in addition to γ22 = 0. These restrictions can be represented as

$$(\beta_{21}\;\; \beta_{22}\;\; \gamma_{21}\;\; \gamma_{22}) \begin{pmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} = (0 \;\; 0), \quad\text{or}\quad \alpha_2 \phi = 0, \quad\text{where}\quad \phi = \begin{pmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

With this new restriction, R = 2 > G – 1 = 1, hence the order condition holds, and

$$A\phi = \begin{pmatrix} \beta_{11} & \beta_{12} & \gamma_{11} & \gamma_{12} \\ \beta_{21} & \beta_{22} & \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \gamma_{12} & \beta_{11} + \gamma_{11} \\ \gamma_{22} & \beta_{21} + \gamma_{21} \end{pmatrix} = \begin{pmatrix} 0 & \beta_{11} \\ 0 & 0 \end{pmatrix} \quad \text{(under the full restrictions of the model).}$$

rank(Aφ) = 1 (provided β11 ≠ 0). Thus R > G – 1 and rank(Aφ) = G – 1, showing the equation is identified. Thus, with appropriate restrictions an under identified equation can be made identified.

Notes

1 For a discussion of the OLS method and the necessary assumptions see Wooldridge (2009).
2 See Kmenta (1991) and Johnston (1984) for structural, reduced and final forms of a general simultaneous equation system consisting of G number of endogenous variables.
3 See appendix for a mathematical demonstration of identification of demand equation.

12
QUANTITATIVE METHODS FOR QUALITATIVE VARIABLES IN SOCIAL SCIENCE
An introduction

Ajitava Raychaudhuri

1 Introduction

Social science, unlike physical science, deals with subjects which in many cases are not easily quantifiable. It includes human behaviour, which itself is an integration of psychology, philosophy, rationality, social existence etc. Similarly a subject like political science deals with certain stylized facts like strategy, security, political culture etc. Another subject, history, is not only a repository of important events in the past, but it also shapes the present through its underlying explanations on the basis of materialistic and ideological conflicts, amongst others. However, many of these events or actions are qualitative in nature and do not show up
in terms of quantitative variables. In the course of this chapter, some such examples will be highlighted to establish the fact that quantification of apparently qualitative variables can throw important light on the possible trajectories these variables may take in future.

2 Types of qualitative variables

Many qualitative variables appear as binary variables, to use a technical term. In other words, the event or the aspect one wants to study does not have a quantification property except that the event can be categorized as yes or no. Examples of this abound in almost every field of social as well as behavioural sciences. One may be interested to know what factors prompt people to be smokers. Thus the variable which matters in this case is whether a person is a smoker or not, again a binary occurrence, so that a smoker may respond yes (having an arbitrary quantitative value of 1) and a non-smoker responds no (having an arbitrary quantitative value of 0). A researcher can then subject these responses to some rigorous techniques and try to answer what factors may cause an individual to be more inclined to smoke; some such factors may well be age, education, family background, health issues, social factors like neighbourhood, type of job etc. Similarly a political scientist may be interested to know voting behaviour after an election but may not be sure how to quantify the factors responsible for the voting pattern. Here also the response of a voter may be yes or no regarding a vote cast for either the incumbent candidate or the opposition. Once this is collected, the binary responses may be subjected to rigorous empirical techniques to judge the importance of age, education, income, gender, religion, caste, location, language etc. in shaping the decisions in the voters’ minds.

Sometimes the responses may not be binary. Suppose a television channel wants to elicit responses from its viewers about the attractiveness of a certain newly launched TV serial. The
problem is that the viewers in this case may not have a clear yes or no response. Instead they might be able to categorize their responses in an ordered fashion like excellent, very good, good, average, bad. The researcher then will be interested to know what prompts a viewer to go for very good instead of good or excellent or the other lower order responses. Since these are qualitative responses, it may apparently seem impossible to make a quantitative assessment of what factors may shape such ranking in the mind of the viewers. However, techniques have been developed by which researchers are able to transform such qualitative rankings into quantitative variables and then try to decipher the underlying factors which might be responsible for such categorizations.

3 Binary dependent variables

Binary dependent variables pose a major challenge for standard quantitative analysis pursued in social science disciplines like economics. In fact many qualitative variables have a binary characterization. For example, to take the example cited earlier, if one is interested to know the major likely factors which drive a person to be a smoker, then one faces this problem of binary dependent variables. In other words, one can undertake a survey where the respondents identify themselves as smokers or non-smokers. After all, this is an attribute or qualitative character of the respondent. The response is binary: if the person is a smoker she is given the number 1, and 0 otherwise. Similarly if one wants to infer about the factors which lead to some students getting grade A and others not, one needs to quantify some qualitative attributes of the students who are doing well. Here also the variable to be explained, namely getting grade A, is given the number 1 while those getting grades below A are given the number 0. Apparently this looks innocuous. But on closer scrutiny one finds several difficulties in this approach of using a binary variable as the variable to be explained. The most important lies in the fact that
the variable to be explained is no longer a continuous variable. This creates a problem since the most common technique used to find the impact of explanatory factors on any event is regression. Regression is very common as a statistical tool, but it depends on the assumption that the variable to be explained is a continuous variable which can assume all possible values in a range which is predefined by the researchers. In economics, this is better explained by regression techniques used in econometrics (which is the technique used to measure and test economic hypotheses which stem from some underlying theories).

3.1 Linear probability models (LPM)

This technique assumes a binary response can be explained by a linear relationship involving the binary response as the variable to be explained and some explanatory variables. For example, to take the example mentioned earlier, suppose the dependent variable (or the variable to be explained) Yi is the response that the student gets a grade A or not. In case the student achieves a grade A, he is assigned the value 1, and 0 otherwise. Let us assume the performance is explained by the number of hours the student studies (X). Then the LPM uses a standard linear regression model which looks like:

Yi = β1 + β2Xi + ui

This relationship tries to explain Y with the help of X, and it is called linear since the relationship is a straight line with an intercept if one ignores the term ui. Further, the coefficients are constant parameters, so that they do not change with variation in X. The subscript ‘i’ refers to the i-th individual student. The term ui (called the error term) is very common in econometrics since it suggests that however good the explanatory power of X on Y may be, there always remains some unexplained part which cannot be captured by X alone. This may be due to some variables which are omitted, or to variables which are measured incorrectly, or to any reason which is not known. The real
trouble starts with this error term. This term in statistical terminology is called a random variable, having a distinct probability attached to each value. The reason for this is the simple fact that one cannot be sure what value ui might take, given the components in it as previously mentioned. In the standard estimation procedure, ui is taken to be a normal variable, by which it is understood that it can take any value, positive or negative, in the range –∞ to +∞, which implies almost all plausible real numbers could appear in the range. Further, any value can occur with a distinct probability, and if the occurrence of values follows a normal probability distribution then identical positive and negative values of ui have an equal chance of occurrence. Thus the normal distribution of probability is called a symmetric distribution around the value 0. A negative value of the error term for an individual student really means that the number of hours of study has less explanatory power to explain his grade compared to an average student. Similarly, for a positive value of the error term it is the opposite.

This assumption of normality of the probability distribution of the error term cannot be true for a variable which is a binary response variable. The reason is somewhat obvious: the error term takes only two values, namely (1 – β1 – β2Xi) or (–β1 – β2Xi). The first follows from Yi taking the value 1 and the second from Yi taking the value 0 (since ui = Yi – β1 – β2Xi). Thus if p is the probability of Yi taking the value 1 and (1 – p) is the probability of Yi taking the value 0, then the probability distribution of ui is also confined to these two probabilities, with the values of ui being either (1 – β1 – β2Xi) or (–β1 – β2Xi). This is similar to a Bernoulli trial in statistical terminology, which produces a binomial probability distribution and not a normal one.

Further, the standard regression models can be used provided it is assumed that the variance of the error term (which
measures the variation of ui values around its mean value) is identical for all values of ui. Unfortunately, given the peculiar nature of the LPM, the variance of ui is not identical: with p denoting the probability that Yi = 1, the variance of ui is p(1 – p), and since p changes with Xi so does the variance. In addition another problem arises, namely that the estimated values of Yi may well lie outside the range between the two possible values 0 and 1. This defeats the whole purpose of having the binary response model. As a result, researchers have to resort to other methods of estimation for getting an idea about the most important determining factors for such binary responses. One such important estimation model is known as the logit model. A similar method which is also popular in this context is known as the probit model. The two models essentially differ about the underlying distribution of the probability attached to the values. The logit model uses a logistic distribution while the probit model assumes a normal distribution for the probability distribution underlying the variables.

3.2 Logit and probit model

The main advantage of the logit model is that it avoids the problems mentioned earlier in the context of the linear probability model (LPM). The first is the problem of handling binary responses which have a dichotomous nature of values, namely 0 or 1. Thus one cannot use a distribution of probability of variables which can assume any value from a continuous range between –∞ and +∞. Further, the standard regression needs to impose restrictions on the error term in order to have consistent estimates. This itself poses a problem, and logit avoids that too. In fact, it avoids one of the standard regression tools, namely ordinary least squares (OLS), which is used in LPM too. Instead of OLS it uses an iterative technique to find estimates of the relevant parameters through a method called the maximum likelihood method (ML). The latter tries to find the maximum probability of having a response 1 or 0 based on the given values of the determining variables for such a response by varying the associated coefficients of
these variables (the coefficients basically measure the degree of influence of the determining variables on a response).

In the logit model (as also in the probit model), the variable under study is not only dichotomous but actually represents some latent variable. For example, one may try to analyse the factors that determine which students score well in examinations. Thus the observed variable may be a dichotomous variable, where those getting grades over A respond yes and those below respond no. But the latent variable which really drives the response may be the ability of the students to score well, measured by the past scores of the student. Similarly, whether a person will buy a car may again be captured by a binary yes or no response, but the underlying latent variable may be some cutoff utility (or satisfaction) level. The connection between the actual variable and the latent variable may be represented as follows:

$$y_i^* = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + u_i$$

where yi* is the latent variable and is not observed, xij stands for the j-th explanatory factor and i stands for the i-th individual. The actual observed variable is represented by

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$

The difference between the logit and probit models is entirely dependent on the distribution of the error term ui. If the cumulative distribution of ui is logistic in nature, then the resulting model is called logit. On the other hand, if ui has a normal distribution, the resulting model is called probit. It really depends on the researcher which model is chosen, but generally the results do not differ much between logit and probit models. One must note that in linear probability models, the variable considered is the observed one and there is no latent variable involved.

In order to have a better idea about the technique used in the logit model, some technical details are given here. First one notes that if one observes yi = 1, then the latent variable must have a value greater than zero.
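The latent-variable mechanism described above can be illustrated with a small simulation. The Python below is a minimal sketch using only the standard library. It fixes a hypothetical value Z of the systematic part β0 + Σ βj xij, draws the error ui from either a logistic or a standard normal distribution, and records y = 1 whenever the latent variable Z + ui is positive. Because both error distributions are symmetric around zero, the simulated share of ones should be close to the corresponding cumulative distribution evaluated at Z. The function names, the value Z = 0.5 and the sample size are illustrative assumptions, not taken from the chapter.

```python
import math
import random


def logistic_cdf(z):
    """Cumulative distribution of the standard logistic (logit case)."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Cumulative distribution of the standard normal (probit case)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def draw_logistic(rng):
    """Draw one standard logistic error by inverting its CDF."""
    u = rng.random()
    while u == 0.0:          # avoid log(0) in the rare case u is exactly 0
        u = rng.random()
    return math.log(u / (1.0 - u))


def draw_normal(rng):
    """Draw one standard normal error."""
    return rng.gauss(0.0, 1.0)


def simulated_share_of_ones(z, draw_error, n=20000, seed=0):
    """Share of observations with y = 1, where y = 1 if y* = z + u > 0."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n):
        y_star = z + draw_error(rng)   # latent variable (not observed in practice)
        if y_star > 0:
            ones += 1                  # observed binary response
    return ones / n


if __name__ == "__main__":
    z = 0.5  # hypothetical value of beta0 + sum_j beta_j * x_ij
    print("logit :", simulated_share_of_ones(z, draw_logistic), logistic_cdf(z))
    print("probit:", simulated_share_of_ones(z, draw_normal), normal_cdf(z))
```

The two printed pairs differ because the logistic distribution has fatter tails than the normal. This is also why logit and probit coefficients are on different scales even when the fitted probabilities are similar.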
Following our given examples, students must have some abilities (whatever they may be) above an average cutoff (to get a good grade now), or individuals must have a desire higher than a minimum (in order to buy a car), so that the latent variable is given a positive value if these cutoff values are satisfied. To reiterate, the latent variables are mostly qualitative in nature and cannot be easily observed or quantified. In such a case, it is obvious that the following holds:

$$y_i = 1 \;\text{ implies }\; y_i^* > 0, \quad\text{or}\quad \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + u_i > 0, \quad\text{or}\quad u_i > -\Big(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\Big)$$

Therefore, the probability that yi = 1 is given by

$$\text{Prob}(y_i = 1) = \text{Prob}\Big[u_i > -\Big(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\Big)\Big] = 1 - F\Big[-\Big(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\Big)\Big]$$

where F[. . .] is the cumulative distribution of probability associated with the random error term ui. If we denote

$$Z_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}$$

then if the distribution of ui is symmetric, as in the normal distribution, we get 1 – F(–Zi) = F(Zi), and as a result the following holds:

$$P_i = F\Big(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\Big)$$

Noting that yi is nothing but the realization of a binomial process (where the outcome is either 1 or 0), and that it varies according to the values of the determining explanatory variables xij, the likelihood function is given by the following (Maddala [2002, pp.
322–323]):

$$L = \prod_{y_i = 1} P_i \prod_{y_i = 0} (1 - P_i)$$

If $u_i$ is distributed cumulatively as logistic, then $F(Z_i)$ satisfies the following:

$$F(Z_i) = \frac{e^{Z_i}}{1 + e^{Z_i}}$$

which yields the following expression for $Z_i$:

$$Z_i = \log \frac{F(Z_i)}{1 - F(Z_i)}$$

Thus the logit model follows what is known as the Weibull distribution and is represented by the following equation:

$$\log \frac{P_i}{1 - P_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}$$

This is equivalent to the expression

$$\frac{P_i}{1 - P_i} = e^{Z_i}, \quad \text{where } Z_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}$$

The ratio $P_i/(1 - P_i)$ is known as the odds for the i-th observation, given the values of its explanatory variables. Similarly, the odds ratio compares these odds, that is, the probability of getting the correct (or desirable) response, prob($Y_i = 1$), relative to the probability of getting an incorrect (or not desirable) response, prob($Y_i = 0$), at two alternative values of an explanatory variable. To make it clearer, ignoring the subscript i for the time being since the argument is the same for all i, suppose an independent variable $X_1$ takes the values c and c+k. Then the odds of Y = 1 for $X_1 = c$ are given by

$$\text{odds}_1 = \frac{P(Y = 1 \mid X_1 = c)}{1 - P(Y = 1 \mid X_1 = c)}$$

Similarly, for the other value c+k, the odds are written as

$$\text{odds}_2 = \frac{P(Y = 1 \mid X_1 = c + k)}{1 - P(Y = 1 \mid X_1 = c + k)}$$

The odds ratio (OR) then measures the change in the odds of a yes response when only the explanatory variable $X_1$ changes from c to c+k, with all other variables held fixed: OR = $\text{odds}_2/\text{odds}_1$. In the logit model the odds equal $e^{Z}$, so this ratio reduces to $e^{\beta_1 k}$, where k here denotes the size of the change in $X_1$.

3.3 Estimation of logit or probit models

As mentioned earlier, logit or probit models are estimated by the maximum likelihood (ML) method instead of the ordinary least squares method. Estimation by ML has become quite easy with the help of standard econometric software, such as STATA, Eviews and SPSS. But one important point to note is the meaning of the coefficient $\beta_j$, which measures the impact of a change in $X_{ij}$ on the index underlying prob($Y_i = 1$) in the case of probit, or on the log of the odds in the case of logit. The estimated value will
be identical in its effect on the latent variable $Y_i^*$ as well. But this creates a problem of interpretation. The reason is simple. As Wooldridge (2006, p. 585) points out, $Y_i^*$ in general has no well-defined unit of measurement. In some cases it measures the difference in psychic values of two different events or actions, or some other attribute which is qualitative in character. In such cases, it makes little sense to put too much emphasis on the estimated values of the coefficients $\beta_j$.

To overcome such ambiguities, researchers use the concept of the marginal effect of a variable, which is nothing but the partial effect of a change in an explanatory variable on the probability that $Y_i = 1$, in both logit and probit regression. The usefulness of calculating marginal effects lies in quantifying the impact of a change in one explanatory variable $X_{ij}$ on the probability of getting a success (meaning a yes response). Thus, say the $y_i$ variable denotes some employment indicator for the i-th individual surveyed, for example yes (1) meaning 'having a job' and no (0) indicating 'not having it'; then $y_i$ depends on the $X_{ij}$ variables (where $X_{ij}$ is the value of the j-th explanatory variable for the i-th individual). Now the explanatory variable can also be binary. For example, it can take the value 1 to indicate that the individual has gone through some job training programme and 0 to indicate that the individual has not taken the training. Binary explanatory variables of this kind are also known as dummy variables, and they are used to represent qualitative explanatory variables. The marginal effect is then given by the change in the cumulative probability from having the training over not having it. Suppose $X_1$ is this dummy variable representing the training. The effect can then be written as follows:

$$F(\beta_0 + \beta_1 + \beta_2 X_{12} + \beta_3 X_{13} + \cdots + \beta_k X_{1k}) - F(\beta_0 + \beta_2 X_{12} + \beta_3 X_{13} + \cdots + \beta_k X_{1k})$$

This expression clearly shows the difference in cumulative probability (see the definition stated earlier) of obtaining a job for a person with training
over not having the training. The first term is the cumulative probability for an individual with the training, given the values of all other explanatory variables such as age, education, income and past experience, and the second is the same probability for an individual with the same characteristics but without the training. In case the X variable is not a dummy variable, the marginal effect measures the change in cumulative probability from changing the value of the j-th explanatory variable by one unit. In this case the change is measured from whatever the existing value of the variable is (it need not be 0).

Thus, the slope coefficients in both logit and probit indicate the direction in which the odds ratio or the probability of success will move if the explanatory variable changes, but the exact quantitative magnitude of the effect of a change in one of the variables on the probability of success (keeping the other explanatory variables fixed) is given by the marginal effect. In terms of notation, the marginal effects are given as follows.

For logit:

$$\frac{\partial p_i}{\partial x_{ij}} = \beta_j \, p_i (1 - p_i)$$

For probit:

$$\frac{\partial p_i}{\partial x_{ij}} = f\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right) \beta_j$$

where f is the standard normal density function.

3.3.1 Example of logit regression

The following logit regression was run with STATA (ver 13) software to understand the probability of surviving cancer. The dependent variable, died, is a binary indicator that distinguishes patients who survive from those who die. The explanatory variables (sometimes also known as predictors, since these are used for future prediction) are as follows:

studytime = period available to treat the patient before his/her death;
drug = use of medicine, a dummy variable distinguishing placebo from the alternative;
age = age of the patient.

The equations estimated are the same as previously discussed. The output looks as follows (directly reproduced from STATA software):

A. Logistic regression

No. of obs = 48    LR chi2(3) = 13.67    Prob > chi2 = 0.0034
Log likelihood = -24.364293    Pseudo R2 = 0.2191

died      | Coef.        Std. err.    z       P>|z|   [95% conf. interval]
studytime | -0.0236468   0.0457671   -0.52    0.605   -0.1133487    0.0660551
drug      | -1.150009    0.5549529   -2.07    0.038   -2.237697    -0.0623212
age       |  0.0793438   0.0699391    1.13    0.257   -0.0577344    0.2164219
cons      | -1.113136    3.945369    -0.28    0.778   -8.845918     6.619646

margins, grand ey/ex (drug age studytime)

Average marginal effects    No. of obs = 48
Model VCE: OIM
Expression: pr(died), predict()
ey/ex w.r.t.: studytime drug age

          | ey/ex        Std. err. (delta-method)   z       P>|z|   [95% conf. interval]
studytime | -0.1768585   0.35401                   -0.50    0.617   -0.8707054    0.5169884
drug      | -0.9822992   0.5370982                 -1.83    0.067   -2.034992     0.0703938
age       |  1.522409    1.343293                   1.13    0.257   -1.110396     4.155214

B. Logistic regression

No. of obs = 48    LR chi2(3) = 13.67    Prob > chi2 = 0.0034
Log likelihood = -24.364293    Pseudo R2 = 0.2191

died      | Odds ratio   Std. err.    z       P>|z|   [95% conf. interval]
studytime |  0.9766306   0.0446976   -0.52    0.605    0.8928393    1.068286
drug      |  0.316634    0.1757169   -2.07    0.038    0.106704     0.939581
age       |  1.082576    0.0757144    1.13    0.257    0.9439007    1.241626
cons      |  0.328527    1.296161    -0.28    0.778    0.000144     749.6794

margins, grand ey/ex (drug age studytime)

Average marginal effects    No. of obs = 48
Model VCE: OIM
Expression: pr(died), predict()
ey/ex w.r.t.: studytime drug age

          | ey/ex        Std. err. (delta-method)   z       P>|z|   [95% conf. interval]
studytime | -0.1768585   0.35401                   -0.50    0.617   -0.8707054    0.5169884
drug      | -0.9822992   0.5370982                 -1.83    0.067   -2.034992     0.0703938
age       |  1.522409    1.343293                   1.13    0.257   -1.110396     4.155214

C. Probit regression

No. of obs = 48    LR chi2(3) = 13.87    Prob > chi2 = 0.0031
Log likelihood = -24.263142    Pseudo R2 = 0.2223

...
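The link between the coefficients in panel A and the odds ratios in panel B, and the logit marginal effect formula stated earlier, can be checked with a short calculation. This is a minimal sketch under stated assumptions: the coefficient and odds-ratio values are copied from the output above, while the patient profile (`PATIENT`) and all function names are hypothetical and serve only as an illustration.

```python
import math

# Coefficients and odds ratios as reported in panels A and B of the output above.
COEFS = {"studytime": -0.0236468, "drug": -1.150009, "age": 0.0793438, "cons": -1.113136}
REPORTED_ODDS_RATIOS = {"studytime": 0.9766306, "drug": 0.316634, "age": 1.082576, "cons": 0.328527}


def odds_ratio(beta, step=1.0):
    """Odds ratio implied by a logit coefficient for a change of `step` units in the regressor."""
    return math.exp(beta * step)


def logit_probability(coefs, values):
    """Logistic cumulative probability F(Z) with Z = cons + sum(beta_j * x_j)."""
    z = coefs["cons"] + sum(coefs[name] * x for name, x in values.items())
    return 1.0 / (1.0 + math.exp(-z))


def logit_marginal_effect(coefs, values, name):
    """Marginal effect beta_j * p * (1 - p) of regressor `name` at the given values."""
    p = logit_probability(coefs, values)
    return coefs[name] * p * (1.0 - p)


# Exponentiating each coefficient should reproduce the reported odds ratio.
for name, beta in COEFS.items():
    print(name, "exp(coef) =", round(odds_ratio(beta), 6), "reported =", REPORTED_ODDS_RATIOS[name])

# Hypothetical patient profile, assumed only for illustration (coding of drug is an assumption).
PATIENT = {"studytime": 10.0, "drug": 1.0, "age": 55.0}
print("marginal effect of age at this profile:", logit_marginal_effect(COEFS, PATIENT, "age"))
```

The sketch evaluates the marginal effect at a single profile. The `margins` output above reports ey/ex values averaged over the sample, which are elasticities, so the two sets of numbers are not expected to match.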