International Initiative for Impact Evaluation
Experimental and Quasi-Experimental Designs
Marie M Gaarder, Deputy Director, 3ie
Prague, January 14, 2010
www.3ieimpact.org

Why undertake Impact Evaluation?
• Did the program/intervention have the desired effects on beneficiary individuals/households/communities?
• Can these effects be attributed to the program/intervention?
• Did the program/intervention have unintended effects on the beneficiaries? ...on the non-beneficiaries (externalities)?
• Is the program cost-effective? What do we need to change to become more effective?

Quest: finding a valid counterfactual
• Understand the process by which program participation (treatment) is determined
• The treated observation and the counterfactual should have identical characteristics, except for benefiting from the intervention
→ The only reason for different outcomes between treatment and counterfactual is the intervention
→ Need to use experimental or quasi-experimental methods to cope with selection bias; this is what has been meant by rigorous impact evaluation

How do you get valid counterfactuals?
• Experimental
– Randomized control trials (RCTs)
• Quasi-experimental
– Propensity score matching
– Regression discontinuity
– Regressions (including instrumental variables)
• Additional tools at disposal
– Pipeline approach
– Difference in difference

Randomisation
[Diagram: units such as municipalities or individuals/households randomly allocated to Treatment (T) and Control (C)]

Randomization (RCTs)
• Randomization addresses the problem of selection bias by the random allocation of the treatment
• Randomization may not be at the same level as the unit of intervention
– Randomize across schools but measure individual learning outcomes
– Randomize across sub-districts but measure village-level outcomes
• The fewer units over which you randomize, the higher your standard errors
• But you need to randomize across a 'reasonable number' of units

Issues in Randomization
• Can randomize across the pipeline
• Is no less ethical than any other method with a control group (perhaps more ethical)
• Any intervention which is not immediately universal in coverage has an untreated population to act as a potential control group

Conducting an RCT
• Has to be an ex-ante design
• Has to be politically feasible, with confidence that program managers will maintain the integrity of the design
• Perform a power calculation to determine sample size (and therefore cost)
• Adopt a strict randomization protocol (see the sketch below)
• Maintain information on how the randomization was done, on refusals, and on 'cross-overs'
• A, B and A+B designs (factorial designs)
• Collect baseline data to:
– Test the quality of the match
– Conduct difference-in-difference analysis
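To illustrate the last two points, here is a minimal sketch, in Python, of a reproducible cluster-level randomization protocol. The school identifiers and the seed are hypothetical (not from the presentation); the point is that assignment happens at the cluster (school) level with a recorded seed, even if outcomes are later measured for individual pupils.

```python
import random

def assign_clusters(cluster_ids, seed=20100114):
    """Randomly split clusters (e.g. schools) into treatment and control.

    The seed is fixed and recorded so that the assignment can be
    reproduced and audited later.
    """
    rng = random.Random(seed)    # dedicated RNG, independent of global state
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])

# Hypothetical example: 10 schools.
schools = [f"school_{i:02d}" for i in range(10)]
treatment, control = assign_clusters(schools)
print("Treatment:", treatment)
print("Control:  ", control)
```

Keeping the seed and both lists in the evaluation records is one way to document how the randomization was done, which also helps later when accounting for refusals and cross-overs.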
When is randomization really not possible?
• The treatment has already been assigned and announced
• The program is over (retrospective)
• Universal eligibility and universal access
• Operational/political constraints

Example of RCT: PES
Testing the Effectiveness of Payments for Ecosystem Services (PES) to Enhance Conservation in Uganda
– Chimpanzees
– Carbon sequestration
• Intervention: local landowners receive financial compensation for conserving forest areas on their land and undertaking reforestation
• Evaluation design:
– Objective: measure the causal effect of the PES scheme on the rate of deforestation and socio-economic welfare
– The PES scheme will randomly select villages (i.e. clustered random sampling) among a pool of eligible villages
– 400 local landowners will participate in the program
– Control: a similar number of landowners from the control villages

Design #5: Ex-post matching (if possible, include recall questions to create an ex-post baseline)
Project participants and a comparison group are observed at the follow-up evaluation; the comparison group is matched to participants on observable characteristics (available from the survey).

Design #6: Ex-post RDD (if possible, include recall questions to create an ex-post baseline)
The comparison group is found among the units (households/individuals/districts) that were just above (or below) the cut-off point for eligibility, i.e. marginally excluded; both groups are observed at the follow-up evaluation. A sketch of this comparison-group selection follows below.
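To make Design #6 concrete, here is a minimal sketch in Python of selecting an RDD comparison group. The data, cutoff, and bandwidth are all hypothetical (not from the presentation), and a real analysis would use local regression and a carefully chosen bandwidth rather than this naive difference in means.

```python
# Hypothetical records: (unit_id, eligibility_score, outcome_at_follow_up).
records = [
    ("hh_01", 42.0, 10.1), ("hh_02", 48.5, 11.0), ("hh_03", 49.2, 10.8),
    ("hh_04", 50.3, 12.4), ("hh_05", 51.8, 12.9), ("hh_06", 58.0, 14.2),
]

CUTOFF = 50.0     # assumed eligibility threshold: score >= 50 gets the program
BANDWIDTH = 2.5   # assumed window around the cutoff

# Keep only units close to the cutoff; those just below it are the
# "marginally excluded" comparison group from the slide.
near = [r for r in records if abs(r[1] - CUTOFF) <= BANDWIDTH]
treated = [outcome for _, score, outcome in near if score >= CUTOFF]
comparison = [outcome for _, score, outcome in near if score < CUTOFF]

def mean(values):
    return sum(values) / len(values)

# Naive impact estimate at the cutoff: difference in mean outcomes.
print("Treated mean:   ", mean(treated))
print("Comparison mean:", mean(comparison))
print("Estimate:       ", mean(treated) - mean(comparison))
```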
Design #7: Before-and-after evaluation (case-study approach)
Project participants only, observed at baseline and at the follow-up evaluation; there is no comparison group.

Design #8: Post-test only of project participants
Project participants only, observed once at the end-of-project evaluation.

Selecting a quantitative IE design approach
[Diagram: the scale of a major impact indicator tracked over time for project participants and a comparison group, with measurements at baseline, midterm, end-of-project and post-project evaluations]

Exercise
• What sort of quasi-experimental design seems appropriate for your program?

International Initiative for Impact Evaluation
Thank you
Visit: www.3ieimpact.org

Annex A
• Calculating sample size

Sample size for randomized evaluations
• How large does the sample need to be to credibly detect a given effect size?
• What does credibly mean? Measuring with a certain degree of confidence the difference between participants and non-participants
• Key ingredients: the number of units (e.g. villages) randomized; the number of individuals (e.g. households) within units; information on the outcome of interest and on the expected size of the effect

Type I error
• First type of error: concluding that there is an effect when there is none
• The significance level of the test is the probability that you will falsely conclude that the program has an effect, when in fact it does not. So with a level of 5%, you can be 95% confident in the validity of your conclusion that the program had an effect
• For policy purposes, you want to be very confident in the answer you give: the level will be set fairly low. Common levels are 5% and 10%

Type II error
• Second type of error: failing to reject that the program had no effect, when in fact it does have an effect
• The power of a test is the probability that I will be able to find a significant effect in my experiment if indeed there truly is an effect

Practical steps
• Set a pre-specified significance level (5%)
• Set a range of pre-specified effect sizes (what you think the program will do). What is the smallest effect that would prompt a policy response?
• Decide on a sample size that allows you to achieve a given power; the power should not be lower than 80%. Intuitively, the larger the sample, the larger the power
• Power is a planning tool: one minus the power is the probability of being disappointed...

Sample size calculation
• Formula for sample size calculation:
n = (z_{1-α/2} + z_{1-β})² · σ² / δ²
where σ is the standard deviation of the outcome and δ is the effect size of interest. The required sample size increases with the level of power and decreases with the significance level.

Try it!
• The Panama CCT program was expected to have a nutritional impact after some years of program implementation
• The program document/logframe had predicted a decrease in stunting (measured by height-for-age) of 5 percentage points
• Assume α = 0.05 and power (1 − β) = 80%, so that A = (z_{1-α/2} + z_{1-β})² ≈ 7.85
• Assume a standard deviation of the change in height-for-age of e.g. 70 percentage points (σ = 0.7)
• Calculate the required sample size per group to detect your desired outcome (see the code sketch below):
n = 7.85 × (0.7² / 0.05²) ≈ 1539

Correlation ≠ Causation
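As a check on the Annex A worked example, here is a minimal sketch in Python of the sample size formula above. It follows the slide's convention, n = (z_{1-α/2} + z_{1-β})² · σ²/δ²; note that some textbook variants of the two-group formula multiply by an extra factor of 2, so treat this as an illustration rather than a definitive recipe.

```python
import math
from statistics import NormalDist

def sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Sample size per group, following the slide's formula:
    n = (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2.
    Some conventions add a factor of 2 for two-group comparisons.
    """
    z = NormalDist()  # standard normal distribution
    a = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2  # ~7.85 here
    return a * sigma ** 2 / delta ** 2

# Panama CCT numbers from the slide: sigma = 0.70, delta = 0.05.
n = sample_size(sigma=0.70, delta=0.05)
print(math.ceil(n))  # ~1539, matching the slide's 7.85 * 0.7^2 / 0.05^2
```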