International Initiative for Impact Evaluation
Experimental and Quasi-Experimental Designs
Marie M Gaarder, Deputy Director, 3ie
Prague, January 14, 2010
www.3ieimpact.org

Why undertake Impact Evaluation?
• Did the program/intervention have the desired effects on beneficiary individuals/households/communities?
• Can these effects be attributed to the program/intervention?
• Did the program/intervention have unintended effects on the beneficiaries? ...on the non-beneficiaries (externalities)?
• Is the program cost-effective? What do we need to change to become more effective?

Quest: finding a valid counterfactual
• Understand the process by which program participation (treatment) is determined
• The treated observation and the counterfactual should have identical characteristics, except for benefiting from the intervention
→ The only reason for different outcomes between treatment and counterfactual is the intervention
→ Need to use experimental or quasi-experimental methods to cope with selection bias; this is what has been meant by rigorous impact evaluation

How do you get valid counterfactuals?
• Experimental
– Randomized control trials (RCTs)
• Quasi-experimental
– Propensity score matching
– Regression discontinuity
– Regressions (including instrumental variables)
• Additional tools at disposal
– Pipeline approach
– Difference in difference

Randomisation
[Diagram: units such as municipalities or individuals/households randomly allocated to Treatment (T) and Control (C)]

Randomization (RCTs)
• Randomization addresses the problem of selection bias by the random allocation of the treatment
• Randomization may not be at the same level as the unit of intervention
– Randomize across schools but measure individual learning outcomes
– Randomize across sub-districts but measure village-level outcomes
• The fewer units over which you randomize, the higher your standard errors
• But you need to randomize across a 'reasonable number' of units

Issues in Randomization
• Can randomize across the pipeline
• Is no less ethical than any other method with a control group (perhaps more ethical)
• Any intervention which is not immediately universal in coverage has an untreated population to act as a potential control group

Conducting an RCT
• Has to be an ex-ante design
• Has to be politically feasible, with confidence that program managers will maintain the integrity of the design
• Perform a power calculation to determine sample size (and therefore cost)
• Adopt a strict randomization protocol (see the sketch below)
• Maintain information on how the randomization was done, on refusals, and on 'cross-overs'
• A, B and A+B designs (factorial designs)
• Collect baseline data to:
– Test the quality of the match
– Conduct difference-in-difference analysis
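To illustrate the last two points, here is a minimal sketch, in Python, of a reproducible cluster-level randomization protocol. The school identifiers and the seed are hypothetical (not from the presentation); the point is that assignment happens at the cluster (school) level with a recorded seed, even if outcomes are later measured for individual pupils.

```python
import random

def assign_clusters(cluster_ids, seed=20100114):
    """Randomly split clusters (e.g. schools) into treatment and control.

    The seed is fixed and recorded so that the assignment can be
    reproduced and audited later.
    """
    rng = random.Random(seed)    # dedicated RNG, independent of global state
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])

# Hypothetical example: 10 schools.
schools = [f"school_{i:02d}" for i in range(10)]
treatment, control = assign_clusters(schools)
print("Treatment:", treatment)
print("Control:  ", control)
```

Keeping the seed and both lists in the evaluation records is one way to document how the randomization was done, which also helps later when accounting for refusals and cross-overs.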
When is randomization really not possible?
• The treatment has already been assigned and announced
• The program is over (retrospective)
• Universal eligibility and universal access
• Operational/political constraints

Example of RCT: PES
Testing the Effectiveness of Payments for Ecosystem Services (PES) to Enhance Conservation in Uganda
– Chimpanzees
– Carbon sequestration
• Intervention: local landowners receive financial compensation for conserving forest areas on their land and undertaking reforestation
• Evaluation design:
– Objective: measure the causal effect of the PES scheme on the rate of deforestation and socio-economic welfare
– The PES scheme will randomly select villages (i.e. clustered random sampling) among a pool of eligible villages
– 400 local landowners will participate in the program
– Control: a similar number of landowners from the control villages

Design #5: Ex-post matching (if possible, include recall questions to create an ex-post baseline)
Project participants and a comparison group are observed at the follow-up evaluation; the comparison group is matched to participants on observable characteristics (available from the survey).

Design #6: Ex-post RDD (if possible, include recall questions to create an ex-post baseline)
The comparison group is found among the units (households/individuals/districts) that were just above (or below) the cut-off point for eligibility, i.e. marginally excluded; both groups are observed at the follow-up evaluation. A sketch of this comparison-group selection follows below.
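To make Design #6 concrete, here is a minimal sketch in Python of selecting an RDD comparison group. The data, cutoff, and bandwidth are all hypothetical (not from the presentation), and a real analysis would use local regression and a carefully chosen bandwidth rather than this naive difference in means.

```python
# Hypothetical records: (unit_id, eligibility_score, outcome_at_follow_up).
records = [
    ("hh_01", 42.0, 10.1), ("hh_02", 48.5, 11.0), ("hh_03", 49.2, 10.8),
    ("hh_04", 50.3, 12.4), ("hh_05", 51.8, 12.9), ("hh_06", 58.0, 14.2),
]

CUTOFF = 50.0     # assumed eligibility threshold: score >= 50 gets the program
BANDWIDTH = 2.5   # assumed window around the cutoff

# Keep only units close to the cutoff; those just below it are the
# "marginally excluded" comparison group from the slide.
near = [r for r in records if abs(r[1] - CUTOFF) <= BANDWIDTH]
treated = [outcome for _, score, outcome in near if score >= CUTOFF]
comparison = [outcome for _, score, outcome in near if score < CUTOFF]

def mean(values):
    return sum(values) / len(values)

# Naive impact estimate at the cutoff: difference in mean outcomes.
print("Treated mean:   ", mean(treated))
print("Comparison mean:", mean(comparison))
print("Estimate:       ", mean(treated) - mean(comparison))
```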
Design #7: Before-and-after evaluation (case-study approach)
Project participants only, observed at baseline and at the follow-up evaluation; there is no comparison group.

Design #8: Post-test only of project participants
Project participants only, observed once at the end-of-project evaluation.

Selecting a quantitative IE design approach
[Diagram: the scale of a major impact indicator tracked over time for project participants and a comparison group, with measurements at baseline, midterm, end-of-project and post-project evaluations]

Exercise
• What sort of quasi-experimental design seems appropriate for your program?

International Initiative for Impact Evaluation
Thank you
Visit: www.3ieimpact.org

Annex A
• Calculating sample size

Sample size for randomized evaluations
• How large does the sample need to be to credibly detect a given effect size?
• What does credibly mean? Measuring with a certain degree of confidence the difference between participants and non-participants
• Key ingredients: the number of units (e.g. villages) randomized; the number of individuals (e.g. households) within units; information on the outcome of interest and on the expected size of the effect

Type I error
• First type of error: concluding that there is an effect when there is none
• The significance level of the test is the probability that you will falsely conclude that the program has an effect, when in fact it does not. So with a level of 5%, you can be 95% confident in the validity of your conclusion that the program had an effect
• For policy purposes, you want to be very confident in the answer you give: the level will be set fairly low. Common levels are 5% and 10%

Type II error
• Second type of error: failing to reject that the program had no effect, when in fact it does have an effect
• The power of a test is the probability that I will be able to find a significant effect in my experiment if indeed there truly is an effect

Practical steps
• Set a pre-specified significance level (5%)
• Set a range of pre-specified effect sizes (what you think the program will do). What is the smallest effect that would prompt a policy response?
• Decide on a sample size that allows you to achieve a given power; the power should not be lower than 80%. Intuitively, the larger the sample, the larger the power
• Power is a planning tool: one minus the power is the probability of being disappointed...

Sample size calculation
• Formula for sample size calculation:
n = (z_{1-α/2} + z_{1-β})² · σ² / δ²
where σ is the standard deviation of the outcome and δ is the effect size of interest. The required sample size increases with the level of power and decreases with the significance level.

Try it!
• The Panama CCT program was expected to have a nutritional impact after some years of program implementation
• The program document/logframe had predicted a decrease in stunting (measured by height-for-age) of 5 percentage points
• Assume α = 0.05 and power (1 − β) = 80%, so that A = (z_{1-α/2} + z_{1-β})² ≈ 7.85
• Assume a standard deviation of the change in height-for-age of e.g. 70 percentage points (σ = 0.7)
• Calculate the required sample size per group to detect your desired outcome (see the code sketch below):
n = 7.85 × (0.7² / 0.05²) ≈ 1539

Correlation ≠ Causation
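As a check on the Annex A worked example, here is a minimal sketch in Python of the sample size formula above. It follows the slide's convention, n = (z_{1-α/2} + z_{1-β})² · σ²/δ²; note that some textbook variants of the two-group formula multiply by an extra factor of 2, so treat this as an illustration rather than a definitive recipe.

```python
import math
from statistics import NormalDist

def sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Sample size per group, following the slide's formula:
    n = (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2.
    Some conventions add a factor of 2 for two-group comparisons.
    """
    z = NormalDist()  # standard normal distribution
    a = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2  # ~7.85 here
    return a * sigma ** 2 / delta ** 2

# Panama CCT numbers from the slide: sigma = 0.70, delta = 0.05.
n = sample_size(sigma=0.70, delta=0.05)
print(math.ceil(n))  # ~1539, matching the slide's 7.85 * 0.7^2 / 0.05^2
```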