Causal Inference in Education Research: Principles and Applications of Related Methods
Abstract: Over the past twenty years, causal inference methods have matured rapidly and have gradually come to dominate microeconometrics. This paper first introduces the background to the rise of causal inference methods. Second, it discusses the three conditions that must be met to establish a causal relationship, identifies the main difficulties of causal inference with experimental and non-experimental (observational) data, and analyzes the sources and components of the heterogeneous residual in observational studies. Third, drawing on impact evaluations of small-class teaching and the "New Mechanism" reform, it explains the basic principles and implementation of several quasi-experimental methods: regression discontinuity, instrumental variables, and propensity score matching combined with difference-in-differences. Finally, in response to doubts about the internal validity of quasi-experimental studies, it stresses the importance of robustness and sensitivity checks on the assumptions implicit in the chosen method.
Key words: causal inference; educational research; quasi-experiment; heterogeneous residual; internal validity
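1) Self-selection here may be an active choice by students and their families, or a passive one. For students from advantaged family backgrounds, the size of the class they attend may be the result of a deliberate search; for students from disadvantaged backgrounds, it may be something they have to accept, with no real choice. Whether active or passive, the selection is non-random and depends on individual characteristics.
2) Statistical power is the ability to correctly reject a false hypothesis and so detect the true treatment effect. The power of a study depends on several factors, including sample size.
3) The matching radius (caliper) is usually set to one quarter of the standard deviation of the propensity scores in the sample.
4) Note that this analysis must use pre-reform data, so that every independent variable is measured before the reform and is predetermined with respect to it.
5) For this reason, regression discontinuity is also called a "natural experiment".
6) The centering (ENROL-41) is used to ensure that the estimated coefficient β1 in equation (2) equals exactly the vertical jump in Figure 1 (10 in this example).
7) This requirement is known as the common support assumption. We can plot the distributions of the propensity scores for the treatment group and the control group and inspect the size of the region where the two distributions overlap to check whether the data satisfy it.
8) To estimate the predicted probabilities we can use logistic regression, probit regression, or the more complex but more robust generalized boosted modeling (GBM), with either a linear or a nonlinear functional form. To match the data we can use nearest-neighbor matching, radius matching, nearest-neighbor matching within a radius, Mahalanobis distance matching, kernel-based matching, and so on. Each matching strategy has its own strengths and weaknesses, and the resulting matched samples often differ greatly in size. For a detailed discussion see Guo & Fraser (2010) and Imbens & Rubin (2015).
9) Internal validity is the extent to which the findings truly reflect the causal relationships among variables in the sample. External validity is the extent to which the causal relationships found in the sample can be generalized to the population.
10) Whether a school's enrollment reaches 41 determines on which side of the cutoff the school falls.
11) If there is no time-varying heterogeneity, the influence of all unobserved heterogeneity on the outcome variables of the treatment and control groups does not change over time. Absent the treatment, the outcome variables of the two groups should therefore follow the same trend over time before and after the treatment. This is known as the parallel trend assumption.
12) That is to say, random assignment gives us complete control: the treatment and control groups differ only in whether they receive the treatment and are otherwise identical (in expectation).
13) In 2015 the Obama administration enacted a new law, the Every Student Succeeds Act. The new act revised the earlier one in several respects, including relaxing the federal government's performance assessment of the states and reforming the school accountability and funding system that had been strictly based on student academic achievement. The provisions stressing the importance and priority of scientific causal evidence in shaping education policy remained unchanged.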
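Notes 3 and 7 can be illustrated with a small sketch. It assumes that propensity scores have already been estimated; the unit labels and scores below are hypothetical, the standard deviation is taken over the pooled sample, and greedy 1:1 nearest-neighbor matching without replacement is only one of the strategies listed in note 8, not necessarily the one used in the paper.

```python
from statistics import stdev

def common_support(treated, control):
    """Return the interval of propensity scores covered by both groups."""
    low = max(min(treated.values()), min(control.values()))
    high = min(max(treated.values()), max(control.values()))
    return low, high

def caliper_match(treated, control, caliper):
    """Greedy 1:1 nearest-neighbor matching without replacement within a caliper."""
    available = dict(control)  # control units not yet used
    pairs = []
    # Process treated units in ascending order of their propensity score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control unit on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs

# Hypothetical estimated propensity scores (not data from the paper).
treated = {"T1": 0.62, "T2": 0.48, "T3": 0.91, "T4": 0.55}
control = {"C1": 0.20, "C2": 0.45, "C3": 0.58, "C4": 0.35, "C5": 0.60}

# Note 3: caliper = one quarter of the standard deviation of the scores.
all_scores = list(treated.values()) + list(control.values())
caliper = 0.25 * stdev(all_scores)

support = common_support(treated, control)
pairs = caliper_match(treated, control, caliper)
print("caliper:", round(caliper, 4))
print("common support:", support)
print("matched pairs:", pairs)
```

In this toy sample T3 has a score far above every control unit, so it lies outside the region of common support and is left unmatched. This is the kind of sample loss that note 8 refers to when it says matched samples can differ greatly in size.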
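Figure 1 Illustration of the regression discontinuity principle
Note: To simplify the discussion, we have made some modifications to the original data of Angrist & Lavy (1999).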
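The role of the centering (ENROL-41) described in note 6 can be checked numerically. The sketch below is an illustration under stated assumptions: equation (2) is not reproduced in this section, so the specification used here (a treatment dummy, the running variable, and their interaction) is assumed, and the noise-free data are simulated with a jump of 10 at the cutoff of 41. The helper `ols` and all coefficients other than the jump are hypothetical.

```python
CUTOFF = 41  # enrollment threshold from the text

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def outcome(enrol):
    """Simulated outcome: different slopes on each side, jump of 10 at the cutoff."""
    d = 1.0 if enrol >= CUTOFF else 0.0
    x = enrol - CUTOFF
    return 60.0 - 0.2 * x + 10.0 * d + 0.1 * d * x

enrols = list(range(21, 61))
y = [outcome(e) for e in enrols]
d = [1.0 if e >= CUTOFF else 0.0 for e in enrols]

# Centered running variable: the coefficient on d is the jump at ENROL = 41.
X_centered = [[1.0, di, e - CUTOFF, di * (e - CUTOFF)] for e, di in zip(enrols, d)]
# Uncentered running variable: the coefficient on d is the gap extrapolated to ENROL = 0.
X_raw = [[1.0, di, float(e), di * e] for e, di in zip(enrols, d)]

beta_centered = ols(X_centered, y)
beta_raw = ols(X_raw, y)
print("coefficient on d, centered:  ", round(beta_centered[1], 4))
print("coefficient on d, uncentered:", round(beta_raw[1], 4))
```

With the centered running variable the coefficient on the treatment dummy recovers the jump at the cutoff. Without centering, the same coefficient measures the gap between the two fitted lines at zero enrollment, which has no substantive meaning in this design.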
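Table 1 Balance test results using pre-reform data
Entries are mean differences between reform counties and non-reform counties.

| Variable | Before matching | After matching: nearest-neighbor within radius | After matching: Mahalanobis distance |
| --- | --- | --- | --- |
| GDP per capita | -2133.5*** | -127.27 | -106.98 |
| General fiscal transfers per capita | 198.29*** | -66.69*** | 2.48 |
| Earmarked fiscal transfers per capita | 145.3*** | 33.60*** | 3.69 |
| Total population | -16.08*** | 4.30** | -1.13 |
| Population density | -0.02*** | 0.00 | 0.00 |
| Share of population on the public payroll | 0.01*** | 0.004** | 0.000 |
| Share of rural population | -0.02** | 0.00 | 0.00 |
| Eastern region | -0.14*** | 0.06** | 0.00 |
| Sample size | 3754 | 1166 | 428 |

Note: The sample includes only counties and county-level cities; all municipal districts are excluded. To keep administrative divisions consistent, we dropped every county-level unit whose administrative division changed during 2005-2006, which yields panel data on 1877 county-level units in each of 2005 and 2006. Each entry equals the mean for reform counties minus the mean for non-reform counties. Asterisks report the t-test of the difference in means: *** significant at the 0.01 level, ** at the 0.05 level, * at the 0.1 level.

Table 2 Estimated level effects of the "New Mechanism" reform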
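| Variable | Primary school: OLS | Primary school: Mahalanobis matching + difference-in-differences | Junior secondary school: OLS | Junior secondary school: Mahalanobis matching + difference-in-differences |
| --- | --- | --- | --- | --- |
| Intercept | -204.96*** (19.35) | 63.43*** (10.45) | -322.93*** (43.38) | 106.56*** (20.61) |
| New Mechanism reform | 66.42*** (10.75) | 102.96*** (13.59) | 79.77*** (24.10) | 115.07*** (26.35) |
| Other control variables (omitted) | … | … | … | … |

Note: *** significant at the 0.01 level, ** at the 0.05 level, * at the 0.1 level. Standard errors of the estimated coefficients are in parentheses. The regressions also control for other variables, including each county's own fiscal resources, the various transfers from higher levels of government, primary and junior secondary enrollment, and population size and density.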
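The difference-in-differences logic behind the matched estimates in Table 2, and the parallel trend assumption in note 11, can be sketched with a two-group, two-period calculation. The figures below are hypothetical and are not the paper's data; the years 2005 and 2006 simply mirror the panel described in the note to Table 1, and the function name `did` is an illustrative choice.

```python
from statistics import mean

def did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: change in the treated group minus change in the controls."""
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(ctrl_post) - mean(ctrl_pre)
    return treated_change - control_change

# Hypothetical per-county outcomes (for example, funding per student).
reform_2005 = [210.0, 190.0, 200.0]
reform_2006 = [330.0, 310.0, 320.0]
nonreform_2005 = [260.0, 240.0, 250.0]
nonreform_2006 = [280.0, 260.0, 270.0]

effect = did(reform_2005, reform_2006, nonreform_2005, nonreform_2006)

# A naive post-reform comparison mixes the effect with the pre-existing gap.
naive_gap = mean(reform_2006) - mean(nonreform_2006)

print("difference-in-differences estimate:", effect)
print("naive post-reform gap:", naive_gap)
```

Here the reform counties start 50 units below the non-reform counties. The naive comparison absorbs that time-invariant gap and understates the effect, whereas differencing over time removes it. The estimate is credible only if the two groups would have followed the same trend without the reform.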
[1] Angrist, J., Bettinger, E., & Kremer, M. (2006). Long-term educational consequences of secondary school vouchers: Evidence from administrative records in Colombia. The American Economic Review, 96(3), 847-862. doi: 10.1257/aer.96.3.847
[2] Angrist, J. D., & Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement. The Quarterly Journal of Economics, 114(2), 533-575. doi: 10.1162/003355399556061
[3] Angrist, J. D., & Pischke, J.-S. (2015). Mastering 'metrics: The path from cause to effect. NJ: Princeton University Press.
[4] Borman, G. D. (2009). The use of randomized trials to inform education policy. In Sykes, G., Schneider, B., & Plank, D. N. (Eds.), Handbook of education policy research (pp. 129-138). New York: Routledge.
[5] Cappelleri, J. C., Darlington, R. B., & Trochim, W. M. K. (1994). Power analysis of cutoff-based randomized clinical trials. Evaluation Review, 18(2), 141-152. doi: 10.1177/0193841X9401800202
[6] Card, D. (1999). The causal effect of education on earnings. In Ashenfelter, O., & Card, D. (Eds.), Handbook of labor economics, 3A (pp. 1801-1864). New York: Elsevier.
[7] Guo, S., & Fraser, M. W. (2010). Propensity score analysis. Thousand Oaks: Sage.
[8] Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47(1), 153-161. doi: 10.2307/1912352
[9] Heckman, J. J. (2005). The scientific model of causality. Sociological Methodology, 35(1), 1-97. doi: 10.1111/j.0081-1750.2006.00164.x
[10] Hoxby, C. M. (2000). The effects of class size on student achievement: New evidence from population variation. The Quarterly Journal of Economics, 115(4), 1239-1285. doi: 10.1162/003355300555060
[11] Imbens, G. W., & Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2), 467-475. doi: 10.2307/2951620
[12] Imbens, G. W., & Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences: An introduction. New York: Cambridge University Press.
[13] Jacob, B. A., & Lefgren, L. (2004). Remedial education and student achievement: A regression-discontinuity analysis. Review of Economics and Statistics, 86(1), 226-244. doi: 10.1162/003465304323023778
[14] Kaplan, D. (2009). Causal inference in non-experimental educational policy research. In Sykes, G., Schneider, B., & Plank, D. N. (Eds.), Handbook of education policy research (pp. 139-153). New York: Routledge.
[15] Khandker, S. R., Koolwal, G. B., & Samad, H. A. (2010). Handbook on impact evaluation: Quantitative methods and practices. Washington, D.C.: World Bank Publications.
[16] Lee, D. S., & Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature, 48(2), 281-355. doi: 10.1257/jel.48.2.281
[17] Li, H., & Luo, Y. (2004). Reporting errors, ability heterogeneity, and returns to schooling in China. Pacific Economic Review, 9(3), 191-207. doi: 10.1111/per.2004.9.issue-3
[18] Murnane, R. J., & Willett, J. B. (2011). Methods matter: Improving causal inference in educational and social science research. Oxford: Oxford University Press.
[19] Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39(1), 33-38. doi: 10.1080/00031305.1985.10479383
[20] Rubin, D. B. (1986). Comment: Which ifs have causal answers. Journal of the American Statistical Association, 81, 961-962. http://bacbuc.hd.free.fr/WebDAV/data/DOM/StatMeths/Rubin-JASA1986.pdf
[21] Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
[22] Smith, W. C. (2014). Estimating unbiased treatment effects in education using a regression discontinuity design. Practical Assessment, Research & Evaluation, 19(9), 2. http://www.academia.edu/7855396/Estimating_unbiased_treatment_effects_in_education_using_a_regression_discontinuity_design
[23] Thistlethwaite, D. L., & Campbell, D. T. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6), 309. doi: 10.1037/h0044319
[24] Trochim, W. M. K. (1984). Research design for program evaluation: The regression-discontinuity approach. CA: Sage. http://www.worldcat.org/title/research-design-for-program-evaluation-the-regression-discontinuity-approach/oclc/10349137
[25] Huang, B., & Zhong, X. (2012). Education and personal income in rural China: An empirical study based on household survey data from six counties in three provinces (in Chinese). Jiaoyu Yanjiu [Educational Research], (3), 18-26. http://www.cnki.com.cn/Article/CJFDTOTAL-JYYJ201203004.htm
[26] Huang, B., & Wang, D. (2016). Review and prospects of fiscal investment in compulsory education in China (in Chinese). Journal of Central China Normal University (Humanities and Social Sciences), 55(4), 154-161. http://www.cnki.com.cn/Article/CJFDTOTAL-HZSD201604019.htm
[27] Huang, B., Miao, J., & Jin, J. (2016). The causal effect of the "New Mechanism" reform on the operating funds of rural primary and secondary schools: Based on a quasi-experimental research design (Working paper, in Chinese). Nanjing: Center for Public Finance Research, Nanjing University of Finance and Economics.