Exam 2
Date: Thursday, April 16, in Class
NOTE: The Tuesday after the exam, April 21, please bring a device to do the FCQs for this course.
Mode: All Multiple Choice
Readings:
Applying Basic Micro Preferences and Technology (or Supply and Demand) to Happiness
Statistical Topics:
Multiple Hypothesis Test Correction Using Benjamini-Hochberg FDR Method
OLS and IV bias using arrow diagrams
Miles’s 5 big mistakes people make in statistical interpretation
The Generic Confound of the Social Sciences
Scale-Use Heterogeneity
Control variables measured with error only partially control; moreover, almost everything we measure is only a proxy (and therefore measured with error) for what we really want, theoretically
p-hacking (which FDR addresses if people are honest)
The interpretation of coefficients changing when other variables are in the regression.
For example, if ln(HH size) is in the regression, then the coefficient on ln(Y) can be interpreted as being about effective income per person (after possibly allowing for returns to scale). Without ln(HH size) in the regression, it doesn’t have that interpretation.
Similarly, if ln(height) is in a regression, then the coefficient on ln(weight) can be interpreted as being about BMI (body mass index). Without ln(height) in the regression, it could be just as much about height (which on average is positively correlated with weight) and not about fat/thin.
As a third example, with ln(Y) in the regression, the coefficient on a dummy for finishing college is about correlations with education beyond any relation of education to income. Without ln(Y) in the regression, it also includes relationships involving income.
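The height/weight example above can be sketched in a quick simulation (all numbers hypothetical; plain NumPy least squares). Here the outcome truly depends only on height, yet ln(weight) alone picks up a large coefficient because weight proxies for height; adding ln(height) reveals that weight-given-height (fat/thin) does nothing:

```python
# Minimal simulation sketch (hypothetical numbers, not course data).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

ln_height = rng.normal(np.log(170), 0.05, n)   # log heights around 170 cm
# weight scales with height, plus independent fat/thin variation
ln_weight = 2.0 * ln_height + rng.normal(0.0, 0.1, n)
# the outcome depends ONLY on height in this simulation
y = 3.0 * ln_height + rng.normal(0.0, 0.1, n)

def ols(columns, outcome):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(outcome)), *columns])
    return np.linalg.lstsq(X, outcome, rcond=None)[0]

b_short = ols([ln_weight], y)             # ln(weight) only
b_long = ols([ln_weight, ln_height], y)   # ln(weight) and ln(height)

# b_short[1] is large (ln(weight) stands in for height);
# b_long[1] is near zero: holding ln(height) fixed, extra weight does nothing.
print(b_short[1], b_long[1])
```

The same regressors, but the coefficient on ln(weight) means something different in each regression, just as in the BMI discussion above.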
Resources for Thinking about Statistical Issues.
PRIORITY: DO THE ARROW DIAGRAM EXERCISE AND THE ARROW BIAS DIAGRAM EXERCISE
Improved Instructions for accounting for Multiple-Hypothesis Testing using the Benjamini-Hochberg “False-Discovery Rate (FDR) Procedure” by hand (as you will need to do on the exam)
Choose a set of hypotheses that, a priori, you think are about equally likely to yield a rejection of the null hypothesis. If you mix in, with the hypotheses you really care about, others that are almost sure to reject, you will make it too easy to get FDR rejections. If instead you mix in others where you are almost sure to accept the null, you will make it too hard to get FDR rejections. Why? You are declaring a set of things FDR significant and saying that, on average, only x% of that set would have seemed to reject just by chance.
Column 1: Write down the p-values for rejection of each of your hypotheses from smallest to largest: #1 for smallest, #2 for second smallest etc. Let me call that number the “ordinal rank” of the hypothesis.
Column 2: Multiply each p-value by the number of hypotheses
Column 3: Divide each number in Column 2 by the ordinal rank of the hypothesis
Column 4: Find the smallest number in Column 3. Copy it into Column 4 on the same row and on every row above it. Then find the smallest number among the remaining rows of Column 3 (the rows below the one you just used). Copy it into Column 4 on its own row and on every still-blank row above it. Keep going with this procedure: at each step, the smallest number in the remaining rows of Column 3 propagates upward into all the blank spots in Column 4. (Equivalently, Column 4 is the running minimum of Column 3 taken from the bottom of the table up.)
The numbers in Column 4 are now the FDR significance level for each hypothesis (which is, loosely speaking, a p-value corrected for multiple-hypothesis testing).
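The four-column procedure above can be sketched as a short function (a minimal illustration in plain Python; the p-values at the bottom are made up):

```python
# A by-hand sketch of the Benjamini-Hochberg columns described above.
def bh_fdr(p_values):
    """Return each hypothesis's FDR significance level (its BH-adjusted
    p-value), in the order the p-values were passed in."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # ordinal ranks
    # Columns 2 and 3 combined: p-value * (number of hypotheses) / rank
    col3 = [p_values[i] * m / (rank + 1) for rank, i in enumerate(order)]
    # Column 4: the smallest remaining number in Column 3 propagates upward,
    # i.e. a running minimum taken from the bottom of the table up.
    adjusted = [0.0] * m
    running_min = float("inf")
    for rank in range(m - 1, -1, -1):
        running_min = min(running_min, col3[rank])
        adjusted[order[rank]] = running_min
    return adjusted

p = [0.010, 0.020, 0.030, 0.400]   # hypothetical p-values
print([round(q, 6) for q in bh_fdr(p)])   # prints [0.04, 0.04, 0.04, 0.4]
```

Declaring FDR-significant every hypothesis whose Column 4 value is below, say, 0.05 is exactly the BH step-up rule at a 5% false-discovery rate.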