The Analysis Task

The Analysis Task is now posted on Canvas.

Understanding the data:

This link takes you to the public Dropbox folder with the data files. Start by looking at the README file. Our Well-Being Measurement Initiative Research Assistant Jeffrey Ohl can answer your questions: johl@umich.edu. Make sure to include Jeffrey's email address on any question about the data; he'll do most of the answering about the data himself. You can do almost anything for the analysis task; it just needs to be interesting.

  1. This is a link to take the Baseline survey so you can understand what data is available and what questions the data are based on: https://wiagl.gitlab.io/survey-baseline/?workerId=[enter your name, or your number plus some random numbers]

  2. This is a link to the Life & Psyche survey so you can understand what data is available and what questions the data are based on: https://ucla.qualtrics.com/jfe/form/SV_8kK2HMh6YrGSEF8. This is the survey that has most of the psychological indexes. (Baseline only has a few.) It has some other miscellaneous questions, too. Only some of the people who did Baseline went on to do this survey.

  3. This is a link to take the Bottomless HIT survey so you can understand what data is available and what questions the data are based on: https://wiagl.gitlab.io/survey-bottomlessa/. Only some of the people who did Baseline went on to do this survey (an overlapping but different subset from those who went on to do the Life & Psyche survey). You don't have to do all of this; just keep going until you have an idea for what analysis you want to do. The very first Block is a repeat of what is on Baseline, but it is different after that.

Relevant Powerpoint File:

The analysis task is due by 11 PM Saturday, March 18. It needs to report the analysis with tables or figures and also have text that clearly explains the analysis. The idea is that this is like one section of a paper.

If you have an idea of what to do for the analysis task, just send me and Jeffrey an email and I'll give a reaction of how interesting I think it is, and maybe a suggestion for a tweak.

Think of the analysis and its explanation as one section of your term paper. (Your term paper is due later, at 11 PM on Wednesday, May 3, the evening after the last class.) The idea is to make this analysis part of a larger discussion.

Including figures and tables, the analysis task should be at least 5 pages. I'll take a risk and not put an upper limit on the length of the analysis task. (The term paper beyond the analysis task should be between 5 and 10 pages, with closer to 5 being preferred.)

How to structure your writeup of the Analysis Task:

You can design a different structure, but a typical writeup could look like this:

  1. Here is an interesting question or questions. The answers matter (people care or should care) because: …

  2. Here is a statistical analysis that seems to have some bearing on this question or questions:

  3. On the surface the statistical results seem to say: …

  4. However, the following confounding factors could be giving rise to an illusion, making it seem like something is there that isn’t or that something is bigger or different than it really is.

Don’t forget to talk about the confounding factors! (4.)

Here is a Q&A about the analysis task:

Q:

What is the level of analysis you are expecting for this assignment? I’ve taken some stats classes, so I’m familiar with hypothesis testing and regression, but since this class doesn’t have a stats prerequisite I’m not sure how in depth I should go for this assignment.

Since most aspects of wellbeing are correlated with each other, it seems too difficult to use regression to analyze relationships between these aspects without running into reverse causality, cousin causality, or both. My knowledge of stats isn’t sufficient to avoid these problems in cases where instrumental variable regression isn’t a viable alternative. I’m wondering what you would suggest that I do to avoid this issue.

A:

At the low end, it could be simply some scatter plots or bar charts or other interesting graphs.

I don't expect you to have consistent estimates of anything, rather to be able to discuss any biases there might be in the estimates you do get, relative to something interesting. Please make the attempt to figure out the sign (+ or -) of any bias you discuss, and say what that would mean for the truth of the interesting thing one might care about. If there are multiple biases, try to figure out the sign of each one, even if all the biases put together can't be signed because some biases are likely to be + and others are likely to be -. Also, discuss whether you think a bias is likely to be large or small.
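To make the idea of signing a bias concrete, here is a minimal simulation sketch. Everything in it is hypothetical and simulated: the variable names (`resources`, `x`, `y`), the true effect of 0.2, and the noise levels are assumptions chosen for illustration, not anything from the course data. It shows cousin causality: a third factor raises both variables, so the naive regression slope overstates the true effect, and the bias is positive.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (one regressor plus intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


def simulate_cousin_causality(true_effect=0.2, n=20000, seed=0):
    """Simulate two 'good things' that both depend on unobserved resources."""
    rng = random.Random(seed)
    # Hypothetical unobserved factor that helps both x and y.
    resources = [rng.gauss(0, 1) for _ in range(n)]
    x = [r + rng.gauss(0, 1) for r in resources]
    y = [true_effect * xi + r + rng.gauss(0, 1) for xi, r in zip(x, resources)]
    # Naive regression of y on x that leaves resources out.
    return ols_slope(x, y)


TRUE_EFFECT = 0.2
naive_slope = simulate_cousin_causality(true_effect=TRUE_EFFECT)
bias = naive_slope - TRUE_EFFECT
print(f"true effect = {TRUE_EFFECT}, naive slope = {naive_slope:.3f}, bias = {bias:+.3f}")
```

In this setup the bias equals cov(x, resources) / var(x) = 1/2 in the population, so the naive slope should land near 0.7 rather than 0.2. In a writeup, the corresponding statement would be: "The estimate is probably biased upward by this channel, so the true effect is likely smaller than the coefficient suggests."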

Advice for the Analysis Task:

  1. Use lots of graphs. I love scatterplots, but other types of graphs and figures can be good, too.

  2. It’s fine to do some statistics on individual variables, but make sure you do something that relates pairs of variables to each other.

  3. Do some formal statistical tests.

  4. When you test more than one hypothesis, set it up so you can do the multiple hypothesis test correction using the False Discovery Rate procedure!

  5. Make a distinction between being significant at the 5% level and being significant at the 0.5% level.

  6. If something isn’t statistically significant, you say “I can’t reject the null hypothesis that …” NOT “I reject the alternative hypothesis.” If you want to reject a hypothesis, you have to set it up as a null hypothesis.

  7. Recognize reverse causality and cousin causality, including the consumer-theory-esque model I gave in class of how resources broadly construed help all good things, leading to the general principle (with only a few exceptions) that “All good things are positively correlated.” (This is a statement about the cross section.)

  8. Define variables in full. You need to act like your reader doesn’t know what the abbreviations mean. So write out the full text of the aspects, and describe fully all other variables. (You will see that we do this in our papers.)

  9. Don’t order response categories alphabetically! They need to be ordered logically. For example, political leanings should be ordered from Left to Right and levels of education should be ordered from less to more.

  10. When you have interesting results for several variables that are along the same lines, think of creating a simple index to get more statistical power. That is, take simple averages of similar variables and treat that simple average as an index.

  11. Think about how nearly statistically exogenous your right-hand-side variables are. Other things equal, regressions with more nearly statistically exogenous right-hand-side variables are more interesting. That doesn’t mean you can’t do other things. Just think about this dimension.

  12. Think seriously about scale use. Any statistical analysis you do with aspect-of-well-being data you can probably do both with the raw aspect ratings and with (aspect rating - average of calibration questions). Doing both of those analyses will be much more interesting than just the one analysis.
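A few of the items above (4, 10, and 12) can be sketched in code. The sketch below is a rough illustration under stated assumptions, not the required method. It assumes the False Discovery Rate procedure meant in item 4 is the standard Benjamini-Hochberg step-up procedure. The function names, the example p-values, and the example ratings are all hypothetical and are not taken from the data files.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    while controlling the False Discovery Rate at level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    # Reject every hypothesis ranked at or below largest_k.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


def simple_index(*variables):
    """Item 10: respondent-by-respondent simple average of similar variables."""
    return [sum(row) / len(row) for row in zip(*variables)]


def scale_use_adjusted(aspect_ratings, calibration_ratings):
    """Item 12: aspect rating minus that respondent's average calibration rating."""
    return [a - sum(c) / len(c) for a, c in zip(aspect_ratings, calibration_ratings)]


# Hypothetical p-values from five tests, in the order the tests were run.
p_values = [0.039, 0.001, 0.20, 0.0395, 0.008]
print(benjamini_hochberg(p_values, q=0.05))
# -> [True, True, False, True, True]

# Hypothetical ratings for two respondents on two similar aspects.
print(simple_index([1, 2], [3, 4]))
# -> [2.0, 3.0]

# Hypothetical aspect ratings and two calibration answers per respondent.
print(scale_use_adjusted([70, 50], [[60, 80], [40, 40]]))
# -> [0.0, 10.0]
```

Note that in the example the p-value 0.039 is rejected even though it exceeds its own threshold of 0.03, because a larger p-value (0.0395) passes its threshold of 0.04. That is how a step-up procedure works. Running any analysis on both the raw ratings and the `scale_use_adjusted` ratings gives the side-by-side comparison suggested in item 12.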