Eating Highly Processed Food is Correlated with Death

In “Hints for Healthy Eating from the Nurses’ Health Study” I write:

The trouble with observational studies of diet and health that don't include any intervention is the large number of omitted variables that are likely to be correlated with the variables that are directly studied. Still, it is worth knowing for which things one can say:

Either this is bad, or there is something else correlated with it that is bad. 

When multivariate regression is used, one might be able to strengthen this to

Either this is bad, or there is something else bad correlated with it that is not completely predictable from the other variables in the regression.

In discussing “Association Between Ultraprocessed Food Consumption and Risk of Mortality Among Middle-aged Adults in France” by Laure Schnabel, Emmanuelle Kesse-Guyot and Benjamin Allès, I need to go further and elaborate on my interpretation of multivariate regression results that show a coefficient in the undesirable direction for “this”:

Either this is bad, or there is something else bad correlated with it that is not completely predictable from the other variables in the regression.

To make sure the message isn’t lost, let me say this more pointedly: In observational studies in epidemiology and the social sciences, variables that authors say have been “controlled for” are typically only partially controlled for. The reason is that almost all variables in epidemiological and social science data are measured with substantial error. Although things can be complicated in the multivariate context, variables that are measured with error typically get an estimated coefficient smaller in magnitude than the underlying true relationship. (A bigger coefficient multiplies the noise by a bigger number, and ordinary least squares penalizes that: it picks the coefficients that minimize squared prediction errors, and noise multiplied by a big coefficient adds to those errors.) If the estimated coefficient on a variable meant to control for something is smaller than the true relationship with the true variable underneath the noise, then that variable is only partially controlled for. The only way to truly control for a variable is to use a careful measurement-error model. As a practical matter, anyone who doesn’t mention measurement error and how they are modeling it is almost always not fully controlling for the variables they say they are controlling for.
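A minimal simulation sketch of this attenuation, using only Python's standard library. The setup is an assumption made up for illustration: the true variable has variance 1, the outcome is the true variable times a slope (here 1) plus noise, and the regression only sees the true variable plus measurement noise. The function names are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Slope from a one-variable OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def attenuation_demo(beta=1.0, noise_sd=1.0, n=50_000, seed=0):
    """Simulate y = beta * z_true + e, then regress y on a noisy measure of z.

    Returns (estimated slope, predicted slope). The predicted slope uses the
    classical errors-in-variables result beta * var(z) / (var(z) + var(noise)).
    """
    rng = random.Random(seed)
    z_true = [rng.gauss(0, 1) for _ in range(n)]              # true variable, variance 1
    z_obs = [z + rng.gauss(0, noise_sd) for z in z_true]       # what the data record
    y = [beta * z + rng.gauss(0, 1) for z in z_true]           # outcome depends on true z
    reliability = 1.0 / (1.0 + noise_sd ** 2)                  # share of variance that is signal
    return ols_slope(z_obs, y), beta * reliability


print(attenuation_demo())               # noisy measure: slope near 0.5
print(attenuation_demo(noise_sd=0.0))   # perfect measure: slope near 1.0
```

With noise_sd=1, half of the observed variable's variance is noise, so the estimated slope comes out near 0.5 rather than the true 1.0.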

I like to think of an observed variable as partially capturing the true underlying variable. When, because of measurement error, an observed variable only partially captures the true underlying variable, simply including that variable in multivariate regression will only partially control for the true underlying variable.
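To see how partial control can leave a coefficient standing even when the variable has no effect at all, here is a hypothetical simulation sketch (standard-library Python; the setup, names, and parameters are all assumptions for illustration). Y depends only on a true variable Z; X is correlated with Z but has no effect on Y; and the regression can only use a noisy measure of Z.

```python
import random


def cov(a, b):
    """Covariance dividing by n (the scale cancels in the OLS formulas below)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n


def ols_two(x, z, y):
    """Coefficients (b_x, b_z) from OLS of y on x and z with an intercept."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    b_x = (szz * sxy - sxz * szy) / det
    b_z = (sxx * szy - sxz * sxy) / det
    return b_x, b_z


def partial_control_demo(noise_sd, n=50_000, seed=1):
    """Return (b_x with no control, b_x controlling for noisy z, b_z)."""
    rng = random.Random(seed)
    z_true = [rng.gauss(0, 1) for _ in range(n)]
    x = [z + rng.gauss(0, 1) for z in z_true]              # X is correlated with Z...
    y = [z + rng.gauss(0, 1) for z in z_true]              # ...but only Z affects Y
    z_obs = [z + rng.gauss(0, noise_sd) for z in z_true]   # Z as measured, with error
    no_control = cov(x, y) / cov(x, x)
    b_x, b_z = ols_two(x, z_obs, y)
    return no_control, b_x, b_z


print(partial_control_demo(noise_sd=1.0))  # Z measured with error
print(partial_control_demo(noise_sd=0.0))  # Z measured perfectly
```

In this particular setup, with noise_sd=1 the coefficient on X only falls from about 1/2 (no control) to about 1/3, far from its true value of 0, while the coefficient on the noisy Z is about 1/3 instead of the true 1. With Z measured perfectly, the coefficient on X drops to about 0.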

If the coefficient of interest is knocked down substantially by partially controlling for a variable Z, it would be knocked down a lot more by fully controlling for Z. It is very common in epidemiology and social science papers to find statements like: “We are interested in the effect of X on Y. Controlling for Z knocks the coefficient on X (the coefficient of interest) down to 2/3 of the value it had without that control, but it is still statistically significantly different from zero.” This is quite worrisome for the qualitative conclusion of interest. If measurement error biases the coefficient on Z down to only half of the underlying relationship, then fully controlling for Z using a measurement-error model would be likely to reduce the coefficient of interest by about twice as much: down to 1/3 of its value without controls rather than 2/3. A coefficient on X that small might well be something that could easily happen by chance; that is, it might not be statistically significantly different from zero.
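The back-of-the-envelope reasoning in the previous paragraph can be written out as a tiny function. This is only a sketch of the rough linear rule of thumb described there (scale up the observed drop in the coefficient on X by the factor by which the coefficient on Z is attenuated); the function name and arguments are hypothetical, and it is no substitute for an actual measurement-error model.

```python
from fractions import Fraction


def full_control_guess(b_no_control, b_partial_control, z_attenuation):
    """Rough guess at the coefficient on X if Z were fully controlled for.

    z_attenuation is the assumed ratio of Z's estimated coefficient to its
    true coefficient (e.g. 1/2 if measurement error halves it). The observed
    drop in X's coefficient is scaled up by 1 / z_attenuation. This linear
    scaling is a heuristic, not a general result.
    """
    drop = b_no_control - b_partial_control
    return b_no_control - drop / z_attenuation


# Coefficient normalized to 1 without controls, 2/3 with partial control,
# and Z's coefficient assumed to be attenuated to half its true size:
print(full_control_guess(Fraction(1), Fraction(2, 3), Fraction(1, 2)))  # prints 1/3
```

Exact fractions are used only so the arithmetic from the text comes out exactly; ordinary floats would work the same way.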

Let’s turn now to the fact that eating highly processed food is positively correlated with mortality. Personally, my prior is that eating highly processed food does, in fact, increase mortality risk. So the statistical point I am making questions the strength of the evidence this French study provides for a proposition I believe. But ultimately, understanding the statistical tools we use will get us to the truth and, I believe, through knowing the truth, to a better world.

One of the controls Laure Schnabel, Emmanuelle Kesse-Guyot and Benjamin Allès use is overall adherence to the dietary recommendations of the French government (the Programme National Nutrition Santé guidelines). The trouble is that the true relationship between eating highly processed food and eating badly in other ways is likely to be stronger than what can be shown by the imperfect data they have. That means that, at the end of the day, it is hard to tell whether the extra mortality is coming from the highly processed food or from other dimensions of bad eating that go along with eating a lot of highly processed food. Another set of controls is income and education. Even if the income and education variables measured euros earned last year and number of years of schooling perfectly, what is likely to drive the behaviors that actually affect people’s health is something more like permanent income on the one hand, and knowledge of health principles on the other. Knowledge of health principles would depend a lot on dimensions of education such as college major and learning on the job in a profession, as well as on years of school. Hence, all the things that might stem from being poor in the sense of low permanent income (low income not just in one particular year, but chronically) and from having little knowledge of health principles are undercontrolled for when they are represented only by typical income and education data. The same kind of argument can be made about controlling for exercise: for health purposes, there are no doubt higher- and lower-quality dimensions of exercise that are not fully captured by the exercise data in the French NutriNet-Santé Study that Laure, Emmanuelle and Benjamin are using. If exercise quality were better measured, controlling for more dimensions of exercise would likely knock the coefficient on ultraprocessed food consumption down a bit more.

The bottom line is that there is definitely something about what people who eat a lot of highly processed food do, or about the situations they are in, that leads to death, but it is not clear that the highly processed food itself is doing the job. Highly processed food might have been merely driving the getaway car rather than firing the bullet that accomplished the hit job. Highly processed food is clearly hanging out with some bad actors if it didn’t fire the gun, but it is hard to convict it of committing the crime itself.

In the absence of clear-cut statistical evidence of causality, theory becomes important in helping to establish priors that will affect how one reads ambiguous data. I lay out theoretical reasons for being suspicious of processed food in “The Problem with Processed Food.” One of the problems with processed food is its typical reliance on sugar in some form to make it tasty. Laure Schnabel, Emmanuelle Kesse-Guyot and Benjamin Allès give other theoretical reasons to worry about processed food in this passage (from which, for readability, I have omitted the many citation numbers that pepper it):

First, studies have documented the carcinogenicity of exposure to neoformed contaminants found in foods that have undergone high-temperature processing. The European Food Safety Authority stated in 2015 that acrylamide was suspected to be carcinogenic and genotoxic, and the International Agency for Research on Cancer classified acrylamide as “probably carcinogenic to humans” (group 2A). Some studies reported a modest association between dietary acrylamide and renal or endometrial cancer risk. Further research is necessary to confirm these speculative hypotheses. Similarly, meat processing can produce carcinogens. The International Agency for Research on Cancer reported in 2015 that processed meat consumption was carcinogenic to humans (group 1), citing sufficient evidence for colorectal cancer. Moreover, the agency found a positive association between processed meat consumption and stomach cancer.

In addition, ultraprocessed foods are characterized by the frequent use of additives in their formulations, and some studies have raised concerns about the health consequences of food additives. For instance, titanium dioxide is widely used by the food industry. However, findings from experimental studies suggest that daily intake of titanium dioxide may be associated with an increased risk of chronic intestinal inflammation and carcinogenesis. Likewise, experimental studies have suggested that consumption of emulsifiers could alter the composition of the gut microbiota, therefore promoting low-grade inflammation in the intestine and enhancing cancer induction and metabolic syndrome. In addition, some findings suggest that artificial intense sweeteners could alter microbiota and be linked with the onset of type 2 diabetes and metabolic diseases, which are major causes of premature mortality.

Food packaging is also suspected to have endocrine-disrupting properties. During storage and transportation of food products, chemicals from food-contact articles can migrate into food, some of which might negatively affect health, such as bisphenol A. Epidemiologic data have suggested that endocrine disruptors are associated with an increased risk of endocrine cancers and metabolic diseases, such as diabetes and obesity.

After the passage of time has led to more deaths and therefore increased the statistical power available from the French data set, one way to test the importance of these forces will be to look at the association of the consumption of highly processed food with different causes of death. For example, it would be tantalizing evidence about causal channels if eating ultraprocessed food predicted a higher risk of death due to cancer by a bigger ratio than the ratio by which it predicted a higher risk of death due to cardiovascular disease.

What are highly processed foods, or, in the term used by the authors, “ultraprocessed foods”? Laure, Emmanuelle and Benjamin write:

Each of the 3000 foods in the NutriNet-Santé Study composition table was classified according to the NOVA food classification system, which categorizes food products into 4 groups according to the nature, extent, and purpose of processing. This current study focused on 1 group classified as ultraprocessed foods, which are manufactured industrially from multiple ingredients that usually include additives used for technological and/or cosmetic purposes. Ultraprocessed foods are mostly consumed in the form of snacks, desserts, or ready-to-eat or -heat meals.

I have been struck by how cutting out sugar almost automatically cuts out the vast majority of highly processed foods. So it is not easy to tell apart harm from sugar and harm from highly processed foods. (I give tips for going off sugar in “Letting Go of Sugar.”) But it also means that, for one’s own personal efforts to avoid early death, it is currently not that important to distinguish between the harm from sugar and the harm from highly processed food. (In the future, it might become extremely important to distinguish between the two if large numbers of people started avoiding sugar and food companies started reformulating their processed food to leave out sugar.)

My recommendation is to cut back on sugar and highly processed foods—efforts in which there are many opportunities to kill both birds—sugar and processed food—with one stone. For more detailed recommendations on good and bad foods, see:

For annotated links to other posts on diet and health, see: