
How nutrition science actually works

Nutrition science is a young, underfunded, methodologically constrained discipline whose tools — questionnaires, observational cohorts, surrogate endpoints — were not built for the questions reporters and policy bodies ask of it. Reading any claim requires understanding what each study design can and cannot prove.



TL;DR. Nutrition science is young, poor, and messy. Long-term food trials are almost impossible to run. So the field leans on studies that ask people what they ate, which they cannot remember well. Headlines blow up tiny risk numbers. About 70% of US food research is paid for by the food industry. Brian Wansink's Cornell lab showed how easy it is to fake good results. Even so, the field has produced strong work: Hall's NIH trials, PURE, PREDIMED, Spector's PREDICT, Mendelian randomization, and precision nutrition. Your job is to read it like an epidemiologist. Ask who paid for it, what the study can prove, and what the absolute risk looks like.

What you'll learn

  • Why nutrition headlines flip back and forth, and why the field cannot stop it.
  • How to rank study types from weak to strong, and what each one can prove.
  • How to turn "50% higher risk" into real numbers a writer or RD can use.
  • Who pays for nutrition studies, how the money bends results, and how to spot it.
  • Why the field had its own crisis with fake results, and what Wansink taught us.
  • Three tools for reading any nutrition claim that lands on your desk.
  • Where the field is headed: CGMs, metabolomics, microbiome panels, precision nutrition.

1. Why nutrition science feels broken

Open a newspaper in any decade. The same food plays both villain and hero. Fat killed us in the 1980s. Now fat saves us. Eggs were heart-attack bombs, then okay, then maybe bad again. Coffee causes cancer in one decade and prevents Alzheimer's in the next. The flip-flops are not random. They are what you get when the field's tools cannot answer what the public wants to know. Four problems explain almost every reversal.

Food-frequency questionnaires (FFQs) lie. An FFQ asks people to recall how often they ate about 130 foods over the past year. Pollan reviews the data in In Defense of Food. People under-report how much they eat by 20% to 33%, and the heavier the person, the worse the under-reporting, so the errors cluster in exactly the group whose diets matter most. Gladys Block helped build the FFQ used in the Women's Health Initiative. She told Pollan, "I don't believe anything I read in nutritional epidemiology anymore." Most cohort claims about food and disease sit on top of this bad recall.

Studies are too small to catch chronic-disease signals. Walter Willett's Harvard cohorts are the global exception: the Nurses' Health Study (NHS), NHS-II, and the Health Professionals Follow-Up Study. Most nutrition papers cover a few hundred people for a few weeks. To catch a 5% drop in deaths, you need tens of thousands of people and decades of follow-up. The field rarely funds either.
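
A back-of-envelope power calculation shows the scale. The baseline rates below are assumptions for illustration, not figures from any study; the formula is the standard two-proportion normal approximation:

```python
from math import ceil

def two_proportion_n(p1, p2):
    """Per-arm sample size to tell p1 from p2 (normal approximation,
    alpha = 0.05 two-sided, power = 0.80)."""
    z_alpha, z_beta = 1.96, 0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Assumed: 20% of the control arm dies over the follow-up window;
# a 5% relative drop puts the intervention arm at 19%.
print(two_proportion_n(0.20, 0.19))  # ~24,600 per arm, ~49,000 total
```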

The control group eats like the wider culture. An RCT (randomized controlled trial) of one diet versus another really compares two reported diets that leak into each other. Willett notes the $415M Women's Health Initiative low-fat trial mostly failed. The "low-fat" group did not actually eat much less fat than controls. The contrast the trial was built to find faded away.

Surrogate endpoints stand in for what you care about. Nutrition research has cycled through LDL, HDL, C-reactive protein, and ApoB. Each switch quietly re-rated foods. Drug trials count deaths. Nutrition trials count blood markers because trials long enough to count deaths are rarely funded. Spector calls this parking-lot science. These four problems explain 9 out of 10 flip-flops.

2. The hierarchy of study designs

You can re-rank any headline once you know what kind of study produced it.

Randomized controlled trial (RCT). Two groups, picked at random, ideally blinded. In nutrition, RCTs are rare. You cannot hide whether someone is eating broccoli. People stop sticking to the diet. Watching what people eat at scale costs too much. Two big anchors stand out. PREDIMED (Estruch et al.) followed about 7,000 Spaniards. A Mediterranean diet with extra-virgin olive oil or nuts cut major heart events by about 30% over 5 years. It was briefly pulled in 2018 over randomization concerns at 1 of 11 sites, then re-published with the same result. Hall 2019 (NIH metabolic ward, 20 adults locked in for 28 days, randomized crossover) matched two diets on macros. The ultra-processed diet still drove people to eat 508 more calories per day and gain about 0.9 kg. It was the first clean clinical proof that processing itself drives overeating.
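
A quick sanity check on the weight-gain figure. The 7,700 kcal-per-kilogram conversion is a rough textbook heuristic, not a number from the trial:

```python
# Each diet arm of Hall 2019 lasted 14 days; the ultra-processed arm ran a
# ~508 kcal/day surplus. Crude energy-balance arithmetic:
surplus_kcal = 508 * 14
print(f"{surplus_kcal / 7_700:.2f} kg")  # ~0.92 kg, near the ~0.9 kg observed
```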

Prospective cohort study. Recruit people, measure their diet with an FFQ, follow them for years. Willett spends Chapter 3 of Eat, Drink, and Be Healthy defending cohorts. When RCTs cannot work, well-run cohorts with repeated diet measures are the next best tool. Cohorts you should know: Nurses' Health Study (Harvard, started 1976, 121,000 nurses); Health Professionals Follow-Up Study (started 1986, 51,000 men); PURE (McMaster, 135,000 people across 18 countries), which complicated the saturated-fat story by finding that higher saturated-fat intake was linked to lower death rates worldwide; and EPIC (521,000 Europeans). Cohorts show links. They cannot prove cause. The case gets strong only when many cohorts agree.

Case-control study. Take a group with a disease, match them to people without it, ask both about past exposures. Quick and cheap. Hurt badly by recall bias: people with cancer recall their diets differently. Use these to come up with ideas, not to settle them.

Mendelian randomization (MR). Uses gene variants linked to an exposure as a stand-in for the exposure. Because gene variants get handed out at random at conception, the variant acts like a lifelong random assignment that lifestyle cannot mess with. Spector cites the largest vitamin-D MR analysis: more than 500,000 people and 188,000 fractures. It found no causal effect. The method has reshaped views of vitamin D, calcium, and several other "protective" nutrients. It works only when an exposure has strong, specific genetic links.
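
The MR logic fits in a short simulation. Every effect size below is invented for illustration; nothing comes from the vitamin-D analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# A variant nudges the exposure; a hidden confounder u drives both the
# exposure and the outcome; the true causal effect of the exposure is zero.
g = rng.binomial(2, 0.3, n)                      # 0/1/2 risk alleles
u = rng.normal(size=n)                           # unmeasured confounder
exposure = 0.5 * g + u + rng.normal(size=n)
outcome = 0.0 * exposure + u + rng.normal(size=n)

naive = np.polyfit(exposure, outcome, 1)[0]      # confounded slope

# Wald ratio: (gene -> outcome) / (gene -> exposure). The genotype is
# dealt at conception, so u cannot reach it.
mr = np.polyfit(g, outcome, 1)[0] / np.polyfit(g, exposure, 1)[0]
print(f"naive: {naive:.2f}  MR: {mr:.2f}")       # ~0.48 vs ~0.00
```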

Animal and cell studies. Good for finding mechanisms. Bad for dose. The acrylamide cancer scare came from rats fed doses no human could ever eat. Mechanism work starts a question. It does not end one.

Meta-analyses and systematic reviews. A systematic review sets the search and inclusion rules ahead of time. A meta-analysis pools the results with statistics. Quality depends on the studies inside. The 2019 Canadian meta-analysis that called red meat safe was funded through ILSI and left out most of the harm data. Check the funder and inclusion rules first.

A working ranking: (1) pre-registered large RCT with hard endpoints; (2) large prospective cohort replicated across populations; (3) Mendelian randomization on a well-instrumented exposure; (4) controlled feeding with biomarkers; (5) single cohort; (6) case-control; (7) animal or cell.

3. Relative vs. absolute risk

If you take one idea from this module, take this one. Almost every viral nutrition headline reports relative risk. Almost none reports absolute risk.

A headline: "Daily processed-meat consumption raises colorectal cancer risk by 18 percent." The 18% is the IARC pooled relative risk. Lifetime colorectal cancer risk in the Western population is about 5%. An 18% rise on that baseline lifts lifetime risk to about 5.9%. That is a 0.9-point change. Spector puts it this way in Spoon-Fed: the average Italian meat-eater's processed-meat colorectal cancer risk is about the same as smoking 3 cigarettes a year.
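
The conversion is one line of arithmetic. A minimal sketch using the numbers above:

```python
def absolute_change(baseline, relative_risk):
    """Relative risk -> change in absolute risk (same units as baseline)."""
    return baseline * (relative_risk - 1)

# IARC's pooled RR of 1.18 on a ~5% lifetime baseline:
print(f"{absolute_change(0.05, 1.18):.3f}")  # 0.009 -> about 0.9 points
```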

The 2018 Lancet alcohol meta-analysis ("no safe level") leaned on the same trick. In absolute terms, Spector calculates that 1 drink per day works out to roughly 1 extra alcohol-related event per 1.25 million bottles of wine. A fair claim. A tiny absolute number.

Three rules when you see a percentage:

  1. What is the baseline rate? Without it, no relative risk has meaning.
  2. What is the absolute change? Translate it into percentage points or cases per person-year.
  3. What is the sample size and follow-up? A 2% relative effect across 500,000 people over 20 years is plausible. The same number across 200 people over 6 weeks is noise.

A related problem is p-hacking. With dozens of variables and outcomes, a cohort dataset can produce hundreds of correlations. About 5% will look significant by chance. Selective publishing without pre-registration guarantees a steady drip of false positives. Nutrition is more at risk than psychology because it has more variables and weaker pre-registration norms.
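
You can watch the drip happen. A minimal simulation: pure noise in, "findings" out:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_people, n_foods = 500, 40

# Forty "dietary variables" and one "outcome", all random, nothing related.
diet = rng.normal(size=(n_people, n_foods))
outcome = rng.normal(size=n_people)

p_values = [stats.pearsonr(diet[:, j], outcome)[1] for j in range(n_foods)]
hits = sum(p < 0.05 for p in p_values)
print(f"{hits}/{n_foods} 'significant' links from noise")  # expect ~2 (5%)
```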

4. Industry capture

The biggest factor in a nutrition paper is often its funding, not its design. Industry-funded drink studies are about 20 times more likely to favor the sponsor than independent ones. Spector estimates 70% of US food research is industry-funded. Means reports that food companies fund 11 times more nutrition research than the NIH does. Four cases you should know:

The Sugar Research Foundation and the heart-disease pivot. Kearns, Schmidt, and Glantz (JAMA Internal Medicine, 2016) dug through internal SRF letters. The foundation paid Harvard nutrition chair Fred Stare and colleagues to write a 1967 NEJM review. The review played down sugar's role in coronary heart disease and pushed dietary fat as the villain. Disclosure was not required under journal norms at the time. Stare's department took continuous funding from the sugar industry, Coca-Cola, Pepsi, General Foods, and the Tobacco Research Council. Keys had been funded by sugar since 1944. The diet-heart story that drove US guidance for decades was sped up by an industry-funded pivot.

Coca-Cola's $140 million academic spend, 2010-2017. Spector documents that Coca-Cola gave US academics about $140 million in that window. The company funded another 95 US health organizations on top of that. The main question funded was whether inactivity, not sugar, drives obesity. The main answer was inactivity. Exercise-vs-weight papers outnumbered sugar-vs-weight papers about 12 to 1. The Global Energy Balance Network, exposed in 2015, was the most public failure.

ILSI. The International Life Sciences Institute was founded in 1978 by a Coca-Cola vice-president. It has worked its way into WHO panels, the Chinese Health Ministry, and national guideline groups. Means notes that 95% of the 2015 US Dietary Guidelines Advisory Committee had food-industry ties. 93% of industry-funded sweetened-beverage studies show no harm, against 17% of independent studies.

Pharma capture. Means documents that from 2012 to 2019, at least 8,000 NIH-funded researchers held pharma conflicts. The disclosed payments totaled more than $188 million. Stanford Medicine's dean Philip Pizzo took a $3M Pfizer donation while chairing an opioid-policy panel where 9 of 19 members had opioid-maker ties. The metabolic-disease research pipeline runs through institutions paid by firms whose products treat the chronic diseases bad diet causes.

How to spot capture: Read the funding statement. If the funder sells the product being tested, the result needs independent replication. Read disclosures. Check who funded the meta-analysis. A clean set of primary studies summed up by a captured review gives you a captured conclusion.

5. The replication and retraction problem

Every empirical field has gone through a replication audit. Psychology's 2015 Open Science Collaboration could only reproduce 36% of 100 published studies. Nutrition's reckoning came at Cornell.

Brian Wansink and the Cornell Food and Brand Lab. Until 2017, Wansink was the most-cited applied food psychologist in the United States. His "mindless eating" work fed a bestseller and decades of public-health advice: smaller plates, taller glasses, snacks off the counter, the 100-calorie pack. In late 2016 he wrote a blog post praising a grad student for slicing a dataset until it gave several publishable findings. Outside researchers (Tim van der Zee, Jordan Anaya, Nicholas Brown) audited his work. They found bad statistics, duplicate data, and results that could not come from the reported samples. By 2018, Wansink had resigned. JAMA had retracted 6 papers. The total reached about 18 retractions. That included the "pizza papers" on portion size and the wedding-buffet study. Follow-up work by Eric Stice (Stanford / Oregon Research Institute) and Dana Small (Yale) on dietary cue reactivity has had to rebuild trust from a lower starting point.
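
One audit tool is simple enough to reproduce here. The GRIM test (Brown and Heathers) asks whether a reported mean is even arithmetically possible given the sample size; the example below is hypothetical, not from a Wansink paper:

```python
def grim_consistent(reported_mean, n, decimals=2):
    """GRIM test: can this mean arise from n integer-valued responses?
    Integer data summed over n people only yields means that are
    multiples of 1/n."""
    nearest_total = round(reported_mean * n)      # closest whole-number sum
    return round(nearest_total / n, decimals) == round(reported_mean, decimals)

# Hypothetical report: mean rating 3.44 from 15 people on an integer scale.
# 51/15 = 3.40 and 52/15 = 3.47; no integer total rounds to 3.44.
print(grim_consistent(3.44, 15))  # False -> the number cannot be real
```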

The problem is bigger than Cornell. Pre-registration is uncommon. Open data sharing is rarer than in genomics. Career rewards favor novelty over replication.

The honest answer is calibration, not cynicism. Treat any single nutrition paper as one roll of a noisy die. Trust builds across designs, populations, and labs. The findings that survive (trans-fat harm, the Mediterranean signal, the ultra-processed-cardiometabolic link) have been rolled enough times to stand.

6. Three honest tools for reading any nutrition claim

When a headline arrives, three questions clear most of the fog.

(a) Name the funder. If the study, meta-analysis, or press release is funded by the industry whose product is being judged, your prior shifts. That is a reweighting, not a refutation. Independent replication is required before action.

(b) Ask for absolute risk and sample size. A 50% relative increase in a tiny absolute risk is a small finding. A 2% relative increase in a common disease is a large one. Without baseline rates, no percentage means anything.

(c) Demand a mechanism and a second evidence stream. A finding that shows up in only one design (only cohorts, only animals) is weak. A finding that shows up in epidemiology, feeding trials, mechanism work, and Mendelian randomization is strong. Trans-fat harm cleared all four bars before it was banned. Most viral claims today clear one.

Spector's shorter version: who paid for it, what does the absolute number look like, and does the biology hang together?

7. Where the field is going

Three shifts are reshaping the next decade. Each one tackles a different structural weakness.

N-of-1 designs and continuous monitoring. Spector's PREDICT study (King's College, MGH, Stanford, ZOE) put continuous glucose monitors (CGMs) on about 2,000 subjects, including hundreds of twins. They tracked 130,000 meals and 32,000 standardized muffins. Results in Nature Medicine: less than 1% of subjects sat close to the average response for glucose, insulin, and triglycerides at once. Identical twins shared only 37% of gut-microbe species. Genes explained under 30% of glucose-response variation and under 5% of fat-response variation. Personal CGMs and microbiome panels are turning every subject into their own controlled experiment.

Metabolomics, proteomics, precision nutrition. Modern Nutrition in Health and Disease added three chapters in its 12th edition for these areas (Ch 121 Metabolomics/Proteomics, Ch 122 AI in Nutrition Research, Ch 123 Precision Nutrition). The premise: DRIs (Dietary Reference Intakes) were built to prevent classic deficiency diseases like rickets, pellagra, and scurvy. They work for that. They struggle with chronic-disease endpoints, where person-to-person variation outweighs population averages. Chapter 109, on DRI methodology, is unusually self-critical.

A quiet convergence at the frontier. Three researchers from different paths are landing in the same place. Kevin Hall (NIH) brings the cleanest metabolic-ward RCT evidence. Tim Spector (King's College) brings the largest N-of-1 cohort and the microbiome lens. Casey Means (Stanford-trained, Levels) brings the mitochondrial framing. Shared picture: ultra-processed food drives overeating no matter the macros. Individual metabolic response varies 10-fold or more. The most useful interventions work at the pattern level, not the nutrient level.

The field is still young. It is still poorly funded. It still flips on single nutrients. But the methods (pre-registration, open data, MR, metabolic-ward trials, continuous monitoring, large multi-population cohorts) are meaningfully stronger in 2026 than in 2010.

FAQ

PubMed vs. DOI? PubMed is NLM's free biomedical index. A DOI is a permanent ID for a specific paper. Find the DOI, then look for an open-access version on PubMed Central or medRxiv. Press releases rarely link the DOI. A minute of searching finds it.
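
If you want to script the lookup, Crossref's public API resolves a DOI to its metadata. A minimal sketch (assumes the requests package; the DOI is Hall 2019 from the Sources list):

```python
import requests

def doi_metadata(doi):
    """Fetch a paper's title and journal from the public Crossref API."""
    r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    r.raise_for_status()
    record = r.json()["message"]
    return record["title"][0], record.get("container-title", [""])[0]

print(doi_metadata("10.1016/j.cmet.2019.05.008"))
```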

Meta-analysis vs. systematic review? A systematic review sets a search and inclusion protocol ahead of time, then describes the literature in words. A meta-analysis pools the studies with statistics. Quality depends on the studies inside.

Peer review vs. preprint? Peer review is anonymous expert feedback before publication. Preprints (bioRxiv, medRxiv, arXiv) post before that review. COVID showed both can be wrong fast and right fast. Strength comes from replication, not from the journal.

What's a "natural experiment"? A real-world event that acts like randomization. Finnish North Karelia, the trans-fat phase-outs that started in Denmark in 2003, and SNAP changes across US states are examples. Each created exposed and unexposed groups without researchers having to do anything.

What is confounding? A third variable linked to both the exposure and the outcome that creates a fake link. Adventists eat less meat, don't smoke, and attend church. A study showing they live longer cannot pin the gap on diet alone. Good cohorts adjust with statistics. Great ones use MR, natural experiments, or twin studies that cut confounding by design.
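
A toy simulation makes the mechanism visible. All effect sizes are invented: lifestyle drives both low meat intake and long life, and meat itself does nothing:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

lifestyle = rng.normal(size=n)                   # the confounder
meat = -0.5 * lifestyle + rng.normal(size=n)
lifespan = 2.0 * lifestyle + rng.normal(size=n)  # meat has zero effect

crude = np.polyfit(meat, lifespan, 1)[0]         # ~ -0.8: meat "shortens" life

# Adjust by regressing the confounder out of both variables first.
meat_r = meat - np.polyfit(lifestyle, meat, 1)[0] * lifestyle
life_r = lifespan - np.polyfit(lifestyle, lifespan, 1)[0] * lifestyle
adjusted = np.polyfit(meat_r, life_r, 1)[0]      # ~ 0.0: the link vanishes
print(f"crude: {crude:.2f}  adjusted: {adjusted:.2f}")
```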

Why are food RCTs so rare? You cannot blind people to food. People stop sticking to the diet. Hard endpoints take decades. No company wants to sponsor "eat more lentils." Policy is built on cohorts and short mechanistic trials, with rare large RCTs (WHI, PREDIMED, Hall 2019) as anchors.

What is p-hacking? Running many analyses and only reporting the ones with p < 0.05. With dozens of variables and outcomes, a dataset can produce hundreds of correlations. About 1 in 20 will look significant by chance. Without pre-registration, the literature over-represents these false positives. The Wansink case was an industrialized version.

How is AI being used in nutrition? Pattern detection in multi-omics data. Individual prediction (PREDICT's gradient-boosted models on CGM and microbiome data). Literature synthesis, with citation-hallucination risk. Watch for ML without external validation, models trained on the same biased FFQ data, and correlations dressed up as causal claims.
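
For intuition only, the smallest possible version of "individual prediction": a gradient-boosted model on synthetic stand-in features, nothing from PREDICT's data or pipeline, with a random-split score that is exactly the internal validation the warning above is about:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2_000

# Synthetic stand-ins for meal and microbiome features; the "glucose
# response" depends on them nonlinearly. All invented.
X = rng.normal(size=(n, 5))
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.5, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print(f"held-out R^2: {model.score(X_te, y_te):.2f}")
# Passing on a random split is the easy bar; external validation means a
# different cohort entirely.
```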

Sources

  • Pollan, M. In Defense of Food. Penguin, 2008 — Ch 9.
  • Spector, T. Spoon-Fed. Vintage, 2020.
  • Means, C. and Means, C. Good Energy. Avery, 2024.
  • Willett, W. Eat, Drink, and Be Healthy. Free Press, 2017 — Ch 3.
  • Ross, A.C. et al., eds. Modern Nutrition in Health and Disease, 12th ed. Jones & Bartlett, 2024 — Chs 109, 121–123.
  • Hall, K. et al. Cell Metabolism 30(1):67-77.e3, 2019. DOI: 10.1016/j.cmet.2019.05.008.
  • Kearns, C., Schmidt, L., Glantz, S. JAMA Internal Medicine 176(11):1680-1685, 2016. DOI: 10.1001/jamainternmed.2016.5394.
  • Estruch, R. et al. PREDIMED. NEJM 378:e34, 2018.
  • van der Zee, T., Anaya, J., Brown, N. Wansink audit, 2016–2018.
  • Dehghan, M. et al. PURE. The Lancet 390:2050-2062, 2017.
  • Berry, S. et al. PREDICT. Nature Medicine 26:964-973, 2020.

Related glossary

  • Randomized controlled trial — design and limits in food research.
  • Cohort study — prospective design; NHS as canonical example.
  • Food-frequency questionnaire — instrument underwriting most epidemiology.
  • Mendelian randomization — genetic variants as instrumental variables.
  • Relative risk — ratio of incidence between exposed and unexposed groups.
  • Absolute risk — actual percentage-point change in incidence.
  • Industry funding — empirical bias signature on the literature.
  • Conflict of interest — disclosed financial ties.