
Pseudoreplication and the Art of Biological Statistics

Ever wondered how biologists learn about the world? They use statistics. Learn more about the use (and misuse) of stats in biology!

Strange Bedfellows: Statistics and Biology

I’ve worked with a wide diversity of STEM (Science, Technology, Engineering and Mathematics) graduate students over the years, and I’ve noticed an alarming trend: biologists get a bad rap from physical and chemical scientists when it comes to math. Biology is sometimes depicted as a “soft” (i.e. non-quantitative) science. While this may have been true fifty years ago, it certainly hasn’t been my experience as a life scientist. Maybe this misconception stems from the fact that the roots of biology lie in naturalism [Fig 1]. Charles Darwin, after all, sailed as the HMS Beagle’s naturalist, and his intricate descriptions of the wildlife he found in South America and the Galapagos Islands led him to a theory of evolution by natural selection. Naturalism is often qualitative; that is, it relies on descriptions and observations that cannot be directly measured or easily converted into numbers. Examples of qualitative data include colors, textures, smells, and so forth. On the other hand, biology can also be quantitative. Quantitative data are numbers such as height, area, speed, time, and age. Analyzing quantitative data requires statistics, a set of methods for detecting and describing patterns in numbers.

Fig 1 Naturalists often provide detailed drawings of their subjects, such as this line drawing of an adult gold tegu, a type of South American lizard. (Source: public domain)

All of the biologists I’ve worked with appreciate the utility and necessity of statistics to our research, so in a way I’ve “grown up” during a paradigm shift. Familiarity with major statistical methods is now a key requirement for graduation from many biology programs, including my own. Better knowledge of statistics ensures better experimental design, as demonstrated by the classic paper I’ll introduce today – Stuart Hurlbert’s 1984 treatise on the idea of pseudoreplication [1].

But first…what is replication?

Replication is a key component of statistics and of experimental design. When you want to compare two groups (for example, people treated with a new drug and people given a sugar pill placebo), you need to compare more than just a single person who was treated and a single person who was untreated, because any two people will differ for reasons that have nothing to do with the treatment. Hurlbert calls the sources of this variation “confusion” [Fig 2]. Data cannot be perfectly measured, and even if they could, variation is often part of the natural pattern we’re trying to measure. Even if the pharmaceutical company designed and executed their drug test perfectly, a drug that works for one person may not always work for another. Everyone is unique. That’s why we need replication: so we can measure as much of this variation as possible and incorporate it into our statistical analyses.


Fig 2 Hurlbert’s sources of “confusion”, or variation around what’s expected, in any given study. Note specifically source #7. (Source: Hurlbert 1984)
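To see why a single treated person and a single untreated person are not enough, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration, not data from Hurlbert or from any real drug trial: a drug that truly improves some outcome score by 5 points, and person-to-person variation with a standard deviation of 10 points. The hypothetical function wrong_direction_rate counts how often a simulated study makes the drug look worse than the placebo.

```python
import random
import statistics


def wrong_direction_rate(n_per_group, true_effect=5.0, sd=10.0,
                         trials=2000, seed=1):
    """Fraction of simulated studies in which the treated group scores
    lower than the placebo group even though the drug truly helps.

    All numbers are illustrative assumptions, not real measurements.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wrong = 0
    for _ in range(trials):
        # Each person's score = group average + individual variation.
        placebo = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, sd) for _ in range(n_per_group)]
        if statistics.mean(treated) < statistics.mean(placebo):
            wrong += 1
    return wrong / trials


single = wrong_direction_rate(n_per_group=1)
replicated = wrong_direction_rate(n_per_group=30)
print(f"1 person per group:   wrong direction in {single:.0%} of studies")
print(f"30 people per group:  wrong direction in {replicated:.0%} of studies")
```

With only one person per group, individual quirks swamp the drug’s effect in a large share of the simulated studies. With 30 people per group, those quirks mostly average out, and the comparison points the right way far more often.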

Pseudoreplication – AKA How to Lie Using Statistics

“How to Lie Using Statistics” was a course at Clark University, circa 1973. My mom took that class, and still marvels at how incredibly easy it is to mislead people using statistical ‘facts’. Unfortunately, such ‘facts’ are the source of many current scientific controversies. Pseudoreplication is one of the ways you can lie with statistics. Basically, it means that you make generalizations that your study design cannot support. If you’re studying a group or a population of people, financial and physical limitations mean you cannot possibly examine all of them. You must select a subset, called the sample, that you will study. You need a large enough sample to capture as much of the variation as possible. Pseudoreplication is when you choose your sample in such a way that it does not accurately reflect the entire population. In Hurlbert’s stricter terms, it means testing for a treatment effect when the treatments are not truly replicated or the replicates are not statistically independent [1]. Going back to the pharmaceutical company example, if that company claims that their new drug will cure a disease in the American population, but they gave the drug only to women and the placebo only to men, they are pseudoreplicating [Fig 3]. Hurlbert found that 27% of the papers he examined had committed pseudoreplication.


Fig 3 Schematic of pseudoreplication, where, in our drug company example, the shaded boxes represent the two genders, x and y represent the different treatments (placebo vs. new drug), and the dots represent the individuals sampled. It is inappropriate to say that people respond differently to these treatments as there is no control for the effect of gender on the experiment’s outcome. (Source: Hurlbert 1984)
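The gender confound in Fig 3 can be sketched with a small simulation. The numbers are invented for illustration only: the drug is assumed to do nothing at all, while women are assumed to score 8 points higher than men on the outcome regardless of treatment. The hypothetical function apparent_effect compares a design where treatment is tied to gender with one where treatment is assigned at random across both genders.

```python
import random
import statistics

# Hypothetical baseline difference between genders (an assumption).
GENDER_SHIFT = {"F": 8.0, "M": 0.0}


def outcome(rng, gender, treated, drug_effect=0.0, sd=5.0):
    """One person's score. drug_effect defaults to 0: the drug does nothing."""
    effect = drug_effect if treated else 0.0
    return GENDER_SHIFT[gender] + effect + rng.gauss(0.0, sd)


def apparent_effect(design, n=400, seed=7):
    """Mean(treated) minus mean(placebo) under the given design."""
    rng = random.Random(seed)
    treated, placebo = [], []
    for i in range(n):
        gender = "F" if i % 2 == 0 else "M"
        if design == "confounded":
            gets_drug = gender == "F"        # drug to women, placebo to men
        else:
            gets_drug = rng.random() < 0.5   # coin flip, ignoring gender
        group = treated if gets_drug else placebo
        group.append(outcome(rng, gender, gets_drug))
    return statistics.mean(treated) - statistics.mean(placebo)


confounded = apparent_effect("confounded")
randomized = apparent_effect("randomized")
print(f"confounded design: apparent drug effect = {confounded:.1f}")
print(f"randomized design: apparent drug effect = {randomized:.1f}")
```

In the confounded design the useless drug appears to work, because the gender difference is mistaken for a treatment effect. Spreading both treatments across both genders lets the gender difference cancel out, and the apparent effect shrinks toward zero.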

Calling for Statistically-Minded Biology

Training biologists in basic and complex statistical theory should reduce the rampant pseudoreplication Hurlbert found in 1984, and probably already has. It’s my belief that, with statistically-minded biology, even more exciting questions about the origin and maintenance of biodiversity will be within our reach. To avoid false conclusions, it’s important for everyone to be on the lookout for pseudoreplication and similar statistical problems in everyday life.


[1] Hurlbert, S.H. (1984). Pseudoreplication and the Design of Ecological Field Experiments. Ecological Monographs 54(2):187-211.
