
Module: Understanding Basic Statistics

By SAUFEX Consortium 23 January 2026

[screen 1]

“Average income rose 5% last year!” Sounds good - but if billionaires got massive raises while most people saw no change, the “average” hides the real story.

Statistics summarize complex information, but they can also obscure important details. Understanding basic statistical concepts helps you evaluate data claims critically without needing advanced math.

[screen 2]

Mean, Median, and Mode

These three “averages” tell different stories:

Mean: Add all values and divide by count (the arithmetic average)

Median: The middle value when data is sorted

Mode: The most common value

Example: Incomes of 1, 2, 3, 4, 100

Mean: 22 (distorted by outlier)

Median: 3 (typical middle person)

Mode: None (all different)

When someone says “average,” ask which type - it matters.
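The three averages for the example incomes can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the variable names are only illustrative.

```python
from statistics import mean, median, multimode

incomes = [1, 2, 3, 4, 100]  # the example values from this screen

mean_income = mean(incomes)      # sum of values divided by count
median_income = median(incomes)  # middle value of the sorted list
# multimode returns every value tied for "most common".
# Here each value appears once, so all five come back: no single mode.
modes = multimode(incomes)

print("mean:  ", mean_income)
print("median:", median_income)
print("modes: ", modes if len(modes) == 1 else "none (all values tie)")
```

The single income of 100 pulls the mean up to 22, while the median stays at 3.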

[screen 3]

Why This Matters

The mean is easily skewed by extreme values. If nine people earn €30,000 and one earns €300,000, the mean is €57,000 - but it doesn’t represent anyone’s actual experience.

The median (€30,000) better represents the typical person.
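To see how far the mean drifts from the typical earner, the ten salaries above can be counted directly. A small sketch, assuming only the figures already given in the text:

```python
from statistics import mean, median

# Nine people on 30,000 and one on 300,000 (figures from the text, in euros)
incomes = [30_000] * 9 + [300_000]

mean_income = mean(incomes)
median_income = median(incomes)

# How many people actually earn less than the "average" (mean)?
below_mean = sum(1 for income in incomes if income < mean_income)

print(f"mean:   {mean_income:,.0f}")
print(f"median: {median_income:,.0f}")
print(f"{below_mean} of {len(incomes)} people earn less than the mean")
```

Nine of the ten people earn less than the mean, which is why the median is the better summary here.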

Political and economic reporting often uses the mean when the median would be more honest - or vice versa, depending on the desired narrative.

[screen 4]

Understanding Distributions

Data isn’t just about averages - how spread out are the values?

Normal distribution: Bell curve, most values near the middle

Skewed distribution: Data bunched on one side with a long tail on the other

Bimodal: Two common value ranges (e.g., incomes in highly unequal societies)

Outliers: Extreme values that don’t fit the pattern

The shape of a distribution reveals stories that averages hide.
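A quick way to look at shape without any plotting library is a text histogram. The sketch below uses hypothetical incomes (in thousands) chosen to be bimodal; the data, the `bin_counts` helper, and the bin width of 20 are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

def bin_counts(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    return Counter((v // bin_width) * bin_width for v in values)

# Hypothetical incomes in thousands: one low cluster, one high cluster
incomes = [18, 20, 21, 22, 23, 24, 25, 80, 82, 85, 88, 90]

counts = bin_counts(incomes, 20)
for start in range(0, 100, 20):
    print(f"{start:>2}-{start + 19:<2} {'#' * counts[start]}")

# The mean lands in a bin that contains nobody at all
mean_bin = (int(mean(incomes)) // 20) * 20
print(f"mean is about {mean(incomes):.1f}, in the {mean_bin}-{mean_bin + 19} bin")
```

With two separate clusters, the mean falls in the empty gap between them, so it describes no one in the data.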

[screen 5]

Standard Deviation and Variability

Standard deviation measures how spread out data is:

  • Low standard deviation: Values cluster tightly around the mean
  • High standard deviation: Values vary widely

Two cities might have the same average temperature, but one varies from -10°C to 40°C (high standard deviation) while the other stays between 15°C and 25°C (low standard deviation).

Understanding variability prevents misleading comparisons based on averages alone.
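The two-city comparison can be sketched with made-up readings. The temperatures below are hypothetical, picked so that both cities share the same mean; `pstdev` is the population standard deviation from Python's `statistics` module.

```python
from statistics import mean, pstdev

# Hypothetical temperature readings in °C, chosen to share the same mean
city_a = [-10, 5, 20, 35, 40, 30]   # swings from -10 to 40
city_b = [15, 18, 20, 20, 22, 25]   # stays between 15 and 25

for name, temps in (("City A", city_a), ("City B", city_b)):
    print(f"{name}: mean {mean(temps):.1f}°C, standard deviation {pstdev(temps):.1f}°C")
```

Both cities average 20°C, but City A's standard deviation is several times larger, which the average alone would never show.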

[screen 6]

Probability Basics

Probability quantifies likelihood, from 0 (impossible) to 1 (certain). Key concepts:

Independent events: One doesn’t affect the other (coin flips)

Conditional probability: Likelihood given other information

Rare events: Low probability doesn’t mean impossible

Gambler’s fallacy: The mistaken belief that past random events affect future ones (a coin doesn’t “remember” previous flips)

Understanding probability helps evaluate risk claims and predictions.
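Independence and conditional probability can be illustrated by listing every possible outcome of four coin flips. This sketch assumes a fair coin, so that all 16 sequences are equally likely; the `probability` helper is hypothetical, written only for this example.

```python
from fractions import Fraction
from itertools import product

# All 16 sequences of four flips; equally likely under the fair-coin assumption
sequences = list(product("HT", repeat=4))

def probability(event, given=lambda seq: True):
    """Share of sequences satisfying `event`, among those satisfying `given`."""
    pool = [seq for seq in sequences if given(seq)]
    return Fraction(sum(1 for seq in pool if event(seq)), len(pool))

def fourth_is_heads(seq):
    return seq[3] == "H"

def starts_with_three_heads(seq):
    return seq[:3] == ("H", "H", "H")

p_plain = probability(fourth_is_heads)
p_after_streak = probability(fourth_is_heads, given=starts_with_three_heads)

print("P(4th flip is heads)                 =", p_plain)
print("P(4th flip is heads | 3 heads first) =", p_after_streak)
```

Both probabilities are 1/2: knowing about the streak changes nothing, which is exactly what the gambler's fallacy gets wrong.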

[screen 7]

Sample Size Matters

Larger samples generally produce more reliable results:

  • 3 people can’t represent a country
  • 100 people give a rough indication
  • 1,000+ people enable meaningful conclusions
  • 10,000+ people allow analysis of subgroups

But sampling method matters as much as size. A self-selected online poll of 10,000 people is less reliable than a random sample of 1,000.
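The effect of size alone can be sketched with the usual textbook approximation for a simple random sample: margin of error ≈ z·√(p(1−p)/n). The defaults below (p = 0.5 as the worst case, z = 1.96 for roughly 95% confidence) are assumptions, and the formula says nothing about self-selected polls, where the error comes from who chose to answer.

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes random sampling; p=0.5 is the worst case, z=1.96 is about 95% confidence.
    """
    return z * sqrt(p * (1 - p) / n)

for n in (3, 100, 1_000, 10_000):
    print(f"n = {n:>6}: about ±{margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, 100 people give roughly ±10 points and 1,000 people roughly ±3 points. Cutting the margin in half takes about four times as many respondents.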

[screen 8]

Margins of Error

Surveys include margins of error - the range within which results would be expected to fall most of the time (conventionally 95%) if the survey were repeated:

“Candidate leads 52% to 48%, margin of error ±3%”

This means the leader’s true support could be anywhere from 49% to 55% and the challenger’s from 45% to 51%. The ranges overlap, so the race is actually too close to call.

Many headlines ignore margins of error, treating small differences as meaningful when they’re within statistical noise.
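The overlap check used in the example is simple arithmetic. The sketch below applies the same rough reading (do the two ranges overlap?) to poll numbers in whole percentage points; `interval` and `overlaps` are hypothetical helper names.

```python
def interval(share, margin):
    """Range of plausible values: reported share plus or minus the margin of error."""
    return (share - margin, share + margin)

def overlaps(a, b):
    """True if the two ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

leader = interval(52, 3)      # (49, 55)
challenger = interval(48, 3)  # (45, 51)

print("leader:    ", leader)
print("challenger:", challenger)
print("too close to call" if overlaps(leader, challenger) else "clear lead")
```

A 60% to 40% result with the same ±3 margin would give ranges that do not overlap, so that lead would be outside the noise.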

[screen 9]

Statistical Significance Explained

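“Statistically significant” means a result this large would be unlikely if only random chance were at work. The usual threshold is a p-value below 0.05: if there were no real effect, a result at least this extreme would turn up less than 5% of the time.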

But statistical significance ≠ practical importance:

  • A drug might have “statistically significant” effects that are too small to matter clinically
  • Large sample sizes can make tiny, meaningless differences “significant”
  • Publication bias means significant results are far more likely to get published than non-significant ones

Always ask: significant AND important, or just significant?
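The effect of sample size on significance can be sketched with a simple two-sample z-test. Everything in the example is an assumption chosen for illustration: a tiny true difference of 0.1 units between two groups, a standard deviation of 10 in each, and equal group sizes.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (z-test, equal sd and group size)."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same tiny difference (0.1 units, sd 10), very different sample sizes
p_small_study = two_sided_p(diff=0.1, sd=10, n_per_group=100)
p_huge_study = two_sided_p(diff=0.1, sd=10, n_per_group=1_000_000)

print(f"100 per group:       p = {p_small_study:.3f}")
print(f"1,000,000 per group: p = {p_huge_study:.2e}")
```

The difference is the same size in both cases, one hundredth of a standard deviation. Only the huge study calls it "significant", and nothing about its practical importance has changed.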

[screen 10]

The Replication Crisis

Many published studies fail to replicate when repeated. Reasons include:

  • Publication bias (positive results are much more likely to be published)
  • p-hacking (manipulating analysis until significance appears)
  • Underpowered studies (too small to detect real effects)
  • Researcher degrees of freedom (many analysis choices)

Single studies, especially with surprising results, should be treated as preliminary. Look for replication and meta-analyses.
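One reason p-hacking works can be shown with a single line of arithmetic: if there is no real effect and a researcher tries several independent analyses, the chance that at least one crosses the 5% threshold grows quickly. The sketch assumes the tests are independent, which real analyses of the same data usually are not, so treat it as a rough illustration.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of `num_tests` independent tests is 'significant'
    purely by chance, when no real effect exists."""
    return 1 - (1 - alpha) ** num_tests

for num_tests in (1, 5, 20):
    print(f"{num_tests:>2} tests: {chance_of_false_positive(num_tests):.0%} chance of a false positive")
```

With 20 attempts the chance of at least one spurious "finding" is roughly 64%, which is why a single surprising result deserves caution.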

[screen 11]

Common Statistical Fallacies

Regression to the mean: Extreme values tend to be followed by values closer to the average on retest (this happens by chance alone and needs no special explanation)

Texas sharpshooter: Picking out patterns after the data is collected and presenting them as if they had been predicted (like drawing a target around bullet holes)

Base rate neglect: Ignoring how common something is overall

Ecological fallacy: Assuming group statistics apply to each individual in the group

Simpson’s Paradox: A trend in the combined data reverses when the data is split into groups

Awareness of these patterns helps spot manipulation.
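Simpson's Paradox is easier to believe once the numbers are laid out. The counts below are illustrative (of the kind used in textbook treatments of the paradox, not data from this module): two treatments, each tried on mild and severe cases.

```python
from fractions import Fraction

# Illustrative counts: (successes, patients) per treatment and case severity
results = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def success_rate(successes, patients):
    return Fraction(successes, patients)

def overall_rate(treatment):
    groups = results[treatment].values()
    return Fraction(sum(s for s, _ in groups), sum(n for _, n in groups))

for severity in ("mild", "severe"):
    a = success_rate(*results["A"][severity])
    b = success_rate(*results["B"][severity])
    print(f"{severity:>7}: A {float(a):.0%} vs B {float(b):.0%}")

print(f"overall: A {float(overall_rate('A')):.0%} vs B {float(overall_rate('B')):.0%}")
```

Treatment A does better within both groups, yet B looks better overall, because A was used mostly on the severe cases. Which comparison is reported changes the apparent conclusion.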

[screen 12]

Evaluating Statistical Claims

When encountering statistics, ask:

  • Which average is being used - mean, median, or mode?
  • What’s the sample size and selection method?
  • What’s the margin of error?
  • What does the distribution look like?
  • Is statistical significance meaningful in practical terms?
  • Has this been replicated?
  • What data is being omitted?
  • Who benefits from this interpretation?

[screen 13]

Statistics as Rhetoric

Remember: Statistics are used rhetorically, not just scientifically. The same data can be presented to support opposite conclusions through:

  • Choosing which average to report
  • Selecting timeframes
  • Using absolute vs. relative numbers
  • Showing or hiding variability
  • Cherry-picking data points

Critical statistical literacy isn’t about advanced math - it’s about asking which choices were made in presenting data and why.
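The absolute vs. relative choice is a good example of how the same change can be framed two ways. In the sketch below the risk figures are hypothetical: a risk that goes from 1 in 10,000 to 2 in 10,000.

```python
from fractions import Fraction

def describe_change(before, after):
    """Return the absolute change and the relative change between two rates."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative

# Hypothetical risk: 1 in 10,000 rising to 2 in 10,000
before = Fraction(1, 10_000)
after = Fraction(2, 10_000)
absolute, relative = describe_change(before, after)

print(f"relative framing: risk up {float(relative):.0%}")
print(f"absolute framing: risk up {float(absolute) * 100:.2f} percentage points")
```

"Risk doubles" and "risk rises by one hundredth of a percentage point" describe the same numbers. Asking for both framings is one way to test a claim.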