Spot the gotchas in this dataset

I'm about to analyze this dataset. Help me find the gotchas before I draw conclusions from it.

Dataset description:
{{what it is, where it came from, time range, columns}}

The questions I'm planning to answer:
{{your analysis questions}}

Output:
1. **Selection effects** — who or what isn't in this data and might bias results?
2. **Definition shifts** — has the way columns are populated changed over time? (renames, methodology changes, schema migrations)
3. **Survivorship bias** — are we seeing only the "winners," missing what dropped out?
4. **Outliers and skew** — what's the shape of the data? Mean vs median?
5. **Missing data patterns** — are NULLs random or systematic? (Systematic NULLs are the dangerous ones)
6. **Aggregation risks** — when I roll this up, what do I lose?

For each item, name the specific risk in this dataset and describe a concrete test I can run before assuming the data is clean.
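
To show the kind of test I mean for items 4 and 5, here is a minimal sketch in plain Python. The column names (`region`, `revenue`), the toy rows, and both helper functions are hypothetical and only for illustration; adapt them to the real columns.

```python
from collections import defaultdict
from statistics import mean, median


def null_rate_by_group(rows, group_key, value_key):
    """Share of missing (None) values in value_key for each value of group_key."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row.get(value_key) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


def mean_and_median(values):
    """Mean and median of the non-missing values; a large gap hints at skew."""
    present = [v for v in values if v is not None]
    return mean(present), median(present)


# Hypothetical toy data, not taken from any real dataset.
rows = [
    {"region": "north", "revenue": 100},
    {"region": "north", "revenue": 120},
    {"region": "north", "revenue": 110},
    {"region": "south", "revenue": None},
    {"region": "south", "revenue": None},
    {"region": "south", "revenue": 5000},
]

rates = null_rate_by_group(rows, "region", "revenue")
avg, mid = mean_and_median([r["revenue"] for r in rows])

print(rates)     # missing-value rate per region
print(avg, mid)  # compare the two to judge skew
```

If the missing-value rate differs sharply between groups, the NULLs are probably systematic rather than random. If the mean sits far from the median, a few extreme values are likely driving the average.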

Tags: data quality, bias, analysis