Claude Skill

Cohort Analysis Spreadsheet

Builds a cohort retention table — acquisition cohort × month, with retention %, weighted average.

Download skill (.zip)Or download whole pack

What it does

Takes raw user/customer data (signup date, last active date or revenue per period) and produces a cohort retention triangle: acquisition cohorts down the rows, period-since-acquisition across the columns. Includes a weighted-average retention curve, dollar retention parallel, and a clean export-friendly view for slides.

When to use

✓You're measuring retention and the existing tooling shows aggregate, not cohort, numbers
✓You're building an investor deck and need a cohort triangle that's actually correct
✓Product wants to compare retention across acquisition channels or pricing changes

When not to use

✗You have <500 users — small cohorts produce noise, not signal
✗You don't have per-user activity data, just aggregate revenue — different analysis

Install

Download the .zip, then unzip into your Claude skills folder.

mkdir -p ~/.claude/skills
unzip ~/Downloads/cohort-analysis-spreadsheet.zip -d ~/.claude/skills/

# Restart Claude Code session.
# Skill is now available — Claude will use it when relevant.

SKILL.md

---
name: cohort-analysis-spreadsheet
description: Use when building a cohort retention analysis — acquisition cohorts × periods-since-acquisition, with retention %, weighted average. Triggers on "cohort analysis", "retention triangle", "cohort retention", "NRR by cohort".
---

# Cohort Analysis Spreadsheet

Build a cohort retention table that's correct AND easy to read. Cohort math is where most analysts go wrong — off-by-one period definitions, mixing month-since-signup with calendar-month, weighted vs simple average. Be deliberate.

## Required inputs

1. **The data**: per-user (or per-customer) records with at minimum a cohort assignment date and an activity record. Two common shapes:
   - **Wide**: one row per user, columns for each period showing active=1/0 or revenue
   - **Long**: one row per user-period (user_id, period, active or revenue) — pivot to wide first
2. **Cohort grain** — month / quarter / week. Pick one.
3. **Period grain** — usually the same as cohort grain. M0, M1, M2, ... where M0 is the cohort's acquisition period.
4. **Metric** — retention (count active / cohort size) or dollar retention (period revenue / M0 revenue)
5. **History depth** — typically 12-24 periods

If the user has data with mixed period grains or unclear cohort assignment ("last_active" date when they really need "active in this period"), clarify before building.

## Approach

**pandas** for the cohort math (groupby + pivot), then write to .xlsx with **openpyxl**. Conditional formatting on the cohort triangle (color scale: lighter = lower retention) is what makes the pattern obvious at a glance.

For dollar retention, the math is per-cohort: `Period_N_revenue / M0_revenue` per cohort, then a weighted average across cohorts where the weight is M0 revenue.

## Workbook structure

1. **Cover** — cohort definition, period definition, metric (count or $), data source, run date
2. **Raw** — the input data, untouched. Reference for sanity checks.
3. **Cohort_Counts** — the count triangle. Cohorts down (one row per acquisition month), M0...Mn across. M0 = cohort size. Mn = users still active in period N.
4. **Cohort_Pct** — same shape, but each cell is `Cohort_Counts[i,n] / Cohort_Counts[i,0]`. M0 column is always 100%. This is the "retention triangle" people refer to.
5. **Cohort_Dollars** (if dollar retention) — period revenue per cohort. Same shape.
6. **Cohort_NRR** (if dollar retention) — `Cohort_Dollars[i,n] / Cohort_Dollars[i,0]`. NRR triangle.
7. **Weighted_Avg** — for each period N, the weighted average across all cohorts (weighted by M0). The "single retention curve" view for slides.
8. **Chart_View** — the curve, plus optionally 3-5 individual cohort lines layered. The slide-ready chart.

## The cohort triangle — getting it right

The triangle has a diagonal of NaN/empty cells: cohort acquired in March 2026 has no M5 yet if it's only July 2026. Don't compute or display retention for periods that haven't happened. Conditional format empty cells light gray.

Period definitions matter:
- **M0** = the acquisition period. Retention = 100% by definition (everyone in the cohort was active in M0 — that's how they got into the cohort).
- **M1** = the period AFTER acquisition. Active in calendar period (cohort_period + 1).
- Don't define M0 as "first period after acquisition" — you'll get retention < 100% in M0 and confuse everyone.

Activity definition matters:
- **Count retention**: was the user active (logged in / placed an order / had an event) at any point in period N?
- **Dollar retention**: how much revenue did this user generate in period N?
- For SaaS NRR: include expansion in the numerator. `Cohort_Period_N_MRR / Cohort_M0_MRR` — values can exceed 100%.

## Formula principles

- **COUNTIFS** for count retention from long-format data: `=COUNTIFS(Activity[user_id], "<>"&"", Activity[cohort_month], $A5, Activity[active_month], B$1)` — but this gets slow for >100k rows. Pivot once in pandas, write the values.
- **SUMIFS** for dollar retention.
- **Retention % as a ratio**: `=IFERROR(B5/$B5, "")` — anchored to M0 column, propagates across.
- **Weighted average for period N**: `=SUMPRODUCT(Cohort_Counts[Mn], Cohort_Counts[M0]) / SUM(Cohort_Counts[M0])` — only over rows that have a value for Mn.
- **Conditional formatting**: 3-color scale on Cohort_Pct, with the M0 column excluded (or its always-100% will dominate the scale).

## Common pitfalls

- **M0 ≠ 100%**. If your M0 column shows 87%, you're miscounting — typically because activity is defined as "next period," not "acquisition period." Re-anchor.
- **Showing retention for incomplete periods.** If a cohort only has 2 months of history, don't display M3 as 0% — it hasn't happened yet. Mark empty.
- **Weighted average over the diagonal.** Don't average across periods with different cohort coverage — older cohorts have more data. Compute weighted avg per period N including only cohorts that have reached period N.
- **Mixing weekly and monthly cohorts** — unweighted average gives weekly cohorts 4x weight. Pick one grain.
- **Assuming retention monotonically decreases.** It can increase (resurrection in subscription products, reorder cycles in commerce). Flag rather than smooth.
- **Reporting NRR but excluding expansion.** NRR with no expansion is GRR — they're different. Be explicit about which.
- **Tiny cohorts plotted alongside big ones** in the chart — a 12-user cohort's retention bounces wildly. Filter to cohorts with size > N or note the weighting.

## Output

Generate the .xlsx with the triangle conditionally formatted. Print: cohort count, total users analyzed, M3 retention (weighted), M6, M12, and a one-line summary like "M3 retention is 64%, declining 2pp per period through M12. Sept 2025 cohort is the outlier — 12pp above the curve." If the user wants a slide-ready version, include the Chart_View tab as a clean export.

Example prompts

Once installed, try these prompts in Claude:

Build cohort retention from this signup + last-active data. Cohorts by signup month, 12 periods deep. Show count and weighted avg.
Dollar retention cohorts from this MRR-by-month-by-customer file. Want NRR by signup quarter.

Related prompts

Don't want to install a skill? These prompts in /prompts cover similar ground for one-shot use:

analysis prompts

HN X LinkedIn Reddit