
Production Monitoring Setup

Stands up error tracking, uptime checks, and log access so you find out something broke before your users tell you.

What it does

Wires the minimum monitoring a small app needs to be operable: error tracking (Sentry or equivalent), an uptime check on the critical path, access to structured logs, and one alert threshold that actually pages you. It is the setup counterpart to deciding what to observe, built so the first sign of a 3am outage is an alert, not an angry email.

When to use

  • Your app is live and you have no idea when it breaks
  • Setting up the operational baseline at launch, before real users arrive
  • You got burned by an outage you heard about from a customer

When not to use

  • Pre-launch with no users — wire it at launch, not before
  • You need a full observability platform (distributed tracing, SLOs) — that's a bigger design job

Install

Download the .zip, then unzip it into your Claude skills folder.

mkdir -p ~/.claude/skills
unzip ~/Downloads/production-monitoring-setup.zip -d ~/.claude/skills/

# Restart your Claude Code session.
# The skill is now available, and Claude will use it when relevant.

SKILL.md
---
name: production-monitoring-setup
description: Use when standing up monitoring for a live app — error tracking, uptime, logs, alerts. Triggers on "set up monitoring", "Sentry", "uptime check", "error tracking", "alerting", "how do I know when my app breaks", "production monitoring".
---

# Production Monitoring Setup

The goal is narrow: find out something broke before a user does. For a small app that's four things, wired once. Don't build a NASA control room — build the smoke detector.

## 1. Error tracking

Install an error tracker (Sentry is the default; most hosts have a one-line integration) on both the server and the client. It captures the stack trace, the request, and the user context when something throws — so you debug from real data instead of "it doesn't work." Set the release/version so you can tell which deploy introduced an error.
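
As a rough illustration of what "stack trace, request, user context, release" means in practice, here is a minimal stdlib-only sketch of the kind of event an error tracker assembles. This is not Sentry's API; `build_error_event`, `handle_checkout`, the `RELEASE` value, and the request/user fields are all hypothetical names chosen for the example.

```python
import traceback
from datetime import datetime, timezone

RELEASE = "my-app@1.4.2"  # hypothetical release tag, normally set at deploy time


def build_error_event(exc, request=None, user=None, release=RELEASE):
    """Collect the fields an error tracker typically records for one exception."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "release": release,  # lets you tell which deploy introduced the error
        "type": type(exc).__name__,
        "message": str(exc),
        "stack": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "request": request or {},
        "user": user or {},
    }


def handle_checkout(cart):
    # Hypothetical handler that fails when the cart is empty.
    return cart["items"][0]


try:
    handle_checkout({"items": []})
except IndexError as exc:
    event = build_error_event(
        exc,
        request={"method": "POST", "path": "/checkout"},
        user={"id": "u_123"},
    )
```

A real tracker does this for you and ships the event to its backend; the point of the sketch is only that an event without a release or request context is much harder to debug from.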

## 2. Uptime check

One external check that hits your critical path (the real user flow, not just `/`) on an interval, using e.g. UptimeRobot, BetterStack, or the host's built-in check. An internal health check runs on the same box, so it goes silent when the whole box is down; an external one still notices.
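
If you want to see what a critical-path check amounts to before picking a service, a minimal sketch might look like the following. It assumes the critical page returns HTTP 200 and contains some known text; `check_critical_path` and its parameters are hypothetical, and the `opener` argument exists only so the function can be exercised without a network.

```python
import urllib.request


def check_critical_path(url, expected_text, timeout=10, opener=urllib.request.urlopen):
    """Return (ok, detail); ok is True only for a 200 whose body contains expected_text."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # URLError, HTTPError (4xx/5xx), and timeouts are all OSError subclasses.
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"unexpected status {status}"
    if expected_text not in body:
        # The server answered, but the real flow is broken or replaced.
        return False, "page loaded but expected content is missing"
    return True, "ok"
```

Checking for expected content is what separates this from a bare ping: a maintenance page or an error template can return 200 while the real flow is broken. To be useful it has to run on a schedule from somewhere outside your hosting.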

## 3. Log access

Know where your logs are and how to read them *before* the incident: Vercel/Railway dashboard logs, or shipped to a log service. Logs should be structured enough to filter by request, route, and severity. "I can't find the error" at 3am is a setup failure.
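
"Structured enough to filter" usually means one JSON object per line with consistent keys. A minimal sketch using only Python's standard `logging` module, assuming your app can pass a request ID and route with each log call (the `request_id` and `route` field names, `JsonFormatter`, and `make_logger` are hypothetical):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "severity": record.levelname,
            "message": record.getMessage(),
            # Hypothetical fields, supplied by callers through `extra=`.
            "request_id": getattr(record, "request_id", None),
            "route": getattr(record, "route", None),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name="app", stream=None):
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


logger = make_logger()
logger.error("payment failed", extra={"request_id": "req_42", "route": "/checkout"})
```

With lines shaped like this, most host dashboards and log services can filter on `severity`, `route`, or `request_id` directly instead of relying on free-text search.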

## 4. One alert that pages you

Wire a single, high-signal alert to where you'll actually see it (Slack, email, phone): "error rate spiked" or "uptime check failed." **One good alert beats ten ignored ones** — alert fatigue is the real failure mode. Tune the threshold so it only fires on something you'd get up for.
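
One way to think about "error rate spiked" is a sliding window with a floor on traffic, so a single failed request on a quiet night does not page anyone. The sketch below is an assumption about how such a rule could work, not how any particular alerting product implements it; `ErrorRateAlert` and its default numbers are hypothetical and need tuning for your traffic.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when the error rate over the last `window` requests reaches `threshold`."""

    def __init__(self, window=200, threshold=0.05, min_requests=50):
        self.outcomes = deque(maxlen=window)  # True means the request errored
        self.threshold = threshold
        self.min_requests = min_requests  # ignore rates computed from too few requests
        self.firing = False

    def record(self, is_error):
        """Record one request; return True only at the moment the alert starts firing."""
        self.outcomes.append(bool(is_error))
        if len(self.outcomes) < self.min_requests:
            return False
        rate = sum(self.outcomes) / len(self.outcomes)
        was_firing = self.firing
        self.firing = rate >= self.threshold
        return self.firing and not was_firing
```

Returning True only on the transition into the firing state is the part that limits fatigue: you get one page per incident rather than one per failing request.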

## Sequence

Error tracking first (highest signal per minute of setup), then uptime, then confirm log access, then the one alert. Trigger a test error and a test downtime to prove each path actually reaches you — an untested alert is not an alert.

## Anti-patterns

- Uptime check that only pings the homepage while the real flow is broken
- Ten alerts nobody reads (fatigue) instead of one that matters
- Finding out where the logs are *during* the incident
- Never test-firing an alert, then discovering it was misconfigured when it mattered
- Building elaborate dashboards before the basic "are we down" signal exists

Example prompts

Once installed, try these prompts in Claude:

  • Set up Sentry + an uptime check + a Slack alert for my Next.js app on Vercel.
  • My app went down and I found out from a user. Give me the minimum monitoring so that never happens silently again.
