Evals & observability

Maxim AI

By Maxim AI

End-to-end platform to simulate, evaluate and observe AI agents across the development lifecycle.

Pricing: Freemium

Best for

agent simulation
evals
observability
cross-functional teams

Other evals & observability tools

Braintrust

Eval platform for AI products — define test sets, run them across models, and track regressions over time. The default choice for teams shipping LLM features.

Helicone

LLM observability and logging proxy. A one-line code change logs every prompt, response, cost, and latency across providers.

Langfuse

Open-source LLM engineering platform with tracing, evals, prompt management, and dataset tools. Self-hostable or cloud.

Arize Phoenix

Open-source LLM tracing and eval tool from Arize. Built around OpenTelemetry — good fit if you already use OTEL elsewhere.

Patronus AI

Eval and guardrails platform focused on enterprise safety — hallucination detection, PII checks, and policy compliance for LLM outputs.

LangSmith

Observability and eval platform from the LangChain team. Tight integration if you're building agents with LangChain or LangGraph.

Langtrace

Open-source, OpenTelemetry-based end-to-end observability tool with real-time tracing, evals and metrics for LLM apps.

Promptfoo

Open-source LLM testing and red-teaming framework that runs evals and security scans against AI apps, agents, and RAG pipelines.