Maxim AI
By Maxim AI
End-to-end platform to simulate, evaluate and observe AI agents across the development lifecycle.
Best for
- agent simulation
- evals
- observability
- cross-functional teams
Other evals & observability tools
Braintrust
Eval platform for AI products: define test sets, run them across models, and track regressions over time. A common choice for teams shipping LLM features.
Helicone
LLM observability and logging proxy. A one-line code change logs every prompt, response, cost, and latency across providers.
Langfuse
Open-source LLM engineering platform with tracing, evals, prompt management, and dataset tools. Self-hostable or cloud.
Arize Phoenix
Open-source LLM tracing and eval tool from Arize. Built around OpenTelemetry, so it is a good fit if you already use OpenTelemetry elsewhere.
Patronus AI
Eval and guardrails platform focused on enterprise safety — hallucination detection, PII checks, and policy compliance for LLM outputs.
LangSmith
Observability and eval platform from the LangChain team. Tight integration if you're building agents with LangChain or LangGraph.
Langtrace
Open-source, OpenTelemetry-based end-to-end observability tool with real-time tracing, evals and metrics for LLM apps.
Promptfoo
Open-source LLM testing and red-teaming framework that runs evals and security scans against AI apps, agents, and RAG pipelines.