Promptfoo
By Promptfoo
Open-source LLM testing and red-teaming framework that runs evals and security scans against AI apps, agents, and RAG pipelines.
Best for
- open-source evals
- red teaming
- security scanning
- CI gating
- Jun 20, 2026: Added. Open-source LLM testing and red-teaming; evals and security scans against apps, agents, and RAG.
Other Evals & observability
Braintrust
Eval platform for AI products — define test sets, run them across models, and track regressions over time. The default choice for teams shipping LLM features.
Helicone
LLM observability and logging proxy. A one-line code change logs every prompt, response, cost, and latency across providers.
Langfuse
Open-source LLM engineering platform with tracing, evals, prompt management, and dataset tools. Self-hostable or cloud.
Arize Phoenix
Open-source LLM tracing and eval tool from Arize. Built around OpenTelemetry — good fit if you already use OTEL elsewhere.
Patronus AI
Eval and guardrails platform focused on enterprise safety — hallucination detection, PII checks, and policy compliance for LLM outputs.
LangSmith
Observability and eval platform from the LangChain team. Tight integration if you're building agents with LangChain or LangGraph.
Langtrace
Open-source, OpenTelemetry-based end-to-end observability tool with real-time tracing, evals and metrics for LLM apps.
Traceloop (OpenLLMetry)
LLM reliability platform built on OpenTelemetry that turns evals and monitors into a continuous release feedback loop.