Evals & observability

Updated today

Promptfoo

By Promptfoo

Open-source LLM testing and red-teaming framework that runs evals and security scans against AI apps, agents and RAG.

Freemium

Best for

open-source evals
red teaming
security scanning
CI gating

Recent changes

Jun 20, 2026: Added. Open-source LLM testing and red-teaming, with evals and security scans against apps, agents, and RAG.

Other Evals & observability

Braintrust

Eval platform for AI products: define test sets, run them across models, and track regressions over time. A common choice for teams shipping LLM features.

Helicone

LLM observability and logging proxy. A one-line code change logs every prompt, response, cost, and latency across providers.

Langfuse

Open-source LLM engineering platform with tracing, evals, prompt management, and dataset tools. Self-hostable or cloud.

Arize Phoenix

Open-source LLM tracing and eval tool from Arize. Built around OpenTelemetry — good fit if you already use OTEL elsewhere.

Patronus AI

Eval and guardrails platform focused on enterprise safety — hallucination detection, PII checks, and policy compliance for LLM outputs.

LangSmith

Observability and eval platform from the LangChain team. Tight integration if you're building agents with LangChain or LangGraph.

Langtrace

Open-source, OpenTelemetry-based end-to-end observability tool with real-time tracing, evals and metrics for LLM apps.

Traceloop (OpenLLMetry)

LLM reliability platform built on OpenTelemetry that turns evals and monitors into a continuous release feedback loop.