Used by 19 of Fortune 50 · 10+ billion observations/month · 100,000+ engineers building on Langfuse

Open Source AI Engineering Platform

Trace and evaluate AI agents. Collaborate with your team to continuously improve the quality, cost, and latency of your application.

Trusted by teams at

Canva
Twilio
Adobe
Khan Academy
Telus
Intuit
SumUp
Merck
Samsara
Cisco
Expedia
Rocket Money

Gain deep visibility into your traces

Track model cost and latency

Improve your prompts

Evaluate model outputs automatically

Collaborate on human reviews

Iterate with structured experiments

Launch, observe, improve — repeat.

Langfuse helps you take AI agents and products from prototype to production and beyond. Once you are in production, Langfuse powers a continuous improvement loop that uses production data to make your agents and LLM applications better over time.

The full LLM engineering loop

Langfuse brings observability, prompts, evals, experiments, and human annotation into one connected workflow, so you can move from prototype to production and keep improving with real usage data.

[Diagram: the LLM engineering loop. Observability, Evals, Experiments, Human Annotation, and Prompts connect Prototype and Production through the stages Ship, Measure, Improve, and Iterate.]

Observability

Trace every LLM call, agent step, and tool use. Debug issues in production with full context.

Evals

Run LLM-as-a-judge, model-based, and custom evaluations on traces automatically.

Experiments

Test prompt changes and model swaps on datasets before shipping to production.

Prompts

Manage, version, and deploy prompts without code changes. A/B test in production.

Human Annotation

Queue traces for human review. Collect ground truth labels to improve your evals.

All the tools, one integrated platform.

Trace, manage prompts, evaluate, and experiment in one place, from prototype to production scale.

Observability

Hierarchical traces capture every LLM call, tool invocation, and retrieval step.

Filter by user, session, cost, latency, and more.
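
To make the idea of a hierarchical trace concrete, here is a minimal sketch in plain Python. It does not use the Langfuse SDK: the `Span` class, its fields, and the example trace are hypothetical names and values chosen only to illustrate how nested steps can be walked and filtered.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Span:
    """One node in a trace tree (hypothetical structure, not the Langfuse data model)."""
    name: str
    kind: str  # e.g. "agent", "llm", "tool", "retrieval"
    latency_ms: float
    cost_usd: float = 0.0
    children: List["Span"] = field(default_factory=list)

    def walk(self) -> Iterator["Span"]:
        """Yield this span and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def total_cost(self) -> float:
        """Sum the cost of this span and everything nested under it."""
        return sum(span.cost_usd for span in self.walk())


# A made-up trace: one agent run containing a retrieval, a tool call, and an LLM call.
trace = Span("answer-question", "agent", 1850.0, children=[
    Span("search-docs", "retrieval", 120.0),
    Span("call-calculator", "tool", 30.0),
    Span("generate-answer", "llm", 1700.0, cost_usd=0.004),
])

# Filtering is a plain traversal over the tree.
llm_calls = [span.name for span in trace.walk() if span.kind == "llm"]
slow_spans = [span.name for span in trace.walk() if span.latency_ms > 1000]
```

Because every step is a node in the same tree, filters such as "only LLM calls" or "only slow steps" reduce to a single traversal, and totals roll up from the leaves to the root.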

Cost & Latency

Track costs and latency across models and deployments.

Break down spend by user, model, and time period.
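
As a rough illustration of what breaking down spend involves, the sketch below aggregates token usage into cost per user and per model. It is plain Python, not Langfuse code: the record layout, the `spend_by` helper, the model names, and the prices are all assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical usage record: (user_id, model, input_tokens, output_tokens)
Record = Tuple[str, str, int, int]

# Assumed example prices in USD per 1,000 tokens as (input, output); not real price data.
PRICES: Dict[str, Tuple[float, float]] = {
    "model-a": (0.001, 0.002),
    "model-b": (0.01, 0.03),
}


def spend_by(records: List[Record], key_index: int) -> Dict[str, float]:
    """Total cost grouped by one field of the record (0 = user, 1 = model)."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        _, model, tokens_in, tokens_out = record
        price_in, price_out = PRICES[model]
        cost = tokens_in / 1000 * price_in + tokens_out / 1000 * price_out
        totals[record[key_index]] += cost
    return dict(totals)


records: List[Record] = [
    ("alice", "model-a", 2000, 1000),
    ("alice", "model-b", 1000, 1000),
    ("bob", "model-a", 4000, 0),
]

by_user = spend_by(records, 0)
by_model = spend_by(records, 1)
```

Adding a timestamp to each record and grouping on a truncated date would extend the same pattern to a breakdown by time period.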

Playground

Iterate on prompts and model configs in the browser.

Test changes against real production traces before deploying.
