Trace and evaluate AI Agents. Collaborate with your team to continuously improve quality, cost and latency of your application.
Launch, observe, improve — repeat.
Langfuse helps you take AI agents and products from prototype to production and beyond. Once you are in production, we power your continuous improvement loop, using production data to make your agents and LLM applications ever more powerful.
Langfuse brings observability, prompts, evals, experiments, and human annotation into one connected workflow, so you can move from prototype to production and keep improving with real usage data.
Trace every LLM call, agent step, and tool use. Debug issues in production with full context.
Run LLM-as-a-judge, model-based, and custom evaluations on traces automatically.
Test prompt changes and model swaps on datasets before shipping to production.
Manage, version, and deploy prompts without code changes. A/B test in production.
Queue traces for human review. Collect ground truth labels to improve your evals.
One integrated platform to trace, manage prompts, evaluate, and experiment from prototype to production scale.
Filter by user, session, cost, latency, and more.
Break down spend by user, model, and time period.
Test changes against real production traces before deploying.