MLflow is an open-source platform for managing the full AI/ML lifecycle — experiment tracking, model registry, prompt versioning, LLM tracing, and agent evaluation — all in one self-hostable system. With MLflow 3.x it has evolved into a first-class platform for agentic engineering teams that need visibility, governance, and reproducibility across LLM workflows.
## MLflow vs Langfuse
Both tools appear on this radar under ai-infrastructure, but they serve different team profiles:
| | MLflow | Langfuse |
|---|---|---|
| Primary focus | Full ML/AI lifecycle | LLM-specific observability |
| Best for | Teams doing both ML and LLM/agent work | Teams focused purely on LLM API usage |
| Weight | Heavier (more features, more ops) | Lightweight, fast to set up |
| Governance | Model registry + webhooks for CI/CD approvals | Prompt management + annotation |
| LLM Gateway | Yes (AI Gateway — multi-provider routing) | No |
| Experiment tracking | Yes (original core use case) | Limited |
If your team only makes LLM API calls and wants fast, lightweight observability, start with Langfuse. If you're running experiments, fine-tuning models, managing model versions, or building multi-agent pipelines that need stronger governance, MLflow is the better fit.
## Key Capabilities for Agentic Engineering
### Agent Tracing & Observability
MLflow 3.x provides OpenTelemetry-native tracing with zero-code auto-instrumentation for LangChain, LlamaIndex, AutoGen, Pydantic AI, and DSPy:
```python
import mlflow
import mlflow.langchain

mlflow.langchain.autolog()  # all LangChain calls traced automatically

# Or trace any function manually:
@mlflow.trace
def run_agent(user_input: str) -> str:
    ...
```
Full DAG capture for parallel tool calls, conditional branches, and iterative reasoning loops. Every span captures latency, token usage, and cost.
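To illustrate what per-span accounting enables, here is a dependency-free sketch (plain Python, not MLflow's API; the field names are illustrative) of aggregating token usage and cost across the spans of a single trace:

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Simplified stand-in for a trace span; fields are illustrative."""
    name: str
    latency_ms: float
    input_tokens: int
    output_tokens: int


def trace_cost(spans: list[Span], usd_per_1k_in: float, usd_per_1k_out: float) -> float:
    """Sum the cost of all LLM spans in one trace."""
    return sum(
        s.input_tokens / 1000 * usd_per_1k_in + s.output_tokens / 1000 * usd_per_1k_out
        for s in spans
    )


spans = [
    Span("plan", 420.0, 900, 150),
    Span("tool:search", 130.0, 0, 0),  # tool call, no LLM tokens
    Span("answer", 610.0, 1200, 400),
]
print(trace_cost(spans, usd_per_1k_in=0.005, usd_per_1k_out=0.015))
```

In MLflow itself these numbers are captured automatically on each span; the sketch only shows the kind of rollup the UI performs over them.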
### LLM Evaluation
50+ built-in metrics and LLM-as-a-judge scorers for safety, relevance, groundedness, and correctness. Supports multi-turn conversation evaluation and continuous monitoring — run judges on incoming traces without code changes.
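Setting MLflow's built-in judges aside, a scorer is conceptually just a function from inputs and outputs to a score plus a rationale. A dependency-free sketch of that contract (names here are illustrative, not MLflow's API; a real LLM-as-a-judge scorer would delegate the judgment to a model instead of a heuristic):

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """Result of one scorer: a named score with a human-readable rationale."""
    name: str
    score: float  # e.g. 0.0 - 1.0
    rationale: str


def keyword_groundedness(answer: str, context: str) -> Feedback:
    """Toy heuristic: fraction of answer words that appear in the context."""
    answer_words = {w.lower().strip(".,") for w in answer.split()}
    context_words = {w.lower().strip(".,") for w in context.split()}
    if not answer_words:
        return Feedback("groundedness", 0.0, "empty answer")
    overlap = len(answer_words & context_words) / len(answer_words)
    return Feedback("groundedness", overlap, f"{overlap:.0%} of answer words found in context")


fb = keyword_groundedness(
    answer="MLflow is Apache licensed",
    context="MLflow is an open-source platform under the Apache 2.0 license",
)
print(fb.score, fb.rationale)
```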
### AI Gateway
A unified OpenAI-compatible endpoint that routes across providers (OpenAI, Anthropic, Azure OpenAI, Cohere, Amazon Bedrock, Mistral, and more). Centralises API key management, adds rate limiting and per-user budgets, and enables cost-aware provider routing — without changes to application code.
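Endpoints are declared in a YAML config handed to the gateway server. A minimal sketch, assuming two chat endpoints behind one gateway (the exact schema and field names have changed across MLflow releases, so check the docs for your version):

```yaml
# Illustrative gateway config; schema varies by MLflow version.
endpoints:
  - name: chat
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4o
      config:
        openai_api_key: $OPENAI_API_KEY
  - name: chat-claude
    endpoint_type: llm/v1/chat
    model:
      provider: anthropic
      name: claude-3-5-sonnet-latest
      config:
        anthropic_api_key: $ANTHROPIC_API_KEY
```

Keys live in the config (resolved from environment variables), so application code only ever sees the gateway's endpoint names.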
### Prompt Registry
Git-inspired versioning with immutable commits, diff highlighting, and environment aliases (beta, staging, production). Attach model configuration (temperature, max tokens) to the prompt version for full reproducibility.
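Conceptually, the registry stores immutable, numbered versions while aliases are mutable pointers into that history. A small sketch of that data model (plain Python, not MLflow's API):

```python
class PromptRegistry:
    """Toy model: versions are append-only; aliases point at a version."""

    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}
        self._aliases: dict[str, dict[str, int]] = {}

    def register(self, name: str, template: str) -> int:
        """Append a new immutable version; returns its 1-based number."""
        versions = self._versions.setdefault(name, [])
        versions.append(template)  # never mutated after append
        return len(versions)

    def set_alias(self, name: str, alias: str, version: int) -> None:
        """Repoint an alias (e.g. 'production') at a specific version."""
        self._aliases.setdefault(name, {})[alias] = version

    def load(self, name: str, alias: str) -> str:
        return self._versions[name][self._aliases[name][alias] - 1]


reg = PromptRegistry()
reg.register("summarize", "Summarize: {{text}}")
v2 = reg.register("summarize", "Summarize in one sentence: {{text}}")
reg.set_alias("summarize", "production", 1)  # prod pinned to v1
reg.set_alias("summarize", "staging", v2)    # staging tries v2
print(reg.load("summarize", "production"))
```

Because versions are immutable and aliases move independently, promoting a prompt is just repointing an alias, and rollback is the same operation in reverse.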
### Model Registry & Governance
Version agents and models with full lineage. Webhook-driven registry events enable approval workflows and CI/CD integration — stage changes through dev → staging → production with automated gates.
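A webhook consumer implementing such a gate might look like the following sketch (the payload fields here are hypothetical; consult MLflow's webhook documentation for the real event schema):

```python
def approve_transition(event: dict) -> bool:
    """Toy CI gate: allow promotion to production only from staging,
    and only when the event carries a passing evaluation flag.
    The payload shape is illustrative, not MLflow's actual schema."""
    if event.get("to_stage") != "production":
        return True  # only gate promotions into production
    return (
        event.get("from_stage") == "staging"
        and event.get("evaluation_passed") is True
    )


print(approve_transition(
    {"from_stage": "staging", "to_stage": "production", "evaluation_passed": True}
))
```

In practice the gate would run in CI, triggered by the registry's webhook, and approve or reject the stage transition via the registry API.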
## Key Characteristics
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Current version | MLflow 3.x (MLflow 3.0 launched 2025) |
| Self-hostable | Yes (Python server, Postgres/S3 backend) |
| Managed option | Databricks Managed MLflow |
| Frameworks | LangChain, LlamaIndex, AutoGen, Pydantic AI, DSPy, and more |
| Languages | Python, TypeScript/JavaScript, Java, R |
| OpenTelemetry | Yes (native OTLP export) |
| GitHub | mlflow/mlflow (20K+ stars) |
| Website | mlflow.org |