Tracing
Structured execution traces for multi-agent workflow observability.
Overview
The tracing module captures the full execution flow of an Orchestra run as a hierarchical span tree, enabling post-hoc analysis of:
- Agent steps: Which agents ran, in what order, and how long each took
- LLM generations: Token usage, model identity, API latency, and finish reason per generation
- Validation decisions: What action type was determined and what routing was chosen
- Tool calls: Which tools were invoked, their arguments, results, and duration
- Branch lifecycle: Branch creation, parallel execution, and convergence
- Convergence: Which child branches fed into which parent, and what was aggregated
Traces are written as structured JSON files, one per execution session.
Core Components
TracingConfig
Controls whether tracing is enabled and what detail level to capture:
    from marsys.coordination.tracing.config import TracingConfig

    config = TracingConfig(
        enabled=True,
        output_dir="./traces",
        detail_level="verbose",  # minimal | standard | verbose
    )
TraceCollector
EventBus consumer that subscribes to execution events and builds a span tree:
from marsys.coordination.tracing import TraceCollector
You don't create this directly — Orchestra creates it automatically when TracingConfig.enabled=True.
Span
A single unit of work in the trace. Spans form a tree via parent_span_id:
from marsys.coordination.tracing.types import Span
Span kinds: execution, branch, step, generation, tool.
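To illustrate how spans link into a tree through parent_span_id, here is a minimal, self-contained sketch. `SpanSketch` and `build_tree` are hypothetical stand-ins, not the library's `Span` type. Only `parent_span_id`, `name`, and `kind` echo names used on this page; `span_id` is an assumed field name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpanSketch:
    # Hypothetical stand-in for the library's Span; field names are assumptions.
    span_id: str
    name: str
    kind: str  # one of: execution, branch, step, generation, tool
    parent_span_id: Optional[str] = None
    children: list["SpanSketch"] = field(default_factory=list)


def build_tree(spans: list[SpanSketch]) -> SpanSketch:
    """Attach each span to its parent and return the single root span."""
    by_id = {s.span_id: s for s in spans}
    roots = []
    for span in spans:
        if span.parent_span_id is None:
            roots.append(span)
        else:
            by_id[span.parent_span_id].children.append(span)
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root span, found {len(roots)}")
    return roots[0]


spans = [
    SpanSketch("s1", "Orchestra.run", "execution"),
    SpanSketch("s2", "Branch: main", "branch", parent_span_id="s1"),
    SpanSketch("s3", "Step 0: Coordinator", "step", parent_span_id="s2"),
    SpanSketch("s4", "Generation: example-model", "generation", parent_span_id="s3"),
]
root = build_tree(spans)
```

The root is the one span without a parent, which corresponds to the execution span of a run.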
Validation and Convergence
Validation decisions are captured as events on step spans (not separate spans). Convergence is captured as links and events on both the parent branch span and the convergence step span.
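Because validation decisions live in the events list of step spans, they can be collected by walking the tree and filtering on the event name. The sketch below works on a plain dict shaped like the trace JSON shown under Trace Output. The helper names `iter_spans` and `validation_decisions` are hypothetical and not part of the tracing module.

```python
from typing import Any, Iterator


def iter_spans(span: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield a span and all of its descendants, depth first."""
    yield span
    for child in span.get("children", []):
        if isinstance(child, dict):  # skip placeholder strings in abbreviated traces
            yield from iter_spans(child)


def validation_decisions(root_span: dict[str, Any]) -> list[tuple[str, dict]]:
    """Return (step name, event attributes) for each validation_decision event."""
    found = []
    for span in iter_spans(root_span):
        if span.get("kind") != "step":
            continue
        for event in span.get("events", []):
            if event.get("name") == "validation_decision":
                found.append((span["name"], event.get("attributes", {})))
    return found


example_root = {
    "name": "Orchestra.run",
    "kind": "execution",
    "children": [{
        "name": "Branch: main",
        "kind": "branch",
        "children": [{
            "name": "Step 0: Coordinator",
            "kind": "step",
            "events": [{
                "name": "validation_decision",
                "attributes": {"next_agents": ["Researcher", "FactChecker"]},
            }],
        }],
    }],
}
decisions = validation_decisions(example_root)
```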
TraceTree
A complete trace rooted at an execution span:
from marsys.coordination.tracing.types import TraceTree
TraceWriter
Abstract base for output backends. JSONFileTraceWriter is the built-in implementation:
from marsys.coordination.tracing.writers import JSONFileTraceWriter
Basic Usage
Enable tracing by adding TracingConfig to your ExecutionConfig:
    from marsys.coordination import Orchestra
    from marsys.coordination.config import ExecutionConfig
    from marsys.coordination.tracing.config import TracingConfig

    # topology and AgentRegistry are assumed to be defined or imported elsewhere.
    result = await Orchestra.run(
        task="Research quantum computing",
        topology=topology,
        agent_registry=AgentRegistry,
        execution_config=ExecutionConfig(
            tracing=TracingConfig(
                enabled=True,
                output_dir="./traces",
            ),
        ),
    )
After execution, find the trace at ./traces/{session_id}.json.
For auto_run, the same config applies since it uses Orchestra internally:
    result = await agent.auto_run(
        "Research AI trends",
        max_steps=10,
        execution_config=ExecutionConfig(
            tracing=TracingConfig(enabled=True),
        ),
    )
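Once a run has finished, the trace file can be read back with the standard library alone. The following sketch assumes the ./traces/{session_id}.json layout described above and the `root_span`, `name`, `status`, and `duration_ms` keys from the sample output on this page. `load_trace` and `summarize` are hypothetical helpers, and the demo writes its own sample file to a temporary directory so that it runs without MARSYS.

```python
import json
import tempfile
from pathlib import Path


def load_trace(output_dir: str, session_id: str) -> dict:
    """Read the JSON trace for one session from output_dir."""
    path = Path(output_dir) / f"{session_id}.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize(trace: dict) -> str:
    """Build a one-line summary from the root span."""
    root = trace["root_span"]
    status = root.get("status", "unknown")
    duration = root.get("duration_ms", 0.0)
    return f"{root['name']}: {status} in {duration:.1f} ms"


# Demo: write a tiny stand-in trace, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = {
        "session_id": "session-001",
        "root_span": {
            "name": "Orchestra.run",
            "kind": "execution",
            "duration_ms": 12450.3,
            "status": "ok",
            "children": [],
        },
    }
    (Path(tmp) / "session-001.json").write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize(load_trace(tmp, "session-001"))
```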
Trace Output
The JSON output is a tree of spans. Here is an abbreviated example:
    {
      "trace_id": "5228247d-...",
      "session_id": "session-001",
      "metadata": {
        "task_summary": "Research quantum computing",
        "agent_names": ["Coordinator", "Researcher", "FactChecker"]
      },
      "root_span": {
        "name": "Orchestra.run",
        "kind": "execution",
        "duration_ms": 12450.3,
        "status": "ok",
        "children": [
          {
            "name": "Branch: main",
            "kind": "branch",
            "links": [{"linked_span_id": "...", "relationship": "convergence"}],
            "children": [
              {
                "name": "Step 0: Coordinator",
                "kind": "step",
                "attributes": { "action_type": "parallel_invoke" },
                "events": [
                  {
                    "name": "validation_decision",
                    "attributes": {"next_agents": ["Researcher", "FactChecker"]}
                  }
                ],
                "children": [
                  {
                    "name": "Generation: claude-sonnet-4-20250514",
                    "kind": "generation",
                    "duration_ms": 1200.5,
                    "attributes": {"prompt_tokens": 150, "completion_tokens": 80}
                  }
                ]
              },
              {
                "name": "Step 1: Coordinator",
                "kind": "step",
                "attributes": { "action_type": "final_response" },
                "links": [
                  {"linked_span_id": "researcher-branch-id", "relationship": "convergence"},
                  {"linked_span_id": "factchecker-branch-id", "relationship": "convergence"}
                ],
                "events": [
                  {
                    "name": "convergence",
                    "attributes": {"successful_count": 2, "total_count": 2}
                  }
                ]
              }
            ]
          },
          {
            "name": "Branch: Researcher",
            "kind": "branch",
            "attributes": { "trigger_type": "parallel" },
            "children": ["... step spans with generation/tool children ..."]
          },
          {
            "name": "Branch: FactChecker",
            "kind": "branch",
            "attributes": { "trigger_type": "parallel" },
            "children": ["... step spans with generation/tool children ..."]
          }
        ]
      }
    }
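One common use of this output is totalling token usage across a run. The sketch below sums the `prompt_tokens` and `completion_tokens` attributes of every generation span, using only the keys visible in the abbreviated example. `token_totals` is a hypothetical helper, and the second generation span in the demo data is invented for illustration.

```python
def token_totals(root_span: dict) -> dict:
    """Sum prompt and completion tokens over all generation spans."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    stack = [root_span]
    while stack:
        span = stack.pop()
        if span.get("kind") == "generation":
            attrs = span.get("attributes", {})
            for key in totals:
                totals[key] += attrs.get(key, 0)
        # Abbreviated traces may contain placeholder strings; keep dicts only.
        stack.extend(c for c in span.get("children", []) if isinstance(c, dict))
    return totals


demo_root = {
    "name": "Orchestra.run",
    "kind": "execution",
    "children": [{
        "name": "Branch: main",
        "kind": "branch",
        "children": [{
            "name": "Step 0: Coordinator",
            "kind": "step",
            "children": [
                {"kind": "generation",
                 "attributes": {"prompt_tokens": 150, "completion_tokens": 80}},
                {"kind": "generation",  # invented second call, for illustration
                 "attributes": {"prompt_tokens": 200, "completion_tokens": 40}},
            ],
        }],
    }],
}
totals = token_totals(demo_root)
```

Since the minimal detail level omits attributes, totals computed from such a trace would presumably stay at zero.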
Span Hierarchy
Each execution produces a tree with this structure:
    Execution Span (one per Orchestra.run)
    ├── Branch Span (initial branch, e.g. "main")
    │   ├── Step Span (agent step that triggers parallel_invoke)
    │   │   ├── Generation Span (LLM call details)
    │   │   ├── Tool Span (per tool invocation)
    │   │   └── Validation Event (routing decision, stored as event, not span)
    │   ├── Step Span (convergence step, receives aggregated results)
    │   │   ├── links: [{child_branch_1}, {child_branch_2}] ← convergence
    │   │   ├── events: [convergence, validation_decision]
    │   │   └── Generation Span ...
    │   └── ...
    ├── Branch Span (parallel child 1)
    │   └── Step Span → Generation → Tool → ...
    ├── Branch Span (parallel child 2)
    │   └── Step Span → Generation → Tool → ...
    └── (Convergence links also on parent branch span)
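The hierarchy above can be reproduced from a trace dict with a short recursive walk. This is a sketch under the assumption that spans nest through a `children` list, as in the sample output. `render` and `count_kinds` are hypothetical helpers, and the span names in the demo data (including the tool span name) are made up.

```python
from collections import Counter


def render(span: dict, depth: int = 0) -> list[str]:
    """Return one indented line per span, parents before children."""
    indent = "  " * depth
    lines = [f"{indent}[{span.get('kind', '?')}] {span.get('name', '')}"]
    for child in span.get("children", []):
        if isinstance(child, dict):  # abbreviated traces may hold placeholder strings
            lines.extend(render(child, depth + 1))
    return lines


def count_kinds(span: dict) -> Counter:
    """Count spans of each kind in the subtree rooted at span."""
    counts = Counter({span.get("kind", "?"): 1})
    for child in span.get("children", []):
        if isinstance(child, dict):
            counts.update(count_kinds(child))
    return counts


tree = {
    "name": "Orchestra.run", "kind": "execution",
    "children": [
        {"name": "Branch: main", "kind": "branch", "children": [
            {"name": "Step 0: Coordinator", "kind": "step", "children": [
                {"name": "Generation: example-model", "kind": "generation"},
                {"name": "Tool: example_tool", "kind": "tool"},
            ]},
        ]},
        {"name": "Branch: Researcher", "kind": "branch", "children": []},
    ],
}
lines = render(tree)
kinds = count_kinds(tree)
print("\n".join(lines))
```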
Detail Levels
The detail_level setting controls how much information is captured in each trace:
| Level | Type | Default | Description |
|---|---|---|---|
| minimal | string | No | Span hierarchy and timing only, no attributes. Best for performance profiling. |
| standard | string | No | All spans with attributes; content is truncated to max_content_length. Best for production with size limits. |
| verbose | string | Yes | Everything, including full message content. This is the default level and provides full visibility during development. |
Best Practices
1. Use Meaningful Session IDs
context = {"session_id": "research_quantum_2026_04"}
This makes trace files easy to find and correlate.
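One way to keep session IDs consistent is to derive them from a short label and a date. The helper below is a hypothetical sketch, not part of MARSYS; the label-plus-year-month naming scheme is an assumption modelled on the example ID above.

```python
import re
from datetime import date


def make_session_id(label: str, when: date) -> str:
    """Build an ID such as 'research_quantum_2026_04' from a label and a date."""
    # Lowercase the label and collapse anything non-alphanumeric into underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    return f"{slug}_{when:%Y_%m}"


context = {"session_id": make_session_id("Research Quantum", date(2026, 4, 1))}
```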
2. Choose the Right Detail Level
Use verbose (default) during development for full visibility. Use minimal in production for low overhead. Use standard with max_content_length when you want attributes but need to control file size.
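The guidance above can be captured in a small lookup that returns keyword arguments per environment. This is a sketch only: the environment names and the value 2000 are hypothetical, and it assumes that `detail_level` and `max_content_length` are accepted as TracingConfig keyword arguments, which this page implies but does not show explicitly for the latter.

```python
def tracing_settings(environment: str) -> dict:
    """Return tracing keyword arguments suited to the given environment."""
    if environment == "development":
        # Full visibility, including message content.
        return {"enabled": True, "detail_level": "verbose"}
    if environment == "production":
        # Hierarchy and timing only, for low overhead.
        return {"enabled": True, "detail_level": "minimal"}
    if environment == "production_with_attributes":
        # Attributes kept, content truncated; 2000 is an illustrative limit.
        return {"enabled": True, "detail_level": "standard", "max_content_length": 2000}
    raise ValueError(f"unknown environment: {environment!r}")


settings = tracing_settings("production")
```

Under those assumptions, the result could be passed along as `TracingConfig(**tracing_settings("production"))`.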
3. Be Mindful of Content Sensitivity
By default, trace files contain full LLM prompts and responses. Keep trace output directories secure and excluded from version control.
Sensitive Content in Traces
Set include_message_content=False or use detail_level="minimal" to exclude full message content from trace output. Always keep trace directories out of version control.
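Keeping the trace directory out of version control can be automated with a small check. The sketch below appends an ignore entry to a repository's .gitignore only if it is missing. `ensure_ignored` is a hypothetical helper, and "traces/" is an assumed entry matching the ./traces output directory used in the examples on this page.

```python
import tempfile
from pathlib import Path


def ensure_ignored(repo_root: str, trace_dir: str = "traces/") -> bool:
    """Add trace_dir to .gitignore if missing; return True if the file changed."""
    gitignore = Path(repo_root) / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    entries = {line.strip() for line in existing.splitlines()}
    if trace_dir in entries:
        return False
    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with gitignore.open("a", encoding="utf-8") as f:
        f.write(f"{prefix}{trace_dir}\n")
    return True


# Demo in a throwaway directory: the first call adds the entry, the second is a no-op.
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / ".gitignore").write_text("__pycache__/", encoding="utf-8")
    first = ensure_ignored(repo)
    second = ensure_ignored(repo)
    final_text = (Path(repo) / ".gitignore").read_text(encoding="utf-8")
```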
Limitations
- Only JSONFileTraceWriter is implemented (Chrome Trace Format export is planned as a follow-up)
- Traces are written on completion; there is no streaming or incremental output during execution
- Tracing for standalone agent.run() calls (outside Orchestra) is not supported
Related Documentation
Tracing API Reference
Complete class and method reference for the tracing module
Configuration
ExecutionConfig setup and configuration options
Architecture Overview
How tracing fits into the overall MARSYS system
Tracing Ready!
You now understand how to enable and configure execution tracing in MARSYS. Use traces to debug agent workflows, profile performance, and gain full observability into multi-agent orchestration.