Tracing API

Complete API reference for execution tracing and observability in multi-agent workflows.

Overview

The Tracing API provides structured execution traces that capture the full hierarchy of an Orchestra run — from top-level execution through branches, agent steps, LLM generations, tool calls, and validation decisions.

Core Classes

TracingConfig

Configuration for execution tracing.

Import

from marsys.coordination.tracing.config import TracingConfig

Constructor

```python
TracingConfig(
    enabled: bool = False,
    output_dir: str = "./traces",
    detail_level: str = "verbose",
    include_generation_details: bool = True,
    include_message_content: bool = True,
    include_tool_results: bool = True,
    max_content_length: int = 0
)
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | bool | False | Enable trace collection |
| output_dir | str | "./traces" | Directory for trace JSON files |
| detail_level | str | "verbose" | "minimal", "standard", or "verbose" |
| include_generation_details | bool | True | Include token counts and model info in generation spans |
| include_message_content | bool | True | Include full prompt/response content |
| include_tool_results | bool | True | Include tool call results in tool spans |
| max_content_length | int | 0 | Truncation length for string attributes in standard mode (0 = no truncation) |

TraceCollector

EventBus consumer that builds hierarchical span trees from execution events.

Import

from marsys.coordination.tracing import TraceCollector

Constructor

```python
TraceCollector(
    event_bus: EventBus,
    config: TracingConfig,
    writers: Optional[List[TraceWriter]] = None
)
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| event_bus | EventBus | Required | Event bus to subscribe to |
| config | TracingConfig | Required | Tracing configuration |
| writers | Optional[List[TraceWriter]] | None | Output backends for completed traces |

Automatic Creation

You don't create TraceCollector directly. Orchestra creates it when TracingConfig.enabled=True.

Key Methods

finalize(session_id) -> Optional[TraceTree]

async def finalize(session_id: str) -> Optional[TraceTree]

Finalize the trace for a session. Closes any open spans (marking them as error), computes durations, and writes via all registered writers. Called automatically by Orchestra in a try/finally block.
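The following minimal sketch shows what the finalize step does: it force-closes any spans that are still open, marks them as errors, and computes durations. It does not use marsys. `SketchSpan` and `close_open_spans` are hypothetical stand-ins, and stamping every open span with a single "now" timestamp is an assumption about how the real collector behaves.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchSpan:
    """Hypothetical stand-in holding only the fields finalize needs."""
    name: str
    start_time: float
    end_time: Optional[float] = None
    duration_ms: Optional[float] = None
    status: str = "ok"
    children: List["SketchSpan"] = field(default_factory=list)


def close_open_spans(span: SketchSpan, now: Optional[float] = None) -> int:
    """Close every still-open span in the tree, marking it as an error.

    Returns the number of spans that had to be force-closed.
    """
    now = time.time() if now is None else now
    forced = 0
    # Handle the subtree first so children are closed before their parent.
    for child in span.children:
        forced += close_open_spans(child, now)
    if span.end_time is None:
        span.end_time = now
        span.status = "error"
        forced += 1
    # Duration is derived from the (possibly just-assigned) end time.
    span.duration_ms = (span.end_time - span.start_time) * 1000.0
    return forced
```

In this sketch, a span that closed normally keeps its "ok" status and its own end time. Only spans left open when the session ends are flagged.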

close() -> None

async def close() -> None

Shut down all writers and release resources.

Span

A single unit of work in the execution trace.

Import

from marsys.coordination.tracing.types import Span

Fields

| Field | Type | Description |
|---|---|---|
| span_id | str | Unique identifier (UUID) |
| parent_span_id | Optional[str] | Parent span for tree nesting |
| trace_id | str | Session-level trace identifier |
| name | str | Human-readable name (e.g., "Step 3: Researcher") |
| kind | str | execution, branch, step, generation, or tool |
| start_time | float | Epoch seconds |
| end_time | Optional[float] | Epoch seconds (set on close) |
| duration_ms | Optional[float] | Computed on close |
| status | str | "ok" or "error" |
| attributes | Dict[str, Any] | Kind-specific data |
| events | List[Dict] | Instant events attached to this span |
| children | List[Span] | Child spans |
| links | List[Dict] | Cross-branch causal links |

Key Methods

close(end_time, status) -> None

def close(end_time: Optional[float] = None, status: Optional[str] = None) -> None

Close the span, computing duration_ms from start_time to end_time.

add_event(name, attributes) -> None

def add_event(name: str, attributes: Optional[Dict[str, Any]] = None) -> None

Add an instant event (e.g., validation decision) to this span.

to_dict() -> Dict[str, Any]

def to_dict() -> Dict[str, Any]

Serialize the span tree to a nested dict for JSON output.
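The sketch below is a simplified, self-contained mirror of the Span fields and methods above. It can help when reasoning about the tree shape, but it is not marsys's class. `MiniSpan` and its `child()` helper are hypothetical. Treating a missing `end_time` as "now" is an assumption, and so is the event dict layout (`name`, `timestamp`, `attributes`).

```python
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MiniSpan:
    """Hypothetical, simplified mirror of the documented Span fields."""
    name: str
    kind: str
    trace_id: str
    parent_span_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    start_time: float = field(default_factory=time.time)
    end_time: Optional[float] = None
    duration_ms: Optional[float] = None
    status: str = "ok"
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[Dict[str, Any]] = field(default_factory=list)
    children: List["MiniSpan"] = field(default_factory=list)
    links: List[Dict[str, Any]] = field(default_factory=list)

    def close(self, end_time: Optional[float] = None, status: Optional[str] = None) -> None:
        # Assumption: a missing end_time means "close it now".
        self.end_time = time.time() if end_time is None else end_time
        if status is not None:
            self.status = status
        self.duration_ms = (self.end_time - self.start_time) * 1000.0

    def add_event(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None:
        # Assumed event layout: a name, a timestamp and a copy of the attributes.
        self.events.append({"name": name, "timestamp": time.time(), "attributes": dict(attributes or {})})

    def child(self, name: str, kind: str, **kwargs: Any) -> "MiniSpan":
        # Hypothetical helper: create a nested span that shares this trace_id.
        span = MiniSpan(name=name, kind=kind, trace_id=self.trace_id, parent_span_id=self.span_id, **kwargs)
        self.children.append(span)
        return span

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into the children list, giving a nested dict.
        return asdict(self)
```

For example, `root.child("Step 1: Researcher", "step")` creates a step span under an execution span. Calling `root.to_dict()` then returns the nested structure that would be written as JSON.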

TraceTree

A complete execution trace rooted at an execution span.

Import

from marsys.coordination.tracing.types import TraceTree

Fields

| Field | Type | Description |
|---|---|---|
| trace_id | str | Unique trace identifier |
| session_id | str | Orchestra session ID |
| root_span | Span | Root execution span containing the full tree |
| metadata | Dict[str, Any] | Task summary, agent names, etc. |

Key Methods

to_dict() -> Dict[str, Any]

def to_dict() -> Dict[str, Any]

Serialize the full trace tree for JSON output.
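After a trace is serialized, you can walk the nested dict like any other tree. The sketch below counts spans per kind and lists the slowest spans. It assumes the serialized spans use the field names from the Span table (`name`, `kind`, `duration_ms`, `children`) and that the root sits under a `root_span` key. `iter_spans` and `summarize` are hypothetical helpers.

```python
from collections import Counter
from typing import Any, Dict, Iterator, List, Tuple


def iter_spans(span: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield a span dict and all of its descendants, depth first."""
    stack = [span]
    while stack:
        current = stack.pop()
        yield current
        # Push children reversed so they are visited in their original order.
        stack.extend(reversed(current.get("children", [])))


def summarize(trace_dict: Dict[str, Any], top_n: int = 3) -> Tuple[Counter, List[Tuple[str, float]]]:
    """Count spans per kind and list the slowest spans by duration_ms."""
    # Assumption: TraceTree.to_dict() nests the root span under "root_span".
    root = trace_dict.get("root_span", trace_dict)
    spans = list(iter_spans(root))
    kinds = Counter(s.get("kind", "unknown") for s in spans)
    # Spans that were never closed have no duration and are skipped.
    timed = [(s.get("name", "?"), s["duration_ms"]) for s in spans if s.get("duration_ms") is not None]
    slowest = sorted(timed, key=lambda item: item[1], reverse=True)[:top_n]
    return kinds, slowest
```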

TraceWriter

Abstract base for trace output backends.

Import

from marsys.coordination.tracing.writers.base import TraceWriter

Methods

| Method | Description |
|---|---|
| async write(trace: TraceTree) -> None | Write a completed trace |
| async close() -> None | Release resources |

JSONFileTraceWriter

Writes traces as structured JSON files.

Import

from marsys.coordination.tracing.writers import JSONFileTraceWriter

Constructor

JSONFileTraceWriter(config: TracingConfig)

Output: {config.output_dir}/{session_id}.json

The writer respects detail_level from the config:

  • minimal: Span hierarchy and timing only
  • standard: All attributes, strings truncated to max_content_length
  • verbose: Full content, no truncation
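One way to picture the three detail levels is as a filter over serialized spans. This is an illustrative sketch, not the writer's code. `apply_detail_level` is hypothetical, and interpreting "hierarchy and timing only" as clearing `attributes` and `events` is an assumption.

```python
from typing import Any, Dict


def apply_detail_level(span: Dict[str, Any], detail_level: str, max_content_length: int = 0) -> Dict[str, Any]:
    """Return a filtered shallow copy of a serialized span tree (illustrative only)."""
    if detail_level not in ("minimal", "standard", "verbose"):
        raise ValueError(f"unknown detail_level: {detail_level!r}")
    out = dict(span)
    if detail_level == "minimal":
        # Hierarchy and timing only: drop kind-specific payloads.
        out["attributes"] = {}
        out["events"] = []
    elif detail_level == "standard" and max_content_length > 0:
        # Truncate string attributes; 0 means "no truncation".
        out["attributes"] = {
            key: value[:max_content_length] if isinstance(value, str) else value
            for key, value in span.get("attributes", {}).items()
        }
    # "verbose" leaves the span untouched; every level recurses into children.
    out["children"] = [apply_detail_level(c, detail_level, max_content_length) for c in span.get("children", [])]
    return out
```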

Trace Events

Events emitted by execution components and consumed by TraceCollector. All extend StatusEvent.

ExecutionStartEvent

Emitted when Orchestra.execute() begins.

| Field | Type | Description |
|---|---|---|
| task_summary | str | Truncated task description |
| topology_summary | Dict | Node and edge counts |
| agent_names | List[str] | Agents in the topology |
| config_summary | Dict | Key config values |

GenerationEvent

Emitted after each LLM generation completes.

| Field | Type | Description |
|---|---|---|
| agent_name | str | Agent that ran the generation |
| step_number | int | Step index in the branch |
| step_span_id | str | Correlation ID for the step span |
| model_name | str | Model identifier |
| provider | str | Provider name |
| prompt_tokens | Optional[int] | Input tokens |
| completion_tokens | Optional[int] | Output tokens |
| reasoning_tokens | Optional[int] | Reasoning tokens (o1/o3) |
| response_time_ms | Optional[float] | API latency in milliseconds |
| finish_reason | Optional[str] | Why the model stopped |
| has_thinking | bool | Whether thinking content was present |
| has_tool_calls | bool | Whether tool calls were requested |
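Generation spans carry the token counts from GenerationEvent when include_generation_details is True. That makes it possible to sum per-model usage from a serialized trace. The sketch assumes generation spans have `kind == "generation"` and store `model_name`, `prompt_tokens`, `completion_tokens`, and `reasoning_tokens` under `attributes` with those exact names. `token_usage_by_model` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict

TOKEN_FIELDS = ("prompt_tokens", "completion_tokens", "reasoning_tokens")


def token_usage_by_model(span: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Sum token counts over all generation spans, grouped by model_name."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: dict.fromkeys(TOKEN_FIELDS, 0))
    stack = [span]
    while stack:
        current = stack.pop()
        stack.extend(current.get("children", []))
        if current.get("kind") != "generation":
            continue
        attrs = current.get("attributes", {})
        model = attrs.get("model_name") or "unknown"
        for name in TOKEN_FIELDS:
            # Token counts are Optional; a missing or None value counts as 0.
            totals[model][name] += attrs.get(name) or 0
    return dict(totals)
```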

ValidationDecisionEvent

Emitted after response validation determines the next action.

| Field | Type | Description |
|---|---|---|
| agent_name | str | Agent whose response was validated |
| step_number | int | Step index |
| step_span_id | str | Step span correlation |
| is_valid | bool | Whether validation passed |
| action_type | str | Determined action (e.g., invoke_agent, final_response) |
| next_agents | List[str] | Target agent(s) for routing |
| error_category | Optional[str] | Error classification if invalid |
| is_tool_continuation | bool | Whether this is a tool continuation bypass |
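The Span docs cite validation decisions as an example of instant events, so a routing log can be rebuilt from a serialized trace. In this sketch, the event layout (an `attributes` dict holding the ValidationDecisionEvent fields) is an assumption. `routing_log` is a hypothetical helper that recognizes decisions by their `action_type` attribute rather than by event name.

```python
from typing import Any, Dict, List


def routing_log(span: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect validation decisions recorded as events anywhere in a span tree."""
    decisions: List[Dict[str, Any]] = []
    stack = [span]
    while stack:
        current = stack.pop()
        stack.extend(current.get("children", []))
        for event in current.get("events", []):
            attrs = event.get("attributes") or {}
            # Assumption: decision events carry an action_type attribute.
            if "action_type" in attrs:
                decisions.append(attrs)
    # Order by step index so the log reads in step order.
    return sorted(decisions, key=lambda d: d.get("step_number", 0))
```

Filtering the result with `not d["is_valid"]` then lists the steps whose responses failed validation, along with their `error_category`.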

ConvergenceEvent

Emitted when parallel branches converge. The collector attaches convergence links and events to both the parent branch span and the next step span on that branch (the convergence step that receives aggregated results).

| Field | Type | Description |
|---|---|---|
| parent_branch_id | str | Branch receiving converged results |
| child_branch_ids | List[str] | Branches that converged |
| convergence_point | str | Convergence node name |
| group_id | str | Parallel group identifier |
| successful_count | int | Successfully completed branches |
| total_count | int | Total branches in group |

Usage Patterns

Enable Tracing

enable_tracing.py

```python
from marsys.coordination.config import ExecutionConfig
from marsys.coordination.tracing.config import TracingConfig

config = ExecutionConfig(
    tracing=TracingConfig(enabled=True, output_dir="./traces"),
)
```

Custom Writer

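custom_writer.py

```python
from marsys.coordination.tracing.writers.base import TraceWriter
from marsys.coordination.tracing.types import TraceTree


async def send_to_backend(data: dict) -> None:
    """Placeholder: replace with a call to your observability backend."""
    raise NotImplementedError


class MyTraceWriter(TraceWriter):
    async def write(self, trace: TraceTree) -> None:
        data = trace.to_dict()
        await send_to_backend(data)  # send to your observability backend

    async def close(self) -> None:
        pass  # nothing to release in this example
```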
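In tests, a writer that keeps traces in memory can be handy. To let the sketch run without marsys installed, it defines a local abstract base with the same two async methods as TraceWriter. In real code you would subclass marsys's TraceWriter instead. `TraceWriterBase`, `InMemoryTraceWriter`, and `FakeTrace` are hypothetical names.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class TraceWriterBase(ABC):
    """Local stand-in for marsys's TraceWriter (same two async methods)."""

    @abstractmethod
    async def write(self, trace: Any) -> None: ...

    @abstractmethod
    async def close(self) -> None: ...


class InMemoryTraceWriter(TraceWriterBase):
    """Keeps serialized traces in a list, e.g. for assertions in tests."""

    def __init__(self) -> None:
        self.traces: List[Dict[str, Any]] = []
        self.closed = False

    async def write(self, trace: Any) -> None:
        if self.closed:
            raise RuntimeError("writer already closed")
        # Store the serialized dict rather than the live trace object.
        self.traces.append(trace.to_dict())

    async def close(self) -> None:
        self.closed = True


class FakeTrace:
    """Hypothetical object exposing to_dict() like TraceTree."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id

    def to_dict(self) -> Dict[str, Any]:
        return {"session_id": self.session_id}


async def demo() -> InMemoryTraceWriter:
    writer = InMemoryTraceWriter()
    await writer.write(FakeTrace("session-1"))
    await writer.close()
    return writer
```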

Access Trace Programmatically

access_trace.py

```python
# Inside an async function, after Orchestra.execute() has returned.
# session_id is the ID of the session whose trace you want.
if orchestra.trace_collector:
    trace = await orchestra.trace_collector.finalize(session_id)
    if trace:  # finalize() returns Optional[TraceTree]
        trace_dict = trace.to_dict()
```
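JSONFileTraceWriter writes each trace to `{output_dir}/{session_id}.json`, so you can also read a finished trace back from disk after the run. Below is a minimal sketch that uses only the standard library. `load_trace` is a hypothetical helper, and its default `output_dir` mirrors TracingConfig's default.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_trace(session_id: str, output_dir: str = "./traces") -> Optional[Dict[str, Any]]:
    """Read the JSON trace written for a session, or return None if none exists."""
    path = Path(output_dir) / f"{session_id}.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```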

Best Practices

  • Use detail_level="minimal" in production for low overhead
  • Use detail_level="verbose" (default) during development for full visibility
  • Use detail_level="standard" with max_content_length to limit trace size if needed
  • Use meaningful session_id values for trace file identification
  • Add ./traces/ to .gitignore
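One way to apply these recommendations is to keep per-environment TracingConfig keyword arguments in one place. The profile names and the `max_content_length` value of 2000 below are arbitrary examples, and `tracing_kwargs` is a hypothetical helper. The keys themselves are the documented TracingConfig parameters.

```python
from typing import Any, Dict

# Hypothetical profiles; keys are documented TracingConfig parameters.
PROFILES: Dict[str, Dict[str, Any]] = {
    "development": {"enabled": True, "detail_level": "verbose"},
    "staging": {"enabled": True, "detail_level": "standard", "max_content_length": 2000},
    "production": {"enabled": True, "detail_level": "minimal"},
}


def tracing_kwargs(environment: str, output_dir: str = "./traces") -> Dict[str, Any]:
    """Return TracingConfig keyword arguments for the given environment."""
    try:
        profile = PROFILES[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
    return {"output_dir": output_dir, **profile}
```

You would then pass the result as `TracingConfig(**tracing_kwargs("production"))`.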

Production Usage

Don't leave verbose tracing enabled in production, because trace files can become very large. Also avoid committing trace files that contain sensitive prompt or response content.