Tracing API

Complete API reference for execution tracing and observability in multi-agent workflows.

Overview

The Tracing API provides structured execution traces that capture the full hierarchy of an Orchestra run — from top-level execution through branches, agent steps, LLM generations, tool calls, and validation decisions.

Core Classes

TracingConfig

Configuration for execution tracing.

Import

from marsys.coordination.tracing.config import TracingConfig

Constructor

TracingConfig(
    enabled: bool = False,
    output_dir: str = "./traces",
    detail_level: str = "verbose",
    include_generation_details: bool = True,
    include_message_content: bool = True,
    include_tool_results: bool = True,
    max_content_length: int = 0
)

Parameters

Parameter	Type	Default	Description
enabled	bool	False	Enable trace collection
output_dir	str	"./traces"	Directory for trace JSON files
detail_level	str	"verbose"	"minimal", "standard", or "verbose"
include_generation_details	bool	True	Include token counts, model info in generation spans
include_message_content	bool	True	Include full prompt/response content
include_tool_results	bool	True	Include tool call results in tool spans
max_content_length	int	0	Truncation length for string attributes in standard mode (0 = no truncation)

TraceCollector

EventBus consumer that builds hierarchical span trees from execution events.

Import

from marsys.coordination.tracing import TraceCollector

Constructor

TraceCollector(
    event_bus: EventBus,
    config: TracingConfig,
    writers: Optional[List[TraceWriter]] = None
)

Parameters

Parameter	Type	Default	Description
event_bus	EventBus	Required	Event bus to subscribe to
config	TracingConfig	Required	Tracing configuration
writers	Optional[List[TraceWriter]]	None	Output backends for completed traces (none if omitted)

Automatic Creation

You don't normally create a TraceCollector yourself: Orchestra creates one when TracingConfig.enabled=True.
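To make the "EventBus consumer" idea concrete, here is a toy model of the pattern, not the marsys implementation. `ToyEventBus`, `ToySpanBuilder`, the event names `span_start`/`span_end`, and the dict shapes are all hypothetical; the point is only that a subscriber can rebuild a parent/child span tree from a flat stream of start and end events.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class ToyEventBus:
    """Hypothetical synchronous pub/sub bus; not marsys's EventBus."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, event: Dict[str, Any]) -> None:
        # Deliver the event to every handler registered for its type
        for handler in self._subscribers[event_type]:
            handler(event)


class ToySpanBuilder:
    """Consumer that turns start/end events into a parent/child span tree."""

    def __init__(self, bus: ToyEventBus) -> None:
        self.spans: Dict[str, Dict[str, Any]] = {}
        self.roots: List[Dict[str, Any]] = []
        bus.subscribe("span_start", self._on_start)
        bus.subscribe("span_end", self._on_end)

    def _on_start(self, event: Dict[str, Any]) -> None:
        span = {"span_id": event["span_id"], "name": event["name"],
                "start_time": event["time"], "end_time": None, "children": []}
        self.spans[span["span_id"]] = span
        parent = self.spans.get(event.get("parent_span_id"))
        # Nest under the parent when it is known; otherwise this is a root span
        if parent is not None:
            parent["children"].append(span)
        else:
            self.roots.append(span)

    def _on_end(self, event: Dict[str, Any]) -> None:
        self.spans[event["span_id"]]["end_time"] = event["time"]


bus = ToyEventBus()
builder = ToySpanBuilder(bus)
bus.publish("span_start", {"span_id": "exec", "name": "Execution", "time": 0.0})
bus.publish("span_start", {"span_id": "s1", "parent_span_id": "exec",
                           "name": "Step 1: Researcher", "time": 0.5})
bus.publish("span_end", {"span_id": "s1", "time": 2.0})
bus.publish("span_end", {"span_id": "exec", "time": 2.5})
print(builder.roots[0]["children"][0]["name"])  # Step 1: Researcher
```

Keeping the builder decoupled from the emitters means execution components only publish events; they never need a reference to the trace tree.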

Key Methods

`finalize(session_id) -> Optional[TraceTree]`

async def finalize(session_id: str) -> Optional[TraceTree]

Finalize the trace for a session. Closes any spans still open (marking them with status "error"), computes durations, and passes the finished trace to every registered writer. Orchestra calls it automatically in a try/finally block, so the trace is written even when execution raises.

`close() -> None`

async def close() -> None

Shut down all writers and release resources.
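The lifecycle above can be pictured with a stand-in collector. `FakeCollector`, `run_with_tracing`, and the session IDs are hypothetical names used only to show why finalizing in a `finally` block still yields a trace for a failed run, and why `close()` is a separate, once-per-shutdown call.

```python
import asyncio
from typing import List, Optional


class FakeCollector:
    """Stand-in that records lifecycle calls instead of building real traces."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    async def finalize(self, session_id: str) -> Optional[dict]:
        self.calls.append(f"finalize:{session_id}")
        return {"session_id": session_id}

    async def close(self) -> None:
        self.calls.append("close")


async def run_with_tracing(collector: FakeCollector, session_id: str, fail: bool) -> None:
    try:
        if fail:
            raise RuntimeError("agent step failed")
        # ... the real execution would run here ...
    finally:
        # Runs on success and on error, so a (partial) trace is always finalized
        await collector.finalize(session_id)


async def main() -> FakeCollector:
    collector = FakeCollector()
    await run_with_tracing(collector, "ok-run", fail=False)
    try:
        await run_with_tracing(collector, "failed-run", fail=True)
    except RuntimeError:
        pass
    await collector.close()  # release writers once, at shutdown
    return collector


collector = asyncio.run(main())
print(collector.calls)  # ['finalize:ok-run', 'finalize:failed-run', 'close']
```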

Span

A single unit of work in the execution trace.

Import

from marsys.coordination.tracing.types import Span

Fields

Field	Type	Description
span_id	str	Unique identifier (UUID)
parent_span_id	Optional[str]	Parent span for tree nesting
trace_id	str	Session-level trace identifier
name	str	Human-readable name (e.g., "Step 3: Researcher")
kind	str	execution, branch, step, generation, or tool
start_time	float	Epoch seconds
end_time	Optional[float]	Epoch seconds (set on close)
duration_ms	Optional[float]	Computed on close
status	str	"ok" or "error"
attributes	Dict[str, Any]	Kind-specific data
events	List[Dict]	Instant events attached to this span
children	List[Span]	Child spans
links	List[Dict]	Cross-branch causal links
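Because a serialized span nests its `children`, plain recursion is enough to summarize a trace. This sketch assumes `Span.to_dict()` uses the field names in the table above as dict keys; the sample tree and its span names are made up.

```python
from collections import Counter
from typing import Any, Dict, Iterator


def iter_spans(span: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield a span dict and all of its descendants, depth-first."""
    yield span
    for child in span.get("children", []):
        yield from iter_spans(child)


def summarize(root: Dict[str, Any]) -> Dict[str, Any]:
    spans = list(iter_spans(root))
    return {
        "count_by_kind": Counter(s["kind"] for s in spans),
        "error_spans": [s["name"] for s in spans if s["status"] == "error"],
    }


sample = {
    "name": "Execution", "kind": "execution", "status": "ok", "children": [
        {"name": "Branch main", "kind": "branch", "status": "ok", "children": [
            {"name": "Step 1: Researcher", "kind": "step", "status": "ok", "children": [
                {"name": "LLM call", "kind": "generation", "status": "ok", "children": []},
                {"name": "web_search", "kind": "tool", "status": "error", "children": []},
            ]},
        ]},
    ],
}

report = summarize(sample)
print(report["count_by_kind"]["step"], report["error_spans"])  # 1 ['web_search']
```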

Key Methods

`close(end_time, status) -> None`

def close(end_time: Optional[float] = None, status: Optional[str] = None) -> None

Close the span, computing duration_ms from start_time to end_time.

`add_event(name, attributes) -> None`

def add_event(name: str, attributes: Optional[Dict[str, Any]] = None) -> None

Add an instant event (e.g., validation decision) to this span.

`to_dict() -> Dict[str, Any]`

def to_dict() -> Dict[str, Any]

Serialize the span tree to a nested dict for JSON output.
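The following simplified stand-in (`MiniSpan`, not the marsys class) mirrors the documented `close()`, `add_event()`, and `to_dict()` signatures and omits the ID and link fields for brevity. Assumptions: `close()` defaults `end_time` to the current time and keeps the existing status when `status` is None, and each event is stored as a `{"name", "timestamp", "attributes"}` dict.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MiniSpan:
    name: str
    kind: str
    start_time: float = field(default_factory=time.time)
    end_time: Optional[float] = None
    duration_ms: Optional[float] = None
    status: str = "ok"
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[Dict[str, Any]] = field(default_factory=list)
    children: List["MiniSpan"] = field(default_factory=list)

    def close(self, end_time: Optional[float] = None, status: Optional[str] = None) -> None:
        self.end_time = end_time if end_time is not None else time.time()
        if status is not None:
            self.status = status
        # start/end are epoch seconds; duration is reported in milliseconds
        self.duration_ms = (self.end_time - self.start_time) * 1000.0

    def add_event(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None:
        self.events.append({"name": name, "timestamp": time.time(),
                            "attributes": attributes or {}})

    def to_dict(self) -> Dict[str, Any]:
        return {
            "name": self.name, "kind": self.kind,
            "start_time": self.start_time, "end_time": self.end_time,
            "duration_ms": self.duration_ms, "status": self.status,
            "attributes": self.attributes, "events": self.events,
            # Recurse so the whole subtree serializes as nested dicts
            "children": [child.to_dict() for child in self.children],
        }


step = MiniSpan(name="Step 1: Researcher", kind="step", start_time=100.0)
step.add_event("validation_decision", {"is_valid": True, "action_type": "final_response"})
step.close(end_time=101.25)
print(step.to_dict()["duration_ms"])  # 1250.0
```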

TraceTree

A complete execution trace rooted at an execution span.

Import

from marsys.coordination.tracing.types import TraceTree

Fields

Field	Type	Description
trace_id	str	Unique trace identifier
session_id	str	Orchestra session ID
root_span	Span	Root execution span containing the full tree
metadata	Dict[str, Any]	Task summary, agent names, etc.

`to_dict() -> Dict[str, Any]`

def to_dict() -> Dict[str, Any]

Serialize the full trace tree for JSON output.

TraceWriter

Abstract base for trace output backends.

Import

from marsys.coordination.tracing.writers.base import TraceWriter

Methods

Method	Description
async write(trace: TraceTree) -> None	Write a completed trace
async close() -> None	Release resources
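A writer only needs the two async methods in the table above. To run without marsys installed, this sketch defines a local stand-in ABC (`TraceWriterBase`) and writes a plain dict; with the real library you would subclass `TraceWriter` and receive a `TraceTree`. `InMemoryTraceWriter` is a hypothetical name for a writer that is handy in unit tests.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class TraceWriterBase(ABC):
    """Local stand-in for the TraceWriter interface."""

    @abstractmethod
    async def write(self, trace: Any) -> None: ...

    @abstractmethod
    async def close(self) -> None: ...


class InMemoryTraceWriter(TraceWriterBase):
    def __init__(self) -> None:
        self.traces: List[Dict[str, Any]] = []
        self.closed = False

    async def write(self, trace: Any) -> None:
        # TraceTree objects expose to_dict(); plain dicts are stored as-is
        self.traces.append(trace.to_dict() if hasattr(trace, "to_dict") else trace)

    async def close(self) -> None:
        self.closed = True


async def demo() -> InMemoryTraceWriter:
    writer = InMemoryTraceWriter()
    await writer.write({"trace_id": "t-1", "session_id": "demo"})
    await writer.close()
    return writer


writer = asyncio.run(demo())
print(len(writer.traces), writer.closed)  # 1 True
```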

JSONFileTraceWriter

Writes traces as structured JSON files.

Import

from marsys.coordination.tracing.writers import JSONFileTraceWriter

Constructor

JSONFileTraceWriter(config: TracingConfig)

Output: {config.output_dir}/{session_id}.json

The writer respects detail_level from the config:

minimal: Span hierarchy and timing only
standard: All attributes, strings truncated to max_content_length
verbose: Full content, no truncation
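The three levels can be pictured as a filter applied to each serialized span. This is an illustrative reimplementation under assumptions, not the writer's code: `MINIMAL_KEYS` is a guess at what "hierarchy and timing only" keeps, and `standard` truncates string values in `attributes` by plain slicing (no ellipsis marker), with 0 meaning no truncation.

```python
from typing import Any, Dict

# Keys assumed to count as "hierarchy and timing" for the minimal level
MINIMAL_KEYS = {"span_id", "parent_span_id", "name", "kind",
                "start_time", "end_time", "duration_ms", "status"}


def apply_detail_level(span: Dict[str, Any], detail_level: str,
                       max_content_length: int = 0) -> Dict[str, Any]:
    children = [apply_detail_level(c, detail_level, max_content_length)
                for c in span.get("children", [])]
    if detail_level == "minimal":
        out = {k: v for k, v in span.items() if k in MINIMAL_KEYS}
    elif detail_level == "standard":
        out = dict(span)
        if max_content_length > 0:
            out["attributes"] = {
                k: v[:max_content_length] if isinstance(v, str) else v
                for k, v in span.get("attributes", {}).items()
            }
    else:  # "verbose": full content, unchanged
        out = dict(span)
    out["children"] = children
    return out


span = {"span_id": "s1", "name": "LLM call", "kind": "generation", "status": "ok",
        "attributes": {"response": "The quick brown fox", "prompt_tokens": 42},
        "children": []}

print(apply_detail_level(span, "standard", max_content_length=9)["attributes"])
# {'response': 'The quick', 'prompt_tokens': 42}
print(sorted(apply_detail_level(span, "minimal")))
# ['children', 'kind', 'name', 'span_id', 'status']
```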

Trace Events

Events emitted by execution components and consumed by TraceCollector. All extend StatusEvent.

ExecutionStartEvent

Emitted when Orchestra.execute() begins.

Field	Type	Description
task_summary	str	Truncated task description
topology_summary	Dict	Node and edge counts
agent_names	List[str]	Agents in the topology
config_summary	Dict	Key config values

GenerationEvent

Emitted after each LLM generation completes.

Field	Type	Description
agent_name	str	Agent that ran the generation
step_number	int	Step index in the branch
step_span_id	str	Correlation ID for the step span
model_name	str	Model identifier
provider	str	Provider name
prompt_tokens	Optional[int]	Input tokens
completion_tokens	Optional[int]	Output tokens
reasoning_tokens	Optional[int]	Reasoning tokens, for reasoning models such as o1/o3
response_time_ms	Optional[float]	API latency in milliseconds
finish_reason	Optional[str]	Why the model stopped
has_thinking	bool	Whether thinking content was present
has_tool_calls	bool	Whether tool calls were requested
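Because the token fields are Optional, aggregating them has to tolerate missing counts. The sketch below totals prompt and completion tokens per agent using a hypothetical `GenerationRecord` dataclass that holds a subset of the fields above; the model names and counts are made up. It deliberately leaves `reasoning_tokens` out of the sum, since depending on how the provider reports them they may already be counted inside the completion tokens.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GenerationRecord:
    """Hypothetical subset of GenerationEvent's fields."""
    agent_name: str
    model_name: str
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None
    reasoning_tokens: Optional[int] = None


def tokens_by_agent(records: List[GenerationRecord]) -> Dict[str, int]:
    totals: Dict[str, int] = defaultdict(int)
    for r in records:
        # None means no count was reported; treat it as zero.
        # reasoning_tokens is not added: some providers (e.g. OpenAI)
        # already include reasoning tokens in the completion count.
        totals[r.agent_name] += (r.prompt_tokens or 0) + (r.completion_tokens or 0)
    return dict(totals)


records = [
    GenerationRecord("Researcher", "model-a", prompt_tokens=1200, completion_tokens=300),
    GenerationRecord("Researcher", "model-b", prompt_tokens=800, completion_tokens=150,
                     reasoning_tokens=500),
    GenerationRecord("Writer", "model-a"),
]
print(tokens_by_agent(records))  # {'Researcher': 2450, 'Writer': 0}
```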

ValidationDecisionEvent

Emitted after response validation determines the next action.

Field	Type	Description
agent_name	str	Agent whose response was validated
step_number	int	Step index
step_span_id	str	Step span correlation
is_valid	bool	Whether validation passed
action_type	str	Determined action (e.g., invoke_agent, final_response)
next_agents	List[str]	Target agent(s) for routing
error_category	Optional[str]	Error classification if invalid
is_tool_continuation	bool	Whether this is a tool continuation bypass
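Validation decisions record where each step routed, so they can be replayed to recover the routing that actually happened. This sketch uses a hypothetical `Decision` dataclass with a subset of the fields above; the sample values, including the `"format_error"` category, are made up, and skipping tool continuations is an assumption about what that flag means.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Decision:
    agent_name: str
    is_valid: bool
    action_type: str
    next_agents: List[str] = field(default_factory=list)
    error_category: Optional[str] = None
    is_tool_continuation: bool = False


def routing_summary(decisions: List[Decision]) -> Tuple[Set[Tuple[str, str]], Counter]:
    edges: Set[Tuple[str, str]] = set()
    errors: Counter = Counter()
    for d in decisions:
        if not d.is_valid:
            errors[d.error_category or "unknown"] += 1
            continue
        # Assumed: tool continuations don't hand off to another agent
        if d.is_tool_continuation:
            continue
        for target in d.next_agents:
            edges.add((d.agent_name, target))
    return edges, errors


decisions = [
    Decision("Coordinator", True, "invoke_agent", ["Researcher", "Writer"]),
    Decision("Researcher", True, "invoke_agent", is_tool_continuation=True),
    Decision("Researcher", False, "invoke_agent", error_category="format_error"),
    Decision("Writer", True, "final_response"),
]
edges, errors = routing_summary(decisions)
print(sorted(edges), dict(errors))
# [('Coordinator', 'Researcher'), ('Coordinator', 'Writer')] {'format_error': 1}
```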

ConvergenceEvent

Emitted when parallel branches converge. The collector attaches convergence links and events to both the parent branch span and the next step span on that branch (the convergence step that receives aggregated results).

Field	Type	Description
parent_branch_id	str	Branch receiving converged results
child_branch_ids	List[str]	Branches that converged
convergence_point	str	Convergence node name
group_id	str	Parallel group identifier
successful_count	int	Successfully completed branches
total_count	int	Total branches in group
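The attachment rule described for ConvergenceEvent can be modeled as below. This is a sketch, not the collector's code: spans are plain dicts, `on_convergence` and its two lookup tables are hypothetical, and the link and event dict shapes are assumptions.

```python
from typing import Any, Dict


def on_convergence(event: Dict[str, Any],
                   branch_spans: Dict[str, Dict[str, Any]],
                   next_step_spans: Dict[str, Dict[str, Any]]) -> None:
    """Attach convergence links/events to the parent branch span and its next step span."""
    # One causal link per converged child branch
    links = [{"type": "convergence", "branch_id": child_id}
             for child_id in event["child_branch_ids"]]
    summary = {
        "name": "convergence",
        "attributes": {
            "convergence_point": event["convergence_point"],
            "group_id": event["group_id"],
            "successful": f'{event["successful_count"]}/{event["total_count"]}',
        },
    }
    targets = [branch_spans[event["parent_branch_id"]]]
    # In this sketch the step span is simply skipped if it isn't known
    step = next_step_spans.get(event["parent_branch_id"])
    if step is not None:
        targets.append(step)
    for span in targets:
        span.setdefault("links", []).extend(links)
        span.setdefault("events", []).append(summary)


main = {"name": "Branch main"}
step = {"name": "Step 4: Aggregator"}
event = {"parent_branch_id": "main", "child_branch_ids": ["b1", "b2", "b3"],
         "convergence_point": "Aggregator", "group_id": "g1",
         "successful_count": 2, "total_count": 3}
on_convergence(event, {"main": main}, {"main": step})
print(len(main["links"]), step["events"][0]["attributes"]["successful"])  # 3 2/3
```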

Usage Patterns

Enable Tracing

enable_tracing.py

from marsys.coordination.config import ExecutionConfig
from marsys.coordination.tracing.config import TracingConfig

config = ExecutionConfig(
    tracing=TracingConfig(enabled=True, output_dir="./traces"),
)

Custom Writer

custom_writer.py

from marsys.coordination.tracing.writers.base import TraceWriter
from marsys.coordination.tracing.types import TraceTree

class MyTraceWriter(TraceWriter):
    async def write(self, trace: TraceTree) -> None:
        data = trace.to_dict()
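        # send_to_backend is a placeholder for your own async client
        # that ships the trace dict to your observability backend
        await send_to_backend(data)

    async def close(self) -> None:
        # Nothing to release here; close clients/connections if you hold any
        pass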

Access Trace Programmatically

access_trace.py

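# After Orchestra.execute(), given the Orchestra instance and the run's session_id
async def get_trace_dict(orchestra, session_id):
    # trace_collector is only set when tracing is enabled
    if orchestra.trace_collector:
        trace = await orchestra.trace_collector.finalize(session_id)
        if trace:  # finalize() returns Optional[TraceTree]
            return trace.to_dict()
    return None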

Best Practices

Use detail_level="minimal" in production for low overhead
Use detail_level="verbose" (default) during development for full visibility
Use detail_level="standard" with max_content_length to limit trace size if needed
Use meaningful session_id values for trace file identification
Add ./traces/ to .gitignore

Production Usage

Don't leave verbose tracing on in production — trace files can be very large. Also avoid committing trace files containing sensitive prompt/response content.
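One way to apply these guidelines is to derive the TracingConfig keyword arguments from the deployment environment. In this sketch, `tracing_kwargs`, the environment names, and the 2000-character limit are illustrative choices; only the keyword names come from the TracingConfig parameters table. The result would be passed as `TracingConfig(**tracing_kwargs(env))`.

```python
from typing import Any, Dict


def tracing_kwargs(env: str, output_dir: str = "./traces") -> Dict[str, Any]:
    """Map an environment name to TracingConfig keyword arguments."""
    if env == "production":
        # Low overhead, and keep prompt/response text out of the files
        return {"enabled": True, "output_dir": output_dir,
                "detail_level": "minimal", "include_message_content": False}
    if env == "staging":
        # Keep attributes but cap string length to bound file size
        return {"enabled": True, "output_dir": output_dir,
                "detail_level": "standard", "max_content_length": 2000}
    # Development: full visibility (the default detail level)
    return {"enabled": True, "output_dir": output_dir, "detail_level": "verbose"}


print(tracing_kwargs("staging")["detail_level"])  # standard
```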

Related Documentation

Tracing Concepts: overview and span hierarchy
Configuration API: ExecutionConfig reference
Architecture Overview: how tracing fits in the system
State Management API: related persistence module
