Error Handling
MARSYS provides a comprehensive error handling system for robust operation, graceful degradation, and intelligent recovery in multi-agent workflows.
Overview
The error handling system provides:
- Hierarchical Exceptions: Granular error categorization with rich context
- Intelligent Recovery: Automatic retry strategies and fallback mechanisms
- Error Routing: Route errors to User nodes for human intervention
- Provider-Specific Handling: Tailored strategies for different AI providers
Exception Hierarchy
Base Exceptions
```python
# Both classes live in marsys.agents.exceptions:
#   from marsys.agents.exceptions import AgentFrameworkError, AgentError
# They are defined as follows:

class AgentFrameworkError(Exception):
    """Base exception for all MARSYS framework errors."""
    pass


class AgentError(AgentFrameworkError):
    """Exception for agent-specific errors."""
    pass
```
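Because every framework error derives from AgentFrameworkError, one except clause on the base class also catches AgentError and any other subclass. The standalone sketch below mirrors the two definitions locally so it runs without MARSYS installed; run_step and safe_run are hypothetical names used only for illustration.

```python
class AgentFrameworkError(Exception):
    """Local mirror of the MARSYS base exception (standalone demo)."""


class AgentError(AgentFrameworkError):
    """Local mirror of the agent-specific exception."""


def run_step(should_fail: bool) -> str:
    """Hypothetical unit of work that may raise an agent-specific error."""
    if should_fail:
        raise AgentError("agent produced an invalid response")
    return "ok"


def safe_run(should_fail: bool) -> str:
    try:
        return run_step(should_fail)
    except AgentFrameworkError as e:
        # Catching the base class also catches AgentError, a subclass
        return f"handled {type(e).__name__}: {e}"


print(safe_run(False))  # ok
print(safe_run(True))   # handled AgentError: agent produced an invalid response
```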
Error Categories
```python
from marsys.agents.exceptions import (
    AgentFrameworkError,
    AgentError,
    ToolExecutionError,
    ModelError,
)

# Validation errors
class ValidationError(AgentFrameworkError):
    """Input validation failures."""
    pass


class ActionValidationError(ValidationError):
    """Invalid agent actions."""
    pass


# Configuration errors
class ConfigurationError(AgentFrameworkError):
    """Configuration problems."""
    pass


class TopologyError(ConfigurationError):
    """Topology definition errors."""
    pass


# Execution errors
class ExecutionError(AgentFrameworkError):
    """Runtime execution failures."""
    pass


class TimeoutError(ExecutionError):  # Note: shadows Python's built-in TimeoutError
    """Operation timeout."""
    pass


# API errors
class APIError(AgentFrameworkError):
    """External API failures."""
    pass


class RateLimitError(APIError):
    """API rate limit exceeded."""
    recoverable = True
```
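RateLimitError above carries a recoverable = True flag. One possible way to act on such a flag, assuming classes that lack it should be treated as non-recoverable, is to read it with getattr and a default. This is a standalone sketch: the classes are trimmed local copies of the hierarchy, and should_retry is a hypothetical helper rather than a MARSYS function.

```python
class AgentFrameworkError(Exception):
    """Local stand-in for the MARSYS base exception."""


class APIError(AgentFrameworkError):
    """External API failures."""


class RateLimitError(APIError):
    """API rate limit exceeded."""
    recoverable = True


class ConfigurationError(AgentFrameworkError):
    """Configuration problems."""


def should_retry(error: Exception) -> bool:
    """Retry only errors that explicitly mark themselves as recoverable."""
    # Classes without the attribute default to "not recoverable"
    return bool(getattr(error, "recoverable", False))


print(should_retry(RateLimitError("429")))      # True
print(should_retry(ConfigurationError("bad")))  # False
print(should_retry(ValueError("unrelated")))    # False
```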
Error Handling Patterns
Comprehensive Try/Except
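```python
import asyncio
import logging

logger = logging.getLogger(__name__)

MAX_RETRIES = 3

# ValidationError, RateLimitError, TimeoutError, ExecutionError and
# AgentFrameworkError come from the hierarchy above.


async def execute_agent_task(agent, task, context):
    """Execute a task with comprehensive error handling."""
    try:
        return await agent.run(task, context)
    except ValidationError as e:
        # Correct the input and try once more
        logger.warning(f"Validation error: {e}")
        corrected_task = correct_validation_issues(task, e)  # hypothetical helper you implement
        return await agent.run(corrected_task, context)
    except RateLimitError:
        # Back off, then retry; the counter prevents endless recursion
        if context.get("retry_count", 0) < MAX_RETRIES:
            context["retry_count"] = context.get("retry_count", 0) + 1
            wait_time = 60
            logger.info(f"Rate limited. Waiting {wait_time}s...")
            await asyncio.sleep(wait_time)
            return await execute_agent_task(agent, task, context)
        raise
    except TimeoutError:
        # Framework TimeoutError (not the built-in): retry a limited number of times
        if context.get("retry_count", 0) < MAX_RETRIES:
            context["retry_count"] = context.get("retry_count", 0) + 1
            return await execute_agent_task(agent, task, context)
        raise
    except AgentFrameworkError as e:
        # Other framework errors: log and propagate
        logger.error(f"Framework error: {e}")
        raise
    except Exception as e:
        # Wrap unexpected errors, keeping the original as __cause__
        logger.error(f"Unexpected error: {e}")
        raise ExecutionError(f"Unexpected error: {e}") from e
```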
Built-in Retry Logic
MARSYS automatically retries server-side API errors with exponential backoff at the adapter level, so most API calls need no manual retry logic.
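For intuition, exponential backoff doubles the wait after each failed attempt, usually with an upper cap. The sketch below computes such a schedule; the function name backoff_delays, the base delay, and the cap are illustrative assumptions, not the values MARSYS's adapters use.

```python
def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delay in seconds before each retry: base * 2**n, capped at `cap`."""
    return [min(base * 2 ** n, cap) for n in range(attempts)]


print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

Many implementations also add random jitter to each delay so that concurrent clients do not retry in lockstep.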
Error Routing to User
Route errors to User nodes for human intervention:
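```python
# Import paths assumed; adjust them if your MARSYS version differs
from marsys.coordination import Orchestra
from marsys.coordination.config import ExecutionConfig

# Topology with an explicit error-handling path
topology = {
    "agents": ["User", "Processor", "ErrorHandler"],
    "flows": [
        "User -> Processor",
        "Processor -> User",          # Success path
        "Processor -> ErrorHandler",  # Error path
        "ErrorHandler -> User",       # Report to user
    ],
}

# Configure error routing
config = ExecutionConfig(
    enable_error_routing=True,
    preserve_error_context=True,
)

# Inside an async function; task is the request to process
result = await Orchestra.run(
    task=task,
    topology=topology,
    execution_config=config,
)
```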
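Each flow is a string of the form "Source -> Target". A quick sanity check before running is that the error path eventually leads back to User. The sketch below parses the flows from the topology above into an adjacency map and runs a breadth-first search; parse_flows and reaches are hypothetical helpers, not part of the MARSYS API.

```python
from collections import deque


def parse_flows(flows: list[str]) -> dict[str, set[str]]:
    """Turn "A -> B" strings into an adjacency map {A: {B, ...}}."""
    graph: dict[str, set[str]] = {}
    for flow in flows:
        source, target = (part.strip() for part in flow.split("->"))
        graph.setdefault(source, set()).add(target)
    return graph


def reaches(graph: dict[str, set[str]], start: str, goal: str) -> bool:
    """Breadth-first search: is `goal` reachable from `start`?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


flows = [
    "User -> Processor",
    "Processor -> User",
    "Processor -> ErrorHandler",
    "ErrorHandler -> User",
]
graph = parse_flows(flows)
print(reaches(graph, "ErrorHandler", "User"))  # True
```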
Recovery Strategies
Automatic Retry
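```python
import asyncio

# Automatic retry with exponential backoff
class MaxRetriesExceeded(Exception):
    """Raised when every attempt fails (defined here for the example)."""

async def retry_with_backoff(func, max_retries=3):
    last_error = None
    for attempt in range(max_retries):
        try:
            return await func()
        except RateLimitError as e:
            last_error = e
            if attempt < max_retries - 1:  # no pointless sleep after the final attempt
                await asyncio.sleep(2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise MaxRetriesExceeded(f"Failed after {max_retries} attempts") from last_error
```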
Fallback Mechanisms
```python
import logging

logger = logging.getLogger(__name__)

# Fall back to an alternative provider when the primary fails
async def call_with_fallback(primary_agent, fallback_agent, task):
    try:
        return await primary_agent.run(task)
    except APIError as e:
        logger.info(f"Primary failed ({e}), using fallback")
        return await fallback_agent.run(task)
```
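The same pattern generalizes to a chain of more than two agents: try each in order, move on when one raises APIError, and re-raise the last error if all of them fail. In this standalone sketch, FlakyAgent, call_with_fallbacks, and the local APIError class are illustrative stand-ins.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class APIError(Exception):
    """Local stand-in for the framework's APIError."""


class FlakyAgent:
    """Hypothetical agent that either fails or answers with its name."""

    def __init__(self, name: str, fails: bool):
        self.name, self.fails = name, fails

    async def run(self, task: str) -> str:
        if self.fails:
            raise APIError(f"{self.name} unavailable")
        return f"{self.name}: {task}"


async def call_with_fallbacks(agents, task):
    """Return the first successful result; otherwise re-raise the last error."""
    last_error = None
    for agent in agents:
        try:
            return await agent.run(task)
        except APIError as e:
            logger.info("%s failed (%s), trying next", agent.name, e)
            last_error = e
    if last_error is None:
        raise ValueError("no agents provided")
    raise last_error


agents = [FlakyAgent("primary", True), FlakyAgent("secondary", False)]
print(asyncio.run(call_with_fallbacks(agents, "summarize")))  # secondary: summarize
```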
Best Practices
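- Catch specific exceptions: Give each error type its own strategy (back off on RateLimitError, correct the input on ValidationError) rather than relying on a blanket except Exception
- Log with context: Include the agent, the task, and the original error message so failures can be traced
- Use recovery paths: Design topologies with error handling routes
- Set timeouts: Prevent hanging operations with appropriate timeouts
- Test error scenarios: Simulate failures such as timeouts, rate limits, and invalid input to confirm your handlers actually recover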
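The last two practices combine naturally: guard an agent call with asyncio.wait_for and test the guard with a deliberately slow fake agent. SlowAgent, run_with_timeout, and check_timeout_is_raised are hypothetical names for this sketch. Note that asyncio.wait_for raises asyncio.TimeoutError (the same class as the built-in TimeoutError on Python 3.11+), not the framework's TimeoutError from the hierarchy above.

```python
import asyncio


class SlowAgent:
    """Hypothetical test double whose run() takes `delay` seconds."""

    def __init__(self, delay: float):
        self.delay = delay

    async def run(self, task: str) -> str:
        await asyncio.sleep(self.delay)
        return f"done: {task}"


async def run_with_timeout(agent, task: str, timeout: float) -> str:
    """Fail fast instead of hanging: raises asyncio.TimeoutError after `timeout` s."""
    return await asyncio.wait_for(agent.run(task), timeout=timeout)


async def check_timeout_is_raised() -> bool:
    """Error-scenario test: a slow agent must trigger the timeout."""
    try:
        await run_with_timeout(SlowAgent(1.0), "slow", timeout=0.05)
    except asyncio.TimeoutError:
        return True  # expected path: the call was cancelled
    return False


print(asyncio.run(run_with_timeout(SlowAgent(0.01), "quick", timeout=1.0)))  # done: quick
print(asyncio.run(check_timeout_is_raised()))  # True
```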
Don't Swallow Errors
Always either handle an error appropriately or re-raise it. Silent failures make debugging extremely difficult.
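When you catch an error only to add context, re-raise it with raise ... from e so the original exception stays attached as __cause__. A minimal sketch, using a hypothetical load_config function and a local stand-in for ConfigurationError:

```python
import json
import logging

logger = logging.getLogger(__name__)


class ConfigurationError(Exception):
    """Local stand-in for the framework's ConfigurationError."""


def load_config(text: str) -> dict:
    """Parse a JSON config, adding context instead of hiding the failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        logger.error("Invalid config: %s", e)
        # Re-raise with context; the original error is kept as __cause__
        raise ConfigurationError("config is not valid JSON") from e


try:
    load_config("{not json")
except ConfigurationError as err:
    cause = err.__cause__

print(type(cause).__name__)  # JSONDecodeError
```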