File Operations Toolkit
Advanced file management system with intelligent reading strategies, hierarchical content extraction, and secure editing capabilities for MARSYS agents.
Overview
The File Operations Toolkit provides type-aware file handling with advanced features designed for AI agents. This toolkit was designed to address the limitations of simple file reading tools, particularly for handling complex documents like PDFs and source code files, with intelligent token management for vision-language models.
Intelligent Reading
- AUTO, FULL, PARTIAL, OVERVIEW, PROGRESSIVE strategies
- Character-based token management
- Provider-specific image token estimation
Content Extraction
- AST-based parsing for code
- Font-analysis for PDF structure
- Image extraction from documents
Safe Editing
- Unified diff format with fallbacks
- Dry-run preview before applying
- ~98% success rate (Aider-like)
Security Framework
- Run filesystem boundaries & mounts
- Pattern-based permissions
- Audit logging
- Search Capabilities: Content search (grep), filename search (glob), and structure search
- Type-Specific Handlers: Specialized handlers for images, PDFs, JSON, YAML, Markdown, and code files
Prerequisites
Core Dependencies: PDF support and image support are included in the core marsys installation:
pip install marsys # Includes PyMuPDF and Pillow
For advanced code parsing (future feature):
pip install tree-sitter
Core Dependencies
- PDF support is provided by PyMuPDF (included in core installation).
- Image support is provided by Pillow (PIL) (included in core installation).
- Both are automatically installed with marsys for full functionality with vision-language models.
Quick Start
Basic Usage
import os
from pathlib import Path

from marsys import Agent
from marsys.models import ModelConfig
from marsys.environment import FileOperationConfig, create_file_operation_tools
from marsys.environment.filesystem import RunFileSystem

# Create model configuration (example using OpenRouter)
# Note: API key is only required if you use API-based models
model_config = ModelConfig(
    type="api",
    name="anthropic/claude-opus-4.6",
    provider="openrouter",
    api_key=os.getenv("OPENROUTER_API_KEY")  # Set if using OpenRouter
)

# Create file operation tools
# Shared run filesystem for all agents
fs = RunFileSystem.local(run_root=Path("./runs/run-20260206"))
file_config = FileOperationConfig(run_filesystem=fs)
file_tools = create_file_operation_tools(file_config)

# Create agent with file capabilities
file_agent = Agent(
    model_config=model_config,
    goal="Manage and analyze files efficiently",
    instruction="""You are a file management assistant. Use your file operation tools to:
- Read files intelligently based on size and type
- Extract structured information from documents
- Edit files using unified diff format for reliability
- Search for content across multiple files
- Maintain security by respecting run filesystem boundaries

Always use the most appropriate reading strategy to optimize token usage.
When editing, prefer unified diff format for complex changes.""",
    name="FileAssistant",
    tools=file_tools
)

# Use the agent
result = await file_agent.run(
    prompt="Read the README.md file and summarize its contents"
)
API Keys
If you are using API-based models (like OpenRouter, OpenAI, etc.), ensure the appropriate API key environment variable is set (e.g., OPENROUTER_API_KEY, OPENAI_API_KEY).
Virtual Paths & RunFileSystem
File operations use virtual POSIX paths. Tool-returned paths are typically ./..., while the run filesystem also accepts the absolute virtual form (/...):
- / is the run root
- Use ./downloads, ./screenshots, ./outputs for common artifacts
- Relative paths like ./data.csv resolve against the run filesystem working directory
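The path rules above can be pictured with a small standalone sketch. It is not the RunFileSystem implementation: `resolve_virtual` is a hypothetical helper, and rejecting `..` segments that climb above the root is an assumption about how boundary checks might work.

```python
from pathlib import PurePosixPath


def resolve_virtual(path: str, cwd: str = "/") -> str:
    """Normalize a virtual path to absolute form (hypothetical helper).

    Raises ValueError if the path would climb above the run root.
    """
    # Joining with an absolute `path` discards `cwd`, as with real POSIX paths.
    combined = PurePosixPath(cwd) / path
    stack = []
    for part in combined.parts:
        if part == combined.anchor or part == ".":
            continue
        if part == "..":
            if not stack:
                raise ValueError(f"Path escapes the run root: {path}")
            stack.pop()
        else:
            stack.append(part)
    return "/" + "/".join(stack)


print(resolve_virtual("./data.csv"))                      # relative to the working directory
print(resolve_virtual("../outputs/x.txt", cwd="/work"))  # stays inside the root
```

Tracking the segment stack by hand, rather than calling a normalizer, is what lets the sketch detect an escape attempt instead of silently clamping it to the root.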
To share files between agents, create a RunFileSystem once and pass it to tools/agents:
from pathlib import Path

from marsys.environment.filesystem import RunFileSystem
from marsys.environment import FileOperationConfig, create_file_operation_tools

fs = RunFileSystem.local(run_root=Path("./runs/run-20260206"))
config = FileOperationConfig(run_filesystem=fs)
file_tools = create_file_operation_tools(config)
See Run Filesystem for details and mount examples.
Configuration
Default Configuration
from pathlib import Path

from marsys.environment import create_file_operation_tools, FileOperationConfig

# Use defaults (permissive mode)
file_tools = create_file_operation_tools()
Custom Configuration
from pathlib import Path

from marsys.environment import FileOperationConfig, create_file_operation_tools
from marsys.environment.filesystem import RunFileSystem

# Create custom configuration (virtual filesystem + security)
fs = RunFileSystem.local(
    run_root=Path("/home/user/projects"),
    extra_mounts={"/datasets": Path("/shared/datasets")}
)

config = FileOperationConfig(
    base_directory=Path("/home/user/projects"),
    run_filesystem=fs,

    # File size limits (hard limit for safety)
    max_file_size_bytes=100 * 1024 * 1024,  # 100 MB absolute limit

    # Character-based reading thresholds (token proxy for text)
    small_file_threshold=10000,    # < 10k chars (~2.5k tokens): FULL read
    medium_file_threshold=100000,  # 10-100k chars (~25k tokens): PARTIAL read
    large_file_threshold=500000,   # > 500k chars (~125k tokens): OVERVIEW first

    # File type-specific limits
    max_json_content_chars=40000,  # JSON truncation threshold (~10k tokens)
    max_lines_per_read=250,        # Max lines for text files
    max_pages_per_read=5,          # Max pages for PDF files

    # Absolute safety limit (applies to ALL file types)
    max_characters_absolute=120000,  # Hard limit: 120K chars (~30k tokens)

    # Image token limits (for vision models)
    max_image_pixels=1024 * 1024,  # 1 megapixel (1024x1024)
    max_images_per_read=4,         # Maximum images per operation

    # Security patterns (glob-style)
    blocked_patterns=[
        "*.key", "*.pem", "*.p12",  # Private keys
        ".env", ".env.*",           # Environment files
        "*.sqlite", "*.db",         # Databases
        ".git/**",                  # Git internals
    ],

    # Auto-approve patterns (no user confirmation needed)
    auto_approve_patterns=[
        "*.md", "*.txt",              # Documentation
        "*.py", "*.js", "*.java",     # Source code
        "*.json", "*.yaml", "*.yml",  # Configuration
    ],

    # Require approval patterns
    require_approval_patterns=[
        "*.sh", "*.bash",          # Shell scripts
        "Makefile", "Dockerfile",  # Build files
        "*.sql",                   # SQL files
    ],

    # Feature flags
    enable_delete=False,
    enable_tree_sitter=True,
    enable_semantic_search=False,
    enable_caching=True,
    cache_ttl_seconds=300,

    # Search
    max_search_results=100,

    # Audit logging
    enable_audit_logging=True,
    log_file_path=Path("./file_operations_audit.log"),
)

# Create tools with custom config
file_tools = create_file_operation_tools(config)
Character Limit Configuration
The toolkit uses different character limits for different purposes:
| Limit | Default | Applies To | Behavior |
|---|---|---|---|
| max_json_content_chars | 40,000 | JSON files | Triggers truncation/overview |
| max_lines_per_read | 250 lines | Text files | Controls line-based partial reading |
| max_pages_per_read | 5 pages | PDF files | Controls page-based partial reading |
| max_characters_absolute | 120,000 | ALL file types | Hard limit that raises error |
Context Window Management
The absolute character limit (120K) leaves room for system prompts (~2-5K tokens), agent memory/history (~10-20K tokens), images (~500-2000 tokens per image), and response generation (~2-10K tokens). Total context budget: ~128K-200K tokens for most modern models.
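As a rough worked example of that budget, the sketch below adds up the upper ends of the ranges quoted above. The 4-characters-per-token ratio is an assumption inferred from the "120K chars (~30k tokens)" figure, and the component names are illustrative only.

```python
CHARS_PER_TOKEN = 4  # assumed ratio, inferred from 120K chars ~ 30K tokens


def chars_to_tokens(chars: int) -> int:
    """Very rough character-to-token proxy for plain text."""
    return chars // CHARS_PER_TOKEN


# Upper ends of the ranges quoted in the note above.
budget = {
    "file_content": chars_to_tokens(120_000),  # max_characters_absolute
    "system_prompt": 5_000,
    "memory_history": 20_000,
    "images": 4 * 2_000,  # max_images_per_read * worst-case tokens per image
    "response": 10_000,
}

total = sum(budget.values())
print(f"Worst-case request: {total} tokens")
print(f"Fits a 128K context window: {total < 128_000}")
```

Even with every component at its upper bound, the total stays well below a 128K-token window, which is the headroom the default limit is meant to preserve.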
Preset Configurations
from pathlib import Path

from marsys.environment import FileOperationConfig, create_file_operation_tools

# Permissive mode (default)
permissive_config = FileOperationConfig.create_permissive()
permissive_tools = create_file_operation_tools(permissive_config)

# Restrictive mode (tighter security)
restrictive_config = FileOperationConfig.create_restrictive(
    base_directory=Path("/workspace")
)
restrictive_tools = create_file_operation_tools(restrictive_config)
Reading Strategies
The toolkit provides five intelligent reading strategies to optimize token usage:
AUTO Strategy (Default)
Automatically selects the best strategy based on file size (character count for text files):
- < 10k characters (~2.5k tokens): FULL read (complete content)
- 10-100k characters (~2.5-25k tokens): PARTIAL read (sections with overview)
- 100-500k characters (~25-125k tokens): PROGRESSIVE (structure first, drill down)
- > 500k characters (~125k+ tokens): OVERVIEW (structure + summary only)
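The threshold logic above can be sketched as a plain function. This is an illustration rather than the toolkit's own selector: `pick_strategy` is a hypothetical name, the defaults mirror the documented thresholds, and the handling of exact boundary values (for example exactly 100k characters) is an assumption.

```python
def pick_strategy(char_count: int,
                  small: int = 10_000,
                  medium: int = 100_000,
                  large: int = 500_000) -> str:
    """Map a text file's character count to a reading strategy name."""
    if char_count < small:
        return "FULL"         # complete content
    if char_count < medium:
        return "PARTIAL"      # sections with overview
    if char_count <= large:
        return "PROGRESSIVE"  # structure first, drill down
    return "OVERVIEW"         # structure + summary only


for size in (4_000, 60_000, 250_000, 2_000_000):
    print(size, "->", pick_strategy(size))
```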
FULL Strategy
Read complete file contents.
Best for: Small config files, complete data processing, files under 10k characters
PARTIAL Strategy
Read with structure overview + selected sections.
Best for: Medium-sized documents, specific sections needed, files of 10-100k characters
OVERVIEW Strategy
Extract structure and summary only.
Best for: Large documents, initial exploration, files over 100k characters
PROGRESSIVE Strategy
Load sections incrementally on demand.
Best for: Very large files, code exploration, files over 500k characters
from marsys.environment.file_operations import ReadStrategy

# Use FULL strategy for small files
result = await file_agent.run(
    prompt="Read config.yaml using FULL strategy",
    context={"read_strategy": ReadStrategy.FULL}
)

# Use OVERVIEW for large documents
result = await file_agent.run(
    prompt="Get overview of large_report.pdf",
    context={"read_strategy": ReadStrategy.OVERVIEW}
)

# PROGRESSIVE for incremental exploration
result1 = await file_agent.run(
    prompt="Get structure of codebase/main.py",
    context={"read_strategy": ReadStrategy.PROGRESSIVE}
)
result2 = await file_agent.run(
    prompt="Read section 'class:DatabaseManager' from main.py",
    context={"section_id": "class:DatabaseManager"}
)
Incremental Reading
For large documents, the toolkit provides incremental reading capabilities that allow agents to request specific page or line ranges.
Reading Specific PDF Pages
# Read pages 5-9 of a PDF (5 pages, within the default max_pages_per_read)
result = await file_agent.run(
    prompt="Read pages 5 to 9 from research_paper.pdf",
    context={
        "start_page": 5,
        "end_page": 9
    }
)

# Features:
# - Automatic limit enforcement (prevents requesting too many pages)
# - Clean response without usage guides
# - Returns pure content from requested range
Reading Specific Text Lines
# Read lines 100-200 from a code file
result = await file_agent.run(
    prompt="Read lines 100 to 200 from main.py",
    context={
        "start_line": 100,
        "end_line": 200
    }
)

# Features:
# - Character-based limit enforcement
# - Maximum characters enforced by max_characters_absolute (120K default)
# - Clean response without usage guides (explicit request)
Automatic Overflow Handling
When a file exceeds max_characters_absolute (120K default) and no explicit range is specified, the system raises an error telling the agent to request specific page/line ranges instead.
For PDFs
Returns first N pages (default: 5) with usage guide header/footer explaining total pages and how to request more.
For Text Files
Truncates content at character limit and prepends usage guide showing total lines with guidance on reading more with line ranges.
Important: Usage guides are only shown for automatic overflow, not for explicit range requests.
Request Validation
# Request too many pages - returns an error response such as the following
# (the values shown correspond to a configuration with max_pages_per_read=100):
# {
#   "error": true,
#   "message": "Request exceeds maximum pages per read",
#   "details": {
#     "requested_pages": 200,
#     "maximum_pages": 100,
#     "suggestion": "Request fewer pages (e.g., start_page=1, end_page=100)"
#   }
# }

# Limits enforced:
# - PDF pages: max_pages_per_read (default: 5 pages)
# - Text lines: max_lines_per_read (default: 250 lines)
# - All file types: max_characters_absolute (default: 120K characters)
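A minimal sketch of this kind of check, assuming the error payload keeps the shape shown above. `validate_page_request` is a hypothetical helper, not a toolkit function, and the "Invalid page range" branch is an added assumption.

```python
def validate_page_request(start_page: int, end_page: int,
                          max_pages_per_read: int = 5) -> dict:
    """Return an error payload if the page range is invalid or too large."""
    requested = end_page - start_page + 1
    if start_page < 1 or requested < 1:
        return {"error": True, "message": "Invalid page range"}
    if requested > max_pages_per_read:
        last_allowed = start_page + max_pages_per_read - 1
        return {
            "error": True,
            "message": "Request exceeds maximum pages per read",
            "details": {
                "requested_pages": requested,
                "maximum_pages": max_pages_per_read,
                "suggestion": (
                    "Request fewer pages (e.g., "
                    f"start_page={start_page}, end_page={last_allowed})"
                ),
            },
        }
    return {"error": False}


print(validate_page_request(1, 200, max_pages_per_read=100))
print(validate_page_request(5, 9))
```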
Search Within Large Documents
# Search for keywords in PDF with page numbers
result = await file_agent.run(
    prompt="Search for 'machine learning' in research_paper.pdf",
    context={
        "search_type": "content",
        "pattern": "machine learning",
        "include_context": True
    }
)

# Result includes page numbers and line numbers:
# {
#   "matches": [
#     {
#       "match": "Machine learning algorithms...",
#       "location": "page 3, line 45",
#       "page": 3,
#       "line": 45,
#       "context_before": [...],
#       "context_after": [...]
#     }
#   ]
# }
Structure Extraction
The toolkit extracts hierarchical structure from various file types:
PDF Structure
Uses font-size analysis to detect headings:
# Returns DocumentStructure with sections like:
# Section(id="1", title="Introduction", level=1, ...)
# |-- Section(id="1.1", title="Background", level=2)
# `-- Section(id="1.2", title="Motivation", level=2)
Code Structure (Future)
AST-based parsing with tree-sitter:
# Returns hierarchy like:
# Section(id="module", title="models.py", ...)
# |-- Section(id="class:User", title="class User")
# |   |-- Section(id="method:__init__", ...)
# |   `-- Section(id="method:validate", ...)
# `-- Section(id="class:Database", ...)
Accessing Sections
# Read specific section by ID
result = await file_agent.run(
    prompt="Read section '1.2' from research_paper.pdf",
    context={"section_id": "1.2"}
)
Editing Files
Unified Diff Format (Recommended)
High-success-rate editing using unified diff format with multiple fallback strategies:
result = await file_agent.run(
    prompt="""Edit config.py using unified diff format:
--- config.py
+++ config.py
@@ -10,3 +10,3 @@
 DEBUG = True
-MAX_WORKERS = 4
+MAX_WORKERS = 8
 LOG_LEVEL = "INFO"
""",
    context={}
)

# Features:
# - Multiple fallback strategies (exact match -> whitespace normalization -> fuzzy matching)
# - Dry-run preview before applying
# - Detailed change reports
# - ~98% success rate
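The fallback chain (exact match, then whitespace normalization, then fuzzy matching) can be sketched with the standard library alone. This is an illustration of the idea, not the toolkit's matcher: `locate_hunk`, the sliding-window approach, and the 0.8 similarity threshold are all assumptions.

```python
import difflib


def locate_hunk(file_lines, hunk_lines, fuzzy_threshold=0.8):
    """Find where a hunk's context lines sit in a file.

    Returns (start_index, strategy) or (None, "no_match").
    """
    n = len(hunk_lines)
    windows = [(i, file_lines[i:i + n])
               for i in range(len(file_lines) - n + 1)]

    # 1. Exact match.
    for i, window in windows:
        if window == hunk_lines:
            return i, "exact"

    # 2. Match after collapsing runs of whitespace.
    def normalize(lines):
        return [" ".join(line.split()) for line in lines]

    target = normalize(hunk_lines)
    for i, window in windows:
        if normalize(window) == target:
            return i, "whitespace"

    # 3. Fuzzy match: pick the most similar window above the threshold.
    best_index, best_ratio = None, 0.0
    wanted = "\n".join(hunk_lines)
    for i, window in windows:
        ratio = difflib.SequenceMatcher(None, "\n".join(window), wanted).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    if best_index is not None and best_ratio >= fuzzy_threshold:
        return best_index, "fuzzy"
    return None, "no_match"


source = ["DEBUG = True", "MAX_WORKERS  =  4", 'LOG_LEVEL = "INFO"']
print(locate_hunk(source, ["DEBUG = True"]))
print(locate_hunk(source, ["MAX_WORKERS = 4"]))
```

Trying the strictest strategy first keeps edits precise when the diff is accurate and only loosens the match when a model-generated diff drifts in spacing or wording.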
Search and Replace
result = await file_agent.run(
    prompt="""Replace 'old_function()' with 'new_function()' in utils.py""",
    context={}
)
Dry Run Preview
# Preview changes before applying
result = await file_agent.run(
    prompt="""Show me what would change if I apply this diff (dry run):
--- app.py
+++ app.py
@@ -5,1 +5,1 @@
-version = "1.0.0"
+version = "1.1.0"
""",
    context={"dry_run": True}
)
Image Support & Token Estimation
The toolkit provides comprehensive image support with provider-specific token estimation for vision-language models.
Reading Images
# Read an image file with token estimation
result = await file_agent.run(
    prompt="Read the diagram.png image",
    context={
        "provider": "anthropic",   # Options: openai, anthropic, google, xai, generic
        "detail": "high",          # Options: high, low (affects some providers)
        "max_pixels": 1024 * 1024  # Downsample if exceeds this limit
    }
)

# Result includes:
# - Image dimensions and format
# - Estimated token count for the provider
# - Base64-encoded image data (for sending to VLM)
# - Metadata (DPI, color mode, etc.)
Token Estimation by Provider
| Provider | Formula | 1024x1024 Example |
|---|---|---|
| OpenAI (GPT-4V, GPT-4o) | 85 + (170 * num_tiles) | 765 tokens |
| Anthropic (Claude) | (width * height) / 750 | 1,398 tokens |
| Google (Gemini) | 258 per 768x768 tile | 1,032 tokens |
| xAI (Grok) | (num_tiles + 1) * 256 | 1,792 tokens |
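The first three rows of the table can be reproduced with a short sketch. The formulas come from the table; the tiling details are assumptions drawn from the providers' public descriptions (OpenAI: fit within 2048x2048, shrink so the shortest side is at most 768 px, count 512 px tiles; Google: count 768 px tiles with ceiling division), and the function names are hypothetical. The xAI row is omitted because its tile size is not stated here.

```python
import math


def openai_tokens(width: int, height: int) -> int:
    """85 + 170 * num_tiles, using assumed high-detail tiling rules."""
    scale = min(1.0, 2048 / max(width, height))  # fit within 2048x2048
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))            # shortest side at most 768
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


def anthropic_tokens(width: int, height: int) -> int:
    """(width * height) / 750, rounded down."""
    return (width * height) // 750


def google_tokens(width: int, height: int) -> int:
    """258 tokens per 768x768 tile (ceiling division is an assumption)."""
    return 258 * math.ceil(width / 768) * math.ceil(height / 768)


for name, fn in [("openai", openai_tokens),
                 ("anthropic", anthropic_tokens),
                 ("google", google_tokens)]:
    print(name, fn(1024, 1024))
```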
Image Configuration
config = FileOperationConfig(
    # Image pixel limits
    max_image_pixels=1024 * 1024,  # 1 megapixel max per image
    max_images_per_read=4,         # Max images in single operation

    # Auto-downsample if needed
    # Images exceeding max_pixels will be resized maintaining aspect ratio
)
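Resizing to a pixel budget while keeping the aspect ratio comes down to scaling both sides by the square root of the pixel ratio. The sketch below only shows that arithmetic; `downsample_dims` is a hypothetical helper and the toolkit's own rounding may differ.

```python
import math


def downsample_dims(width: int, height: int,
                    max_pixels: int = 1024 * 1024) -> tuple:
    """Return target dimensions that fit max_pixels at the same aspect ratio."""
    pixels = width * height
    if pixels <= max_pixels:
        return (width, height)  # already within budget
    # Scaling both sides by s multiplies the area by s**2.
    scale = math.sqrt(max_pixels / pixels)
    return (max(1, math.floor(width * scale)),
            max(1, math.floor(height * scale)))


print(downsample_dims(2048, 2048))  # halves each side
print(downsample_dims(800, 600))    # unchanged, already small enough
```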
Supported Image Formats
Image Extraction from PDFs
# Read PDF with images (future feature)
result = await file_agent.run(
    prompt="Read report.pdf and include any charts or diagrams",
    context={
        "extract_images": True,
        "provider": "openai",  # For token estimation
        "max_images": 4        # Limit to key visuals
    }
)

# Result includes both text and ImageData objects
print(f"Text tokens: {result.estimated_tokens}")
print(f"Image tokens: {result.total_estimated_image_tokens}")
print(f"Total tokens: {result.get_total_estimated_tokens()}")
Token Budget Management
from marsys.environment.file_operations.token_estimation import (
    estimate_total_tokens,
    should_downsample_image
)

# Estimate total tokens before reading
estimation = estimate_total_tokens(
    text_content="Sample text...",
    images=[(1920, 1080), (1024, 768)],  # Image dimensions
    provider="anthropic"
)
print(f"Total estimated tokens: {estimation['total_tokens']}")
print(f"  Text: {estimation['text_tokens']} tokens")
print(f"  Images: {estimation['image_tokens']} tokens")

# Check if image needs downsampling
needs_downsample, target_dims = should_downsample_image(
    width=2048,
    height=2048,
    max_pixels=1024 * 1024,
    max_tokens=1000,
    provider="openai"
)
if needs_downsample:
    print(f"Downsample to: {target_dims}")
Search Capabilities
Content Search (Grep)
Search file contents with regex patterns, case sensitivity options, context lines, and file type filtering.
Filename Search (Glob)
Find files by name patterns with glob support, recursive search, and file metadata.
Structure Search
Search within document structures for class definitions, sections, and more.
# Content Search with regex
result = await file_agent.run(
    prompt="Search for 'TODO' comments in all Python files",
    context={
        "search_type": "content",
        "pattern": r"#\s*TODO:",
        "file_pattern": "*.py"
    }
)

# PDF Content Search with page numbers
result = await file_agent.run(
    prompt="Search for 'neural network' in all PDF files",
    context={
        "search_type": "content",
        "pattern": "neural network",
        "file_pattern": "*.pdf",
        "include_context": True,
        "context_lines": 2
    }
)
# Returns matches with page numbers:
# {"matches": [{"match": "...", "location": "page 5, line 23", "page": 5, "line": 23}]}

# Filename Search
result = await file_agent.run(
    prompt="Find all test files",
    context={
        "search_type": "filename",
        "pattern": "**/test_*.py"
    }
)

# Structure Search
result = await file_agent.run(
    prompt="Find all class definitions in the codebase",
    context={
        "search_type": "structure",
        "query": "class:*"
    }
)
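To make the shape of a content-search result concrete, here is a standalone sketch of regex search with context lines over an in-memory string. It mirrors the match fields shown above (`match`, `line`, `context_before`, `context_after`) but is not the search_files implementation; `search_lines` is a hypothetical helper.

```python
import re


def search_lines(text: str, pattern: str, context_lines: int = 2) -> list:
    """Return regex matches with 1-based line numbers and surrounding lines."""
    lines = text.splitlines()
    regex = re.compile(pattern)
    matches = []
    for idx, line in enumerate(lines):
        if regex.search(line):
            matches.append({
                "match": line,
                "line": idx + 1,
                "context_before": lines[max(0, idx - context_lines):idx],
                "context_after": lines[idx + 1:idx + 1 + context_lines],
            })
    return matches


sample = "import os\n# TODO: remove\nx = 1\ny = 2\n#  TODO: later"
for hit in search_lines(sample, r"#\s*TODO:", context_lines=1):
    print(hit["line"], hit["match"])
```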
Security Features
Run Filesystem Root & Mounts
Restrict operations to a run root and optionally add mounts:
from pathlib import Path

from marsys.environment import FileOperationConfig, create_file_operation_tools

config = FileOperationConfig(
    base_directory=Path("/home/user/safe_workspace"),
    extra_mounts={"/datasets": Path("/shared/datasets")}
)
file_tools = create_file_operation_tools(config)

# Attempts to escape the run root are blocked
# Virtual paths like /datasets/* map to the mounted host directory
Pattern-Based Permissions
config = FileOperationConfig(
    # Block these patterns entirely
    blocked_patterns=[
        "*.key", "*.pem",  # Private keys
        ".env*",           # Environment files
        ".git/**",         # Git internals
    ],

    # Auto-approve these patterns (no confirmation)
    auto_approve_patterns=[
        "*.md", "*.txt",  # Safe documents
        "*.json",         # Configuration
    ],

    # Require user approval for these
    require_approval_patterns=[
        "*.sh",   # Shell scripts
        "*.sql",  # Database queries
    ]
)
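One way to picture how glob-style patterns sort a path into these three groups is the sketch below, built on the standard `fnmatch` module. The precedence (blocked first, then approval required, then auto-approve) and matching against both the full path and the file name are assumptions, and `classify` is a hypothetical helper rather than part of the toolkit.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

BLOCKED = ["*.key", "*.pem", ".env*", ".git/**"]
AUTO_APPROVE = ["*.md", "*.txt", "*.json"]
REQUIRE_APPROVAL = ["*.sh", "*.sql"]


def _matches(path: str, patterns: list) -> bool:
    """True if the full path or its final component matches any pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)


def classify(path: str) -> str:
    """Assumed precedence: blocked > require_approval > auto_approve."""
    if _matches(path, BLOCKED):
        return "blocked"
    if _matches(path, REQUIRE_APPROVAL):
        return "require_approval"
    if _matches(path, AUTO_APPROVE):
        return "auto_approve"
    return "default"


for p in ["config/server.key", ".git/config", "docs/README.md", "deploy.sh"]:
    print(p, "->", classify(p))
```

Checking the blocked list first means a file that matches both a blocked and an auto-approve pattern is still refused, which is the safer default.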
File Size Limits
config = FileOperationConfig(
    max_file_size_bytes=10 * 1024 * 1024,  # 10 MB limit
    max_characters_absolute=80000,         # Absolute read limit (chars)
    max_lines_per_read=200                 # Line-based partial reads
)
Audit Logging
config = FileOperationConfig(
    enable_audit_logging=True,
    log_file_path=Path("./file_ops_audit.log")
)

# All operations logged with:
# - Timestamp
# - Operation type
# - File path
# - Success/failure
# - Agent name
Available Tools
When you call create_file_operation_tools(), you get these tools:
| Tool | Description | Key Parameters |
|---|---|---|
| read_file | Read file with intelligent strategy | path, strategy, start_page, end_page, start_line, end_line |
| write_file | Write content to file | path, content |
| edit_file | Edit using unified diff or search/replace | path, changes, edit_format, dry_run |
| search_files | Search content, filenames, or structure | query, search_type, path, include_context |
| get_file_structure | Extract hierarchical structure | path |
| read_file_section | Read specific section by ID | path, section_id |
| list_files | List directory contents | path, pattern |
| create_directory | Create directories | path |
| delete_file | Delete files (with approval) | path |
Common Issues
Issue: "PyMuPDF not available"
Solution: PyMuPDF ships with the core marsys installation, where it handles PDF text/image extraction and layout analysis. If it is missing, upgrade marsys or install PyMuPDF directly.
pip install --upgrade marsys  # PyMuPDF is included in core
# Or install directly:
pip install PyMuPDF
Issue: "Pillow (PIL) not available"
Solution: Pillow ships with the core marsys installation, where it handles image file reading, PDF image extraction, and token estimation. If it is missing, upgrade marsys or install Pillow directly.
pip install --upgrade marsys  # Pillow is included in core
# Or install directly:
pip install Pillow
Issue: "Path outside run filesystem"
Solution: Either expand the run root or mount the external folder into the virtual filesystem.
# Either expand the run root
config = FileOperationConfig(
    base_directory=Path("/broader/path")
)

# Or mount the external folder into the virtual filesystem
config = FileOperationConfig(
    base_directory=Path("/workspace"),
    extra_mounts={"/datasets": Path("/shared/datasets")}
)
Issue: "File too large to read"
Solution: Use OVERVIEW or PROGRESSIVE strategy, or increase limits cautiously.
from marsys.environment.file_operations import ReadStrategy

# Use OVERVIEW or PROGRESSIVE strategy
result = await file_agent.run(
    prompt="Get overview of large_file.pdf",
    context={"read_strategy": ReadStrategy.OVERVIEW}
)

# Or increase limits (use cautiously)
config = FileOperationConfig(
    max_file_size_bytes=100 * 1024 * 1024,  # 100 MB hard limit
    max_characters_absolute=150000,         # Increase absolute limit (use cautiously!)
    max_pages_per_read=10,                  # More pages per request
    max_lines_per_read=500,                 # More lines per request
)
Issue: "Request exceeds maximum pages per read"
Solution: Request fewer pages, increase the limit, or read in batches.
# Option 1: Request fewer pages (recommended)
result = await file_agent.run(
    prompt="Read pages 1-5 instead of 1-50",
    context={"start_page": 1, "end_page": 5}
)

# Option 2: Increase page limit via config
config = FileOperationConfig(
    max_pages_per_read=10,           # Increase pages per request
    max_characters_absolute=150000,  # May also need to increase absolute limit
)

# Option 3: Read in batches that fit the default limit of 5 pages
for start in range(1, 51, 5):  # stop at 51 so the final batch (46-50) is included
    end = min(start + 4, 50)
    result = await file_agent.run(
        prompt=f"Read pages {start} to {end}",
        context={"start_page": start, "end_page": end}
    )
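Batch boundaries are easy to get off by one, so it can help to compute them separately from the agent calls. The generator below is a small sketch under that idea; `page_batches` is a hypothetical helper, not part of the toolkit.

```python
def page_batches(first_page: int, last_page: int, batch_size: int = 5):
    """Yield inclusive (start_page, end_page) pairs covering the whole range."""
    for start in range(first_page, last_page + 1, batch_size):
        yield start, min(start + batch_size - 1, last_page)


for start_page, end_page in page_batches(1, 12, batch_size=5):
    print({"start_page": start_page, "end_page": end_page})
```

Each pair can be passed as the `start_page`/`end_page` context values, and the last batch is clipped to the document's final page.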
Issue: "Image exceeds token budget"
Solution: Images are automatically downsampled if they exceed max_pixels.
config = FileOperationConfig(
    max_image_pixels=2 * 1024 * 1024,  # 2 megapixels
    max_images_per_read=6              # More images allowed
)

# Or specify max_pixels per operation
result = await file_agent.run(
    prompt="Read large_image.png",
    context={"max_pixels": 2 * 1024 * 1024}  # Will downsample if needed
)
Issue: "Edit failed to apply"
Solution: Use dry-run to preview, and if it still fails, use search-replace format instead.
# Use dry-run to preview
result = await file_agent.run(
    prompt="Apply diff with dry-run first",
    context={"dry_run": True}
)

# Check the preview before applying
# If still fails, use search-replace format instead
Best Practices
1. Use Appropriate Reading Strategies
# GOOD - Let AUTO choose or use incremental reading
result = await file_agent.run("Read document.pdf", context={})
result = await file_agent.run(
    "Read pages 10-14 from large_report.pdf",
    context={"start_page": 10, "end_page": 14}  # 5 pages, within the default limit
)

# GOOD - Use search to find relevant sections first
result = await file_agent.run(
    "Search for 'conclusions' in report.pdf, then read those pages",
    context={}
)

# AVOID - Requesting entire large file at once
result = await file_agent.run("Read all 100 pages of manual.pdf", context={})
2. Leverage Search for Large Documents
# GOOD - Search before reading
result = await file_agent.run(
    prompt="""Find sections about 'machine learning' in research.pdf,
then read the relevant pages in detail""",
    context={}
)
3. Always Use Dry Run for Critical Edits
# GOOD - Preview before applying
result = await file_agent.run(
    "Update production config (dry run first)",
    context={"dry_run": True}
)
4. Restrict Run Root for Security
# GOOD - Limit the run root
config = FileOperationConfig(
    base_directory=Path("/workspace")
)
5. Use Pattern-Based Permissions
# GOOD - Block sensitive files
config = FileOperationConfig(
    blocked_patterns=["*.key", "*.pem", ".env*"]
)
6. Enable Audit Logging for Production
# GOOD - Track all operations
config = FileOperationConfig(
    enable_audit_logging=True,
    log_file_path=Path("./audit.log")
)
Security
Always configure blocked_patterns to prevent access to sensitive files like private keys, environment variables, and credentials.
Token Management
The toolkit uses character count (not file size) as a proxy for text tokens. Image tokens are estimated using provider-specific formulas. AUTO strategy intelligently selects reading approach based on character count.
Incremental Reading (New in v0.2)
Use start_page/end_page for PDFs and start_line/end_line for text files to efficiently read large documents in chunks.
Related Documentation
Specialized Tools
FileOperationTools class and other domain-specific tools.
Multimodal Agents
Building agents that process images and documents.
Run Filesystem
Virtual path semantics and mount configuration.
Agent API
Integrate file tools with agents.