File Operations Toolkit

Advanced file management system with intelligent reading strategies, hierarchical content extraction, and secure editing capabilities for MARSYS agents.

Overview

The File Operations Toolkit provides type-aware file handling with advanced features designed for AI agents:

Intelligent Reading

  • AUTO, FULL, PARTIAL, OVERVIEW, PROGRESSIVE strategies
  • Character-based token management
  • Provider-specific image token estimation

Content Extraction

  • AST-based parsing for code
  • Font analysis for PDF structure
  • Image extraction from documents

Safe Editing

  • Unified diff format with fallbacks
  • Dry-run preview before applying
  • ~98% success rate (Aider-like)

Security Framework

  • Base directory enforcement
  • Pattern-based permissions
  • Audit logging

Core Dependencies

PDF support (PyMuPDF) and image support (Pillow) are included in the core marsys installation.

Quick Start

import asyncio
import os
from pathlib import Path
from marsys import Agent
from marsys.models import ModelConfig
from marsys.environment import create_file_operation_tools
# Create model configuration
model_config = ModelConfig(
    type="api",
    name="anthropic/claude-sonnet-4",
    provider="openrouter",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)
# Create file operation tools
file_tools = create_file_operation_tools()
# Create agent with file capabilities
file_agent = Agent(
    model_config=model_config,
    goal="Manage and analyze files efficiently",
    instruction="""You are a file management assistant. Use your file operation tools to:
- Read files intelligently based on size and type
- Extract structured information from documents
- Edit files using unified diff format for reliability
- Search for content across multiple files
- Maintain security by respecting base directory restrictions
Always use the most appropriate reading strategy to optimize token usage.
When editing, prefer unified diff format for complex changes.""",
    name="FileAssistant",
    tools=file_tools,
)
# Use the agent (await needs an async context, so wrap the call in a coroutine)
async def main():
    result = await file_agent.run(
        prompt="Read the README.md file and summarize its contents",
        context={"working_dir": Path.cwd()},
    )
    return result
asyncio.run(main())

Configuration

Default Configuration

from marsys.environment import create_file_operation_tools
# Use defaults (permissive mode)
file_tools = create_file_operation_tools()

Custom Configuration

from pathlib import Path
from marsys.environment import FileOperationConfig, create_file_operation_tools
config = FileOperationConfig(
    # Base directory enforcement
    base_directory=Path("/home/user/projects"),
    force_base_directory=True,  # Require all operations within base_directory
    # File size limits (hard limit for safety)
    max_file_size_bytes=100 * 1024 * 1024,  # 100 MB absolute limit
    # Character-based reading thresholds (token proxy for text)
    small_file_threshold=10000,  # < 10k chars (~2.5k tokens): FULL read
    medium_file_threshold=100000,  # 10-100k chars (~2.5-25k tokens): PARTIAL read
    large_file_threshold=500000,  # 100-500k chars: PROGRESSIVE; > 500k chars (~125k tokens): OVERVIEW first
    # File type-specific limits
    max_json_content_chars=40000,  # JSON truncation threshold (~10k tokens)
    max_lines_per_read=250,  # Max lines for text files
    max_pages_per_read=5,  # Max pages for PDF files
    # Absolute safety limit (applies to ALL file types)
    max_characters_absolute=120000,  # Hard limit: 120K chars (~30k tokens)
    # Image token limits (for vision models)
    max_image_pixels=1024 * 1024,  # 1 megapixel (1024x1024)
    max_images_per_read=4,  # Maximum images per operation
    # Security patterns (glob-style)
    blocked_patterns=[
        "*.key", "*.pem", "*.p12",  # Private keys
        ".env", ".env.*",  # Environment files
        "*.sqlite", "*.db",  # Databases
        ".git/**",  # Git internals
    ],
    # Auto-approve patterns (no user confirmation needed)
    auto_approve_patterns=[
        "*.md", "*.txt",  # Documentation
        "*.py", "*.js", "*.java",  # Source code
        "*.json", "*.yaml", "*.yml",  # Configuration
    ],
    # Require approval patterns
    require_approval_patterns=[
        "*.sh", "*.bash",  # Shell scripts
        "Makefile", "Dockerfile",  # Build files
        "*.sql",  # SQL files
    ],
    # Editing
    enable_editing=True,
    enable_dry_run=True,  # Allow preview before applying edits
    # Audit logging
    enable_audit_log=True,
    audit_log_path=Path("./file_operations_audit.log"),
)
# Create tools with custom config
file_tools = create_file_operation_tools(config)

Character Limit Configuration

The toolkit uses different character limits for different purposes:

| Limit | Default | Applies To | Behavior |
|---|---|---|---|
| max_json_content_chars | 40,000 | JSON files | Triggers truncation/overview |
| max_lines_per_read | 250 lines | Text files | Controls line-based partial reading |
| max_pages_per_read | 5 pages | PDF files | Controls page-based partial reading |
| max_characters_absolute | 120,000 | ALL file types | Hard limit that raises error |

Context Window Management

The absolute character limit (120K) leaves room for system prompts (~2-5K tokens), agent memory/history (~10-20K tokens), images (~500-2000 tokens per image), and response generation (~2-10K tokens).
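The budget arithmetic behind this limit can be sketched as follows. The ratio of roughly 4 characters per token is the one implied by the configuration comments (120K chars, ~30k tokens). The function name and the choice of the upper end of each quoted range are illustrative assumptions, not part of the MARSYS API.

```python
CHARS_PER_TOKEN = 4  # assumed ratio, implied by "120K chars (~30k tokens)"


def estimate_text_tokens(char_count: int) -> int:
    """Rough token estimate for plain text from its character count."""
    return char_count // CHARS_PER_TOKEN


# Upper ends of the ranges quoted above, in tokens (illustrative worst case).
worst_case_overhead = {
    "system_prompt": 5_000,
    "agent_memory": 20_000,
    "images": 4 * 2_000,  # max_images_per_read=4 at ~2,000 tokens each
    "response": 10_000,
}

file_tokens = estimate_text_tokens(120_000)
total_tokens = file_tokens + sum(worst_case_overhead.values())
print(file_tokens, total_tokens)
```

Under these assumptions a maximal read plus worst-case overhead comes to about 73K tokens, which is the kind of headroom check worth repeating against the context window of the model actually in use.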

Preset Configurations

from marsys.environment import FileOperationConfig, create_file_operation_tools
# Permissive mode (default)
permissive_config = FileOperationConfig.create_permissive()
permissive_tools = create_file_operation_tools(permissive_config)
# Restrictive mode (tighter security)
restrictive_config = FileOperationConfig.create_restrictive()
restrictive_tools = create_file_operation_tools(restrictive_config)

Reading Strategies

The toolkit provides five intelligent reading strategies to optimize token usage:

AUTO Strategy (Default)

Automatically selects the best strategy based on file size (character count for text files):

  • < 10k characters (~2.5k tokens): FULL read (complete content)
  • 10-100k characters (~2.5-25k tokens): PARTIAL read (sections with overview)
  • 100-500k characters (~25-125k tokens): PROGRESSIVE (structure first, drill down)
  • > 500k characters (~125k+ tokens): OVERVIEW (structure + summary only)
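
The threshold logic above can be sketched in a few lines. This is a minimal illustration, not the toolkit's implementation: the `Strategy` enum and `choose_strategy` function are hypothetical stand-ins, and the handling of values that fall exactly on a boundary (100k, 500k) is an assumption, since the ranges above do not specify it.

```python
from enum import Enum


class Strategy(Enum):  # hypothetical stand-in for the toolkit's ReadStrategy
    FULL = "full"
    PARTIAL = "partial"
    PROGRESSIVE = "progressive"
    OVERVIEW = "overview"


def choose_strategy(char_count, small=10_000, medium=100_000, large=500_000):
    """Map a character count to a strategy using the documented thresholds."""
    if char_count < small:
        return Strategy.FULL
    if char_count <= medium:  # boundary handling is an assumption
        return Strategy.PARTIAL
    if char_count <= large:  # boundary handling is an assumption
        return Strategy.PROGRESSIVE
    return Strategy.OVERVIEW


for size in (2_000, 50_000, 250_000, 900_000):
    print(size, choose_strategy(size).name)
```

The default arguments mirror small_file_threshold, medium_file_threshold, and large_file_threshold from the custom configuration example.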

FULL Strategy

Read complete file contents.

Best for: Small config files, complete data processing, files under 10 KB

PARTIAL Strategy

Read with structure overview + selected sections.

Best for: Medium-sized documents, specific sections needed, 10-100 KB files

OVERVIEW Strategy

Extract structure and summary only.

Best for: Large documents, initial exploration, files > 100 KB

PROGRESSIVE Strategy

Load sections incrementally on demand.

Best for: Very large files, code exploration, files > 500 KB

from marsys.environment.file_operations import ReadStrategy
# Use FULL strategy for small files
result = await file_agent.run(
    prompt="Read config.yaml using FULL strategy",
    context={"read_strategy": ReadStrategy.FULL},
)
# Use OVERVIEW for large documents
result = await file_agent.run(
    prompt="Get overview of large_report.pdf",
    context={"read_strategy": ReadStrategy.OVERVIEW},
)
# PROGRESSIVE for incremental exploration
result1 = await file_agent.run(
    prompt="Get structure of codebase/main.py",
    context={"read_strategy": ReadStrategy.PROGRESSIVE},
)
result2 = await file_agent.run(
    prompt="Read section 'class:DatabaseManager' from main.py",
    context={"section_id": "class:DatabaseManager"},
)

Incremental Reading

For large documents, the toolkit provides incremental reading capabilities that allow agents to request specific page or line ranges.

Reading Specific PDF Pages

# Read pages 5-10 of a PDF. This is a 6-page range, so it assumes
# max_pages_per_read has been raised above its default of 5.
result = await file_agent.run(
    prompt="Read pages 5 to 10 from research_paper.pdf",
    context={
        "start_page": 5,
        "end_page": 10,
    },
)
# Features:
# - Automatic limit enforcement (prevents requesting too many pages)
# - Clean response without usage guides
# - Returns pure content from requested range

Reading Specific Text Lines

# Read lines 100-200 from a code file
result = await file_agent.run(
    prompt="Read lines 100 to 200 from main.py",
    context={
        "start_line": 100,
        "end_line": 200,
    },
)
# Features:
# - Character-based limit enforcement
# - Maximum characters enforced by max_characters_absolute (120K default)
# - Clean response without usage guides (explicit request)

Automatic Overflow Handling

When a file exceeds max_characters_absolute (120K default) and no explicit range is specified:

For PDFs

Returns first N pages (default: 5) with usage guide header/footer explaining total pages and how to request more.

For Text Files

Truncates content at character limit and prepends usage guide showing total lines with guidance on reading more with line ranges.

Request Validation

# Requesting too many pages returns an error response. The values below
# correspond to a page limit of 100 rather than the default of 5:
# {
#   "error": true,
#   "message": "Request exceeds maximum pages per read",
#   "details": {
#     "requested_pages": 200,
#     "maximum_pages": 100,
#     "suggestion": "Request fewer pages (e.g., start_page=1, end_page=100)"
#   }
# }
# Limits enforced:
# - PDF pages: max_pages_per_read (default: 5 pages)
# - Text lines: max_lines_per_read (default: 250 lines)
# - All file types: max_characters_absolute (default: 120K characters)
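
A validation step of this kind might look like the sketch below. The function name is hypothetical, the error dictionary simply mirrors the shape of the example response above, and treating the page range as inclusive is an assumption.

```python
def validate_page_request(start_page, end_page, max_pages):
    """Return None when the range is acceptable, else an error dict."""
    if start_page < 1 or end_page < start_page:
        return {"error": True, "message": "Invalid page range", "details": {}}
    requested = end_page - start_page + 1  # inclusive range (assumption)
    if requested > max_pages:
        last_allowed = start_page + max_pages - 1
        return {
            "error": True,
            "message": "Request exceeds maximum pages per read",
            "details": {
                "requested_pages": requested,
                "maximum_pages": max_pages,
                "suggestion": (
                    "Request fewer pages (e.g., "
                    f"start_page={start_page}, end_page={last_allowed})"
                ),
            },
        }
    return None


print(validate_page_request(1, 200, max_pages=100))
print(validate_page_request(5, 9, max_pages=5))
```

Returning a structured error instead of raising lets an agent read the suggestion and retry with a smaller range.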

Structure Extraction

The toolkit extracts hierarchical structure from various file types:

PDF Structure

Uses font-size analysis to detect headings:

# Returns DocumentStructure with sections like:
# Section(id="1", title="Introduction", level=1, ...)
# ├── Section(id="1.1", title="Background", level=2)
# └── Section(id="1.2", title="Motivation", level=2)
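
To illustrate the idea of font-size analysis, here is a minimal sketch that works on plain (text, font_size) pairs rather than a real PDF. It assumes the most common size is body text and that each larger size is one heading level; the function name and the input shape are hypothetical, and the toolkit's actual heuristics may differ.

```python
from collections import Counter


def detect_headings(lines):
    """lines: list of (text, font_size) pairs. Returns [(level, text), ...]."""
    if not lines:
        return []
    size_counts = Counter(size for _, size in lines)
    body_size = size_counts.most_common(1)[0][0]  # most frequent size = body text
    # Larger sizes become heading levels: largest is level 1, next is level 2, ...
    heading_sizes = sorted((s for s in size_counts if s > body_size), reverse=True)
    level_of = {size: rank + 1 for rank, size in enumerate(heading_sizes)}
    return [(level_of[size], text) for text, size in lines if size in level_of]


sample = [
    ("Introduction", 18.0),
    ("Some body text.", 11.0),
    ("Background", 14.0),
    ("More body text.", 11.0),
    ("Motivation", 14.0),
    ("Even more body text.", 11.0),
]
print(detect_headings(sample))
```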

Code Structure

AST-based parsing with tree-sitter:

# Returns hierarchy like:
# Section(id="module", title="models.py", ...)
# └── Section(id="class:User", title="class User")
#     ├── Section(id="method:__init__", ...)
#     └── Section(id="method:validate", ...)
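
The toolkit parses code with tree-sitter. As a dependency-free illustration of the same idea, the sketch below uses Python's standard `ast` module instead (a swapped-in technique, Python-only) to build section IDs in the style shown above. The `outline` function and the `function:` prefix for top-level functions are assumptions made for this example.

```python
import ast


def outline(source, module_name):
    """Return (section_id, title) pairs for classes, methods, and functions."""
    tree = ast.parse(source)
    sections = [("module", module_name)]
    function_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            sections.append((f"class:{node.name}", f"class {node.name}"))
            for item in node.body:
                if isinstance(item, function_types):
                    sections.append((f"method:{item.name}", f"def {item.name}"))
        elif isinstance(node, function_types):
            # "function:" prefix is an assumption, not a documented ID format
            sections.append((f"function:{node.name}", f"def {node.name}"))
    return sections


example_source = (
    "class User:\n"
    "    def __init__(self):\n"
    "        pass\n"
    "    def validate(self):\n"
    "        return True\n"
)
for section in outline(example_source, "models.py"):
    print(section)
```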

Accessing Sections

# Read specific section by ID
result = await file_agent.run(
    prompt="Read section '1.2' from research_paper.pdf",
    context={"section_id": "1.2"},
)

Editing Files

Unified Diff Format (Recommended)

High-success-rate editing using unified diff format with multiple fallback strategies:

result = await file_agent.run(
    prompt="""Edit config.py using unified diff format:
--- config.py
+++ config.py
@@ -10,3 +10,3 @@
 DEBUG = True
-MAX_WORKERS = 4
+MAX_WORKERS = 8
 LOG_LEVEL = "INFO"
""",
    context={},
)
# Note: context lines in a unified diff start with a single space.
# Features:
# - Multiple fallback strategies (exact match → whitespace normalization → fuzzy matching)
# - Dry-run preview before applying
# - Detailed change reports
# - ~98% success rate
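
The fallback chain named above (exact match, then whitespace normalization, then fuzzy matching) can be sketched with the standard library. This is an illustration of the general technique, not the toolkit's code: `locate_block`, the 0.8 similarity threshold, and the use of `difflib.SequenceMatcher` are all assumptions.

```python
import difflib


def _normalize(lines):
    """Collapse runs of whitespace so indentation and spacing differences vanish."""
    return [" ".join(line.split()) for line in lines]


def locate_block(file_lines, block, fuzzy_threshold=0.8):
    """Find where `block` occurs in `file_lines`. Returns (index, strategy)."""
    n = len(block)
    windows = [file_lines[i:i + n] for i in range(len(file_lines) - n + 1)]
    # 1. Exact match
    for i, window in enumerate(windows):
        if window == block:
            return i, "exact"
    # 2. Whitespace-normalized match
    normalized_block = _normalize(block)
    for i, window in enumerate(windows):
        if _normalize(window) == normalized_block:
            return i, "whitespace"
    # 3. Fuzzy match: pick the most similar window above the threshold
    best_index, best_ratio = None, 0.0
    target = "\n".join(block)
    for i, window in enumerate(windows):
        ratio = difflib.SequenceMatcher(None, "\n".join(window), target).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    if best_index is not None and best_ratio >= fuzzy_threshold:
        return best_index, "fuzzy"
    return None, "no_match"


config_lines = ["DEBUG = True", "MAX_WORKERS = 4", 'LOG_LEVEL = "INFO"']
print(locate_block(config_lines, ["MAX_WORKERS = 4"]))
print(locate_block(config_lines, ["MAX_WORKERS   =   4"]))
print(locate_block(config_lines, ["MAX_WORKER = 4"]))
```

Trying the strictest strategy first keeps edits precise when the diff is accurate, and falls back to looser matching only when a model-generated diff drifts from the file.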

Dry Run Preview

# Preview changes before applying
result = await file_agent.run(
    prompt="""Show me what would change if I apply this diff (dry run):
--- app.py
+++ app.py
@@ -5,1 +5,1 @@
-version = "1.0.0"
+version = "1.1.0"
""",
    context={"dry_run": True},
)

Image Support & Token Estimation

The toolkit provides comprehensive image support with provider-specific token estimation for vision-language models.

Reading Images

# Read an image file with token estimation
result = await file_agent.run(
    prompt="Read the diagram.png image",
    context={
        "provider": "anthropic",  # Options: openai, anthropic, google, xai, generic
        "detail": "high",  # Options: high, low (affects some providers)
        "max_pixels": 1024 * 1024,  # Downsample if exceeds this limit
    },
)
# Result includes:
# - Image dimensions and format
# - Estimated token count for the provider
# - Base64-encoded image data (for sending to VLM)
# - Metadata (DPI, color mode, etc.)

Token Estimation by Provider

| Provider | Formula | 1024x1024 Example |
|---|---|---|
| OpenAI (GPT-4V, GPT-4o) | 85 + (170 * num_tiles) | 765 tokens |
| Anthropic (Claude) | (width * height) / 750 | 1,398 tokens |
| Google (Gemini) | 258 per 768x768 tile | 1,032 tokens |
| xAI (Grok) | (num_tiles + 1) * 256 | 1,792 tokens |
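
The first three formulas can be checked with a short sketch. The function names are hypothetical. The OpenAI resizing steps (fit within 2048x2048, scale the shortest side down to 768, count 512-pixel tiles) and the Google tiling by simple ceiling division are assumptions layered on top of the table, chosen because they reproduce the 1024x1024 figures. xAI is omitted because the table does not state its tile size.

```python
import math


def openai_tokens(width, height):
    """85 + 170 per 512px tile, after the assumed high-detail resizing steps."""
    scale = min(1.0, 2048 / max(width, height))  # fit within 2048x2048
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))  # shortest side down to 768 (no upscaling)
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


def anthropic_tokens(width, height):
    """(width * height) / 750, rounded down."""
    return (width * height) // 750


def google_tokens(width, height):
    """258 tokens per 768x768 tile (tiling by ceiling division is an assumption)."""
    tiles = math.ceil(width / 768) * math.ceil(height / 768)
    return 258 * tiles


print(openai_tokens(1024, 1024), anthropic_tokens(1024, 1024), google_tokens(1024, 1024))
```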

Supported Image Formats

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • GIF (.gif)
  • WebP (.webp)
  • BMP (.bmp)
  • TIFF (.tiff, .tif)
  • ICO (.ico)
  • SVG (.svg)

Search Capabilities

Content Search (Grep)

Search file contents with regex patterns, case sensitivity options, context lines, and file type filtering.

Filename Search (Glob)

Find files by name patterns with glob support, recursive search, and file metadata.

Structure Search

Search within document structures for class definitions, sections, and more.

# Content Search with regex
result = await file_agent.run(
    prompt="Search for 'TODO' comments in all Python files",
    context={
        "search_type": "content",
        "pattern": r"#\s*TODO:",
        "file_pattern": "*.py",
    },
)
# PDF Content Search with page numbers
result = await file_agent.run(
    prompt="Search for 'neural network' in all PDF files",
    context={
        "search_type": "content",
        "pattern": "neural network",
        "file_pattern": "*.pdf",
        "include_context": True,
        "context_lines": 2,
    },
)
# Returns matches with page numbers:
# {"matches": [{"match": "...", "location": "page 5, line 23", "page": 5, "line": 23}]}
# Filename Search
result = await file_agent.run(
    prompt="Find all test files",
    context={
        "search_type": "filename",
        "pattern": "**/test_*.py",
    },
)
# Structure Search
result = await file_agent.run(
    prompt="Find all class definitions in the codebase",
    context={
        "search_type": "structure",
        "query": "class:*",
    },
)
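
For intuition about what a content search with context lines returns, here is a minimal standard-library sketch. The `grep_lines` function and the shape of its result are hypothetical and only loosely modeled on the match format shown above.

```python
import re


def grep_lines(text, pattern, context_lines=0):
    """Return one dict per matching line, with surrounding context lines."""
    regex = re.compile(pattern)
    lines = text.splitlines()
    matches = []
    for index, line in enumerate(lines):
        if regex.search(line):
            start = max(0, index - context_lines)
            end = min(len(lines), index + context_lines + 1)
            matches.append(
                {"line": index + 1, "match": line, "context": lines[start:end]}
            )
    return matches


sample_code = "import os\n# TODO: remove debug flag\nx = 1\n#TODO: rename x\n"
hits = grep_lines(sample_code, r"#\s*TODO:", context_lines=1)
for hit in hits:
    print(hit["line"], hit["match"])
```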

Security Features

Base Directory Enforcement

from pathlib import Path
from marsys.environment import FileOperationConfig, create_file_operation_tools
config = FileOperationConfig(
    base_directory=Path("/home/user/safe_workspace"),
    force_base_directory=True,  # Reject operations outside this directory
)
file_tools = create_file_operation_tools(config)
# Attempts to access /etc/passwd will be blocked
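
A containment check of this kind is commonly implemented by resolving both paths and comparing them. The sketch below shows that general technique with `pathlib`; the `is_within_base` helper is hypothetical and is not claimed to be how MARSYS enforces the restriction.

```python
from pathlib import Path


def is_within_base(base_directory, requested):
    """True if `requested` resolves to a location inside `base_directory`."""
    base = Path(base_directory).resolve()
    # Joining with an absolute `requested` discards `base`, so absolute
    # paths such as /etc/passwd are checked as-is.
    target = (base / requested).resolve()
    return target == base or base in target.parents


workspace = "/home/user/safe_workspace"
print(is_within_base(workspace, "notes/todo.md"))
print(is_within_base(workspace, "/etc/passwd"))
print(is_within_base(workspace, "../../../etc/passwd"))
```

Resolving before comparing matters: a plain string prefix check would accept `../` traversal and symlinks that point outside the base directory.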

Pattern-Based Permissions

config = FileOperationConfig(
    # Block these patterns entirely
    blocked_patterns=[
        "*.key", "*.pem",  # Private keys
        ".env*",  # Environment files
        ".git/**",  # Git internals
    ],
    # Auto-approve these patterns (no confirmation)
    auto_approve_patterns=[
        "*.md", "*.txt",  # Safe documents
        "*.json",  # Configuration
    ],
    # Require user approval for these
    require_approval_patterns=[
        "*.sh",  # Shell scripts
        "*.sql",  # Database queries
    ],
)
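
How glob-style patterns could translate into a decision is sketched below using `fnmatch`. Several things here are assumptions rather than documented behavior: the `classify` helper, the precedence order (blocked, then require approval, then auto-approve), matching against both the full path and the file name, and falling back to approval for unmatched paths.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

BLOCKED = ["*.key", "*.pem", ".env*", ".git/**"]
AUTO_APPROVE = ["*.md", "*.txt", "*.json"]
REQUIRE_APPROVAL = ["*.sh", "*.sql"]


def _matches(path, patterns):
    """Match a pattern against either the whole path or just the file name."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)


def classify(path):
    if _matches(path, BLOCKED):  # assumed to take precedence
        return "blocked"
    if _matches(path, REQUIRE_APPROVAL):
        return "require_approval"
    if _matches(path, AUTO_APPROVE):
        return "auto_approve"
    return "require_approval"  # assumed conservative default


for candidate in ("README.md", "scripts/deploy.sh", "certs/server.pem", ".git/config"):
    print(candidate, classify(candidate))
```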

Audit Logging

config = FileOperationConfig(
    enable_audit_log=True,
    audit_log_path=Path("./file_ops_audit.log"),
)
# All operations logged with:
# - Timestamp
# - Operation type
# - File path
# - Success/failure
# - Agent name
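
One plausible shape for such a log entry is a JSON line carrying the five fields listed above. The sketch below is purely illustrative: the field names, the JSON-lines format, and the `format_audit_entry` helper are assumptions, and the toolkit's actual log format may differ.

```python
import json
from datetime import datetime, timezone


def format_audit_entry(operation, path, success, agent_name):
    """Build one JSON line with the fields the audit log is described as recording."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operation": operation,
        "path": str(path),
        "success": success,
        "agent": agent_name,
    }
    return json.dumps(entry)


audit_line = format_audit_entry("read_file", "README.md", True, "FileAssistant")
print(audit_line)
```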

Available Tools

| Tool | Description | Key Parameters |
|---|---|---|
| read_file | Read file with intelligent strategy | path, strategy, start_page, end_page, start_line, end_line |
| write_file | Write content to file | path, content |
| edit_file | Edit using unified diff or search/replace | path, changes, edit_format, dry_run |
| search_files | Search content, filenames, or structure | query, search_type, path, include_context |
| get_file_structure | Extract hierarchical structure | path |
| read_file_section | Read specific section by ID | path, section_id |
| list_files | List directory contents | path, pattern |
| create_directory | Create directories | path |
| delete_file | Delete files (with approval) | path |

Best Practices

1. Use Appropriate Reading Strategies

# GOOD - Let AUTO choose or use incremental reading
result = await file_agent.run("Read document.pdf", context={})
result = await file_agent.run(
    "Read pages 10-15 from large_report.pdf",
    context={"start_page": 10, "end_page": 15},
)
# AVOID - Requesting entire large file at once
result = await file_agent.run("Read all 100 pages of manual.pdf", context={})

2. Leverage Search for Large Documents

# GOOD - Search before reading
result = await file_agent.run(
    prompt="""Find sections about 'machine learning' in research.pdf,
then read the relevant pages in detail""",
    context={},
)

3. Always Use Dry Run for Critical Edits

# GOOD - Preview before applying
result = await file_agent.run(
    "Update production config (dry run first)",
    context={"dry_run": True},
)

4. Restrict Base Directory for Security

# GOOD - Enforce base directory
config = FileOperationConfig(
    base_directory=Path("/workspace"),
    force_base_directory=True,
)

5. Enable Audit Logging for Production

# GOOD - Track all operations
config = FileOperationConfig(
    enable_audit_log=True,
    audit_log_path=Path("./audit.log"),
)

Security

Always configure blocked_patterns to prevent access to sensitive files like private keys, environment variables, and credentials.

Token Management

The toolkit uses character count (not file size) as a proxy for text tokens, at roughly 4 characters per token. Image tokens are estimated using provider-specific formulas. The AUTO strategy selects a reading approach based on character count.

Incremental Reading (New in v0.2)

Use start_page/end_page for PDFs and start_line/end_line for text files to efficiently read large documents in chunks.

Related Documentation

Tools API Reference

Complete API documentation for tool schemas.

Specialized Tools

FileOperationTools class and other domain-specific tools.

Multimodal Agents

Building agents that process images and documents.

Agent API

Integrate file tools with agents.