Built-in Tools Reference
Complete guide to MARSYS built-in tools with prerequisites, setup instructions, and usage examples.
MARSYS includes several built-in tools for common operations. Each tool has specific prerequisites that must be met before use.
Web Search Tools
Production Recommendation
For production deployments, use the Google Custom Search API (`tool_google_search_api`, or `web_search` with an API key configured). DuckDuckGo has aggressive bot detection and will block automated requests.
DuckDuckGo should only be used for: development/testing, low-volume use cases (fewer than 10 searches/hour), fallback when the Google quota is exhausted, or privacy-sensitive queries.
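If you do fall back to DuckDuckGo in development, a client-side guard can keep you under that hourly budget. A minimal sketch (this helper is not part of MARSYS):

```python
import time
from collections import deque

class HourlyRateLimiter:
    """Allow at most `max_per_hour` calls in any rolling one-hour window."""

    def __init__(self, max_per_hour: int = 10):
        self.max_per_hour = max_per_hour
        self._timestamps = deque()

    def allow(self, now=None) -> bool:
        now = time.time() if now is None else now
        # Drop timestamps older than one hour
        while self._timestamps and now - self._timestamps[0] >= 3600:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_per_hour:
            self._timestamps.append(now)
            return True
        return False
```

Call `limiter.allow()` before each search and skip (or queue) the request when it returns `False`.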
tool_google_search_api
Google Custom Search API integration for high-quality web search results.
Prerequisites
- Google Cloud Platform account
- Custom Search API enabled
- Two environment variables (`GOOGLE_SEARCH_API_KEY` and `GOOGLE_CSE_ID_GENERIC`, set in step 5)
Setup Steps
1. Create a Google Cloud Project
Go to Google Cloud Console, create a new project (or select existing), and give it a descriptive name.
2. Enable Custom Search API
In the Cloud Console, navigate to "APIs & Services" → "Enable APIs and Services" → Search for "Custom Search API" → Click "Enable".
3. Create API Key
Go to "APIs & Services" → "Credentials" → "CREATE CREDENTIALS" → "API key". Copy the key immediately. Recommended: Restrict access by IP address or HTTP referrer.
4. Create Programmable Search Engine
Visit Programmable Search Engine, create a new search engine, and copy the Search engine ID (cx parameter).
5. Set Environment Variables
```bash
# Unix/macOS/Linux
export GOOGLE_SEARCH_API_KEY="your-api-key-here"
export GOOGLE_CSE_ID_GENERIC="your-search-engine-id-here"

# Or add to a .env file
GOOGLE_SEARCH_API_KEY=your-api-key-here
GOOGLE_CSE_ID_GENERIC=your-search-engine-id-here
```
Usage
```python
from marsys.environment.tools import tool_google_search_api

# Perform search
results = tool_google_search_api(
    query="Python machine learning",
    num_results=5,
    lang="en"
)
```
API Limits & Pricing
- Free tier: 100 queries/day
- Paid tier: $5 per 1,000 additional queries (up to 10,000 queries/day max)
- Results: Maximum 10 results per query
tool_google_search_community
Alternative Google search using web scraping (no API key required).
```python
from marsys.environment.tools import tool_google_search_community

# Perform search without the API
results = tool_google_search_community(
    query="Python tutorials",
    num_results=5,
    lang="en"
)
```
Limitations
- Slower than the API version
- Rate-limited by Google
- May be blocked under heavy use
- Less reliable for production
When to Use
- Development and testing
- Personal projects
- When the API quota is exhausted
- When no API key is available
web_search
Unified web search interface with automatic fallback.
```python
from marsys.environment.tools import web_search

# Automatically tries the API first, falls back to the scraper
results = await web_search(
    query="AI trends 2025",
    max_results=5,
    search_engine="google"  # Currently only "google" is supported
)
```
Return Format
```python
[
    {
        "title": "Article Title",
        "url": "https://example.com",
        "snippet": "Description of the article...",
        "source": "Google Search API"  # or "Google Search (Community Library)"
    },
    # ... more results
]
```
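Because every backend returns this same shape, downstream code can post-process results uniformly. For example, a small helper that de-duplicates results by URL (illustrative; not a MARSYS built-in):

```python
def dedupe_results(results: list) -> list:
    """Drop results whose URL was already seen, keeping the first occurrence."""
    seen = set()
    unique = []
    for r in results:
        url = r.get("url", "")
        if url and url not in seen:
            seen.add(url)
            unique.append(r)
    return unique
```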
fetch_url_content
Fetch and extract clean content from any URL.
```python
from marsys.environment.tools import fetch_url_content

# Fetch webpage content
content = await fetch_url_content(
    url="https://example.com/article",
    timeout=30,
    include_metadata=True
)
```
Return Format
```python
{
    "url": "https://example.com/article",
    "title": "Article Title",
    "content": "Clean extracted text content...",
    "markdown": "# Article Title\n\nContent in markdown...",
    "links": ["https://...", ...],
    "images": ["https://...", ...],
    "metadata": {
        "description": "Meta description",
        "author": "Author name",
        "published_date": "2025-01-01"
    }
}
```
Data Processing Tools
calculate_math
Evaluate mathematical expressions safely.
```python
from marsys.environment.tools import calculate_math

# Calculate expression
result = calculate_math(
    expression="(2 + 3) * 4 / 2",
    precision=2
)
# Returns:
# {
#     "result": 10.0,
#     "expression": "(2 + 3) * 4 / 2",
#     "precision": 2
# }
```
Safety Features
- Uses `ast.literal_eval` for safe evaluation
- Prevents arbitrary code execution
- Supports: `+`, `-`, `*`, `/`, `**`, `()`, and numbers
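The general technique can be sketched as an AST walk over a whitelist of operators (an illustration of the approach, not MARSYS's actual implementation):

```python
import ast
import operator

# Only these operators are permitted; anything else raises ValueError
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate an arithmetic expression without executing arbitrary code."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Disallowed expression element")
    return _eval(ast.parse(expression, mode="eval").body)
```

Function calls, attribute access, and names never match a whitelisted node type, so something like `__import__('os')` is rejected before it can run.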
data_transform
Transform and process structured data.
```python
from marsys.environment.tools import data_transform

# Transform data
result = data_transform(
    data={"values": [1, 2, 3, 4, 5]},
    operation="statistics",  # or "filter", "map", "reduce"
    params={"fields": ["mean", "median", "std"]}
)
```
| Operation | Description |
|---|---|
| `statistics` | Calculate statistical measures |
| `filter` | Filter data by conditions |
| `map` | Transform each element |
| `reduce` | Aggregate data |
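The four operations behave roughly like this plain-Python equivalent (an illustrative sketch of the semantics; the parameter names here are assumptions, not MARSYS's exact API):

```python
import statistics as stats
from functools import reduce as _reduce

def transform(values: list, operation: str, params: dict = None):
    """Rough semantics of the four data_transform operations."""
    params = params or {}
    if operation == "statistics":
        funcs = {"mean": stats.mean, "median": stats.median, "std": stats.stdev}
        return {f: funcs[f](values) for f in params.get("fields", ["mean"])}
    if operation == "filter":
        return [v for v in values if params["condition"](v)]
    if operation == "map":
        return [params["func"](v) for v in values]
    if operation == "reduce":
        return _reduce(params["func"], values)
    raise ValueError(f"Unknown operation: {operation}")
```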
File Operations
file_operations
Unified interface for file system operations.
```python
from marsys.environment.tools import file_operations

# Read file
content = await file_operations(
    operation="read",
    path="/path/to/file.txt",
    encoding="utf-8"
)

# Write file
result = await file_operations(
    operation="write",
    path="/path/to/output.txt",
    content="Hello, World!",
    mode="write"  # or "append"
)

# List directory
files = await file_operations(
    operation="list",
    path="/path/to/directory",
    pattern="*.py"  # optional glob pattern
)
```
| Operation | Description |
|---|---|
| `read` | Read file contents |
| `write` | Write to file |
| `append` | Append to file |
| `list` | List directory contents |
| `exists` | Check if path exists |
| `delete` | Remove file (use with caution) |
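Because `delete` is destructive, it is worth gating it on an existence check. A plain-`pathlib` equivalent of that pattern (illustrative only; not a MARSYS helper):

```python
from pathlib import Path

def safe_delete(path: str) -> bool:
    """Delete only if the path exists and is a regular file; report whether it was removed."""
    p = Path(path)
    if p.exists() and p.is_file():
        p.unlink()
        return True
    return False
```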
Web Content Tools
read_file
Read and parse various file formats.
```python
from marsys.environment.web_tools import read_file

# Read text file
content = await read_file("/path/to/document.txt")

# Read PDF
content = await read_file("/path/to/document.pdf")
```
Supported Formats
- Plain text (`.txt`, `.md`, `.py`, etc.)
- PDF files (`.pdf`)
- Automatic format detection
extract_text_from_pdf
Extract text content from PDF files using pdfminer.six.
```python
from marsys.environment.web_tools import extract_text_from_pdf

# Extract PDF text
text = extract_text_from_pdf("/path/to/document.pdf")
```
clean_and_extract_html
Clean HTML and extract structured content.
```python
from marsys.environment.web_tools import clean_and_extract_html

# Extract from HTML
result = await clean_and_extract_html(
    html_content="<html>...</html>",
    base_url="https://example.com",
    output_format="markdown"  # or "text"
)
# Returns:
# {
#     "title": "Page Title",
#     "content": "Clean content...",
#     "markdown": "# Title\n\nContent...",
#     "links": [...],
#     "images": [...],
#     "metadata": {...}
# }
```
Tool Registration
Using Tools with Agents
```python
from marsys import Agent, ModelConfig
from marsys.environment.tools import (
    web_search,
    fetch_url_content,
    calculate_math
)

agent = Agent(
    model_config=ModelConfig(
        type="api",
        name="anthropic/claude-sonnet-4",
        provider="openrouter",
        max_tokens=12000
    ),
    name="ResearchAgent",
    goal="Research agent with web search capabilities",
    instruction="You are a research agent with access to web search, URL fetching, and calculation tools.",
    tools={
        "web_search": web_search,
        "fetch_url_content": fetch_url_content,
        "calculate_math": calculate_math
    }
)
```
List Available Tools
```python
from marsys.environment.tools import list_tools

# Get a list of all built-in tools
tools = list_tools()
print(tools)
# Output: ['tool_google_search_api', 'tool_google_search_community', 'web_search', ...]
```
Get Tool by Name
```python
from marsys.environment.tools import get_tool

# Dynamically get a tool function
search_tool = get_tool("web_search")
if search_tool:
    results = await search_tool("Python tutorials")
```
Common Issues
Issue: "Google Search API key not configured"
Solution: Set the required environment variables
```bash
export GOOGLE_SEARCH_API_KEY="your-api-key"
export GOOGLE_CSE_ID_GENERIC="your-cse-id"
```
Or use the community search instead (no API key required).
Issue: "googlesearch library not installed"
Solution:
```bash
pip install googlesearch-python
```
Issue: PDF extraction fails
Solution: Ensure PDF dependencies are installed
```bash
pip install pypdf pdfminer.six
```
Issue: Rate limiting on web scraping
Solutions:
- Use the API version instead of community scraper
- Add delays between requests
- Implement caching to reduce repeated requests
- Use `web_search`, which has automatic fallback
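Delays and retries can be combined into a small backoff wrapper (a sketch; the `search` callable stands in for any scraper-backed tool):

```python
import asyncio

async def search_with_backoff(search, query: str, retries: int = 3, base_delay: float = 1.0):
    """Retry a flaky async search, doubling the delay between attempts."""
    for attempt in range(retries):
        try:
            return await search(query)
        except Exception:
            if attempt == retries - 1:
                raise  # Out of attempts; surface the error
            await asyncio.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```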
Best Practices
1. Use Google API for Production Search
```python
# BEST - Google Custom Search API (recommended for production)
from marsys.environment.tools import tool_google_search_api
# Requires GOOGLE_SEARCH_API_KEY and GOOGLE_CSE_ID_GENERIC

# GOOD - Automatic fallback (API if available, scraper otherwise)
from marsys.environment.tools import web_search

# DEVELOPMENT ONLY - DuckDuckGo (will be blocked in production)
from marsys.environment.search_tools import SearchTools
search_tools = SearchTools()  # Only for testing/development, < 10 searches/hour

# AVOID - Community scraper for production
from marsys.environment.tools import tool_google_search_community
```
2. Always Set Timeouts
```python
# GOOD - Prevents hanging
content = await fetch_url_content(url, timeout=30)

# BAD - No explicit timeout
content = await fetch_url_content(url)  # Uses the default, but be explicit
```
3. Handle Errors Gracefully
```python
try:
    results = await web_search(query)
    if results and "error" not in results[0]:
        process_results(results)
except Exception as e:
    logger.error(f"Search failed: {e}")
    # Use a fallback or notify the user
```
4. Cache Expensive Operations
```python
# web_search is async, so functools.lru_cache won't work directly:
# it would cache the coroutine object, which can only be awaited once.
# Cache the results instead:
_search_cache = {}

async def cached_search(query: str):
    if query not in _search_cache:
        _search_cache[query] = await web_search(query)
    return _search_cache[query]
```
5. Secure Your API Keys
```python
# GOOD - Use environment variables
import os
api_key = os.getenv("GOOGLE_SEARCH_API_KEY")

# NEVER - Hardcode credentials
api_key = "AIzaSyABC123..."  # DON'T DO THIS
```
Environment Variables
Always store API keys in environment variables or .env files. Never hardcode credentials in your code.
Rate Limits
Be aware of API rate limits and implement appropriate caching and retry strategies for production use.
API Key Security
In the Google Cloud Console, restrict your API key by IP address or HTTP referrer to prevent unauthorized use.
Related Documentation
Tools API Reference
Complete API documentation for tool schemas.
Specialized Tools
Domain-specific tool classes for advanced use cases.
File Operations
Advanced file handling and reading strategies.
Agent API
Integrate tools with agents.