Browser Automation
MARSYS provides browser automation through the BrowserAgent, which supports web scraping, page interaction, and LLM-guided navigation.
Overview
The browser automation system provides:
- Dual Operation Modes: PRIMITIVE for fast content extraction, ADVANCED for complex multi-step scenarios with visual interaction
- Web Navigation: Navigate, scrape, and interact with websites
- Intelligent Automation: LLM-guided browser control and decision making
- Dynamic Content Handling: JavaScript execution and async content loading
- Form Automation: Fill forms, click elements, and handle interactions
- Multimodal Capabilities: Screenshot-based visual understanding with element detection (ADVANCED mode)
- Session Persistence: Save and load browser sessions across runs
- Robust Error Handling: Retry mechanisms and resilient operations
Operation Modes
PRIMITIVE Mode
Fast, efficient content extraction without visual interaction:
- High-level tools for quick content retrieval
- No visual feedback or screenshots
- No vision model required
- Optimized for speed and simplicity
- Best for web scraping, content aggregation, and API-like web interactions
ADVANCED Mode
Complex multi-step scenarios requiring visual interaction and coordinate-based control:
- Low-level coordinate-based tools
- Visual feedback with auto-screenshot support
- Vision model integration for visual understanding
- Multi-step navigation and interaction
- Best for form automation, cookie popups, anti-bot sites, and complex workflows
Creating a BrowserAgent
PRIMITIVE Mode
from marsys.agents import BrowserAgent, BrowserAgentMode
from marsys.models import ModelConfig

# PRIMITIVE Mode - Fast content extraction
browser_agent = await BrowserAgent.create_safe(
    model_config=ModelConfig(
        type="api",
        provider="openrouter",
        name="anthropic/claude-haiku-4.5",
        temperature=0.3
    ),
    name="web_scraper",
    mode="primitive",  # or BrowserAgentMode.PRIMITIVE
    goal="Fast web scraping agent for content extraction",
    headless=True,
    tmp_dir="./tmp/browser"
)
ADVANCED Mode
# ADVANCED Mode - Visual interaction
browser_agent = await BrowserAgent.create_safe(
    model_config=ModelConfig(
        type="api",
        provider="openrouter",
        name="anthropic/claude-haiku-4.5",
        temperature=0.3
    ),
    name="web_navigator",
    mode="advanced",  # or BrowserAgentMode.ADVANCED
    goal="Expert web automation for complex interactions",
    auto_screenshot=True,  # Enable visual feedback
    vision_model_config=ModelConfig(
        type="api",
        provider="openrouter",
        name="google/gemini-2.5-flash",  # Fast and cost-effective
        temperature=0,
        thinking_budget=0
    ),
    headless=False,
    tmp_dir="./tmp/screenshots"
)
Cleanup
Always clean up browser resources when done to avoid memory leaks. Use await browser_agent.cleanup() in a finally block.
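One way to make sure cleanup always runs is to wrap the agent in an async context manager. The sketch below is not part of MARSYS: `managed_agent` is a hypothetical helper, and `FakeAgent` is a stand-in so the snippet runs on its own. The only assumption about the real agent is that it exposes an awaitable `cleanup()` method, as described above.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeAgent:
    """Stand-in for a browser agent; it only models the cleanup() call."""

    def __init__(self):
        self.cleaned_up = False

    async def cleanup(self):
        self.cleaned_up = True


@asynccontextmanager
async def managed_agent(agent):
    """Yield the agent and always await its cleanup(), even on errors."""
    try:
        yield agent
    finally:
        await agent.cleanup()


async def demo():
    agent = FakeAgent()
    try:
        async with managed_agent(agent):
            raise RuntimeError("simulated failure during a browser task")
    except RuntimeError:
        pass  # the error still propagates; cleanup has already run
    return agent.cleaned_up


cleaned = asyncio.run(demo())
```

With a helper like this, the `finally` logic lives in one place instead of being repeated around every task.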
Available Tools
PRIMITIVE Mode Tools
- fetch_url: Navigate and extract content in one step
- get_page_metadata: Get page title, URL, and links
- download_file: Download files from URLs
- get_page_elements: Get interactive elements with selectors (token-efficient format)
- inspect_element: Get element details by selector (truncated text preview)
ADVANCED Mode Tools (Additional)
All PRIMITIVE mode tools, plus:
- goto: Navigate to URL (auto-detects downloads)
- scroll_up / scroll_down: Scroll the page
- mouse_click: Click at specific coordinates (auto-detects downloads)
- keyboard_input: Type text into focused input fields (search boxes, forms)
- keyboard_press: Press special keys (Enter, Tab, arrows, etc.) (auto-detects downloads)
- search_page: Find text on page with Chrome-like highlighting
- go_back / reload: Navigation controls
- get_url / get_title: Get current page information
- screenshot: Take screenshot with element highlighting (returns multimodal ToolResponse)
- inspect_at_position: Get element info at screen coordinates (x, y)
- list_tabs / get_active_tab / switch_to_tab / close_tab: Tab management
- save_session: Save browser session state for persistence
Usage Example
from marsys.agents import BrowserAgent
from marsys.coordination import Orchestra

# Create browser agent (config is a ModelConfig, as in the examples above)
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="scraper",
    mode="primitive",
    goal="Extract data from websites"
)

try:
    # Use with Orchestra
    topology = {
        "agents": ["scraper"],
        "flows": []
    }
    result = await Orchestra.run(
        task="Go to example.com and extract all headings",
        topology=topology
    )
finally:
    # Release browser resources even if the run fails
    await browser_agent.cleanup()
Text Search on Page
search_page()
Find text on web pages with Chrome-like visual highlighting and navigation!
# Search for text on the current page
result = await browser_tool.search_page("quantum computing")
# Returns: "Match 1/5 found and highlighted"
# All matches highlighted in YELLOW, current match in ORANGE

# Navigate to next match - call again with SAME term
result = await browser_tool.search_page("quantum computing")
# Returns: "Match 2/5"
# Scrolls to and highlights next occurrence

# Continue navigating (wraps around after last match)
result = await browser_tool.search_page("quantum computing")
# Returns: "Match 3/5"
Features:
- Visual Highlighting: All matches in YELLOW, current in ORANGE (Chrome-like)
- Auto-scroll: Automatically scrolls to current match (centered in viewport)
- Match Counter: Shows "Match X/Y" so you know your progress
- Wrap-around: After last match, returns to first match
- Case-insensitive: Finds text regardless of case
Limitations:
- ❌ Does NOT work with PDF files (PDFs are auto-downloaded, not displayed)
- ❌ Does NOT search across multiple pages
- ✅ Works with regular web pages, including shadow DOM content
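The navigation behavior described above (case-insensitive matching, a "Match X/Y" counter, and wrap-around) can be modeled in a few lines. This is a plain-Python simulation over a string, not the BrowserAgent implementation. `PageSearch`, its `search` method, and the "No matches found" message are hypothetical names used for illustration only.

```python
import re


class PageSearch:
    """Simulates repeated search calls over plain text."""

    def __init__(self, text):
        self.text = text
        self.term = None      # lower-cased term of the active search
        self.positions = []   # start offsets of all matches
        self.current = -1     # index of the current match

    def search(self, term):
        key = term.lower()
        if key != self.term:
            # New term: collect all case-insensitive match offsets.
            self.term = key
            pattern = re.escape(term)
            self.positions = [
                m.start() for m in re.finditer(pattern, self.text, re.IGNORECASE)
            ]
            self.current = -1
        if not self.positions:
            return "No matches found"  # hypothetical message
        # Same term: advance to the next match, wrapping after the last one.
        self.current = (self.current + 1) % len(self.positions)
        return f"Match {self.current + 1}/{len(self.positions)}"


page = PageSearch("Quantum computing is hard. quantum COMPUTING is fun.")
results = [page.search("quantum computing") for _ in range(3)]
print(results)  # ['Match 1/2', 'Match 2/2', 'Match 1/2']
```

The third call returns to the first match, which mirrors the wrap-around rule listed under Features.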
Automatic Download Detection
Smart Download Handling
Actions that trigger file downloads are automatically detected and reported!
# Clicking a download link automatically detects the download
result = await browser_tool.mouse_click(x=450, y=300)
# Returns: "Action 'mouse_click' triggered a file download.
# File 'report.pdf' has been downloaded to: /path/to/tmp/downloads/report.pdf"

# Navigating to a PDF URL triggers automatic download
result = await browser_tool.goto("https://example.com/paper.pdf")
# Returns: "Action 'goto' triggered a file download.
# File 'paper.pdf' has been downloaded to: /path/to/tmp/downloads/paper.pdf"

# Pressing Enter on a download button
await browser_tool.mouse_click(x=500, y=400)  # Focus download button
await browser_tool.keyboard_press("Enter")
# Returns: "Action 'keyboard_press' triggered a file download.
# File 'data.xlsx' has been downloaded to: /path/to/tmp/downloads/data.xlsx"
Automatic Detection Features:
- ✅ Detects downloads triggered by clicks, keyboard presses, or navigation
- ✅ Returns file path and filename in response
- ✅ Downloads saved to ./tmp/downloads/ by default
- ✅ PDFs are always downloaded (never displayed in browser)
- ✅ Works with all file types (PDF, Excel, CSV, images, etc.)
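Because the download result is returned as text, calling code may want to pull the filename and path out of it. The sketch below assumes the message wording shown in the examples above stays stable, which this page does not guarantee. `parse_download_message` and `DownloadInfo` are hypothetical helpers, not part of MARSYS.

```python
import re
from typing import NamedTuple, Optional


class DownloadInfo(NamedTuple):
    action: str
    filename: str
    path: str


# Pattern based on the example messages; \s* tolerates a space or line break
# between the two sentences.
_DOWNLOAD_RE = re.compile(
    r"Action '(?P<action>[^']+)' triggered a file download\.\s*"
    r"File '(?P<filename>[^']+)' has been downloaded to: (?P<path>.+)"
)


def parse_download_message(message: str) -> Optional[DownloadInfo]:
    """Return the action, filename, and path, or None if no download is reported."""
    match = _DOWNLOAD_RE.search(message)
    if match is None:
        return None
    return DownloadInfo(
        match["action"], match["filename"], match["path"].strip()
    )


message = (
    "Action 'mouse_click' triggered a file download. "
    "File 'report.pdf' has been downloaded to: /path/to/tmp/downloads/report.pdf"
)
info = parse_download_message(message)
print(info.filename, info.path)
```

Returning `None` for any other response lets the caller treat "no download happened" as the normal case.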
Session Persistence
Browser Session Persistence
BrowserAgent supports saving and loading browser sessions (cookies, localStorage) using Playwright's storage_state feature. This enables persistent authentication across browser sessions.
Loading a Saved Session
from marsys.agents import BrowserAgent

# Create agent with existing session state
agent = await BrowserAgent.create_safe(
    model_config=config,
    name="AuthenticatedBrowser",
    mode="advanced",
    session_path="./sessions/linkedin_session.json",  # Load existing session
    headless=True
)

# Browser is now initialized with saved cookies and localStorage
# Already logged in to LinkedIn, Google, etc.
await agent.run("Go to linkedin.com/feed and extract posts")
Saving a Session
# After logging in manually or programmatically, save the session
result = await agent.browser_tool.save_session("./sessions/my_session.json")
# Returns: "Session saved successfully to ./sessions/my_session.json. Saved 15 cookies and 3 origin storage entries."

# The agent can also save sessions via tool calls
result = await agent.run("Save the current session to ./sessions/backup.json")
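A Playwright storage state file is a JSON object with a `cookies` list and an `origins` list, where each origin carries its `localStorage` entries. Before reusing a saved session it can help to check what the file contains and whether any cookies have expired. The sketch below uses only the standard library. `summarize_session` is a hypothetical helper, and the sample data is made up for the demonstration.

```python
import json
import tempfile
import time
from pathlib import Path


def summarize_session(path, now=None):
    """Count cookies, expired cookies, and origins in a storage state file."""
    now = time.time() if now is None else now
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    cookies = state.get("cookies", [])
    origins = state.get("origins", [])
    # Playwright stores expires == -1 for session cookies; those are never
    # treated as expired here.
    expired = [c["name"] for c in cookies if 0 <= c.get("expires", -1) < now]
    return {"cookies": len(cookies), "expired": expired, "origins": len(origins)}


# Demo with a small hand-written storage state file.
sample = {
    "cookies": [
        {"name": "sid", "value": "abc", "domain": "example.com",
         "path": "/", "expires": -1},
        {"name": "old", "value": "x", "domain": "example.com",
         "path": "/", "expires": 100.0},
    ],
    "origins": [
        {"origin": "https://example.com",
         "localStorage": [{"name": "theme", "value": "dark"}]},
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    session_file = Path(tmp) / "session.json"
    session_file.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize_session(session_file, now=1_000.0)

print(summary)
```

If the `expired` list includes an authentication cookie, the session likely needs to be refreshed by logging in again and saving it once more.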
Choosing the Right Mode
Mode Selection
Use PRIMITIVE for simple scraping tasks where speed matters.
Use ADVANCED when you need to interact with forms, handle popups, or navigate complex multi-step workflows.
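The guidance above can be written as a small decision helper. This only illustrates the rule of thumb: `choose_mode` and its parameters are hypothetical, and the returned strings match the `mode` values used in the examples earlier on this page.

```python
def choose_mode(needs_forms=False, has_popups=False, multi_step=False):
    """Return "advanced" when visual interaction is needed, else "primitive"."""
    if needs_forms or has_popups or multi_step:
        return "advanced"
    return "primitive"


scrape_mode = choose_mode()                                  # plain content extraction
login_mode = choose_mode(needs_forms=True, has_popups=True)  # login flow behind a cookie banner
print(scrape_mode, login_mode)
```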