Browser Automation
MARSYS provides powerful browser automation capabilities through the BrowserAgent, enabling web scraping, interaction, and intelligent navigation for multi-agent workflows.
Overview
The browser automation system provides:
- Dual Operation Modes: PRIMITIVE for fast content extraction, ADVANCED for complex multi-step scenarios with visual interaction
- Web Navigation: Navigate, scrape, and interact with websites
- Intelligent Automation: LLM-guided browser control and decision making
- Dynamic Content Handling: JavaScript execution and async content loading
- Form Automation: Fill forms, click elements, and handle interactions
- Multimodal Capabilities: Screenshot-based visual understanding with element detection (ADVANCED mode)
- Robust Error Handling: Retry mechanisms and resilient operations
Operation Modes
BrowserAgent supports two distinct operation modes optimized for different use cases:
PRIMITIVE Mode
Purpose: Fast, efficient content extraction without visual interaction
Characteristics:
- High-level tools for quick content retrieval
- No visual feedback or screenshots
- No vision model required
- Optimized for speed and simplicity
- Single-step operations
Available Tools (6):

- `fetch_url` - Navigate and extract content in one step
- `get_page_metadata` - Get page title, URL, and links
- `download_file` - Download files from URLs
- `list_downloads` - List files in the downloads directory
- `get_page_elements` - Get interactive elements with selectors (token-efficient format)
- `inspect_element` - Get element details by selector (truncated text preview)
Best For: Web scraping and data extraction, content aggregation, simple information retrieval, API-like web interactions
ADVANCED Mode
Purpose: Complex multi-step scenarios requiring visual interaction and coordinate-based control
Characteristics:
- Low-level coordinate-based tools
- Visual feedback with auto-screenshot support
- Vision model integration for visual understanding
- Multi-step navigation and interaction
- Form filling and complex workflows
Available Tools (20+): All PRIMITIVE mode tools, plus:
- `goto` - Navigate to URL (auto-detects downloads)
- `scroll_up` / `scroll_down` - Scroll the page
- `mouse_click` - Click at specific coordinates (auto-detects downloads)
- `keyboard_input` - Type text into focused input fields (search boxes, forms)
- `keyboard_press` - Press special keys (Enter, Tab, arrows, etc.) (auto-detects downloads)
- `search_page` - Find text on page with Chrome-like highlighting
- `go_back` - Navigate back
- `reload` - Reload current page
- `get_url` / `get_title` - Get page information
- `screenshot` - Take screenshot with element highlighting (returns multimodal ToolResponse)
- `inspect_element` - Get element details by selector (truncated text preview)
- `inspect_at_position` - Get element info at screen coordinates (x, y)
- `list_tabs` / `get_active_tab` / `switch_to_tab` / `close_tab` - Tab management
- `save_session` - Save browser session state for persistence
Best For: Form automation with complex interactions, multi-step workflows requiring visual confirmation, handling cookie popups and modals, sites with anti-bot protections, tasks requiring precise element interaction
Choosing the Right Mode
```python
from marsys.agents import BrowserAgent, BrowserAgentMode

# Mode selection with enum (type-safe)
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="scraper",
    mode=BrowserAgentMode.PRIMITIVE,  # Using enum
    goal="Efficiently fetch and extract content from web pages"
)

# Mode selection with string (convenient)
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="scraper",
    mode="primitive",  # Using string
    goal="Efficiently fetch and extract content from web pages"
)

# ADVANCED mode - Visual interaction
browser_agent = await BrowserAgent.create_safe(
    model_config=config,  # Main agent model (Claude Haiku/Sonnet recommended)
    name="navigator",
    mode=BrowserAgentMode.ADVANCED,  # or mode="advanced"
    auto_screenshot=True,  # Enable visual feedback
    vision_model_config=ModelConfig(  # Vision model for screenshot analysis
        type="api",
        provider="openrouter",
        name="google/gemini-3-flash-preview",  # Recommended: fast and cost-effective
        # For complex tasks, use: "google/gemini-3-pro-preview"
        temperature=0,
        thinking_budget=0  # Disable thinking for faster vision responses
    ),
    goal="Navigate and interact with web pages like a human"
)
```
BrowserAgent
Creating a BrowserAgent
```python
from marsys.agents import BrowserAgent
from marsys.models import ModelConfig

# PRIMITIVE Mode - Fast content extraction
browser_agent = await BrowserAgent.create_safe(
    model_config=ModelConfig(
        type="api",
        provider="openrouter",
        name="anthropic/claude-opus-4.6",
        temperature=0.3
    ),
    name="web_scraper",
    mode="primitive",  # Simple string mode selection
    goal="Fast web scraping agent for content extraction",
    headless=True,
    tmp_dir="./runs/run-20260206"
)

# ADVANCED Mode - Visual interaction with auto-screenshot
browser_agent_advanced = await BrowserAgent.create_safe(
    model_config=ModelConfig(
        type="api",
        provider="openrouter",
        name="anthropic/claude-opus-4.6",  # Main agent for decision-making and planning
        temperature=0.3
    ),
    name="web_navigator",
    mode="advanced",  # Simple string mode selection
    goal="Expert web automation agent for complex interactions",
    auto_screenshot=True,  # Enable visual feedback
    vision_model_config=ModelConfig(  # Required for auto-screenshot
        type="api",
        provider="openrouter",
        name="google/gemini-3-flash-preview",  # Recommended for browser vision
        temperature=0,
        thinking_budget=0  # Disable thinking for faster vision responses
    ),
    headless=False,
    tmp_dir="./runs/run-20260206"
)

# Always clean up
try:
    # Use the agent
    result = await browser_agent.run("Navigate to example.com and extract the main heading")
finally:
    if browser_agent.browser_tool:
        await browser_agent.browser_tool.close()
```
Virtual paths: BrowserAgent returns virtual paths for artifacts such as ./downloads/report.pdf and ./screenshots/step_1.png. See Run Filesystem.
BrowserAgent Artifact Configuration
BrowserAgent.create_safe(...) supports explicit download path behavior and tool naming:
```python
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="web_scraper",
    mode="primitive",
    tmp_dir="./runs/run-20260206",
    downloads_subdir="downloads",          # Host folder under tmp_dir
    downloads_virtual_dir="./downloads",   # Path shown to the agent
    fetch_file_tool_name="fetch_file",     # Expose download tool under custom name
)
```
- `downloads_subdir` changes the host-side layout under `tmp_dir`.
- `downloads_virtual_dir` changes what agents see and return in tool outputs.
- `fetch_file_tool_name` remaps the download tool name from the default `download_file`.
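The relationship between the host path and the virtual path can be sketched as a pure function (illustrative only; `to_virtual_path` is a hypothetical helper, not part of the MARSYS API):

```python
from pathlib import PurePosixPath

def to_virtual_path(host_path: str, tmp_dir: str,
                    downloads_subdir: str = "downloads",
                    downloads_virtual_dir: str = "./downloads") -> str:
    """Map a host-side download path to the virtual path an agent would see."""
    host_downloads = PurePosixPath(tmp_dir) / downloads_subdir
    # Path of the file relative to the host downloads folder
    relative = PurePosixPath(host_path).relative_to(host_downloads)
    return downloads_virtual_dir.rstrip("/") + "/" + str(relative)

print(to_virtual_path("./runs/run-20260206/downloads/report.pdf",
                      "./runs/run-20260206"))
# → ./downloads/report.pdf
```

The agent only ever sees the virtual side of this mapping, so changing `downloads_subdir` never affects tool outputs.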
Viewport Auto-Detection
If `viewport_width`/`viewport_height` are not provided, BrowserAgent picks defaults by model family:

- Google/Gemini: 1000x1000
- Anthropic/Claude: 1344x896
- OpenAI/GPT: 1024x768
- Fallback: 1536x1536
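The selection logic amounts to a simple lookup on the model name. A minimal sketch (not the actual MARSYS implementation, which may match on different strings):

```python
def default_viewport(model_name: str) -> tuple:
    """Pick a default viewport (width, height) by model family."""
    name = model_name.lower()
    if "google" in name or "gemini" in name:
        return (1000, 1000)
    if "anthropic" in name or "claude" in name:
        return (1344, 896)
    if "openai" in name or "gpt" in name:
        return (1024, 768)
    return (1536, 1536)  # Fallback for unknown families

print(default_viewport("anthropic/claude-opus-4.6"))  # → (1344, 896)
print(default_viewport("some/unknown-model"))         # → (1536, 1536)
```

Passing `viewport_width`/`viewport_height` explicitly always overrides these defaults.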
Using AgentPool for Parallel Browsing
```python
import asyncio
from typing import List

from marsys.agents import AgentPool

# Create pool of browser agents
browser_pool = AgentPool(
    agent_class=BrowserAgent,
    num_instances=3,
    model_config=config,
    agent_name="BrowserPool",
    headless=True
)

# Parallel scraping
async def scrape_urls(urls: List[str]):
    async def scrape(i: int, url: str):
        # Hold the pooled agent until its task finishes
        async with browser_pool.acquire(f"branch_{i}") as agent:
            return await agent.run(f"Scrape content from {url}")

    return await asyncio.gather(*(scrape(i, url) for i, url in enumerate(urls)))

# Cleanup pool
await browser_pool.cleanup()
```
Text Search on Page
New Feature: `search_page()`
Find text on web pages with Chrome-like visual highlighting and navigation!
```python
# Search for text on the current page
result = await browser_tool.search_page("quantum computing")
# Returns: "Match 1/5 found and highlighted"
# All matches highlighted in YELLOW, current match in ORANGE

# Navigate to next match - call again with the SAME term
result = await browser_tool.search_page("quantum computing")
# Returns: "Match 2/5"
# Scrolls to and highlights the next occurrence

# Continue navigating
result = await browser_tool.search_page("quantum computing")
# Returns: "Match 3/5"
# Wraps around to the first match after the last
```
Features:
- Visual Highlighting: All matches in YELLOW, current in ORANGE (Chrome-like)
- Auto-scroll: Automatically scrolls to current match (centered in viewport)
- Match Counter: Shows "Match X/Y" so you know your progress
- Wrap-around: After last match, returns to first match
- Case-insensitive: Finds text regardless of case
Limitations:
- Does NOT work with PDF files (PDFs are auto-downloaded, not displayed)
- Does NOT search across multiple pages
- Works with regular web pages, including shadow DOM content
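The wrap-around "Match X/Y" counter behaves like modular arithmetic over the match list. A small model of that behavior (illustrative, not the internal implementation):

```python
def next_match(current: int, total: int) -> int:
    """Return the next 1-based match index, wrapping after the last match."""
    return current % total + 1

# Repeated calls with the same term on a page with 3 matches
index = 0
for _ in range(4):
    index = next_match(index, 3)
    print(f"Match {index}/3")
# → Match 1/3, Match 2/3, Match 3/3, Match 1/3
```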
Example - Finding Specific Information:
```python
# Navigate to documentation page
await browser_tool.goto("https://docs.example.com/api")

# Search for a specific API endpoint
result = await browser_tool.search_page("/api/v2/users")
# Match 1/3 found - scrolls to first occurrence

# Check if it's the right one with a screenshot
screenshot = await browser_tool.screenshot()
# Visual: see the highlighted text in orange

# Not the right one? Navigate to the next match
result = await browser_tool.search_page("/api/v2/users")
# Match 2/3 - scrolls to second occurrence
```
Automatic Download Detection
Smart Download Handling
Actions that trigger file downloads are automatically detected and reported!
The browser automatically detects when actions (clicks, Enter key presses, navigation) trigger file downloads:
```python
# Clicking a download link automatically detects the download
result = await browser_tool.mouse_click(x=450, y=300)
# Returns: "Action 'mouse_click' triggered a file download.
#           File 'report.pdf' has been downloaded to: ./downloads/report.pdf"

# Navigating to a PDF URL triggers automatic download
result = await browser_tool.goto("https://example.com/paper.pdf")
# Returns: "Action 'goto' triggered a file download.
#           File 'paper.pdf' has been downloaded to: ./downloads/paper.pdf"

# Pressing Enter on a download button
await browser_tool.mouse_click(x=500, y=400)  # Focus download button
await browser_tool.keyboard_press("Enter")
# Returns: "Action 'keyboard_press' triggered a file download.
#           File 'data.xlsx' has been downloaded to: ./downloads/data.xlsx"
```
Automatic Detection Features:
- Detects downloads triggered by clicks, keyboard presses, or navigation
- Returns file path and filename in response
- Downloads saved under virtual `./downloads` (host default: `./tmp/downloads`)
- PDFs are always downloaded (never displayed in browser)
- Works with all file types (PDF, Excel, CSV, images, etc.)
`download_file` itself uses a dual strategy:
- Primary: Playwright request context (inherits browser cookies/session)
- Fallback: browser navigation + download-event detection
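The primary/fallback pattern can be sketched generically; the two strategy functions below are stand-ins for the real Playwright request-context and navigation paths:

```python
import asyncio

async def download_with_fallback(url, primary, fallback):
    """Sketch of the dual strategy: try the primary path, fall back on failure."""
    try:
        return await primary(url)
    except Exception:
        # Primary path rejected (e.g. anti-bot check); retry via the browser itself
        return await fallback(url)

# Demo with stand-in strategies
async def request_context_fetch(url):      # stand-in for the request-context path
    raise RuntimeError("403 from direct request")

async def browser_navigation_fetch(url):   # stand-in for navigation + download-event detection
    return f"downloaded via browser: {url}"

result = asyncio.run(download_with_fallback(
    "https://example.com/report.pdf",
    request_context_fetch,
    browser_navigation_fetch,
))
print(result)  # → downloaded via browser: https://example.com/report.pdf
```

Because the primary path inherits browser cookies, it usually succeeds on authenticated sites; the fallback only fires when the direct request is refused.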
Listing Downloads:
```python
# List all files in the downloads directory
downloads = await browser_tool.list_downloads()
# Returns a formatted list with sizes and paths
```
PDF-Specific Behavior:
```python
# PDFs are NEVER displayed in browser - always downloaded
await browser_tool.goto("https://research.org/paper.pdf")
# Automatically downloads to ./downloads/paper.pdf
# Browser stays on previous page

# search_page() does NOT work with PDFs
# Instead, use file operation tools on the downloaded file
```
Download Path Configuration:
```python
browser_tool = await BrowserTool.create_safe(
    downloads_path="/custom/path/downloads",  # Custom host download directory
    temp_dir="/custom/tmp",                   # Custom temp directory (default: ./tmp)
    downloads_virtual_dir="./downloads",      # Virtual path returned to agents
)
```
Session Persistence
Browser Session Persistence
BrowserAgent supports saving and loading browser sessions (cookies, localStorage) using Playwright's storage_state feature. This enables persistent authentication across browser sessions.
Loading a Saved Session
```python
from marsys.agents import BrowserAgent

# Create agent with existing session state
agent = await BrowserAgent.create_safe(
    model_config=config,
    name="AuthenticatedBrowser",
    mode="advanced",
    session_path="./sessions/linkedin_session.json",  # Load existing session
    headless=True
)

# Browser is now initialized with saved cookies and localStorage
# Already logged in to LinkedIn, Google, etc.
await agent.run("Go to linkedin.com/feed and extract posts")
```
Saving a Session
```python
# Save via BrowserAgent tool invocation
result = await agent.run("Save the current session to ./sessions/my_session.json")
# Returns a success message with cookie/origin counts

# You can save additional checkpoints as needed
result = await agent.run("Save the current session to ./sessions/backup.json")
```
Session File Format
The session file is a JSON file compatible with Playwright's storage_state:
```json
{
  "cookies": [
    {
      "name": "session_id",
      "value": "abc123",
      "domain": ".example.com",
      "path": "/",
      "expires": 1735689600,
      "httpOnly": true,
      "secure": true
    }
  ],
  "origins": [
    {
      "origin": "https://example.com",
      "localStorage": [
        {"name": "user_token", "value": "xyz789"}
      ]
    }
  ]
}
```
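Because the format is plain JSON, a session file can be inspected with the standard library before handing it to an agent, e.g. to check how much state it holds. A minimal sketch (the inline dict stands in for `json.load()` on a real file):

```python
import json

session_json = """
{
  "cookies": [
    {"name": "session_id", "value": "abc123", "domain": ".example.com",
     "path": "/", "expires": 1735689600, "httpOnly": true, "secure": true}
  ],
  "origins": [
    {"origin": "https://example.com",
     "localStorage": [{"name": "user_token", "value": "xyz789"}]}
  ]
}
"""

# In practice: session = json.load(open("./sessions/my_session.json"))
session = json.loads(session_json)
summary = f"{len(session['cookies'])} cookies, {len(session['origins'])} origins"
print(summary)  # → 1 cookies, 1 origins

# Domains with stored cookies (useful for verifying a login actually persisted)
domains = {c["domain"].lstrip(".") for c in session["cookies"]}
print(domains)  # → {'example.com'}
```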
Error Handling
Resilient Operations
```python
import asyncio
from typing import Any, Callable

class ResilientBrowserAgent(BrowserAgent):
    """Browser agent with enhanced error handling."""

    async def retry_operation(
        self,
        operation: Callable,
        max_retries: int = 3,
        backoff_factor: float = 2.0,
        context=None
    ):
        """Execute operation with exponential backoff retry."""
        last_error = None
        wait_time = 1.0

        for attempt in range(max_retries):
            try:
                result = await operation()
                if attempt > 0:
                    await self._log_progress(
                        context, LogLevel.INFO,
                        f"Operation succeeded on attempt {attempt + 1}"
                    )
                return result
            except Exception as e:
                last_error = e
                await self._log_progress(
                    context, LogLevel.WARNING,
                    f"Attempt {attempt + 1} failed: {e}"
                )
                if attempt < max_retries - 1:
                    await asyncio.sleep(wait_time)
                    wait_time *= backoff_factor

        raise Exception(f"Operation failed after {max_retries} attempts: {last_error}")

    async def safe_extract(
        self,
        selector: str,
        default: Any = None,
        context=None
    ):
        """Safely extract element text with a fallback default."""
        try:
            text = await self.browser_tool.get_text(selector)
            if text:
                return text.strip()
        except Exception as e:
            await self._log_progress(
                context, LogLevel.DEBUG,
                f"Failed to extract {selector}: {e}"
            )
        return default
```
Performance Optimization
Resource Blocking
```python
class OptimizedBrowserAgent(BrowserAgent):
    """Optimized browser agent for faster scraping."""

    async def setup_fast_scraping(self, context=None):
        """Configure browser for fast text scraping."""

        # Block unnecessary resources (handler must be async in Playwright's async API)
        async def block_heavy_resources(route):
            if route.request.resource_type in ["image", "stylesheet", "font", "media"]:
                await route.abort()
            else:
                await route.continue_()

        await self.browser_tool.context.route("**/*", block_heavy_resources)

        # Note: JavaScript can only be disabled when the context is created
        # (java_script_enabled=False); it cannot be toggled on an existing context.

        await self._log_progress(
            context, LogLevel.INFO,
            "Optimized browser for fast scraping"
        )
```
Best Practices
1. Explicit Waits
```python
# GOOD - Wait for specific conditions
await browser_tool.wait_for_selector("#content", timeout=10000, state="visible")
await browser_tool.wait_for_navigation()

# BAD - Fixed delays
await asyncio.sleep(5)  # Unreliable and slow
```
2. Robust Selectors
```python
# GOOD - Specific, stable selectors
await browser_tool.click("[data-testid='submit-button']")
await browser_tool.click("#unique-id")

# BAD - Fragile selectors
await browser_tool.click("div > span:nth-child(3)")
```
3. Resource Management
```python
# GOOD - Always clean up
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="CleanupExample",
    mode="advanced",
    headless=True,
)
try:
    # Use agent
    result = await browser_agent.run(task)
finally:
    await browser_agent.browser_tool.close()

# BAD - Leaving browsers open
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="LeakyBrowser",
    mode="advanced",
    headless=True,
)
result = await browser_agent.run(task)
# Browser left running!
```
4. Error Context
```python
# GOOD - Detailed error context
try:
    await browser_tool.click(selector)
except Exception as e:
    await self._log_progress(
        context, LogLevel.ERROR,
        f"Failed to click {selector} on {await browser_tool.get_url()}: {e}"
    )
    # Take screenshot for debugging
    await browser_tool.screenshot("error_screenshot.png")

# BAD - Generic error handling
try:
    await browser_tool.click(selector)
except:
    print("Click failed")
```
Browser Automation Ready!
You now understand browser automation in MARSYS. The BrowserAgent provides powerful web interaction capabilities for your multi-agent workflows.