Browser Automation
MARSYS provides powerful browser automation capabilities through the BrowserAgent, enabling web scraping, interaction, and intelligent navigation for multi-agent workflows.
Overview
The browser automation system provides:
- Dual Operation Modes: PRIMITIVE for fast content extraction, ADVANCED for complex multi-step scenarios with visual interaction
- Web Navigation: Navigate, scrape, and interact with websites
- Intelligent Automation: LLM-guided browser control and decision making
- Dynamic Content Handling: JavaScript execution and async content loading
- Form Automation: Fill forms, click elements, and handle interactions
- Multimodal Capabilities: Screenshot-based visual understanding with element detection (ADVANCED mode)
- Robust Error Handling: Retry mechanisms and resilient operations
Architecture
The browser automation system is organized into three layers:
- BrowserAgent (High-level Interface) — provides the agent-facing API for navigation, interaction, extraction, and screenshots.
- BrowserTool (Low-level Operations) — wraps individual browser actions and exposes them as callable tools.
- Playwright (Browser Control) — drives the actual browser instance.
The execution flow follows: Agent Logic → Plan Actions → Execute Tools → Validate Results.
Operation Modes
BrowserAgent supports two distinct operation modes optimized for different use cases:
PRIMITIVE Mode
Purpose: Fast, efficient content extraction without visual interaction
Characteristics:
- High-level tools for quick content retrieval
- No visual feedback or screenshots
- No vision model required
- Optimized for speed and simplicity
- Single-step operations
Available Tools (6):

- `fetch_url` - Navigate and extract content in one step
- `get_page_metadata` - Get page title, URL, and links
- `download_file` - Download files from URLs
- `list_downloads` - List files in the downloads directory
- `get_page_elements` - Get interactive elements with selectors (token-efficient format)
- `inspect_element` - Get element details by selector (truncated text preview)
Best For:
- Web scraping and data extraction
- Content aggregation
- Simple information retrieval
- API-like web interactions
ADVANCED Mode
Purpose: Complex multi-step scenarios requiring visual interaction and coordinate-based control
Characteristics:
- Low-level coordinate-based tools
- Visual feedback with auto-screenshot support
- Vision model integration for visual understanding
- Multi-step navigation and interaction
- Form filling and complex workflows
Available Tools (20+): All PRIMITIVE mode tools, plus:
- `goto` - Navigate to URL (auto-detects downloads)
- `scroll_up` / `scroll_down` - Scroll the page
- `mouse_click` - Click at specific coordinates (auto-detects downloads)
- `keyboard_input` - Type text into focused input fields (search boxes, forms)
- `keyboard_press` - Press special keys (Enter, Tab, arrows, etc.) (auto-detects downloads)
- `search_page` - Find text on page with Chrome-like highlighting
- `go_back` - Navigate back
- `reload` - Reload current page
- `get_url` / `get_title` - Get page information
- `screenshot` - Take screenshot with element highlighting (returns multimodal ToolResponse)
- `inspect_element` - Get element details by selector (truncated text preview)
- `inspect_at_position` - Get element info at screen coordinates (x, y)
- `list_tabs` - List all open browser tabs
- `get_active_tab` - Get currently active tab info
- `switch_to_tab` - Switch to a specific tab by index
- `close_tab` - Close a tab by index
- `save_session` - Save browser session state for persistence
Best For:
- Form automation with complex interactions
- Multi-step workflows requiring visual confirmation
- Handling cookie popups and modals
- Sites with anti-bot protections
- Tasks requiring precise element interaction
Choosing the Right Mode
```python
from marsys.agents import BrowserAgent, BrowserAgentMode

# Mode selection with enum (type-safe)
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="scraper",
    mode=BrowserAgentMode.PRIMITIVE,  # Using enum
    goal="Efficiently fetch and extract content from web pages"
)

# Mode selection with string (convenient)
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="scraper",
    mode="primitive",  # Using string
    goal="Efficiently fetch and extract content from web pages"
)

# ADVANCED mode - Visual interaction
browser_agent = await BrowserAgent.create_safe(
    model_config=config,  # Main agent model (Claude Haiku/Sonnet recommended)
    name="navigator",
    mode=BrowserAgentMode.ADVANCED,  # or mode="advanced"
    auto_screenshot=True,  # Enable visual feedback
    vision_model_config=ModelConfig(  # Vision model for screenshot analysis
        type="api",
        provider="openrouter",
        name="google/gemini-3-flash-preview",  # Recommended: fast and cost-effective
        # For complex tasks, use: "google/gemini-3-pro-preview"
        temperature=0,
        thinking_budget=0  # Disable thinking for faster vision responses
    ),
    goal="Navigate and interact with web pages like a human"
)
```
BrowserAgent
Creating a BrowserAgent
```python
from marsys.agents import BrowserAgent
from marsys.models import ModelConfig

# PRIMITIVE Mode - Fast content extraction
browser_agent = await BrowserAgent.create_safe(
    model_config=ModelConfig(
        type="api",
        provider="openrouter",
        name="anthropic/claude-opus-4.6",
        temperature=0.3
    ),
    name="web_scraper",
    mode="primitive",  # Simple string mode selection
    goal="Fast web scraping agent for content extraction",
    headless=True,
    tmp_dir="./runs/run-20260206"
)

# ADVANCED Mode - Visual interaction with auto-screenshot
browser_agent_advanced = await BrowserAgent.create_safe(
    model_config=ModelConfig(
        type="api",
        provider="openrouter",
        name="anthropic/claude-opus-4.6",  # Main agent for decision-making and planning
        temperature=0.3
    ),
    name="web_navigator",
    mode="advanced",  # Simple string mode selection
    goal="Expert web automation agent for complex interactions",
    auto_screenshot=True,  # Enable visual feedback
    vision_model_config=ModelConfig(  # Required for auto-screenshot
        type="api",
        provider="openrouter",
        name="google/gemini-3-flash-preview",  # Recommended: fast and cost-effective for browser vision
        # For complex tasks, use: "google/gemini-3-pro-preview"
        temperature=0,
        thinking_budget=0  # Disable thinking for faster vision responses
    ),
    headless=False,
    tmp_dir="./runs/run-20260206"
)

# Always clean up
try:
    # Use the agent
    result = await browser_agent.run("Navigate to example.com and extract the main heading")
finally:
    if browser_agent.browser_tool:
        await browser_agent.browser_tool.close()
```
Virtual paths: BrowserAgent returns virtual paths for artifacts such as ./downloads/report.pdf and ./screenshots/step_1.png. See Run Filesystem.
BrowserAgent Artifact Configuration
BrowserAgent.create_safe(...) supports explicit download path behavior and tool naming:
```python
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="web_scraper",
    mode="primitive",
    tmp_dir="./runs/run-20260206",
    downloads_subdir="downloads",         # Host folder under tmp_dir
    downloads_virtual_dir="./downloads",  # Path shown to the agent
    fetch_file_tool_name="fetch_file",    # Expose download tool under custom name
)
```
Notes:
- `downloads_subdir` changes the host-side layout under `tmp_dir`.
- `downloads_virtual_dir` changes what agents see/return in tool outputs.
- `fetch_file_tool_name` remaps the download tool name from the default `download_file`.
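The host-to-virtual mapping these options imply can be sketched in plain Python. This is an illustrative helper, not part of the MARSYS API (the function name `to_virtual_path` is an assumption):

```python
import posixpath

def to_virtual_path(host_path: str, tmp_dir: str,
                    downloads_subdir: str = "downloads",
                    downloads_virtual_dir: str = "./downloads") -> str:
    """Map a host file under tmp_dir/downloads_subdir to the virtual path agents see."""
    host_root = posixpath.join(tmp_dir, downloads_subdir)
    # Path of the file relative to the host downloads folder
    relative = posixpath.relpath(host_path, host_root)
    # Re-root it under the virtual downloads directory
    return posixpath.join(downloads_virtual_dir, relative)

print(to_virtual_path("./runs/run-20260206/downloads/report.pdf",
                      "./runs/run-20260206"))
# ./downloads/report.pdf
```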
Viewport Auto-Detection
If viewport_width/viewport_height are not provided, BrowserAgent picks defaults by model family:
- Google/Gemini: 1000x1000
- Anthropic/Claude: 1344x896
- OpenAI/GPT: 1024x768
- Fallback: 1536x1536
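The selection logic amounts to matching the model family in the model name. A minimal sketch (illustrative only; the real resolution happens inside BrowserAgent):

```python
def default_viewport(model_name: str) -> tuple:
    """Pick a default viewport from the model family, per the table above."""
    name = model_name.lower()
    if "gemini" in name or "google" in name:
        return (1000, 1000)
    if "claude" in name or "anthropic" in name:
        return (1344, 896)
    if "gpt" in name or "openai" in name:
        return (1024, 768)
    return (1536, 1536)  # Fallback for unrecognized families

print(default_viewport("anthropic/claude-opus-4.6"))  # (1344, 896)
```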
Using AgentPool for Parallel Browsing
```python
from marsys.agents import AgentPool

# Create pool of browser agents
browser_pool = AgentPool(
    agent_class=BrowserAgent,
    num_instances=3,
    model_config=config,
    agent_name="BrowserPool",
    headless=True
)

# Parallel scraping
async def scrape_urls(urls: List[str]):
    async def scrape_one(i: int, url: str):
        # Hold the agent until its task completes, then release it to the pool
        async with browser_pool.acquire(f"branch_{i}") as agent:
            return await agent.run(f"Scrape content from {url}")

    tasks = [scrape_one(i, url) for i, url in enumerate(urls)]
    return await asyncio.gather(*tasks)

# Cleanup pool
await browser_pool.cleanup()
```
Browser Tools
Tool Overview by Mode
PRIMITIVE Mode Tools (Fast content extraction):
- `fetch_url` - Navigate and extract content in one step (returns Dict with markdown/text)
- `get_page_metadata` - Get title, URL, and links quickly
- `download_file` - Download files from URLs
- `list_downloads` - List files in the downloads directory
- `get_page_elements` - Get interactive elements with selectors
- `inspect_element` - Get element details by selector
ADVANCED Mode Additional Tools (Visual interaction):
- `goto`, `go_back`, `reload` - Navigation control
- `scroll_up`, `scroll_down` - Page scrolling
- `mouse_click` - Click at coordinates
- `keyboard_input` - Type text into focused input fields (search boxes, forms)
- `keyboard_press` - Press special keys (Enter, Tab, Escape, arrows, etc.)
- `search_page` - Search for text on page with visual highlighting (Chrome-like find)
- `screenshot` - Multimodal response with numbered element detection (ToolResponse format)
- `get_url`, `get_title` - Current page information
- `list_tabs`, `get_active_tab`, `switch_to_tab`, `close_tab` - Tab management
- `save_session` - Save browser session state for persistence
- `inspect_at_position` - Get element info at screen coordinates (x, y)
Navigation Tools
```python
class NavigationAgent(BrowserAgent):
    """Agent with navigation capabilities."""

    async def navigate_with_history(self, urls: List[str], context):
        """Navigate through multiple pages with history."""
        for url in urls:
            await self.browser_tool.goto(url)
            await self._log_progress(context, LogLevel.INFO, f"Navigated to {url}")

            # Wait for page to load
            await self.browser_tool.wait_for_navigation()

            # Take screenshot for debugging
            await self.browser_tool.screenshot(filename=f"{url.replace('/', '_')}.png")

        # Navigate back through history
        for _ in range(len(urls) - 1):
            await self.browser_tool.go_back()
            current = await self.browser_tool.get_url()
            await self._log_progress(context, LogLevel.INFO, f"Back to {current}")
```
Interaction Tools
```python
class InteractionAgent(BrowserAgent):
    """Agent for web interactions."""

    async def smart_form_fill(self, form_data: Dict, context):
        """Intelligently fill forms based on field types."""
        for field_name, value in form_data.items():
            # Try different selector strategies
            selectors = [
                f"input[name='{field_name}']",
                f"input[id='{field_name}']",
                f"textarea[name='{field_name}']",
                f"select[name='{field_name}']"
            ]

            for selector in selectors:
                try:
                    # Determine field type and fill appropriately
                    if selector.startswith("select"):
                        await self.browser_tool.select_option(selector, str(value))
                    elif isinstance(value, bool):
                        if value:  # Check if should be checked
                            await self.browser_tool.click(selector)
                    else:
                        await self.browser_tool.fill(selector, str(value))

                    await self._log_progress(context, LogLevel.DEBUG,
                                             f"Filled {field_name} with {value}")
                    break
                except Exception:
                    continue

    async def smart_click(self, text: str, context, element_type: str = "button"):
        """Click element by text content."""
        # XPath to find element by text
        xpath = f"//{element_type}[contains(text(), '{text}')]"

        try:
            await self.browser_tool.wait_for_selector(xpath, timeout=5000, state="visible")
            await self.browser_tool.click(xpath)
            await self._log_progress(context, LogLevel.INFO, f"Clicked '{text}' {element_type}")
        except Exception as e:
            # Fallback to JavaScript click
            script = f"""
                Array.from(document.querySelectorAll('{element_type}'))
                    .find(el => el.textContent.includes('{text}'))?.click()
            """
            await self.browser_tool.evaluate_javascript(script)
```
Text Search on Page
New Feature: search_page()
Find text on web pages with Chrome-like visual highlighting and navigation!
```python
# Search for text on the current page
result = await browser_tool.search_page("quantum computing")
# Returns: "Match 1/5 found and highlighted"
# All matches highlighted in YELLOW, current match in ORANGE

# Navigate to next match - call again with SAME term
result = await browser_tool.search_page("quantum computing")
# Returns: "Match 2/5"
# Scrolls to and highlights next occurrence

# Continue navigating
result = await browser_tool.search_page("quantum computing")
# Returns: "Match 3/5"
# Wraps around after last match back to first
```
Features:
- Visual Highlighting: All matches in YELLOW, current in ORANGE (Chrome-like)
- Auto-scroll: Automatically scrolls to current match (centered in viewport)
- Match Counter: Shows "Match X/Y" so you know your progress
- Wrap-around: After last match, returns to first match
- Case-insensitive: Finds text regardless of case
Limitations:
- Does NOT work with PDF files (PDFs are auto-downloaded, not displayed)
- Does NOT search across multiple pages
- Works with regular web pages, including shadow DOM content
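The wrap-around counter behaves like simple modular arithmetic. A minimal sketch of that behavior (illustrative, not the actual implementation):

```python
def next_match_index(current: int, total: int) -> int:
    """Advance from the current 1-based match to the next, wrapping after the last."""
    return current % total + 1

# Repeated calls with the same term walk 1 -> 2 -> ... -> total -> 1
sequence = []
match = 5  # start at the last of 5 matches
for _ in range(3):
    match = next_match_index(match, 5)
    sequence.append(match)
print(sequence)  # [1, 2, 3]
```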
Example - Finding Specific Information:
```python
# Navigate to documentation page
await browser_tool.goto("https://docs.example.com/api")

# Search for specific API endpoint
result = await browser_tool.search_page("/api/v2/users")
# Match 1/3 found - scrolls to first occurrence

# Check if it's the right one with screenshot
screenshot = await browser_tool.screenshot()
# Visual: See highlighted text in orange

# Not the right one? Navigate to next match
result = await browser_tool.search_page("/api/v2/users")
# Match 2/3 - scrolls to second occurrence
```
Automatic Download Detection
Smart Download Handling
Actions that trigger file downloads are automatically detected and reported!
The browser automatically detects when actions (clicks, Enter key presses, navigation) trigger file downloads:
```python
# Clicking a download link automatically detects the download
result = await browser_tool.mouse_click(x=450, y=300)
# Returns: "Action 'mouse_click' triggered a file download.
#           File 'report.pdf' has been downloaded to: ./downloads/report.pdf"

# Navigating to a PDF URL triggers automatic download
result = await browser_tool.goto("https://example.com/paper.pdf")
# Returns: "Action 'goto' triggered a file download.
#           File 'paper.pdf' has been downloaded to: ./downloads/paper.pdf"

# Pressing Enter on a download button
await browser_tool.mouse_click(x=500, y=400)  # Focus download button
await browser_tool.keyboard_press("Enter")
# Returns: "Action 'keyboard_press' triggered a file download.
#           File 'data.xlsx' has been downloaded to: ./downloads/data.xlsx"
```
Automatic Detection Features:
- Detects downloads triggered by clicks, keyboard presses, or navigation
- Returns file path and filename in response
- Downloads saved under virtual `./downloads` (host default: `./tmp/downloads`)
- PDFs are always downloaded (never displayed in browser)
- Works with all file types (PDF, Excel, CSV, images, etc.)
download_file itself uses a dual strategy:
- Primary: Playwright request context (inherits browser cookies/session)
- Fallback: browser navigation + download-event detection
If no file is detected but the page loads, it returns a message like "No downloadable file detected from URL..." with the loaded URL.
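The dual strategy is a primary-then-fallback pattern. A generic sketch of that control flow (the helper, callables, and message text here are illustrative, not the BrowserTool internals):

```python
import asyncio

async def download_with_fallback(url, primary, fallback):
    """Try the session-aware request first; fall back to browser navigation."""
    try:
        return await primary(url)
    except Exception:
        result = await fallback(url)
        if result is None:
            # Mirrors the tool's reported behavior when the page loads
            # but no download event fires
            return f"No downloadable file detected from URL: {url}"
        return result

async def demo():
    async def primary(url):   # e.g. Playwright request context (inherits cookies)
        raise RuntimeError("request blocked")
    async def fallback(url):  # e.g. navigation + download-event detection
        return None           # page loaded, no download event fired
    return await download_with_fallback("https://example.com/report", primary, fallback)

print(asyncio.run(demo()))
# No downloadable file detected from URL: https://example.com/report
```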
Listing Downloads:
```python
# List all files in the downloads directory
downloads = await browser_tool.list_downloads()
# Returns a formatted list with sizes and paths
```
PDF-Specific Behavior:
```python
# PDFs are NEVER displayed in browser - always downloaded
await browser_tool.goto("https://research.org/paper.pdf")
# Automatically downloads to ./downloads/paper.pdf
# Browser stays on previous page

# search_page() does NOT work with PDFs
# Instead, use file operation tools on the downloaded file
```
Download Path Configuration:
```python
browser_tool = await BrowserTool.create_safe(
    downloads_path="/custom/path/downloads",  # Custom host download directory
    temp_dir="/custom/tmp",                   # Custom temp directory (default: ./tmp)
    downloads_virtual_dir="./downloads",      # Virtual path returned to agents
)
```
Data Extraction
```python
class ScraperAgent(BrowserAgent):
    """Advanced web scraping agent."""

    async def extract_structured_data(self, url: str, schema: Dict, context):
        """Extract data according to schema."""
        await self.browser_tool.goto(url)
        await self.browser_tool.wait_for_navigation()

        # Extract based on schema
        extracted_data = {}
        for field_name, config in schema.items():
            selector = config.get('selector')
            attribute = config.get('attribute')
            multiple = config.get('multiple', False)

            try:
                if multiple:
                    # Extract from multiple elements via JS
                    if attribute:
                        script = f"""
                            Array.from(document.querySelectorAll({selector!r}))
                                .map(el => el.getAttribute({attribute!r}))
                        """
                    else:
                        script = f"""
                            Array.from(document.querySelectorAll({selector!r}))
                                .map(el => (el.textContent || '').trim())
                        """
                    extracted_data[field_name] = await self.browser_tool.evaluate_javascript(script)
                else:
                    # Extract from single element
                    if attribute:
                        value = await self.browser_tool.get_attribute(selector, attribute)
                    else:
                        value = await self.browser_tool.get_text(selector)
                    extracted_data[field_name] = value
            except Exception as e:
                await self._log_progress(context, LogLevel.WARNING,
                                         f"Failed to extract {field_name}: {e}")
                extracted_data[field_name] = None

        return extracted_data

    async def extract_table_data(self, table_selector: str, context):
        """Extract data from HTML tables."""
        # Note: JS braces and `${{i}}` are doubled to escape them in the f-string
        script = f"""
            () => {{
                const table = document.querySelector('{table_selector}');
                if (!table) return null;
                const headers = Array.from(table.querySelectorAll('th'))
                    .map(th => th.textContent.trim());
                const rows = Array.from(table.querySelectorAll('tbody tr')).map(tr => {{
                    const cells = Array.from(tr.querySelectorAll('td'));
                    const rowData = {{}};
                    cells.forEach((td, i) => {{
                        rowData[headers[i] || `col_${{i}}`] = td.textContent.trim();
                    }});
                    return rowData;
                }});
                return {{headers, rows}};
            }}
        """
        return await self.browser_tool.evaluate_javascript(script)
```
Advanced Patterns
Pagination Handling
```python
class PaginationAgent(BrowserAgent):
    """Handle paginated content."""

    async def scrape_all_pages(self,
                               start_url: str,
                               item_selector: str,
                               next_button_selector: str,
                               max_pages: int = 10,
                               context = None):
        """Scrape data across multiple pages."""
        all_items = []
        current_page = 1

        await self.browser_tool.goto(start_url)

        while current_page <= max_pages:
            # Wait for items to load
            await self.browser_tool.wait_for_selector(item_selector, timeout=10000, state="visible")

            # Extract items from current page
            items = await self.browser_tool.evaluate_javascript(f"""
                Array.from(document.querySelectorAll('{item_selector}'))
                    .map(el => el.textContent.trim())
            """)
            all_items.extend(items)

            await self._log_progress(context, LogLevel.INFO,
                                     f"Page {current_page}: Extracted {len(items)} items")

            # Check for next page
            try:
                await self.browser_tool.wait_for_selector(next_button_selector, timeout=2000, state="visible")
                await self.browser_tool.click(next_button_selector)
                await self.browser_tool.wait_for_navigation()
                current_page += 1
            except Exception:
                break

        return all_items
```
Dynamic Content Loading
```python
class DynamicContentAgent(BrowserAgent):
    """Handle JavaScript-heavy sites."""

    async def wait_for_ajax_content(self,
                                    content_selector: str,
                                    timeout: int = 30,
                                    context = None):
        """Wait for AJAX content to load."""
        # Wait for a selector that indicates content has loaded
        await self.browser_tool.wait_for_selector(content_selector, timeout=timeout * 1000, state="visible")

    async def infinite_scroll_scrape(self,
                                     item_selector: str,
                                     target_count: int,
                                     context = None):
        """Handle infinite scroll patterns."""
        items_found = 0
        no_new_items_count = 0
        max_no_new = 3

        while items_found < target_count:
            # Count current items
            current_items = await self.browser_tool.evaluate_javascript(
                f"document.querySelectorAll('{item_selector}').length"
            )

            if current_items == items_found:
                no_new_items_count += 1
                if no_new_items_count >= max_no_new:
                    break
            else:
                no_new_items_count = 0
                items_found = current_items

            # Scroll to bottom
            await self.browser_tool.evaluate_javascript(
                "window.scrollTo(0, document.body.scrollHeight)"
            )

            # Wait for potential new content
            await asyncio.sleep(2)

            await self._log_progress(context, LogLevel.DEBUG,
                                     f"Found {items_found} items, target: {target_count}")

        # Extract all items
        return await self.browser_tool.evaluate_javascript(f"""
            Array.from(document.querySelectorAll('{item_selector}'))
                .map(el => el.textContent.trim())
        """)
```
Session Persistence
Browser Session Persistence
BrowserAgent supports saving and loading browser sessions (cookies, localStorage) using Playwright's storage_state feature. This enables persistent authentication across browser sessions.
Loading a Saved Session
```python
from marsys.agents import BrowserAgent

# Create agent with existing session state
agent = await BrowserAgent.create_safe(
    model_config=config,
    name="AuthenticatedBrowser",
    mode="advanced",
    session_path="./sessions/linkedin_session.json",  # Load existing session
    headless=True
)

# Browser is now initialized with saved cookies and localStorage
# Already logged in to LinkedIn, Google, etc.
await agent.run("Go to linkedin.com/feed and extract posts")
```
Saving a Session
```python
# Save via BrowserAgent tool invocation
result = await agent.run("Save the current session to ./sessions/my_session.json")
# Returns a success message with cookie/origin counts

# You can save additional checkpoints as needed
result = await agent.run("Save the current session to ./sessions/backup.json")
```
Session File Format
The session file is a JSON file compatible with Playwright's storage_state:
```json
{
  "cookies": [
    {
      "name": "session_id",
      "value": "abc123",
      "domain": ".example.com",
      "path": "/",
      "expires": 1735689600,
      "httpOnly": true,
      "secure": true
    }
  ],
  "origins": [
    {
      "origin": "https://example.com",
      "localStorage": [
        {"name": "user_token", "value": "xyz789"}
      ]
    }
  ]
}
```
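Since the file is plain JSON, its shape is easy to sanity-check before handing it to session_path. A small sketch (the validator name and checks are illustrative assumptions, not a MARSYS utility):

```python
import json

def looks_like_storage_state(raw: str) -> bool:
    """Check a candidate session file for the Playwright storage_state shape."""
    try:
        state = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return False
    # storage_state carries "cookies" and "origins" lists (either may be absent)
    cookies_ok = isinstance(state.get("cookies", []), list)
    origins_ok = isinstance(state.get("origins", []), list)
    return isinstance(state, dict) and cookies_ok and origins_ok

sample = '{"cookies": [{"name": "session_id", "value": "abc123"}], "origins": []}'
print(looks_like_storage_state(sample))  # True
```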
Authentication Handling
```python
class AuthAgent(BrowserAgent):
    """Handle authentication flows."""

    async def login_with_cookies(self,
                                 login_url: str,
                                 cookies: List[Dict],
                                 context = None):
        """Login using saved cookies."""
        # Navigate to site
        await self.browser_tool.goto(login_url)

        # Set cookies
        for cookie in cookies:
            await self.browser_tool.context.add_cookies([cookie])

        # Refresh to apply cookies
        await self.browser_tool.reload()

        # Verify login success
        return await self.verify_login_status(context)

    async def handle_2fa(self,
                         code_input_selector: str,
                         get_2fa_code: Callable,
                         context = None):
        """Handle two-factor authentication."""
        # Wait for 2FA input
        await self.browser_tool.wait_for_selector(code_input_selector, timeout=30000, state="visible")

        # Get 2FA code (from email, SMS, authenticator, etc.)
        code = await get_2fa_code()

        # Enter code
        await self.browser_tool.fill(code_input_selector, code)

        # Submit (usually auto-submits, but can click submit if needed)
        await self.browser_tool.press_key("Enter")

        # Wait for redirect after successful 2FA
        await self.browser_tool.wait_for_navigation()
```
Error Handling
Resilient Operations
```python
class ResilientBrowserAgent(BrowserAgent):
    """Browser agent with enhanced error handling."""

    async def retry_operation(self,
                              operation: Callable,
                              max_retries: int = 3,
                              backoff_factor: float = 2.0,
                              context = None):
        """Execute operation with exponential backoff retry."""
        last_error = None
        wait_time = 1.0

        for attempt in range(max_retries):
            try:
                result = await operation()
                if attempt > 0:
                    await self._log_progress(context, LogLevel.INFO,
                                             f"Operation succeeded on attempt {attempt + 1}")
                return result
            except Exception as e:
                last_error = e
                await self._log_progress(context, LogLevel.WARNING,
                                         f"Attempt {attempt + 1} failed: {e}")
                if attempt < max_retries - 1:
                    await asyncio.sleep(wait_time)
                    wait_time *= backoff_factor

        raise Exception(f"Operation failed after {max_retries} attempts: {last_error}")

    async def safe_extract(self,
                           selector: str,
                           default: Any = None,
                           context = None):
        """Safely extract element with fallback."""
        try:
            text = await self.browser_tool.get_text(selector)
            if text:
                return text.strip()
        except Exception as e:
            await self._log_progress(context, LogLevel.DEBUG,
                                     f"Failed to extract {selector}: {e}")
        return default
```
Performance Optimization
Resource Blocking
```python
class OptimizedBrowserAgent(BrowserAgent):
    """Optimized browser agent for faster scraping."""

    async def setup_fast_scraping(self, context = None):
        """Configure browser for fast text scraping."""
        # Block unnecessary resources
        await self.browser_tool.context.route("**/*", lambda route:
            route.abort() if route.request.resource_type in
            ["image", "stylesheet", "font", "media"]
            else route.continue_())

        # Disable JavaScript if not needed
        await self.browser_tool.context.set_javascript_enabled(False)

        await self._log_progress(context, LogLevel.INFO,
                                 "Optimized browser for fast scraping")

    async def parallel_scrape(self,
                              urls: List[str],
                              extractor: Callable,
                              max_concurrent: int = 5,
                              context = None):
        """Scrape multiple URLs in parallel."""
        semaphore = asyncio.Semaphore(max_concurrent)

        async def scrape_with_limit(url):
            async with semaphore:
                try:
                    await self.browser_tool.goto(url)
                    return await extractor(self.browser_tool)
                except Exception as e:
                    await self._log_progress(context, LogLevel.ERROR,
                                             f"Failed to scrape {url}: {e}")
                    return None

        tasks = [scrape_with_limit(url) for url in urls]
        results = await asyncio.gather(*tasks)
        return [r for r in results if r is not None]
```
Best Practices
1. Explicit Waits
```python
# GOOD - Wait for specific conditions
await browser_tool.wait_for_selector("#content", timeout=10000, state="visible")
await browser_tool.wait_for_navigation()

# BAD - Fixed delays
await asyncio.sleep(5)  # Unreliable and slow
```
2. Robust Selectors
```python
# GOOD - Specific, stable selectors
await browser_tool.click("[data-testid='submit-button']")
await browser_tool.click("#unique-id")

# BAD - Fragile selectors
await browser_tool.click("div > span:nth-child(3)")
```
3. Resource Management
```python
# GOOD - Always cleanup
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="CleanupExample",
    mode="advanced",
    headless=True,
)
try:
    # Use agent
    result = await browser_agent.run(task)
finally:
    await browser_agent.browser_tool.close()

# BAD - Leaving browsers open
browser_agent = await BrowserAgent.create_safe(
    model_config=config,
    name="LeakyBrowser",
    mode="advanced",
    headless=True,
)
result = await browser_agent.run(task)
# Browser left running!
```
4. Error Context
```python
# GOOD - Detailed error context
try:
    await browser_tool.click(selector)
except Exception as e:
    await self._log_progress(context, LogLevel.ERROR,
                             f"Failed to click {selector} on {await browser_tool.get_url()}: {e}")
    # Take screenshot for debugging
    await browser_tool.screenshot("error_screenshot.png")

# BAD - Generic error handling
try:
    await browser_tool.click(selector)
except:
    print("Click failed")
```
Next Steps
- Agents: Learn about the agent system
- Tools: Explore available tools
- Testing Guide: Test browser automation
- API Reference: Complete browser API
Browser Automation Ready!
You now understand browser automation in MARSYS. The BrowserAgent provides powerful web interaction capabilities for your multi-agent workflows.