Learning Agents

Create agents that run open-source models locally, with optional PEFT (Parameter-Efficient Fine-Tuning) support for customizing model behavior.

Under Active Development

LearnableAgent is currently under active development. The API may change in future releases.

The current implementation provides foundational capabilities only: you can create agents with PEFT (LoRA) adapters and access model weights/tokenizers for external training pipelines.

Future updates will add integrated training components including:

  • SFT Trainer: Supervised fine-tuning on instruction datasets
  • DPO/RLHF Trainer: Preference alignment training
  • Workflow Trainer: Train agents on multi-agent conversation traces

To train a LearnableAgent today, you must implement your own training loop against the exposed trainable_model and tokenizer (a minimal sketch appears under Training Access below).

Overview

Learning agents in MARSYS are designed for:

  • Local Model Execution: Run open-source models (Qwen, LLaMA, Mistral, etc.) locally
  • PEFT Support: Attach learning heads like LoRA for fine-tuning
  • Weight Access: Direct access to model weights and tokenizer for training

Requirements

LearnableAgent requires:

  • Local GPU/compute resources
  • The marsys[local-models] package
  • HuggingFace backend only (vLLM does not support training)

pip install marsys[local-models]

Local-Only Restriction

LearnableAgent only supports local models (type="local") with the HuggingFace backend. Using backend="vllm" will raise a TypeError since vLLM does not support training.
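
The snippet below sketches this failure mode; the configuration values are illustrative, and the exact error message comes from MARSYS at construction time:

from marsys.agents import LearnableAgent
from marsys.models import ModelConfig

# vLLM backend: fine for inference-only agents, not trainable
vllm_config = ModelConfig(
    type="local",
    model_class="llm",
    name="Qwen/Qwen3-4B-Instruct-2507",
    backend="vllm"
)

try:
    agent = LearnableAgent(
        model_config=vllm_config,
        name="WillNotWork",
        goal="Placeholder goal",
        instruction="Placeholder instruction."
    )
except TypeError as err:
    # LearnableAgent rejects non-HuggingFace backends
    print(f"Rejected as expected: {err}")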

LearnableAgent

Concrete implementation for local models with optional PEFT:

from marsys.agents import LearnableAgent
from marsys.models import ModelConfig

# Configure local model (HuggingFace only)
model_config = ModelConfig(
    type="local",
    model_class="llm",
    name="Qwen/Qwen3-4B-Instruct-2507",
    backend="huggingface",  # Required for training
    torch_dtype="bfloat16",
    device_map="auto",
    max_tokens=4096
)

agent = LearnableAgent(
    model_config=model_config,
    name="MyLearnableAgent",
    goal="A helpful assistant that answers questions",
    instruction="You are a helpful assistant. Provide clear and accurate responses to user queries.",
    tools={"search": search_function},  # search_function defined elsewhere
    learning_head="peft",
    learning_head_config={
        "r": 16,                                 # LoRA rank
        "lora_alpha": 32,                        # LoRA alpha scaling
        "target_modules": ["q_proj", "v_proj"]   # Attention modules to adapt
    }
)

Key Distinction: Unlike Agent, which uses API-based models (OpenAI, Anthropic), LearnableAgent works with local models where you have direct access to the weights.

PEFT Configuration

When using learning_head="peft", provide configuration for the PEFT head:

from marsys.agents import LearnableAgent
from marsys.models import ModelConfig

learning_head_config = {
    "r": 16,                                 # LoRA rank
    "lora_alpha": 32,                        # LoRA alpha scaling
    "target_modules": ["q_proj", "v_proj"],  # Modules to adapt
    "lora_dropout": 0.1,                     # Dropout rate
    "bias": "none"                           # Bias training setting
}

model_config = ModelConfig(
    type="local",
    model_class="llm",
    name="Qwen/Qwen3-4B-Instruct-2507",
    backend="huggingface",
    torch_dtype="bfloat16",
    device_map="auto"
)

agent = LearnableAgent(
    model_config=model_config,
    name="ExpertCoder",
    goal="Expert coding assistant for development tasks",
    instruction="You are an expert coder. Help with code generation, debugging, and optimization.",
    learning_head="peft",
    learning_head_config=learning_head_config
)

The model is wrapped in a PeftHead which handles the LoRA adaptation.
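
If trainable_model is a standard peft PeftModel (an assumption here, not something the MARSYS API guarantees), you can sanity-check the adaptation with peft's own utilities:

# Assumes agent.model.trainable_model is an ordinary peft PeftModel
peft_model = agent.model.trainable_model

# Reports how many parameters the LoRA adapter trains versus the
# frozen base model, e.g. "trainable params: ... || all params: ..."
peft_model.print_trainable_parameters()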

Usage Example

from marsys.agents import LearnableAgent
from marsys.models import ModelConfig
from marsys.coordination import Orchestra

# Configure local model
model_config = ModelConfig(
    type="local",
    model_class="llm",
    name="Qwen/Qwen3-4B-Instruct-2507",
    backend="huggingface",
    torch_dtype="bfloat16",
    device_map="auto",
    max_tokens=4096
)

# Create agent with PEFT
agent = LearnableAgent(
    model_config=model_config,
    name="CodeReviewer",
    goal="Expert code reviewer for quality assurance",
    instruction="You are an expert code reviewer. Analyze code for bugs, security issues, and best practices.",
    learning_head="peft",
    learning_head_config={
        "r": 8,
        "lora_alpha": 16,
        "target_modules": ["q_proj", "v_proj"]
    }
)

# Use in a topology
topology = {
    "agents": ["CodeReviewer"],
    "flows": []
}

# Orchestra.run is a coroutine, so call it from an async context (see below)
result = await Orchestra.run(
    task="Review this Python code for bugs",
    topology=topology
)
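
Because Orchestra.run is awaited, the call must live inside a coroutine. A standard way to run the example as a standalone script:

import asyncio

async def main():
    result = await Orchestra.run(
        task="Review this Python code for bugs",
        topology=topology
    )
    print(result)

# Drive the coroutine from a synchronous entry point
asyncio.run(main())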

Training Access

LearnableAgent provides access to the underlying PyTorch model and tokenizer for training:

# Access model internals for training
pytorch_model = agent.model.trainable_model  # PEFT-wrapped model
tokenizer = agent.model.tokenizer            # HuggingFace tokenizer
base_model = agent.model.base_model          # Original model (pre-PEFT)

# Example: use with trl for RLHF
from trl import PPOTrainer, PPOConfig

ppo_config = PPOConfig(
    learning_rate=1e-5,
    batch_size=4
)
trainer = PPOTrainer(
    config=ppo_config,
    model=agent.model.trainable_model,
    tokenizer=agent.model.tokenizer,
    # ... training data and reward model
)
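
Until the integrated trainers land, a hand-rolled loop over the same handles works too. The sketch below is a minimal supervised example, assuming trainable_model behaves like a standard causal-LM PeftModel (returning a .loss when labels are passed); train_texts is a hypothetical list of training strings, and a production loop would mask padding tokens in the labels:

import torch
from torch.utils.data import DataLoader

model = agent.model.trainable_model   # PEFT-wrapped model
tokenizer = agent.model.tokenizer

# Hypothetical dataset: instruction/response strings
train_texts = ["...", "..."]

# Causal-LM tokenizers often lack a pad token; reuse EOS if so
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
loader = DataLoader(train_texts, batch_size=4, shuffle=True)

model.train()
for epoch in range(3):
    for batch in loader:
        enc = tokenizer(
            list(batch), return_tensors="pt",
            padding=True, truncation=True, max_length=512
        ).to(model.device)
        # Standard causal-LM objective: labels are the input ids
        out = model(**enc, labels=enc["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()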

PeftHead Properties

When learning_head="peft" is used, the agent's model is wrapped in a PeftHead that provides:

  • trainable_model: The PEFT-wrapped model for training
  • base_model: The original HuggingFace model
  • tokenizer: The model's tokenizer
  • save_pretrained(path): Save the PEFT adapter weights
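
Because save_pretrained(path) stores only the adapter weights, checkpoints stay small. Reloading into a fresh base model uses peft's standard loader; the sketch below assumes agent.model is the PeftHead described above and that the saved files are an ordinary PEFT adapter (the checkpoint path is illustrative):

# Save just the LoRA adapter after training
agent.model.save_pretrained("./checkpoints/code-reviewer-lora")

# Later: reattach the adapter to a freshly loaded base model
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
restored = PeftModel.from_pretrained(base, "./checkpoints/code-reviewer-lora")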

When to Use LearnableAgent

Use LearnableAgent when you need:

  • Custom model behavior through training
  • Local GPU/compute resources
  • Open-source models (Qwen, LLaMA, Mistral, Phi, etc.)
  • Full control over model architecture
  • Fine-tuning for specific workflows
  • Direct access to model weights and tokenizer

Use Agent (with API models) when you need:

  • Quick setup without GPU requirements
  • Latest model capabilities (GPT-5, Claude Sonnet 4.5)
  • Pay-per-use pricing
  • No infrastructure management

Limitations

The current implementation:

  • Only supports "peft" as the learning head type
  • Requires HuggingFace backend (backend="huggingface")
  • vLLM backend is not supported (no training capabilities)
  • Does not include feedback-based learning or experience tracking
  • Training loop must be implemented separately

Architecture

LearnableAgent uses the adapter pattern internally:

                ┌─────────────────────────┐
                │     LearnableAgent      │
                │  (model_config: local)  │
                └────────────┬────────────┘
                             │
                ┌────────────┴────────────┐
                │   LocalAdapterFactory   │
                └────────────┬────────────┘
                             │
         ┌───────────────────┼───────────────┐
         ▼                   ▼               ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────┐
│ HuggingFaceLLM  │ │ HuggingFaceVLM  │ │  vLLM   │
│     Adapter     │ │     Adapter     │ │ Adapter │
│  ✅ Training    │ │  ✅ Training    │ │   ❌    │
└────────┬────────┘ └────────┬────────┘ └─────────┘
         │                   │
    ┌────┴───────────────────┴────┐
    │          PeftHead           │
    │  (LoRA adaptation wrapper)  │
    └─────────────────────────────┘

Future Training Module

We are actively developing a comprehensive training module that will integrate with LearnableAgent, adding the SFT, DPO/RLHF, and Workflow trainers outlined at the top of this page.

Stay tuned for updates in upcoming releases!