Learning Agents
Create agents that run local models, with optional PEFT (Parameter-Efficient Fine-Tuning) for customizing model behavior.
Under Active Development
LearnableAgent is currently under active development. The API may change in future releases.
The current implementation provides foundational capabilities only: you can create agents with PEFT (LoRA) adapters and access model weights/tokenizers for external training pipelines.
Future updates will add integrated training components including:
- SFT Trainer: Supervised fine-tuning on instruction datasets
- DPO/RLHF Trainer: Preference alignment training
- Workflow Trainer: Train agents on multi-agent conversation traces
To train a LearnableAgent today, you need to implement your own training loop using the exposed trainable_model and tokenizer; a minimal sketch appears under Limitations below.
Overview
Learning agents in MARSYS are designed for:
- Local Model Execution: Run open-source models (Qwen, LLaMA, Mistral, etc.) locally
- PEFT Support: Attach learning heads like LoRA for fine-tuning
- Weight Access: Direct access to model weights and tokenizer for training
Requirements
LearnableAgent requires:
- Local GPU/compute resources
- The marsys[local-models] package
- HuggingFace backend only (vLLM does not support training)

```bash
pip install marsys[local-models]
```
Local-Only Restriction
LearnableAgent only supports local models (type="local") with the HuggingFace backend. Using backend="vllm" will raise a TypeError since vLLM does not support training.
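To make the failure mode concrete, here is a sketch of a rejected configuration (constructor arguments trimmed for brevity; the printed message is illustrative):

```python
from marsys.agents import LearnableAgent
from marsys.models import ModelConfig

# Invalid for LearnableAgent: vLLM is inference-only
vllm_config = ModelConfig(
    type="local",
    model_class="llm",
    name="Qwen/Qwen3-4B-Instruct-2507",
    backend="vllm"
)

try:
    agent = LearnableAgent(model_config=vllm_config, name="WillNotWork")
except TypeError as exc:
    print(f"Rejected as expected: {exc}")
```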
LearnableAgent
Concrete implementation for local models with optional PEFT:
```python
from marsys.agents import LearnableAgent
from marsys.models import ModelConfig

# Configure local model (HuggingFace only)
model_config = ModelConfig(
    type="local",
    model_class="llm",
    name="Qwen/Qwen3-4B-Instruct-2507",
    backend="huggingface",  # Required for training
    torch_dtype="bfloat16",
    device_map="auto",
    max_tokens=4096
)

agent = LearnableAgent(
    model_config=model_config,
    name="MyLearnableAgent",
    goal="A helpful assistant that answers questions",
    instruction="You are a helpful assistant. Provide clear and accurate responses to user queries.",
    tools={"search": search_function},
    learning_head="peft",
    learning_head_config={
        "r": 16,  # LoRA rank
        "lora_alpha": 32,
        "target_modules": ["q_proj", "v_proj"]
    }
)
```
Key Distinction: Unlike Agent which uses API-based models (OpenAI, Anthropic), LearnableAgent works with local models where you have direct weight access.
PEFT Configuration
When using learning_head="peft", provide configuration for the PEFT head:
```python
from marsys.agents import LearnableAgent
from marsys.models import ModelConfig

learning_head_config = {
    "r": 16,                                 # LoRA rank
    "lora_alpha": 32,                        # LoRA alpha scaling
    "target_modules": ["q_proj", "v_proj"],  # Modules to adapt
    "lora_dropout": 0.1,                     # Dropout rate
    "bias": "none"                           # Bias training setting
}

model_config = ModelConfig(
    type="local",
    model_class="llm",
    name="Qwen/Qwen3-4B-Instruct-2507",
    backend="huggingface",
    torch_dtype="bfloat16",
    device_map="auto"
)

agent = LearnableAgent(
    model_config=model_config,
    name="ExpertCoder",
    goal="Expert coding assistant for development tasks",
    instruction="You are an expert coder. Help with code generation, debugging, and optimization.",
    learning_head="peft",
    learning_head_config=learning_head_config
)
```
The model is wrapped in a PeftHead which handles the LoRA adaptation.
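If the wrapped model is a standard peft PeftModel (which the LoRA wrapping suggests, though this is an assumption about MARSYS internals), you can verify how few parameters the adapter actually trains:

```python
# Assumes agent.model.trainable_model is a standard peft PeftModel; prints a
# line like "trainable params: ... || all params: ... || trainable%: ..."
agent.model.trainable_model.print_trainable_parameters()
```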
Usage Example
```python
from marsys.agents import LearnableAgent
from marsys.models import ModelConfig
from marsys.coordination import Orchestra

# Configure local model
model_config = ModelConfig(
    type="local",
    model_class="llm",
    name="Qwen/Qwen3-4B-Instruct-2507",
    backend="huggingface",
    torch_dtype="bfloat16",
    device_map="auto",
    max_tokens=4096
)

# Create agent with PEFT
agent = LearnableAgent(
    model_config=model_config,
    name="CodeReviewer",
    goal="Expert code reviewer for quality assurance",
    instruction="You are an expert code reviewer. Analyze code for bugs, security issues, and best practices.",
    learning_head="peft",
    learning_head_config={
        "r": 8,
        "lora_alpha": 16,
        "target_modules": ["q_proj", "v_proj"]
    }
)

# Use in a topology
topology = {
    "agents": ["CodeReviewer"],
    "flows": []
}

result = await Orchestra.run(
    task="Review this Python code for bugs",
    topology=topology
)
```
Training Access
LearnableAgent provides access to the underlying PyTorch model and tokenizer for training:
```python
# Access model internals for training
pytorch_model = agent.model.trainable_model  # PEFT-wrapped model
tokenizer = agent.model.tokenizer            # HuggingFace tokenizer
base_model = agent.model.base_model          # Original model (pre-PEFT)

# Example: Use with trl for RLHF
from trl import PPOTrainer, PPOConfig

ppo_config = PPOConfig(
    learning_rate=1e-5,
    batch_size=4
)

trainer = PPOTrainer(
    config=ppo_config,
    model=agent.model.trainable_model,
    tokenizer=agent.model.tokenizer,
    # ... training data and reward model
)
```
PeftHead Properties
When learning_head="peft" is used, the agent's model is wrapped in a PeftHead that provides:
- trainable_model: The PEFT-wrapped model for training
- base_model: The original HuggingFace model
- tokenizer: The model's tokenizer
- save_pretrained(path): Save the PEFT adapter weights
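Since save_pretrained(path) is the documented way to persist the adapter, a typical post-training call looks like this (the checkpoint path is illustrative):

```python
# Persist only the LoRA adapter weights; the base model is not duplicated
agent.model.save_pretrained("checkpoints/code-reviewer-lora")
```

Because LoRA adapters are small relative to the base model, checkpointing them frequently during training is cheap.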
When to Use LearnableAgent
Use LearnableAgent when you need:
- Custom model behavior through training
- Local execution on your own GPU/compute resources
- Open-source models (Qwen, LLaMA, Mistral, Phi, etc.)
- Full control over model architecture
- Fine-tuning for specific workflows
- Direct access to model weights and tokenizer
Use Agent (with API models) when you need:
- Quick setup without GPU requirements
- Latest model capabilities (GPT-5, Claude Sonnet 4.5)
- Pay-per-use pricing
- No infrastructure management
Limitations
The current implementation:
- Only supports "peft" as the learning head type
- Requires the HuggingFace backend (backend="huggingface")
- vLLM backend is not supported (no training capabilities)
- Does not include feedback-based learning or experience tracking
- Training loop must be implemented separately
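Until the integrated trainers land, a manual loop can be built directly on the exposed model and tokenizer. The sketch below shows a minimal supervised fine-tuning pass; the dataset, hyperparameters, and checkpoint path are placeholders, and it assumes trainable_model behaves like a standard HuggingFace causal LM:

```python
# Minimal manual SFT loop (illustrative sketch, not a production recipe)
import torch
from torch.utils.data import DataLoader

model = agent.model.trainable_model   # PEFT-wrapped model
tokenizer = agent.model.tokenizer     # HuggingFace tokenizer
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # many causal LMs lack a pad token

# Only the LoRA parameters require gradients; frozen weights are skipped
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4
)

# Placeholder instruction data; substitute your own dataset
texts = [
    "### Instruction: Explain list comprehensions.\n### Response: ...",
]
loader = DataLoader(texts, batch_size=2, shuffle=True)

model.train()
for batch in loader:
    inputs = tokenizer(
        list(batch), return_tensors="pt",
        padding=True, truncation=True, max_length=1024,
    ).to(model.device)

    # Standard causal-LM objective: predict the input ids, ignoring padding
    labels = inputs["input_ids"].clone()
    labels[inputs["attention_mask"] == 0] = -100
    loss = model(**inputs, labels=labels).loss

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

agent.model.save_pretrained("checkpoints/sft-adapter")  # adapter weights only
```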
Architecture
LearnableAgent uses the adapter pattern internally:
```
                ┌─────────────────────────┐
                │     LearnableAgent      │
                │  (model_config: local)  │
                └────────────┬────────────┘
                             │
                ┌────────────┴────────────┐
                │   LocalAdapterFactory   │
                └────────────┬────────────┘
                             │
         ┌───────────────────┼──────────────┐
         ▼                   ▼              ▼
┌─────────────────┐ ┌─────────────────┐ ┌────────┐
│ HuggingFaceLLM  │ │ HuggingFaceVLM  │ │  vLLM  │
│     Adapter     │ │     Adapter     │ │Adapter │
│   ✅ Training   │ │   ✅ Training   │ │   ❌   │
└────────┬────────┘ └────────┬────────┘ └────────┘
         │                   │
┌────────┴───────────────────┴────────┐
│              PeftHead               │
│      (LoRA adaptation wrapper)      │
└─────────────────────────────────────┘
```
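The selection logic itself lives inside MARSYS; purely as an illustration of the pattern (hypothetical names, not the actual source), the dispatch amounts to something like:

```python
# Hypothetical sketch of the factory dispatch; names and structure are
# illustrative only and do not mirror the real MARSYS implementation
def create_training_adapter(model_config):
    if model_config.backend == "vllm":
        # vLLM is inference-only, so training agents reject it outright
        raise TypeError("vLLM backend does not support training")
    if model_config.model_class == "llm":
        return HuggingFaceLLMAdapter(model_config)  # text-only models
    return HuggingFaceVLMAdapter(model_config)      # vision-language models
```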
Future Training Module
We are actively developing a comprehensive training module that will integrate with LearnableAgent:
- SFT Trainer: Supervised fine-tuning on instruction datasets
- DPO/RLHF Trainer: Preference alignment training
- Workflow Trainer: Train agents on multi-agent conversation traces
Stay tuned for updates in upcoming releases!