# Fleet
The fleet is Agentend's typed worker system. Instead of one model doing everything, the fleet assigns specialized model workers to specific task slots — classification, extraction, verification, summarization, generation, and tool calling. Each slot is backed by benchmark data to recommend the best model for the job.
## 6 Worker Slots
Every Agentend application has 6 worker slots. Each slot has a primary recommendation, a budget option, a fallback, and a local/self-hosted pick. Benchmark data is sourced from March 2026 evaluations.
### classify
Fast intent classification. Needs low latency and high classification accuracy.
| Strategy | Model | Provider | Latency | Accuracy |
|---|---|---|---|---|
| Primary | claude-haiku-4-5 | Anthropic | 180ms | 92.1% |
| Fallback | gpt-4o-mini | OpenAI | 150ms | 90.5% |
| Budget | gemini-2.0-flash | Google | 120ms | 91.0% |
| Local | qwen2.5-7b | Alibaba | 200ms | 87.3% |
### extract
Structured data extraction. Needs high JSON accuracy and reliable output formatting.
| Strategy | Model | Provider | JSON Acc. |
|---|---|---|---|
| Primary | claude-sonnet-4-6 | Anthropic | 96.2% |
| Fallback | gpt-4o | OpenAI | 94.8% |
| Budget | gemini-2.5-flash | Google | 93.1% |
| Local | qwen2.5-72b | Alibaba | 91.5% |
### verify
Fact-checking and validation. Needs high reasoning power and factual accuracy.
| Strategy | Model | Provider | Key Score |
|---|---|---|---|
| Primary | claude-opus-4-6 | Anthropic | GPQA Diamond 83.8 |
| Fallback | gemini-3-pro | Google | GPQA Diamond 82.1 |
| Budget | gemini-2.5-flash | Google | Facts 58.3 |
| Local | llama-4-maverick | Meta | Facts 52.1 |
### summarize
Content condensation. Optimized for summarization quality.
| Strategy | Model | Provider | Quality |
|---|---|---|---|
| Primary | claude-sonnet-4-6 | Anthropic | 94.1 |
| Fallback | gpt-4o | OpenAI | -- |
| Budget | gemini-2.0-flash | Google | 89.5 |
| Local | phi-4-14b | Microsoft | 85.2 |
### generate
Content and code generation. Needs highest reasoning and generation quality.
| Strategy | Model | Provider | Key Score |
|---|---|---|---|
| Primary | claude-opus-4-6 | Anthropic | SWE-bench 80.8, HumanEval 92.0 |
| Fallback | glm-4.7 | Zhipu | HumanEval 94.2, SWE-bench 76.5 |
| Budget | gemini-2.5-flash | Google | HumanEval 85.3 |
| Local | qwen2.5-coder-32b | Alibaba | HumanEval 88.1 |
### tool_call
Function and tool calling. Needs reliable tool use and high success rate.
| Strategy | Model | Provider | Tool Use Acc. |
|---|---|---|---|
| Primary | claude-sonnet-4-6 | Anthropic | 96.8% |
| Fallback | gpt-4o | OpenAI | 95.1% |
| Budget | gemini-2.5-flash | Google | 92.3% |
| Local | llama-4-maverick | Meta | 88.7% |
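The primary/fallback/budget/local strategies above amount to a per-slot lookup. Here is a minimal sketch of that idea — the table data mirrors the classify and extract slots, but `SLOT_MODELS` and `pick_model` are hypothetical illustrations, not Agentend's API:

```python
# Hypothetical slot-to-model resolution; data mirrors the benchmark tables.
SLOT_MODELS = {
    "classify": {"primary": "claude-haiku-4-5", "fallback": "gpt-4o-mini",
                 "budget": "gemini-2.0-flash", "local": "qwen2.5-7b"},
    "extract": {"primary": "claude-sonnet-4-6", "fallback": "gpt-4o",
                "budget": "gemini-2.5-flash", "local": "qwen2.5-72b"},
}

def pick_model(slot: str, strategy: str = "primary") -> str:
    """Return the model for a slot, dropping to the fallback pick
    if the requested strategy is not configured for that slot."""
    models = SLOT_MODELS[slot]
    return models.get(strategy, models["fallback"])

print(pick_model("extract"))             # claude-sonnet-4-6
print(pick_model("classify", "budget"))  # gemini-2.0-flash
```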
## fleet.yaml configuration
The fleet is configured in `fleet.yaml` at the project root. Each worker slot specifies a model, instance count, timeout, and retry policy.
```yaml
fleet:
  name: default
  description: Default agentend fleet configuration
  workers:
    classify:
      count: 1
      model: gpt-4-turbo
      timeout: 30
      retry_policy: exponential_backoff
    extract:
      count: 2
      model: gpt-4-turbo
      timeout: 60
      retry_policy: exponential_backoff
    verify:
      count: 1
      model: gpt-4-turbo
      timeout: 45
      retry_policy: exponential_backoff
    summarize:
      count: 2
      model: gpt-3.5-turbo
      timeout: 60
      retry_policy: exponential_backoff
    generate:
      count: 4
      model: gpt-4-turbo
      timeout: 120
      retry_policy: exponential_backoff
    tool_call:
      count: 2
      model: gpt-4-turbo
      timeout: 90
      retry_policy: exponential_backoff
```
## 3-level configuration override
Agentend uses a 3-level configuration override system. Each level can override settings from the level above it:
| Level | Source | Description |
|---|---|---|
| 1. Global | fleet.yaml | Base configuration for all workers |
| 2. Per-slot | fleet.yaml workers.* | Overrides for a specific worker slot |
| 3. Per-request | WorkerConfig | Runtime overrides via the API or capability code |
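The merge behaves like a dictionary update in which later levels win on conflicting keys. A minimal sketch — the `merge_config` helper and the exact dict shapes are illustrative assumptions; only the key names come from `fleet.yaml`:

```python
# Later levels win on conflicting keys: global < per-slot < per-request.
# merge_config is an illustrative helper, not part of the Agentend API.
def merge_config(global_cfg, slot_cfg, request_cfg=None):
    return {**global_cfg, **slot_cfg, **(request_cfg or {})}

global_cfg = {"model": "gpt-4-turbo", "timeout": 30,
              "retry_policy": "exponential_backoff"}  # level 1: fleet.yaml
slot_cfg = {"count": 4, "timeout": 120}               # level 2: workers.generate
request_cfg = {"model": "claude-opus-4-6"}            # level 3: WorkerConfig

print(merge_config(global_cfg, slot_cfg, request_cfg))
# {'model': 'claude-opus-4-6', 'timeout': 120,
#  'retry_policy': 'exponential_backoff', 'count': 4}
```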
## RouteLLM smart routing
Agentend integrates with RouteLLM for intelligent model routing. Based on the complexity of the incoming request, RouteLLM can dynamically route to a cheaper model for simple tasks or escalate to a more capable model for hard problems — all within the same worker slot.
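As a rough illustration of the idea only — the keyword heuristic, threshold, and model choices below are assumptions for the sketch; RouteLLM's actual routers are trained models, not keyword rules:

```python
# Illustrative complexity-based routing within a single worker slot.
CHEAP_MODEL = "gemini-2.5-flash"
STRONG_MODEL = "claude-opus-4-6"

def route(prompt: str, threshold: float = 0.5) -> str:
    """Route to the cheap model unless the prompt looks complex."""
    hard_markers = ("prove", "refactor", "multi-step", "debug")
    score = sum(m in prompt.lower() for m in hard_markers) / len(hard_markers)
    return STRONG_MODEL if score >= threshold else CHEAP_MODEL

print(route("Classify this email"))                        # gemini-2.5-flash
print(route("Debug and refactor this multi-step parser"))  # claude-opus-4-6
```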
The `BenchmarkRegistry` powers this routing by providing cost, latency, and quality data for each model. Use `apply_to_fleet_config()` to automatically fill in benchmark-recommended models:
```python
from agentend import BenchmarkRegistry

registry = BenchmarkRegistry()

# Apply budget-optimized models to your fleet config
config = registry.apply_to_fleet_config(fleet_config, strategy="budget")

# Or get a recommendation for a specific slot
rec = registry.get_recommendation("extract")
print(rec.primary.model_id)      # claude-sonnet-4-6
print(rec.budget_pick.model_id)  # gemini-2.5-flash
```