Fleet

The fleet is Agentend's typed worker system. Instead of one model doing everything, the fleet assigns specialized model workers to six task slots: classification, extraction, verification, summarization, generation, and tool calling. Each slot's model recommendations are backed by benchmark data.

6 Worker Slots

Every Agentend application has 6 worker slots. Each slot has a primary recommendation, a budget option, a fallback, and a local/self-hosted pick. Benchmark data is sourced from March 2026 evaluations.

classify

Fast intent classification. Needs low latency and high classification accuracy.

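| Strategy | Model | Provider | Latency | Accuracy |
| --- | --- | --- | --- | --- |
| Primary | claude-haiku-4-5 | Anthropic | 180ms | 92.1% |
| Fallback | gpt-4o-mini | OpenAI | 150ms | 90.5% |
| Budget | gemini-2.0-flash | Google | 120ms | 91.0% |
| Local | qwen2.5-7b | Alibaba | 200ms | 87.3% |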
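
The latency and accuracy figures in the classify table above suggest a simple selection rule: pick the most accurate model that fits a latency budget. The sketch below hard-codes the four rows from that table. `Candidate` and `pick_classifier` are hypothetical names used for illustration and are not part of Agentend.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    strategy: str
    model: str
    latency_ms: int
    accuracy: float  # percent, as listed in the classify table


# Rows copied from the classify table.
CLASSIFY_CANDIDATES = [
    Candidate("primary", "claude-haiku-4-5", 180, 92.1),
    Candidate("fallback", "gpt-4o-mini", 150, 90.5),
    Candidate("budget", "gemini-2.0-flash", 120, 91.0),
    Candidate("local", "qwen2.5-7b", 200, 87.3),
]


def pick_classifier(max_latency_ms: int) -> Optional[Candidate]:
    """Return the most accurate candidate within the latency budget, or None."""
    eligible = [c for c in CLASSIFY_CANDIDATES if c.latency_ms <= max_latency_ms]
    return max(eligible, key=lambda c: c.accuracy, default=None)
```

With a 160 ms budget, only gpt-4o-mini and gemini-2.0-flash qualify, and the function returns gemini-2.0-flash because its listed accuracy is higher.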

extract

Structured data extraction. Needs high JSON accuracy and reliable output formatting.

| Strategy | Model | Provider | JSON Acc. |
| --- | --- | --- | --- |
| Primary | claude-sonnet-4-6 | Anthropic | 96.2% |
| Fallback | gpt-4o | OpenAI | 94.8% |
| Budget | gemini-2.5-flash | Google | 93.1% |
| Local | qwen2.5-72b | Alibaba | 91.5% |

verify

Fact-checking and validation. Needs high reasoning power and factual accuracy.

| Strategy | Model | Provider | Key Score |
| --- | --- | --- | --- |
| Primary | claude-opus-4-6 | Anthropic | GPQA Diamond 83.8 |
| Fallback | gemini-3-pro | Google | GPQA Diamond 82.1 |
| Budget | gemini-2.5-flash | Google | Facts 58.3 |
| Local | llama-4-maverick | Meta | Facts 52.1 |

summarize

Content condensation. Optimized for summarization quality.

| Strategy | Model | Provider | Quality |
| --- | --- | --- | --- |
| Primary | claude-sonnet-4-6 | Anthropic | 94.1 |
| Fallback | gpt-4o | OpenAI | -- |
| Budget | gemini-2.0-flash | Google | 89.5 |
| Local | phi-4-14b | Microsoft | 85.2 |

generate

Content and code generation. Needs highest reasoning and generation quality.

| Strategy | Model | Provider | Key Score |
| --- | --- | --- | --- |
| Primary | claude-opus-4-6 | Anthropic | SWE-bench 80.8, HumanEval 92.0 |
| Fallback | glm-4.7 | Zhipu | HumanEval 94.2, SWE-bench 76.5 |
| Budget | gemini-2.5-flash | Google | HumanEval 85.3 |
| Local | qwen2.5-coder-32b | Alibaba | HumanEval 88.1 |

tool_call

Function and tool calling. Needs reliable tool use and high success rate.

| Strategy | Model | Provider | Tool Use Acc. |
| --- | --- | --- | --- |
| Primary | claude-sonnet-4-6 | Anthropic | 96.8% |
| Fallback | gpt-4o | OpenAI | 95.1% |
| Budget | gemini-2.5-flash | Google | 92.3% |
| Local | llama-4-maverick | Meta | 88.7% |

fleet.yaml configuration

The fleet is configured in fleet.yaml at the project root. Each worker slot specifies a model, instance count, timeout, and retry policy.

fleet:
  name: default
  description: Default agentend fleet configuration

  workers:
    classify:
      count: 1
      model: gpt-4-turbo
      timeout: 30
      retry_policy: exponential_backoff

    extract:
      count: 2
      model: gpt-4-turbo
      timeout: 60
      retry_policy: exponential_backoff

    verify:
      count: 1
      model: gpt-4-turbo
      timeout: 45
      retry_policy: exponential_backoff

    summarize:
      count: 2
      model: gpt-3.5-turbo
      timeout: 60
      retry_policy: exponential_backoff

    generate:
      count: 4
      model: gpt-4-turbo
      timeout: 120
      retry_policy: exponential_backoff

    tool_call:
      count: 2
      model: gpt-4-turbo
      timeout: 90
      retry_policy: exponential_backoff
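
Before handing a configuration like the one above to the fleet, it can help to check that all six slots are present and that each carries the four documented fields. The following is a minimal sketch, assuming the file has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`). `WorkerSpec` and `parse_workers` are hypothetical helpers, not Agentend APIs.

```python
from dataclasses import dataclass

# The six slot names listed in this document.
REQUIRED_SLOTS = ("classify", "extract", "verify", "summarize", "generate", "tool_call")


@dataclass(frozen=True)
class WorkerSpec:
    count: int
    model: str
    timeout: int
    retry_policy: str


def parse_workers(config: dict) -> dict:
    """Validate the fleet.workers mapping and return {slot_name: WorkerSpec}."""
    workers = config.get("fleet", {}).get("workers", {})
    missing = [slot for slot in REQUIRED_SLOTS if slot not in workers]
    if missing:
        raise ValueError(f"missing worker slots: {', '.join(missing)}")

    specs = {}
    for slot in REQUIRED_SLOTS:
        raw = workers[slot]
        spec = WorkerSpec(
            count=int(raw["count"]),
            model=str(raw["model"]),
            timeout=int(raw["timeout"]),
            retry_policy=str(raw["retry_policy"]),
        )
        # Assumption: a slot needs at least one instance and a positive timeout.
        if spec.count < 1 or spec.timeout <= 0:
            raise ValueError(f"{slot}: count and timeout must be positive")
        specs[slot] = spec
    return specs
```

Failing early with a `ValueError` that names the missing slots is a design choice for this sketch; Agentend's own handling of incomplete configurations is not described here.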

3-level configuration override

Agentend uses a 3-level configuration override system. Each level can override settings from the level above it:

| Level | Source | Description |
| --- | --- | --- |
| 1. Global | fleet.yaml | Base configuration for all workers |
| 2. Per-slot | fleet.yaml workers.* | Overrides for a specific worker slot |
| 3. Per-request | WorkerConfig | Runtime overrides via the API or capability code |
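
The precedence order above can be sketched as a layered dictionary merge in which later levels win. This is an illustration of the idea only: the real `WorkerConfig` fields are not shown in this document, so plain dicts stand in for all three levels, and `resolve_worker_config` is a hypothetical helper.

```python
from typing import Optional


def resolve_worker_config(
    global_defaults: dict,
    slot_overrides: dict,
    request_overrides: Optional[dict] = None,
) -> dict:
    """Merge the three levels; per-request beats per-slot, which beats global."""
    resolved = dict(global_defaults)          # level 1: global
    resolved.update(slot_overrides)           # level 2: per-slot
    resolved.update(request_overrides or {})  # level 3: per-request
    return resolved


# Example values are illustrative, not Agentend defaults.
global_defaults = {"timeout": 30, "retry_policy": "exponential_backoff"}
slot_overrides = {"model": "gpt-4-turbo", "timeout": 120}
request_overrides = {"timeout": 10}

resolved = resolve_worker_config(global_defaults, slot_overrides, request_overrides)
```

Here `resolved["timeout"]` is 10 (from the request), `resolved["model"]` comes from the slot, and `resolved["retry_policy"]` falls through from the global level.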

RouteLLM smart routing

Agentend integrates with RouteLLM for intelligent model routing. Based on the complexity of the incoming request, RouteLLM can dynamically route to a cheaper model for simple tasks or escalate to a more capable model for hard problems — all within the same worker slot.
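
Conceptually, this kind of routing reduces to scoring a request and comparing the score against a threshold. The sketch below shows only that idea. It does not use RouteLLM's API, and `make_router` and `length_score` are hypothetical stand-ins (a real router would use a learned scorer rather than prompt length). The two model names are the budget and primary picks from the extract table.

```python
from typing import Callable


def make_router(
    score: Callable[[str], float],
    cheap_model: str,
    strong_model: str,
    threshold: float = 0.5,
) -> Callable[[str], str]:
    """Build a function that maps a prompt to a model name within one slot."""

    def route(prompt: str) -> str:
        # Escalate only when the estimated difficulty reaches the threshold.
        return strong_model if score(prompt) >= threshold else cheap_model

    return route


def length_score(prompt: str) -> float:
    """Toy difficulty estimate in [0, 1]: longer prompts count as harder."""
    return min(len(prompt.split()) / 50.0, 1.0)


route_extract = make_router(
    length_score,
    cheap_model="gemini-2.5-flash",
    strong_model="claude-sonnet-4-6",
)
```

Raising `threshold` sends more traffic to the cheaper model; lowering it escalates more often.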

The BenchmarkRegistry powers this routing by providing cost, latency, and quality data for each model. Use apply_to_fleet_config() to automatically fill in benchmark-recommended models:

from agentend import BenchmarkRegistry

registry = BenchmarkRegistry()

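# Apply budget-optimized models to your fleet config.
# fleet_config is assumed to be your already-loaded fleet configuration
# (for example, the contents of fleet.yaml).
config = registry.apply_to_fleet_config(fleet_config, strategy="budget")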

# Or get a recommendation for a specific slot
rec = registry.get_recommendation("extract")
print(rec.primary.model_id)      # claude-sonnet-4-6
print(rec.budget_pick.model_id)  # gemini-2.5-flash
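
As a rough mental model of what applying a strategy to a fleet config involves, the sketch below swaps each slot's model for the Budget row from the slot tables earlier in this document. It is not the library's implementation. `BUDGET_PICKS` and `apply_picks` are hypothetical, and the config is assumed to be a plain dict shaped like fleet.yaml.

```python
import copy

# Budget rows from the six slot tables in this document.
BUDGET_PICKS = {
    "classify": "gemini-2.0-flash",
    "extract": "gemini-2.5-flash",
    "verify": "gemini-2.5-flash",
    "summarize": "gemini-2.0-flash",
    "generate": "gemini-2.5-flash",
    "tool_call": "gemini-2.5-flash",
}


def apply_picks(fleet_config: dict, picks: dict) -> dict:
    """Return a copy of fleet_config with each known slot's model replaced."""
    updated = copy.deepcopy(fleet_config)  # leave the caller's config untouched
    for slot, worker in updated["fleet"]["workers"].items():
        if slot in picks:
            worker["model"] = picks[slot]
    return updated
```

Only the `model` field changes; `count`, `timeout`, and `retry_policy` are carried over as they were.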