# Fleet
The fleet is Agentend's typed worker system. Instead of one model doing everything, the fleet assigns specialized model workers to specific task slots — classification, extraction, verification, summarization, generation, and tool calling. Each slot is backed by benchmark data to recommend the best model for the job.
## 6 Worker Slots
Every Agentend application has 6 worker slots. Each slot has a primary recommendation, a budget option, a fallback, and a local/self-hosted pick. Benchmark data is sourced from March 2026 evaluations.
### classify
Fast intent classification. Needs low latency and high classification accuracy.
| Strategy | Model | Provider | Latency | Accuracy |
|---|---|---|---|---|
| Primary | claude-haiku-4-5 | Anthropic | 180ms | 92.1% |
| Fallback | gpt-4o-mini | OpenAI | 150ms | 90.5% |
| Budget | gemini-2.0-flash | Google | 120ms | 91.0% |
| Local | qwen2.5-7b | Alibaba | 200ms | 87.3% |
### extract
Structured data extraction. Needs high JSON accuracy and reliable output formatting.
| Strategy | Model | Provider | JSON Acc. |
|---|---|---|---|
| Primary | claude-sonnet-4-6 | Anthropic | 96.2% |
| Fallback | gpt-4o | OpenAI | 94.8% |
| Budget | gemini-2.5-flash | Google | 93.1% |
| Local | qwen2.5-72b | Alibaba | 91.5% |
### verify
Fact-checking and validation. Needs high reasoning power and factual accuracy.
| Strategy | Model | Provider | Key Score |
|---|---|---|---|
| Primary | claude-opus-4-6 | Anthropic | GPQA Diamond 83.8 |
| Fallback | gemini-3-pro | Google | GPQA Diamond 82.1 |
| Budget | gemini-2.5-flash | Google | Facts 58.3 |
| Local | llama-4-maverick | Meta | Facts 52.1 |
### summarize
Content condensation. Optimized for summarization quality.
| Strategy | Model | Provider | Quality |
|---|---|---|---|
| Primary | claude-sonnet-4-6 | Anthropic | 94.1 |
| Fallback | gpt-4o | OpenAI | -- |
| Budget | gemini-2.0-flash | Google | 89.5 |
| Local | phi-4-14b | Microsoft | 85.2 |
### generate
Content and code generation. Needs highest reasoning and generation quality.
| Strategy | Model | Provider | Key Score |
|---|---|---|---|
| Primary | claude-opus-4-6 | Anthropic | SWE-bench 80.8, HumanEval 92.0 |
| Fallback | glm-4.7 | Zhipu | HumanEval 94.2, SWE-bench 76.5 |
| Budget | gemini-2.5-flash | Google | HumanEval 85.3 |
| Local | qwen2.5-coder-32b | Alibaba | HumanEval 88.1 |
### tool_call
Function and tool calling. Needs reliable tool use and high success rate.
| Strategy | Model | Provider | Tool Use Acc. |
|---|---|---|---|
| Primary | claude-sonnet-4-6 | Anthropic | 96.8% |
| Fallback | gpt-4o | OpenAI | 95.1% |
| Budget | gemini-2.5-flash | Google | 92.3% |
| Local | llama-4-maverick | Meta | 88.7% |
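The primary/fallback/budget/local strategies above amount to a per-slot lookup. Here is a minimal sketch of that idea — the table data mirrors the classify and extract slots, but `SLOT_MODELS` and `pick_model` are hypothetical illustrations, not Agentend's API:

```python
# Hypothetical slot-to-model resolution; data mirrors the benchmark tables.
SLOT_MODELS = {
    "classify": {"primary": "claude-haiku-4-5", "fallback": "gpt-4o-mini",
                 "budget": "gemini-2.0-flash", "local": "qwen2.5-7b"},
    "extract": {"primary": "claude-sonnet-4-6", "fallback": "gpt-4o",
                "budget": "gemini-2.5-flash", "local": "qwen2.5-72b"},
}

def pick_model(slot: str, strategy: str = "primary") -> str:
    """Return the model for a slot, dropping to the fallback pick
    if the requested strategy is not configured for that slot."""
    models = SLOT_MODELS[slot]
    return models.get(strategy, models["fallback"])

print(pick_model("extract"))             # claude-sonnet-4-6
print(pick_model("classify", "budget"))  # gemini-2.0-flash
```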
## fleet.yaml configuration
The fleet is configured in `fleet.yaml` at the project root. Each worker slot specifies a model, instance count, timeout, and retry policy.
```yaml
fleet:
  name: default
  description: Default agentend fleet configuration
  workers:
    classify:
      count: 1
      model: gpt-4-turbo
      timeout: 30
      retry_policy: exponential_backoff
    extract:
      count: 2
      model: gpt-4-turbo
      timeout: 60
      retry_policy: exponential_backoff
    verify:
      count: 1
      model: gpt-4-turbo
      timeout: 45
      retry_policy: exponential_backoff
    summarize:
      count: 2
      model: gpt-3.5-turbo
      timeout: 60
      retry_policy: exponential_backoff
    generate:
      count: 4
      model: gpt-4-turbo
      timeout: 120
      retry_policy: exponential_backoff
    tool_call:
      count: 2
      model: gpt-4-turbo
      timeout: 90
      retry_policy: exponential_backoff
```
## 3-level configuration override
Agentend uses a 3-level configuration override system. Each level can override settings from the level above it:
| Level | Source | Description |
|---|---|---|
| 1. Global | fleet.yaml | Base configuration for all workers |
| 2. Per-slot | fleet.yaml workers.* | Overrides for a specific worker slot |
| 3. Per-request | WorkerConfig | Runtime overrides via the API or capability code |
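The merge behaves like a dictionary update in which later levels win on conflicting keys. A minimal sketch — the `merge_config` helper and the exact dict shapes are illustrative assumptions; only the key names come from `fleet.yaml`:

```python
# Later levels win on conflicting keys: global < per-slot < per-request.
# merge_config is an illustrative helper, not part of the Agentend API.
def merge_config(global_cfg, slot_cfg, request_cfg=None):
    return {**global_cfg, **slot_cfg, **(request_cfg or {})}

global_cfg = {"model": "gpt-4-turbo", "timeout": 30,
              "retry_policy": "exponential_backoff"}  # level 1: fleet.yaml
slot_cfg = {"count": 4, "timeout": 120}               # level 2: workers.generate
request_cfg = {"model": "claude-opus-4-6"}            # level 3: WorkerConfig

print(merge_config(global_cfg, slot_cfg, request_cfg))
# {'model': 'claude-opus-4-6', 'timeout': 120,
#  'retry_policy': 'exponential_backoff', 'count': 4}
```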
## RouteLLM smart routing
Agentend integrates with RouteLLM for intelligent model routing. Based on the complexity of the incoming request, RouteLLM can dynamically route to a cheaper model for simple tasks or escalate to a more capable model for hard problems — all within the same worker slot.
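As a rough illustration of the idea only — the keyword heuristic, threshold, and model choices below are assumptions for the sketch; RouteLLM's actual routers are trained models, not keyword rules:

```python
# Illustrative complexity-based routing within a single worker slot.
CHEAP_MODEL = "gemini-2.5-flash"
STRONG_MODEL = "claude-opus-4-6"

def route(prompt: str, threshold: float = 0.5) -> str:
    """Route to the cheap model unless the prompt looks complex."""
    hard_markers = ("prove", "refactor", "multi-step", "debug")
    score = sum(m in prompt.lower() for m in hard_markers) / len(hard_markers)
    return STRONG_MODEL if score >= threshold else CHEAP_MODEL

print(route("Classify this email"))                        # gemini-2.5-flash
print(route("Debug and refactor this multi-step parser"))  # claude-opus-4-6
```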
The `BenchmarkRegistry` powers this routing by providing cost, latency, and quality data for each model. Use `apply_to_fleet_config()` to automatically fill in benchmark-recommended models:
```python
from agentend import BenchmarkRegistry

registry = BenchmarkRegistry()

# Apply budget-optimized models to your fleet config
config = registry.apply_to_fleet_config(fleet_config, strategy="budget")

# Or get a recommendation for a specific slot
rec = registry.get_recommendation("extract")
print(rec.primary.model_id)      # claude-sonnet-4-6
print(rec.budget_pick.model_id)  # gemini-2.5-flash
```