Models & Providers¶

Specifying a model¶

The Model field on any agent (or on Selection.Model, Selection.Magentic.Model, Compaction.Model) accepts three forms:

1. Plain string — auto-detection¶

Model: gpt-4o

The provider, API endpoint, and API key environment variable are inferred from the model ID prefix. Nothing else needs to be configured.

2. Named alias — registry reference¶

Define aliases once in the top-level Models dictionary, then reference by name:

Models:
  fast:
    ModelId: grok-4.3
    ReasoningEffort: none
  smart:
    ModelId: grok-4.3
    ReasoningEffort: low

Agents:
  - Name: Planner
    Model:
      ModelId: fast
      MaxTokens: 4096
  - Name: Developer
    Model:
      ModelId: fast
      MaxTokens: 16384
  - Name: Tester
    Model:
      ModelId: smart
      MaxTokens: 8192

Per-agent Temperature and MaxTokens override the alias values.

3. Inline object — full manual control¶

Model:
  ModelId: my-model
  Provider: openai
  Endpoint: https://my-proxy.example.com/v1
  ApiKeyEnvVar: MY_PROXY_KEY
  MaxTokens: 8192
  Temperature: 0.2

Any field left empty falls back to auto-detection.

ModelConfig fields¶

Field	Type	Default	Description
`ModelId`	string	—	Model identifier sent to the API.
`Provider`	string	auto	Connector type: `openai`, `azure`, `google`, `mistral`, `ollama`. Auto-detected from `ModelId` if omitted.
`Endpoint`	string	auto	API base URL. Auto-detected from provider if omitted. Required for `azure`. Falls back to `endpoint` in `~/.fuseraft/config` when blank.
`ApiKeyEnvVar`	string	auto	Name of the environment variable holding the primary API key. Auto-detected from provider if omitted. Leave empty for `ollama`. Falls back to `apiKeyEnvVar` in `~/.fuseraft/config` when blank.
`ApiKey`	string	—	Literal API key. Takes precedence over `ApiKeyEnvVar`. Used by the REPL wizard; not recommended in YAML configs.
`ApiKeys`	array	—	Additional literal API keys for pool rotation. See Credential pool rotation.
`ApiKeyEnvVars`	array	—	Additional environment variable names each holding an API key, for pool rotation. See Credential pool rotation.
`MaxTokens`	int	`0`	Max tokens per response. `0` = use model default.
`MaxContextTokens`	int	`0`	Input context window limit (≈85% of the model's advertised maximum). Requests that would exceed this value are rejected before the API call — prevents expensive failures on models with hard limits. `0` disables the check.
`MaxPayloadBytes`	integer	`0`	Maximum serialized request body size in bytes. When set, the agent middleware estimates the outgoing JSON payload size (content × 1.2 + tool schemas × 1.1 + 2 KB envelope) before each API call and rejects it if it would exceed this limit — preventing HTTP 413 errors from upstream proxies (e.g. nginx). Set to your proxy's `client_max_body_size` minus ~10% headroom. `0` = no limit enforced.
`Temperature`	number	—	Sampling temperature (0.0–2.0). Omit for reasoning models that reject this parameter.
`ReasoningEffort`	string	—	Reasoning depth for models that support it (e.g. `grok-4.3`). Values: `none`, `low`, `medium`, `high`. Injected as `"reasoning": {"effort": "..."}` in the request. Omit for models that do not support this parameter.
`FalloverModels`	array	—	Ordered list of fallover models to try when this model fails with a classifiable error. Each entry supports the same shorthand as `ModelId` (a plain string in YAML). See Fallover chain.
`FalloverOn`	array	—	Error reasons that trigger fallover. Defaults to all recoverable reasons: `RateLimit`, `ContextExceeded`, `QuotaExceeded`, `ServerError`. `AuthError` is never fallover-able. Only relevant when `FalloverModels` is set.

Auto-detection table¶

When Endpoint and ApiKeyEnvVar are not specified, they are filled in based on the model ID prefix:

Model prefix	Provider	Default endpoint	API key env var
`gpt-*`	openai	`https://api.openai.com/v1`	`OPENAI_API_KEY`
`o1`, `o3`, `o4*`	openai	`https://api.openai.com/v1`	`OPENAI_API_KEY`
`grok-*`	openai	`https://api.x.ai/v1`	`XAI_API_KEY`
`claude-*`	openai	`https://api.anthropic.com/v1`	`ANTHROPIC_API_KEY`
`gemini-`, `learnlm-`	google	`https://generativelanguage.googleapis.com/v1beta/openai`	`GOOGLE_AI_API_KEY`
`mistral-`, `mixtral-`	mistral	`https://api.mistral.ai/v1`	`MISTRAL_API_KEY`
`codestral-`, `pixtral-`	mistral	`https://api.mistral.ai/v1`	`MISTRAL_API_KEY`
`deepseek-*`	openai	`https://api.deepseek.com/v1`	`DEEPSEEK_API_KEY`
`llama`, `phi`, `qwen`, `gemma`, `codellama`, `smollm`	ollama	`http://localhost:11434`	(none)
`name:tag` (colon format)	ollama	`http://localhost:11434`	(none)

For any model not matching the table, specify Provider, Endpoint, and ApiKeyEnvVar explicitly.

Global config defaults¶

~/.fuseraft/config can define a default endpoint and apiKeyEnvVar that are applied to every agent model (and named alias) that doesn't set those fields itself. This means you only need to configure the provider once — generated agent files work out of the box without repeating the values.

{
  "modelId": "anthropic.claude-sonnet-4-5-20250929-v1:0",
  "endpoint": "http://localhost:3000/api/openai/v1",
  "apiKeyEnvVar": "OPENWEBUI_API_KEY"
}

Set this file via fuseraft repl (the setup wizard writes it automatically) or edit it directly.

OS keychain fallback¶

If an agent model has neither ApiKey nor ApiKeyEnvVar set after global defaults are applied, fuseraft retrieves the key stored in the OS keychain (set via fuseraft key set or the REPL wizard) and injects it as a literal ApiKey. This means the full auth resolution order for any agent model is:

Explicit ApiKey in the agent file (literal value)
ApiKeyEnvVar from the agent file (env var lookup)
apiKeyEnvVar from ~/.fuseraft/config (env var lookup)
OS keychain (retrieved once at startup, injected as literal key)
Nothing — Ollama and other unauthenticated providers work without a key

Per-agent values always win; global values only fill in empty fields.

Supported providers¶

openai — OpenAI and OpenAI-compatible APIs¶

Uses Microsoft.Extensions.AI with the OpenAI connector. Works with any API that follows the OpenAI chat completions format.

Compatible services include: OpenAI, xAI (Grok), Anthropic (via their OpenAI-compatible endpoint), DeepSeek, OpenRouter, Groq, Together AI, LM Studio, vLLM, and many others.

export OPENAI_API_KEY=sk-...

Model: gpt-4o

azure — Azure OpenAI Service¶

Uses Microsoft.Extensions.AI with the Azure OpenAI connector. Requires Endpoint (your Azure resource URL) and ApiKeyEnvVar.

export AZURE_OPENAI_API_KEY=...

Model:
  ModelId: gpt-4o
  Provider: azure
  Endpoint: https://my-resource.openai.azure.com/
  ApiKeyEnvVar: AZURE_OPENAI_API_KEY

ModelId maps to the Azure deployment name, not the underlying model name.

google — Google AI Gemini¶

Uses Microsoft.Extensions.AI with the Google connector. Connects via the Google AI API.

export GOOGLE_AI_API_KEY=...

Model: gemini-2.0-flash

mistral — Mistral AI¶

Uses Microsoft.Extensions.AI with the Mistral connector.

export MISTRAL_API_KEY=...

Model: mistral-large-latest

ollama — Local models via Ollama¶

Uses OllamaApiClient from OllamaSharp. No API key required. The default endpoint is http://localhost:11434.

Model: llama3.2

To use a custom Ollama endpoint:

Model:
  ModelId: llama3.2
  Endpoint: http://192.168.1.50:11434

Using Open WebUI¶

Open WebUI exposes an OpenAI-compatible API. Use the openai provider with your Open WebUI instance URL.

export OPENWEBUI_API_KEY=<key-from-owui-settings>

Models:
  local-llama:
    ModelId: llama3.2
    Provider: openai
    Endpoint: http://localhost:3000/api/openai/v1
    ApiKeyEnvVar: OPENWEBUI_API_KEY

Generate the API key in Open WebUI under Settings → Account → API Keys.

Mixing providers across agents¶

Each agent gets its own chat client and its own model. You can freely mix providers in a single config:

Models:
  planner-model:
    ModelId: gpt-4o
  coder-model:
    ModelId: claude-3-5-sonnet-20241022
  local-reviewer:
    ModelId: llama3.2

Agents:
  - Name: Planner
    Model:
      ModelId: planner-model
    ...
  - Name: Developer
    Model:
      ModelId: coder-model
    ...
  - Name: Reviewer
    Model:
      ModelId: local-reviewer
    ...

Each agent's API calls are made with its own key and endpoint. Token costs are tracked and summed across all agents.

Credential pool rotation¶

When multiple API keys are available for the same provider, fuseraft-cli automatically rotates between them on 429 Too Many Requests responses. This keeps long sessions alive when a single API key hits its rate limit.

How it works¶

All keys from ApiKey, ApiKeyEnvVar, ApiKeys, and ApiKeyEnvVars are collected and deduplicated at session start.
Requests use the current slot (round-robin starting at 0).
When a 429 is returned — after TransientRetryHandler exhausts its own per-request retries — the slot is marked with a 60-second cooldown and the next available slot is tried.
If all slots are simultaneously rate-limited, the session surfaces the error rather than busy-waiting.
Single-key configs (the common case) are unaffected: KeyPoolChatClient is only activated when more than one distinct key resolves.

Configuration¶

Model:
  ModelId: gpt-4o
  ApiKeyEnvVar: OPENAI_API_KEY_1       # primary key (env var)
  ApiKeyEnvVars:
    - OPENAI_API_KEY_2                 # rotated to on 429
    - OPENAI_API_KEY_3

# Or mix literal keys:
Model:
  ModelId: claude-sonnet-4-6
  ApiKeyEnvVar: ANTHROPIC_API_KEY
  ApiKeys:
    - sk-ant-...second-key...
    - sk-ant-...third-key...

Keys can also be sourced entirely from env vars:

Model:
  ModelId: gpt-4o
  ApiKeyEnvVars:
    - OPENAI_KEY_TEAM_1
    - OPENAI_KEY_TEAM_2
    - OPENAI_KEY_TEAM_3

Named alias with a pool¶

Models:
  gpt4-pool:
    ModelId: gpt-4o
    ApiKeyEnvVar: OPENAI_API_KEY_1
    ApiKeyEnvVars:
      - OPENAI_API_KEY_2
      - OPENAI_API_KEY_3

Agents:
  - Name: Developer
    Model: gpt4-pool

All agents that reference the same alias share the same KeyPoolChatClient instance, so rotation state is shared across agents. Cooldowns from one agent's 429 apply to all agents using that pool for the duration of the cooldown window.

Fallover chain¶

When a provider call fails with a classifiable error (rate limit, context exceeded, quota exhausted, or server error), fuseraft-cli can automatically retry on a different model. Configure an ordered FalloverModels list on any ModelConfig and the primary model is tried first; on failure the next entry is tried, and so on until one succeeds or all are exhausted.

Model:
  ModelId: claude-opus-4-7
  FalloverModels:
    - gpt-4o           # tried if Anthropic returns 429 or 5xx
    - gemini-2.0-flash # final fallback

Each fallover entry supports the same shorthand as ModelId (a plain string) and goes through the full model resolution pipeline — including its own key pool if you configure ApiKeys or ApiKeyEnvVars on the fallover model.

How it works¶

The primary model is tried first.
If it throws an exception whose cause is in FalloverOn, the error is logged to stderr and the next model in the chain is tried.
For streaming responses, fallover only fires before the first chunk is delivered — mid-stream exceptions propagate as-is (the caller has already received partial output).
If all models in the chain fail, the last exception is re-thrown.

Fallover reasons¶

Reason	HTTP status	Trigger condition
`RateLimit`	429	Request-rate limit hit (not billing-related).
`ContextExceeded`	400	Prompt exceeded the model's context window.
`QuotaExceeded`	429 + quota/billing message	Account-level quota or billing limit reached.
`ServerError`	5xx	Provider-side server error after all per-request retries.
`AuthError`	401 / 403	Invalid or missing credentials — not fallover-able by default.

The default FalloverOn value covers all four recoverable reasons. Override it to restrict fallover to specific conditions:

Model:
  ModelId: gpt-4o
  FalloverModels:
    - gemini-2.0-flash
  FalloverOn:
    - ContextExceeded   # only fallover when the prompt is too long

Combining with credential pool rotation¶

FalloverModels and credential pool rotation work independently and compose well. The key pool rotates between API keys for the same model; the fallover chain switches to a different model when all keys on the primary are exhausted:

Model:
  ModelId: claude-opus-4-7
  ApiKeyEnvVar: ANTHROPIC_KEY_1
  ApiKeyEnvVars:
    - ANTHROPIC_KEY_2     # rotated to on 429
  FalloverModels:
    - gpt-4o              # tried after all Anthropic keys are rate-limited

Named alias with a fallover chain¶

Models:
  robust:
    ModelId: claude-opus-4-7
    FalloverModels:
      - gpt-4o
      - gemini-2.0-flash

Agents:
  - Name: Developer
    Model: robust

Reasoning models¶

Reasoning models (OpenAI o1/o3/o4, xAI grok-4.3) reject the temperature parameter. Leave Temperature unset (null) for these models.

xAI reasoning effort¶

grok-4.3 supports four reasoning depth levels controlled by the ReasoningEffort field:

Value	Behaviour
`none`	Reasoning disabled — fastest, cheapest. Use for structured output, routing, and summarisation agents.
`low`	Light reasoning (default when unset on `grok-4.3`). Balances speed and analytical depth.
`medium`	More thinking tokens. Good for complex analysis, planning, and code review.
`high`	Maximum reasoning — slowest and most expensive. Reserve for the hardest problems.

Models:
  fast:
    ModelId: grok-4.3
    ApiKeyEnvVar: XAI_API_KEY
    ReasoningEffort: none      # structured output, routing agents

  reasoning:
    ModelId: grok-4.3
    ApiKeyEnvVar: XAI_API_KEY
    ReasoningEffort: low       # general agentic work

  deep:
    ModelId: grok-4.3
    ApiKeyEnvVar: XAI_API_KEY
    ReasoningEffort: high      # complex planning or review

The value is injected at the HTTP layer as "reasoning": {"effort": "..."} — no SDK-level support is required.

For OpenAI o1/o3/o4, leave ReasoningEffort unset; those models use a separate SDK-native mechanism (ReasoningEffortLevel) that the OpenAI SDK applies automatically.