Describe the feature
Summary
Introduce a first-class, provider-agnostic Model Registry that centralizes model configuration (provider, endpoints, capabilities), enforces cost/rate limits, and supports automatic fallback routing across multiple LLMs. This reduces unexpected spend, improves reliability across providers, and simplifies configuration for users running SWE-agent in production, batch, or research settings.
User Story
As a maintainer/user of SWE-agent, I want a unified model registry with budgets, rate limits, and automatic fallbacks so that long-running or batch workflows are reliable, cost-bounded, and portable across LLM providers.
Problem / Pain
- Cost/budget governance gaps: Users report “Buggy cost control with Gemini 2.5 Pro” (#1240), indicating current cost controls are insufficient or provider-specific.
- Documentation gap: A link referencing a “custom model registry” is broken (#1254), suggesting demand for a consolidated registry pattern that’s not fully implemented or discoverable.
- Configuration fragmentation: Users ask for recommended configs for open-source models (#1212). A registry would standardize model definitions, capabilities, and limits in YAML.
- Evidence of feasibility and integration points:
  - Config-driven design is core to the project (config/README.md; docs site references).
  - pyproject.toml shows pydantic and pydantic_settings for typed config and litellm as the provider layer (pyproject.toml lines 41–64).
  - The CLI entry point sweagent = "sweagent.run.run:main" (pyproject.toml lines 66–68) suggests a single orchestrator that can route all model calls through a registry/policy layer.
Feasibility & Integration Points
- Config and validation: Use pydantic_settings (pyproject.toml lines 55–57) to define ModelConfig and PolicyConfig data models and load from YAML.
- Provider abstraction: Integrate policy enforcement around the litellm call sites (pyproject.toml lines 57–58 list litellm), adding rate/budget guards and fallback.
- CLI orchestration: Hook into the sweagent entry point (pyproject.toml lines 66–68) to load the registry, wire the policy engine, and pass model handles to existing execution flows.
- Logging/metrics: Reuse rich/textual logging (pyproject.toml lines 45–46, 62–63) and emit JSON lines for budgets and throttling.
- Exact LLM call site path: Unable to determine from available data; implement as a thin wrapper module (e.g., sweagent/llm/registry.py) and replace direct litellm usages via a factory (a sketch of the config models follows this list).
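To make the shape of these types concrete, here is a minimal sketch of the registry config loaded from YAML. All class, field, and function names (ModelConfig, PolicyConfig, RegistryConfig, load_registry) are illustrative, not existing SWE-agent code; plain pydantic BaseModel is used for brevity, and pydantic_settings could layer env-var overrides on top of the same models.

```python
# Hypothetical sketch: typed model-registry config validated with pydantic.
# All names here are illustrative, not existing SWE-agent classes.
from pathlib import Path

import yaml
from pydantic import BaseModel, Field


class ModelConfig(BaseModel):
    provider: str                 # e.g. "openai", "google", "openrouter"
    model_id: str
    base_url: str | None = None   # optional custom endpoint
    capabilities: list[str] = Field(default_factory=list)  # e.g. ["function_calling"]
    rpm: int | None = None        # requests per minute
    tpm: int | None = None        # tokens per minute
    usd_per_1k_input: float | None = None
    usd_per_1k_output: float | None = None


class PolicyConfig(BaseModel):
    budget_usd_per_run: float | None = None
    fallback_chain: list[str] = Field(default_factory=list)
    hard_stop_on_budget: bool = False


class RegistryConfig(BaseModel):
    default: str
    models: dict[str, ModelConfig]
    policies: PolicyConfig = Field(default_factory=PolicyConfig)


def load_registry(path: Path) -> RegistryConfig:
    """Parse and strictly validate the model_registry section of a YAML config."""
    data = yaml.safe_load(path.read_text())
    return RegistryConfig.model_validate(data["model_registry"])
```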
Quality Considerations
- Security: Minimal surface; ensure no secrets are logged in metrics.
- Performance: Token accounting and simple in-memory counters; negligible overhead.
- Reliability: Fallback and backoff significantly improve run completion.
- Accessibility: Not applicable (CLI/config).
- i18n: Not applicable.
- Observability: Add counters and events for spend, limits, and fallbacks (example event format below).
- Maintainability: Centralizes model configuration and reduces provider-specific conditionals.
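For illustration, throttling and budget events could each be emitted as one JSON object per line; the helper and field names below are hypothetical:

```python
# Hypothetical sketch of JSON-lines event emission; field names are illustrative.
import json
import sys
import time


def emit_event(kind: str, **fields) -> None:
    """Write one structured event per line for downstream metrics collection."""
    record = {"ts": time.time(), "event": kind, **fields}
    sys.stderr.write(json.dumps(record) + "\n")


# emit_event("budget_exceeded", model="gpt-4o", spent_usd=5.02, budget_usd=5.00)
# emit_event("fallback", from_model="gpt-4o", to_model="gemini-pro", reason="rate_limited")
```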
Related Issues/PRs
#1240 — Buggy cost control with Gemini 2.5 Pro (addresses governance with provider-agnostic policies)
#1254 — Broken link to “custom model registry” (introduces documented, working registry with schema)
#1212 — Recommended config for open-source models? (provides standardized examples and guidance)
Risks & Mitigations
- Risk: Provider-specific quirks (cost accounting, rate limits) → Mitigation: Make cost/rate fields configurable; allow per-provider adapters.
- Risk: Over-enforcement disrupting workflows → Mitigation: Soft mode by default; opt-in hard stop; detailed logs for tuning.
- Risk: Config complexity → Mitigation: Ship curated examples and sensible defaults; strict pydantic validation with helpful errors.
Potential Solutions
Proposed Solution
- Behavioral summary
  - Add a model_registry section in YAML configs that defines named models with: provider, model_id, base_url (optional), capabilities (e.g., function-calling), token limits, RPM/TPM, estimated cost per token, and monthly/run-level budgets.
  - Introduce an LLM Policy Engine that:
    - Enforces per-model rate limits and cost budgets across runs and batches.
    - Routes to a configured fallback chain on provider/model failure or policy violation.
    - Emits structured metrics/events for observability (per-request tokens, spend, throttling).
- API/CLI/config sketch
  - YAML (example):

    ```yaml
    model_registry:
      default: gpt-4o
      models:
        gpt-4o: {provider: openai, model_id: gpt-4o, rpm: 60, tpm: 180000, usd_per_1k_input: 5e-3, usd_per_1k_output: 15e-3, capabilities: [function_calling]}
        gemini-pro: {provider: google, model_id: gemini-2.5-pro, rpm: 60, tpm: 200000, usd_per_1k_input: 1e-3, usd_per_1k_output: 2e-3}
        local-llm: {provider: openrouter, model_id: qwen2.5-coder, base_url: http://localhost:11434, rpm: 120, tpm: 300000}
      policies:
        budget_usd_per_run: 5.00
        fallback_chain: [gpt-4o, gemini-pro, local-llm]
        hard_stop_on_budget: true
    ```

  - CLI:
    - --model-registry <path.yaml> to supply/override the registry.
    - --budget-usd, --rpm-limit, --tpm-limit as overrides.
- Error handling and edge cases
  - If hard_stop_on_budget is true, fail fast with an actionable error when the limit is exceeded.
  - If a provider is rate-limited, back off with tenacity and route to the next fallback if configured (see the sketch after this section).
  - Validate model capabilities vs. requested features (e.g., function-calling) before selection; fail early with clear guidance if no compatible model is available.
- Compatibility strategy
  - Default behavior unchanged unless model_registry is provided.
  - Backwards compatible: existing single-model configs continue to work.
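Finally, a minimal sketch of how the policy engine and fallback chain could wrap litellm, reusing the illustrative RegistryConfig types from the earlier sketch. litellm's "provider/model_id" string convention and api_base parameter are real; the class names, error type, and cost math are assumptions.

```python
# Hypothetical sketch: budget-aware completion with fallback routing around litellm.
import litellm
from tenacity import retry, stop_after_attempt, wait_exponential


class BudgetExceededError(RuntimeError):
    """Raised when hard_stop_on_budget is set and the run budget is exhausted."""


class PolicyEngine:
    def __init__(self, registry):  # registry: the RegistryConfig sketched earlier
        self.registry = registry
        self.spent_usd = 0.0  # simple in-memory, run-level spend counter

    def _charge(self, model_cfg, usage) -> None:
        # Estimate spend from per-token prices in the registry; litellm's
        # completion_cost() could replace this manual estimate.
        self.spent_usd += (usage.prompt_tokens / 1000) * (model_cfg.usd_per_1k_input or 0)
        self.spent_usd += (usage.completion_tokens / 1000) * (model_cfg.usd_per_1k_output or 0)

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=30))
    def _call(self, model_cfg, messages):
        # Exponential backoff via tenacity before giving up on this model.
        return litellm.completion(
            model=f"{model_cfg.provider}/{model_cfg.model_id}",
            messages=messages,
            api_base=model_cfg.base_url,
        )

    def complete(self, messages):
        policies = self.registry.policies
        budget = policies.budget_usd_per_run
        if budget is not None and self.spent_usd >= budget and policies.hard_stop_on_budget:
            raise BudgetExceededError(
                f"Run budget ${budget:.2f} exhausted (spent ${self.spent_usd:.2f})."
            )
        last_error = None
        for name in policies.fallback_chain or [self.registry.default]:
            model_cfg = self.registry.models[name]
            try:
                response = self._call(model_cfg, messages)
                self._charge(model_cfg, response.usage)
                return response
            except Exception as exc:  # provider/model failure: try the next model
                last_error = exc
        raise RuntimeError("All models in fallback_chain failed.") from last_error
```

litellm also ships a Router with built-in fallbacks and a BudgetManager; if their semantics fit, they could replace parts of this hand-rolled loop.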