Provider-agnostic Model Registry with Budget, Rate Limits, and Fallback Routing #1285

@0x-Professor

Description

Describe the feature

Summary
Introduce a first-class, provider-agnostic Model Registry that centralizes model configuration (provider, endpoints, capabilities), enforces cost/rate limits, and supports automatic fallback routing across multiple LLMs. This reduces unexpected spend, improves reliability across providers, and simplifies configuration for users running SWE-agent in production, batch, or research settings.

User Story
As a maintainer/user of SWE-agent, I want a unified model registry with budgets, rate limits, and automatic fallbacks so that long-running or batch workflows are reliable, cost-bounded, and portable across LLM providers.

Problem / Pain

  • Cost/budget governance gaps: Users report buggy cost control with Gemini 2.5 Pro (#1240), indicating current cost controls are insufficient or provider-specific.
  • Documentation gap: A link referencing a “custom model registry” is broken (#1254), suggesting demand for a consolidated registry pattern that is not fully implemented or discoverable.
  • Configuration fragmentation: Users ask for recommended configs for open-source models (#1212). A registry would standardize model definitions, capabilities, and limits in YAML.
  • Evidence of feasibility and integration points:
    • Config-driven design is core to the project (config/README.md; docs site references).
    • pyproject.toml lists pydantic and pydantic_settings for typed config, and litellm as the provider layer (lines 41–64).
    • CLI entry point sweagent = "sweagent.run.run:main" (pyproject.toml lines 66–68) suggests a single orchestrator that can route all model calls through a registry/policy layer.

Feasibility & Integration Points

  • Config and validation: Use pydantic_settings (pyproject.toml lines 55–57) to define ModelConfig and PolicyConfig data models and load them from YAML; a minimal sketch follows this list.
  • Provider abstraction: Integrate policy enforcement around the litellm call sites (pyproject.toml lines 57–58 list litellm), adding rate/budget guards and fallback.
  • CLI orchestration: Hook into the sweagent entry point (pyproject.toml lines 66–68) to load the registry, wire the policy engine, and pass model handles to existing execution flows.
  • Logging/metrics: Reuse rich/textual logging (pyproject.toml lines 45–46, 62–63) and emit JSON lines for budgets and throttling.
  • Exact LLM call site path: Unable to determine from available data; implement as a thin wrapper module (e.g., sweagent/llm/registry.py) and replace direct litellm usages via a factory.
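
A minimal sketch of the typed registry config, matching the YAML example under “Proposed Solution” below. It uses plain pydantic (v2) models plus PyYAML for brevity; pydantic_settings could layer environment overrides on top. The module path (e.g., sweagent/llm/registry.py) and all class/field names are illustrative, not existing SWE-agent APIs:

from pathlib import Path

import yaml
from pydantic import BaseModel, Field


class ModelConfig(BaseModel):
    """One named entry in the model_registry section."""

    provider: str
    model_id: str
    base_url: str | None = None
    capabilities: list[str] = Field(default_factory=list)
    rpm: int | None = None                  # requests per minute
    tpm: int | None = None                  # tokens per minute
    usd_per_1k_input: float | None = None   # estimated cost per 1k input tokens
    usd_per_1k_output: float | None = None  # estimated cost per 1k output tokens


class PolicyConfig(BaseModel):
    """Run-level policies applied across all models."""

    budget_usd_per_run: float | None = None
    fallback_chain: list[str] = Field(default_factory=list)
    hard_stop_on_budget: bool = False


class ModelRegistry(BaseModel):
    """Top-level model_registry section of a SWE-agent YAML config."""

    default: str
    models: dict[str, ModelConfig]
    policies: PolicyConfig = PolicyConfig()


def load_registry(path: Path) -> ModelRegistry:
    """Parse and validate the model_registry section of a YAML config file."""
    data = yaml.safe_load(path.read_text())
    return ModelRegistry.model_validate(data["model_registry"])

Strict validation here gives the “helpful errors” mentioned under Risks & Mitigations for free: an unknown provider field or a malformed budget fails at load time rather than mid-run.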

Quality Considerations
Security | Performance | Reliability | Accessibility | i18n | Observability | Maintainability

  • Security: Minimal surface; ensure no secrets are logged in metrics.
  • Performance: Token accounting and simple in-memory counters; negligible overhead.
  • Reliability: Fallback routing and retry backoff improve completion rates for long-running and batch workflows.
  • Accessibility: Not applicable (CLI/config).
  • i18n: Not applicable.
  • Observability: Add counters and events for spend, limits, and fallbacks (see the JSON-lines sketch after this list).
  • Maintainability: Centralizes model configuration and reduces provider-specific conditionals.
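
A minimal sketch of the JSON-lines events mentioned under Observability; the event names and fields are illustrative only, not an existing SWE-agent schema:

import json
import sys
import time


def emit_event(kind: str, **fields) -> None:
    """Write one structured policy event (spend, throttle, fallback) as a JSON line."""
    record = {"ts": time.time(), "event": kind, **fields}
    sys.stdout.write(json.dumps(record) + "\n")


# Example events the policy engine could emit:
emit_event("spend", model="gpt-4o", usd=0.0123, tokens_in=812, tokens_out=256)
emit_event("throttle", model="gemini-pro", reason="rpm_limit", wait_s=2.0)
emit_event("fallback", from_model="gpt-4o", to_model="gemini-pro")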

Related Issues/PRs

#1240 — Buggy cost control with Gemini 2.5 Pro (addresses governance with provider-agnostic policies)
#1254 — Broken link to “custom model registry” (introduces a documented, working registry with a schema)
#1212 — Recommended config for open-source models? (provides standardized examples and guidance)

Risks & Mitigations

  • Risk: Provider-specific quirks (cost accounting, rate limits) → Mitigation: Make cost/rate fields configurable; allow per-provider adapters.
  • Risk: Over-enforcement disrupting workflows → Mitigation: Soft mode by default; opt-in hard stop; detailed logs for tuning.
  • Risk: Config complexity → Mitigation: Ship curated examples and sensible defaults; strict pydantic validation with helpful errors.

Potential Solutions

Proposed Solution

  • Behavioral summary
    • Add a model_registry section in YAML configs that defines named models with: provider, model_id, base_url (optional), capabilities (e.g., function calling), token limits, RPM/TPM, estimated cost per token, and monthly/run-level budgets.
    • Introduce an LLM Policy Engine (sketched at the end of this section) that:
      • Enforces per-model rate limits and cost budgets across runs and batches.
      • Routes to a configured fallback chain on provider/model failure or policy violation.
      • Emits structured metrics/events for observability (per-request tokens, spend, throttling).
  • API/CLI/config sketch
    • YAML (example):
      model_registry:
        default: gpt-4o
        models:
          gpt-4o: {provider: openai, model_id: gpt-4o, rpm: 60, tpm: 180000, usd_per_1k_input: 5e-3, usd_per_1k_output: 15e-3, capabilities: [function_calling]}
          gemini-pro: {provider: google, model_id: gemini-2.5-pro, rpm: 60, tpm: 200000, usd_per_1k_input: 1e-3, usd_per_1k_output: 2e-3}
          local-llm: {provider: openrouter, model_id: qwen2.5-coder, base_url: http://localhost:11434, rpm: 120, tpm: 300000}
        policies:
          budget_usd_per_run: 5.00
          fallback_chain: [gpt-4o, gemini-pro, local-llm]
          hard_stop_on_budget: true
    • CLI:
      • --model-registry <path.yaml> to supply/override registry.
      • --budget-usd, --rpm-limit, --tpm-limit as overrides.
  • Error handling and edge cases
    • If hard_stop_on_budget is true, fail fast with an actionable error when the budget is exceeded.
    • If a provider is rate-limited, back off with tenacity and route to the next model in the fallback chain if configured.
    • Validate model capabilities vs. requested features (e.g., function-calling) before selection; fail early with clear guidance if no compatible model is available.
  • Compatibility strategy
    • Default behavior unchanged unless model_registry is provided.
    • Backwards compatible: existing single-model configs continue to work.
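
A minimal sketch of the fallback and budget behavior described above, reusing the illustrative ModelRegistry/ModelConfig types from the earlier sketch. The "provider/model_id" string passed to litellm and the retry settings are assumptions and would need per-provider tuning:

import litellm
from tenacity import retry, stop_after_attempt, wait_exponential


class BudgetExceeded(RuntimeError):
    """Raised when hard_stop_on_budget is set and the run budget is spent."""


class PolicyEngine:
    """Routes completions through the registry, enforcing budgets and fallbacks."""

    def __init__(self, registry: ModelRegistry) -> None:
        self.registry = registry
        self.spent_usd = 0.0

    def _estimate_cost(self, cfg: ModelConfig, usage) -> float:
        # Estimate spend from the per-1k-token prices configured in the registry.
        cost_in = (cfg.usd_per_1k_input or 0.0) * usage.prompt_tokens / 1000
        cost_out = (cfg.usd_per_1k_output or 0.0) * usage.completion_tokens / 1000
        return cost_in + cost_out

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=30))
    def _call(self, cfg: ModelConfig, messages: list[dict]):
        # The "provider/model_id" prefix works for many litellm providers, but the
        # exact spelling differs per provider and may need a per-provider adapter.
        return litellm.completion(
            model=f"{cfg.provider}/{cfg.model_id}",
            messages=messages,
            api_base=cfg.base_url,
        )

    def complete(self, messages: list[dict]):
        policies = self.registry.policies
        budget = policies.budget_usd_per_run
        if budget is not None and self.spent_usd >= budget and policies.hard_stop_on_budget:
            raise BudgetExceeded(f"Run budget of ${budget:.2f} exhausted")
        # Try the default model first, then walk the configured fallback chain.
        chain = [self.registry.default] + [
            name for name in policies.fallback_chain if name != self.registry.default
        ]
        last_error: Exception | None = None
        for name in chain:
            cfg = self.registry.models[name]
            try:
                response = self._call(cfg, messages)
            except Exception as exc:  # provider failure or rate limit after retries
                last_error = exc
                continue
            self.spent_usd += self._estimate_cost(cfg, response.usage)
            return response
        raise RuntimeError("All models in the fallback chain failed") from last_error

In soft mode (hard_stop_on_budget: false), the same budget check would only emit a warning event instead of raising, which keeps existing workflows running while the limits are tuned.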
