Describe the feature
Summary
Introduce a first-class, provider-agnostic Model Registry that centralizes model configuration (provider, endpoints, capabilities), enforces cost/rate limits, and supports automatic fallback routing across multiple LLMs. This reduces unexpected spend, improves reliability across providers, and simplifies configuration for users running SWE-agent in production, batch, or research settings.
User Story
As a maintainer/user of SWE-agent, I want a unified model registry with budgets, rate limits, and automatic fallbacks so that long-running or batch workflows are reliable, cost-bounded, and portable across LLM providers.
Problem / Pain
- Cost/budget governance gaps: Users report “Buggy cost control with Gemini 2.5 Pro” (#1240), indicating current cost controls are insufficient or provider-specific.
- Documentation gap: A link referencing a “custom model registry” is broken (#1254), suggesting demand for a consolidated registry pattern that’s not fully implemented or discoverable.
- Configuration fragmentation: Users ask for recommended configs for open-source models (#1212). A registry would standardize model definitions, capabilities, and limits in YAML.
- Evidence of feasibility and integration points:
  - Config-driven design is core to the project (config/README.md; docs site references).
  - pyproject.toml shows pydantic and pydantic_settings for typed config and litellm as the provider layer (pyproject.toml lines 41–64).
  - The CLI entry point sweagent = "sweagent.run.run:main" (pyproject.toml lines 66–68) suggests a single orchestrator that can route all model calls through a registry/policy layer.
Feasibility & Integration Points
- Config and validation: Use pydantic_settings (pyproject.toml lines 55–57) to define ModelConfig and PolicyConfig data models and load from YAML.
- Provider abstraction: Integrate policy enforcement around the litellm call sites (pyproject.toml lines 57–58 list litellm), adding rate/budget guards and fallback.
- CLI orchestration: Hook into the sweagent entry point (pyproject.toml lines 66–68) to load the registry, wire the policy engine, and pass model handles to existing execution flows.
- Logging/metrics: Reuse rich/textual logging (pyproject.toml lines 45–46, 62–63) and emit JSON lines for budgets and throttling.
- Exact LLM call site path: Unable to determine from available data; implement as a thin wrapper module (e.g., sweagent/llm/registry.py) and replace direct litellm usages via a factory (a sketch of the config models follows this list).
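To make the shape of these types concrete, here is a minimal sketch of the registry config loaded from YAML. All class, field, and function names (ModelConfig, PolicyConfig, RegistryConfig, load_registry) are illustrative, not existing SWE-agent code; plain pydantic BaseModel is used for brevity, and pydantic_settings could layer env-var overrides on top of the same models.

```python
# Hypothetical sketch: typed model-registry config validated with pydantic.
# All names here are illustrative, not existing SWE-agent classes.
from pathlib import Path

import yaml
from pydantic import BaseModel, Field


class ModelConfig(BaseModel):
    provider: str                 # e.g. "openai", "google", "openrouter"
    model_id: str
    base_url: str | None = None   # optional custom endpoint
    capabilities: list[str] = Field(default_factory=list)  # e.g. ["function_calling"]
    rpm: int | None = None        # requests per minute
    tpm: int | None = None        # tokens per minute
    usd_per_1k_input: float | None = None
    usd_per_1k_output: float | None = None


class PolicyConfig(BaseModel):
    budget_usd_per_run: float | None = None
    fallback_chain: list[str] = Field(default_factory=list)
    hard_stop_on_budget: bool = False


class RegistryConfig(BaseModel):
    default: str
    models: dict[str, ModelConfig]
    policies: PolicyConfig = Field(default_factory=PolicyConfig)


def load_registry(path: Path) -> RegistryConfig:
    """Parse and strictly validate the model_registry section of a YAML config."""
    data = yaml.safe_load(path.read_text())
    return RegistryConfig.model_validate(data["model_registry"])
```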
Quality Considerations
- Security: Minimal surface; ensure no secrets are logged in metrics.
- Performance: Token accounting and simple in-memory counters; negligible overhead.
- Reliability: Fallback and backoff significantly improve run completion.
- Accessibility: Not applicable (CLI/config).
- i18n: Not applicable.
- Observability: Add counters and events for spend, limits, and fallbacks (example event format below).
- Maintainability: Centralizes model configuration and reduces provider-specific conditionals.
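For illustration, throttling and budget events could each be emitted as one JSON object per line; the helper and field names below are hypothetical:

```python
# Hypothetical sketch of JSON-lines event emission; field names are illustrative.
import json
import sys
import time


def emit_event(kind: str, **fields) -> None:
    """Write one structured event per line for downstream metrics collection."""
    record = {"ts": time.time(), "event": kind, **fields}
    sys.stderr.write(json.dumps(record) + "\n")


# emit_event("budget_exceeded", model="gpt-4o", spent_usd=5.02, budget_usd=5.00)
# emit_event("fallback", from_model="gpt-4o", to_model="gemini-pro", reason="rate_limited")
```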
Related Issues/PRs
#1240 — Buggy cost control with Gemini 2.5 Pro (addresses governance with provider-agnostic policies)
#1254 — Broken link to “custom model registry” (introduces documented, working registry with schema)
#1212 — Recommended config for open-source models? (provides standardized examples and guidance)
Risks & Mitigations
- Risk: Provider-specific quirks (cost accounting, rate limits) → Mitigation: Make cost/rate fields configurable; allow per-provider adapters.
- Risk: Over-enforcement disrupting workflows → Mitigation: Soft mode by default; opt-in hard stop; detailed logs for tuning.
- Risk: Config complexity → Mitigation: Ship curated examples and sensible defaults; strict pydantic validation with helpful errors.
Potential Solutions
Proposed Solution
- Behavioral summary
  - Add a model_registry section in YAML configs that defines named models with: provider, model_id, base_url (optional), capabilities (e.g., function-calling), token limits, RPM/TPM, estimated cost per token, and monthly/run-level budgets.
  - Introduce an LLM Policy Engine that:
    - Enforces per-model rate limits and cost budgets across runs and batches.
    - Routes to a configured fallback chain on provider/model failure or policy violation.
    - Emits structured metrics/events for observability (per-request tokens, spend, throttling).
- API/CLI/config sketch
  - YAML (example):

    ```yaml
    model_registry:
      default: gpt-4o
      models:
        gpt-4o: {provider: openai, model_id: gpt-4o, rpm: 60, tpm: 180000, usd_per_1k_input: 5e-3, usd_per_1k_output: 15e-3, capabilities: [function_calling]}
        gemini-pro: {provider: google, model_id: gemini-2.5-pro, rpm: 60, tpm: 200000, usd_per_1k_input: 1e-3, usd_per_1k_output: 2e-3}
        local-llm: {provider: openrouter, model_id: qwen2.5-coder, base_url: http://localhost:11434, rpm: 120, tpm: 300000}
      policies:
        budget_usd_per_run: 5.00
        fallback_chain: [gpt-4o, gemini-pro, local-llm]
        hard_stop_on_budget: true
    ```

  - CLI:
    - --model-registry <path.yaml> to supply/override the registry.
    - --budget-usd, --rpm-limit, --tpm-limit as overrides.
- Error handling and edge cases
  - If hard_stop_on_budget is true, fail fast with an actionable error when the limit is exceeded.
  - If a provider is rate-limited, back off with tenacity and route to the next fallback if configured (see the sketch after this section).
  - Validate model capabilities vs. requested features (e.g., function-calling) before selection; fail early with clear guidance if no compatible model is available.
- Compatibility strategy
  - Default behavior unchanged unless model_registry is provided.
  - Backwards compatible: existing single-model configs continue to work.
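Finally, a minimal sketch of how the policy engine and fallback chain could wrap litellm, reusing the illustrative RegistryConfig types from the earlier sketch. litellm's "provider/model_id" string convention and api_base parameter are real; the class names, error type, and cost math are assumptions.

```python
# Hypothetical sketch: budget-aware completion with fallback routing around litellm.
import litellm
from tenacity import retry, stop_after_attempt, wait_exponential


class BudgetExceededError(RuntimeError):
    """Raised when hard_stop_on_budget is set and the run budget is exhausted."""


class PolicyEngine:
    def __init__(self, registry):  # registry: the RegistryConfig sketched earlier
        self.registry = registry
        self.spent_usd = 0.0  # simple in-memory, run-level spend counter

    def _charge(self, model_cfg, usage) -> None:
        # Estimate spend from per-token prices in the registry; litellm's
        # completion_cost() could replace this manual estimate.
        self.spent_usd += (usage.prompt_tokens / 1000) * (model_cfg.usd_per_1k_input or 0)
        self.spent_usd += (usage.completion_tokens / 1000) * (model_cfg.usd_per_1k_output or 0)

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=30))
    def _call(self, model_cfg, messages):
        # Exponential backoff via tenacity before giving up on this model.
        return litellm.completion(
            model=f"{model_cfg.provider}/{model_cfg.model_id}",
            messages=messages,
            api_base=model_cfg.base_url,
        )

    def complete(self, messages):
        policies = self.registry.policies
        budget = policies.budget_usd_per_run
        if budget is not None and self.spent_usd >= budget and policies.hard_stop_on_budget:
            raise BudgetExceededError(
                f"Run budget ${budget:.2f} exhausted (spent ${self.spent_usd:.2f})."
            )
        last_error = None
        for name in policies.fallback_chain or [self.registry.default]:
            model_cfg = self.registry.models[name]
            try:
                response = self._call(model_cfg, messages)
                self._charge(model_cfg, response.usage)
                return response
            except Exception as exc:  # provider/model failure: try the next model
                last_error = exc
        raise RuntimeError("All models in fallback_chain failed.") from last_error
```

litellm also ships a Router with built-in fallbacks and a BudgetManager; if their semantics fit, they could replace parts of this hand-rolled loop.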