Description
Is your feature request related to a problem?
Currently MemMachine only verifies the health of internal containerized services (PostgreSQL, Neo4j, API), but does not probe the connectivity or health of external upstream LLM/embedder providers such as OpenAI, Ollama, AWS Bedrock, etc. This leads to confusing startup failures when the configured endpoint is unreachable, misconfigured, or affected by host/bridge network issues—common on Docker for Linux where 'host.docker.internal' is not mapped by default. As a result, users may experience errors (e.g., 'Name or service not known', 'Connection error') that are difficult to diagnose, resulting in support burden and downtime.
Describe the solution you'd like
- Implement provider-aware health checks at startup:
- Probe the configured upstream embedder/provider endpoint (OpenAI, Ollama, Bedrock, etc.) with a lightweight API call (e.g., /v1/models, /api/tags, or similar endpoint for each provider).
- On failure, log clear, actionable error messages specifying the endpoint, error, and troubleshooting hints (e.g., network config, Docker caveats).
- Fail fast on unreachable endpoints and guide the user to correct misconfiguration before proceeding.
- Expose provider status via an API or CLI health check mechanism.
- Document Docker networking caveats for Linux in the README and startup logs:
- Explain the difference in Docker network behavior on Linux (lack of 'host.docker.internal' mapping by default).
- Provide the recommended way to enable host access on Linux (e.g., --add-host=host.docker.internal:host-gateway or explicit IP mapping).
- Link to Docker documentation and project-specific troubleshooting guides.
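As a concrete illustration of the Linux workaround above, a Compose file could map `host.docker.internal` via `extra_hosts` (the service name here is hypothetical):

```yaml
services:
  memmachine:
    # On Linux, host.docker.internal is not mapped by default;
    # "host-gateway" resolves it to the Docker bridge gateway IP.
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

This is the Compose equivalent of `docker run --add-host=host.docker.internal:host-gateway`.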
Describe alternatives you've considered
- Continue with current approach and rely on user troubleshooting after startup failures (confusing, high support load).
- Rely on users to manually check endpoint connectivity before starting MemMachine.
- Add this as a post-start diagnostic rather than failing startup (less user-friendly, may mask early errors).
Additional context
Reference: #918 and related networking issues with Ollama, OpenAI providers in container setups, especially on Linux. This enhancement will improve supportability and user experience by surfacing configuration/networking errors early and guiding users toward a fix.
Sample implementation logic (pseudocode):
```python
import logging

import requests

logger = logging.getLogger(__name__)


def probe_embedder(config: dict) -> bool:
    provider = config["provider"]
    endpoint = config["embedder_endpoint"].rstrip("/")
    try:
        if provider == "openai":
            resp = requests.get(
                f"{endpoint}/v1/models",
                headers={"Authorization": f"Bearer {config['api_key']}"},
                timeout=5,
            )
        elif provider == "ollama":
            resp = requests.get(f"{endpoint}/api/tags", timeout=5)
        elif provider == "bedrock":
            # Bedrock exposes no unauthenticated HTTP probe; a minimal SDK
            # call (e.g. boto3 bedrock.list_foundation_models()) could go here.
            return True
        else:
            logger.warning("Unknown provider '%s'; skipping probe.", provider)
            return True
        resp.raise_for_status()
        logger.info("%s endpoint %s is healthy.", provider, endpoint)
    except requests.RequestException as e:
        logger.error("Failed to connect to %s endpoint %s: %s", provider, endpoint, e)
        return False
    return True
```
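To fail fast as proposed above, the probe could be wired into startup roughly as follows; this is a minimal sketch, and the function name and probe registry are illustrative, not part of MemMachine:

```python
import logging
import sys
from typing import Callable

logger = logging.getLogger(__name__)


def check_providers_or_exit(probes: dict[str, Callable[[], bool]]) -> None:
    """Run each named provider probe; exit with an actionable message if any fails."""
    failed = [name for name, probe in probes.items() if not probe()]
    if failed:
        logger.error(
            "Unreachable providers: %s. Check endpoint URLs, API keys, and "
            "Docker networking (on Linux, host.docker.internal requires "
            "--add-host=host.docker.internal:host-gateway).",
            ", ".join(failed),
        )
        sys.exit(1)
```

Startup would call this with one probe per configured provider (e.g. `lambda: probe_embedder(cfg)`), so a misconfigured endpoint is reported once, clearly, before any service work begins.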