[Feat]: Upstream Provider Endpoint Health Checks and Docker Networking Documentation #920

@sscargal

Description

Is your feature request related to a problem?

Currently, MemMachine only verifies the health of its internal containerized services (PostgreSQL, Neo4j, API); it does not probe the connectivity or health of external upstream LLM/embedder providers such as OpenAI, Ollama, or AWS Bedrock. This leads to confusing startup failures when the configured endpoint is unreachable, misconfigured, or affected by host/bridge network issues, which is common on Docker for Linux, where 'host.docker.internal' is not mapped by default. As a result, users hit errors such as 'Name or service not known' or 'Connection error' that are difficult to diagnose, increasing support burden and downtime.

Describe the solution you'd like

  1. Implement provider-aware health checks at startup:
    • Probe the configured upstream embedder/LLM provider endpoint (OpenAI, Ollama, Bedrock, etc.) with a lightweight API call (e.g., /v1/models, /api/tags, or the equivalent for each provider).
    • On failure, log clear, actionable error messages that name the endpoint, the error, and troubleshooting hints (e.g., network configuration, Docker caveats).
    • Fail fast on unreachable endpoints and guide the user to correct the misconfiguration before proceeding.
  2. Expose provider status via an API or CLI health check mechanism (a sketch of such an endpoint follows the pseudocode below).
  3. Document Docker networking caveats for Linux in the README and startup logs (a sketch of such a startup hint follows this list):
    • Explain the difference in Docker network behavior on Linux (no 'host.docker.internal' mapping by default).
    • Provide the recommended way to enable host access on Linux (e.g., --add-host=host.docker.internal:host-gateway or an explicit IP mapping).
    • Link to Docker documentation and project-specific troubleshooting guides.
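
For the startup-log hint in item 3, here is a minimal sketch of what the Linux-specific guidance could look like; the function name and message wording are assumptions for illustration, not existing MemMachine code:

import logging
import platform
from urllib.parse import urlparse

logger = logging.getLogger(__name__)


def log_docker_networking_hint(endpoint: str) -> None:
    """Emit a Linux-specific hint when a 'host.docker.internal' endpoint is unreachable."""
    host = urlparse(endpoint).hostname
    if host == "host.docker.internal" and platform.system() == "Linux":
        logger.error(
            "On Docker for Linux, 'host.docker.internal' is not mapped by default. "
            "Run the container with --add-host=host.docker.internal:host-gateway, "
            "add extra_hosts: [\"host.docker.internal:host-gateway\"] to the compose "
            "service, or point the endpoint at the host's bridge IP instead."
        )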

Describe alternatives you've considered

  • Continue with the current approach and rely on users to troubleshoot after startup failures (confusing, high support load).
  • Rely on users to manually check endpoint connectivity before starting MemMachine.
  • Add this as a post-start diagnostic rather than failing at startup (less user-friendly; may mask early errors).

Additional context

Reference: #918 and related networking issues with Ollama and OpenAI providers in container setups, especially on Linux. This enhancement will improve supportability and user experience by surfacing configuration and networking errors early and guiding users toward a fix.

Sample implementation logic (pseudocode):

import logging

import requests

logger = logging.getLogger(__name__)


def probe_embedder(config):
    """Probe the configured upstream provider endpoint with a lightweight API call."""
    provider = config["provider"]
    endpoint = config["embedder_endpoint"]
    try:
        if provider == "openai":
            # OpenAI-compatible servers expose a model listing endpoint.
            resp = requests.get(
                f"{endpoint}/v1/models",
                headers={"Authorization": f"Bearer {config['api_key']}"},
                timeout=5,
            )
        elif provider == "ollama":
            # Ollama lists locally available models at /api/tags.
            resp = requests.get(f"{endpoint}/api/tags", timeout=5)
        elif provider == "bedrock":
            # Bedrock is authenticated via the AWS SDK; a minimal boto3 call
            # (e.g., list_foundation_models) would go here instead of raw HTTP.
            return True
        else:
            logger.warning(f"No health probe defined for provider '{provider}'.")
            return True
        resp.raise_for_status()
        logger.info(f"{provider} endpoint {endpoint} is healthy.")
    except Exception as e:
        logger.error(f"Failed to connect to {provider} endpoint: {endpoint} ({e})")
        return False
    return True
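
For item 2 above, a minimal sketch of how the probe results could be surfaced over HTTP, assuming a FastAPI-style API; the route path, app object, and PROVIDER_CONFIGS name are illustrative assumptions rather than existing MemMachine code:

from fastapi import FastAPI

app = FastAPI()
# Assumed to be loaded from the embedder/LLM sections of the MemMachine config.
PROVIDER_CONFIGS: list[dict] = []


@app.get("/health/providers")
def provider_health():
    """Report reachability of each configured upstream provider endpoint."""
    return {
        cfg["provider"]: {
            "endpoint": cfg["embedder_endpoint"],
            "healthy": probe_embedder(cfg),
        }
        for cfg in PROVIDER_CONFIGS
    }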
