Contributor

@sscargal sscargal commented Jan 7, 2026

Purpose of the change

This is Part 2 of n of my performance investigation into MemMachine.

To help users tune the Postgres and Neo4j databases to handle more load when using our Docker Compose environment, this PR:

  • Allows users to configure the Postgres pool_size and max_overflow property values to improve performance
  • Allows users to configure the Neo4j range_index_creation_threshold and vector_index_creation_threshold property values to improve performance

Description

The out-of-the-box settings for Postgres and Neo4j are not well suited to MemMachine, especially when we enable multiple FastAPI workers to handle more inbound traffic (see #903). The Postgres connection pool defaults to 5 connections, which is not enough once the number of workers and inbound connections increases.

Similarly, in Neo4j, indexes are not created until 10,000 nodes have been ingested. Until then, Neo4j has no index to consult and performs a full scan for each new insertion, which causes significant ingestion delays and O(n) lookup times.

Note: This PR does not change the database defaults. It only exposes the necessary configuration options so users can tune them themselves. We will write some documentation/guidance for this. For example, on an 8 vCPU/16 GiB system, increasing the Postgres pool_size to 10 and max_overflow to 5 helps with up to 100 users. Similarly, reducing the Neo4j index creation thresholds from 10,000 to 10, or even 1, significantly improves ingestion and lookup speeds.
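
To give a rough sense of why the threshold matters, here is a back-of-the-envelope sketch. The cost model is an assumption made for illustration and is not taken from MemMachine or Neo4j: every insertion performs one lookup, an unindexed lookup touches every node stored so far, and an indexed lookup costs about log2(n) comparisons. The function name `lookup_cost` is hypothetical.

```python
import math


def lookup_cost(total_nodes: int, index_threshold: int) -> int:
    """Approximate comparisons spent on lookups while inserting `total_nodes` nodes.

    Assumed model (illustrative only): before the index exists, each lookup
    scans all nodes stored so far; once `index_threshold` nodes are stored,
    each lookup costs roughly log2(n) comparisons.
    """
    cost = 0
    for stored in range(total_nodes):
        if stored < index_threshold:
            cost += stored  # full scan over everything stored so far
        else:
            cost += max(1, math.ceil(math.log2(stored + 1)))  # indexed lookup
    return cost


if __name__ == "__main__":
    for threshold in (10_000, 10, 1):
        cost = lookup_cost(10_000, threshold)
        print(f"threshold={threshold:>6}: ~{cost:,} comparisons")
```

Under this simplified model, ingesting 10,000 nodes with the threshold at 10,000 costs 49,995,000 comparisons (the sum 0 + 1 + ... + 9,999), while a threshold of 10 keeps the total under 200,000. Real numbers depend on the query shapes and on Neo4j's planner, so treat this only as an intuition for the trend.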


PostgreSQL Connection Pool Properties

pool_size sets the fixed number of persistent database connections maintained in the pool for reuse, minimizing the expensive overhead of creating new connections for each FastAPI request.

max_overflow defines the maximum number of temporary extra connections allowed beyond pool_size during traffic spikes, enabling the pool to expand dynamically without rejecting requests.
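
The interaction between the two settings can be illustrated with a toy model. This is a sketch of the accounting only, not SQLAlchemy's implementation: the class `ToyPool` is hypothetical, it holds no real connections, and it raises immediately when exhausted, whereas SQLAlchemy's QueuePool first waits for up to pool_timeout. The defaults of 5 and 10 mirror SQLAlchemy's documented QueuePool defaults.

```python
from dataclasses import dataclass


@dataclass
class ToyPool:
    """Toy accounting model of pool_size / max_overflow (hypothetical, no real I/O)."""

    pool_size: int = 5      # persistent connections kept open for reuse
    max_overflow: int = 10  # extra temporary connections allowed under load
    checked_out: int = 0    # connections currently in use

    @property
    def capacity(self) -> int:
        # Hard upper bound on simultaneous connections from this pool.
        return self.pool_size + self.max_overflow

    def checkout(self) -> str:
        if self.checked_out >= self.capacity:
            # A real pool would block for pool_timeout before failing.
            raise TimeoutError("pool exhausted")
        self.checked_out += 1
        return "persistent" if self.checked_out <= self.pool_size else "overflow"

    def checkin(self) -> None:
        if self.checked_out == 0:
            raise RuntimeError("nothing to check in")
        self.checked_out -= 1


if __name__ == "__main__":
    pool = ToyPool(pool_size=2, max_overflow=1)
    print([pool.checkout() for _ in range(pool.capacity)])
```

With pool_size=2 and max_overflow=1, the first two checkouts are served from the persistent set, the third is an overflow connection, and a fourth concurrent checkout fails until something is checked back in.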

Why Configure for Load Testing

During load tests, high concurrency from multiple FastAPI worker processes can exhaust the pool. Requests then stall while waiting for a free connection (SQLAlchemy's QueuePool waits up to pool_timeout, 30 seconds by default, before raising an error), which reduces parallelism.

Adjustable pool_size and max_overflow ensure workers acquire connections promptly under load, improving throughput and preventing bottlenecks without over-provisioning persistent resources.

This setup scales efficiently for production ASGI apps like FastAPI, balancing performance with resource limits as observed in SQLAlchemy QueuePool behavior.
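
When choosing values, it may help to check the worst-case connection count against the Postgres server limit. The sketch below assumes each FastAPI worker process creates its own engine and therefore its own pool, which is the usual situation with multi-process ASGI servers but has not been verified against MemMachine here. The function names and the worker counts are hypothetical; 100 and 3 are the stock PostgreSQL defaults for max_connections and superuser_reserved_connections.

```python
def peak_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case connections opened if every worker fills its pool and overflow."""
    return workers * (pool_size + max_overflow)


def fits_server_limit(
    workers: int,
    pool_size: int,
    max_overflow: int,
    max_connections: int = 100,  # PostgreSQL default
    reserved: int = 3,           # superuser_reserved_connections default
) -> bool:
    """True if the worst case stays within the slots available to normal clients."""
    return peak_connections(workers, pool_size, max_overflow) <= max_connections - reserved


if __name__ == "__main__":
    # pool_size=10 and max_overflow=5 are the example values from the description;
    # the worker counts are made up for illustration.
    for workers in (4, 8):
        total = peak_connections(workers, pool_size=10, max_overflow=5)
        ok = fits_server_limit(workers, pool_size=10, max_overflow=5)
        print(f"{workers} workers -> up to {total} connections, fits default limit: {ok}")
```

With these assumptions, 4 workers can open up to 60 connections, which fits a default Postgres server, while 8 workers could reach 120 and would need either smaller pools or a higher max_connections. Other clients of the same server also consume slots, so the real headroom is smaller.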


Neo4j Index Threshold Properties

range_index_creation_threshold sets the number of stored nodes at which a range index is created (described above as 10,000 by default). Range indexes speed up equality and range lookups on scalar properties such as numeric or temporal values.

vector_index_creation_threshold sets the corresponding number of stored nodes at which a vector index is created, enabling efficient similarity searches over embeddings.

Why Configure for Load Testing

Under high-concurrency load tests with FastAPI workers issuing Cypher queries, a high threshold delays index creation and forces full scans in the meantime, stalling requests and limiting query parallelism.

Lowering these thresholds ensures indexes are created early for common access patterns, reducing query latency and allowing workers to retrieve results faster without manual intervention.

This configuration boosts throughput in production Neo4j deployments by balancing automatic optimization with resource constraints, similar to connection pool tuning.


Fixes/Closes

Fixes #906

Type of change

  • Performance (improves performance without changing functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Performance Test Script (please provide)
  • Manual verification (list step-by-step instructions)

Test Results: [Attach logs, screenshots, or relevant output]

See #837 and #906 for details.

Checklist

  • I have signed the commit(s) within this pull request
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • Confirmed all checks passed
  • Contributor has signed the commit(s)
  • Reviewed the code
  • Run, Tested, and Verified the change(s) work as expected

Screenshots/Gifs

As above

Further comments

As above

Contributor

Copilot AI left a comment

Pull request overview

This PR exposes configuration options for PostgreSQL connection pooling and Neo4j index creation thresholds to improve performance under load. The changes allow users to tune database behavior without modifying defaults.

Key Changes:

  • Added pool_size and max_overflow configuration fields for PostgreSQL connection pooling
  • Added range_index_creation_threshold and vector_index_creation_threshold configuration fields for Neo4j indexing
  • Updated sample configuration files to demonstrate the new optional settings

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/memmachine/common/configuration/database_conf.py | Adds optional configuration fields for PostgreSQL pool settings and Neo4j index thresholds |
| src/memmachine/common/resource_manager/database_manager.py | Conditionally applies the new configuration values when creating database engines and graph stores |
| sample_configs/episodic_memory_config.gpu.sample | Demonstrates the new configuration options with example values |
| sample_configs/episodic_memory_config.cpu.sample | Demonstrates the new configuration options with example values |
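
The "optional field, conditionally applied" pattern summarized above can be sketched as follows. This is not the code from this PR: the class `PostgresPoolConf` and the helper `engine_kwargs` are hypothetical names, and a plain dataclass stands in for whatever configuration model the project actually uses. Only the field names pool_size and max_overflow come from the PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PostgresPoolConf:
    """Hypothetical stand-in for the PR's optional pool settings."""

    pool_size: Optional[int] = None     # None means "keep the library default"
    max_overflow: Optional[int] = None  # None means "keep the library default"


def engine_kwargs(conf: PostgresPoolConf) -> dict[str, Any]:
    """Collect only the settings the user actually provided."""
    kwargs: dict[str, Any] = {}
    # Compare against None rather than truthiness so an explicit 0 is honored.
    if conf.pool_size is not None:
        kwargs["pool_size"] = conf.pool_size
    if conf.max_overflow is not None:
        kwargs["max_overflow"] = conf.max_overflow
    return kwargs


if __name__ == "__main__":
    print(engine_kwargs(PostgresPoolConf()))
    print(engine_kwargs(PostgresPoolConf(pool_size=10, max_overflow=5)))
```

A dictionary built this way could be passed as keyword arguments to SQLAlchemy's create_engine, which accepts pool_size and max_overflow. Leaving a field unset then keeps the existing default behavior, which matches the PR's stated goal of not changing defaults.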

@sscargal sscargal force-pushed the performance/db-config-tuning branch 3 times, most recently from 56a28cd to 8add149 Compare January 7, 2026 20:46
…erty values to improve performance

Allows users to configure the Neo4j range_index_creation_threshold and vector_index_creation_threshold property values to improve performance

Signed-off-by: Steve Scargall <37674041+sscargal@users.noreply.github.com>
@sscargal sscargal force-pushed the performance/db-config-tuning branch from 8add149 to cda6b58 Compare January 7, 2026 20:48

Development

Successfully merging this pull request may close these issues.

[Bug]: Neo4j performance warnings: Cartesian product in Cypher queries during load testing