
LocalAI Engine

RAG that never leaves your machine.

Private. Fast. Cited.

Python 3.10+ • License: MIT • Local-First

Try it • How it works • Roadmap


Try it

rag index ~/projects/myproject
rag ask "How does authentication work?"

Result:

Answer:
Authentication is handled by the `verify_token()` function which validates 
JWT tokens against the secret key. The function checks token expiration and 
signature before allowing access to protected routes.

Sources:
  1. src/auth.py (lines 42-58) (score: 0.847)
     "def verify_token(token: str) -> bool:\n    try:\n        payload = jwt.decode(...)"
  
  2. src/middleware.py (lines 12-28) (score: 0.812)
     "class AuthMiddleware:\n    def __call__(self, request):\n        token = request.headers.get('Authorization')"

Core Guarantees

  • 100% local β€” Your machine. Zero cloud.
  • No API keys β€” Ollama runs locally.
  • Stable IDs β€” vector_id == chunk_id. No mappings.
  • Incremental β€” SHA256 detection. Only changed files.
  • Reproducible β€” Manifest validates config automatically.

Who is this for?

| Use Case      | Why LocalAI Engine                               |
|---------------|--------------------------------------------------|
| Developers    | Query codebases privately. Zero cloud exposure.  |
| Researchers   | Index sensitive papers. Private knowledge bases. |
| Privacy teams | Self-hosted RAG. No lock-in. No API costs.       |
| Offline work  | Full RAG without internet.                       |

How It Compares

| Feature     | LocalAI Engine | Frameworks | Managed DBs  |
|-------------|----------------|------------|--------------|
| Privacy     | 100% local     | Your setup | Cloud        |
| Cost        | Free           | Free       | Subscription |
| Offline     | ✅             | ✅         | ❌           |
| Setup       | Simple         | Moderate   | Simple       |
| Incremental | ✅ SHA256      | ⚠️ Varies  | ⚠️ Varies    |
| Stable IDs  | ✅ Built-in    | ⚠️ Varies  | ⚠️ Varies    |

Choose LocalAI Engine for privacy, zero running costs, sensitive content, and offline work.


Quick Start

Prerequisites:

brew install ollama
ollama serve
ollama pull nomic-embed-text
ollama pull llama3.2

Install:

git clone https://github.com/Mehdys/localai-engine.git
cd localai-engine
pip install -r requirements.txt
pip install -e .

Run:

rag index ~/projects/myproject
rag ask "How does authentication work?"

Done.


Architecture

Files become searchable knowledge:

┌──────────────────────────────────────────────────────────────────────┐
│                         📁 Your Files                                │
│                    (Code, Docs, Text Files)                          │
└──────────────────────────────┬───────────────────────────────────────┘
                               │
                               ▼
                     ┌─────────────────┐
                     │   🔍 Scanner    │  ← Finds indexable files
                     │  (File Finder)  │     Skips .git, node_modules, etc.
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │  📄 Extractor   │  ← Reads file content
                     │ (Text/Code/PDF) │     Returns: List[Segment]
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │  ✂️  Chunker    │  ← Splits into smart chunks
                     │  (Smart Split)  │     Returns: List[Chunk]
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │  🧠 Embeddings  │  ← Converts to vectors
                     │    (Ollama)     │     768-dim vectors
                     └────────┬────────┘
                              │
                 ┌────────────┴────────────┐
                 ▼                         ▼
         ┌───────────────┐         ┌───────────────┐
         │  💾 Database  │         │  🔒 FAISS     │
         │   (SQLite)    │         │   (Vectors)   │
         │               │         │               │
         │ • Metadata    │         │ • Fast Search │
         │ • File Info   │         │ • HNSW Index  │
         │ • Chunk Hash  │         │ • Stable IDs  │
         └───────────────┘         └───────────────┘

Indexing Pipeline

Files become searchable in five steps:

📂 Files
    │
    │ [1] Scan
    ▼
🔍 Scanner
    Finds: .py, .md, .txt, .js
    Skips: .git/, node_modules/
    │
    │ [2] Extract
    ▼
📄 Extractor
    Reads content
    Detects structure
    → List[Segment]
    │
    │ [3] Chunk
    ▼
✂️  Chunker
    Smart splitting
    Preserves boundaries
    → List[Chunk]
    │
    │ [4] Embed
    ▼
🧠 Ollama
    text → 768-dim vector
    (nomic-embed-text)
    │
    │ [5] Store
    ▼
    ┌──────────────┐    ┌──────────────┐
    │ 💾 SQLite    │    │ 🔒 FAISS     │
    │ Metadata     │    │ Vectors      │
    │ Chunks       │    │ HNSW Index   │
    └──────────────┘    └──────────────┘
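Step [3], fixed-size chunking with overlap, can be sketched as a sliding window. This is a simplified stand-in for the real chunker, which also preserves structural boundaries:

```python
def chunk_text(text: str, size: int = 900, overlap: int = 150) -> list[str]:
    """Slide a window of `size` characters, advancing size - overlap each step,
    so consecutive chunks share `overlap` characters of context."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults above, a 2,000-character file yields three chunks, each starting 750 characters after the previous one.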

Query Pipeline

Questions become answers in five steps:

❓ Question
    │
    │ [1] Embed
    ▼
🧠 Ollama
    question → 768-dim vector
    │
    │ [2] Search
    ▼
🔒 FAISS
    Find top-5 similar
    → [(chunk_id, score)]
    │
    │ [3] Retrieve
    ▼
💾 SQLite
    Get text, path, lines
    → RetrievedChunk[]
    │
    │ [4] Build Context
    ▼
📝 Prompt Builder
    question + chunks
    → formatted prompt
    │
    │ [5] Generate
    ▼
🤖 Ollama LLM
    (llama3.2)
    │
    ▼
✅ Answer + Sources
    text + citations
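Step [2] returns top-k (chunk_id, score) pairs. A dependency-free linear scan over cosine similarity illustrates that contract (FAISS produces the same shape of result via an HNSW graph rather than brute force):

```python
import math

def top_k(query: list[float],
          index: dict[int, list[float]],
          k: int = 5) -> list[tuple[int, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query vector."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    scored = [(cid, cos(query, vec)) for cid, vec in index.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```

The chunk_ids returned here are exactly what step [3] feeds into SQLite to recover text, path, and line numbers.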

Database Schema

Data flows through a unified SQLite database:

                    📄 documents
                    ┌──────────────┐
                    │ id           │
                    │ path (unique)│
                    │ doc_type     │
                    │ size_bytes   │
                    └──────┬───────┘
                           │ 1:N
                           │
                    📝 doc_versions
                    ┌──────────────┐
                    │ id           │
                    │ document_id  │──┐
                    │ sha256       │  │ tracks changes
                    │ mtime        │  │
                    └──────┬───────┘  │
                           │ 1:N      │
                           │          │
                    ✂️  chunks        │
                    ┌─────────────┐   │
                    │ id          │◄──┘ (vector_id)
                    │ doc_version │
                    │ chunk_hash  │
                    │ content     │
                    │ loc_json    │
                    └──────┬──────┘
                           │ 1:1
                           │
                    🧠 embeddings
                    ┌──────────────┐
                    │ chunk_id     │───┐
                    │ vector_id    │◄──┘ (same as chunk_id)
                    │ model        │
                    │ dim (768)    │
                    └──────────────┘
                           │
                           ▼
                    🔒 FAISS Index
                    (vector_id → 768-dim vector)

Key relationships:

  • One document → Many versions (file changes tracked)
  • One version → Many chunks (text split)
  • One chunk → One embedding (1:1 mapping)
  • vector_id == chunk_id (stable, no JSON mapping needed)

Type System

Type transformations through the pipeline:

┌─────────────────────────────────────────────────────────────┐
│                    INDEXING FLOW                            │
└─────────────────────────────────────────────────────────────┘

📄 Raw File
    │
    │ Extract
    ▼
📦 Segment
    ┌─────────────────────┐
    │ text: str           │
    │ loc: {line_start,   │
    │       line_end}     │
    └──────────┬──────────┘
               │
               │ Chunk
               ▼
✂️  Chunk
    ┌─────────────────────┐
    │ text: str           │
    │ loc: {line_start,   │
    │       line_end}     │
    │ chunk_hash: SHA256  │
    └──────────┬──────────┘
               │
               │ Store
               ▼
💾 Database + 🔒 FAISS
    (chunk_id = vector_id)


┌─────────────────────────────────────────────────────────────┐
│                    QUERY FLOW                               │
└─────────────────────────────────────────────────────────────┘

❓ User Query
    │
    │ Embed
    ▼
🔒 FAISS Search
    │
    │ Retrieve (chunk_id)
    ▼
🎯 RetrievedChunk
    ┌─────────────────────┐
    │ chunk_id: int       │
    │ text: str           │
    │ path: str           │
    │ loc: {line_start,   │
    │       line_end}     │
    │ score: float        │
    │ display_score: 0-1  │
    └─────────────────────┘

Type contracts:

  • Extractors → List[Segment]
  • Chunkers → List[Chunk]
  • FAISS Search → List[RetrievedChunk]

🔧 CLI Reference

rag index <paths...>

Index files: extract, chunk, embed, and store.

# Index a directory
rag index ~/projects/myproject

# Index multiple directories
rag index ~/code ~/docs

# Use custom config
rag index ~/projects/myproject --config config.yaml

rag ask "<question>"

Query your indexed content.

# Ask a question
rag ask "How does authentication work?"

# Retrieve more chunks
rag ask "What are the main classes?" --top-k 10

rag chat

Start an interactive chat session to query your codebase continuously.

rag chat

Commands inside chat:

  • Type your question and press Enter
  • Type exit, quit, or :q to leave

rag validate

Check system health and integrity.

rag validate

Output:

✓ Ollama reachable at http://localhost:11434
✓ Embedding model 'nomic-embed-text' found (768 dim)
✓ LLM model 'llama3.2' found
✓ Documents index: HNSW with 1,234 vectors
✓ Database integrity: OK
  - No orphan chunks
  - Total files: 42
  - Total chunks: 1,234
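The "No orphan chunks" line boils down to an anti-join: chunks whose id has no matching embeddings row. A sketch of the kind of query such a check could run (illustrative, not the tool's actual implementation):

```python
import sqlite3

def orphan_chunk_count(conn: sqlite3.Connection) -> int:
    """Count chunks that have no embedding row; anything > 0 means the
    SQLite metadata and the FAISS index have drifted apart."""
    return conn.execute(
        "SELECT COUNT(*) FROM chunks c "
        "LEFT JOIN embeddings e ON e.chunk_id = c.id "
        "WHERE e.chunk_id IS NULL"
    ).fetchone()[0]
```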

rag explain "<question>"

Debug retrieval process.

rag explain "What does the main function do?"

Shows:

  • Retrieved chunks with scores
  • Citation information
  • Prompt length
  • Which chunks were selected

rag stats

View index statistics.

rag stats

βš™οΈ Configuration

Create a config.yaml file to customize behavior:

ollama:
  base_url: "http://localhost:11434"
  embedding_model: "nomic-embed-text"
  llm_model: "llama3.2"
  timeout: 300

chunking:
  text_chunk_size: 900       # Characters per text chunk
  text_overlap: 150          # Overlap between chunks
  code_chunk_size: 800       # Characters per code chunk
  code_overlap: 100          # Overlap between code chunks

indexing:
  top_k: 5                   # Default number of chunks to retrieve
  batch_size: 32             # Embedding batch size
  use_hnsw: true             # Use HNSW index (faster, approximate)
  hnsw_m: 32                 # HNSW graph connectivity
  hnsw_ef_construction: 200  # HNSW build-time search depth

text_extensions:
  - ".txt"
  - ".md"

code_extensions:
  - ".py"
  - ".js"
  - ".ts"
  - ".json"
  - ".yaml"
  - ".yml"

ignore_patterns:
  - ".git/"
  - "node_modules/"
  - "dist/"
  - "build/"
  - ".venv/"
  - "venv/"
  - "__pycache__/"
  - "*.pyc"
  - "*.lock"

Use with --config flag:

rag index /path/to/code --config config.yaml
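A config file only needs to override the keys you change; everything else falls back to built-in defaults. A sketch of such a recursive merge (how the actual loader behaves is an assumption here):

```python
def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay user overrides on top of built-in defaults,
    so a partial config.yaml leaves unspecified keys untouched."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For example, a config.yaml containing only `ollama: {timeout: 60}` would shorten the timeout while keeping the default models and indexing settings.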

🗄️ Data Storage

All data is stored in ~/.rag_data/:

~/.rag_data/
├── rag.db              # Unified SQLite database
│   ├── documents       # File metadata
│   ├── doc_versions    # File versioning
│   ├── chunks          # Text chunks
│   ├── embeddings      # Embedding metadata
│   └── manifests       # Index configuration
└── faiss.index         # FAISS vector index

No JSON mapping files - everything is in the database with stable IDs.


🗺️ Roadmap

v0 (Current)

✅ Core Features

  • Local-first RAG pipeline
  • Unified SQLite database
  • FAISS vector store with stable IDs
  • Incremental indexing
  • CLI interface with validation and debugging

✅ Architecture

  • Type-safe pipeline contracts (Segment → Chunk → RetrievedChunk)
  • File versioning system
  • Manifest-based configuration validation
  • Modular extractor/chunker architecture

v1 (Short-term)

🔲 Enhanced Features

  • Session-based memory for conversational queries
  • Multi-index support (separate indexes for different document types)
  • Advanced chunking strategies (semantic chunking, hierarchical)
  • Batch query processing
  • Export/import functionality

🔲 Developer Experience

  • Python SDK/API (beyond CLI)
  • Web UI for querying and visualization
  • Performance profiling and optimization tools
  • Extended test coverage

v2 (Future)

🔲 Advanced Capabilities

  • Multi-modal support (images, audio transcription)
  • Hybrid search (vector + keyword)
  • Fine-tuned embedding models
  • Distributed indexing for large-scale deployments
  • Plugin system for custom extractors/chunkers

🔲 Enterprise Features

  • Multi-user support with access control
  • Audit logging
  • Backup and restore utilities
  • Monitoring and observability dashboards

πŸ› Troubleshooting

Ollama Not Running

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

Models Not Found

# List available models
ollama list

# Pull required models
ollama pull nomic-embed-text
ollama pull llama3.2

FAISS Installation Issues

On macOS:

brew install cmake
pip install faiss-cpu

Database Integrity Issues

# Run validation
rag validate

# If issues found, you may need to re-index
rag index <paths>

Large File Handling

The system streams files and batches embeddings to handle large codebases efficiently. If you encounter memory issues:

  • Reduce batch_size in config
  • Use HNSW index (default) for better memory efficiency
  • Process directories separately

About

🔒 Your data never leaves your machine. A local RAG system that lets you query your own docs with Ollama + FAISS.
