```shell
rag index ~/projects/myproject
rag ask "How does authentication work?"
```

Result:

```
Answer:
Authentication is handled by the `verify_token()` function, which validates
JWT tokens against the secret key. The function checks token expiration and
signature before allowing access to protected routes.

Sources:
1. src/auth.py (lines 42-58) (score: 0.847)
   "def verify_token(token: str) -> bool:\n    try:\n        payload = jwt.decode(...)"
2. src/middleware.py (lines 12-28) (score: 0.812)
   "class AuthMiddleware:\n    def __call__(self, request):\n        token = request.headers.get('Authorization')"
```
- 100% local: runs on your machine. Zero cloud.
- No API keys: Ollama runs locally.
- Stable IDs: vector_id == chunk_id. No mappings.
- Incremental: SHA256 change detection. Only changed files are re-indexed.
- Reproducible: a manifest validates the configuration automatically.
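The SHA256-based incremental detection can be sketched in a few lines of Python. This is an illustration, not the project's actual code; `sha256_of` and `changed_files` are hypothetical names:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file's bytes so content changes are caught even if mtime lies."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def changed_files(paths, known_hashes):
    """Yield only files whose SHA256 differs from the stored hash."""
    for path in paths:
        digest = sha256_of(path)
        if known_hashes.get(str(path)) != digest:
            yield path, digest
```

Unchanged files hash to the same digest and are skipped, so re-indexing cost scales with the size of the diff, not the repository.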
| Use Case | Why LocalAI Engine |
|---|---|
| Developers | Query codebases privately. Zero cloud exposure. |
| Researchers | Index sensitive papers. Private knowledge bases. |
| Privacy teams | Self-hosted RAG. No lock-in. No API costs. |
| Offline work | Full RAG without internet. |
| Feature | LocalAI Engine | Frameworks | Managed DBs |
|---|---|---|---|
| Privacy | 100% local | Your setup | Cloud |
| Cost | Free | Free | Subscription |
| Offline | ✓ | ✓ | ✗ |
| Setup | Simple | Moderate | Simple |
| Incremental | ✓ SHA256 | | |
| Stable IDs | ✓ Built-in | | |
Choose LocalAI Engine for: Privacy, zero costs, sensitive content, offline work.
Prerequisites:

```shell
brew install ollama
ollama serve
ollama pull nomic-embed-text
ollama pull llama3.2
```

Install:

```shell
git clone https://github.com/Mehdys/localai-engine.git
cd localai-engine
pip install -r requirements.txt
pip install -e .
```

Run:

```shell
rag index ~/projects/myproject
rag ask "How does authentication work?"
```

Done.
Files become searchable knowledge:

```
┌─────────────────────────────────────────────┐
│                 Your Files                  │
│           (Code, Docs, Text Files)          │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
              ┌─────────────────┐
              │     Scanner     │ ← Finds indexable files
              │  (File Finder)  │   Skips .git, node_modules, etc.
              └────────┬────────┘
                       │
                       ▼
              ┌─────────────────┐
              │    Extractor    │ ← Reads file content
              │ (Text/Code/PDF) │   Returns: List[Segment]
              └────────┬────────┘
                       │
                       ▼
              ┌─────────────────┐
              │     Chunker     │ ← Splits into smart chunks
              │  (Smart Split)  │   Returns: List[Chunk]
              └────────┬────────┘
                       │
                       ▼
              ┌─────────────────┐
              │   Embeddings    │ ← Converts to vectors
              │    (Ollama)     │   768-dim vectors
              └────────┬────────┘
                       │
          ┌────────────┴────────────┐
          ▼                         ▼
  ┌───────────────┐         ┌───────────────┐
  │   Database    │         │     FAISS     │
  │   (SQLite)    │         │   (Vectors)   │
  │               │         │               │
  │ • Metadata    │         │ • Fast Search │
  │ • File Info   │         │ • HNSW Index  │
  │ • Chunk Hash  │         │ • Stable IDs  │
  └───────────────┘         └───────────────┘
```
Indexing Pipeline
Files become searchable in five steps:
```
Files
  │ [1] Scan
  ▼
Scanner          Finds: .py, .md, .txt, .js
                 Skips: .git/, node_modules/
  │ [2] Extract
  ▼
Extractor        Reads content, detects structure
                 → List[Segment]
  │ [3] Chunk
  ▼
Chunker          Smart splitting, preserves boundaries
                 → List[Chunk]
  │ [4] Embed
  ▼
Ollama           text → 768-dim vector (nomic-embed-text)
  │ [5] Store
  ▼
SQLite (metadata, chunks)   +   FAISS (vectors, HNSW index)
```
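Step [3], fixed-size splitting with overlap, can be sketched like this. It is a minimal illustration using the default text settings (900 characters, 150 overlap); the project's actual chunker also preserves structural boundaries:

```python
def chunk_text(text: str, chunk_size: int = 900, overlap: int = 150):
    """Split text into overlapping character windows.

    Overlap keeps context that straddles a boundary retrievable
    from at least one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk then gets a SHA256 `chunk_hash` and a stable ID before embedding.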
Query Pipeline
Questions become answers in five steps:
```
Question
  │ [1] Embed
  ▼
Ollama           question → 768-dim vector
  │ [2] Search
  ▼
FAISS            Find top-5 similar
                 → [(chunk_id, score)]
  │ [3] Retrieve
  ▼
SQLite           Get text, path, lines
                 → RetrievedChunk[]
  │ [4] Build Context
  ▼
Prompt Builder   question + chunks
                 → formatted prompt
  │ [5] Generate
  ▼
Ollama LLM (llama3.2)
  │
  ▼
Answer + Sources (text + citations)
```
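Step [2] of the query pipeline can be sketched with a brute-force inner-product search standing in for FAISS (`top_k` is a hypothetical helper; the real HNSW index approximates the same ranking far faster):

```python
def top_k(query_vec, vectors, k=5):
    """Brute-force nearest-neighbour search by inner product.

    Stands in for the FAISS search step: returns (chunk_id, score)
    pairs, where the row index doubles as the stable chunk_id.
    """
    scores = [
        (i, sum(q * v for q, v in zip(query_vec, vec)))
        for i, vec in enumerate(vectors)
    ]
    scores.sort(key=lambda pair: -pair[1])
    return scores[:k]
```

The returned chunk_ids are then looked up directly in SQLite for text, path, and line numbers; no intermediate mapping is needed because vector_id == chunk_id.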
Database Schema
Data flows through a unified SQLite database:
```
documents
┌───────────────┐
│ id            │
│ path (unique) │
│ doc_type      │
│ size_bytes    │
└───────┬───────┘
        │ 1:N
        ▼
doc_versions
┌───────────────┐
│ id            │
│ document_id   │ ← tracks changes
│ sha256        │
│ mtime         │
└───────┬───────┘
        │ 1:N
        ▼
chunks
┌───────────────┐
│ id            │ ← (vector_id)
│ doc_version   │
│ chunk_hash    │
│ content       │
│ loc_json      │
└───────┬───────┘
        │ 1:1
        ▼
embeddings
┌───────────────┐
│ chunk_id      │
│ vector_id     │ ← (same as chunk_id)
│ model         │
│ dim (768)     │
└───────┬───────┘
        │
        ▼
FAISS Index (vector_id → 768-dim vector)
```

Key relationships:
- One document → many versions (file changes tracked)
- One version → many chunks (text split)
- One chunk → one embedding (1:1 mapping)
- vector_id == chunk_id (stable, no JSON mapping needed)
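The schema above can be sketched as SQLite DDL. Column types here are assumptions inferred from the diagram, not the project's exact definitions:

```python
import sqlite3

# Hypothetical DDL mirroring the schema diagram; types are assumed.
SCHEMA = """
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL,
    doc_type TEXT,
    size_bytes INTEGER
);
CREATE TABLE doc_versions (
    id INTEGER PRIMARY KEY,
    document_id INTEGER REFERENCES documents(id),
    sha256 TEXT NOT NULL,
    mtime REAL
);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,          -- doubles as the FAISS vector_id
    doc_version INTEGER REFERENCES doc_versions(id),
    chunk_hash TEXT NOT NULL,
    content TEXT NOT NULL,
    loc_json TEXT
);
CREATE TABLE embeddings (
    chunk_id INTEGER PRIMARY KEY REFERENCES chunks(id),
    vector_id INTEGER NOT NULL,      -- == chunk_id, so no mapping file
    model TEXT,
    dim INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Because `chunks.id` is reused as the FAISS vector_id, a search hit can be joined back to its source text and file location with a single lookup.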
Type System
Type transformations through the pipeline:
```
INDEXING FLOW
─────────────
Raw File
  │ Extract
  ▼
Segment
┌──────────────────────┐
│ text: str            │
│ loc: {line_start,    │
│       line_end}      │
└──────────┬───────────┘
           │ Chunk
           ▼
Chunk
┌──────────────────────┐
│ text: str            │
│ loc: {line_start,    │
│       line_end}      │
│ chunk_hash: SHA256   │
└──────────┬───────────┘
           │ Store
           ▼
Database + FAISS
(chunk_id = vector_id)

QUERY FLOW
──────────
User Query
  │ Embed
  ▼
FAISS Search
  │ Retrieve (chunk_id)
  ▼
RetrievedChunk
┌──────────────────────┐
│ chunk_id: int        │
│ text: str            │
│ path: str            │
│ loc: {line_start,    │
│       line_end}      │
│ score: float         │
│ display_score: 0-1   │
└──────────────────────┘
```

Type contracts:
- Extractors → List[Segment]
- Chunkers → List[Chunk]
- FAISS Search → List[RetrievedChunk]
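These contracts can be sketched as dataclasses. The shapes below mirror the diagrams; the project's actual class definitions may differ in field layout:

```python
from dataclasses import dataclass

# Illustrative pipeline types; field layout is assumed from the diagrams.

@dataclass
class Segment:
    text: str
    line_start: int
    line_end: int

@dataclass
class Chunk:
    text: str
    line_start: int
    line_end: int
    chunk_hash: str  # SHA256 of the chunk content

@dataclass
class RetrievedChunk:
    chunk_id: int
    text: str
    path: str
    line_start: int
    line_end: int
    score: float          # raw similarity score
    display_score: float  # normalized to 0-1 for display
```

Keeping one explicit type per stage makes each boundary testable: an extractor that returns anything but `List[Segment]` fails fast instead of corrupting the index.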
Index files: extract, chunk, embed, and store.

```shell
# Index a directory
rag index ~/projects/myproject

# Index multiple directories
rag index ~/code ~/docs

# Use a custom config
rag index ~/projects/myproject --config config.yaml
```

Query your indexed content.

```shell
# Ask a question
rag ask "How does authentication work?"

# Retrieve more chunks
rag ask "What are the main classes?" --top-k 10
```

Start an interactive chat session to query your codebase continuously.
```shell
rag chat
```

Commands inside chat:
- Type your question and press Enter
- Type `exit`, `quit`, or `:q` to leave
Check system health and integrity.
```shell
rag validate
```

Output:

```
✓ Ollama reachable at http://localhost:11434
✓ Embedding model 'nomic-embed-text' found (768 dim)
✓ LLM model 'llama3.2' found
✓ Documents index: HNSW with 1,234 vectors
✓ Database integrity: OK
  - No orphan chunks
  - Total files: 42
  - Total chunks: 1,234
```
Debug the retrieval process.

```shell
rag explain "What does the main function do?"
```

Shows:
- Retrieved chunks with scores
- Citation information
- Prompt length
- Which chunks were selected

View index statistics.

```shell
rag stats
```

Create a config.yaml file to customize behavior:
```yaml
ollama:
  base_url: "http://localhost:11434"
  embedding_model: "nomic-embed-text"
  llm_model: "llama3.2"
  timeout: 300

chunking:
  text_chunk_size: 900   # Characters per text chunk
  text_overlap: 150      # Overlap between text chunks
  code_chunk_size: 800   # Characters per code chunk
  code_overlap: 100      # Overlap between code chunks

indexing:
  top_k: 5               # Default number of chunks to retrieve
  batch_size: 32         # Embedding batch size
  use_hnsw: true         # Use HNSW index (faster, approximate)
  hnsw_m: 32             # HNSW graph connectivity parameter
  hnsw_ef_construction: 200  # HNSW construction-time accuracy parameter

text_extensions:
  - ".txt"
  - ".md"

code_extensions:
  - ".py"
  - ".js"
  - ".ts"
  - ".json"
  - ".yaml"
  - ".yml"

ignore_patterns:
  - ".git/"
  - "node_modules/"
  - "dist/"
  - "build/"
  - ".venv/"
  - "venv/"
  - "__pycache__/"
  - "*.pyc"
  - "*.lock"
```

Use it with the --config flag:

```shell
rag index /path/to/code --config config.yaml
```

All data is stored in ~/.rag_data/:
```
~/.rag_data/
├── rag.db           # Unified SQLite database
│   ├── documents    # File metadata
│   ├── doc_versions # File versioning
│   ├── chunks       # Text chunks
│   ├── embeddings   # Embedding metadata
│   └── manifests    # Index configuration
└── faiss.index      # FAISS vector index
```

No JSON mapping files: everything lives in the database with stable IDs.
✅ Core Features
- Local-first RAG pipeline
- Unified SQLite database
- FAISS vector store with stable IDs
- Incremental indexing
- CLI interface with validation and debugging
✅ Architecture
- Type-safe pipeline contracts (Segment → Chunk → RetrievedChunk)
- File versioning system
- Manifest-based configuration validation
- Modular extractor/chunker architecture
🔲 Enhanced Features
- Session-based memory for conversational queries
- Multi-index support (separate indexes for different document types)
- Advanced chunking strategies (semantic chunking, hierarchical)
- Batch query processing
- Export/import functionality
🔲 Developer Experience
- Python SDK/API (beyond CLI)
- Web UI for querying and visualization
- Performance profiling and optimization tools
- Extended test coverage
🔲 Advanced Capabilities
- Multi-modal support (images, audio transcription)
- Hybrid search (vector + keyword)
- Fine-tuned embedding models
- Distributed indexing for large-scale deployments
- Plugin system for custom extractors/chunkers
🔲 Enterprise Features
- Multi-user support with access control
- Audit logging
- Backup and restore utilities
- Monitoring and observability dashboards
```shell
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

# List available models
ollama list

# Pull required models
ollama pull nomic-embed-text
ollama pull llama3.2
```

On macOS:

```shell
brew install cmake
pip install faiss-cpu
```

```shell
# Run validation
rag validate

# If issues are found, you may need to re-index
rag index <paths>
```

The system streams files and batches embeddings to handle large codebases efficiently. If you encounter memory issues:
- Reduce batch_size in the config
- Use the HNSW index (the default) for better memory efficiency
- Process directories separately
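The batching of embedding requests can be sketched as follows. `batched` is a hypothetical helper; the default of 32 mirrors the batch_size config key, and lowering it trades throughput for peak memory:

```python
def batched(items, batch_size=32):
    """Yield successive batches of at most batch_size items.

    Streaming batches keeps only one batch of texts (and vectors)
    in memory at a time, regardless of corpus size.
    """
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```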