This is a comprehensive organizational knowledge base that centralizes all internal company knowledge, including process documentation, best practices, project archives, and employee expertise. The system provides advanced search capabilities using both traditional keyword search and modern semantic search powered by LLM embeddings.
- Keyword Search: Traditional text matching with fuzzy search
- Semantic Search: AI-powered understanding using sentence embeddings
- Hybrid Search: Combines both approaches for optimal results
- Real-time search suggestions and auto-complete
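The README does not specify how keyword and semantic results are combined. One common approach for this kind of hybrid search is reciprocal rank fusion (RRF); the sketch below is illustrative only, not the service's actual fusion logic.

```python
# Sketch of hybrid search via reciprocal rank fusion (RRF).
# RRF merges several ranked lists without needing comparable scores:
# each list contributes 1 / (k + rank) per document.

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs into one list, best first.

    rankings: list of lists, each ordered best-first.
    k: damping constant; larger values flatten the weight of top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc-3", "doc-1", "doc-7"]   # e.g. from Elasticsearch
semantic_hits = ["doc-1", "doc-9", "doc-3"]  # e.g. from vector search
merged = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# Documents appearing in both lists (doc-1, doc-3) rise to the top.
```

Documents found by both retrievers outrank documents found by only one, which is the usual motivation for hybrid search.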
- Confluence pages and spaces
- Jira tickets and projects
- SharePoint documents
- Network drives and file shares
- Git repositories (README files and wikis)
- Internal SQL databases
- Natural language querying
- Content versioning and history
- User and group-based access control
- Analytics and usage reporting
- Automated content synchronization
- Intelligent content recommendations
- API: FastAPI (Python)
- Database: PostgreSQL with pgvector extension
- Search Engine: Elasticsearch
- Cache: Redis
- ETL: Apache Airflow
- LLM: OpenAI API + Sentence Transformers
- Framework: React.js
- Styling: CSS3 with modern design
- State Management: React Context/Hooks
- HTTP Client: Axios
- Containerization: Docker & Docker Compose
- Orchestration: Kubernetes
- Cloud: AWS/GCP ready
- Infrastructure as Code: Terraform
- Monitoring: Prometheus + Grafana
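With PostgreSQL plus the pgvector extension in the stack, semantic lookups typically use pgvector's distance operators. The table and column names below are illustrative, not taken from the actual schema.

```python
# Hypothetical pgvector similarity query. Assumes a "document_chunks" table
# with an "embedding vector(384)" column (names are illustrative).

def build_semantic_search_sql(limit=10):
    # "<=>" is pgvector's cosine-distance operator; smaller means more similar.
    return (
        "SELECT id, content, embedding <=> %(query_vec)s AS distance "
        "FROM document_chunks "
        "ORDER BY distance "
        f"LIMIT {limit}"
    )
```

In practice this would run through a driver such as psycopg, with the query embedding passed as the `query_vec` parameter.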
- Docker and Docker Compose
- Node.js 18+ (for frontend development)
- Python 3.9+ (for backend development)
cd knowledge-base
cp .env.example .env
# Edit .env with your configuration
# Start all services
docker-compose up -d
# Check service status
docker-compose ps
# Run database migrations
docker-compose exec backend alembic upgrade head
# Create initial admin user
docker-compose exec backend python scripts/create_admin.py
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Airflow: http://localhost:8080
- Elasticsearch: http://localhost:9200
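After `docker-compose up -d`, a small standard-library script can confirm the services above are answering before you run migrations. This is a convenience sketch, not part of the repository.

```python
# Minimal readiness probe for the local services, standard library only.

import urllib.error
import urllib.request

def is_up(url, timeout=2.0):
    """Return True if the URL answers any HTTP response, False otherwise."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, or timeout

SERVICES = {
    "frontend": "http://localhost:3000",
    "backend": "http://localhost:8000/docs",
    "elasticsearch": "http://localhost:9200",
}

if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name}: {'up' if is_up(url) else 'DOWN'} ({url})")
```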
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app:app --reload --host 0.0.0.0 --port 8000
cd frontend
npm install
npm start
cd etl
# Set up Airflow
export AIRFLOW_HOME=$(pwd)
airflow db init
airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
airflow webserver -p 8080
knowledge-base/
├── README.md
├── requirements.txt
├── docker-compose.yml
├── .env.example
├── config/ # Configuration files
├── backend/ # FastAPI backend
│ ├── models/ # Database models
│ ├── api/ # API endpoints
│ ├── services/ # Business logic
│ └── utils/ # Utility functions
├── etl/ # Data pipeline
│ ├── dags/ # Airflow DAGs
│ ├── extractors/ # Data extractors
│ └── transformers/ # Data transformers
├── frontend/ # React frontend
│ └── src/
│ ├── components/ # React components
│ ├── services/ # API services
│ └── utils/ # Utility functions
├── infrastructure/ # Deployment configs
│ ├── terraform/ # Infrastructure as Code
│ └── kubernetes/ # K8s manifests
└── tests/ # Test suites
├── unit/
├── integration/
└── e2e/
Copy .env.example to .env and configure:
# Database
DATABASE_URL=postgresql://kb_user:kb_password@localhost:5432/knowledge_base
# Elasticsearch
ELASTICSEARCH_URL=http://localhost:9200
# Redis
REDIS_URL=redis://localhost:6379
# LLM API Keys
OPENAI_API_KEY=your_openai_api_key_here
# Data Source Credentials
CONFLUENCE_URL=https://your-company.atlassian.net
CONFLUENCE_USERNAME=your_username
CONFLUENCE_API_TOKEN=your_api_token
JIRA_URL=https://your-company.atlassian.net
JIRA_USERNAME=your_username
JIRA_API_TOKEN=your_api_token
SHAREPOINT_URL=https://your-company.sharepoint.com
SHAREPOINT_CLIENT_ID=your_client_id
SHAREPOINT_CLIENT_SECRET=your_client_secret
- Extract: Pull data from various sources (Confluence, Jira, etc.)
- Transform: Clean, normalize, and chunk content
- Load: Store in PostgreSQL and index in Elasticsearch
- Embed: Generate vector embeddings for semantic search
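The Transform step above mentions chunking but not its parameters; the sketch below shows one common scheme, overlapping word-based chunks sized for an embedding model. The chunk and overlap sizes are assumptions, not the pipeline's real settings.

```python
# Sketch of the Transform step: split a document into overlapping word
# chunks. Each chunk would then be embedded (e.g. with a Sentence
# Transformers model) and stored for vector search.

def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to chunk_size words,
    with consecutive chunks sharing `overlap` words of context."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.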
- Full Sync: Weekly (Sundays at 2 AM)
- Incremental Sync: Every 4 hours
- Real-time: Webhook-triggered updates
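The scheduled cadences above map naturally onto cron expressions in the Airflow DAG definitions. The DAG ids below are illustrative, not the actual ones in `etl/dags/`.

```python
# The sync cadences expressed as cron schedules, as they might appear in
# the Airflow DAG definitions (DAG ids are illustrative).

SYNC_SCHEDULES = {
    "kb_full_sync": "0 2 * * 0",           # Sundays at 02:00
    "kb_incremental_sync": "0 */4 * * *",  # every 4 hours, on the hour
    # Real-time updates are webhook-driven, so they have no cron schedule.
}
```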
- POST /api/search/ - Perform a search (keyword, semantic, or hybrid)
- GET /api/search/suggestions - Get search suggestions
- GET /api/search/analytics - Search analytics
- GET /api/documents/ - List documents
- GET /api/documents/{id} - Get document details
- POST /api/documents/ - Create a new document
- PUT /api/documents/{id} - Update a document
- POST /api/auth/login - User login
- POST /api/auth/logout - User logout
- GET /api/auth/me - Get the current user
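A search call might be built as follows, using only the standard library. The request body fields (`query`, `search_type`, `limit`) are assumptions; check http://localhost:8000/docs for the actual schema.

```python
# Hypothetical request to the search endpoint, standard library only.
# Body fields are assumed, not taken from the real API schema.

import json
import urllib.request

def build_search_request(query, search_type="hybrid", limit=10,
                         base_url="http://localhost:8000"):
    payload = json.dumps({
        "query": query,
        "search_type": search_type,  # "keyword", "semantic", or "hybrid"
        "limit": limit,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/search/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the backend running:
# with urllib.request.urlopen(build_search_request("vacation policy")) as resp:
#     results = json.load(resp)
```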
# Build and deploy
docker-compose -f docker-compose.prod.yml up -d
# Or use Kubernetes
kubectl apply -f infrastructure/kubernetes/
- Health checks: /health
- Metrics: /metrics (Prometheus format)
- Logs: Centralized logging with structured JSON
# Backend tests
cd backend
pytest tests/
# Frontend tests
cd frontend
npm test
# Integration tests
docker-compose -f docker-compose.test.yml up --abort-on-container-exit
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
- Infrastructure setup
- Database schema design
- Basic LLM integration
- Embedding service
- Natural language query parser
- Vector search service
- Intelligent query router
- Content generation engine
- Real-time embedding updates
- Performance optimization
- Security hardening
- Production deployment
For questions and support:
- Technical Issues: Create an issue in this repository
- Documentation: Check the /docs folder
- API Questions: Visit http://localhost:8000/docs
This project is proprietary software for internal company use only.