Merge pull request #846 from PromtEngineer/devin/1752388447-readme-improvements

Updated README with installation, API, and configuration details
This commit is contained in:
PromptEngineer 2025-07-12 23:48:35 -07:00 committed by GitHub
commit 1e42b46683

README.md

@ -50,8 +50,12 @@ The architecture is **modular and lightweight**—enable only the components you
### 🤖 AI-Powered Chat
- **Natural Language Queries**: Ask questions in plain English
- **Source Attribution**: Every answer includes document references
- **Smart Routing**: Automatically chooses between RAG and direct LLM responses
- **Query Decomposition**: Breaks complex queries into sub-questions for better answers
- **Semantic Caching**: TTL-based caching with similarity matching for faster responses
- **Session-Aware History**: Maintains conversation context across interactions
- **Answer Verification**: Independent verification pass for accuracy
- **Multiple AI Models**: Ollama for inference, HuggingFace for embeddings and reranking
### 🛠️ Developer-Friendly
@ -77,12 +81,12 @@ The architecture is **modular and lightweight**—enable only the components you
- 8GB+ RAM (16GB+ recommended)
- Ollama (required for both deployment approaches)
### Option 1: Docker Deployment (Recommended for Production)
```bash
# Clone the repository
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT
# Install Ollama locally (required even for Docker)
curl -fsSL https://ollama.ai/install.sh | sh
@ -121,6 +125,14 @@ cd localGPT
# Install Python dependencies
pip install -r requirements.txt
# Key dependencies installed:
# - torch==2.4.1, transformers==4.51.0 (AI models)
# - lancedb (vector database)
# - rank_bm25, fuzzywuzzy (search algorithms)
# - sentence_transformers, rerankers (embedding/reranking)
# - docling (document processing)
# - colpali-engine (multimodal processing)
# Install Node.js dependencies
npm install
@ -137,19 +149,35 @@ python run_system.py
open http://localhost:3000
```
**System Management:**
```bash
# Check system health (comprehensive diagnostics)
python system_health_check.py
# Check service status and health
python run_system.py --health
# Start in production mode
python run_system.py --mode prod
# Skip frontend (backend + RAG API only)
python run_system.py --no-frontend
# View aggregated logs
python run_system.py --logs-only
# Stop all services
python run_system.py --stop
# Or press Ctrl+C in the terminal running python run_system.py
```
**Service Architecture:**
The `run_system.py` launcher manages four key services:
- **Ollama Server** (port 11434): AI model serving
- **RAG API Server** (port 8001): Document processing and retrieval
- **Backend Server** (port 8000): Session management and API endpoints
- **Frontend Server** (port 3000): React/Next.js web interface
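Once the launcher reports all four services as running, a quick reachability check against each port (the `/health` routes are the same ones listed under Getting Help below) looks like this:

```bash
# Minimal reachability check for the services started by run_system.py
curl -s http://localhost:11434/api/tags     # Ollama model server
curl -s http://localhost:8001/health        # RAG API
curl -s http://localhost:8000/health        # Backend
curl -sI http://localhost:3000 | head -n 1  # Frontend (expect an HTTP 200 status line)
```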
### Option 3: Manual Component Startup
```bash
@ -215,18 +243,23 @@ nano .env
**Key Configuration Options:**
```env
# AI Models (referenced in rag_system/main.py)
OLLAMA_HOST=http://localhost:11434
DEFAULT_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
DEFAULT_GENERATION_MODEL=qwen3:0.6b
# Database Paths (used by backend and RAG system)
DATABASE_PATH=./backend/chat_data.db
VECTOR_DB_PATH=./lancedb
# Server Settings (used by run_system.py)
BACKEND_PORT=8000
FRONTEND_PORT=3000
RAG_API_PORT=8001
# Optional: Override default models
GENERATION_MODEL=qwen3:8b
ENRICHMENT_MODEL=qwen3:0.6b
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
RERANKER_MODEL=answerdotai/answerai-colbert-small-v1
```
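Any of these values can also be overridden per run by setting them in the environment instead of editing `.env`. A minimal sketch, assuming `run_system.py` and the RAG system read these variables from the process environment as the comments above indicate:

```bash
# One-off override: use the larger generation model and start in production mode
GENERATION_MODEL=qwen3:8b OLLAMA_HOST=http://localhost:11434 python run_system.py --mode prod
```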
#### 4. Initialize the System
@ -334,47 +367,74 @@ print(response.json()['response'])
### Model Configuration
LocalGPT supports multiple AI model providers with centralized configuration:
#### Ollama Models (Local Inference)
```python
OLLAMA_CONFIG = {
    "host": "http://localhost:11434",
    "generation_model": "qwen3:8b",    # Main text generation
    "enrichment_model": "qwen3:0.6b"   # Lightweight routing/enrichment
}
```
#### External Models (HuggingFace Direct)
```python
EXTERNAL_MODELS = {
    "embedding_model": "Qwen/Qwen3-Embedding-0.6B",             # 1024 dimensions
    "reranker_model": "answerdotai/answerai-colbert-small-v1",  # ColBERT reranker
    "vision_model": "Qwen/Qwen-VL-Chat",                        # Multimodal support
    "fallback_reranker": "BAAI/bge-reranker-base"               # Backup reranker
}
```
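As a quick sanity check of the embedding side of this configuration, the following sketch loads the configured model and prints its vector dimensionality (it assumes the `sentence_transformers` package from the requirements can load the Qwen3 embedding model directly):

```bash
# Expect a 1024-dimensional vector for Qwen/Qwen3-Embedding-0.6B
python -c "
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('Qwen/Qwen3-Embedding-0.6B')
print(model.encode('LocalGPT keeps your documents on your own machine.').shape)
"
```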
### Pipeline Configuration
LocalGPT offers two main pipeline configurations:
#### Default Pipeline (Production-Ready)
```python
PIPELINE_CONFIGS = {
    "default": {
        "description": "Production-ready pipeline with hybrid search, AI reranking, and verification",
        "storage": {
            "lancedb_uri": "./lancedb",
            "text_table_name": "text_pages_v3",
            "bm25_path": "./index_store/bm25"
        },
        "retrieval": {
            "retriever": "multivector",
            "search_type": "hybrid",
            "late_chunking": {"enabled": True},
            "dense": {"enabled": True, "weight": 0.7},
            "bm25": {"enabled": True}
        },
        "reranker": {
            "enabled": True,
            "type": "ai",
            "strategy": "rerankers-lib",
            "model_name": "answerdotai/answerai-colbert-small-v1",
            "top_k": 10
        },
        "query_decomposition": {"enabled": True, "max_sub_queries": 3},
        "verification": {"enabled": True},
        "retrieval_k": 20,
        "contextual_enricher": {"enabled": True, "window_size": 1}
    }
}
```
#### Fast Pipeline (Speed-Optimized)
```python
"fast": {
"description": "Speed-optimized pipeline with minimal overhead",
"retrieval": {
"search_type": "vector_only",
"late_chunking": {"enabled": False}
},
"reranker": {"enabled": False},
"query_decomposition": {"enabled": False},
"verification": {"enabled": False},
"retrieval_k": 10,
"contextual_enricher": {"enabled": False}
}
```
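To use the speed-optimized pipeline, pass its name wherever a configuration name is accepted. For example, a sketch against the index build endpoint documented below (backend assumed on port 8000, and `"fast"` assumed to be accepted in the same `config_mode` field as `"default"`):

```bash
# Build an index with the fast pipeline instead of the default one
curl -X POST http://localhost:8000/indexes/<index_id>/build \
  -H "Content-Type: application/json" \
  -d '{"config_mode": "fast"}'
```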
@ -442,11 +502,27 @@ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
### Getting Help
1. **Check Logs**: The system creates structured logs in the `logs/` directory:
- `logs/system.log`: Main system events and errors
- `logs/ollama.log`: Ollama server logs
- `logs/rag-api.log`: RAG API processing logs
- `logs/backend.log`: Backend server logs
- `logs/frontend.log`: Frontend build and runtime logs
2. **System Health**: Run comprehensive diagnostics:
```bash
python system_health_check.py # Full system diagnostics
python run_system.py --health # Service status check
```
3. **Health Endpoints**: Check individual service health:
- Backend: `http://localhost:8000/health`
- RAG API: `http://localhost:8001/health`
- Ollama: `http://localhost:11434/api/tags`
4. **Documentation**: Check the [Technical Documentation](TECHNICAL_DOCS.md)
5. **GitHub Issues**: Report bugs and request features
6. **Community**: Join our Discord/Slack community
---
@ -456,6 +532,19 @@ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
#### Chat API
```http
# Session-based chat (recommended)
POST /sessions/{session_id}/chat
Content-Type: application/json
{
  "query": "What are the main topics discussed?",
  "search_type": "hybrid",
  "retrieval_k": 20,
  "ai_rerank": true,
  "context_window_size": 5
}
# Legacy chat endpoint
POST /chat
Content-Type: application/json
@ -471,34 +560,125 @@ Content-Type: application/json
```http
# Create index
POST /indexes
{"name": "My Index", "description": "Description"}
Content-Type: application/json
{
"name": "My Index",
"description": "Description",
"config": "default"
}
# Get all indexes
GET /indexes
# Get specific index
GET /indexes/{id}
# Upload documents to index
POST /indexes/{id}/upload
Content-Type: multipart/form-data
files: [file1.pdf, file2.pdf, ...]
# Build index (process uploaded documents)
POST /indexes/{id}/build
Content-Type: application/json
{
  "config_mode": "default",
  "enable_enrich": true,
  "chunk_size": 512
}
# Get index status
GET /indexes/{id}
# Delete index
DELETE /indexes/{id}
```
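A minimal end-to-end sketch of this workflow with `curl` (backend assumed on port 8000 as in the service list above; replace `<index_id>` with the id returned when the index is created):

```bash
# 1. Create an index
curl -s -X POST http://localhost:8000/indexes \
  -H "Content-Type: application/json" \
  -d '{"name": "My Index", "description": "Description", "config": "default"}'

# 2. Upload a document to it
curl -s -X POST http://localhost:8000/indexes/<index_id>/upload \
  -F "files=@./my_document.pdf"

# 3. Build the index so the uploaded documents are chunked, embedded, and searchable
curl -s -X POST http://localhost:8000/indexes/<index_id>/build \
  -H "Content-Type: application/json" \
  -d '{"config_mode": "default", "enable_enrich": true, "chunk_size": 512}'
```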
#### Session Management
```http
# Create session
POST /sessions
{"title": "My Session", "model": "qwen3:0.6b"}
Content-Type: application/json
{
"title": "My Session",
"model": "qwen3:0.6b"
}
# Get all sessions
GET /sessions
# Get specific session
GET /sessions/{session_id}
# Get session documents
GET /sessions/{session_id}/documents
# Get session indexes
GET /sessions/{session_id}/indexes
# Link index to session
POST /sessions/{session_id}/indexes/{index_id}
# Delete session
DELETE /sessions/{session_id}
# Rename session
POST /sessions/{session_id}/rename
Content-Type: application/json
{
  "new_title": "Updated Session Name"
}
```
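Put together, a typical session flow looks like this (backend again assumed on port 8000; the exact shape of the JSON responses, including where the session id appears, may vary):

```bash
# 1. Create a session
curl -s -X POST http://localhost:8000/sessions \
  -H "Content-Type: application/json" \
  -d '{"title": "My Session", "model": "qwen3:0.6b"}'

# 2. Link an existing index to the session
curl -s -X POST http://localhost:8000/sessions/<session_id>/indexes/<index_id>

# 3. Chat against the linked documents
curl -s -X POST http://localhost:8000/sessions/<session_id>/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the main topics discussed?", "search_type": "hybrid"}'
```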
### Advanced Features
#### Query Decomposition
The system can break complex queries into sub-questions for better answers:
```http
POST /sessions/{session_id}/chat
Content-Type: application/json
{
  "query": "Compare the methodologies and analyze their effectiveness",
  "query_decompose": true,
  "compose_sub_answers": true
}
```
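With `query_decompose` enabled, a comparative question like the one above is typically split into narrower sub-questions (for example, one per methodology and one about effectiveness), each answered against the index; `compose_sub_answers` then merges those partial answers into a single response.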
#### Answer Verification
Independent verification pass for accuracy using a separate verification model:
```http
POST /sessions/{session_id}/chat
Content-Type: application/json
{
  "query": "What are the key findings?",
  "verify": true
}
```
#### Contextual Enrichment
Document context enrichment during indexing for better understanding:
```http
# Enable during index building
POST /indexes/{id}/build
{
  "enable_enrich": true,
  "window_size": 2
}
```
#### Late Chunking
Better context preservation by chunking after embedding:
```python
# Configure in pipeline
"late_chunking": {"enabled": True}
```
#### Multimodal Support
Vision model integration for document images and charts:
```python
# Configured in EXTERNAL_MODELS
"vision_model": "Qwen/Qwen-VL-Chat"
```
#### Streaming Chat
```http
POST /chat/stream
@ -512,7 +692,34 @@ Content-Type: application/json
```
#### Batch Processing
```bash
# Using the batch indexing script
python demo_batch_indexing.py --config batch_indexing_config.json
# Example batch configuration (batch_indexing_config.json):
{
  "index_name": "Sample Batch Index",
  "index_description": "Example batch index configuration",
  "documents": [
    "./rag_system/documents/invoice_1039.pdf",
    "./rag_system/documents/invoice_1041.pdf"
  ],
  "processing": {
    "chunk_size": 512,
    "chunk_overlap": 64,
    "enable_enrich": true,
    "enable_latechunk": true,
    "enable_docling": true,
    "embedding_model": "Qwen/Qwen3-Embedding-0.6B",
    "generation_model": "qwen3:0.6b",
    "retrieval_mode": "hybrid",
    "window_size": 2
  }
}
```
```http
# API endpoint for batch processing
POST /batch/index
Content-Type: application/json
@ -520,7 +727,9 @@ Content-Type: application/json
"file_paths": ["doc1.pdf", "doc2.pdf"],
"config": {
"chunk_size": 512,
"enable_enrich": true
"enable_enrich": true,
"enable_latechunk": true,
"enable_docling": true
}
}
```
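The same request can be issued directly with `curl` (backend assumed on port 8000):

```bash
curl -s -X POST http://localhost:8000/batch/index \
  -H "Content-Type: application/json" \
  -d '{"file_paths": ["doc1.pdf", "doc2.pdf"], "config": {"chunk_size": 512, "enable_enrich": true}}'
```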
@ -539,17 +748,17 @@ graph TB
API --> Agent[RAG Agent]
Agent --> Retrieval[Retrieval Pipeline]
Agent --> Generation[Generation Pipeline]
Retrieval --> Vector[Vector Search]
Retrieval --> BM25[BM25 Search]
Retrieval --> Rerank[Reranking]
Vector --> LanceDB[(LanceDB)]
BM25 --> BM25DB[(BM25 Index)]
Generation --> Ollama[Ollama Models]
Generation --> HF[Hugging Face Models]
API --> SQLite[(SQLite DB)]
```