docs: fix installation instructions and update API documentation

- Fix incorrect repository URL in Docker deployment section
- Update API endpoints to match actual backend implementation
- Add session-based chat endpoints and management
- Document real index management endpoints
- Update model configuration with actual OLLAMA_CONFIG and EXTERNAL_MODELS
- Replace generic pipeline configs with actual default/fast configurations
- Add system launcher documentation and service architecture
- Improve health monitoring and logging documentation
- Update environment variables to match code implementation

Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
Devin AI 2025-07-13 06:35:35 +00:00
parent 0037eec98c
commit 7a6d75a74d

README.md

@@ -50,8 +50,12 @@ The architecture is **modular and lightweight**—enable only the components you
### 🤖 AI-Powered Chat
- **Natural Language Queries**: Ask questions in plain English
- **Source Attribution**: Every answer includes document references
- **Smart Routing**: Automatically chooses the best approach for each query
- **Multiple AI Models**: Support for Ollama (OpenAI and Hugging Face model support planned)
- **Smart Routing**: Automatically chooses between RAG, direct LLM, or graph queries
- **Query Decomposition**: Breaks complex queries into sub-questions for better answers
- **Semantic Caching**: TTL-based caching with similarity matching for faster responses
- **Session-Aware History**: Maintains conversation context across interactions
- **Answer Verification**: Independent verification pass for accuracy
- **Multiple AI Models**: Ollama for inference, HuggingFace for embeddings and reranking
### 🛠️ Developer-Friendly
@@ -81,8 +85,8 @@ The architecture is **modular and lightweight**—enable only the components you
```bash
# Clone the repository
git clone https://github.com/yourusername/localgpt.git
cd localgpt
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT
# Install Ollama locally (required even for Docker)
curl -fsSL https://ollama.ai/install.sh | sh
@@ -137,19 +141,35 @@ python run_system.py
open http://localhost:3000
```
**Direct Development Management:**
**System Management:**
```bash
# Check system health (comprehensive diagnostics)
python system_health_check.py
# Check service status
# Check service status and health
python run_system.py --health
# Start in production mode
python run_system.py --mode prod
# Skip frontend (backend + RAG API only)
python run_system.py --no-frontend
# View aggregated logs
python run_system.py --logs-only
# Stop all services
python run_system.py --stop
# Or press Ctrl+C in the terminal running python run_system.py
```
**Service Architecture:**
The `run_system.py` launcher manages four key services:
- **Ollama Server** (port 11434): AI model serving
- **RAG API Server** (port 8001): Document processing and retrieval
- **Backend Server** (port 8000): Session management and API endpoints
- **Frontend Server** (port 3000): React/Next.js web interface
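As a quick sanity check that all four services came up, they can be polled programmatically. A minimal sketch using Python's `requests`; the `/health` paths match the endpoints listed under "Getting Help" below, and the frontend check is just a basic reachability probe:
```python
import requests

# Default ports used by run_system.py; frontend has no documented health
# endpoint, so its root URL is probed instead.
SERVICES = {
    "Ollama":   "http://localhost:11434/api/tags",
    "RAG API":  "http://localhost:8001/health",
    "Backend":  "http://localhost:8000/health",
    "Frontend": "http://localhost:3000/",
}

for name, url in SERVICES.items():
    try:
        resp = requests.get(url, timeout=3)
        print(f"{name}: HTTP {resp.status_code}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")
```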
### Option 3: Manual Component Startup
```bash
@@ -215,18 +235,23 @@ nano .env
**Key Configuration Options:**
```env
# AI Models
# AI Models (referenced in rag_system/main.py)
OLLAMA_HOST=http://localhost:11434
DEFAULT_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
DEFAULT_GENERATION_MODEL=qwen3:0.6b
# Database
# Database Paths (used by backend and RAG system)
DATABASE_PATH=./backend/chat_data.db
VECTOR_DB_PATH=./lancedb
# Server Settings
# Server Settings (used by run_system.py)
BACKEND_PORT=8000
FRONTEND_PORT=3000
RAG_API_PORT=8001
# Optional: Override default models
GENERATION_MODEL=qwen3:8b
ENRICHMENT_MODEL=qwen3:0.6b
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
RERANKER_MODEL=answerdotai/answerai-colbert-small-v1
```
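For illustration, these settings can be read with standard environment lookups. This is only a sketch of the idea, not the actual parsing code in `run_system.py` or `rag_system/main.py`:
```python
import os

# Illustrative only: assumes the variables are already exported (or loaded
# with a tool such as python-dotenv) before this runs; defaults mirror the
# values shown above.
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
BACKEND_PORT = int(os.getenv("BACKEND_PORT", "8000"))
RAG_API_PORT = int(os.getenv("RAG_API_PORT", "8001"))
GENERATION_MODEL = os.getenv("GENERATION_MODEL", "qwen3:0.6b")

print(f"Backend :{BACKEND_PORT} | RAG API :{RAG_API_PORT} | "
      f"{GENERATION_MODEL} via {OLLAMA_HOST}")
```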
#### 4. Initialize the System
@@ -334,47 +359,74 @@ print(response.json()['response'])
### Model Configuration
LocalGPT supports multiple AI model providers:
LocalGPT supports multiple AI model providers with centralized configuration:
#### Ollama Models (Local)
#### Ollama Models (Local Inference)
```python
OLLAMA_CONFIG = {
'host': 'http://localhost:11434',
'generation_model': 'qwen3:0.6b',
'embedding_model': 'nomic-embed-text'
"host": "http://localhost:11434",
"generation_model": "qwen3:8b", # Main text generation
"enrichment_model": "qwen3:0.6b" # Lightweight routing/enrichment
}
```
#### Hugging Face Models
#### External Models (HuggingFace Direct)
```python
EXTERNAL_MODELS = {
'embedding': {
'Qwen/Qwen3-Embedding-0.6B': {'dimensions': 1024},
'Qwen/Qwen3-Embedding-4B': {'dimensions': 2048},
'Qwen/Qwen3-Embedding-8B': {'dimensions': 4096}
}
"embedding_model": "Qwen/Qwen3-Embedding-0.6B", # 1024 dimensions
"reranker_model": "answerdotai/answerai-colbert-small-v1", # ColBERT reranker
"vision_model": "Qwen/Qwen-VL-Chat", # Multimodal support
"fallback_reranker": "BAAI/bge-reranker-base" # Backup reranker
}
```
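The embedding model above can be exercised on its own with `sentence-transformers`. This is a standalone illustration of the model, not how the RAG pipeline wires it in:
```python
from sentence_transformers import SentenceTransformer

# Standalone check of the default embedder; the config above lists 1024 dimensions.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
vectors = model.encode([
    "What are the main topics discussed?",
    "LocalGPT answers questions over local documents.",
])
print(vectors.shape)  # expected (2, 1024)
```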
### Processing Configuration
### Pipeline Configuration
LocalGPT offers two main pipeline configurations:
#### Default Pipeline (Production-Ready)
```python
PIPELINE_CONFIGS = {
'default': {
'chunk_size': 512,
'chunk_overlap': 64,
'retrieval_mode': 'hybrid',
'window_size': 5,
'enable_enrich': True,
'latechunk': True,
'docling_chunk': True
"default": {
"description": "Production-ready pipeline with hybrid search, AI reranking, and verification",
"storage": {
"lancedb_uri": "./lancedb",
"text_table_name": "text_pages_v3",
"bm25_path": "./index_store/bm25"
},
'fast': {
'chunk_size': 256,
'chunk_overlap': 32,
'retrieval_mode': 'vector',
'enable_enrich': False
}
"retrieval": {
"retriever": "multivector",
"search_type": "hybrid",
"late_chunking": {"enabled": True},
"dense": {"enabled": True, "weight": 0.7},
"bm25": {"enabled": True}
},
"reranker": {
"enabled": True,
"type": "ai",
"strategy": "rerankers-lib",
"model_name": "answerdotai/answerai-colbert-small-v1",
"top_k": 10
},
"query_decomposition": {"enabled": True, "max_sub_queries": 3},
"verification": {"enabled": True},
"retrieval_k": 20,
"contextual_enricher": {"enabled": True, "window_size": 1}
}
```
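The reranker stage above routes through the rerankers library with the ColBERT model. As a rough standalone illustration of the reranking idea (not the pipeline's actual code path), here is the fallback cross-encoder `BAAI/bge-reranker-base` from `EXTERNAL_MODELS` scoring candidate chunks against a query; the example chunks are made up:
```python
from sentence_transformers import CrossEncoder

# Score (query, chunk) pairs and keep the best ones, analogous to the
# reranker stage keeping the top_k highest-scoring chunks.
reranker = CrossEncoder("BAAI/bge-reranker-base")
query = "How are documents chunked?"
chunks = [
    "Chunks default to 512 tokens with 64 tokens of overlap.",
    "The frontend server runs on port 3000.",
    "Late chunking embeds whole pages before splitting them.",
]
scores = reranker.predict([(query, chunk) for chunk in chunks])
for chunk, score in sorted(zip(chunks, scores), key=lambda p: p[1], reverse=True)[:2]:
    print(f"{score:.3f}  {chunk}")
```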
#### Fast Pipeline (Speed-Optimized)
```python
"fast": {
"description": "Speed-optimized pipeline with minimal overhead",
"retrieval": {
"search_type": "vector_only",
"late_chunking": {"enabled": False}
},
"reranker": {"enabled": False},
"query_decomposition": {"enabled": False},
"verification": {"enabled": False},
"retrieval_k": 10,
"contextual_enricher": {"enabled": False}
}
```
@@ -442,11 +494,27 @@ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
### Getting Help
1. **Check Logs**: Look at `logs/system.log` for detailed error messages
2. **System Health**: Run `python system_health_check.py`
3. **Documentation**: Check the [Technical Documentation](TECHNICAL_DOCS.md)
4. **GitHub Issues**: Report bugs and request features
5. **Community**: Join our Discord/Slack community
1. **Check Logs**: The system creates structured logs in the `logs/` directory:
- `logs/system.log`: Main system events and errors
- `logs/ollama.log`: Ollama server logs
- `logs/rag-api.log`: RAG API processing logs
- `logs/backend.log`: Backend server logs
- `logs/frontend.log`: Frontend build and runtime logs
2. **System Health**: Run comprehensive diagnostics:
```bash
python system_health_check.py # Full system diagnostics
python run_system.py --health # Service status check
```
3. **Health Endpoints**: Check individual service health:
- Backend: `http://localhost:8000/health`
- RAG API: `http://localhost:8001/health`
- Ollama: `http://localhost:11434/api/tags`
4. **Documentation**: Check the [Technical Documentation](TECHNICAL_DOCS.md)
5. **GitHub Issues**: Report bugs and request features
6. **Community**: Join our Discord/Slack community
---
@@ -456,6 +524,19 @@ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
#### Chat API
```http
# Session-based chat (recommended)
POST /sessions/{session_id}/chat
Content-Type: application/json
{
"query": "What are the main topics discussed?",
"search_type": "hybrid",
"retrieval_k": 20,
"ai_rerank": true,
"context_window_size": 5
}
# Legacy chat endpoint
POST /chat
Content-Type: application/json
@@ -471,30 +552,71 @@ Content-Type: application/json
```http
# Create index
POST /indexes
{"name": "My Index", "description": "Description"}
Content-Type: application/json
{
"name": "My Index",
"description": "Description",
"config": "default"
}
# Upload documents
# Get all indexes
GET /indexes
# Get specific index
GET /indexes/{id}
# Upload documents to index
POST /indexes/{id}/upload
Content-Type: multipart/form-data
files: [file1.pdf, file2.pdf, ...]
# Build index
# Build index (process uploaded documents)
POST /indexes/{id}/build
Content-Type: application/json
{
"config_mode": "default",
"enable_enrich": true,
"chunk_size": 512
}
# Get index status
GET /indexes/{id}
# Delete index
DELETE /indexes/{id}
```
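A hedged end-to-end sketch of that index lifecycle using Python's `requests`. The base URL assumes these routes are served by the backend on port 8000, the response field `"id"` is an assumption, and the uploaded file path is a placeholder:
```python
import requests

BASE = "http://localhost:8000"  # assumption: index routes live on the backend server

# 1. Create an index
idx = requests.post(f"{BASE}/indexes",
                    json={"name": "My Index", "description": "Description",
                          "config": "default"}).json()
index_id = idx["id"]  # assumption: response carries the new index id as "id"

# 2. Upload documents (multipart form data, field name "files")
with open("report.pdf", "rb") as fh:  # placeholder document
    requests.post(f"{BASE}/indexes/{index_id}/upload",
                  files=[("files", ("report.pdf", fh, "application/pdf"))])

# 3. Build the index from the uploaded documents
requests.post(f"{BASE}/indexes/{index_id}/build",
              json={"config_mode": "default", "enable_enrich": True, "chunk_size": 512})

# 4. Check index status
print(requests.get(f"{BASE}/indexes/{index_id}").json())
```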
#### Session Management
```http
# Create session
POST /sessions
{"title": "My Session", "model": "qwen3:0.6b"}
Content-Type: application/json
{
"title": "My Session",
"model": "qwen3:0.6b"
}
# Get sessions
# Get all sessions
GET /sessions
# Get specific session
GET /sessions/{session_id}
# Get session documents
GET /sessions/{session_id}/documents
# Get session indexes
GET /sessions/{session_id}/indexes
# Link index to session
POST /sessions/{session_id}/indexes/{index_id}
# Delete session
DELETE /sessions/{session_id}
# Rename session
POST /sessions/{session_id}/rename
Content-Type: application/json
{
"new_title": "Updated Session Name"
}
```
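And the matching session flow, again as a sketch against the backend on port 8000; the response field names and the index id are assumptions or placeholders:
```python
import requests

BASE = "http://localhost:8000"  # assumption: session routes live on the backend server

# Create a session and link an existing index to it
session = requests.post(f"{BASE}/sessions",
                        json={"title": "My Session", "model": "qwen3:0.6b"}).json()
session_id = session["id"]   # assumption: response exposes the session id as "id"
index_id = "YOUR_INDEX_ID"   # placeholder: id returned by POST /indexes
requests.post(f"{BASE}/sessions/{session_id}/indexes/{index_id}")

# Ask a question through the session-based chat endpoint
answer = requests.post(f"{BASE}/sessions/{session_id}/chat",
                       json={"query": "What are the main topics discussed?",
                             "search_type": "hybrid",
                             "retrieval_k": 20,
                             "ai_rerank": True,
                             "context_window_size": 5}).json()
print(answer)
```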
### Advanced Features