docs: fix installation instructions and update API documentation
- Fix incorrect repository URL in Docker deployment section
- Update API endpoints to match actual backend implementation
- Add session-based chat endpoints and management
- Document real index management endpoints
- Update model configuration with actual OLLAMA_CONFIG and EXTERNAL_MODELS
- Replace generic pipeline configs with actual default/fast configurations
- Add system launcher documentation and service architecture
- Improve health monitoring and logging documentation
- Update environment variables to match code implementation

Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
This commit is contained in:
parent 0037eec98c
commit 7a6d75a74d

README.md (222 changed lines)
@@ -50,8 +50,12 @@ The architecture is **modular and lightweight**—enable only the components you

### 🤖 AI-Powered Chat

- **Natural Language Queries**: Ask questions in plain English
- **Source Attribution**: Every answer includes document references
- **Smart Routing**: Automatically chooses between RAG, direct LLM, or graph queries
- **Query Decomposition**: Breaks complex queries into sub-questions for better answers
- **Semantic Caching**: TTL-based caching with similarity matching for faster responses (sketched below)
- **Session-Aware History**: Maintains conversation context across interactions
- **Answer Verification**: Independent verification pass for accuracy
- **Multiple AI Models**: Ollama for inference, HuggingFace for embeddings and reranking
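The semantic cache pairs a time-to-live with embedding-similarity matching, so a repeated or near-duplicate question can be served from cache instead of re-running retrieval. A minimal sketch of the idea, not the project's implementation; the embedding function, TTL, and similarity threshold are placeholder assumptions:

```python
import time
import numpy as np

class SemanticCacheSketch:
    """Illustrative TTL + cosine-similarity cache (not LocalGPT's actual code)."""

    def __init__(self, embed_fn, ttl_seconds=3600.0, threshold=0.95):
        self.embed_fn = embed_fn      # assumed: maps a query string to a 1-D vector
        self.ttl = ttl_seconds
        self.threshold = threshold
        self.entries = []             # list of (embedding, answer, timestamp)

    def get(self, query):
        now = time.time()
        # Drop expired entries, then look for a semantically similar query.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        q = self.embed_fn(query)
        for emb, answer, _ in self.entries:
            cos = float(np.dot(q, emb) / (np.linalg.norm(q) * np.linalg.norm(emb)))
            if cos >= self.threshold:
                return answer         # cache hit: near-duplicate query within TTL
        return None                   # cache miss

    def put(self, query, answer):
        self.entries.append((self.embed_fn(query), answer, time.time()))
```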
### 🛠️ Developer-Friendly
@@ -81,8 +85,8 @@ The architecture is **modular and lightweight**—enable only the components you

```bash
# Clone the repository
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT

# Install Ollama locally (required even for Docker)
curl -fsSL https://ollama.ai/install.sh | sh
```
@@ -137,19 +141,35 @@ python run_system.py

```bash
open http://localhost:3000
```

**System Management:**
```bash
# Check system health (comprehensive diagnostics)
python system_health_check.py

# Check service status and health
python run_system.py --health

# Start in production mode
python run_system.py --mode prod

# Skip frontend (backend + RAG API only)
python run_system.py --no-frontend

# View aggregated logs
python run_system.py --logs-only

# Stop all services
python run_system.py --stop
# Or press Ctrl+C in the terminal running python run_system.py
```

**Service Architecture:**

The `run_system.py` launcher manages four key services:

- **Ollama Server** (port 11434): AI model serving
- **RAG API Server** (port 8001): Document processing and retrieval
- **Backend Server** (port 8000): Session management and API endpoints
- **Frontend Server** (port 3000): React/Next.js web interface
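With those ports in mind, a quick liveness check can be scripted against the health endpoints documented under Getting Help below; a sketch using `requests`:

```python
import requests

# Ports from the service list above; health paths from the Getting Help section.
SERVICES = {
    "Ollama":   "http://localhost:11434/api/tags",
    "RAG API":  "http://localhost:8001/health",
    "Backend":  "http://localhost:8000/health",
    "Frontend": "http://localhost:3000",
}

for name, url in SERVICES.items():
    try:
        code = requests.get(url, timeout=3).status_code
        print(f"{name}: {'OK' if code == 200 else f'HTTP {code}'}")
    except requests.RequestException as exc:
        print(f"{name}: DOWN ({exc.__class__.__name__})")
```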
### Option 3: Manual Component Startup
@@ -215,18 +235,23 @@ nano .env

**Key Configuration Options:**
```env
# AI Models (referenced in rag_system/main.py)
OLLAMA_HOST=http://localhost:11434

# Database Paths (used by backend and RAG system)
DATABASE_PATH=./backend/chat_data.db
VECTOR_DB_PATH=./lancedb

# Server Settings (used by run_system.py)
BACKEND_PORT=8000
FRONTEND_PORT=3000
RAG_API_PORT=8001

# Optional: Override default models
GENERATION_MODEL=qwen3:8b
ENRICHMENT_MODEL=qwen3:0.6b
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
RERANKER_MODEL=answerdotai/answerai-colbert-small-v1
```
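For orientation, code that consumes these settings would typically read them with `os.getenv`; a sketch using the variable names from the `.env` example (the fallback defaults here are assumptions, `run_system.py` is authoritative):

```python
import os

# Variable names from the .env example above; default values are assumptions.
OLLAMA_HOST      = os.getenv("OLLAMA_HOST", "http://localhost:11434")
DATABASE_PATH    = os.getenv("DATABASE_PATH", "./backend/chat_data.db")
VECTOR_DB_PATH   = os.getenv("VECTOR_DB_PATH", "./lancedb")
BACKEND_PORT     = int(os.getenv("BACKEND_PORT", "8000"))
FRONTEND_PORT    = int(os.getenv("FRONTEND_PORT", "3000"))
RAG_API_PORT     = int(os.getenv("RAG_API_PORT", "8001"))
GENERATION_MODEL = os.getenv("GENERATION_MODEL", "qwen3:8b")
```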
#### 4. Initialize the System
@@ -334,47 +359,74 @@ print(response.json()['response'])

### Model Configuration

LocalGPT supports multiple AI model providers with centralized configuration:

#### Ollama Models (Local Inference)
```python
OLLAMA_CONFIG = {
    "host": "http://localhost:11434",
    "generation_model": "qwen3:8b",   # Main text generation
    "enrichment_model": "qwen3:0.6b"  # Lightweight routing/enrichment
}
```
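To sanity-check these models outside LocalGPT, you can hit Ollama's standard `/api/generate` endpoint directly with the host and generation model from `OLLAMA_CONFIG`; a minimal sketch:

```python
import requests

# Calls Ollama's standard generate API with the host/model values above.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen3:8b", "prompt": "Reply with one word: ready?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```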
#### External Models (HuggingFace Direct)
```python
EXTERNAL_MODELS = {
    "embedding_model": "Qwen/Qwen3-Embedding-0.6B",             # 1024 dimensions
    "reranker_model": "answerdotai/answerai-colbert-small-v1",  # ColBERT reranker
    "vision_model": "Qwen/Qwen-VL-Chat",                        # Multimodal support
    "fallback_reranker": "BAAI/bge-reranker-base"               # Backup reranker
}
```
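These models load through standard HuggingFace tooling; a sketch assuming `sentence-transformers` for the embedder and the `rerankers` package (consistent with the `rerankers-lib` strategy in the pipeline config below) for the ColBERT reranker:

```python
from rerankers import Reranker
from sentence_transformers import SentenceTransformer

# Embedding model from EXTERNAL_MODELS: produces 1024-dimensional vectors.
embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
print(embedder.encode(["What is LocalGPT?"]).shape)  # expected: (1, 1024)

# ColBERT reranker from EXTERNAL_MODELS, loaded via the rerankers library.
ranker = Reranker("answerdotai/answerai-colbert-small-v1", model_type="colbert")
results = ranker.rank(
    query="What is LocalGPT?",
    docs=["LocalGPT answers questions over your private documents.",
          "Completely unrelated text."],
)
print(results.top_k(1))
```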
### Pipeline Configuration

LocalGPT offers two main pipeline configurations:

#### Default Pipeline (Production-Ready)
```python
"default": {
    "description": "Production-ready pipeline with hybrid search, AI reranking, and verification",
    "storage": {
        "lancedb_uri": "./lancedb",
        "text_table_name": "text_pages_v3",
        "bm25_path": "./index_store/bm25"
    },
    "retrieval": {
        "retriever": "multivector",
        "search_type": "hybrid",
        "late_chunking": {"enabled": True},
        "dense": {"enabled": True, "weight": 0.7},
        "bm25": {"enabled": True}
    },
    "reranker": {
        "enabled": True,
        "type": "ai",
        "strategy": "rerankers-lib",
        "model_name": "answerdotai/answerai-colbert-small-v1",
        "top_k": 10
    },
    "query_decomposition": {"enabled": True, "max_sub_queries": 3},
    "verification": {"enabled": True},
    "retrieval_k": 20,
    "contextual_enricher": {"enabled": True, "window_size": 1}
}
```
#### Fast Pipeline (Speed-Optimized)
```python
"fast": {
    "description": "Speed-optimized pipeline with minimal overhead",
    "retrieval": {
        "search_type": "vector_only",
        "late_chunking": {"enabled": False}
    },
    "reranker": {"enabled": False},
    "query_decomposition": {"enabled": False},
    "verification": {"enabled": False},
    "retrieval_k": 10,
    "contextual_enricher": {"enabled": False}
}
```
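These two entries are selected by name; the index build API below passes a `config_mode` of "default" or "fast". A toy sketch of the lookup, assuming the entries above sit in a dict named `PIPELINE_CONFIGS` (that name, the `CONFIG_MODE` variable, and the abbreviated fields are assumptions for illustration):

```python
import os

# Assumption: the "default" and "fast" entries above live in a dict named
# PIPELINE_CONFIGS; only the fields needed for this demo are repeated here.
PIPELINE_CONFIGS = {
    "default": {"retrieval_k": 20, "reranker": {"enabled": True}},
    "fast":    {"retrieval_k": 10, "reranker": {"enabled": False}},
}

mode = os.getenv("CONFIG_MODE", "default")  # hypothetical override variable
cfg = PIPELINE_CONFIGS[mode]
print(f"mode={mode}, retrieval_k={cfg['retrieval_k']}, rerank={cfg['reranker']['enabled']}")
```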
@@ -442,11 +494,27 @@ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

### Getting Help

1. **Check Logs**: The system creates structured logs in the `logs/` directory:
   - `logs/system.log`: Main system events and errors
   - `logs/ollama.log`: Ollama server logs
   - `logs/rag-api.log`: RAG API processing logs
   - `logs/backend.log`: Backend server logs
   - `logs/frontend.log`: Frontend build and runtime logs

2. **System Health**: Run comprehensive diagnostics:
   ```bash
   python system_health_check.py  # Full system diagnostics
   python run_system.py --health  # Service status check
   ```

3. **Health Endpoints**: Check individual service health:
   - Backend: `http://localhost:8000/health`
   - RAG API: `http://localhost:8001/health`
   - Ollama: `http://localhost:11434/api/tags`

4. **Documentation**: Check the [Technical Documentation](TECHNICAL_DOCS.md)
5. **GitHub Issues**: Report bugs and request features
6. **Community**: Join our Discord/Slack community

---
@@ -456,6 +524,19 @@ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

#### Chat API
```http
# Session-based chat (recommended)
POST /sessions/{session_id}/chat
Content-Type: application/json

{
  "query": "What are the main topics discussed?",
  "search_type": "hybrid",
  "retrieval_k": 20,
  "ai_rerank": true,
  "context_window_size": 5
}

# Legacy chat endpoint
POST /chat
Content-Type: application/json
```
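Called from Python, the session-based endpoint looks like this; a sketch where the session ID is a placeholder (create one via the Session Management API below) and the backend is assumed on its default port 8000:

```python
import requests

session_id = "YOUR_SESSION_ID"  # placeholder: create a session first (see below)
resp = requests.post(
    f"http://localhost:8000/sessions/{session_id}/chat",
    json={
        "query": "What are the main topics discussed?",
        "search_type": "hybrid",
        "retrieval_k": 20,
        "ai_rerank": True,
        "context_window_size": 5,
    },
    timeout=300,
)
print(resp.json())
```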
@@ -471,30 +552,71 @@ Content-Type: application/json

```http
# Create index
POST /indexes
Content-Type: application/json

{
  "name": "My Index",
  "description": "Description",
  "config": "default"
}

# Get all indexes
GET /indexes

# Get specific index
GET /indexes/{id}

# Upload documents to index
POST /indexes/{id}/upload
Content-Type: multipart/form-data

files: [file1.pdf, file2.pdf, ...]

# Build index (process uploaded documents)
POST /indexes/{id}/build
Content-Type: application/json

{
  "config_mode": "default",
  "enable_enrich": true,
  "chunk_size": 512
}

# Delete index
DELETE /indexes/{id}
```
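Putting these endpoints together, a create → upload → build flow from Python might look like the sketch below; the shape of the create response (an `id` field) is an assumption:

```python
import requests

BASE = "http://localhost:8000"  # backend server, per the service architecture

# 1. Create an index using the default pipeline configuration.
created = requests.post(f"{BASE}/indexes", json={
    "name": "My Index",
    "description": "Description",
    "config": "default",
}).json()
index_id = created["id"]  # assumption about the response shape

# 2. Upload documents (multipart form field "files", as documented above).
with open("file1.pdf", "rb") as f:
    requests.post(f"{BASE}/indexes/{index_id}/upload",
                  files=[("files", ("file1.pdf", f, "application/pdf"))])

# 3. Build the index, i.e. process the uploaded documents.
requests.post(f"{BASE}/indexes/{index_id}/build", json={
    "config_mode": "default",
    "enable_enrich": True,
    "chunk_size": 512,
})
```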
#### Session Management
```http
# Create session
POST /sessions
Content-Type: application/json

{
  "title": "My Session",
  "model": "qwen3:0.6b"
}

# Get all sessions
GET /sessions

# Get specific session
GET /sessions/{session_id}

# Get session documents
GET /sessions/{session_id}/documents

# Get session indexes
GET /sessions/{session_id}/indexes

# Link index to session
POST /sessions/{session_id}/indexes/{index_id}

# Delete session
DELETE /sessions/{session_id}

# Rename session
POST /sessions/{session_id}/rename
Content-Type: application/json

{
  "new_title": "Updated Session Name"
}
```
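The matching session flow (create, link the index built earlier, rename) as a sketch; the `id` field in the create response is again an assumption:

```python
import requests

BASE = "http://localhost:8000"
index_id = "YOUR_INDEX_ID"  # placeholder: from the index workflow above

# Create a session and attach an index to it.
session = requests.post(f"{BASE}/sessions", json={
    "title": "My Session",
    "model": "qwen3:0.6b",
}).json()
session_id = session["id"]  # assumption about the response shape

requests.post(f"{BASE}/sessions/{session_id}/indexes/{index_id}")

# Rename the session later if needed.
requests.post(f"{BASE}/sessions/{session_id}/rename",
              json={"new_title": "Updated Session Name"})
```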
### Advanced Features