docs: fix installation instructions and update API documentation
- Fix incorrect repository URL in Docker deployment section
- Update API endpoints to match actual backend implementation
- Add session-based chat endpoints and management
- Document real index management endpoints
- Update model configuration with actual OLLAMA_CONFIG and EXTERNAL_MODELS
- Replace generic pipeline configs with actual default/fast configurations
- Add system launcher documentation and service architecture
- Improve health monitoring and logging documentation
- Update environment variables to match code implementation

Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
This commit is contained in:
parent 0037eec98c
commit 7a6d75a74d

README.md (222 changed lines)
@@ -50,8 +50,12 @@ The architecture is **modular and lightweight**—enable only the components you

### 🤖 AI-Powered Chat

- **Natural Language Queries**: Ask questions in plain English
- **Source Attribution**: Every answer includes document references
- **Smart Routing**: Automatically chooses between RAG, direct LLM, or graph queries
- **Query Decomposition**: Breaks complex queries into sub-questions for better answers
- **Semantic Caching**: TTL-based caching with similarity matching for faster responses (sketched below)
- **Session-Aware History**: Maintains conversation context across interactions
- **Answer Verification**: Independent verification pass for accuracy
- **Multiple AI Models**: Ollama for inference, HuggingFace for embeddings and reranking
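The semantic cache pairs a time-to-live with embedding-similarity matching, so a repeated or near-duplicate question can be served from cache instead of re-running retrieval. A minimal sketch of the idea, not the project's implementation; the embedding function, TTL, and similarity threshold are placeholder assumptions:

```python
import time
import numpy as np

class SemanticCacheSketch:
    """Illustrative TTL + cosine-similarity cache (not LocalGPT's actual code)."""

    def __init__(self, embed_fn, ttl_seconds=3600.0, threshold=0.95):
        self.embed_fn = embed_fn      # assumed: maps a query string to a 1-D vector
        self.ttl = ttl_seconds
        self.threshold = threshold
        self.entries = []             # list of (embedding, answer, timestamp)

    def get(self, query):
        now = time.time()
        # Drop expired entries, then look for a semantically similar query.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        q = self.embed_fn(query)
        for emb, answer, _ in self.entries:
            cos = float(np.dot(q, emb) / (np.linalg.norm(q) * np.linalg.norm(emb)))
            if cos >= self.threshold:
                return answer         # cache hit: near-duplicate query within TTL
        return None                   # cache miss

    def put(self, query, answer):
        self.entries.append((self.embed_fn(query), answer, time.time()))
```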
### 🛠️ Developer-Friendly
@@ -81,8 +85,8 @@ The architecture is **modular and lightweight**—enable only the components you

```bash
# Clone the repository
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT

# Install Ollama locally (required even for Docker)
curl -fsSL https://ollama.ai/install.sh | sh
```
@@ -137,19 +141,35 @@ python run_system.py

```bash
open http://localhost:3000
```

**System Management:**
```bash
# Check system health (comprehensive diagnostics)
python system_health_check.py

# Check service status and health
python run_system.py --health

# Start in production mode
python run_system.py --mode prod

# Skip frontend (backend + RAG API only)
python run_system.py --no-frontend

# View aggregated logs
python run_system.py --logs-only

# Stop all services
python run_system.py --stop
# Or press Ctrl+C in the terminal running python run_system.py
```

**Service Architecture:**

The `run_system.py` launcher manages four key services:

- **Ollama Server** (port 11434): AI model serving
- **RAG API Server** (port 8001): Document processing and retrieval
- **Backend Server** (port 8000): Session management and API endpoints
- **Frontend Server** (port 3000): React/Next.js web interface
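With those ports in mind, a quick liveness check can be scripted against the health endpoints documented under Getting Help below; a sketch using `requests`:

```python
import requests

# Ports from the service list above; health paths from the Getting Help section.
SERVICES = {
    "Ollama":   "http://localhost:11434/api/tags",
    "RAG API":  "http://localhost:8001/health",
    "Backend":  "http://localhost:8000/health",
    "Frontend": "http://localhost:3000",
}

for name, url in SERVICES.items():
    try:
        code = requests.get(url, timeout=3).status_code
        print(f"{name}: {'OK' if code == 200 else f'HTTP {code}'}")
    except requests.RequestException as exc:
        print(f"{name}: DOWN ({exc.__class__.__name__})")
```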
### Option 3: Manual Component Startup
@@ -215,18 +235,23 @@ nano .env

**Key Configuration Options:**
```env
# AI Models (referenced in rag_system/main.py)
OLLAMA_HOST=http://localhost:11434

# Database Paths (used by backend and RAG system)
DATABASE_PATH=./backend/chat_data.db
VECTOR_DB_PATH=./lancedb

# Server Settings (used by run_system.py)
BACKEND_PORT=8000
FRONTEND_PORT=3000
RAG_API_PORT=8001

# Optional: Override default models
GENERATION_MODEL=qwen3:8b
ENRICHMENT_MODEL=qwen3:0.6b
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
RERANKER_MODEL=answerdotai/answerai-colbert-small-v1
```
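For orientation, code that consumes these settings would typically read them with `os.getenv`; a sketch using the variable names from the `.env` example (the fallback defaults here are assumptions, `run_system.py` is authoritative):

```python
import os

# Variable names from the .env example above; default values are assumptions.
OLLAMA_HOST      = os.getenv("OLLAMA_HOST", "http://localhost:11434")
DATABASE_PATH    = os.getenv("DATABASE_PATH", "./backend/chat_data.db")
VECTOR_DB_PATH   = os.getenv("VECTOR_DB_PATH", "./lancedb")
BACKEND_PORT     = int(os.getenv("BACKEND_PORT", "8000"))
FRONTEND_PORT    = int(os.getenv("FRONTEND_PORT", "3000"))
RAG_API_PORT     = int(os.getenv("RAG_API_PORT", "8001"))
GENERATION_MODEL = os.getenv("GENERATION_MODEL", "qwen3:8b")
```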
#### 4. Initialize the System
@@ -334,47 +359,74 @@ print(response.json()['response'])

### Model Configuration

LocalGPT supports multiple AI model providers with centralized configuration:

#### Ollama Models (Local Inference)
```python
OLLAMA_CONFIG = {
    "host": "http://localhost:11434",
    "generation_model": "qwen3:8b",   # Main text generation
    "enrichment_model": "qwen3:0.6b"  # Lightweight routing/enrichment
}
```
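To sanity-check these models outside LocalGPT, you can hit Ollama's standard `/api/generate` endpoint directly with the host and generation model from `OLLAMA_CONFIG`; a minimal sketch:

```python
import requests

# Calls Ollama's standard generate API with the host/model values above.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen3:8b", "prompt": "Reply with one word: ready?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```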
#### External Models (HuggingFace Direct)
```python
EXTERNAL_MODELS = {
    "embedding_model": "Qwen/Qwen3-Embedding-0.6B",             # 1024 dimensions
    "reranker_model": "answerdotai/answerai-colbert-small-v1",  # ColBERT reranker
    "vision_model": "Qwen/Qwen-VL-Chat",                        # Multimodal support
    "fallback_reranker": "BAAI/bge-reranker-base"               # Backup reranker
}
```
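These models load through standard HuggingFace tooling; a sketch assuming `sentence-transformers` for the embedder and the `rerankers` package (consistent with the `rerankers-lib` strategy in the pipeline config below) for the ColBERT reranker:

```python
from rerankers import Reranker
from sentence_transformers import SentenceTransformer

# Embedding model from EXTERNAL_MODELS: produces 1024-dimensional vectors.
embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
print(embedder.encode(["What is LocalGPT?"]).shape)  # expected: (1, 1024)

# ColBERT reranker from EXTERNAL_MODELS, loaded via the rerankers library.
ranker = Reranker("answerdotai/answerai-colbert-small-v1", model_type="colbert")
results = ranker.rank(
    query="What is LocalGPT?",
    docs=["LocalGPT answers questions over your private documents.",
          "Completely unrelated text."],
)
print(results.top_k(1))
```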
### Pipeline Configuration

LocalGPT offers two main pipeline configurations:

#### Default Pipeline (Production-Ready)
```python
"default": {
    "description": "Production-ready pipeline with hybrid search, AI reranking, and verification",
    "storage": {
        "lancedb_uri": "./lancedb",
        "text_table_name": "text_pages_v3",
        "bm25_path": "./index_store/bm25"
    },
    "retrieval": {
        "retriever": "multivector",
        "search_type": "hybrid",
        "late_chunking": {"enabled": True},
        "dense": {"enabled": True, "weight": 0.7},
        "bm25": {"enabled": True}
    },
    "reranker": {
        "enabled": True,
        "type": "ai",
        "strategy": "rerankers-lib",
        "model_name": "answerdotai/answerai-colbert-small-v1",
        "top_k": 10
    },
    "query_decomposition": {"enabled": True, "max_sub_queries": 3},
    "verification": {"enabled": True},
    "retrieval_k": 20,
    "contextual_enricher": {"enabled": True, "window_size": 1}
}
```
#### Fast Pipeline (Speed-Optimized)
```python
"fast": {
    "description": "Speed-optimized pipeline with minimal overhead",
    "retrieval": {
        "search_type": "vector_only",
        "late_chunking": {"enabled": False}
    },
    "reranker": {"enabled": False},
    "query_decomposition": {"enabled": False},
    "verification": {"enabled": False},
    "retrieval_k": 10,
    "contextual_enricher": {"enabled": False}
}
```
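These two entries are selected by name; the index build API below passes a `config_mode` of "default" or "fast". A toy sketch of the lookup, assuming the entries above sit in a dict named `PIPELINE_CONFIGS` (that name, the `CONFIG_MODE` variable, and the abbreviated fields are assumptions for illustration):

```python
import os

# Assumption: the "default" and "fast" entries above live in a dict named
# PIPELINE_CONFIGS; only the fields needed for this demo are repeated here.
PIPELINE_CONFIGS = {
    "default": {"retrieval_k": 20, "reranker": {"enabled": True}},
    "fast":    {"retrieval_k": 10, "reranker": {"enabled": False}},
}

mode = os.getenv("CONFIG_MODE", "default")  # hypothetical override variable
cfg = PIPELINE_CONFIGS[mode]
print(f"mode={mode}, retrieval_k={cfg['retrieval_k']}, rerank={cfg['reranker']['enabled']}")
```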
@@ -442,11 +494,27 @@ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

### Getting Help

1. **Check Logs**: The system creates structured logs in the `logs/` directory:
   - `logs/system.log`: Main system events and errors
   - `logs/ollama.log`: Ollama server logs
   - `logs/rag-api.log`: RAG API processing logs
   - `logs/backend.log`: Backend server logs
   - `logs/frontend.log`: Frontend build and runtime logs

2. **System Health**: Run comprehensive diagnostics:
   ```bash
   python system_health_check.py  # Full system diagnostics
   python run_system.py --health  # Service status check
   ```

3. **Health Endpoints**: Check individual service health:
   - Backend: `http://localhost:8000/health`
   - RAG API: `http://localhost:8001/health`
   - Ollama: `http://localhost:11434/api/tags`

4. **Documentation**: Check the [Technical Documentation](TECHNICAL_DOCS.md)
5. **GitHub Issues**: Report bugs and request features
6. **Community**: Join our Discord/Slack community

---
@@ -456,6 +524,19 @@ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

#### Chat API
```http
# Session-based chat (recommended)
POST /sessions/{session_id}/chat
Content-Type: application/json

{
  "query": "What are the main topics discussed?",
  "search_type": "hybrid",
  "retrieval_k": 20,
  "ai_rerank": true,
  "context_window_size": 5
}

# Legacy chat endpoint
POST /chat
Content-Type: application/json
```
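Called from Python, the session-based endpoint looks like this; a sketch where the session ID is a placeholder (create one via the Session Management API below) and the backend is assumed on its default port 8000:

```python
import requests

session_id = "YOUR_SESSION_ID"  # placeholder: create a session first (see below)
resp = requests.post(
    f"http://localhost:8000/sessions/{session_id}/chat",
    json={
        "query": "What are the main topics discussed?",
        "search_type": "hybrid",
        "retrieval_k": 20,
        "ai_rerank": True,
        "context_window_size": 5,
    },
    timeout=300,
)
print(resp.json())
```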
@@ -471,30 +552,71 @@ Content-Type: application/json

```http
# Create index
POST /indexes
Content-Type: application/json

{
  "name": "My Index",
  "description": "Description",
  "config": "default"
}

# Get all indexes
GET /indexes

# Get specific index
GET /indexes/{id}

# Upload documents to index
POST /indexes/{id}/upload
Content-Type: multipart/form-data

files: [file1.pdf, file2.pdf, ...]

# Build index (process uploaded documents)
POST /indexes/{id}/build
Content-Type: application/json

{
  "config_mode": "default",
  "enable_enrich": true,
  "chunk_size": 512
}

# Delete index
DELETE /indexes/{id}
```
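Putting these endpoints together, a create → upload → build flow from Python might look like the sketch below; the shape of the create response (an `id` field) is an assumption:

```python
import requests

BASE = "http://localhost:8000"  # backend server, per the service architecture

# 1. Create an index using the default pipeline configuration.
created = requests.post(f"{BASE}/indexes", json={
    "name": "My Index",
    "description": "Description",
    "config": "default",
}).json()
index_id = created["id"]  # assumption about the response shape

# 2. Upload documents (multipart form field "files", as documented above).
with open("file1.pdf", "rb") as f:
    requests.post(f"{BASE}/indexes/{index_id}/upload",
                  files=[("files", ("file1.pdf", f, "application/pdf"))])

# 3. Build the index, i.e. process the uploaded documents.
requests.post(f"{BASE}/indexes/{index_id}/build", json={
    "config_mode": "default",
    "enable_enrich": True,
    "chunk_size": 512,
})
```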
#### Session Management
```http
# Create session
POST /sessions
Content-Type: application/json

{
  "title": "My Session",
  "model": "qwen3:0.6b"
}

# Get all sessions
GET /sessions

# Get specific session
GET /sessions/{session_id}

# Get session documents
GET /sessions/{session_id}/documents

# Get session indexes
GET /sessions/{session_id}/indexes

# Link index to session
POST /sessions/{session_id}/indexes/{index_id}

# Delete session
DELETE /sessions/{session_id}

# Rename session
POST /sessions/{session_id}/rename
Content-Type: application/json

{
  "new_title": "Updated Session Name"
}
```
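The matching session flow (create, link the index built earlier, rename) as a sketch; the `id` field in the create response is again an assumption:

```python
import requests

BASE = "http://localhost:8000"
index_id = "YOUR_INDEX_ID"  # placeholder: from the index workflow above

# Create a session and attach an index to it.
session = requests.post(f"{BASE}/sessions", json={
    "title": "My Session",
    "model": "qwen3:0.6b",
}).json()
session_id = session["id"]  # assumption about the response shape

requests.post(f"{BASE}/sessions/{session_id}/indexes/{index_id}")

# Rename the session later if needed.
requests.post(f"{BASE}/sessions/{session_id}/rename",
              json={"new_title": "Updated Session Name"})
```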
### Advanced Features