🐳 LocalGPT Docker Deployment Guide
This guide covers running LocalGPT in Docker containers, with Ollama running locally on the host for optimal performance.
🚀 Quick Start
Complete Setup (5 Minutes)
# 1. Install Ollama locally
curl -fsSL https://ollama.ai/install.sh | sh
# 2. Start Ollama server
ollama serve
# 3. Install required models (in another terminal)
ollama pull qwen3:0.6b
ollama pull qwen3:8b
# 4. Clone and start LocalGPT
git clone https://github.com/your-org/rag-system.git
cd rag-system
./start-docker.sh
# 5. Access the application
open http://localhost:3000
📋 Prerequisites
- Docker Desktop installed and running
- Ollama installed locally (required for best performance)
- 8GB+ RAM (16GB recommended for larger models)
- 10GB+ free disk space
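To sanity-check these prerequisites in one go, a minimal script like the following works on Linux (the free and df flags are GNU-specific; adapt for macOS):
# Quick prerequisite check (a sketch; thresholds mirror the list above)
docker info >/dev/null 2>&1 && echo "✅ Docker running" || echo "❌ Docker not running"
command -v ollama >/dev/null && echo "✅ Ollama installed" || echo "❌ Ollama missing"
free -g | awk '/^Mem:/ {if ($2 >= 8) print "✅ RAM: "$2"GB"; else print "⚠️ RAM below 8GB"}'
df -BG --output=avail . | awk 'NR==2 {gsub(/G/,"",$1); if ($1+0 >= 10) print "✅ Disk: "$1"GB free"; else print "⚠️ Disk below 10GB free"}'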
🏗️ Architecture
Current Setup (Local Ollama + Docker Containers)
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │────│     Backend     │────│     RAG API     │
│   (Container)   │    │   (Container)   │    │   (Container)   │
│   Port: 3000    │    │   Port: 8000    │    │   Port: 8001    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
                                                       │ API calls
                                                       ▼
                                              ┌─────────────────┐
                                              │     Ollama      │
                                              │   (Local/Host)  │
                                              │   Port: 11434   │
                                              └─────────────────┘
Why Local Ollama?
- ✅ Better performance (direct GPU access)
- ✅ Simpler setup (one less container)
- ✅ Easier model management
- ✅ More reliable connection
🛠️ Container Details
Frontend Container (rag-frontend)
- Image: Custom Node.js 18 build
- Port: 3000
- Purpose: Next.js web interface
- Health Check: HTTP GET to /
- Memory: ~500MB
Backend Container (rag-backend)
- Image: Custom Python 3.11 build
- Port: 8000
- Purpose: Session management, chat history, API gateway
- Health Check: HTTP GET to /health
- Memory: ~300MB
RAG API Container (rag-api)
- Image: Custom Python 3.11 build
- Port: 8001
- Purpose: Document indexing, retrieval, AI processing
- Health Check: HTTP GET to /models
- Memory: ~2GB (varies with model usage)
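Because each container declares a health check, you can also ask Docker directly for its verdict instead of curling the ports yourself (this assumes the container names above, and the Health field only exists when a healthcheck is actually wired into the image or compose file):
# Read Docker's own health status for each service
docker inspect --format '{{.Name}}: {{.State.Health.Status}}' rag-frontend rag-backend rag-api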
📂 Volume Mounts & Data
Persistent Data
- ./lancedb/ → Vector database storage
- ./index_store/ → Document indexes and metadata
- ./shared_uploads/ → Uploaded document files
- ./backend/chat_data.db → SQLite chat history database
Shared Between Containers
All containers share access to document storage and databases through bind mounts.
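When data seems to go missing, it is worth confirming what a running container actually has mounted; docker inspect can print the mount table (rag-backend shown, but any service name works):
# Print source -> destination for every mount on the backend container
docker inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{"\n"}}{{end}}' rag-backend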
🔧 Configuration
Environment Variables (docker.env)
# Ollama Configuration
OLLAMA_HOST=http://host.docker.internal:11434
# Service Configuration
NODE_ENV=production
RAG_API_URL=http://rag-api:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
# Database Paths (inside containers)
DATABASE_PATH=/app/backend/chat_data.db
LANCEDB_PATH=/app/lancedb
UPLOADS_PATH=/app/shared_uploads
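Note that host.docker.internal resolves out of the box on Docker Desktop (macOS/Windows) but not on stock Linux. On Linux you can either add host.docker.internal:host-gateway under extra_hosts in the compose file, or point OLLAMA_HOST at the Docker bridge IP instead (a sketch; your bridge address may differ):
# Find the docker0 bridge IP (commonly 172.17.0.1)
ip -4 addr show docker0 | grep -oP '(?<=inet )[\d.]+'
# Then in docker.env:
# OLLAMA_HOST=http://172.17.0.1:11434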
Model Configuration
The system uses these models by default:
- Embedding: Qwen/Qwen3-Embedding-0.6B (1024 dimensions)
- Generation: qwen3:0.6b (fast) or qwen3:8b (high quality)
- Reranking: Built-in cross-encoder
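To confirm the generation models are pulled and responsive, Ollama's CLI and HTTP API give a quick answer (the /api/generate endpoint is part of Ollama's standard API):
# Verify both generation models are available
ollama list | grep qwen3
# One-shot completion against the fast model as a sanity check
curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3:0.6b", "prompt": "Say OK", "stream": false}'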
🎯 Management Commands
Start/Stop Services
# Start all services
./start-docker.sh
# Stop all services
./start-docker.sh stop
# Restart services
./start-docker.sh stop && ./start-docker.sh
Monitor Services
# Check container status
./start-docker.sh status
docker compose ps
# View live logs
./start-docker.sh logs
docker compose logs -f
# View specific service logs
docker compose logs -f rag-api
docker compose logs -f backend
docker compose logs -f frontend
Manual Docker Compose
# Start manually
docker compose --env-file docker.env up --build -d
# Stop manually
docker compose down
# Rebuild specific service
docker compose build --no-cache rag-api
docker compose up -d rag-api
Health Checks
# Test all endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
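On a cold start the containers become ready at different speeds, so a single round of curls can fail spuriously. A small polling loop over the same endpoints waits for everything (a sketch, capped at roughly two minutes):
# Poll all three services until they respond or 60 attempts elapse
for i in $(seq 1 60); do
  if curl -fs http://localhost:3000 >/dev/null &&
     curl -fs http://localhost:8000/health >/dev/null &&
     curl -fs http://localhost:8001/models >/dev/null; then
    echo "✅ All services up"; break
  fi
  sleep 2
done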
🐞 Debugging
Access Container Shells
# RAG API container (most debugging happens here)
docker compose exec rag-api bash
# Backend container
docker compose exec backend bash
# Frontend container
docker compose exec frontend sh
Common Debug Commands
# Test RAG system initialization
docker compose exec rag-api python -c "
from rag_system.main import get_agent
agent = get_agent('default')
print('✅ RAG System OK')
"
# Test Ollama connection from container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
# Check environment variables
docker compose exec rag-api env | grep OLLAMA
# View Python packages
docker compose exec rag-api pip list | grep -E "(torch|transformers|lancedb)"
Resource Monitoring
# Monitor container resources
docker stats
# Check disk usage
docker system df
df -h ./lancedb ./shared_uploads
# Check memory usage by service
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
🚨 Troubleshooting
Common Issues
Container Won't Start
# Check logs for specific error
docker compose logs [service-name]
# Rebuild from scratch
./start-docker.sh stop
docker system prune -f
./start-docker.sh
# Check for port conflicts
lsof -i :3000 -i :8000 -i :8001
Can't Connect to Ollama
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Restart Ollama
pkill ollama
ollama serve
# Test from container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
Memory Issues
# Check memory usage
docker stats --no-stream
free -h # On host
# Increase Docker memory limit
# Docker Desktop → Settings → Resources → Memory → 8GB+
# Use smaller models
ollama pull qwen3:0.6b # Instead of qwen3:8b
Frontend Build Errors
# Clean build
docker compose build --no-cache frontend
docker compose up -d frontend
# Check frontend logs
docker compose logs frontend
Database/Storage Issues
# Check file permissions
ls -la backend/chat_data.db
ls -la lancedb/
# Reset permissions
chmod 664 backend/chat_data.db
chmod -R 755 lancedb/ shared_uploads/
# Test database access
docker compose exec backend sqlite3 /app/backend/chat_data.db ".tables"
Performance Issues
Slow Response Times
- Use faster models: qwen3:0.6b instead of qwen3:8b
- Increase Docker memory allocation
- Ensure SSD storage for databases
- Monitor with docker stats
High Memory Usage
- Reduce batch sizes in configuration
- Use smaller embedding models
- Clear unused Docker resources: docker system prune
Complete Reset
# Nuclear option - reset everything
./start-docker.sh stop
docker system prune -a --volumes
rm -rf lancedb/* shared_uploads/* backend/chat_data.db
./start-docker.sh
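Since this wipes all indexed documents and chat history, consider snapshotting the persistent paths first (a sketch; the paths match the volume mounts listed earlier):
# Back up persistent data before a full reset
backup_dir="backups/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$backup_dir"
cp backend/chat_data.db "$backup_dir"/ 2>/dev/null || true
tar czf "$backup_dir/lancedb.tgz" lancedb/
tar czf "$backup_dir/uploads.tgz" shared_uploads/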
🏆 Success Criteria
Your Docker deployment is successful when:
- ✅ ./start-docker.sh status shows all containers healthy
- ✅ All health checks pass (see commands above)
- ✅ You can access http://localhost:3000
- ✅ You can upload documents and create indexes
- ✅ You can chat with your documents
- ✅ No errors in container logs
Performance Benchmarks
Good Performance:
- Container startup: < 2 minutes
- Index creation: < 2 min per 100MB document
- Query response: < 30 seconds
- Memory usage: < 4GB total containers
Optimal Performance:
- Container startup: < 1 minute
- Index creation: < 1 min per 100MB document
- Query response: < 10 seconds
- Memory usage: < 2GB total containers
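For a rough, pipeline-independent latency number on your hardware, you can time a single short Ollama completion directly (this measures the model alone, not the full RAG path):
# Time one short completion against the fast model
time curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3:0.6b", "prompt": "ping", "stream": false}' >/dev/null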
📚 Additional Resources
- Detailed Troubleshooting: See DOCKER_TROUBLESHOOTING.md
- Complete Documentation: See Documentation/docker_usage.md
- System Architecture: See Documentation/architecture_overview.md
- Direct Development: See the main README.md for non-Docker setup
Happy Dockerizing! 🐳 Need help? Check the troubleshooting guide or open an issue.