Integrate multimodal RAG codebase

- Replaced existing localGPT codebase with multimodal RAG implementation
- Includes full-stack application with backend, frontend, and RAG system
- Added Docker support and comprehensive documentation
- Enhanced with multimodal capabilities for document processing
- Preserved git history for localGPT while integrating new functionality
PromptEngineer 2025-07-11 00:17:15 -07:00
parent 4e0d9e75e9
commit 2421514f3e
211 changed files with 32131 additions and 123680 deletions


@@ -1,4 +0,0 @@
*
!*.py
!requirements.txt
!SOURCE_DOCUMENTS


@@ -1,17 +0,0 @@
# http://editorconfig.org
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.{py,rst,ini}]
indent_style = space
indent_size = 4
[*.{html,css,scss,json,yml,xml}]
indent_style = space
indent_size = 2


@@ -1,4 +0,0 @@
[flake8]
exclude = docs
max-line-length = 119
extend-ignore = E203

13
.github/FUNDING.yml vendored

@@ -1,13 +0,0 @@
# These are supported funding model platforms
github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: promptengineering # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']

63
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file

@@ -0,0 +1,63 @@
---
name: Bug report
about: Create a report to help us improve LocalGPT
title: '[BUG] '
labels: 'bug'
assignees: ''
---
## 🐛 Bug Description
A clear and concise description of what the bug is.
## 🔄 Steps to Reproduce
1. Go to '...'
2. Click on '...'
3. Scroll down to '...'
4. See error
## ✅ Expected Behavior
A clear and concise description of what you expected to happen.
## ❌ Actual Behavior
A clear and concise description of what actually happened.
## 📸 Screenshots
If applicable, add screenshots to help explain your problem.
## 🖥️ Environment Information
**Desktop/Server:**
- OS: [e.g. macOS 13.4, Ubuntu 20.04, Windows 11]
- Python Version: [e.g. 3.11.5]
- Node.js Version: [e.g. 23.10.0]
- Ollama Version: [e.g. 0.9.5]
- Docker Version: [e.g. 24.0.6] (if using Docker)
**Browser (if web interface issue):**
- Browser: [e.g. Chrome, Safari, Firefox]
- Version: [e.g. 118.0.0.0]
## 📋 System Health Check
Please run `python system_health_check.py` and paste the output:
```
[Paste system health check output here]
```
## 📝 Error Logs
Please include relevant error messages or logs:
```
[Paste error logs here]
```
## 🔧 Configuration
- Deployment method: [Docker / Direct Python]
- Models used: [e.g. qwen3:0.6b, qwen3:8b]
- Document types: [e.g. PDF, DOCX, TXT]
## 📎 Additional Context
Add any other context about the problem here.
## 🤔 Possible Solution
If you have ideas for fixing the issue, please share them here.


@@ -0,0 +1,50 @@
---
name: Feature request
about: Suggest an idea for LocalGPT
title: '[FEATURE] '
labels: 'enhancement'
assignees: ''
---
## 🚀 Feature Request
### 📝 Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
### 💡 Describe the solution you'd like
A clear and concise description of what you want to happen.
### 🔄 Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
### 🎯 Use Case
Describe the specific use case or scenario where this feature would be valuable:
- Who would use this feature?
- When would they use it?
- How would it improve their workflow?
### 📋 Acceptance Criteria
What would need to be implemented for this feature to be considered complete?
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3
### 🏗️ Implementation Ideas
If you have ideas about how this could be implemented, please share:
- Which components would be affected?
- Any technical considerations?
- Potential challenges?
### 📊 Priority
How important is this feature to you?
- [ ] Critical - Blocking my use case
- [ ] High - Would significantly improve my workflow
- [ ] Medium - Nice to have
- [ ] Low - Minor improvement
### 📎 Additional Context
Add any other context, screenshots, mockups, or examples about the feature request here.
### 🔗 Related Issues
Link any related issues or discussions:

78
.github/pull_request_template.md vendored Normal file

@@ -0,0 +1,78 @@
## 📝 Description
Brief description of what this PR does.
Fixes #(issue number) <!-- If applicable -->
## 🎯 Type of Change
- [ ] 🐛 Bug fix (non-breaking change which fixes an issue)
- [ ] ✨ New feature (non-breaking change which adds functionality)
- [ ] 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] 📚 Documentation update
- [ ] 🧪 Test improvements
- [ ] 🔧 Code refactoring
- [ ] 🎨 UI/UX improvements
## 🧪 Testing
### Test Environment
- [ ] Tested with Docker deployment
- [ ] Tested with direct Python deployment
- [ ] Tested on macOS
- [ ] Tested on Linux
- [ ] Tested on Windows
### Test Cases
- [ ] All existing tests pass
- [ ] New tests added for new functionality
- [ ] Manual testing completed
- [ ] System health check passes
```bash
# Commands used for testing
python system_health_check.py
python run_system.py --health
# Add any specific test commands here
```
## 📋 Checklist
### Code Quality
- [ ] Code follows the project's coding standards
- [ ] Self-review of the code completed
- [ ] Code is properly commented
- [ ] Type hints added (Python)
- [ ] No console.log statements left in production code
### Documentation
- [ ] Documentation updated (if applicable)
- [ ] API documentation updated (if applicable)
- [ ] README updated (if applicable)
- [ ] CONTRIBUTING.md guidelines followed
### Dependencies
- [ ] No new dependencies added, or new dependencies are justified
- [ ] requirements.txt updated (if applicable)
- [ ] package.json updated (if applicable)
## 🖥️ Screenshots (if applicable)
Add screenshots to help reviewers understand the changes.
## 📊 Performance Impact
Describe any performance implications:
- [ ] No performance impact
- [ ] Performance improved
- [ ] Performance may be affected (explain below)
## 🔄 Migration Notes
If this is a breaking change, describe what users need to do:
- [ ] No migration needed
- [ ] Migration steps documented below
## 📎 Additional Notes
Any additional information that reviewers should know.


@@ -1,19 +0,0 @@
on: [push]
jobs:
precommit:
runs-on: ubuntu-latest
steps:
- name: Check out repository code
uses: actions/checkout@v3
- name: Cache Pre-Commit
uses: actions/cache@v3
with:
path: ~/.cache/pre-commit
key: ${{ runner.os }}-pre-commit-${{ hashFiles('.pre-commit-config.yaml') }}
restore-keys: |
${{ runner.os }}-pre-commit-pip
- name: Install pre-commit
run: pip install -q pre-commit
- name: Run pre-commit
run: pre-commit run --show-diff-on-failure --color=always --all-files

235
.gitignore vendored

@@ -1,169 +1,78 @@
# Ignore vscode
/.vscode
/DB
/models
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# dependencies
/node_modules
/.pnp
.pnp.*
.yarn/*
!.yarn/patches
!.yarn/plugins
!.yarn/releases
!.yarn/versions
# C extensions
*.so
# testing
/coverage
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# next.js
/.next/
/out/
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# production
/build
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
#MacOS
# misc
.DS_Store
SOURCE_DOCUMENTS/.DS_Store
*.pem
# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.pnpm-debug.log*
# env files (can opt-in for committing if needed)
.env*
# vercel
.vercel
# typescript
*.tsbuildinfo
next-env.d.ts
# Python
__pycache__/
*.pyc
# Local Data
/index_store
/shared_uploads
chat_history.db
*.pkl
# Backend generated files
backend/shared_uploads/
# Vector DB artefacts
lancedb/
index_store/overviews/
# Logs and runtime output
logs/
*.log
# SQLite or other database files
*.db
#backend/*.db
# backend/chat_history.db
backend/chroma_db/
backend/chroma_db/**
# Document and user-uploaded files (PDFs, images, etc.)
rag_system/documents/
*.pdf
# Ensure docker.env remains tracked
!docker.env
!backend/chat_data.db


@@ -1,49 +0,0 @@
default_stages: [commit]
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-json
- id: check-toml
- id: check-xml
- id: check-yaml
- id: debug-statements
- id: check-builtin-literals
- id: check-case-conflict
- id: detect-private-key
- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v3.0.0-alpha.9-for-vscode"
hooks:
- id: prettier
args: ["--tab-width", "2"]
- repo: https://github.com/asottile/pyupgrade
rev: v3.4.0
hooks:
- id: pyupgrade
args: [--py311-plus]
exclude: hooks/
- repo: https://github.com/psf/black
rev: 23.3.0
hooks:
- id: black
- repo: https://github.com/PyCQA/isort
rev: 5.12.0
hooks:
- id: isort
- repo: https://github.com/PyCQA/flake8
rev: 6.0.0
hooks:
- id: flake8
ci:
autoupdate_schedule: weekly
skip: []
submodules: false


@@ -1,17 +0,0 @@
# configure updates globally
# default: all
# allowed: all, insecure, False
update: all
# configure dependency pinning globally
# default: True
# allowed: True, False
pin: True
# add a label to pull requests, default is not set
# requires private repo permissions, even on public repos
# default: empty
label_prs: update
requirements:
- "requirements.txt"

1
3.20.2

@@ -1 +0,0 @@
Requirement already satisfied: protobuf in c:\users\kevin\anaconda3\lib\site-packages (4.24.4)


@@ -1,10 +0,0 @@
# Acknowledgments
Some code was taken from or inspired by other projects:
- [CookieCutter Django][cookiecutter-django]
- `pre-commit-config.yaml` is taken from there with almost no changes
- `github-actions.yml` is inspired by `gitlab-ci.yml`
- `.pyup.yml`, `.flake8`, `.editorconfig`, `pyproject.toml` are taken from there with minor changes,
[cookiecutter-django]: https://github.com/cookiecutter/cookiecutter-django


@@ -1,47 +1,457 @@
# How to Contribute
# Contributing to LocalGPT
Always happy to get issues identified and pull requests!
Thank you for your interest in contributing to LocalGPT! This guide will help you get started with contributing to our private document intelligence platform.
## General considerations
## 🚀 Quick Start for Contributors
1. Keep it small. The smaller the change, the more likely we are to accept.
2. Changes that fix a current issue get priority for review.
3. Check out [GitHub guide][submit-a-pr] if you've never created a pull request before.
### Prerequisites
- Python 3.8+ (we test with 3.11.5)
- Node.js 16+ (we test with 23.10.0)
- Git
- Ollama (for local AI models)
## Getting started
### Development Setup
1. Fork the repo
2. Clone your fork
3. Create a branch for your changes
1. **Fork and Clone**
```bash
# Fork the repository on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/multimodal_rag.git
cd multimodal_rag
# Add upstream remote
git remote add upstream https://github.com/PromtEngineer/multimodal_rag.git
```
This last step is very important: don't start developing from master; it'll cause pain if you need to send another change later.
2. **Set Up Development Environment**
```bash
# Install Python dependencies
pip install -r requirements.txt
# Install Node.js dependencies
npm install
# Install Ollama and models
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
TIP: If you're working on a GitHub issue, name your branch after the issue number, e.g. `issue-123-<ISSUE-NAME>`. This will help us keep track of what you're working on. If there is not an issue for what you're working on, create one first please. Someone else might be working on the same thing, or we might have a reason for not wanting to do it.
3. **Verify Setup**
```bash
# Run health check
python system_health_check.py
# Start development system
python run_system.py --mode dev
```
## Pre-commit
## 📋 Development Workflow
GitHub Actions runs the pre-commit hooks on your PR. If the hooks fail, you will need to fix them before your PR can be merged. Running the hooks locally before you push will save you a lot of time; to do that, install pre-commit on your local machine.
### Branch Strategy
We use a feature branch workflow:
- `main` - Production-ready code
- `docker` - Docker deployment features and documentation
- `feature/*` - New features
- `fix/*` - Bug fixes
- `docs/*` - Documentation updates
### Making Changes
1. **Create a Feature Branch**
```bash
# Update your main branch
git checkout main
git pull upstream main
# Create feature branch
git checkout -b feature/your-feature-name
```
2. **Make Your Changes**
- Follow our [coding standards](#coding-standards)
- Write tests for new functionality
- Update documentation as needed
3. **Test Your Changes**
```bash
# Run health checks
python system_health_check.py
# Test specific components
python -m pytest tests/ -v
# Test system integration
python run_system.py --health
```
4. **Commit Your Changes**
```bash
git add .
git commit -m "feat: add new feature description"
```
5. **Push and Create PR**
```bash
git push origin feature/your-feature-name
# Create pull request on GitHub
```
## 🎯 Types of Contributions
### 🐛 Bug Fixes
- Check existing issues first
- Include reproduction steps
- Add tests to prevent regression
### ✨ New Features
- Discuss in issues before implementing
- Follow existing architecture patterns
- Include comprehensive tests
- Update documentation
### 📚 Documentation
- Fix typos and improve clarity
- Add examples and use cases
- Update API documentation
- Improve setup guides
### 🧪 Testing
- Add unit tests
- Improve integration tests
- Add performance benchmarks
- Test edge cases
## 📝 Coding Standards
### Python Code Style
We follow PEP 8 with some modifications:
```python
# Use type hints
def process_document(file_path: str, config: Dict[str, Any]) -> ProcessingResult:
"""Process a document with the given configuration.
Args:
file_path: Path to the document file
config: Processing configuration dictionary
Returns:
ProcessingResult object with metadata and chunks
"""
pass
# Use descriptive variable names
embedding_model_name = "Qwen/Qwen3-Embedding-0.6B"
retrieval_results = retriever.search(query, top_k=20)
# Use dataclasses for structured data
@dataclass
class IndexingConfig:
embedding_batch_size: int = 50
enable_late_chunking: bool = True
chunk_size: int = 512
```
### TypeScript/React Code Style
```typescript
// Use TypeScript interfaces
interface ChatMessage {
id: string;
content: string;
role: 'user' | 'assistant';
timestamp: Date;
sources?: DocumentSource[];
}
// Use functional components with hooks
const ChatInterface: React.FC<ChatProps> = ({ sessionId }) => {
const [messages, setMessages] = useState<ChatMessage[]>([]);
const handleSendMessage = useCallback(async (content: string) => {
// Implementation
}, [sessionId]);
return (
<div className="chat-interface">
{/* Component JSX */}
</div>
);
};
```
### File Organization
```
rag_system/
├── agent/ # ReAct agent implementation
├── indexing/ # Document processing and indexing
├── retrieval/ # Search and retrieval components
├── pipelines/ # End-to-end processing pipelines
├── rerankers/ # Result reranking implementations
└── utils/ # Shared utilities
src/
├── components/ # React components
├── lib/ # Utility functions and API clients
└── app/ # Next.js app router pages
```
## 🧪 Testing Guidelines
### Unit Tests
```python
# Test file: tests/test_embeddings.py
import pytest
from rag_system.indexing.embedders import HuggingFaceEmbedder
def test_embedding_generation():
embedder = HuggingFaceEmbedder("sentence-transformers/all-MiniLM-L6-v2")
embeddings = embedder.create_embeddings(["test text"])
assert embeddings.shape[0] == 1
assert embeddings.shape[1] == 384 # Model dimension
assert embeddings.dtype == np.float32
```
### Integration Tests
```python
# Test file: tests/test_integration.py
def test_end_to_end_indexing():
"""Test complete document indexing pipeline."""
agent = get_agent("test")
result = agent.index_documents(["test_document.pdf"])
assert result.success
assert len(result.indexed_chunks) > 0
```
### Frontend Tests
```typescript
// Test file: src/components/__tests__/ChatInterface.test.tsx
import { render, screen, fireEvent } from '@testing-library/react';
import { ChatInterface } from '../ChatInterface';
test('sends message when form is submitted', async () => {
render(<ChatInterface sessionId="test-session" />);
const input = screen.getByPlaceholderText('Type your message...');
const button = screen.getByRole('button', { name: /send/i });
fireEvent.change(input, { target: { value: 'test message' } });
fireEvent.click(button);
expect(screen.getByText('test message')).toBeInTheDocument();
});
```
## 📖 Documentation Standards
### Code Documentation
```python
def create_index(
documents: List[str],
config: IndexingConfig,
progress_callback: Optional[Callable[[float], None]] = None
) -> IndexingResult:
"""Create a searchable index from documents.
This function processes documents through the complete indexing pipeline:
1. Text extraction and chunking
2. Embedding generation
3. Vector database storage
4. BM25 index creation
Args:
documents: List of document file paths to index
config: Indexing configuration with model settings and parameters
progress_callback: Optional callback function for progress updates
Returns:
IndexingResult containing success status, metrics, and any errors
Raises:
IndexingError: If document processing fails
ModelLoadError: If embedding model cannot be loaded
Example:
>>> config = IndexingConfig(embedding_batch_size=32)
>>> result = create_index(["doc1.pdf", "doc2.pdf"], config)
>>> print(f"Indexed {result.chunk_count} chunks")
"""
```
### API Documentation
```python
# Use OpenAPI/FastAPI documentation
@app.post("/chat", response_model=ChatResponse)
async def chat_endpoint(request: ChatRequest) -> ChatResponse:
"""Chat with indexed documents.
Send a natural language query and receive an AI-generated response
based on the indexed document collection.
- **query**: The user's question or prompt
- **session_id**: Chat session identifier
- **search_type**: Type of search (vector, hybrid, bm25)
- **retrieval_k**: Number of documents to retrieve
Returns a response with the AI-generated answer and source documents.
"""
```
## 🔧 Development Tools
### Recommended VS Code Extensions
```json
{
"recommendations": [
"ms-python.python",
"ms-python.pylint",
"ms-python.black-formatter",
"bradlc.vscode-tailwindcss",
"esbenp.prettier-vscode",
"ms-vscode.vscode-typescript-next"
]
}
```
### Pre-commit Hooks
```bash
# Install pre-commit
pip install pre-commit
```
Once installed, you need to add the pre-commit hooks to your local repo.
```shell
# Set up hooks
pre-commit install
```
Now, every time you commit, the hooks will run and check your code. If they fail, you will need to fix them before you can commit.
If you have already committed changes without the pre-commit hooks installed and do not want to reset and recommit, you can run the hooks across your local repo with the following command.
```shell
# Run manually
pre-commit run --all-files
```
### Development Scripts
```bash
# Lint Python code
python -m pylint rag_system/
# Format Python code
python -m black rag_system/
# Type check
python -m mypy rag_system/
# Lint TypeScript
npm run lint
# Format TypeScript
npm run format
```
## Help Us Improve This Documentation
If you find that something is missing or have suggestions for improvements, please submit a PR.
[submit-a-pr]: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request
## 🐛 Issue Reporting
### Bug Reports
When reporting bugs, please include:
1. **Environment Information**
```
- OS: macOS 13.4
- Python: 3.11.5
- Node.js: 23.10.0
- Ollama: 0.9.5
```
2. **Steps to Reproduce**
```
1. Start system with `python run_system.py`
2. Upload document via web interface
3. Ask question "What is this document about?"
4. Error occurs during response generation
```
3. **Expected vs Actual Behavior**
4. **Error Messages and Logs**
5. **Screenshots (if applicable)**
### Feature Requests
Include:
- **Use Case**: Why is this feature needed?
- **Proposed Solution**: How should it work?
- **Alternatives**: What other approaches were considered?
- **Additional Context**: Any relevant examples or references
## 📦 Release Process
### Version Numbering
We use semantic versioning (semver):
- `MAJOR.MINOR.PATCH`
- Major: Breaking changes
- Minor: New features (backward compatible)
- Patch: Bug fixes
### Release Checklist
- [ ] All tests pass
- [ ] Documentation updated
- [ ] Version bumped in relevant files
- [ ] Changelog updated
- [ ] Docker images built and tested
- [ ] Release notes prepared
## 🤝 Community Guidelines
### Code of Conduct
- Be respectful and inclusive
- Focus on constructive feedback
- Help others learn and grow
- Maintain professional communication
### Getting Help
- **GitHub Issues**: For bugs and feature requests
- **GitHub Discussions**: For questions and general discussion
- **Documentation**: Check existing docs first
- **Code Review**: Provide thoughtful, actionable feedback
## 🎯 Project Priorities
### Current Focus Areas
1. **Performance Optimization**: Improving indexing and retrieval speed
2. **Model Support**: Adding more embedding and generation models
3. **User Experience**: Enhancing the web interface
4. **Documentation**: Improving setup and usage guides
5. **Testing**: Expanding test coverage
### Architecture Goals
- **Modularity**: Components should be loosely coupled
- **Extensibility**: Easy to add new models and features
- **Performance**: Optimize for speed and memory usage
- **Reliability**: Robust error handling and recovery
- **Privacy**: Keep user data secure and local
## 📚 Additional Resources
### Learning Resources
- [RAG System Architecture Overview](Documentation/architecture_overview.md)
- [API Reference](Documentation/api_reference.md)
- [Deployment Guide](Documentation/deployment_guide.md)
- [Troubleshooting Guide](DOCKER_TROUBLESHOOTING.md)
### External References
- [LangChain Documentation](https://python.langchain.com/)
- [Ollama Documentation](https://ollama.ai/docs)
- [Next.js Documentation](https://nextjs.org/docs)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
---
## 🙏 Thank You!
Thank you for contributing to LocalGPT! Your contributions help make private document intelligence accessible to everyone.
For questions about contributing, please:
1. Check existing documentation
2. Search existing issues
3. Create a new issue with the `question` label
4. Join our community discussions
Happy coding! 🚀

340
DOCKER_README.md Normal file

@@ -0,0 +1,340 @@
# 🐳 LocalGPT Docker Deployment Guide
This guide covers running LocalGPT using Docker containers with local Ollama for optimal performance.
## 🚀 Quick Start
### Complete Setup (5 Minutes)
```bash
# 1. Install Ollama locally
curl -fsSL https://ollama.ai/install.sh | sh
# 2. Start Ollama server
ollama serve
# 3. Install required models (in another terminal)
ollama pull qwen3:0.6b
ollama pull qwen3:8b
# 4. Clone and start LocalGPT
git clone https://github.com/your-org/rag-system.git
cd rag-system
./start-docker.sh
# 5. Access the application
open http://localhost:3000
```
## 📋 Prerequisites
- **Docker Desktop** installed and running
- **Ollama** installed locally (required for best performance)
- **8GB+ RAM** (16GB recommended for larger models)
- **10GB+ free disk space**
## 🏗️ Architecture
### Current Setup (Local Ollama + Docker Containers)
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Frontend │────│ Backend │────│ RAG API │
│ (Container) │ │ (Container) │ │ (Container) │
│ Port: 3000 │ │ Port: 8000 │ │ Port: 8001 │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ API calls
┌─────────────────┐
│ Ollama │
│ (Local/Host) │
│ Port: 11434 │
└─────────────────┘
```
**Why Local Ollama?**
- ✅ Better performance (direct GPU access)
- ✅ Simpler setup (one less container)
- ✅ Easier model management
- ✅ More reliable connection
## 🛠️ Container Details
### Frontend Container (rag-frontend)
- **Image**: Custom Node.js 18 build
- **Port**: 3000
- **Purpose**: Next.js web interface
- **Health Check**: HTTP GET to /
- **Memory**: ~500MB
### Backend Container (rag-backend)
- **Image**: Custom Python 3.11 build
- **Port**: 8000
- **Purpose**: Session management, chat history, API gateway
- **Health Check**: HTTP GET to /health
- **Memory**: ~300MB
### RAG API Container (rag-api)
- **Image**: Custom Python 3.11 build
- **Port**: 8001
- **Purpose**: Document indexing, retrieval, AI processing
- **Health Check**: HTTP GET to /models
- **Memory**: ~2GB (varies with model usage)
## 📂 Volume Mounts & Data
### Persistent Data
- `./lancedb/` → Vector database storage
- `./index_store/` → Document indexes and metadata
- `./shared_uploads/` → Uploaded document files
- `./backend/chat_data.db` → SQLite chat history database
### Shared Between Containers
All containers share access to document storage and databases through bind mounts.
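To sanity-check or pre-create these host paths before the first start, a small Python sketch (purely illustrative, not part of the repo's scripts) could look like this:
```python
# Illustrative pre-flight helper: pre-create the bind-mounted host paths listed above.
from pathlib import Path

for name in ["lancedb", "index_store", "shared_uploads"]:
    Path(name).mkdir(parents=True, exist_ok=True)
    print(f"{name}/ -> ok")

# An empty chat_data.db is enough to start with (see the troubleshooting guide).
chat_db = Path("backend/chat_data.db")
chat_db.parent.mkdir(parents=True, exist_ok=True)
chat_db.touch(exist_ok=True)
print(f"{chat_db} -> ok")
```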
## 🔧 Configuration
### Environment Variables (docker.env)
```bash
# Ollama Configuration
OLLAMA_HOST=http://host.docker.internal:11434
# Service Configuration
NODE_ENV=production
RAG_API_URL=http://rag-api:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
# Database Paths (inside containers)
DATABASE_PATH=/app/backend/chat_data.db
LANCEDB_PATH=/app/lancedb
UPLOADS_PATH=/app/shared_uploads
```
### Model Configuration
The system uses these models by default:
- **Embedding**: `Qwen/Qwen3-Embedding-0.6B` (1024 dimensions)
- **Generation**: `qwen3:0.6b` (fast) or `qwen3:8b` (high quality)
- **Reranking**: Built-in cross-encoder
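To confirm the default generation models are actually installed in the local Ollama server, a short illustrative Python check against Ollama's `/api/tags` endpoint (model names as listed above) can be run before starting the stack:
```python
# Illustrative check: verify the default generation models exist in local Ollama.
import requests

REQUIRED = {"qwen3:0.6b", "qwen3:8b"}

tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
installed = {m["name"] for m in tags.get("models", [])}

missing = REQUIRED - installed
if missing:
    print("Missing models:", ", ".join(sorted(missing)), "- run `ollama pull <model>`")
else:
    print("✅ Default generation models are installed")
```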
## 🎯 Management Commands
### Start/Stop Services
```bash
# Start all services
./start-docker.sh
# Stop all services
./start-docker.sh stop
# Restart services
./start-docker.sh stop && ./start-docker.sh
```
### Monitor Services
```bash
# Check container status
./start-docker.sh status
docker compose ps
# View live logs
./start-docker.sh logs
docker compose logs -f
# View specific service logs
docker compose logs -f rag-api
docker compose logs -f backend
docker compose logs -f frontend
```
### Manual Docker Compose
```bash
# Start manually
docker compose --env-file docker.env up --build -d
# Stop manually
docker compose down
# Rebuild specific service
docker compose build --no-cache rag-api
docker compose up -d rag-api
```
### Health Checks
```bash
# Test all endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
```
## 🐞 Debugging
### Access Container Shells
```bash
# RAG API container (most debugging happens here)
docker compose exec rag-api bash
# Backend container
docker compose exec backend bash
# Frontend container
docker compose exec frontend sh
```
### Common Debug Commands
```bash
# Test RAG system initialization
docker compose exec rag-api python -c "
from rag_system.main import get_agent
agent = get_agent('default')
print('✅ RAG System OK')
"
# Test Ollama connection from container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
# Check environment variables
docker compose exec rag-api env | grep OLLAMA
# View Python packages
docker compose exec rag-api pip list | grep -E "(torch|transformers|lancedb)"
```
### Resource Monitoring
```bash
# Monitor container resources
docker stats
# Check disk usage
docker system df
df -h ./lancedb ./shared_uploads
# Check memory usage by service
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
```
## 🚨 Troubleshooting
### Common Issues
#### Container Won't Start
```bash
# Check logs for specific error
docker compose logs [service-name]
# Rebuild from scratch
./start-docker.sh stop
docker system prune -f
./start-docker.sh
# Check for port conflicts
lsof -i :3000 -i :8000 -i :8001
```
#### Can't Connect to Ollama
```bash
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Restart Ollama
pkill ollama
ollama serve
# Test from container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
```
#### Memory Issues
```bash
# Check memory usage
docker stats --no-stream
free -h # On host
# Increase Docker memory limit
# Docker Desktop → Settings → Resources → Memory → 8GB+
# Use smaller models
ollama pull qwen3:0.6b # Instead of qwen3:8b
```
#### Frontend Build Errors
```bash
# Clean build
docker compose build --no-cache frontend
docker compose up -d frontend
# Check frontend logs
docker compose logs frontend
```
#### Database/Storage Issues
```bash
# Check file permissions
ls -la backend/chat_data.db
ls -la lancedb/
# Reset permissions
chmod 664 backend/chat_data.db
chmod -R 755 lancedb/ shared_uploads/
# Test database access
docker compose exec backend sqlite3 /app/backend/chat_data.db ".tables"
```
### Performance Issues
#### Slow Response Times
- Use faster models: `qwen3:0.6b` instead of `qwen3:8b`
- Increase Docker memory allocation
- Ensure SSD storage for databases
- Monitor with `docker stats`
#### High Memory Usage
- Reduce batch sizes in configuration
- Use smaller embedding models
- Clear unused Docker resources: `docker system prune`
### Complete Reset
```bash
# Nuclear option - reset everything
./start-docker.sh stop
docker system prune -a --volumes
rm -rf lancedb/* shared_uploads/* backend/chat_data.db
./start-docker.sh
```
## 🏆 Success Criteria
Your Docker deployment is successful when:
- ✅ `./start-docker.sh status` shows all containers healthy
- ✅ All health checks pass (see commands above)
- ✅ You can access http://localhost:3000
- ✅ You can upload documents and create indexes
- ✅ You can chat with your documents
- ✅ No errors in container logs
### Performance Benchmarks
**Good Performance:**
- Container startup: < 2 minutes
- Index creation: < 2 min per 100MB document
- Query response: < 30 seconds
- Memory usage: < 4GB total containers
**Optimal Performance:**
- Container startup: < 1 minute
- Index creation: < 1 min per 100MB document
- Query response: < 10 seconds
- Memory usage: < 2GB total containers
## 📚 Additional Resources
- **Detailed Troubleshooting**: See `DOCKER_TROUBLESHOOTING.md`
- **Complete Documentation**: See `Documentation/docker_usage.md`
- **System Architecture**: See `Documentation/architecture_overview.md`
- **Direct Development**: See main `README.md` for non-Docker setup
---
**Happy Dockerizing! 🐳** Need help? Check the troubleshooting guide or open an issue.

604
DOCKER_TROUBLESHOOTING.md Normal file

@@ -0,0 +1,604 @@
# 🐳 Docker Troubleshooting Guide - LocalGPT
_Last updated: 2025-01-07_
This guide helps diagnose and fix Docker-related issues with LocalGPT's containerized deployment.
---
## 🏁 Quick Health Check
### System Status Check
```bash
# Check Docker daemon
docker version
# Check Ollama status
curl http://localhost:11434/api/tags
# Check containers
./start-docker.sh status
# Test all endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
```
### Expected Success Output
```
✅ Frontend OK
✅ Backend OK
✅ RAG API OK
✅ Ollama OK
```
---
## 🚨 Common Issues & Solutions
### 1. Docker Daemon Issues
#### Problem: "Cannot connect to Docker daemon"
```
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
```
#### Solution A: Restart Docker Desktop (macOS/Windows)
```bash
# Quit Docker Desktop completely
# macOS: Click Docker icon → "Quit Docker Desktop"
# Windows: Right-click Docker icon → "Quit Docker Desktop"
# Wait for it to fully shut down
sleep 10
# Start Docker Desktop
open -a Docker # macOS
# Windows: Click Docker Desktop from Start menu
# Wait for Docker to be ready (2-3 minutes)
docker version
```
#### Solution B: Linux Docker Service
```bash
# Check Docker service status
sudo systemctl status docker
# Restart Docker service
sudo systemctl restart docker
# Enable auto-start
sudo systemctl enable docker
# Test connection
docker version
```
#### Solution C: Hard Reset
```bash
# Kill all Docker processes
sudo pkill -f docker
# Remove socket files
sudo rm -f /var/run/docker.sock
sudo rm -f /Users/prompt/.docker/run/docker.sock # macOS
# Restart Docker Desktop
open -a Docker # macOS
```
### 2. Ollama Connection Issues
#### Problem: RAG API can't connect to Ollama
```
ConnectionError: Failed to connect to Ollama at http://host.docker.internal:11434
```
#### Solution A: Verify Ollama is Running
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
# If not running, start it
ollama serve
# Install required models
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
#### Solution B: Test from Container
```bash
# Test Ollama connection from RAG API container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
# If this fails, check Docker network settings
docker network ls
docker network inspect rag_system_old_default
```
#### Solution C: Alternative Ollama Host
```bash
# Edit docker.env to use different host
echo "OLLAMA_HOST=http://172.17.0.1:11434" >> docker.env
# Or use IP address
echo "OLLAMA_HOST=http://$(ipconfig getifaddr en0):11434" >> docker.env # macOS
```
### 3. Container Build Failures
#### Problem: Frontend build fails
```
ERROR: Failed to build frontend container
```
#### Solution: Clean Build
```bash
# Stop containers
./start-docker.sh stop
# Clean Docker cache
docker system prune -f
docker builder prune -f
# Rebuild frontend only
docker compose build --no-cache frontend
docker compose up -d frontend
# Check logs
docker compose logs frontend
```
#### Problem: Python package installation fails
```
ERROR: Could not install packages due to an EnvironmentError
```
#### Solution: Update Dependencies
```bash
# Check requirements file exists
ls -la requirements-docker.txt
# Test package installation locally
pip install -r requirements-docker.txt --dry-run
# Rebuild with updated base image
docker compose build --no-cache --pull rag-api
```
### 4. Port Conflicts
#### Problem: "Port already in use"
```
Error starting userland proxy: listen tcp4 0.0.0.0:3000: bind: address already in use
```
#### Solution: Find and Kill Conflicting Processes
```bash
# Check what's using the ports
lsof -i :3000 -i :8000 -i :8001
# Kill specific processes
pkill -f "npm run dev" # Frontend
pkill -f "server.py" # Backend
pkill -f "api_server" # RAG API
# Or kill by port
sudo kill -9 $(lsof -t -i:3000)
sudo kill -9 $(lsof -t -i:8000)
sudo kill -9 $(lsof -t -i:8001)
# Restart containers
./start-docker.sh
```
### 5. Memory Issues
#### Problem: Containers crash due to OOM (Out of Memory)
```
Container killed due to memory limit
```
#### Solution: Increase Docker Memory
```bash
# Check current memory usage
docker stats --no-stream
# Increase Docker Desktop memory allocation
# Docker Desktop → Settings → Resources → Memory → 8GB+
# Monitor memory usage
docker stats
# Use smaller models if needed
ollama pull qwen3:0.6b # Instead of qwen3:8b
```
#### Problem: System running slow
```bash
# Check host memory
free -h # Linux
vm_stat # macOS
# Clean up Docker resources
docker system prune -f
docker volume prune -f
```
### 6. Volume Mount Issues
#### Problem: Permission denied accessing files
```
Permission denied: /app/lancedb
```
#### Solution: Fix Permissions
```bash
# Create directories if they don't exist
mkdir -p lancedb index_store shared_uploads backend
# Fix permissions
chmod -R 755 lancedb index_store shared_uploads
chmod 664 backend/chat_data.db
# Check ownership
ls -la lancedb/ shared_uploads/ backend/
# Reset permissions if needed
sudo chown -R $USER:$USER lancedb shared_uploads backend
```
#### Problem: Database file not found
```
No such file or directory: '/app/backend/chat_data.db'
```
#### Solution: Initialize Database
```bash
# Create empty database file
touch backend/chat_data.db
# Or initialize with schema
python -c "
from backend.database import ChatDatabase
db = ChatDatabase()
db.init_database()
print('Database initialized')
"
# Restart containers
./start-docker.sh stop
./start-docker.sh
```
---
## 🔍 Advanced Debugging
### Container-Level Debugging
#### Access Container Shells
```bash
# RAG API container (most issues happen here)
docker compose exec rag-api bash
# Check environment variables
docker compose exec rag-api env | grep -E "(OLLAMA|RAG|NODE)"
# Test Python imports
docker compose exec rag-api python -c "
import sys
print('Python version:', sys.version)
from rag_system.main import get_agent
print('✅ RAG system imports work')
"
# Backend container
docker compose exec backend bash
python -c "
from backend.database import ChatDatabase
print('✅ Database imports work')
"
# Frontend container
docker compose exec frontend sh
npm --version
node --version
```
#### Check Container Resources
```bash
# Monitor real-time resource usage
docker stats
# Check individual container health
docker compose ps
docker inspect rag-api --format='{{.State.Health.Status}}'
# View container configurations
docker compose config
```
#### Network Debugging
```bash
# Check network connectivity
docker compose exec rag-api ping backend
docker compose exec backend ping rag-api
docker compose exec rag-api ping host.docker.internal
# Check DNS resolution
docker compose exec rag-api nslookup host.docker.internal
# Test HTTP connections
docker compose exec rag-api curl -v http://backend:8000/health
docker compose exec rag-api curl -v http://host.docker.internal:11434/api/tags
```
### Log Analysis
#### Container Logs
```bash
# View all logs
./start-docker.sh logs
# Follow specific service logs
docker compose logs -f rag-api
docker compose logs -f backend
docker compose logs -f frontend
# Search for errors
docker compose logs rag-api 2>&1 | grep -i error
docker compose logs backend 2>&1 | grep -i "traceback\|error"
# Save logs to file
docker compose logs > docker-debug.log 2>&1
```
#### System Logs
```bash
# Docker daemon logs (Linux)
journalctl -u docker.service -f
# macOS: Check Console app for Docker logs
# Windows: Check Event Viewer
```
---
## 🧪 Testing & Validation
### Manual Container Testing
#### Test Individual Containers
```bash
# Test RAG API alone
docker build -f Dockerfile.rag-api -t test-rag-api .
docker run --rm -p 8001:8001 -e OLLAMA_HOST=http://host.docker.internal:11434 test-rag-api &
sleep 30
curl http://localhost:8001/models
pkill -f test-rag-api
# Test Backend alone
docker build -f Dockerfile.backend -t test-backend .
docker run --rm -p 8000:8000 test-backend &
sleep 30
curl http://localhost:8000/health
pkill -f test-backend
```
#### Integration Testing
```bash
# Full system test
./start-docker.sh
# Wait for all services to be ready
sleep 60
# Test complete workflow
curl -X POST http://localhost:8000/sessions \
-H "Content-Type: application/json" \
-d '{"title": "Test Session"}'
# Test document upload (if you have a test PDF)
# curl -X POST http://localhost:8000/upload -F "file=@test.pdf"
# Clean up
./start-docker.sh stop
```
### Automated Testing Script
Create `test-docker-health.sh`:
```bash
#!/bin/bash
set -e
echo "🐳 Docker Health Test Starting..."
# Start containers
./start-docker.sh
# Wait for services
echo "⏳ Waiting for services to start..."
sleep 60
# Test endpoints
echo "🔍 Testing endpoints..."
curl -f http://localhost:3000 && echo "✅ Frontend OK" || echo "❌ Frontend FAIL"
curl -f http://localhost:8000/health && echo "✅ Backend OK" || echo "❌ Backend FAIL"
curl -f http://localhost:8001/models && echo "✅ RAG API OK" || echo "❌ RAG API FAIL"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK" || echo "❌ Ollama FAIL"
# Test container health
echo "🔍 Checking container health..."
docker compose ps
echo "🎉 Health test complete!"
```
---
## 🔄 Recovery Procedures
### Complete System Reset
#### Soft Reset
```bash
# Stop containers
./start-docker.sh stop
# Clean up Docker resources
docker system prune -f
# Restart containers
./start-docker.sh
```
#### Hard Reset (⚠️ Deletes all data)
```bash
# Stop everything
./start-docker.sh stop
# Remove all containers, images, and volumes
docker system prune -a --volumes
# Remove local data (CAUTION: This deletes all your documents and chat history)
rm -rf lancedb/* shared_uploads/* backend/chat_data.db
# Rebuild from scratch
./start-docker.sh
```
#### Selective Reset
Reset only specific components:
```bash
# Reset just the database
./start-docker.sh stop
rm backend/chat_data.db
./start-docker.sh
# Reset just vector storage
./start-docker.sh stop
rm -rf lancedb/*
./start-docker.sh
# Reset just uploaded documents
rm -rf shared_uploads/*
```
---
## 📊 Performance Optimization
### Resource Monitoring
```bash
# Monitor containers continuously
watch -n 5 'docker stats --no-stream'
# Check disk usage
docker system df
du -sh lancedb shared_uploads backend
# Monitor host resources
htop # Linux
top # macOS/Windows
```
### Performance Tuning
```bash
# Use smaller models for better performance
ollama pull qwen3:0.6b # Instead of qwen3:8b
# Reduce Docker memory if needed
# Docker Desktop → Settings → Resources → Memory
# Clean up regularly
docker system prune -f
docker volume prune -f
```
---
## 🆘 When All Else Fails
### Alternative Deployment Options
#### 1. Direct Development (No Docker)
```bash
# Stop Docker containers
./start-docker.sh stop
# Use direct development instead
python run_system.py
```
#### 2. Minimal Docker (RAG API only)
```bash
# Run only RAG API in Docker
docker build -f Dockerfile.rag-api -t rag-api .
docker run -p 8001:8001 rag-api
# Run other components directly
cd backend && python server.py &
npm run dev
```
#### 3. Hybrid Approach
```bash
# Run some services in Docker, others directly
docker compose up -d rag-api
cd backend && python server.py &
npm run dev
```
### Getting Help
#### Diagnostic Information to Collect
```bash
# System information
docker version
docker compose version
uname -a
# Container information
docker compose ps
docker compose config
# Resource information
docker stats --no-stream
docker system df
# Error logs
docker compose logs > docker-errors.log 2>&1
```
#### Support Channels
1. **Check GitHub Issues**: Search existing issues for similar problems
2. **Documentation**: Review the complete documentation in `Documentation/`
3. **Create Issue**: Include diagnostic information above
---
## ✅ Success Checklist
Your Docker deployment is working correctly when:
- ✅ `docker version` shows Docker is running
- ✅ `curl http://localhost:11434/api/tags` shows Ollama is accessible
- ✅ `./start-docker.sh status` shows all containers healthy
- ✅ All health check URLs return 200 OK
- ✅ You can access the frontend at http://localhost:3000
- ✅ You can create document indexes successfully
- ✅ You can chat with your documents
- ✅ No error messages in container logs
**If all boxes are checked, your Docker deployment is successful! 🎉**
---
**Still having issues?** Check the main `DOCKER_README.md` or create an issue with your diagnostic information.


@@ -1,21 +0,0 @@
# syntax=docker/dockerfile:1
# Build as `docker build . -t localgpt`, requires BuildKit.
# Run as `docker run -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all localgpt`, requires Nvidia container toolkit.
FROM nvidia/cuda:11.7.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y software-properties-common
RUN apt-get install -y g++-11 make python3 python-is-python3 pip
# only copy what's needed at every step to optimize layer cache
COPY ./requirements.txt .
# use BuildKit cache mount to drastically reduce redownloading from pip on repeated builds
RUN --mount=type=cache,target=/root/.cache CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --timeout 100 -r requirements.txt llama-cpp-python==0.1.83
COPY SOURCE_DOCUMENTS ./SOURCE_DOCUMENTS
COPY ingest.py constants.py ./
# Docker BuildKit does not support GPU during *docker build* time right now, only during *docker run*.
# See <https://github.com/moby/buildkit/issues/1436>.
# If this changes in the future you can `docker build --build-arg device_type=cuda . -t localgpt` (+GPU argument to be determined).
ARG device_type=cpu
RUN --mount=type=cache,target=/root/.cache python ingest.py --device_type $device_type
COPY . .
ENV device_type=cuda
CMD python run_localGPT.py --device_type $device_type

31
Dockerfile.backend Normal file

@@ -0,0 +1,31 @@
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install Python dependencies (using Docker-specific requirements)
COPY requirements-docker.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy backend code and dependencies
COPY backend/ ./backend/
COPY rag_system/ ./rag_system/
# Create necessary directories
RUN mkdir -p shared_uploads logs
# Expose port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Run the backend server
WORKDIR /app/backend
CMD ["python", "server.py"]

31
Dockerfile.frontend Normal file

@@ -0,0 +1,31 @@
FROM node:18-alpine
# Set working directory
WORKDIR /app
# Install dependencies (including dev dependencies for build)
COPY package.json package-lock.json ./
RUN npm ci
# Copy source code and configuration files
COPY src/ ./src/
COPY public/ ./public/
COPY next.config.ts ./
COPY tsconfig.json ./
COPY tailwind.config.js ./
COPY postcss.config.mjs ./
COPY eslint.config.mjs ./
# Build the application (skip linting for Docker)
ENV NEXT_LINT=false
RUN npm run build
# Expose port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:3000 || exit 1
# Start the application
CMD ["npm", "start"]

31
Dockerfile.rag-api Normal file

@@ -0,0 +1,31 @@
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install Python dependencies (using Docker-specific requirements)
COPY requirements-docker.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy RAG system code and backend dependencies
COPY rag_system/ ./rag_system/
COPY backend/ ./backend/
# Create necessary directories
RUN mkdir -p lancedb index_store shared_uploads logs
# Expose port
EXPOSE 8001
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:8001/models || exit 1
# Run the RAG API server
CMD ["python", "-m", "rag_system.api_server"]


@@ -1,45 +0,0 @@
FROM vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
ENV HABANA_VISIBLE_DEVICES=all
ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
ENV PT_HPU_LAZY_ACC_PAR_MODE=0
ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=1
# Install linux packages
ENV DEBIAN_FRONTEND="noninteractive" TZ=Etc/UTC
RUN apt-get update && apt-get install -y tzdata bash-completion python3-pip openssh-server \
vim git iputils-ping net-tools protobuf-compiler curl bc gawk tmux \
&& rm -rf /var/lib/apt/lists/*
# Add repo contents
ADD localGPT /root/localGPT
WORKDIR /root/localGPT
# Install python packages
RUN pip install --upgrade pip \
&& pip install langchain-experimental==0.0.62 \
&& pip install langchain==0.0.329 \
&& pip install protobuf==3.20.2 \
&& pip install grpcio-tools \
&& pip install pymilvus==2.4.0 \
&& pip install chromadb==0.5.15 \
&& pip install llama-cpp-python==0.1.66 \
&& pip install pdfminer.six==20221105 \
&& pip install transformers==4.43.1 \
&& pip install optimum[habana]==1.13.1 \
&& pip install InstructorEmbedding==1.0.1 \
&& pip install sentence-transformers==3.0.1 \
&& pip install faiss-cpu==1.7.4 \
&& pip install huggingface_hub==0.16.4 \
&& pip install protobuf==3.20.2 \
&& pip install auto-gptq==0.2.2 \
&& pip install docx2txt unstructured unstructured[pdf] urllib3 accelerate \
&& pip install bitsandbytes \
&& pip install click flask requests openpyxl \
&& pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.0 \
&& pip install python-multipart \
&& pip install fastapi \
&& pip install uvicorn \
&& pip install gptcache==0.1.43 \
&& pip install pypdf==4.3.1 \
&& pip install python-jose[cryptography]


@@ -0,0 +1,161 @@
# 📚 API Reference (Backend & RAG API)
_Last updated: 2025-01-07_
---
## Backend HTTP API (Python `backend/server.py`)
**Base URL**: `http://localhost:8000`
| Endpoint | Method | Description | Request Body | Success Response |
|----------|--------|-------------|--------------|------------------|
| `/health` | GET | Health probe incl. Ollama status & DB stats | | 200 JSON `{ status, ollama_running, available_models, database_stats }` |
| `/chat` | POST | Stateless chat (no session) | `{ message:str, model?:str, conversation_history?:[{role,content}]}` | 200 `{ response:str, model:str, message_count:int }` |
| `/sessions` | GET | List all sessions | | `{ sessions:ChatSession[], total:int }` |
| `/sessions` | POST | Create session | `{ title?:str, model?:str }` | 201 `{ session:ChatSession, session_id }` |
| `/sessions/<id>` | GET | Get session + msgs | | `{ session, messages }` |
| `/sessions/<id>` | DELETE | Delete session | | `{ message, deleted_session_id }` |
| `/sessions/<id>/rename` | POST | Rename session | `{ title:str }` | `{ message, session }` |
| `/sessions/<id>/messages` | POST | Session chat (builds history) | See ChatRequest + retrieval opts ▼ | `{ response, session, user_message_id, ai_message_id }` |
| `/sessions/<id>/documents` | GET | List uploaded docs | | `{ files:string[], file_count:int, session }` |
| `/sessions/<id>/upload` | POST multipart | Upload docs to session | field `files[]` | `{ message, uploaded_files, processing_results?, session_documents?, total_session_documents? }` |
| `/sessions/<id>/index` | POST | Trigger RAG indexing for session | `{ latechunk?, doclingChunk?, chunkSize?, ... }` | `{ message }` |
| `/sessions/<id>/indexes` | GET | List indexes linked to session | | `{ indexes, total }` |
| `/sessions/<sid>/indexes/<idxid>` | POST | Link index to session | | `{ message }` |
| `/sessions/cleanup` | GET | Remove empty sessions | | `{ message, cleanup_count }` |
| `/models` | GET | List generation / embedding models | | `{ generation_models:str[], embedding_models:str[] }` |
| `/indexes` | GET | List all indexes | | `{ indexes, total }` |
| `/indexes` | POST | Create index | `{ name:str, description?:str, metadata?:dict }` | `{ index_id }` |
| `/indexes/<id>` | GET | Get single index | | `{ index }` |
| `/indexes/<id>` | DELETE | Delete index | | `{ message, index_id }` |
| `/indexes/<id>/upload` | POST multipart | Upload docs to index | field `files[]` | `{ message, uploaded_files }` |
| `/indexes/<id>/build` | POST | Build / rebuild index (RAG) | `{ latechunk?, doclingChunk?, ...}` | 200 `{ response?, message?}` (idempotent) |
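As a usage illustration, a minimal Python client sketch against these endpoints (payload shapes taken from the table above and the canonical definitions below; error handling omitted):
```python
# Illustrative client sketch for the backend API; not part of the codebase.
import requests

BASE = "http://localhost:8000"

# Health probe
health = requests.get(f"{BASE}/health").json()
print(health["status"], health["ollama_running"])

# Create a session, then chat within it
session_id = requests.post(f"{BASE}/sessions", json={"title": "Test Session"}).json()["session_id"]

reply = requests.post(
    f"{BASE}/sessions/{session_id}/messages",
    json={"message": "What is this document about?", "searchType": "hybrid", "retrievalK": 10},
).json()
print(reply["response"])
```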
---
## RAG API (Python `rag_system/api_server.py`)
**Base URL**: `http://localhost:8001`
| Endpoint | Method | Description | Request Body | Success Response |
|----------|--------|-------------|--------------|------------------|
| `/chat` | POST | Run RAG query with full pipeline | See RAG ChatRequest ▼ | `{ answer:str, source_documents:[], reasoning?:str, confidence?:float }` |
| `/chat/stream` | POST | Run RAG query with SSE streaming | Same as /chat | Server-Sent Events stream |
| `/index` | POST | Index documents with full configuration | See Index Request ▼ | `{ message:str, indexed_files:[], table_name:str }` |
| `/models` | GET | List available models | | `{ generation_models:str[], embedding_models:str[] }` |
### RAG ChatRequest (Advanced Options)
```jsonc
{
"query": "string", // Required user question
"session_id": "string", // Optional for session context
"table_name": "string", // Optional specific index table
"compose_sub_answers": true, // Optional compose sub-answers
"query_decompose": true, // Optional decompose complex queries
"ai_rerank": false, // Optional AI-powered reranking
"context_expand": false, // Optional context expansion
"verify": true, // Optional answer verification
"retrieval_k": 20, // Optional number of chunks to retrieve
"context_window_size": 1, // Optional context window size
"reranker_top_k": 10, // Optional top-k after reranking
"search_type": "hybrid", // Optional "hybrid|dense|fts"
"dense_weight": 0.7, // Optional dense search weight (0-1)
"force_rag": false, // Optional bypass triage, force RAG
"provence_prune": false, // Optional sentence-level pruning
"provence_threshold": 0.8, // Optional pruning threshold
"model": "qwen3:8b" // Optional generation model override
}
```
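For example, a direct call to this endpoint from Python (illustrative sketch; only `query` is required, everything else falls back to defaults):
```python
# Illustrative direct call to the RAG API /chat endpoint.
import requests

resp = requests.post(
    "http://localhost:8001/chat",
    json={"query": "Summarize the indexed documents", "search_type": "hybrid"},
).json()

print(resp["answer"])
for doc in resp.get("source_documents", []):
    print("source:", doc)
```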
### Index Request (Document Indexing)
```jsonc
{
"file_paths": ["path1.pdf", "path2.pdf"], // Required files to index
"session_id": "string", // Required session identifier
"chunk_size": 512, // Optional chunk size (default: 512)
"chunk_overlap": 64, // Optional chunk overlap (default: 64)
"enable_latechunk": true, // Optional enable late chunking
"enable_docling_chunk": false, // Optional enable DocLing chunking
"retrieval_mode": "hybrid", // Optional "hybrid|dense|fts"
"window_size": 2, // Optional context window
"enable_enrich": true, // Optional enable enrichment
"embedding_model": "Qwen/Qwen3-Embedding-0.6B", // Optional embedding model
"enrich_model": "qwen3:0.6b", // Optional enrichment model
"overview_model_name": "qwen3:0.6b", // Optional overview model
"batch_size_embed": 50, // Optional embedding batch size
"batch_size_enrich": 25 // Optional enrichment batch size
}
```
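And a corresponding sketch of an `/index` call; the file path and session id are placeholders, and every field not set here keeps its documented default.

```python
import requests

index_request = {
    "file_paths": ["shared_uploads/report.pdf"],  # placeholder path
    "session_id": "demo-session",                 # placeholder session id
    "chunk_size": 512,
    "retrieval_mode": "hybrid",
}
resp = requests.post("http://localhost:8001/index", json=index_request)
print(resp.json().get("message"), resp.json().get("table_name"))
```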
> **Note on CORS**: All endpoints include the `Access-Control-Allow-Origin: *` header.
---
## Frontend Wrapper (`src/lib/api.ts`)
The React/Next.js frontend calls the backend via a typed wrapper. Important methods & payloads:
| Method | Backend Endpoint | Payload Shape |
|--------|------------------|---------------|
| `checkHealth()` | `/health` | |
| `sendMessage({ message, model?, conversation_history? })` | `/chat` | ChatRequest |
| `getSessions()` | `/sessions` | |
| `createSession(title?, model?)` | `/sessions` | |
| `getSession(sessionId)` | `/sessions/<id>` | |
| `sendSessionMessage(sessionId, message, opts)` | `/sessions/<id>/messages` | `ChatRequest + retrieval opts` |
| `uploadFiles(sessionId, files[])` | `/sessions/<id>/upload` | multipart |
| `indexDocuments(sessionId)` | `/sessions/<id>/index` | opts similar to buildIndex |
| `buildIndex(indexId, opts)` | `/indexes/<id>/build` | Index build options |
| `linkIndexToSession` | `/sessions/<sid>/indexes/<idx>` | |
---
## Payload Definitions (Canonical)
### ChatRequest (frontend ⇄ backend)
```jsonc
{
"message": "string", // Required raw user text
"model": "string", // Optional generation model id
"conversation_history": [ // Optional prior turn list
{ "role": "user|assistant", "content": "string" }
]
}
```
### Session Chat Extended Options
```jsonc
{
"composeSubAnswers": true,
"decompose": true,
"aiRerank": false,
"contextExpand": false,
"verify": true,
"retrievalK": 10,
"contextWindowSize": 5,
"rerankerTopK": 20,
"searchType": "fts|hybrid|dense",
"denseWeight": 0.75,
"force_rag": false
}
```
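For example, a session chat request combining the base ChatRequest with a few of these retrieval options might look like the sketch below (session id and question are placeholders; this calls the backend endpoint directly rather than going through the frontend wrapper):

```python
import requests

session_id = "REPLACE_WITH_SESSION_ID"  # returned by POST /sessions
body = {
    "message": "Summarise the uploaded document.",  # placeholder question
    "searchType": "hybrid",
    "retrievalK": 10,
    "aiRerank": True,
    "verify": True,
}
resp = requests.post(
    f"http://localhost:8000/sessions/{session_id}/messages", json=body
)
print(resp.json()["response"])
```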
### Index Build Options
```jsonc
{
"latechunk": true,
"doclingChunk": false,
"chunkSize": 512,
"chunkOverlap": 64,
"retrievalMode": "hybrid|dense|fts",
"windowSize": 2,
"enableEnrich": true,
"embeddingModel": "Qwen/Qwen3-Embedding-0.6B",
"enrichModel": "qwen3:0.6b",
"overviewModel": "qwen3:0.6b",
"batchSizeEmbed": 64,
"batchSizeEnrich": 32
}
```
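A build request using a subset of these options might look like the following sketch; the index id is a placeholder and unspecified options keep their defaults.

```python
import requests

index_id = "REPLACE_WITH_INDEX_ID"
build_opts = {
    "latechunk": True,
    "chunkSize": 512,
    "chunkOverlap": 64,
    "retrievalMode": "hybrid",
    "embeddingModel": "Qwen/Qwen3-Embedding-0.6B",
}
requests.post(f"http://localhost:8000/indexes/{index_id}/build", json=build_opts)
```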
---
_This reference is derived from static code analysis of `backend/server.py`, `rag_system/api_server.py`, and `src/lib/api.ts`. Keep it in sync with route or type changes._

View File

@ -0,0 +1,83 @@
# 🏗️ System Architecture Overview
_Last updated: 2025-07-06_
This document explains how data and control flow through the Advanced **RAG System** — from a user's browser all the way to model inference and back. It is intended as the **ground-truth reference** for engineers and integrators.
---
## 1. Bird's-Eye Diagram
```mermaid
flowchart LR
subgraph Client
U["👤 User (Browser)"]
FE["Next.js Front-end\nReact Components"]
U --> FE
end
subgraph Network
FE -->|HTTP/JSON| BE["Python HTTP Server\nbackend/server.py"]
end
subgraph Core["rag_system core package"]
BE --> LOOP["Agent Loop\n(rag_system/agent/loop.py)"]
BE --> IDX["Indexing Pipeline\n(pipelines/indexing_pipeline.py)"]
LOOP --> RP["Retrieval Pipeline\n(pipelines/retrieval_pipeline.py)"]
LOOP --> VER["Verifier (Grounding Check)"]
RP --> RET["Retrievers\nBM25 | Dense | Hybrid"]
RP --> RER["AI Reranker"]
RP --> SYNT["Answer Synthesiser"]
end
subgraph Storage
LDB[("LanceDB Vector Tables")]
SQL[("SQLite chat & metadata")]
end
subgraph Models
OLLAMA["Ollama Server\n(qwen3, etc.)"]
HF["HuggingFace Hosted\nEmbedding/Reranker Models"]
end
%% data edges
IDX -->|chunks & embeddings| LDB
RET -->|vector search| LDB
LOOP -->|LLM calls| OLLAMA
RP -->|LLM calls| OLLAMA
VER -->|LLM calls| OLLAMA
RP -->|rerank| HF
BE -->|CRUD| SQL
```
---
### Data-flow Narrative
1. **User** interacts with the Next.js UI; messages are posted via `src/lib/api.ts`.
2. **backend/server.py** receives JSON over HTTP, applies CORS, and proxies the request into `rag_system`.
3. **Agent Loop** decides (via _Triage_) whether to perform Retrieval-Augmented Generation (RAG) or direct LLM answering.
4. If RAG is chosen:
1. **Retrieval Pipeline** fetches candidates from **LanceDB** using BM25 + dense vectors.
2. **AI Reranker** (HF model) sorts snippets.
3. **Answer Synthesiser** calls **Ollama** to write the final answer.
5. Answers can be **Verified** for grounding (optional flag).
6. Index building is an offline path triggered from the UI: uploaded documents (PDF, etc.) are chunked, embedded, and stored in LanceDB.
---
## 2. Component Documents
The table below links to deep-dives for each major component.
| **Component** | **Documentation** |
|---------------|-------------------|
| Agent Loop | [`system_overview.md`](system_overview.md) |
| Indexing Pipeline | [`indexing_pipeline.md`](indexing_pipeline.md) |
| Retrieval Pipeline | [`retrieval_pipeline.md`](retrieval_pipeline.md) |
| Verifier | [`verifier.md`](verifier.md) |
| Triage System | [`triage_system.md`](triage_system.md) |
---
> **Change-management**: whenever architecture changes (new micro-service, different DB, etc.) update this overview diagram first, then individual component docs.

View File

@ -0,0 +1,598 @@
# 🚀 RAG System Deployment Guide
_Last updated: 2025-01-07_
This guide provides comprehensive instructions for deploying the RAG system using both Docker and direct development approaches.
---
## 🎯 Deployment Options
### Option 1: Docker Deployment (Production) 🐳
- **Best for**: Production environments, containerized deployments, scaling
- **Pros**: Isolated, reproducible, easy to manage
- **Cons**: Slightly more complex setup, resource overhead
### Option 2: Direct Development (Development) 💻
- **Best for**: Development, debugging, customization
- **Pros**: Direct access to code, faster iteration, easier debugging
- **Cons**: More dependencies to manage
---
## 1. Prerequisites
### 1.1 System Requirements
#### **Minimum Requirements**
- **CPU**: 4 cores, 2.5GHz+
- **RAM**: 8GB (16GB recommended)
- **Storage**: 50GB free space
- **OS**: Linux, macOS, or Windows with WSL2
#### **Recommended Requirements**
- **CPU**: 8+ cores, 3.0GHz+
- **RAM**: 32GB+ (for large models)
- **Storage**: 200GB+ SSD
- **GPU**: NVIDIA GPU with 8GB+ VRAM (optional, for acceleration)
### 1.2 Common Dependencies
**Both deployment methods require:**
```bash
# Ollama (required for both approaches)
curl -fsSL https://ollama.ai/install.sh | sh
# Git for cloning
git 2.30+
```
### 1.3 Docker-Specific Dependencies
**For Docker deployment:**
```bash
# Docker & Docker Compose
Docker Engine 24.0+
Docker Compose 2.20+
```
### 1.4 Direct Development Dependencies
**For direct development:**
```bash
# Python & Node.js
Python 3.8+
Node.js 16+
npm 8+
```
---
## 2. 🐳 Docker Deployment
### 2.1 Installation
#### **Step 1: Install Docker**
**Ubuntu/Debian:**
```bash
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
newgrp docker
# Install Docker Compose V2
sudo apt-get update
sudo apt-get install docker-compose-plugin
```
**macOS:**
```bash
# Install Docker Desktop
brew install --cask docker
# Or download from: https://www.docker.com/products/docker-desktop
```
**Windows:**
```bash
# Install Docker Desktop with WSL2 backend
# Download from: https://www.docker.com/products/docker-desktop
```
#### **Step 2: Clone Repository**
```bash
git clone https://github.com/your-org/rag-system.git
cd rag-system
```
#### **Step 3: Install Ollama**
```bash
# Install Ollama (runs locally even with Docker)
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama
ollama serve
# In another terminal, install models
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
#### **Step 4: Launch Docker System**
```bash
# Start all containers using the convenience script
./start-docker.sh
# Or manually:
docker compose --env-file docker.env up --build -d
```
#### **Step 5: Verify Deployment**
```bash
# Check container status
docker compose ps
# Test all endpoints
curl http://localhost:3000 # Frontend
curl http://localhost:8000/health # Backend
curl http://localhost:8001/models # RAG API
curl http://localhost:11434/api/tags # Ollama
```
### 2.2 Docker Management
#### **Container Operations**
```bash
# Start system
./start-docker.sh
# Stop system
./start-docker.sh stop
# View logs
./start-docker.sh logs
# Check status
./start-docker.sh status
# Manual Docker Compose commands
docker compose ps # Check status
docker compose logs -f # Follow logs
docker compose down # Stop all containers
docker compose up --build -d # Rebuild and restart
```
#### **Individual Container Management**
```bash
# Restart specific service
docker compose restart rag-api
# View specific service logs
docker compose logs -f backend
# Execute commands in container
docker compose exec rag-api python -c "print('Hello')"
```
---
## 3. 💻 Direct Development
### 3.1 Installation
#### **Step 1: Install Dependencies**
**Python Dependencies:**
```bash
# Clone repository
git clone https://github.com/your-org/rag-system.git
cd rag-system
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install Python packages
pip install -r requirements.txt
```
**Node.js Dependencies:**
```bash
# Install Node.js dependencies
npm install
```
#### **Step 2: Install and Configure Ollama**
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama
ollama serve
# In another terminal, install models
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
#### **Step 3: Launch System**
**Option A: Integrated Launcher (Recommended)**
```bash
# Start all components with one command
python run_system.py
```
**Option B: Manual Component Startup**
```bash
# Terminal 1: RAG API
python -m rag_system.api_server
# Terminal 2: Backend
cd backend && python server.py
# Terminal 3: Frontend
npm run dev
# Access at http://localhost:3000
```
#### **Step 4: Verify Installation**
```bash
# Check system health
python system_health_check.py
# Test endpoints
curl http://localhost:3000 # Frontend
curl http://localhost:8000/health # Backend
curl http://localhost:8001/models # RAG API
```
### 3.2 Direct Development Management
#### **System Operations**
```bash
# Start system
python run_system.py
# Check system health
python system_health_check.py
# Stop system
# Press Ctrl+C in terminal running run_system.py
```
#### **Individual Component Management**
```bash
# Start components individually
python -m rag_system.api_server # RAG API on port 8001
cd backend && python server.py # Backend on port 8000
npm run dev # Frontend on port 3000
# Development tools
npm run build # Build frontend for production
pip install -r requirements.txt --upgrade # Update Python packages
```
---
## 4. Architecture Comparison
### 4.1 Docker Architecture
```mermaid
graph TB
subgraph "Docker Containers"
Frontend[Frontend Container<br/>Next.js<br/>Port 3000]
Backend[Backend Container<br/>Python API<br/>Port 8000]
RAG[RAG API Container<br/>Document Processing<br/>Port 8001]
end
subgraph "Local System"
Ollama[Ollama Server<br/>Port 11434]
end
Frontend --> Backend
Backend --> RAG
RAG --> Ollama
```
### 4.2 Direct Development Architecture
```mermaid
graph TB
subgraph "Local Processes"
Frontend[Next.js Dev Server<br/>Port 3000]
Backend[Python Backend<br/>Port 8000]
RAG[RAG API<br/>Port 8001]
Ollama[Ollama Server<br/>Port 11434]
end
Frontend --> Backend
Backend --> RAG
RAG --> Ollama
```
---
## 5. Configuration
### 5.1 Environment Variables
#### **Docker Configuration (`docker.env`)**
```bash
# Ollama Configuration
OLLAMA_HOST=http://host.docker.internal:11434
# Service Configuration
NODE_ENV=production
RAG_API_URL=http://rag-api:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
```
#### **Direct Development Configuration**
```bash
# Environment variables are set automatically by run_system.py
# Override in environment if needed:
export OLLAMA_HOST=http://localhost:11434
export RAG_API_URL=http://localhost:8001
```
### 5.2 Model Configuration
#### **Default Models**
```python
# Embedding Models
EMBEDDING_MODELS = [
"Qwen/Qwen3-Embedding-0.6B", # Fast, 1024 dimensions
"Qwen/Qwen3-Embedding-4B", # High quality, 2048 dimensions
]
# Generation Models
GENERATION_MODELS = [
"qwen3:0.6b", # Fast responses
"qwen3:8b", # High quality
]
```
### 5.3 Performance Tuning
#### **Memory Settings**
```bash
# For Docker: Increase memory allocation
# Docker Desktop → Settings → Resources → Memory → 16GB+
# For Direct Development: Monitor with
htop # or top on macOS
```
#### **Model Settings**
```python
# Batch sizes (adjust based on available RAM)
EMBEDDING_BATCH_SIZE = 50 # Reduce if OOM
ENRICHMENT_BATCH_SIZE = 25 # Reduce if OOM
# Chunk settings
CHUNK_SIZE = 512 # Text chunk size
CHUNK_OVERLAP = 64 # Overlap between chunks
```
---
## 6. Operational Procedures
### 6.1 System Monitoring
#### **Health Checks**
```bash
# Comprehensive system check
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
```
#### **Performance Monitoring**
```bash
# Docker monitoring
docker stats
# Direct development monitoring
htop # Overall system
nvidia-smi # GPU usage (if available)
```
### 6.2 Log Management
#### **Docker Logs**
```bash
# All services
docker compose logs -f
# Specific service
docker compose logs -f rag-api
# Save logs to file
docker compose logs > system.log 2>&1
```
#### **Direct Development Logs**
```bash
# Logs are printed to terminal
# Redirect to file if needed:
python run_system.py > system.log 2>&1
```
### 6.3 Backup and Restore
#### **Data Backup**
```bash
# Create backup directory
mkdir -p backups/$(date +%Y%m%d)
# Backup databases and indexes
cp -r backend/chat_data.db backups/$(date +%Y%m%d)/
cp -r lancedb backups/$(date +%Y%m%d)/
cp -r index_store backups/$(date +%Y%m%d)/
# For Docker: also backup volumes
docker compose down
docker run --rm -v rag_system_old_ollama_data:/data -v $(pwd)/backups:/backup alpine tar czf /backup/ollama_models_$(date +%Y%m%d).tar.gz -C /data .
```
#### **Data Restore**
```bash
# Stop system
./start-docker.sh stop # Docker
# Or Ctrl+C for direct development
# Restore files
cp -r backups/YYYYMMDD/* ./
# Restart system
./start-docker.sh # Docker
python run_system.py # Direct development
```
---
## 7. Troubleshooting
### 7.1 Common Issues
#### **Port Conflicts**
```bash
# Check what's using ports
lsof -i :3000 -i :8000 -i :8001 -i :11434
# For Docker: Stop conflicting containers
./start-docker.sh stop
# For Direct: Kill processes
pkill -f "npm run dev"
pkill -f "server.py"
pkill -f "api_server"
```
#### **Docker Issues**
```bash
# Docker daemon not running
docker version # Check if daemon responds
# Restart Docker Desktop (macOS/Windows)
# Or restart docker service (Linux)
sudo systemctl restart docker
# Clear Docker cache
docker system prune -f
```
#### **Ollama Issues**
```bash
# Check Ollama status
curl http://localhost:11434/api/tags
# Restart Ollama
pkill ollama
ollama serve
# Reinstall models
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
### 7.2 Performance Issues
#### **Memory Problems**
```bash
# Check memory usage
free -h # Linux
vm_stat # macOS
docker stats # Docker containers
# Solutions:
# 1. Increase system RAM
# 2. Reduce batch sizes in configuration
# 3. Use smaller models (qwen3:0.6b instead of qwen3:8b)
```
#### **Slow Response Times**
```bash
# Check model loading
curl http://localhost:11434/api/tags
# Monitor component response times
time curl http://localhost:8001/models
# Solutions:
# 1. Use SSD storage
# 2. Increase CPU cores
# 3. Use GPU acceleration (if available)
```
---
## 8. Production Considerations
### 8.1 Security
#### **Network Security**
```bash
# Use reverse proxy (nginx/traefik) for production
# Enable HTTPS/TLS
# Restrict port access with firewall
```
#### **Data Security**
```bash
# Enable authentication in production
# Encrypt sensitive data
# Regular security updates
```
### 8.2 Scaling
#### **Horizontal Scaling**
```bash
# Use Docker Swarm or Kubernetes
# Load balance frontend and backend
# Scale RAG API instances based on load
```
#### **Resource Optimization**
```bash
# Use dedicated GPU nodes for AI workloads
# Implement model caching
# Optimize batch processing
```
---
## 9. Success Criteria
### 9.1 Deployment Verification
Your deployment is successful when:
- ✅ All health checks pass
- ✅ Frontend loads at http://localhost:3000
- ✅ You can create document indexes
- ✅ You can chat with uploaded documents
- ✅ No error messages in logs
### 9.2 Performance Benchmarks
**Acceptable Performance:**
- Index creation: < 2 minutes per 100MB document
- Query response: < 30 seconds for complex questions
- Memory usage: < 8GB total system memory
**Optimal Performance:**
- Index creation: < 1 minute per 100MB document
- Query response: < 10 seconds for complex questions
- Memory usage: < 16GB total system memory
---
**Happy Deploying! 🚀**

View File

@ -0,0 +1,543 @@
# 🐳 Docker Usage Guide - RAG System
_Last updated: 2025-01-07_
This guide provides practical Docker commands and procedures for running the RAG system in containerized environments with local Ollama.
---
## 📋 Prerequisites
### Required Setup
- Docker Desktop installed and running
- Ollama installed locally (even for Docker deployment)
- 8GB+ RAM available
### Architecture Overview
```
┌─────────────────────────────────────┐
│ Docker Containers │
├─────────────────────────────────────┤
│ Frontend (Port 3000) │
│ Backend (Port 8000) │
│ RAG API (Port 8001) │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Local System │
├─────────────────────────────────────┤
│ Ollama Server (Port 11434) │
└─────────────────────────────────────┘
```
---
## 1. Quick Start Commands
### Step 1: Clone and Setup
```bash
# Clone repository
git clone <your-repository-url>
cd rag_system_old
# Verify Docker is running
docker version
```
### Step 2: Install and Configure Ollama (Required)
**⚠️ Important**: Even with Docker, Ollama must be installed locally for optimal performance.
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama (in one terminal)
ollama serve
# Install required models (in another terminal)
ollama pull qwen3:0.6b # Fast model (650MB)
ollama pull qwen3:8b # High-quality model (4.7GB)
# Verify models are installed
ollama list
# Test Ollama connection
curl http://localhost:11434/api/tags
```
### Step 3: Start Docker Containers
```bash
# Start all containers
./start-docker.sh
# Stop all containers
./start-docker.sh stop
# View logs
./start-docker.sh logs
# Check status
./start-docker.sh status
# Restart containers
./start-docker.sh stop
./start-docker.sh
```
### 1.2 Service Access
Once running, access the system at:
- **Frontend**: http://localhost:3000
- **Backend API**: http://localhost:8000
- **RAG API**: http://localhost:8001
- **Ollama**: http://localhost:11434
---
## 2. Container Management
### 2.1 Using the Convenience Script
```bash
# Start all containers
./start-docker.sh
# Stop all containers
./start-docker.sh stop
# View logs
./start-docker.sh logs
# Check status
./start-docker.sh status
# Restart containers
./start-docker.sh stop
./start-docker.sh
```
### 2.2 Manual Docker Compose Commands
```bash
# Start all services
docker compose --env-file docker.env up --build -d
# Check status
docker compose ps
# View logs
docker compose logs -f
# Stop all services
docker compose down
# Force rebuild
docker compose build --no-cache
docker compose up --build -d
```
### 2.3 Individual Service Management
```bash
# Start specific service
docker compose up -d frontend
docker compose up -d backend
docker compose up -d rag-api
# Restart specific service
docker compose restart rag-api
# Stop specific service
docker compose stop backend
# View specific service logs
docker compose logs -f rag-api
```
---
## 3. Development Workflow
### 3.1 Code Changes
```bash
# After frontend changes
docker compose restart frontend
# After backend changes
docker compose restart backend
# After RAG system changes
docker compose restart rag-api
# Rebuild after dependency changes
docker compose build --no-cache rag-api
docker compose up -d rag-api
```
### 3.2 Debugging Containers
```bash
# Access container shell
docker compose exec frontend sh
docker compose exec backend bash
docker compose exec rag-api bash
# Run commands in container
docker compose exec rag-api python -c "from rag_system.main import get_agent; print('✅ RAG System OK')"
docker compose exec backend curl http://localhost:8000/health
# Check environment variables
docker compose exec rag-api env | grep OLLAMA
```
### 3.3 Development vs Production
```bash
# Development mode (if docker-compose.dev.yml exists)
docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d
# Production mode (default)
docker compose --env-file docker.env up -d
```
---
## 4. Logging & Monitoring
### 4.1 Log Management
```bash
# View all logs
docker compose logs
# View specific service logs
docker compose logs frontend
docker compose logs backend
docker compose logs rag-api
# Follow logs in real-time
docker compose logs -f
# View last N lines
docker compose logs --tail=100
# View logs with timestamps
docker compose logs -t
# Save logs to file
docker compose logs > system.log 2>&1
# View logs since specific time
docker compose logs --since=2h
docker compose logs --since=2025-01-01T00:00:00
```
### 4.2 System Monitoring
```bash
# Monitor resource usage
docker stats
# Monitor specific containers
docker stats rag-frontend rag-backend rag-api
# Check container health
docker compose ps
# System information
docker system info
docker system df
```
---
## 5. Ollama Integration
### 5.1 Ollama Setup
```bash
# Install Ollama (one-time setup)
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama server
ollama serve
# Check Ollama status
curl http://localhost:11434/api/tags
# Install models
ollama pull qwen3:0.6b # Fast model
ollama pull qwen3:8b # High-quality model
# List installed models
ollama list
```
### 5.2 Ollama Management
```bash
# Check model status from container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
# Test Ollama connection
curl -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{"model": "qwen3:0.6b", "prompt": "Hello", "stream": false}'
# Monitor Ollama logs (if running with logs)
# Ollama logs appear in the terminal where you ran 'ollama serve'
```
### 5.3 Model Management
```bash
# Update models
ollama pull qwen3:0.6b
ollama pull qwen3:8b
# Remove unused models
ollama rm old-model-name
# Check model information
ollama show qwen3:0.6b
```
---
## 6. Data Management
### 6.1 Volume Management
```bash
# List volumes
docker volume ls
# View volume usage
docker system df -v
# Backup volumes
docker run --rm -v rag_system_old_lancedb:/data -v $(pwd)/backup:/backup alpine tar czf /backup/lancedb_backup.tar.gz -C /data .
# Clean unused volumes
docker volume prune
```
### 6.2 Database Management
```bash
# Access SQLite database
docker compose exec backend sqlite3 /app/backend/chat_data.db
# Backup database
cp backend/chat_data.db backup/chat_data_$(date +%Y%m%d).db
# Check LanceDB tables from container
docker compose exec rag-api python -c "
import lancedb
db = lancedb.connect('/app/lancedb')
print('Tables:', db.table_names())
"
```
### 6.3 File Management
```bash
# Access shared files
docker compose exec rag-api ls -la /app/shared_uploads
# Copy files to/from containers
docker cp local_file.pdf rag-api:/app/shared_uploads/
docker cp rag-api:/app/shared_uploads/file.pdf ./local_file.pdf
# Check disk usage
docker compose exec rag-api df -h
```
---
## 7. Troubleshooting
### 7.1 Common Issues
#### Container Won't Start
```bash
# Check Docker daemon
docker version
# Check for port conflicts
lsof -i :3000 -i :8000 -i :8001
# Check container logs
docker compose logs [service-name]
# Restart Docker Desktop
# macOS/Windows: Restart Docker Desktop
# Linux: sudo systemctl restart docker
```
#### Ollama Connection Issues
```bash
# Check Ollama is running
curl http://localhost:11434/api/tags
# Restart Ollama
pkill ollama
ollama serve
# Check from container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
```
#### Performance Issues
```bash
# Check resource usage
docker stats
# Increase Docker memory (Docker Desktop Settings)
# Recommended: 8GB+ for Docker
# Check container health
docker compose ps
```
### 7.2 Reset and Clean
```bash
# Stop everything
./start-docker.sh stop
# Clean containers and images
docker system prune -a
# Clean volumes (⚠️ deletes data)
docker volume prune
# Complete reset (⚠️ deletes everything)
docker compose down -v
docker system prune -a --volumes
```
### 7.3 Health Checks
```bash
# Comprehensive health check
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
# Check all container status
docker compose ps
# Test model loading
docker compose exec rag-api python -c "
from rag_system.main import get_agent
agent = get_agent('default')
print('✅ RAG System initialized successfully')
"
```
---
## 8. Advanced Usage
### 8.1 Production Deployment
```bash
# Use production environment
export NODE_ENV=production
# Start with resource limits
docker compose --env-file docker.env up -d
# Enable automatic restarts
docker update --restart unless-stopped $(docker ps -q)
```
### 8.2 Scaling
```bash
# Scale specific services
docker compose up -d --scale backend=2 --scale rag-api=2
# Use Docker Swarm for clustering
docker swarm init
docker stack deploy -c docker-compose.yml rag-system
```
### 8.3 Security
```bash
# Scan images for vulnerabilities
docker scout cves rag-frontend
docker scout cves rag-backend
docker scout cves rag-api
# Update base images
docker compose build --no-cache --pull
```
---
## 9. Configuration
### 9.1 Environment Variables
The system uses `docker.env` for configuration:
```bash
# Ollama configuration
OLLAMA_HOST=http://host.docker.internal:11434
# Service configuration
NODE_ENV=production
RAG_API_URL=http://rag-api:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
```
### 9.2 Custom Configuration
```bash
# Create custom environment file
cp docker.env docker.custom.env
# Edit custom configuration
nano docker.custom.env
# Use custom configuration
docker compose --env-file docker.custom.env up -d
```
---
## 10. Success Checklist
Your Docker deployment is successful when:
- ✅ All containers are running: `docker compose ps`
- ✅ Ollama is accessible: `curl http://localhost:11434/api/tags`
- ✅ Frontend loads: `curl http://localhost:3000`
- ✅ Backend responds: `curl http://localhost:8000/health`
- ✅ RAG API works: `curl http://localhost:8001/models`
- ✅ You can create indexes and chat with documents
### Performance Expectations
**Acceptable Performance:**
- Container startup: < 2 minutes
- Memory usage: < 4GB Docker containers + Ollama
- Response time: < 30 seconds for complex queries
**Optimal Performance:**
- Container startup: < 1 minute
- Memory usage: < 2GB Docker containers + Ollama
- Response time: < 10 seconds for complex queries
---
**Happy Containerizing! 🐳**

View File

@ -0,0 +1,87 @@
# RAG System Improvement Road-map
_Revision: 2025-07-05_
This document captures high-impact enhancements identified during the July 2025 code review. Items are grouped by theme and include a short rationale plus suggested implementation notes. **No code has been changed; this file is planning only.**
---
## 1. Retrieval Accuracy & Speed
| ID | Item | Rationale | Notes |
|----|------|-----------|-------|
| 1.1 | Late-chunk result merging | Returned snippets can be single late-chunks → fragmented. | After retrieval, gather sibling chunks (±1) and concatenate before reranking / display. |
| 1.2 | Tiered retrieval (ANN pre-filter) | Large indexes → LanceDB full scan can be slow. | Use in-memory FAISS/HNSW to narrow to top-N, then exact LanceDB search. |
| 1.3 | Dynamic fusion weights | Different corpora favour dense vs BM25 differently. | Learn weight on small validation set; store in index `metadata`. |
| 1.4 | Query expansion via KG | Use extracted entities to enrich queries. | Requires Graph-RAG path clean-up first. |
## 2. Routing / Triage
| ID | Item | Rationale |
|----|------|-----------|
| 2.1 | Embed + cache document overviews | LLM router costs tokens; cosine-similarity pre-check is cheaper. |
| 2.2 | Session-level routing memo | Avoid repeated LLM triage for follow-up queries. |
| 2.3 | Remove legacy pattern rules | Simplifies maintenance once overview & ML routing mature. |
## 3. Indexing Pipeline
| ID | Item | Rationale |
|----|------|-----------|
| 3.1 | Parallel document conversion | PDF→MD + chunking is serial today; speed gains possible. |
| 3.2 | Incremental indexing | Re-embedding whole corpus wastes time. |
| 3.3 | Auto GPU dtype selection | Use FP16 on CUDA / MPS for memory and speed. |
| 3.4 | Post-build health check | Catch broken indexes (dim mismatch etc.) early. |
## 4. Embedding Model Management
* **Registry file** mapping tag → dims/source/license. UI & backend validate against it.
* **Embedder pool** caches loaded HF/Ollama weights per model to save RAM.
## 5. Database & Storage
* LanceDB table GC for orphaned tables.
* Scheduled SQLite `VACUUM` when fragmentation > X %.
## 6. Observability & Ops
* JSON structured logging.
* `/metrics` endpoint for Prometheus.
* Deep health-probe (`/health/deep`) exercising end-to-end query.
## 7. Front-end UX
* SSE-driven progress bar for indexing.
* Matched-term highlighting in retrieved snippets.
* Preset buttons (Fast / Balanced / High-Recall) for retrieval settings.
## 8. Testing & CI
* Replace deleted BM25 tests with LanceDB hybrid tests.
* Integration test: build → query → assert ≥1 doc.
* GitHub Action that spins up Ollama, pulls small embedding model, runs smoke test.
## 9. Codebase Hygiene
* Graph-RAG integration (currently disabled, can be implemented if needed).
* Consolidate duplicate config keys (`embedding_model_name`, etc.).
* Run `mypy --strict`, pylint, and black in CI.
---
### 🧹 System Cleanup (Priority: **HIGH**)
Reduce complexity and improve maintainability.
* **✅ COMPLETED**: Remove experimental DSPy integration and unused modules (35+ files removed)
* **✅ COMPLETED**: Clean up duplicate or obsolete documentation files
* **✅ COMPLETED**: Remove unused import statements and dependencies
* **✅ COMPLETED**: Consolidate similar configuration files
* **✅ COMPLETED**: Remove broken or non-functional ReAct agent implementation
### Priority Matrix (suggested order)
1. **Critical reliability**: 3.4, 5.1, 9.2
2. **User-visible wins**: 1.1, 7.1, 7.2
3. **Performance**: 1.2, 3.1, 3.3
4. **Long-term maintainability**: 2.3, 9.1, 9.3
Feel free to rearrange based on team objectives and resource availability.

View File

@ -0,0 +1,665 @@
# 🗂️ Indexing Pipeline
_Implementation entry-point: `rag_system/pipelines/indexing_pipeline.py` + helpers in `indexing/` & `ingestion/`._
## Overview
Transforms raw documents (PDF, TXT, etc.) into search-ready **chunks** with embeddings, storing them in LanceDB and generating auxiliary assets (overviews, context summaries).
## High-Level Diagram
```mermaid
flowchart TD
A["Uploaded Files"] --> B{Converter}
B -->|PDF→text| C["Plain Text"]
C --> D{Chunker}
D -->|docling| D1[DocLing Chunking]
D -->|latechunk| D2[Late Chunking]
D -->|standard| D3[Fixed-size]
D1 & D2 & D3 --> E["Contextual Enricher"]
E -->|local ctx summary| F["Embedding Generator"]
F -->|vectors| G[(LanceDB Table)]
E --> H["Overview Builder"]
H -->|JSONL| OVR[[`index_store/overviews/<idx>.jsonl`]]
```
## Steps in Detail
| Step | Module | Key Classes | Notes |
|------|--------|------------|-------|
| Conversion | `ingestion/pdf_converter.py` | `PDFConverter` | Uses `Docling` library to extract text with structure preservation. |
| Chunking | `ingestion/chunking.py`, `indexing/latechunk.py`, `ingestion/docling_chunker.py` | `MarkdownRecursiveChunker`, `DoclingChunker` | Controlled by flags `latechunk`, `doclingChunk`, `chunkSize`, `chunkOverlap`. |
| Contextual Enrichment | `indexing/contextualizer.py` | `ContextualEnricher` | Generates per-chunk summaries (LLM call). |
| Embedding | `indexing/embedders.py`, `indexing/representations.py` | `QwenEmbedder`, `EmbeddingGenerator` | Batch size tunable (`batchSizeEmbed`). Uses Qwen3-Embedding models. |
| LanceDB Ingest | `index_store/lancedb/…` | | Each index has a dedicated table `text_pages_<index_id>`. |
| Overview | `indexing/overview_builder.py` | `OverviewBuilder` | First-N chunks summarised for triage routing. |
### Control Flow (Code)
1. **backend/server.py → handle_build_index()** collects the files and options and POSTs them to the `/index` endpoint of the advanced RAG API (local process).
2. **indexing_pipeline.IndexingPipeline.run()** orchestrates conversion → chunking → enrichment → embedding → storage.
3. Metadata (chunk_size, models, etc.) is stored in the SQLite `indexes` table.
## Configuration Flags
| Flag | Description | Default |
|------|-------------|---------|
| `latechunk` | Merge k adjacent sibling chunks at query time | false |
| `doclingChunk` | Use DocLing structural chunking | false |
| `chunkSize` / `chunkOverlap` | Standard fixed slicing | 512 / 64 |
| `enableEnrich` | Run contextual summaries | true |
| `embeddingModel` | Override embedder | `Qwen/Qwen3-Embedding-0.6B` |
| `overviewModel` | Model used in `OverviewBuilder` | `qwen3:0.6b` |
| `batchSizeEmbed / Enrich` | Batch sizes | 50 / 25 |
## Error Handling
* Duplicate LanceDB table ➟ now idempotent (commit `af99b38`).
* Failed PDF parse ➟ chunker skips file, logs warning.
## Extension Ideas
* Add OCR layer before PDF conversion.
* Store embeddings in Remote LanceDB instance (update URL in config).
## Detailed Implementation Analysis
### Pipeline Architecture Pattern
The `IndexingPipeline` uses a **sequential processing pattern** with parallel batch operations. Each stage processes all documents before moving to the next stage, enabling efficient memory usage and progress tracking.
```python
def run(self, file_paths: List[str]):
with timer("Complete Indexing Pipeline"):
# Stage 1: Document Processing & Chunking
all_chunks = []
doc_chunks_map = {}
# Stage 2: Contextual Enrichment (optional)
if self.contextual_enricher:
all_chunks = self.contextual_enricher.enrich_batch(all_chunks)
# Stage 3: Dense Indexing (embedding + storage)
if self.vector_indexer:
self.vector_indexer.index_chunks(all_chunks, table_name)
# Stage 4: Graph Extraction (optional)
if self.graph_extractor:
self.graph_extractor.extract_and_store(all_chunks)
```
### Document Processing Deep-Dive
#### PDF Conversion Strategy
```python
# PDFConverter uses Docling for robust text extraction with structure
def convert_to_markdown(self, file_path: str) -> List[Tuple[str, Dict, Any]]:
# Quick heuristic: if PDF has text layer, skip OCR for speed
use_ocr = not self._pdf_has_text(file_path)
converter = self.converter_ocr if use_ocr else self.converter_no_ocr
result = converter.convert(file_path)
markdown_content = result.document.export_to_markdown()
metadata = {"source": file_path}
# Return DoclingDocument object for advanced chunkers
return [(markdown_content, metadata, result.document)]
```
**Benefits**:
- Preserves document structure (headings, lists, tables)
- Automatic OCR fallback for image-based PDFs
- Maintains page-level metadata for source attribution
- Structured output supports advanced chunking strategies
#### Chunking Strategy Selection
```python
# Dynamic chunker selection based on config
chunker_mode = config.get("chunker_mode", "legacy")
if chunker_mode == "docling":
self.chunker = DoclingChunker(
max_tokens=chunk_size,
overlap=overlap_sentences,
tokenizer_model="Qwen/Qwen3-Embedding-0.6B"
)
else:
self.chunker = MarkdownRecursiveChunker(
max_chunk_size=chunk_size,
min_chunk_size=min(chunk_overlap, chunk_size // 4)
)
```
#### Recursive Markdown Chunking Algorithm
```python
def chunk(self, text: str, document_id: str, metadata: Dict) -> List[Dict]:
    # Priority hierarchy for splitting (structural separators first, words last)
    separators = [
        "\n\n# ",   # H1 headers (highest priority)
        "\n\n## ",  # H2 headers
        "\n\n### ", # H3 headers
        "\n\n",     # Paragraph breaks
        "\n",       # Line breaks
        ". ",       # Sentence boundaries
        " "         # Word boundaries (last resort)
    ]

    def split_recursive(segment: str, level: int) -> List[str]:
        # Base case: segment fits, or there is no finer separator left to try
        if len(segment) <= self.max_chunk_size or level >= len(separators):
            return [segment]
        # Split on the current separator and recurse into oversized parts
        pieces = []
        for part in segment.split(separators[level]):
            if part.strip():
                pieces.extend(split_recursive(part, level + 1))
        return pieces

    chunks = []
    for piece in split_recursive(text, 0):
        # Add overlap from the previous chunk for context continuity
        if chunks and self.chunk_overlap:
            piece = chunks[-1]["text"][-self.chunk_overlap:] + " " + piece
        chunks.append({
            "text": piece,
            "document_id": document_id,
            "metadata": {**metadata, "chunk_index": len(chunks)}
        })
    return chunks
```
### DocLing Chunking Implementation
#### Token-Aware Sentence Packing
```python
class DoclingChunker:
def __init__(self, max_tokens: int = 512, overlap: int = 1,
tokenizer_model: str = "Qwen/Qwen3-Embedding-0.6B"):
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_model)
self.max_tokens = max_tokens
self.overlap = overlap # sentences of overlap
def split_markdown(self, markdown: str, document_id: str, metadata: Dict):
sentences = self._sentence_split(markdown)
chunks = []
window = []
while sentences:
# Add sentences until token limit
while (sentences and
self._token_len(" ".join(window + [sentences[0]])) <= self.max_tokens):
window.append(sentences.pop(0))
if not window: # Single sentence > limit
window.append(sentences.pop(0))
# Create chunk
chunk_text = " ".join(window)
chunks.append({
"chunk_id": f"{document_id}_{len(chunks)}",
"text": chunk_text,
"metadata": {
**metadata,
"chunk_index": len(chunks),
"heading_path": metadata.get("heading_path", []),
"block_type": metadata.get("block_type", "paragraph")
}
})
# Add overlap for next chunk
if self.overlap and sentences:
overlap_sentences = window[-self.overlap:]
sentences = overlap_sentences + sentences
window = []
return chunks
```
#### Document Structure Preservation
```python
def chunk_document(self, doc, document_id: str, metadata: Dict):
"""Walk DoclingDocument tree and emit structured chunks."""
chunks = []
current_heading_path = []
buffer = []
# Process document elements in reading order
for txt_item in doc.texts:
role = getattr(txt_item, "role", None)
if role == "heading":
self._flush_buffer(buffer, chunks, current_heading_path)
level = getattr(txt_item, "level", 1)
# Update heading hierarchy
current_heading_path = current_heading_path[:level-1]
current_heading_path.append(txt_item.text.strip())
continue
# Accumulate text in token-aware buffer
text_piece = txt_item.text
if self._buffer_would_exceed_limit(buffer, text_piece):
self._flush_buffer(buffer, chunks, current_heading_path)
buffer.append(text_piece)
self._flush_buffer(buffer, chunks, current_heading_path)
return chunks
```
### Contextual Enrichment Implementation
#### Batch Processing Pattern
```python
class ContextualEnricher:
def enrich_batch(self, chunks: List[Dict]) -> List[Dict]:
enriched_chunks = []
# Process in batches to manage memory
for i in range(0, len(chunks), self.batch_size):
batch = chunks[i:i + self.batch_size]
# Parallel enrichment within batch
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
futures = [
executor.submit(self._enrich_single_chunk, chunk)
for chunk in batch
]
for future in concurrent.futures.as_completed(futures):
enriched_chunks.append(future.result())
return enriched_chunks
```
#### Contextual Prompt Engineering
```python
def _generate_context_summary(self, chunk_text: str, surrounding_context: str) -> str:
prompt = f"""
Analyze this text chunk and provide a concise summary that captures:
1. Main topics and key information
2. Context within the broader document
3. Relevance for search and retrieval
Document Context:
{surrounding_context}
Chunk to Analyze:
{chunk_text}
Summary (max 2 sentences):
"""
response = self.llm_client.complete(
prompt=prompt,
model=self.ollama_config["enrichment_model"] # qwen3:0.6b
)
return response.strip()
```
### Embedding Generation Pipeline
#### Model Selection Strategy
```python
def select_embedder(model_name: str, ollama_host: str = None):
"""Select appropriate embedder based on model name."""
if "Qwen3-Embedding" in model_name:
return QwenEmbedder(model_name=model_name)
elif "bge-" in model_name:
return BGEEmbedder(model_name=model_name)
elif ollama_host and model_name in ["nomic-embed-text"]:
return OllamaEmbedder(model_name=model_name, host=ollama_host)
else:
# Default to Qwen embedder
return QwenEmbedder(model_name="Qwen/Qwen3-Embedding-0.6B")
```
#### Batch Embedding Generation
```python
class QwenEmbedder:
def create_embeddings(self, texts: List[str]) -> np.ndarray:
"""Generate embeddings in batches for efficiency."""
embeddings = []
for i in range(0, len(texts), self.batch_size):
batch = texts[i:i + self.batch_size]
# Tokenize and encode
inputs = self.tokenizer(
batch,
padding=True,
truncation=True,
max_length=512,
return_tensors='pt'
)
with torch.no_grad():
outputs = self.model(**inputs)
# Mean pooling over token embeddings
batch_embeddings = outputs.last_hidden_state.mean(dim=1)
embeddings.append(batch_embeddings.cpu().numpy())
return np.vstack(embeddings)
```
### LanceDB Storage Implementation
#### Table Management Strategy
```python
class LanceDBManager:
def create_table_if_not_exists(self, table_name: str, schema: Schema):
"""Create LanceDB table with proper schema."""
try:
table = self.db.open_table(table_name)
print(f"Table {table_name} already exists")
return table
except FileNotFoundError:
# Table doesn't exist, create it
table = self.db.create_table(
table_name,
schema=schema,
mode="create"
)
print(f"Created new table: {table_name}")
return table
def index_chunks(self, chunks: List[Dict], table_name: str):
"""Store chunks with embeddings in LanceDB."""
table = self.get_table(table_name)
# Prepare data for insertion
records = []
for chunk in chunks:
record = {
"chunk_id": chunk["chunk_id"],
"text": chunk["text"],
"vector": chunk["embedding"].tolist(),
"metadata": json.dumps(chunk["metadata"]),
"document_id": chunk["metadata"]["document_id"],
"chunk_index": chunk["metadata"]["chunk_index"]
}
records.append(record)
# Batch insert
table.add(records)
# Create vector index for fast similarity search
table.create_index("vector", config=IvfPq(num_partitions=256))
```
### Overview Building for Query Routing
#### Document Summarization Strategy
```python
class OverviewBuilder:
def build_overview(self, chunks: List[Dict], document_id: str) -> Dict:
"""Generate document overview for query routing."""
# Take first N chunks for overview (usually most important)
sample_chunks = chunks[:self.max_chunks_for_overview]
combined_text = "\n\n".join([c["text"] for c in sample_chunks])
overview_prompt = f"""
Analyze this document and create a brief overview that includes:
1. Main topic and purpose
2. Key themes and concepts
3. Document type and domain
4. Relevant search keywords
Document text:
{combined_text}
Overview (max 3 sentences):
"""
overview = self.llm_client.complete(
prompt=overview_prompt,
model=self.overview_model # qwen3:0.6b for speed
)
return {
"document_id": document_id,
"overview": overview.strip(),
"chunk_count": len(chunks),
"keywords": self._extract_keywords(combined_text),
"created_at": datetime.now().isoformat()
}
def save_overview(self, overview: Dict):
"""Save overview to JSONL file for query routing."""
overview_path = f"./index_store/overviews/{overview['document_id']}.jsonl"
with open(overview_path, 'w') as f:
json.dump(overview, f)
```
### Performance Optimizations
#### Memory Management
```python
class IndexingPipeline:
def __init__(self, config: Dict, ollama_client: OllamaClient, ollama_config: Dict):
# Lazy initialization to save memory
self._pdf_converter = None
self._chunker = None
self._embedder = None
def _get_embedder(self):
"""Lazy load embedder to avoid memory overhead."""
if self._embedder is None:
model_name = self.config.get("embedding_model_name", "Qwen/Qwen3-Embedding-0.6B")
self._embedder = select_embedder(model_name)
return self._embedder
def process_document_batch(self, file_paths: List[str]):
"""Process documents in batches to manage memory."""
for batch_start in range(0, len(file_paths), self.batch_size):
batch = file_paths[batch_start:batch_start + self.batch_size]
# Process batch
self._process_batch(batch)
# Cleanup to free memory
if hasattr(self, '_embedder') and self._embedder:
self._embedder.cleanup()
```
#### Parallel Processing
```python
def run_parallel_processing(self, file_paths: List[str]):
"""Process multiple documents in parallel."""
with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
futures = []
for file_path in file_paths:
future = executor.submit(self._process_single_file, file_path)
futures.append(future)
# Collect results
results = []
for future in concurrent.futures.as_completed(futures):
try:
result = future.result(timeout=300) # 5 minute timeout
results.append(result)
except Exception as e:
print(f"Error processing file: {e}")
return results
```
### Error Handling and Recovery
#### Graceful Degradation
```python
def run(self, file_paths: List[str], table_name: str):
"""Main pipeline with comprehensive error handling."""
processed_files = []
failed_files = []
for file_path in file_paths:
try:
# Attempt processing
chunks = self._process_single_file(file_path)
if chunks:
# Store successfully processed chunks
self._store_chunks(chunks, table_name)
processed_files.append(file_path)
else:
print(f"⚠️ No chunks generated from {file_path}")
failed_files.append((file_path, "No chunks generated"))
except Exception as e:
print(f"❌ Error processing {file_path}: {e}")
failed_files.append((file_path, str(e)))
continue # Continue with other files
# Return summary
return {
"processed": len(processed_files),
"failed": len(failed_files),
"processed_files": processed_files,
"failed_files": failed_files
}
```
#### Recovery Mechanisms
```python
def recover_from_partial_failure(self, table_name: str, document_id: str):
"""Recover from partial indexing failures."""
try:
# Check what was already processed
table = self.db_manager.get_table(table_name)
existing_chunks = table.search().where(f"document_id = '{document_id}'").to_list()
if existing_chunks:
print(f"Found {len(existing_chunks)} existing chunks for {document_id}")
return True
# Cleanup partial data
self._cleanup_partial_data(table_name, document_id)
return False
except Exception as e:
print(f"Recovery failed: {e}")
return False
```
### Configuration and Customization
#### Pipeline Configuration Options
```python
DEFAULT_CONFIG = {
"chunking": {
"strategy": "docling", # "docling", "recursive", "fixed"
"max_tokens": 512,
"overlap": 64,
"min_chunk_size": 100
},
"embedding": {
"model_name": "Qwen/Qwen3-Embedding-0.6B",
"batch_size": 32,
"max_length": 512
},
"enrichment": {
"enabled": True,
"model": "qwen3:0.6b",
"batch_size": 16
},
"overview": {
"enabled": True,
"max_chunks": 5,
"model": "qwen3:0.6b"
},
"storage": {
"create_index": True,
"index_type": "IvfPq",
"num_partitions": 256
}
}
```
#### Custom Processing Hooks
```python
class IndexingPipeline:
def __init__(self, config: Dict, hooks: Dict = None):
self.hooks = hooks or {}
def _run_hook(self, hook_name: str, *args, **kwargs):
"""Execute custom processing hooks."""
if hook_name in self.hooks:
return self.hooks[hook_name](*args, **kwargs)
return None
def process_chunk(self, chunk: Dict) -> Dict:
"""Process single chunk with custom hooks."""
# Pre-processing hook
chunk = self._run_hook("pre_chunk_process", chunk) or chunk
# Standard processing
if self.contextual_enricher:
chunk = self.contextual_enricher.enrich_chunk(chunk)
# Post-processing hook
chunk = self._run_hook("post_chunk_process", chunk) or chunk
return chunk
```
---
## Current Implementation Status
### Completed Features ✅
- DocLing-based PDF processing with OCR fallback
- Multiple chunking strategies (DocLing, Recursive, Fixed-size)
- Qwen3-Embedding-0.6B integration
- Contextual enrichment with qwen3:0.6b
- LanceDB storage with vector indexing
- Overview generation for query routing
- Batch processing and parallel execution
- Comprehensive error handling
### In Development 🚧
- Graph extraction and knowledge graph building
- Multimodal processing for images and tables
- Advanced late-chunking optimization
- Distributed processing support
### Planned Features 📋
- Custom model fine-tuning pipeline
- Real-time incremental indexing
- Cross-document relationship extraction
- Advanced metadata enrichment
---
## Performance Benchmarks
| Document Type | Processing Speed | Memory Usage | Storage Efficiency |
|---------------|------------------|--------------|-------------------|
| Text PDFs | 2-5 pages/sec | 2-4GB | 1MB/100 pages |
| Image PDFs | 0.5-1 page/sec | 4-8GB | 2MB/100 pages |
| Technical Docs | 1-3 pages/sec | 3-6GB | 1.5MB/100 pages |
| Research Papers | 2-4 pages/sec | 2-4GB | 1.2MB/100 pages |
## Extension Points
### Custom Chunkers
```python
class CustomChunker(BaseChunker):
def chunk(self, text: str, document_id: str, metadata: Dict) -> List[Dict]:
# Implement custom chunking logic
pass
```
### Custom Embedders
```python
class CustomEmbedder(BaseEmbedder):
def create_embeddings(self, texts: List[str]) -> np.ndarray:
# Implement custom embedding generation
pass
```
### Custom Enrichers
```python
class CustomEnricher(BaseEnricher):
def enrich_chunk(self, chunk: Dict) -> Dict:
# Implement custom enrichment logic
pass
```
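As a concrete, hypothetical illustration of the first extension point, a chunker that packs a fixed number of sentences per chunk might look like the sketch below. `BaseChunker` is assumed to be the interface referenced above, and the regex-based sentence splitter is intentionally simplistic.

```python
import re
from typing import Dict, List

class SentenceWindowChunker(BaseChunker):
    """Hypothetical chunker: groups a fixed number of sentences per chunk."""

    def __init__(self, sentences_per_chunk: int = 5):
        self.sentences_per_chunk = sentences_per_chunk

    def chunk(self, text: str, document_id: str, metadata: Dict) -> List[Dict]:
        # Naive sentence split on ., ! or ? followed by whitespace
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        chunks = []
        for i in range(0, len(sentences), self.sentences_per_chunk):
            chunk_text = " ".join(sentences[i:i + self.sentences_per_chunk])
            chunks.append({
                "text": chunk_text,
                "document_id": document_id,
                "metadata": {**metadata, "chunk_index": len(chunks)},
            })
        return chunks
```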

View File

@ -0,0 +1,542 @@
# 📦 RAG System Installation Guide
_Last updated: 2025-01-07_
This guide provides step-by-step instructions for installing and setting up the RAG system using either Docker or direct development approaches.
---
## 🎯 Installation Options
### Option 1: Docker Deployment (Production Ready) 🐳
- **Best for**: Production environments, isolated setups, easy management
- **Requirements**: Docker Desktop + Local Ollama
- **Setup time**: ~10 minutes
### Option 2: Direct Development (Developer Friendly) 💻
- **Best for**: Development, customization, debugging
- **Requirements**: Python + Node.js + Ollama
- **Setup time**: ~15 minutes
---
## 1. Prerequisites
### 1.1 System Requirements
#### **Minimum Requirements**
- **CPU**: 4 cores, 2.5GHz+
- **RAM**: 8GB (16GB recommended)
- **Storage**: 50GB free space
- **OS**: macOS 10.15+, Ubuntu 20.04+, Windows 10+
#### **Recommended Requirements**
- **CPU**: 8+ cores, 3.0GHz+
- **RAM**: 32GB+ (for large models)
- **Storage**: 200GB+ SSD
- **GPU**: NVIDIA GPU with 8GB+ VRAM (optional)
### 1.2 Common Dependencies
**Required for both approaches:**
- **Ollama**: AI model runtime (always required)
- **Git**: 2.30+ for cloning repository
**Docker-specific:**
- **Docker Desktop**: 24.0+ with Docker Compose
**Direct Development-specific:**
- **Python**: 3.8+
- **Node.js**: 16+ with npm
---
## 2. Ollama Installation (Required for Both)
### 2.1 Install Ollama
#### **macOS/Linux:**
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Verify installation
ollama --version
```
#### **Windows:**
```bash
# Download from: https://ollama.ai/download
# Run the installer and follow setup wizard
```
### 2.2 Configure Ollama
```bash
# Start Ollama server
ollama serve
# In another terminal, install required models
ollama pull qwen3:0.6b # Fast model (650MB)
ollama pull qwen3:8b # High-quality model (4.7GB)
# Verify models are installed
ollama list
# Test Ollama
ollama run qwen3:0.6b "Hello, how are you?"
```
**⚠️ Important**: Keep Ollama running (`ollama serve`) for the entire setup process.
---
## 3. 🐳 Docker Installation & Setup
### 3.1 Install Docker
#### **macOS:**
```bash
# Install Docker Desktop via Homebrew
brew install --cask docker
# Or download from: https://www.docker.com/products/docker-desktop/
# Start Docker Desktop from Applications
# Verify installation
docker --version
docker compose version
```
#### **Ubuntu/Debian:**
```bash
# Update system
sudo apt-get update
# Install Docker using convenience script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Add user to docker group
sudo usermod -aG docker $USER
newgrp docker
# Install Docker Compose V2
sudo apt-get install docker-compose-plugin
# Verify installation
docker --version
docker compose version
```
#### **Windows:**
1. Download Docker Desktop from https://www.docker.com/products/docker-desktop/
2. Run installer and enable WSL 2 integration
3. Restart computer and start Docker Desktop
4. Verify in PowerShell: `docker --version`
### 3.2 Clone and Setup RAG System
```bash
# Clone repository
git clone <your-repository-url>
cd rag_system_old
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Start Docker containers
./start-docker.sh
# Wait for containers to start (2-3 minutes)
sleep 120
# Verify deployment
./start-docker.sh status
```
### 3.3 Test Docker Deployment
```bash
# Test all endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
# Access the application
open http://localhost:3000
```
---
## 4. 💻 Direct Development Setup
### 4.1 Install Development Dependencies
#### **Python Setup:**
```bash
# Clone repository
git clone https://github.com/your-org/rag-system.git
cd rag-system
# Create virtual environment (recommended)
python -m venv venv
# Activate virtual environment
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Install Python dependencies
pip install -r requirements.txt
# Verify Python setup
python -c "import torch; print('✅ PyTorch OK')"
python -c "import transformers; print('✅ Transformers OK')"
python -c "import lancedb; print('✅ LanceDB OK')"
```
#### **Node.js Setup:**
```bash
# Install Node.js dependencies
npm install
# Verify Node.js setup
node --version # Should be 16+
npm --version
npm list --depth=0
```
### 4.2 Start Direct Development
```bash
# Ensure Ollama is running
curl http://localhost:11434/api/tags
# Start all components with one command
python run_system.py
# Or start components manually in separate terminals:
# Terminal 1: python -m rag_system.api_server
# Terminal 2: cd backend && python server.py
# Terminal 3: npm run dev
```
### 4.3 Test Direct Development
```bash
# Check system health
python system_health_check.py
# Test endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
# Access the application
open http://localhost:3000
```
---
## 5. Detailed Installation Steps
### 5.1 Repository Setup
```bash
# Clone repository
git clone https://github.com/your-org/rag-system.git
cd rag-system
# Check repository structure
ls -la
# Create required directories
mkdir -p lancedb index_store shared_uploads logs backend
touch backend/chat_data.db
# Set permissions
chmod -R 755 lancedb index_store shared_uploads
chmod 664 backend/chat_data.db
```
### 5.2 Configuration
#### **Environment Variables**
For Docker (automatic via `docker.env`):
```bash
OLLAMA_HOST=http://host.docker.internal:11434
NODE_ENV=production
RAG_API_URL=http://rag-api:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
```
For Direct Development (set automatically by `run_system.py`):
```bash
OLLAMA_HOST=http://localhost:11434
RAG_API_URL=http://localhost:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
```
#### **Model Configuration**
The system defaults to these models:
- **Embedding**: `Qwen/Qwen3-Embedding-0.6B` (1024 dimensions)
- **Generation**: `qwen3:0.6b` for fast responses, `qwen3:8b` for quality
- **Reranking**: Built-in cross-encoder
### 5.3 Database Initialization
```bash
# Initialize SQLite database
python -c "
from backend.database import ChatDatabase
db = ChatDatabase()
db.init_database()
print('✅ Database initialized')
"
# Verify database
sqlite3 backend/chat_data.db ".tables"
```
---
## 6. Verification & Testing
### 6.1 System Health Checks
#### **Comprehensive Health Check:**
```bash
# For Docker deployment
./start-docker.sh status
docker compose ps
# For Direct development
python system_health_check.py
# Universal health check
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
```
#### **RAG System Test:**
```bash
# Test RAG system initialization
python -c "
from rag_system.main import get_agent
agent = get_agent('default')
print('✅ RAG System initialized successfully')
"
# Test embedding generation
python -c "
from rag_system.main import get_agent
agent = get_agent('default')
embedder = agent.retrieval_pipeline._get_text_embedder()
test_emb = embedder.create_embeddings(['Hello world'])
print(f'✅ Embedding generated: {test_emb.shape}')
"
```
### 6.2 Functional Testing
#### **Document Upload Test:**
1. Access http://localhost:3000
2. Click "Create New Index"
3. Upload a PDF document
4. Configure settings and build index
5. Test chat functionality
#### **API Testing:**
```bash
# Test session creation
curl -X POST http://localhost:8000/sessions \
-H "Content-Type: application/json" \
-d '{"title": "Test Session"}'
# Test models endpoint
curl http://localhost:8001/models
# Test health endpoints
curl http://localhost:8000/health
curl http://localhost:8001/health
```
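If you prefer scripting these checks, a small Python sketch using the `requests` library hits the same endpoints; the paths match the curl examples above, but the response payload shapes are not assumed here beyond the HTTP status code.
```python
import requests

BASE_BACKEND = "http://localhost:8000"
BASE_RAG = "http://localhost:8001"

def check(name: str, method: str, url: str, **kwargs) -> None:
    """Call an endpoint and report its HTTP status."""
    resp = requests.request(method, url, timeout=30, **kwargs)
    print(f"{name}: HTTP {resp.status_code}")
    resp.raise_for_status()

# Session creation (same payload as the curl example above)
check("Create session", "POST", f"{BASE_BACKEND}/sessions",
      json={"title": "Test Session"})

# Models and health endpoints
check("RAG models", "GET", f"{BASE_RAG}/models")
check("Backend health", "GET", f"{BASE_BACKEND}/health")
check("RAG health", "GET", f"{BASE_RAG}/health")
```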
---
## 7. Troubleshooting Installation
### 7.1 Common Issues
#### **Ollama Issues:**
```bash
# Ollama not responding
curl http://localhost:11434/api/tags
# If fails, restart Ollama
pkill ollama
ollama serve
# Reinstall models if needed
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
#### **Docker Issues:**
```bash
# Docker daemon not running
docker version
# Restart Docker Desktop (macOS/Windows)
# Or restart docker service (Linux)
sudo systemctl restart docker
# Clear Docker cache if build fails
docker system prune -f
```
#### **Python Issues:**
```bash
# Check Python version
python --version # Should be 3.8+
# Check virtual environment
which python
pip list | grep torch
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
```
#### **Node.js Issues:**
```bash
# Check Node version
node --version # Should be 16+
# Clear and reinstall
rm -rf node_modules package-lock.json
npm install
```
### 7.2 Performance Issues
#### **Memory Problems:**
```bash
# Check system memory
free -h # Linux
vm_stat # macOS
# For Docker: Increase memory allocation
# Docker Desktop → Settings → Resources → Memory → 8GB+
# Use smaller models
ollama pull qwen3:0.6b # Instead of qwen3:8b
```
#### **Slow Performance:**
- Use SSD storage for databases (`lancedb/`, `shared_uploads/`)
- Increase CPU cores if possible
- Close unnecessary applications
- Use smaller batch sizes in configuration
---
## 8. Post-Installation Setup
### 8.1 Model Optimization
```bash
# Install additional models (optional)
ollama pull nomic-embed-text # Alternative embedding model
ollama pull llama3.1:8b # Alternative generation model
# Test model switching
curl -X POST http://localhost:8001/chat \
-H "Content-Type: application/json" \
-d '{"query": "Hello", "model": "qwen3:8b"}'
```
### 8.2 Security Configuration
```bash
# Set proper file permissions
chmod 600 backend/chat_data.db # Restrict database access
chmod 700 lancedb/ # Restrict vector DB access
# Configure firewall (production)
sudo ufw allow 3000/tcp # Frontend
sudo ufw deny 8000/tcp # Backend (internal only)
sudo ufw deny 8001/tcp # RAG API (internal only)
```
### 8.3 Backup Setup
```bash
# Create backup script
cat > backup_system.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
# Backup databases and indexes
cp -r backend/chat_data.db "$BACKUP_DIR/"
cp -r lancedb "$BACKUP_DIR/"
cp -r index_store "$BACKUP_DIR/"
cp -r shared_uploads "$BACKUP_DIR/"
echo "Backup completed: $BACKUP_DIR"
EOF
chmod +x backup_system.sh
```
---
## 9. Success Criteria
### 9.1 Installation Complete When:
- ✅ All health checks pass without errors
- ✅ Frontend loads at http://localhost:3000
- ✅ All models are installed and responding
- ✅ You can create document indexes
- ✅ You can chat with uploaded documents
- ✅ No error messages in logs/terminal
### 9.2 Performance Benchmarks
**Acceptable Performance:**
- System startup: < 5 minutes
- Index creation: < 2 minutes per 100MB document
- Query response: < 30 seconds
- Memory usage: < 8GB total
**Optimal Performance:**
- System startup: < 2 minutes
- Index creation: < 1 minute per 100MB document
- Query response: < 10 seconds
- Memory usage: < 4GB total
---
## 10. Next Steps
### 10.1 Getting Started
1. **Upload Documents**: Create your first index with PDF documents
2. **Explore Features**: Try different query types and models
3. **Customize**: Adjust model settings and chunk sizes
4. **Scale**: Add more documents and create multiple indexes
### 10.2 Additional Resources
- **Quick Start**: See `Documentation/quick_start.md`
- **Docker Usage**: See `Documentation/docker_usage.md`
- **System Architecture**: See `Documentation/architecture_overview.md`
- **API Reference**: See `Documentation/api_reference.md`
---
**Congratulations! 🎉** Your RAG system is now ready to use. Visit http://localhost:3000 to start chatting with your documents.

View File

@ -0,0 +1,70 @@
# 📜 Prompt Inventory (Ground-Truth)
_All generation / verification prompts currently hard-coded in the codebase._
_Last updated: 2025-07-06_
> Edit process: if you change a prompt in code, please **update this file** or, once we migrate to the central registry, delete the entry here.
---
## 1. Indexing / Context Enrichment
| ID | File & Lines | Variable / Builder | Purpose |
|----|--------------|--------------------|---------|
| `overview_builder.default` | `rag_system/indexing/overview_builder.py` `12-21` | `DEFAULT_PROMPT` | Generate 1-paragraph document overview for search-time routing.
| `contextualizer.system` | `rag_system/indexing/contextualizer.py` `11` | `SYSTEM_PROMPT` | System instruction: explain summarisation role.
| `contextualizer.local_context` | same file `13-15` | `LOCAL_CONTEXT_PROMPT_TEMPLATE` | Human message wraps neighbouring chunks.
| `contextualizer.chunk` | same file `17-19` | `CHUNK_PROMPT_TEMPLATE` | Human message shows the target chunk.
| `graph_extractor.entities` | `rag_system/indexing/graph_extractor.py` `20-31` | `entity_prompt` | Ask LLM to list entities.
| `graph_extractor.relationships` | same file `53-64` | `relationship_prompt` | Ask LLM to list relationships.
## 2. Retrieval / Query Transformation
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `query_transformer.expand` | `rag_system/retrieval/query_transformer.py` `10-26` | Produce query rewrites (keywords, boolean). |
| `hyde.hypothetical_doc` | same `115-122` | HyDE hypothetical document generator. |
| `graph_query.translate` | same `124-140` | Translate user question to JSON KG query. |
## 3. Pipeline Answer Synthesis
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `retrieval_pipeline.synth_final` | `rag_system/pipelines/retrieval_pipeline.py` `217-256` | Turn verified facts into answer (with directives 1-6). |
## 4. Agent Classical Loop
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `agent.loop.initial_thought` | `rag_system/agent/loop.py` `157-180` | First LLM call to think about query. |
| `agent.loop.verify_path` | same `190-205` | Secondary thought loop. |
| `agent.loop.compose_sub` | same `506-542` | Compose answer from sub-answers. |
| `agent.loop.router` | same `648-660` | Decide which subsystem handles query. |
## 5. Verifier
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `verifier.fact_check` | `rag_system/agent/verifier.py` `18-58` | Strict JSON-format grounding verifier. |
## 6. Backend Router (Fast path)
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `backend.router` | `backend/server.py` `435-448` | Decide "RAG vs direct LLM" before heavy processing. |
## 7. Miscellaneous
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `vision.placeholder` | `rag_system/utils/ollama_client.py` `169` | Dummy prompt for VLM colour check. |
---
### Missing / To-Do
1. Verify whether **ReActAgent.PROMPT_TEMPLATE** captures every placeholder; some earlier prompts may need an explicit ID when we move to the central registry.
2. Search TS/JS code once the backend prompts are ported (currently none).
---
**Next step:** create `rag_system/prompts/registry.yaml` and start moving each prompt above into a key-value entry with identical IDs. Update callers gradually using the helper proposed earlier.
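A minimal sketch of what such a registry and helper could look like; the file layout, the `get_prompt` name, and the flat ID-to-template mapping are proposals, not existing code.
```python
from pathlib import Path
import yaml  # PyYAML

_REGISTRY_PATH = Path("rag_system/prompts/registry.yaml")
_registry_cache = None

def get_prompt(prompt_id: str, **variables) -> str:
    """Look up a prompt template by ID (e.g. 'verifier.fact_check') and fill its placeholders."""
    global _registry_cache
    if _registry_cache is None:
        _registry_cache = yaml.safe_load(_REGISTRY_PATH.read_text())
    template = _registry_cache[prompt_id]
    # Note: templates containing literal braces would need escaping or a different fill method.
    return template.format(**variables)

# Example usage (assuming the entry exists in registry.yaml):
# prompt = get_prompt("contextualizer.chunk", chunk_text="...")
```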

View File

@ -0,0 +1,379 @@
# ⚡ Quick Start Guide - RAG System
_Get up and running in 5 minutes!_
---
## 🚀 Choose Your Deployment Method
### Option 1: Docker Deployment (Production Ready) 🐳
Best for: Production deployments, isolated environments, easy scaling
### Option 2: Direct Development (Developer Friendly) 💻
Best for: Development, customization, debugging, faster iteration
---
## 🐳 Docker Deployment
### Prerequisites
- Docker Desktop installed and running
- 8GB+ RAM available
- Internet connection
### Step 1: Clone and Setup
```bash
# Clone repository
git clone <your-repository-url>
cd rag-system
# Ensure Docker is running
docker version
```
### Step 2: Install Ollama Locally
**Even with Docker, Ollama runs locally for better performance:**
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama (in one terminal)
ollama serve
# Install models (in another terminal)
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
### Step 3: Start Docker Containers
```bash
# Start all containers
./start-docker.sh
# Or manually:
docker compose --env-file docker.env up --build -d
```
### Step 4: Verify Deployment
```bash
# Check container status
docker compose ps
# Test endpoints
curl http://localhost:3000 # Frontend
curl http://localhost:8000/health # Backend
curl http://localhost:8001/models # RAG API
```
### Step 5: Access Application
Open your browser to: **http://localhost:3000**
---
## 💻 Direct Development
### Prerequisites
- Python 3.8+
- Node.js 16+ and npm
- 8GB+ RAM available
### Step 1: Clone and Install Dependencies
```bash
# Clone repository
git clone <your-repository-url>
cd rag-system
# Install Python dependencies
pip install -r requirements.txt
# Install Node.js dependencies
npm install
```
### Step 2: Install and Configure Ollama
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama (in one terminal)
ollama serve
# Install models (in another terminal)
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
### Step 3: Start the System
```bash
# Start all components with one command
python run_system.py
```
**Or start components manually in separate terminals:**
```bash
# Terminal 1: RAG API
python -m rag_system.api_server
# Terminal 2: Backend
cd backend && python server.py
# Terminal 3: Frontend
npm run dev
```
### Step 4: Verify Installation
```bash
# Check system health
python system_health_check.py
# Test endpoints
curl http://localhost:3000 # Frontend
curl http://localhost:8000/health # Backend
curl http://localhost:8001/models # RAG API
```
### Step 5: Access Application
Open your browser to: **http://localhost:3000**
---
## 🎯 First Use Guide
### 1. Create a Chat Session
- Click "New Chat" in the interface
- Give your session a descriptive name
### 2. Upload Documents
- Click "Create New Index" button
- Upload PDF files from your computer
- Configure processing options:
- **Chunk Size**: 512 (recommended)
- **Embedding Model**: Qwen/Qwen3-Embedding-0.6B
- **Enable Enrichment**: Yes
- Click "Build Index" and wait for processing
### 3. Start Chatting
- Select your built index
- Ask questions about your documents:
- "What is this document about?"
- "Summarize the key points"
- "What are the main findings?"
- "Compare the arguments in section 3 and 5"
---
## 🔧 Management Commands
### Docker Commands
```bash
# Container management
./start-docker.sh # Start all containers
./start-docker.sh stop # Stop all containers
./start-docker.sh logs # View logs
./start-docker.sh status # Check status
# Manual Docker Compose
docker compose ps # Check status
docker compose logs -f # Follow logs
docker compose down # Stop containers
docker compose up --build -d # Rebuild and start
```
### Direct Development Commands
```bash
# System management
python run_system.py # Start all services
python system_health_check.py # Check system health
# Individual components
python -m rag_system.api_server # RAG API only
cd backend && python server.py # Backend only
npm run dev # Frontend only
# Stop: Press Ctrl+C in terminal running services
```
---
## 🆘 Quick Troubleshooting
### Docker Issues
**Containers not starting?**
```bash
# Check Docker daemon
docker version
# Restart Docker Desktop and try again
./start-docker.sh
```
**Port conflicts?**
```bash
# Check what's using ports
lsof -i :3000 -i :8000 -i :8001
# Stop conflicting processes
./start-docker.sh stop
```
### Direct Development Issues
**Import errors?**
```bash
# Check Python installation
python --version # Should be 3.8+
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
```
**Node.js errors?**
```bash
# Check Node version
node --version # Should be 16+
# Reinstall dependencies
rm -rf node_modules package-lock.json
npm install
```
### Common Issues
**Ollama not responding?**
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Restart Ollama
pkill ollama
ollama serve
```
**Out of memory?**
```bash
# Check memory usage
docker stats # For Docker
htop # For direct development
# Recommended: 16GB+ RAM for optimal performance
```
---
## 📊 System Verification
Run this comprehensive check:
```bash
# Check all endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
# For Docker: Check containers
docker compose ps
```
---
## 🎉 Success!
If you see:
- ✅ All services responding
- ✅ Frontend accessible at http://localhost:3000
- ✅ No error messages
You're ready to start using LocalGPT!
### What's Next?
1. **📚 Upload Documents**: Add your PDF files to create indexes
2. **💬 Start Chatting**: Ask questions about your documents
3. **🔧 Customize**: Explore different models and settings
4. **📖 Learn More**: Check the full documentation below
### 📁 Key Files
```
rag-system/
├── 🐳 start-docker.sh # Docker deployment script
├── 🏃 run_system.py # Direct development launcher
├── 🩺 system_health_check.py # System verification
├── 📋 requirements.txt # Python dependencies
├── 📦 package.json # Node.js dependencies
├── 📁 Documentation/ # Complete documentation
└── 📁 rag_system/ # Core system code
```
### 📖 Additional Resources
- **🏗️ Architecture**: See `Documentation/architecture_overview.md`
- **🔧 Configuration**: See `Documentation/system_overview.md`
- **🚀 Deployment**: See `Documentation/deployment_guide.md`
- **🐛 Troubleshooting**: See `DOCKER_TROUBLESHOOTING.md`
---
**Happy RAG-ing! 🚀**
---
## 🛠️ Indexing Scripts
The repository includes several convenient scripts for document indexing:
### Simple Index Creation Script
For quick document indexing without the UI:
```bash
# Basic usage
./simple_create_index.sh "Index Name" "document.pdf"
# Multiple documents
./simple_create_index.sh "Research Papers" "paper1.pdf" "paper2.pdf" "notes.txt"
# Using wildcards
./simple_create_index.sh "Invoice Collection" ./invoices/*.pdf
```
**Supported file types**: PDF, TXT, DOCX, MD
### Batch Indexing Script
For processing large document collections:
```bash
# Using the Python batch indexing script
python demo_batch_indexing.py
# Or using the direct indexing script
python create_index_script.py
```
These scripts automatically:
- ✅ Check prerequisites (Ollama running, Python dependencies)
- ✅ Validate document formats
- ✅ Create database entries
- ✅ Process documents with the RAG pipeline
- ✅ Generate searchable indexes
---

View File

@ -0,0 +1,616 @@
# 📥 Retrieval Pipeline
_Maps to `rag_system/pipelines/retrieval_pipeline.py` and helpers in `retrieval/`, `rerankers/`._
## Role
Given a **user query** and one or more indexed tables, retrieve the most relevant text chunks and synthesise an answer.
## Sub-components
| Stage | Module | Key Classes / Fns | Notes |
|-------|--------|-------------------|-------|
| Query Pre-processing | `retrieval/query_transformer.py` | `QueryTransformer`, `HyDEGenerator`, `GraphQueryTranslator` | Expands, rewrites, or translates the raw query. |
| Retrieval | `retrieval/retrievers.py` | `BM25Retriever`, `DenseRetriever`, `HybridRetriever` | Abstract over LanceDB vector + FTS search. |
| Reranking | `rerankers/reranker.py` | `ColBERTSmall`, fallback `bge-reranker` | Optionally improves result ordering. |
| Synthesis | `pipelines/retrieval_pipeline.py` | `_synthesize_final_answer()` | Calls LLM with evidence snippets. |
## End-to-End Flow
```mermaid
flowchart LR
Q["User Query"] --> XT["Query Transformer"]
XT -->|variants| RET_BM25
XT -->|variants| RET_DENSE
subgraph Retrieval
RET_BM25[BM25] --> MERGE
RET_DENSE[Dense Vector] --> MERGE
style RET_BM25 fill:#444,stroke:#ccc,color:#fff
style RET_DENSE fill:#444,stroke:#ccc,color:#fff
end
MERGE --> RERANK
RERANK --> K[["Top-K Chunks"]]
K --> SYNTH["Answer Synthesiser\n(LLM)"]
SYNTH --> A["Answer + Sources"]
```
### Narrative
1. **Query Transformer** may expand the query (keyword list, HyDE doc, KG translation) depending on `searchType`.
2. **Retrievers** execute BM25 and/or dense similarity against LanceDB. Combination controlled by `retrievalMode` and `denseWeight`.
3. **Reranker** (if `aiRerank=true` or hybrid search) scores snippets; top `rerankerTopK` chosen.
4. **Synthesiser** streams an LLM completion using the prompt described in `prompt_inventory.md` (`retrieval_pipeline.synth_final`).
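Putting the narrative together, here is a schematic sketch of how the four stages compose; the function names and signatures are illustrative stand-ins, not the actual `RetrievalPipeline` API.
```python
from typing import Callable, Dict, List

def answer_query(
    query: str,
    transform: Callable[[str], List[str]],        # query expansion / HyDE / KG translation
    retrieve: Callable[[str, int], List[Dict]],   # BM25 and/or dense search
    rerank: Callable[[str, List[Dict]], List[Dict]],
    synthesize: Callable[[str, List[Dict]], str], # LLM answer generation
    retrieval_k: int = 10,
    reranker_top_k: int = 20,
) -> str:
    """Illustrative composition of the retrieval pipeline stages."""
    candidates: List[Dict] = []
    for variant in transform(query):
        candidates.extend(retrieve(variant, retrieval_k))
    top_chunks = rerank(query, candidates)[:reranker_top_k]
    return synthesize(query, top_chunks)
```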
## Configuration Flags (passed from UI → backend)
| Flag | Default | Effect |
|------|---------|--------|
| `searchType` | `fts` | UI label (FTS / Dense / Hybrid). |
| `retrievalK` | 10 | Initial candidate count per retriever. |
| `contextWindowSize` | 5 | How many adjacent chunks to merge (late-chunk). |
| `rerankerTopK` | 20 | How many docs to pass into AI reranker. |
| `denseWeight` | 0.5 | When `hybrid`, linear mix weight. |
| `aiRerank` | bool | Toggle reranker. |
| `verify` | bool | If true, pass answer to **Verifier** component. |
## Interfaces
* Reads from **LanceDB** tables `text_pages_<index>`.
* Calls **Ollama** generation model specified in `PIPELINE_CONFIGS`.
* Exposes `RetrievalPipeline.answer_stream()` iterator consumed by SSE API.
## Extension Points
* Plug new retriever by inheriting `BaseRetriever` and registering in `retrievers.py`.
* Swap reranker model via `EXTERNAL_MODELS['reranker_model']`.
* Custom answer prompt can be overridden by passing `prompt_override` to `_synthesize_final_answer()` (not yet surfaced in UI).
## Detailed Implementation Analysis
### Core Architecture Pattern
The `RetrievalPipeline` uses **lazy initialization** for all components to avoid heavy memory usage during startup. Each component (embedder, retrievers, rerankers) is only loaded when first accessed via private `_get_*()` methods.
```python
def _get_text_embedder(self):
if self.text_embedder is None:
self.text_embedder = select_embedder(
self.config.get("embedding_model_name", "Qwen/Qwen3-Embedding-0.6B"),
self.ollama_config.get("host")
)
return self.text_embedder
```
### Thread Safety Implementation
**Critical Issue**: ColBERT reranker and model loading are not thread-safe. The system uses multiple locks:
```python
# Global locks to prevent race conditions
_rerank_lock: Lock = Lock() # Protects .rank() calls
_ai_reranker_init_lock: Lock = Lock() # Prevents concurrent model loading
_sentence_pruner_lock: Lock = Lock() # Serializes Provence model init
```
When multiple queries run in parallel, only one thread can initialize heavy models or perform reranking operations.
### Retrieval Strategy Deep-Dive
#### 1. Multi-Vector Dense Retrieval (`_get_dense_retriever()`)
```python
self.dense_retriever = MultiVectorRetriever(
db_manager, # LanceDB connection
text_embedder, # Qwen3-Embedding embedder
vision_model=None, # Optional multimodal
fusion_config={} # Score combination rules
)
```
**Process**:
1. Query → embedding vector (1024D for Qwen3-Embedding-0.6B)
2. LanceDB ANN search using IVF-PQ index
3. Cosine similarity scoring
4. Returns top-K with metadata
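In isolation, the LanceDB side of this step looks roughly like the following; the table name and the embedder call are placeholders, and the real retriever wraps this logic inside `MultiVectorRetriever`.
```python
import lancedb

def dense_search(query: str, embedder, table_name: str, k: int = 10):
    """Embed the query and run an ANN search against a LanceDB table."""
    query_vector = embedder.create_embeddings([query])[0]  # 1024-D for Qwen3-Embedding-0.6B
    db = lancedb.connect("./lancedb")
    table = db.open_table(table_name)
    # Cosine-distance ANN search over the index, returning chunk text plus metadata
    return (
        table.search(query_vector)
        .metric("cosine")
        .limit(k)
        .to_list()
    )
```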
#### 2. BM25 Full-Text Search (`_get_bm25_retriever()`)
```sql
-- Uses SQLite FTS5 under the hood
SELECT chunk_id, text, bm25(fts_table) AS score
FROM fts_table
WHERE fts_table MATCH ?
ORDER BY bm25(fts_table)
LIMIT ?
```
**Token Processing**:
- Stemming via Porter algorithm
- Stop-word removal
- N-gram tokenization (configurable)
#### 3. Hybrid Score Fusion
When both retrievers are enabled:
```python
final_score = (1 - dense_weight) * bm25_score + dense_weight * dense_score
```
Default `dense_weight = 0.7` favors semantic over lexical matching (updated from 0.5).
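Because BM25 and cosine scores live on different scales, a practical fusion step usually normalises each score list before mixing. A small sketch follows; the min-max normalisation is an assumption for illustration, not necessarily what the pipeline does internally.
```python
from typing import Dict, List

def fuse_scores(
    bm25_results: List[Dict],
    dense_results: List[Dict],
    dense_weight: float = 0.7,
) -> List[Dict]:
    """Min-max normalise each score list, then linearly combine by chunk_id."""

    def normalise(results: List[Dict]) -> Dict[str, float]:
        scores = [r["score"] for r in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        return {r["chunk_id"]: (r["score"] - lo) / span for r in results}

    bm25 = normalise(bm25_results) if bm25_results else {}
    dense = normalise(dense_results) if dense_results else {}
    fused = {
        cid: (1 - dense_weight) * bm25.get(cid, 0.0) + dense_weight * dense.get(cid, 0.0)
        for cid in set(bm25) | set(dense)
    }
    return sorted(
        ({"chunk_id": cid, "score": s} for cid, s in fused.items()),
        key=lambda r: r["score"],
        reverse=True,
    )
```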
### Late-Chunk Merging Algorithm
**Problem**: Small chunks lose context; large chunks dilute relevance.
**Solution**: Retrieve small chunks, then expand with neighbors.
```python
def _get_surrounding_chunks_lancedb(self, chunk, window_size):
start_index = max(0, chunk_index - window_size)
end_index = chunk_index + window_size
sql_filter = f"document_id = '{document_id}' AND chunk_index >= {start_index} AND chunk_index <= {end_index}"
results = tbl.search().where(sql_filter).to_list()
# Sort by chunk_index to maintain document order
return sorted(results, key=lambda x: x.get("chunk_index", 0))
```
**Benefits**:
- Maintains granular search precision
- Provides richer context for answer generation
- Configurable window size (default: 5 chunks = ~2500 tokens)
### AI Reranker Implementation
#### ColBERT Strategy (via rerankers-lib)
```python
from rerankers import Reranker
self.ai_reranker = Reranker("answerdotai/answerai-colbert-small-v1", model_type="colbert")
# Usage
scores = reranker.rank(query, [doc.text for doc in candidates])
```
**ColBERT Architecture**:
- **Query encoding**: Each token → 128D vector
- **Document encoding**: Each token → 128D vector
- **Interaction**: MaxSim between all query-doc token pairs
- **Advantage**: Fine-grained token-level matching
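For intuition, the MaxSim interaction reduces to a few lines of numpy. This is a toy example with random token embeddings, not the actual ColBERT encoder or weights.
```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style late interaction: sum over query tokens of the best-matching doc token."""
    # L2-normalise so dot products are cosine similarities
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sim = q @ d.T                        # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # MaxSim per query token, then sum

rng = np.random.default_rng(0)
query = rng.normal(size=(5, 128))   # 5 query tokens, 128-D (ColBERT-small dimension)
doc = rng.normal(size=(40, 128))    # 40 document tokens
print(maxsim_score(query, doc))
```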
#### Fallback: BGE Cross-Encoder
```python
# When ColBERT fails/unavailable
from sentence_transformers import CrossEncoder
model = CrossEncoder('BAAI/bge-reranker-base')
scores = model.predict([(query, doc.text) for doc in candidates])
```
### Answer Synthesis Pipeline
#### Prompt Engineering Pattern
```python
def _synthesize_final_answer(self, query: str, facts: str, *, event_callback=None):
prompt = f"""
You are an AI assistant specialised in answering questions from retrieved context.
Context you receive
• VERIFIED FACTS - text snippets retrieved from the user's documents.
• ORIGINAL QUESTION - the user's actual query.
Instructions
1. Evaluate each snippet for relevance to the ORIGINAL QUESTION
2. Synthesise an answer **using only information from relevant snippets**
3. If snippets contradict, mention the contradiction explicitly
4. If insufficient information: "I could not find that information in the provided documents."
5. Provide thorough, well-structured answer with relevant numbers/names
6. Do **not** introduce external knowledge
Retrieved Snippets
{facts}
ORIGINAL QUESTION: "{query}"
"""
response = self.llm_client.complete_stream(
prompt=prompt,
model=self.ollama_config["generation_model"] # qwen3:8b
)
for chunk in response:
if event_callback:
event_callback({"type": "answer_chunk", "content": chunk})
yield chunk
```
**Advanced Features**:
- **Source Attribution**: Automatic citation generation
- **Confidence Scoring**: Based on retrieval scores and snippet relevance
- **Answer Verification**: Optional grounding check via Verifier component
### Query Processing and Transformation
#### Query Decomposition
```python
class QueryDecomposer:
def decompose_query(self, query: str) -> List[str]:
"""Break complex queries into simpler sub-queries."""
decomposition_prompt = f"""
Break down this complex question into 2-4 simpler sub-questions that would help answer the original question.
Original question: {query}
Sub-questions:
1.
2.
3.
4.
"""
response = self.llm_client.complete(
prompt=decomposition_prompt,
model=self.enrichment_model # qwen3:0.6b for speed
)
# Parse response into list of sub-queries
return self._parse_subqueries(response)
```
#### HyDE (Hypothetical Document Embeddings)
```python
class HyDEGenerator:
def generate_hypothetical_doc(self, query: str) -> str:
"""Generate hypothetical document that would answer the query."""
hyde_prompt = f"""
Generate a hypothetical document passage that would perfectly answer this question:
Question: {query}
Hypothetical passage:
"""
response = self.llm_client.complete(
prompt=hyde_prompt,
model=self.enrichment_model
)
return response.strip()
```
### Caching and Performance Optimization
#### Semantic Query Caching
```python
from cachetools import TTLCache, LRUCache                # assumed cache helpers for this snippet
from sklearn.metrics.pairwise import cosine_similarity   # assumed similarity helper

class RetrievalPipeline:
def __init__(self, config, ollama_client, ollama_config):
# TTL cache for embeddings and results
self.query_cache = TTLCache(maxsize=100, ttl=300) # 5 min TTL
self.embedding_cache = LRUCache(maxsize=500)
self.semantic_threshold = 0.98 # Similarity threshold for cache hits
def get_cached_result(self, query: str, session_id: str = None) -> Optional[Dict]:
"""Check for semantically similar cached queries."""
query_embedding = self._get_text_embedder().create_embeddings([query])[0]
for cached_query, cached_data in self.query_cache.items():
cached_embedding = cached_data["embedding"]
similarity = cosine_similarity([query_embedding], [cached_embedding])[0][0]
if similarity > self.semantic_threshold:
# Check session scope if configured
if self.cache_scope == "session" and cached_data.get("session_id") != session_id:
continue
print(f"🎯 Cache hit: {similarity:.3f} similarity")
return cached_data["result"]
return None
```
#### Batch Processing Optimizations
```python
def process_query_batch(self, queries: List[str]) -> List[Dict]:
"""Process multiple queries efficiently."""
# Batch embed all queries
query_embeddings = self._get_text_embedder().create_embeddings(queries)
# Batch search
results = []
for i, query in enumerate(queries):
embedding = query_embeddings[i]
# Search with pre-computed embedding
dense_results = self._search_dense_with_embedding(embedding)
bm25_results = self._search_bm25(query)
# Combine and rerank
combined = self._combine_results(dense_results, bm25_results)
reranked = self._rerank_batch([query], [combined])[0]
results.append(reranked)
return results
```
### Advanced Search Features
#### Conversational Context Integration
```python
def answer_with_history(self, query: str, conversation_history: List[Dict], **kwargs):
"""Answer query with conversation context."""
# Build conversational context
context_prompt = self._build_conversation_context(conversation_history)
# Expand query with context
expanded_query = f"{context_prompt}\n\nCurrent question: {query}"
# Process with expanded context
return self.answer_stream(expanded_query, **kwargs)
def _build_conversation_context(self, history: List[Dict]) -> str:
"""Build context from conversation history."""
context_parts = []
for turn in history[-3:]: # Last 3 turns for context
if turn.get("role") == "user":
context_parts.append(f"Previous question: {turn['content']}")
elif turn.get("role") == "assistant":
# Extract key points from previous answers
context_parts.append(f"Previous context: {turn['content'][:200]}...")
return "\n".join(context_parts)
```
#### Multi-Index Search
```python
def search_multiple_indexes(self, query: str, index_ids: List[str], **kwargs):
"""Search across multiple document indexes."""
all_results = []
for index_id in index_ids:
table_name = f"text_pages_{index_id}"
try:
# Search individual index
index_results = self._search_single_index(query, table_name, **kwargs)
# Add index metadata
for result in index_results:
result["source_index"] = index_id
all_results.extend(index_results)
except Exception as e:
print(f"⚠️ Error searching index {index_id}: {e}")
continue
# Global reranking across all indexes
if len(all_results) > kwargs.get("retrieval_k", 20):
all_results = self._rerank_global(query, all_results, **kwargs)
return all_results
```
### Error Handling and Resilience
#### Graceful Degradation
```python
def answer_stream(self, query: str, **kwargs):
"""Main answer method with comprehensive error handling."""
try:
# Try full pipeline
return self._answer_stream_full_pipeline(query, **kwargs)
except Exception as e:
print(f"⚠️ Full pipeline failed: {e}")
try:
# Fallback: Dense-only search
kwargs["search_type"] = "dense"
kwargs["ai_rerank"] = False
return self._answer_stream_fallback(query, **kwargs)
except Exception as e2:
print(f"⚠️ Fallback failed: {e2}")
# Last resort: Direct LLM answer
return self._direct_llm_answer(query)
def _direct_llm_answer(self, query: str):
"""Direct LLM answer as last resort."""
prompt = f"""
The document retrieval system is temporarily unavailable.
Please provide a helpful response acknowledging this limitation.
User question: {query}
Response:
"""
response = self.llm_client.complete_stream(
prompt=prompt,
model=self.ollama_config["generation_model"]
)
yield "⚠️ Document search unavailable. Providing general response:\n\n"
for chunk in response:
yield chunk
```
#### Recovery Mechanisms
```python
def recover_from_embedding_failure(self, query: str, **kwargs):
"""Recover when embedding model fails."""
print("🔄 Attempting embedding model recovery...")
# Try to reinitialize embedder
try:
self.text_embedder = None # Clear failed instance
embedder = self._get_text_embedder() # Reinitialize
# Test with simple query
test_embedding = embedder.create_embeddings(["test"])
if test_embedding is not None:
print("✅ Embedding model recovered")
return True
except Exception as e:
print(f"❌ Recovery failed: {e}")
# Fallback to BM25-only search
kwargs["search_type"] = "bm25"
kwargs["ai_rerank"] = False
print("🔄 Falling back to keyword search only")
return False
```
### Performance Monitoring and Metrics
#### Query Performance Tracking
```python
import time
from contextlib import contextmanager

class PerformanceTracker:
def __init__(self):
self.metrics = {
"query_count": 0,
"avg_response_time": 0,
"cache_hit_rate": 0,
"error_rate": 0,
"embedding_time": 0,
"retrieval_time": 0,
"reranking_time": 0,
"synthesis_time": 0
}
    @contextmanager
    def track_query(self, query: str):
        """Context manager for tracking query performance."""
        start_time = time.time()
        try:
            yield
            # Success metrics: update the running average of response time
            duration = time.time() - start_time
            n = self.metrics["query_count"]
            self.metrics["avg_response_time"] = (
                (self.metrics["avg_response_time"] * n + duration) / (n + 1)
            )
        except Exception:
            # Error metrics: update the running error rate
            n = self.metrics["query_count"]
            self.metrics["error_rate"] = (self.metrics["error_rate"] * n + 1) / (n + 1)
            raise
        finally:
            # Count each query exactly once, whether it succeeded or failed
            self.metrics["query_count"] += 1
```
#### Resource Usage Monitoring
```python
def monitor_memory_usage(self):
"""Monitor memory usage of pipeline components."""
import psutil
import gc
process = psutil.Process()
memory_info = process.memory_info()
print(f"Memory Usage: {memory_info.rss / 1024 / 1024:.1f} MB")
# Component-specific monitoring
if hasattr(self, 'text_embedder') and self.text_embedder:
print(f"Embedder loaded: {type(self.text_embedder).__name__}")
if hasattr(self, 'ai_reranker') and self.ai_reranker:
print(f"Reranker loaded: {type(self.ai_reranker).__name__}")
# Suggest cleanup if memory usage is high
if memory_info.rss > 8 * 1024 * 1024 * 1024: # 8GB
print("⚠️ High memory usage detected - consider cleanup")
gc.collect()
```
---
## Configuration Reference
### Default Pipeline Configuration
```python
RETRIEVAL_CONFIG = {
"retriever": "multivector",
"search_type": "hybrid",
"retrieval_k": 20,
"reranker_top_k": 10,
"dense_weight": 0.7,
"late_chunking": {
"enabled": True,
"window_size": 5
},
"ai_rerank": True,
"verify_answers": False,
"cache_enabled": True,
"cache_ttl": 300,
"semantic_cache_threshold": 0.98
}
```
### Model Configuration
```python
MODEL_CONFIG = {
"embedding_model": "Qwen/Qwen3-Embedding-0.6B",
"generation_model": "qwen3:8b",
"enrichment_model": "qwen3:0.6b",
"reranker_model": "answerdotai/answerai-colbert-small-v1",
"fallback_reranker": "BAAI/bge-reranker-base"
}
```
### Performance Tuning
```python
PERFORMANCE_CONFIG = {
"batch_sizes": {
"embedding": 32,
"reranking": 16,
"synthesis": 1
},
"timeouts": {
"embedding": 30,
"retrieval": 60,
"reranking": 30,
"synthesis": 120
},
"memory_limits": {
"max_cache_size": 1000,
"max_results_per_query": 100,
"chunk_size_limit": 2048
}
}
```
## Extension Examples
### Custom Retriever Implementation
```python
class CustomRetriever(BaseRetriever):
def search(self, query: str, k: int = 10) -> List[Dict]:
"""Implement custom search logic."""
# Your custom retrieval implementation
pass
def get_embeddings(self, texts: List[str]) -> np.ndarray:
"""Generate embeddings for custom retrieval."""
# Your custom embedding logic
pass
```
### Custom Reranker Implementation
```python
class CustomReranker(BaseReranker):
def rank(self, query: str, documents: List[Dict]) -> List[Dict]:
"""Implement custom reranking logic."""
# Your custom reranking implementation
pass
```
### Custom Query Transformer
```python
class CustomQueryTransformer:
def transform(self, query: str, context: Dict = None) -> str:
"""Transform query based on context."""
# Your custom query transformation logic
pass
```

View File

@ -0,0 +1,429 @@
# 🏗️ RAG System - Complete System Overview
_Last updated: 2025-01-09_
This document provides a comprehensive overview of the Advanced Retrieval-Augmented Generation (RAG) System, covering its architecture, components, data flow, and operational characteristics.
---
## 1. System Architecture
### 1.1 High-Level Architecture
The RAG system implements a sophisticated 4-tier microservices architecture:
```mermaid
graph TB
subgraph "Client Layer"
Browser[👤 User Browser]
UI[Next.js Frontend<br/>React/TypeScript]
Browser --> UI
end
subgraph "API Gateway Layer"
Backend[Backend Server<br/>Python HTTP Server<br/>Port 8000]
UI -->|REST API| Backend
end
subgraph "Processing Layer"
RAG[RAG API Server<br/>Document Processing<br/>Port 8001]
Backend -->|Internal API| RAG
end
subgraph "LLM Service Layer"
Ollama[Ollama Server<br/>LLM Inference<br/>Port 11434]
RAG -->|Model Calls| Ollama
end
subgraph "Storage Layer"
SQLite[(SQLite Database<br/>Sessions & Metadata)]
LanceDB[(LanceDB<br/>Vector Embeddings)]
FileSystem[File System<br/>Documents & Indexes]
Backend --> SQLite
RAG --> LanceDB
RAG --> FileSystem
end
```
### 1.2 Component Breakdown
| Component | Technology | Port | Purpose |
|-----------|------------|------|---------|
| **Frontend** | Next.js 15, React 19, TypeScript | 3000 | User interface, chat interactions |
| **Backend** | Python 3.11, HTTP Server | 8000 | API gateway, session management, routing |
| **RAG API** | Python 3.11, Advanced NLP | 8001 | Document processing, retrieval, generation |
| **Ollama** | Go-based LLM server | 11434 | Local LLM inference (embedding, generation) |
| **SQLite** | Embedded database | - | Sessions, messages, index metadata |
| **LanceDB** | Vector database | - | Document embeddings, similarity search |
---
## 2. Core Functionality
### 2.1 Intelligent Dual-Layer Routing
The system's key innovation is its **dual-layer routing architecture** that optimizes both speed and intelligence:
#### **Layer 1: Speed Optimization Routing**
- **Location**: `backend/server.py`
- **Purpose**: Route simple queries to Direct LLM (~1.3s) vs complex queries to RAG Pipeline (~20s)
- **Decision Logic**: Pattern matching, keyword detection, query complexity analysis
```python
# Example routing decisions
"Hello!" → Direct LLM (greeting pattern)
"What does the document say about pricing?" → RAG Pipeline (document keyword)
"What's 2+2?" → Direct LLM (simple + short)
"Summarize the key findings from the report" → RAG Pipeline (complex + indicators)
```
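A condensed sketch of the kind of heuristic this layer applies is shown below; the patterns, keywords, and length threshold are illustrative, not the exact rules in `backend/server.py`.
```python
import re

GREETING_PATTERNS = re.compile(r"^\s*(hi|hello|hey|thanks?)\b", re.IGNORECASE)
DOCUMENT_KEYWORDS = ("document", "report", "summarize", "according to", "section")

def route_query(query: str, has_indexes: bool) -> str:
    """Return 'direct_llm' for trivial queries, otherwise defer to the RAG pipeline."""
    if not has_indexes:
        return "direct_llm"              # no documents linked to this session
    if GREETING_PATTERNS.match(query):
        return "direct_llm"              # greeting / small talk
    if any(kw in query.lower() for kw in DOCUMENT_KEYWORDS):
        return "rag_pipeline"            # explicit document reference
    if len(query.split()) <= 6:
        return "direct_llm"              # short, simple question
    return "rag_pipeline"

print(route_query("What does the document say about pricing?", has_indexes=True))
```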
#### **Layer 2: Intelligence Optimization Routing**
- **Location**: `rag_system/agent/loop.py`
- **Purpose**: Within RAG pipeline, route to optimal processing method
- **Methods**:
- `direct_answer`: General knowledge queries
- `rag_query`: Document-specific queries requiring retrieval
- `graph_query`: Entity relationship queries (future feature)
### 2.2 Document Processing Pipeline
#### **Indexing Process**
1. **Document Upload**: PDF files uploaded via web interface
2. **Text Extraction**: Docling library extracts text with layout preservation
3. **Chunking**: Intelligent chunking with configurable strategies (DocLing, Late Chunking, Standard)
4. **Embedding**: Text converted to vector embeddings using Qwen models
5. **Storage**: Vectors stored in LanceDB with metadata in SQLite
#### **Retrieval Process**
1. **Query Processing**: User query analyzed and contextualized
2. **Embedding**: Query converted to vector embedding
3. **Search**: Hybrid search combining vector similarity and BM25 keyword matching
4. **Reranking**: AI-powered reranking for relevance optimization
5. **Synthesis**: LLM generates final answer using retrieved context
### 2.3 Advanced Features
#### **Query Decomposition**
- Complex queries automatically broken into sub-queries
- Parallel processing of sub-queries for efficiency
- Intelligent composition of final answers
#### **Contextual Enrichment**
- Conversation history integration
- Context-aware query expansion
- Session-based memory management
#### **Verification System**
- Answer verification against source documents
- Confidence scoring and grounding checks
- Source attribution and citation
---
## 3. Data Architecture
### 3.1 Storage Systems
#### **SQLite Database** (`backend/chat_data.db`)
```sql
-- Core tables
sessions -- Chat sessions with metadata
messages -- Individual messages and responses
indexes -- Document index metadata
session_indexes -- Links sessions to their indexes
```
#### **LanceDB Vector Store** (`./lancedb/`)
```
tables/
├── text_pages_[uuid] -- Document text embeddings
├── image_pages_[uuid] -- Image embeddings (future)
└── metadata_[uuid] -- Document metadata
```
#### **File System** (`./index_store/`)
```
index_store/
├── overviews/ -- Document summaries for routing
├── bm25/ -- BM25 keyword indexes
└── graph/ -- Knowledge graph data
```
### 3.2 Data Flow
1. **Document Upload** → File System (`shared_uploads/`)
2. **Processing** → Embeddings stored in LanceDB
3. **Metadata** → Index info stored in SQLite
4. **Query** → Search LanceDB + SQLite coordination
5. **Response** → Message history stored in SQLite
---
## 4. Model Architecture
### 4.1 Configurable Model Pipeline
The system supports multiple embedding and generation models with automatic switching:
#### **Current Model Configuration**
```python
EXTERNAL_MODELS = {
"embedding_model": "Qwen/Qwen3-Embedding-0.6B", # 1024D
"reranker_model": "answerdotai/answerai-colbert-small-v1", # ColBERT reranker
"vision_model": "Qwen/Qwen-VL-Chat", # Vision model for multimodal
"fallback_reranker": "BAAI/bge-reranker-base", # Backup reranker
}
OLLAMA_CONFIG = {
"generation_model": "qwen3:8b", # High-quality generation
"enrichment_model": "qwen3:0.6b", # Fast enrichment/routing
"host": "http://localhost:11434"
}
```
#### **Model Switching**
- **Per-Session**: Each chat session can use different embedding models
- **Automatic**: System automatically switches models based on index metadata
- **Dynamic**: Models loaded just-in-time to optimize memory usage
### 4.2 Supported Models
#### **Embedding Models**
- `Qwen/Qwen3-Embedding-0.6B` (1024D) - Default, fast and high-quality
#### **Generation Models** (via Ollama)
- `qwen3:8b` - Primary generation model (high quality)
- `qwen3:0.6b` - Fast enrichment and routing model
#### **Reranking Models**
- `answerdotai/answerai-colbert-small-v1` - Primary ColBERT reranker
- `BAAI/bge-reranker-base` - Fallback cross-encoder reranker
#### **Vision Models** (Multimodal)
- `Qwen/Qwen-VL-Chat` - Vision-language model for image processing
---
## 5. Pipeline Configurations
### 5.1 Default Production Pipeline
```python
PIPELINE_CONFIGS = {
"default": {
"description": "Production-ready pipeline with hybrid search, AI reranking, and verification",
"storage": {
"lancedb_uri": "./lancedb",
"text_table_name": "text_pages_v3",
"bm25_path": "./index_store/bm25",
"graph_path": "./index_store/graph/knowledge_graph.gml"
},
"retrieval": {
"retriever": "multivector",
"search_type": "hybrid",
"late_chunking": {
"enabled": True,
"table_suffix": "_lc_v3"
},
"dense": {
"enabled": True,
"weight": 0.7
},
"bm25": {
"enabled": True,
"index_name": "rag_bm25_index"
}
},
"embedding_model_name": "Qwen/Qwen3-Embedding-0.6B",
"reranker": {
"enabled": True,
"model_name": "answerdotai/answerai-colbert-small-v1",
"top_k": 20
}
}
}
```
### 5.2 Processing Options
#### **Chunking Strategies**
- **Standard**: Fixed-size chunks with overlap
- **DocLing**: Structure-aware chunking using DocLing library
- **Late Chunking**: Small chunks expanded at query time
#### **Enrichment Options**
- **Contextual Enrichment**: AI-generated chunk summaries
- **Overview Building**: Document-level summaries for routing
- **Graph Extraction**: Entity and relationship extraction
---
## 6. Performance Characteristics
### 6.1 Response Times
| Operation | Time Range | Notes |
|-----------|------------|-------|
| Simple Chat | 1-3 seconds | Direct LLM, no retrieval |
| Document Query | 5-15 seconds | Includes retrieval and reranking |
| Complex Analysis | 15-30 seconds | Multi-step reasoning |
| Document Indexing | 2-5 min/100MB | Depends on enrichment settings |
### 6.2 Memory Usage
| Component | Memory Usage | Notes |
|-----------|--------------|-------|
| Embedding Model | 1-2GB | Qwen3-Embedding-0.6B |
| Generation Model | 8-16GB | qwen3:8b |
| Reranker Model | 500MB-1GB | ColBERT reranker |
| Database Cache | 500MB-2GB | LanceDB and SQLite |
### 6.3 Scalability
- **Concurrent Users**: 5-10 users with 16GB RAM
- **Document Capacity**: 10,000+ documents per index
- **Query Throughput**: 10-20 queries/minute per instance
- **Storage**: Approximately 1MB per 100 pages indexed
---
## 7. Security & Privacy
### 7.1 Data Privacy
- **Local Processing**: All AI models run locally via Ollama
- **No External Calls**: No data sent to external APIs
- **Document Isolation**: Documents stored locally with session-based access
- **User Isolation**: Each session maintains separate context
---
## 8. Configuration & Customization
### 8.1 Model Configuration
Models can be configured in `rag_system/main.py`:
```python
# Embedding model configuration
EXTERNAL_MODELS = {
"embedding_model": "Qwen/Qwen3-Embedding-0.6B", # Your preferred model
"reranker_model": "answerdotai/answerai-colbert-small-v1",
}
# Generation model configuration
OLLAMA_CONFIG = {
"generation_model": "qwen3:8b", # Your LLM model
"enrichment_model": "qwen3:0.6b", # Your fast model
}
```
### 8.2 Pipeline Configuration
Processing behavior configured in `PIPELINE_CONFIGS`:
```python
PIPELINE_CONFIGS = {
"retrieval": {
"search_type": "hybrid",
"dense": {"weight": 0.7},
"bm25": {"enabled": True}
},
"chunking": {
"chunk_size": 512,
"chunk_overlap": 64,
"enable_latechunk": True,
"enable_docling": True
}
}
```
### 8.3 UI Configuration
Frontend behavior configured in environment variables:
```bash
NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_ENABLE_STREAMING=true
NEXT_PUBLIC_MAX_FILE_SIZE=50MB
```
---
## 9. Monitoring & Observability
### 9.1 Logging System
- **Structured Logging**: JSON-formatted logs with timestamps
- **Log Levels**: DEBUG, INFO, WARNING, ERROR
- **Log Rotation**: Automatic log file rotation
- **Component Isolation**: Separate logs per service
### 9.2 Health Monitoring
- **Health Endpoints**: `/health` on all services
- **Service Dependencies**: Cascading health checks
- **Performance Metrics**: Response times, error rates
- **Resource Monitoring**: Memory, CPU, disk usage
### 9.3 Debugging Features
- **Debug Mode**: Detailed operation tracing
- **Query Inspection**: Step-by-step query processing
- **Model Switching Logs**: Embedding model change tracking
- **Error Reporting**: Comprehensive error context
---
## ⚙️ Configuration Modes
The system supports multiple configuration modes optimized for different use cases:
### **Default Mode** (`"default"`)
- **Description**: Production-ready pipeline with full features
- **Search**: Hybrid (dense + BM25) with 0.7 dense weight
- **Reranking**: AI-powered ColBERT reranker
- **Query Processing**: Query decomposition enabled
- **Verification**: Grounding verification enabled
- **Performance**: ~3-8 seconds per query
- **Memory**: ~10-16GB (with models loaded)
### **Fast Mode** (`"fast"`)
- **Description**: Speed-optimized pipeline with minimal overhead
- **Search**: Vector-only (no BM25, no late chunking)
- **Reranking**: Disabled
- **Query Processing**: Single-pass, no decomposition
- **Verification**: Disabled
- **Performance**: ~1-3 seconds per query
- **Memory**: ~8-12GB (with models loaded)
### **BM25 Mode** (`"bm25"`)
- **Description**: Traditional keyword-based search
- **Search**: BM25 only
- **Use Case**: Exact keyword matching, legacy compatibility
### **Graph RAG Mode** (`"graph_rag"`)
- **Description**: Knowledge graph integration (currently disabled)
- **Status**: Available for future implementation
- **Use Case**: Relationship-aware retrieval
---
## 10. Development & Extension
### 10.1 Architecture Principles
- **Modular Design**: Clear separation of concerns
- **Configuration-Driven**: Behavior controlled via config files
- **Lazy Loading**: Components loaded on-demand
- **Thread Safety**: Proper synchronization for concurrent access
### 10.2 Extension Points
- **Custom Retrievers**: Implement `BaseRetriever` interface
- **Custom Chunkers**: Extend chunking strategies
- **Custom Models**: Add new embedding or generation models
- **Custom Pipelines**: Create specialized processing workflows
### 10.3 Testing Strategy
- **Unit Tests**: Individual component testing
- **Integration Tests**: End-to-end workflow testing
- **Performance Tests**: Load and stress testing
- **Health Checks**: Automated system validation
---
> **Note**: This overview reflects the current implementation as of 2025-01-09. For the latest changes, check the git history and individual component documentation.

View File

@ -0,0 +1,60 @@
# 🔀 Triage / Routing System
_Maps to `rag_system/agent/loop.Agent._should_use_rag`, `_route_using_overviews`, and the fast-path router in `backend/server.py`._
## Purpose
Determine, for every incoming query, whether it should be answered by:
1. **Direct LLM Generation** (no retrieval) — faster, cheaper.
2. **Retrieval-Augmented Generation (RAG)** — when the answer likely requires document context.
## Decision Signals
| Signal | Source | Notes |
|--------|--------|-------|
| Keyword/regex check | `backend/server.py` (fast path) | Hard-coded quick wins (`what time`, `define`, etc.). |
| Index presence | SQLite (session → indexes) | If no indexes linked, direct LLM. |
| Overview routing | `_route_using_overviews()` | Uses document overviews and enrichment model to predict relevance. |
| LLM router prompt | `agent/loop.py` lines 648-665 | Final arbitrator (Ollama call, JSON output). |
## High-level Flow
```mermaid
flowchart TD
Q["Incoming Query"] --> S1{Session\nHas Indexes?}
S1 -- no --> LLM["Direct LLM Generation"]
S1 -- yes --> S2{Fast Regex\nHeuristics}
S2 -- match--> LLM
S2 -- no --> S3{Overview\nRelevance > τ?}
S3 -- low --> LLM
S3 -- high --> S4[LLM Router\n(prompt @648)]
S4 -- "route: RAG" --> RAG["Retrieval Pipeline"]
S4 -- "route: DIRECT" --> LLM
```
## Detailed Sequence (Code-level)
1. **backend/server.py**
* `handle_session_chat()` builds `router_prompt` (line ~435) and makes a **first-pass** decision before calling the heavy agent code.
2. **agent.loop._should_use_rag()**
* Re-evaluates using richer features (e.g., token count, query type).
3. **Overviews Phase** (`_route_using_overviews()`)
* Loads JSONL overviews file per index.
* Calls enrichment model (`qwen3:0.6b`) with prompt: _"Does this overview mention … ? "_ → returns yes/no.
4. **LLM Router** (prompt lines 648-665)
* JSON-only response `{ "route": "RAG" | "DIRECT" }`.
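Because the router must return strict JSON, callers typically parse defensively and fall back to RAG on any error, mirroring the failure modes listed below. A minimal sketch (the fallback policy is taken from this document, not read from the code):
```python
import json

def parse_router_decision(raw_response: str) -> str:
    """Parse the LLM router's JSON reply; default to RAG when parsing fails."""
    try:
        decision = json.loads(raw_response.strip())
        route = str(decision.get("route", "")).upper()
        if route in ("RAG", "DIRECT"):
            return route
    except (json.JSONDecodeError, AttributeError):
        pass
    # Safer default per the fallback policy: prefer retrieval when unsure
    return "RAG"

print(parse_router_decision('{"route": "DIRECT"}'))  # DIRECT
print(parse_router_decision("not json at all"))      # RAG
```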
## Interfaces & Dependencies
| Component | Calls / Data |
|-----------|--------------|
| SQLite `chat_sessions` | Reads `indexes` column to know linked index IDs. |
| LanceDB Overviews | Reads `index_store/overviews/<idx>.jsonl`. |
| `OllamaClient` | Generates LLM router decision. |
## Config Flags
* `PIPELINE_CONFIGS.triage.enabled` - global toggle.
* Env var `TRIAGE_OVERVIEW_THRESHOLD` - minimum similarity score required to prefer RAG (default 0.35).
## Failure / Fallback Modes
1. If overview file missing → skip to LLM router.
2. If LLM router errors → default to RAG (safer) but log warning.
---
_Keep this document updated whenever routing heuristics, thresholds, or prompt wording change._

49
Documentation/verifier.md Normal file
View File

@ -0,0 +1,49 @@
# ✅ Answer Verifier
_File: `rag_system/agent/verifier.py`_
## Objective
Assess whether an answer produced by RAG is **grounded** in the retrieved context snippets.
## Prompt (see `prompt_inventory.md` `verifier.fact_check`)
Strict JSON schema:
```jsonc
{
"verdict": "SUPPORTED" | "NOT_SUPPORTED" | "NEEDS_CLARIFICATION",
"is_grounded": true | false,
"reasoning": "< ≤30 words >",
"confidence_score": 0-100
}
```
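A small sketch of how a caller could validate this schema, treating any malformed reply as NOT_SUPPORTED, consistent with the failure modes listed below; the exact field handling is an assumption, not the verifier's actual parsing code.
```python
import json
from dataclasses import dataclass

@dataclass
class VerificationResult:
    verdict: str
    is_grounded: bool
    reasoning: str
    confidence_score: int

def parse_verdict(raw: str) -> VerificationResult:
    """Parse the verifier's JSON reply, defaulting to NOT_SUPPORTED on bad output."""
    fallback = VerificationResult("NOT_SUPPORTED", False, "invalid verifier output", 0)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if data.get("verdict") not in ("SUPPORTED", "NOT_SUPPORTED", "NEEDS_CLARIFICATION"):
        return fallback
    return VerificationResult(
        verdict=data["verdict"],
        is_grounded=bool(data.get("is_grounded", False)),
        reasoning=str(data.get("reasoning", "")),
        confidence_score=int(data.get("confidence_score", 0)),
    )
```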
## Sequence Diagram
```mermaid
sequenceDiagram
participant RP as Retrieval Pipeline
participant V as Verifier
participant LLM as Ollama
RP->>V: query, context, answer
V->>LLM: verification prompt
LLM-->>V: JSON verdict
V-->>RP: VerificationResult
```
## Usage Sites
| Caller | Code | When |
|--------|------|------|
| `RetrievalPipeline.answer_stream()` | `pipelines/retrieval_pipeline.py` | If `verify=true` flag from frontend. |
| `Agent.loop.run()` | fallback path | Experimental for composed answers. |
## Config
| Flag | Default | Meaning |
|------|---------|---------|
| `verify` | false | Frontend toggle; if true verifier runs. |
| `generation_model` | `qwen3:8b` | Same model as answer generation.
## Failure Modes
* If LLM returns invalid JSON → parse exception handled, result = NOT_SUPPORTED.
* If verification call times out → pipeline logs but still returns answer (unverified).
---
_Keep updated when schema or usage flags change._

201
LICENSE
View File

@ -1,201 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

816
README.md
View File

@ -1,346 +1,688 @@
# LocalGPT: Secure, Local Conversations with Your Documents 🌐
# LocalGPT - Private Document Intelligence Platform
<p align="center">
<a href="https://trendshift.io/repositories/2947" target="_blank"><img src="https://trendshift.io/api/badge/repositories/2947" alt="PromtEngineer%2FlocalGPT | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>
<div align="center">
[![GitHub Stars](https://img.shields.io/github/stars/PromtEngineer/localGPT?style=social)](https://github.com/PromtEngineer/localGPT/stargazers)
[![GitHub Forks](https://img.shields.io/github/forks/PromtEngineer/localGPT?style=social)](https://github.com/PromtEngineer/localGPT/network/members)
[![GitHub Issues](https://img.shields.io/github/issues/PromtEngineer/localGPT)](https://github.com/PromtEngineer/localGPT/issues)
[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/PromtEngineer/localGPT)](https://github.com/PromtEngineer/localGPT/pulls)
[![License](https://img.shields.io/github/license/PromtEngineer/localGPT)](https://github.com/PromtEngineer/localGPT/blob/main/LICENSE)
![LocalGPT Logo](https://img.shields.io/badge/LocalGPT-Private%20AI-blue?style=for-the-badge)
🚨🚨 You can run localGPT on a pre-configured [Virtual Machine](https://bit.ly/localGPT). Make sure to use the code: PromptEngineering to get 50% off. I will get a small commission!
**Transform your documents into intelligent, searchable knowledge with complete privacy**
**LocalGPT** is an open-source initiative that allows you to converse with your documents without compromising your privacy. With everything running locally, you can be assured that no data ever leaves your computer. Dive into the world of secure, local document interactions with LocalGPT.
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Docker](https://img.shields.io/badge/docker-supported-blue.svg)](https://www.docker.com/)
## Features 🌟
- **Utmost Privacy**: Your data remains on your computer, ensuring 100% security.
- **Versatile Model Support**: Seamlessly integrate a variety of open-source models, including HF, GPTQ, GGML, and GGUF.
- **Diverse Embeddings**: Choose from a range of open-source embeddings.
- **Reuse Your LLM**: Once downloaded, reuse your LLM without the need for repeated downloads.
- **Chat History**: Remembers your previous conversations (in a session).
- **API**: LocalGPT has an API that you can use for building RAG Applications.
- **Graphical Interface**: LocalGPT comes with two GUIs, one uses the API and the other is standalone (based on streamlit).
- **GPU, CPU, HPU & MPS Support**: Supports multiple platforms out of the box, Chat with your data using `CUDA`, `CPU`, `HPU (Intel® Gaudi®)` or `MPS` and more!
[Quick Start](#quick-start) • [Features](#features) • [Installation](#installation) • [Documentation](#documentation) • [API Reference](#api-reference)
## Dive Deeper with Our Videos 🎥
- [Detailed code-walkthrough](https://youtu.be/MlyoObdIHyo)
- [Llama-2 with LocalGPT](https://youtu.be/lbFmceo4D5E)
- [Adding Chat History](https://youtu.be/d7otIM_MCZs)
- [LocalGPT - Updated (09/17/2023)](https://youtu.be/G_prHSKX9d4)
</div>
## Technical Details 🛠️
By selecting the right local models and leveraging the power of `LangChain`, you can run the entire RAG pipeline locally, without any data leaving your environment, and with reasonable performance.
## 🚀 What is LocalGPT?
- `ingest.py` uses `LangChain` tools to parse the document and create embeddings locally using `InstructorEmbeddings`. It then stores the result in a local vector database using `Chroma` vector store.
- `run_localGPT.py` uses a local LLM to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs.
- You can replace this local LLM with any other LLM from HuggingFace. Make sure whatever LLM you select is in the HF format.
LocalGPT is a **private, local document intelligence platform** that allows you to chat with your documents using advanced AI models - all while keeping your data completely private and secure on your own infrastructure.
This project was inspired by the original [privateGPT](https://github.com/imartinez/privateGPT).
### 🎯 Key Benefits
## Built Using 🧩
- [LangChain](https://github.com/hwchase17/langchain)
- [HuggingFace LLMs](https://huggingface.co/models)
- [InstructorEmbeddings](https://instructor-embedding.github.io/)
- [LLAMACPP](https://github.com/abetlen/llama-cpp-python)
- [ChromaDB](https://www.trychroma.com/)
- [Streamlit](https://streamlit.io/)
- **🔒 Complete Privacy**: Your documents never leave your server
- **🧠 Advanced AI**: State-of-the-art RAG (Retrieval-Augmented Generation) with smart routing
- **📚 Multi-Format Support**: PDFs, Word docs, text files, and more
- **🔍 Intelligent Search**: Hybrid search combining semantic similarity and keyword matching
- **⚡ High Performance**: Optimized for speed with batch processing and caching
- **🐳 Easy Deployment**: Docker support for simple setup and scaling
# Environment Setup 🌍
---
1. 📥 Clone the repo using git:
## ✨ Features
```shell
git clone https://github.com/PromtEngineer/localGPT.git
### 📖 Document Processing
- **Multi-format Support**: PDF, DOCX, TXT, Markdown, and more
- **Smart Chunking**: Intelligent text segmentation with overlap optimization
- **Contextual Enrichment**: Enhanced document understanding with AI-generated context
- **Batch Processing**: Handle multiple documents simultaneously
### 🤖 AI-Powered Chat
- **Natural Language Queries**: Ask questions in plain English
- **Source Attribution**: Every answer includes document references
- **Smart Routing**: Automatically chooses the best approach for each query
- **Multiple AI Models**: Support for Ollama, OpenAI, and Hugging Face models
### 🔍 Advanced Search
- **Hybrid Search**: Combines semantic similarity with keyword matching
- **Vector Embeddings**: State-of-the-art embedding models for semantic understanding
- **BM25 Ranking**: Traditional information retrieval for precise keyword matching
- **Reranking**: AI-powered result refinement for better relevance
### 🛠️ Developer-Friendly
- **RESTful APIs**: Complete API access for integration
- **Real-time Progress**: Live updates during document processing
- **Flexible Configuration**: Customize models, chunk sizes, and search parameters
- **Extensible Architecture**: Plugin system for custom components
### 🎨 Modern Interface
- **Intuitive Web UI**: Clean, responsive design
- **Session Management**: Organize conversations by topic
- **Index Management**: Easy document collection management
- **Real-time Chat**: Streaming responses for immediate feedback
---
## 🚀 Quick Start
### Prerequisites
- Python 3.8 or higher (tested with Python 3.11.5)
- Node.js 16+ and npm (tested with Node.js 23.10.0, npm 10.9.2)
- Docker (optional, for containerized deployment)
- 8GB+ RAM (16GB+ recommended)
- Ollama (required for both deployment approaches)
### Option 1: Docker Deployment (Recommended for Production)
```bash
# Clone the repository
git clone https://github.com/yourusername/localgpt.git
cd localgpt
# Install Ollama locally (required even for Docker)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b
# Start Ollama
ollama serve
# Start with Docker (in a new terminal)
./start-docker.sh
# Access the application
open http://localhost:3000
```
2. 🐍 Install [conda](https://www.anaconda.com/download) for virtual environment management. Create and activate a new virtual environment.
**Docker Management Commands:**
```bash
# Check container status
docker compose ps
```shell
conda create -n localGPT python=3.10.0
conda activate localGPT
# View logs
docker compose logs -f
# Stop containers
./start-docker.sh stop
```
3. 🛠️ Install the dependencies using pip
### Option 2: Direct Development (Recommended for Development)
To set up your environment to run the code, first install all requirements:
```bash
# Clone the repository
git clone https://github.com/yourusername/localgpt.git
cd localgpt
```shell
# Install Python dependencies
pip install -r requirements.txt
# Install Node.js dependencies
npm install
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b
ollama serve
# Start the system (in a new terminal)
python run_system.py
# Access the application
open http://localhost:3000
```
***Installing LLAMA-CPP :***
**Direct Development Management:**
```bash
# Check system health (comprehensive diagnostics)
python system_health_check.py
LocalGPT uses [LlamaCpp-Python](https://github.com/abetlen/llama-cpp-python) for GGML (you will need llama-cpp-python <=0.1.76) and GGUF (llama-cpp-python >=0.1.83) models.
# Check service status
python run_system.py --health
To run the quantized Llama3 model, ensure you have llama-cpp-python version 0.2.62 or higher installed.
If you want to use BLAS or Metal with [llama-cpp](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal) you can set appropriate flags:
For `NVIDIA` GPUs support, use `cuBLAS`
```shell
# Example: cuBLAS
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
# Stop all services
python run_system.py --stop
# Or press Ctrl+C in the terminal running python run_system.py
```
For Apple Metal (`M1/M2`) support, use
### Option 3: Manual Component Startup
```shell
# Example: METAL
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```bash
# Terminal 1: Start Ollama
ollama serve
# Terminal 2: Start RAG API
python -m rag_system.api_server
# Terminal 3: Start Backend
cd backend && python server.py
# Terminal 4: Start Frontend
npm run dev
# Access at http://localhost:3000
```
For more details, please refer to [llama-cpp](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal)
## Docker 🐳
---
Installing the required packages for GPU inference on NVIDIA GPUs, like gcc 11 and CUDA 11, may cause conflicts with other packages in your system.
As an alternative to Conda, you can use Docker with the provided Dockerfile.
It includes CUDA; your system just needs Docker, BuildKit, your NVIDIA GPU driver, and the NVIDIA Container Toolkit.
Build it with `docker build -t localgpt .` (requires BuildKit).
Docker BuildKit does not currently support GPU access during *docker build*, only during *docker run*.
Run as `docker run -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all localgpt`.
For running the code on Intel® Gaudi® HPU, use the following Dockerfile - `Dockerfile_hpu`.
## 📋 Installation Guide
## Test dataset
### System Requirements
For testing, this repository comes with [Constitution of USA](https://constitutioncenter.org/media/files/constitution.pdf) as an example file to use.
| Component | Minimum | Recommended | Tested |
|-----------|---------|-------------|--------|
| Python | 3.8+ | 3.11+ | 3.11.5 |
| Node.js | 16+ | 18+ | 23.10.0 |
| RAM | 8GB | 16GB+ | 16GB+ |
| Storage | 10GB | 50GB+ | 50GB+ |
| CPU | 4 cores | 8+ cores | 8+ cores |
| GPU | Optional | NVIDIA GPU with 8GB+ VRAM | MPS (Apple Silicon) |
## Ingesting your OWN Data.
Put your files in the `SOURCE_DOCUMENTS` folder. You can put multiple folders within the `SOURCE_DOCUMENTS` folder and the code will recursively read your files.
### Detailed Installation
### Support file formats:
LocalGPT currently supports the following file formats and uses `LangChain` to load them. The code in `constants.py` uses a `DOCUMENT_MAP` dictionary to map a file format to the corresponding loader. To add support for another file format, simply add an entry to this dictionary mapping the file extension to the corresponding loader from [LangChain](https://python.langchain.com/docs/modules/data_connection/document_loaders/).
#### 1. Install System Dependencies
```shell
DOCUMENT_MAP = {
".txt": TextLoader,
".md": TextLoader,
".py": TextLoader,
".pdf": PDFMinerLoader,
".csv": CSVLoader,
".xls": UnstructuredExcelLoader,
".xlsx": UnstructuredExcelLoader,
".docx": Docx2txtLoader,
".doc": Docx2txtLoader,
**Ubuntu/Debian:**
```bash
sudo apt update
sudo apt install python3.8 python3-pip nodejs npm docker.io docker-compose
```
**macOS:**
```bash
brew install python@3.8 node npm docker docker-compose
```
**Windows:**
```bash
# Install Python 3.8+, Node.js, and Docker Desktop
# Then use PowerShell or WSL2
```
#### 2. Install AI Models
**Install Ollama (Recommended):**
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull recommended models
ollama pull qwen3:0.6b # Fast generation model
ollama pull qwen3:8b # High-quality generation model
```
#### 3. Configure Environment
```bash
# Copy environment template
cp .env.example .env
# Edit configuration
nano .env
```
**Key Configuration Options:**
```env
# AI Models
OLLAMA_HOST=http://localhost:11434
DEFAULT_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
DEFAULT_GENERATION_MODEL=qwen3:0.6b
# Database
DATABASE_PATH=./backend/chat_data.db
VECTOR_DB_PATH=./lancedb
# Server Settings
BACKEND_PORT=8000
FRONTEND_PORT=3000
```
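If your own scripts need to read the same settings, a minimal sketch is shown below (it assumes the `python-dotenv` package is installed; the variable names come from the template above, and the fallback defaults are illustrative):

```python
# Sketch: load LocalGPT settings from .env (assumes python-dotenv is available)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
GENERATION_MODEL = os.getenv("DEFAULT_GENERATION_MODEL", "qwen3:0.6b")
EMBEDDING_MODEL = os.getenv("DEFAULT_EMBEDDING_MODEL", "Qwen/Qwen3-Embedding-0.6B")
BACKEND_PORT = int(os.getenv("BACKEND_PORT", "8000"))

print(f"Ollama at {OLLAMA_HOST}; generating with {GENERATION_MODEL} on port {BACKEND_PORT}")
```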
#### 4. Initialize the System
```bash
# Run system health check
python system_health_check.py
# Initialize databases
python -c "from backend.database import ChatDatabase; ChatDatabase().init_database()"
# Test installation
python -c "from rag_system.main import get_agent; print('✅ Installation successful!')"
# Validate complete setup
python run_system.py --health
```
---
## 🎯 Getting Started
### 1. Create Your First Index
An **index** is a collection of processed documents that you can chat with.
#### Using the Web Interface:
1. Open http://localhost:3000
2. Click "Create New Index"
3. Upload your documents (PDF, DOCX, TXT)
4. Configure processing options
5. Click "Build Index"
#### Using Scripts:
```bash
# Simple script approach
./simple_create_index.sh "My Documents" "path/to/document.pdf"
# Interactive script
python create_index_script.py
```
#### Using API:
```bash
# Create index
curl -X POST http://localhost:8000/indexes \
-H "Content-Type: application/json" \
-d '{"name": "My Index", "description": "My documents"}'
# Upload documents
curl -X POST http://localhost:8000/indexes/INDEX_ID/upload \
-F "files=@document.pdf"
# Build index
curl -X POST http://localhost:8000/indexes/INDEX_ID/build
```
### 2. Start Chatting
Once your index is built:
1. **Create a Chat Session**: Click "New Chat" or use an existing session
2. **Select Your Index**: Choose which document collection to query
3. **Ask Questions**: Type natural language questions about your documents
4. **Get Answers**: Receive AI-generated responses with source citations
### 3. Advanced Features
#### Custom Model Configuration
```bash
# Use different models for different tasks
curl -X POST http://localhost:8000/sessions \
-H "Content-Type: application/json" \
-d '{
"title": "High Quality Session",
"model": "qwen3:8b",
"embedding_model": "Qwen/Qwen3-Embedding-4B"
}'
```
#### Batch Document Processing
```bash
# Process multiple documents at once
python demo_batch_indexing.py --config batch_indexing_config.json
```
#### API Integration
```python
import requests
# Chat with your documents via API
response = requests.post('http://localhost:8000/chat', json={
'query': 'What are the key findings in the research papers?',
'session_id': 'your-session-id',
'search_type': 'hybrid',
'retrieval_k': 20
})
print(response.json()['response'])
```
---
## 🔧 Configuration
### Model Configuration
LocalGPT supports multiple AI model providers:
#### Ollama Models (Local)
```python
OLLAMA_CONFIG = {
'host': 'http://localhost:11434',
'generation_model': 'qwen3:0.6b',
'embedding_model': 'nomic-embed-text'
}
```
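To sanity-check that the configured host and model respond before starting the full stack, you can hit Ollama's standard `/api/chat` endpoint directly; this is the same call the bundled `backend/ollama_client.py` makes, and the prompt here is only illustrative:

```python
# Sketch: verify the Ollama host/model from OLLAMA_CONFIG answer a chat request
import requests

OLLAMA_CONFIG = {"host": "http://localhost:11434", "generation_model": "qwen3:0.6b"}

resp = requests.post(
    f"{OLLAMA_CONFIG['host']}/api/chat",
    json={
        "model": OLLAMA_CONFIG["generation_model"],
        "messages": [{"role": "user", "content": "Reply with the single word: ready"}],
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```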
### Ingest
Run the following command to ingest all the data.
If you have `cuda` setup on your system.
```shell
python ingest.py
```
You will see an output like this:
<img width="1110" alt="Screenshot 2023-09-14 at 3 36 27 PM" src="https://github.com/PromtEngineer/localGPT/assets/134474669/c9274e9a-842c-49b9-8d95-606c3d80011f">
Use the device type argument to specify a given device.
To run on `cpu`
```sh
python ingest.py --device_type cpu
#### Hugging Face Models
```python
EXTERNAL_MODELS = {
'embedding': {
'Qwen/Qwen3-Embedding-0.6B': {'dimensions': 1024},
'Qwen/Qwen3-Embedding-4B': {'dimensions': 2048},
'Qwen/Qwen3-Embedding-8B': {'dimensions': 4096}
}
}
```
To run on `M1/M2`
### Processing Configuration
```sh
python ingest.py --device_type mps
```python
PIPELINE_CONFIGS = {
'default': {
'chunk_size': 512,
'chunk_overlap': 64,
'retrieval_mode': 'hybrid',
'window_size': 5,
'enable_enrich': True,
'latechunk': True,
'docling_chunk': True
},
'fast': {
'chunk_size': 256,
'chunk_overlap': 32,
'retrieval_mode': 'vector',
'enable_enrich': False
}
}
```
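The `chunk_size` and `chunk_overlap` values above control how documents are split before embedding. As a simplified illustration of what those two knobs do (the real pipeline uses Docling-aware and late chunking, so this word-window sketch is only an approximation):

```python
# Sketch: fixed-size chunking with overlap, approximated at the word level
from typing import List

def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> List[str]:
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if not window:
            break
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks

chunks = chunk_words("lorem ipsum " * 600, chunk_size=512, chunk_overlap=64)
print(f"{len(chunks)} chunks; adjacent chunks share 64 words of context")
```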
Use help for a full list of supported devices.
### Search Configuration
```sh
python ingest.py --help
```python
SEARCH_CONFIG = {
'hybrid': {
'dense_weight': 0.7,
'sparse_weight': 0.3,
'retrieval_k': 20,
'reranker_top_k': 10
}
}
```
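To make those weights concrete, here is a minimal sketch of how a hybrid retriever can fuse normalized dense (vector) and sparse (BM25) scores before handing the top results to the reranker; the function and document names are illustrative rather than the actual pipeline code:

```python
# Sketch: weighted fusion of normalized dense and sparse retrieval scores
from typing import Dict, List, Tuple

def fuse_scores(dense: Dict[str, float], sparse: Dict[str, float],
                dense_weight: float = 0.7, sparse_weight: float = 0.3,
                retrieval_k: int = 20) -> List[Tuple[str, float]]:
    def normalize(scores: Dict[str, float]) -> Dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        return {doc: (s - lo) / span for doc, s in scores.items()}

    dense_n, sparse_n = normalize(dense), normalize(sparse)
    fused = {doc: dense_weight * dense_n.get(doc, 0.0) + sparse_weight * sparse_n.get(doc, 0.0)
             for doc in set(dense_n) | set(sparse_n)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:retrieval_k]

# Chunks found by vector search vs. BM25; the fused top-k goes on to the reranker
print(fuse_scores({"chunk_a": 0.82, "chunk_b": 0.40}, {"chunk_b": 7.1, "chunk_c": 5.3}))
```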
This will create a new folder called `DB` and use it for the newly created vector store. You can ingest as many documents as you want, and all will be accumulated in the local embeddings database.
If you want to start from an empty database, delete the `DB` and reingest your documents.
---
Note: When you run this for the first time, it will need internet access to download the embedding model (default: `Instructor Embedding`). In the subsequent runs, no data will leave your local environment and you can ingest data without internet connection.
## 📚 Use Cases
## Ask questions to your documents, locally!
### 📊 Business Intelligence
- **Document Analysis**: Extract insights from reports, contracts, and presentations
- **Compliance**: Query regulatory documents and policies
- **Knowledge Management**: Build searchable company knowledge bases
In order to chat with your documents, run the following command (by default, it will run on `cuda`).
### 🔬 Research & Academia
- **Literature Review**: Analyze research papers and academic publications
- **Data Analysis**: Query experimental results and datasets
- **Collaboration**: Share findings with team members securely
```shell
python run_localGPT.py
```
You can also specify the device type just like `ingest.py`
### ⚖️ Legal & Compliance
- **Case Research**: Search through legal documents and precedents
- **Contract Analysis**: Extract key terms and obligations
- **Regulatory Compliance**: Query compliance requirements and guidelines
```shell
python run_localGPT.py --device_type mps # to run on Apple silicon
### 🏥 Healthcare
- **Medical Records**: Analyze patient data and treatment histories
- **Research**: Query medical literature and clinical studies
- **Compliance**: Navigate healthcare regulations and standards
### 💼 Personal Productivity
- **Document Organization**: Create searchable personal knowledge bases
- **Research**: Analyze books, articles, and reference materials
- **Learning**: Build interactive study materials from textbooks
---
## 🛠️ Troubleshooting
### Common Issues
#### Installation Problems
```bash
# Check Python version
python --version # Should be 3.8+
# Check dependencies
pip list | grep -E "(torch|transformers|lancedb)"
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
```
```shell
# To run on Intel® Gaudi® hpu
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2" # in constants.py
python run_localGPT.py --device_type hpu
#### Model Loading Issues
```bash
# Check Ollama status
ollama list
curl http://localhost:11434/api/tags
# Pull missing models
ollama pull qwen3:0.6b
```
This will load the ingested vector store and embedding model. You will be presented with a prompt:
#### Database Issues
```bash
# Check database connectivity
python -c "from backend.database import ChatDatabase; db = ChatDatabase(); print('✅ Database OK')"
```shell
> Enter a query:
# Reset database (WARNING: This deletes all data)
rm backend/chat_data.db
python -c "from backend.database import ChatDatabase; ChatDatabase().init_database()"
```
After typing your question, hit enter. LocalGPT will take some time depending on your hardware. You will get a response like the one below.
<img width="1312" alt="Screenshot 2023-09-14 at 3 33 19 PM" src="https://github.com/PromtEngineer/localGPT/assets/134474669/a7268de9-ade0-420b-a00b-ed12207dbe41">
#### Performance Issues
```bash
# Check system resources
python system_health_check.py
Once the answer is generated, you can then ask another question without re-running the script, just wait for the prompt again.
# Monitor memory usage
htop # or Task Manager on Windows
***Note:*** When you run this for the first time, it will need an internet connection to download the LLM (default: `TheBloke/Llama-2-7b-Chat-GGUF`). After that you can turn off your internet connection, and inference will still work. No data gets out of your local environment.
Type `exit` to finish the script.
### Extra Options with run_localGPT.py
You can use the `--show_sources` flag with `run_localGPT.py` to show which chunks were retrieved by the embedding model. By default, it will show 4 different sources/chunks. You can change the number of sources/chunks that are retrieved.
```shell
python run_localGPT.py --show_sources
# Optimize for low-memory systems
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
```
Another option is to enable chat history. ***Note***: This is disabled by default and can be enabled with the `--use_history` flag. Keep in mind that the context window is limited; enabling history consumes part of it and may cause it to overflow.
### Getting Help
```shell
python run_localGPT.py --use_history
1. **Check Logs**: Look at `logs/system.log` for detailed error messages
2. **System Health**: Run `python system_health_check.py`
3. **Documentation**: Check the [Technical Documentation](TECHNICAL_DOCS.md)
4. **GitHub Issues**: Report bugs and request features
5. **Community**: Join our Discord/Slack community
---
## 🔗 API Reference
### Core Endpoints
#### Chat API
```http
POST /chat
Content-Type: application/json
{
"query": "What are the main topics discussed?",
"session_id": "uuid",
"search_type": "hybrid",
"retrieval_k": 20
}
```
You can store user questions and model responses in a CSV file, `/local_chat_history/qa_log.csv`, by using the `--save_qa` flag. Every interaction will be stored.
#### Index Management
```http
# Create index
POST /indexes
{"name": "My Index", "description": "Description"}
```shell
python run_localGPT.py --save_qa
# Upload documents
POST /indexes/{id}/upload
Content-Type: multipart/form-data
# Build index
POST /indexes/{id}/build
# Get index status
GET /indexes/{id}
```
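The same lifecycle can be scripted end-to-end in Python. The endpoints mirror the block above; the field that carries the new index id in the JSON response (`index_id` below) is an assumption, so adjust it to whatever your server actually returns:

```python
# Sketch: create an index, upload a document, and trigger a build via the REST API
import requests

BASE = "http://localhost:8000"

index = requests.post(f"{BASE}/indexes",
                      json={"name": "Research Papers", "description": "Q3 papers"}).json()
index_id = index.get("index_id") or index.get("id")  # response field name assumed

with open("paper.pdf", "rb") as f:  # illustrative file name
    requests.post(f"{BASE}/indexes/{index_id}/upload", files={"files": f}).raise_for_status()

requests.post(f"{BASE}/indexes/{index_id}/build").raise_for_status()
print(requests.get(f"{BASE}/indexes/{index_id}").json())  # poll for build status
```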
# Run the Graphical User Interface
#### Session Management
```http
# Create session
POST /sessions
{"title": "My Session", "model": "qwen3:0.6b"}
1. Open `constants.py` in an editor of your choice and, depending on your choice, add the LLM you want to use. By default, the following model will be used:
# Get sessions
GET /sessions
```shell
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
```
# Link index to session
POST /sessions/{session_id}/indexes/{index_id}
```
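And the session side of the same workflow, sketched with the endpoints above (again, the id field names in the responses are assumptions):

```python
# Sketch: create a chat session and link an existing index to it
import requests

BASE = "http://localhost:8000"

session = requests.post(f"{BASE}/sessions",
                        json={"title": "Paper Q&A", "model": "qwen3:0.6b"}).json()
session_id = session.get("session_id") or session.get("id")  # response field name assumed

index_id = "your-index-id"  # id returned when the index was created
requests.post(f"{BASE}/sessions/{session_id}/indexes/{index_id}").raise_for_status()
print(f"Session {session_id} is now linked to index {index_id}")
```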
3. Open up a terminal and activate your python environment that contains the dependencies installed from requirements.txt.
### Advanced Features
4. Navigate to the `/LOCALGPT` directory.
#### Streaming Chat
```http
POST /chat/stream
Content-Type: application/json
5. Run the following command `python run_localGPT_API.py`. The API should begin to run.
{
"query": "Explain the methodology",
"session_id": "uuid",
"stream": true
}
```
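On the client side, the stream can be consumed incrementally. The wire format is an assumption in this sketch (newline-delimited JSON chunks, optionally with SSE `data:` framing, carrying a `response` or `token` field); check what your deployment actually emits and adjust the parsing:

```python
# Sketch: read a streamed chat response chunk by chunk (wire format assumed)
import json
import requests

with requests.post(
    "http://localhost:8000/chat/stream",
    json={"query": "Explain the methodology", "session_id": "your-session-id", "stream": True},
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        text = line.decode("utf-8")
        if text.startswith("data: "):  # tolerate SSE framing
            text = text[len("data: "):]
        chunk = json.loads(text)
        print(chunk.get("response") or chunk.get("token") or "", end="", flush=True)
```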
6. Wait until everything has loaded in. You should see something like `INFO:werkzeug:Press CTRL+C to quit`.
#### Batch Processing
```http
POST /batch/index
Content-Type: application/json
7. Open up a second terminal and activate the same python environment.
{
"file_paths": ["doc1.pdf", "doc2.pdf"],
"config": {
"chunk_size": 512,
"enable_enrich": true
}
}
```
8. Navigate to the `/LOCALGPT/localGPTUI` directory.
For complete API documentation, see [API_REFERENCE.md](API_REFERENCE.md).
9. Run the command `python localGPTUI.py`.
---
10. Open up a web browser and go to the address `http://localhost:5111/`.
## 🏗️ Architecture
LocalGPT is built with a modular, scalable architecture:
# How to select different LLM models?
```mermaid
graph TB
UI[Web Interface] --> API[Backend API]
API --> Agent[RAG Agent]
Agent --> Retrieval[Retrieval Pipeline]
Agent --> Generation[Generation Pipeline]
Retrieval --> Vector[Vector Search]
Retrieval --> BM25[BM25 Search]
Retrieval --> Rerank[Reranking]
Vector --> LanceDB[(LanceDB)]
BM25 --> BM25DB[(BM25 Index)]
Generation --> Ollama[Ollama Models]
Generation --> HF[Hugging Face Models]
API --> SQLite[(SQLite DB)]
```
To change the models you will need to set both `MODEL_ID` and `MODEL_BASENAME`.
### Key Components
1. Open up `constants.py` in the editor of your choice.
2. Change the `MODEL_ID` and `MODEL_BASENAME`. If you are using a quantized model (`GGML`, `GPTQ`, `GGUF`), you will need to provide `MODEL_BASENAME`. For unquantized models, set `MODEL_BASENAME` to `NONE`
5. There are a number of example models from HuggingFace that have already been tested, both original trained models (ending with HF or with a .bin file in their "Files and versions") and quantized models (ending with GPTQ or with .no-act-order or .safetensors files in their "Files and versions").
6. For models that end with HF or have a .bin file inside their "Files and versions" on their HuggingFace page.
- **Frontend**: React/Next.js web interface
- **Backend**: Python FastAPI server
- **RAG Agent**: Intelligent query routing and processing
- **Vector Database**: LanceDB for semantic search
- **Search Engine**: BM25 for keyword search
- **AI Models**: Ollama and Hugging Face integration
- Make sure you have a `MODEL_ID` selected. For example -> `MODEL_ID = "TheBloke/guanaco-7B-HF"`
- Go to the [HuggingFace Repo](https://huggingface.co/TheBloke/guanaco-7B-HF)
---
7. For models that contain GPTQ in their name and/or have a .no-act-order or .safetensors extension inside "Files and versions" on their HuggingFace page.
## 🤝 Contributing
- Make sure you have a `MODEL_ID` selected. For example -> model_id = `"TheBloke/wizardLM-7B-GPTQ"`
- Go to the corresponding [HuggingFace Repo](https://huggingface.co/TheBloke/wizardLM-7B-GPTQ) and select "Files and versions".
- Pick one of the model names and set it as `MODEL_BASENAME`. For example -> `MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"`
We welcome contributions from developers of all skill levels! LocalGPT is an open-source project that benefits from community involvement.
8. Follow the same steps for `GGUF` and `GGML` models.
### 🚀 Quick Start for Contributors
# GPU and VRAM Requirements
```bash
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/multimodal_rag.git
cd multimodal_rag
Below are the VRAM requirements for different models depending on their size (billions of parameters). The estimates in the table do not include the VRAM used by the embedding models, which use an additional 2 GB-7 GB of VRAM depending on the model (a rough rule-of-thumb estimator is sketched below the table).
# Set up development environment
pip install -r requirements.txt
npm install
| Model Size (B) | float32 | float16 | GPTQ 8bit | GPTQ 4bit |
| ------- | --------- | --------- | -------------- | ------------------ |
| 7B | 28 GB | 14 GB | 7 GB - 9 GB | 3.5 GB - 5 GB |
| 13B | 52 GB | 26 GB | 13 GB - 15 GB | 6.5 GB - 8 GB |
| 32B | 130 GB | 65 GB | 32.5 GB - 35 GB| 16.25 GB - 19 GB |
| 65B | 260.8 GB | 130.4 GB | 65.2 GB - 67 GB| 32.6 GB - 35 GB |
# Install Ollama and models
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b
# Verify setup
python system_health_check.py
python run_system.py --mode dev
```
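As a rough rule of thumb behind the VRAM table above, weight memory is roughly the parameter count multiplied by the bytes per parameter for the chosen precision; the sketch below reproduces that back-of-the-envelope arithmetic (actual usage also depends on context length, KV cache, and framework overhead):

```python
# Sketch: back-of-the-envelope VRAM estimate for LLM weights only
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "gptq_8bit": 1.0, "gptq_4bit": 0.5}

def estimate_weights_vram_gb(params_billion: float, precision: str) -> float:
    """Weights-only estimate; add 1-2 GB for runtime buffers plus 2-7 GB for the embedding model."""
    return params_billion * BYTES_PER_PARAM[precision]

for precision in BYTES_PER_PARAM:
    print(f"7B @ {precision}: ~{estimate_weights_vram_gb(7, precision):.1f} GB")
```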
# System Requirements
### 📋 How to Contribute
## Python Version
1. **🐛 Report Bugs**: Use our [bug report template](.github/ISSUE_TEMPLATE/bug_report.md)
2. **💡 Request Features**: Use our [feature request template](.github/ISSUE_TEMPLATE/feature_request.md)
3. **🔧 Submit Code**: Follow our [development workflow](CONTRIBUTING.md#development-workflow)
4. **📚 Improve Docs**: Help make our documentation better
To use this software, you must have Python 3.10 or later installed. Earlier versions of Python will not work.
### 🎯 Priority Areas
## C++ Compiler
- **Performance Optimization**: Improve indexing and retrieval speed
- **Model Integration**: Add support for new AI models
- **User Experience**: Enhance the web interface
- **Testing**: Expand test coverage
- **Documentation**: Improve setup and usage guides
If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.
### 📖 Detailed Guidelines
### For Windows 10/11
For comprehensive contributing guidelines, including:
- Development setup and workflow
- Coding standards and best practices
- Testing requirements
- Documentation standards
- Release process
To install a C++ compiler on Windows 10/11, follow these steps:
**👉 See our [CONTRIBUTING.md](CONTRIBUTING.md) guide**
1. Install Visual Studio 2022.
2. Make sure the following components are selected:
- Universal Windows Platform development
- C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the "gcc" component.
---
### NVIDIA Driver's Issues:
## 📄 License
Follow this [page](https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-22-04) to install NVIDIA Drivers.
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Star History
---
[![Star History Chart](https://api.star-history.com/svg?repos=PromtEngineer/localGPT&type=Date)](https://star-history.com/#PromtEngineer/localGPT&Date)
## 🙏 Acknowledgments
# Disclaimer
- **Ollama**: For providing excellent local AI model serving
- **LanceDB**: For high-performance vector database
- **Hugging Face**: For state-of-the-art AI models
- **React/Next.js**: For the modern web interface
- **FastAPI**: For the robust backend framework
This is a test project to validate the feasibility of a fully local solution for question answering using LLMs and vector embeddings. It is not production ready, and it is not meant to be used in production. Vicuna-7B is based on the Llama model, so it carries the original Llama license.
---
# Common Errors
## 📞 Support
- [Torch not compatible with CUDA enabled](https://github.com/pytorch/pytorch/issues/30664)
- **Documentation**: [Technical Docs](TECHNICAL_DOCS.md)
- **Issues**: [GitHub Issues](https://github.com/yourusername/localgpt/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/localgpt/discussions)
- **Email**: support@localgpt.com
- Get CUDA version
```shell
nvcc --version
```
```shell
nvidia-smi
```
- Try installing PyTorch depending on your CUDA version
```shell
conda install -c pytorch torchvision cudatoolkit=10.1 pytorch
```
- If it doesn't work, try reinstalling
```shell
pip uninstall torch
pip cache purge
pip install torch -f https://download.pytorch.org/whl/torch_stable.html
```
---
- [ERROR: pip's dependency resolver does not currently take into account all the packages that are installed](https://stackoverflow.com/questions/72672196/error-pips-dependency-resolver-does-not-currently-take-into-account-all-the-pa/76604141#76604141)
```shell
pip install h5py
pip install typing-extensions
pip install wheel
```
- [Failed to import transformers](https://github.com/huggingface/transformers/issues/11262)
- Try re-install
```shell
conda uninstall tokenizers transformers
pip install transformers
```
<div align="center">
**Made with ❤️ for private, intelligent document processing**
[⭐ Star us on GitHub](https://github.com/yourusername/localgpt) • [🐛 Report Bug](https://github.com/yourusername/localgpt/issues) • [💡 Request Feature](https://github.com/yourusername/localgpt/issues)
</div>

Binary file not shown.

93
backend/README.md Normal file
View File

@ -0,0 +1,93 @@
# localGPT Backend
Simple Python backend that connects your frontend to Ollama for local LLM chat.
## Prerequisites
1. **Install Ollama** (if not already installed):
```bash
# Visit https://ollama.ai or run:
curl -fsSL https://ollama.ai/install.sh | sh
```
2. **Start Ollama**:
```bash
ollama serve
```
3. **Pull a model** (optional, server will suggest if needed):
```bash
ollama pull llama3.2
```
## Setup
1. **Install Python dependencies**:
```bash
pip install -r requirements.txt
```
2. **Test Ollama connection**:
```bash
python ollama_client.py
```
3. **Start the backend server**:
```bash
python server.py
```
Server will run on `http://localhost:8000`
## API Endpoints
### Health Check
```bash
GET /health
```
Returns server status and available models.
### Chat
```bash
POST /chat
Content-Type: application/json
{
"message": "Hello!",
"model": "llama3.2:latest",
"conversation_history": []
}
```
Returns:
```json
{
"response": "Hello! How can I help you?",
"model": "llama3.2:latest",
"message_count": 1
}
```
## Testing
Test the chat endpoint:
```bash
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello!", "model": "llama3.2:latest"}'
```
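The same endpoint can be exercised from Python across multiple turns by carrying the conversation history forward; the request and response fields match the shapes documented above:

```python
# Sketch: multi-turn chat against the backend, passing conversation_history each turn
import requests

BASE = "http://localhost:8000"
history = []

for prompt in ["Hello!", "What did I just say?"]:
    resp = requests.post(f"{BASE}/chat", json={
        "message": prompt,
        "model": "llama3.2:latest",
        "conversation_history": history,
    }, timeout=120)
    resp.raise_for_status()
    answer = resp.json()["response"]
    history += [{"role": "user", "content": prompt},
                {"role": "assistant", "content": answer}]
    print(f"> {prompt}\n{answer}\n")
```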
## Frontend Integration
Your React frontend should connect to:
- **Backend**: `http://localhost:8000`
- **Chat endpoint**: `http://localhost:8000/chat`
## What's Next
This simple backend is ready for:
- ✅ **Real-time chat** with local LLMs
- 🔜 **Document upload** for RAG
- 🔜 **Vector database** integration
- 🔜 **Streaming responses**
- 🔜 **Chat history** persistence

BIN
backend/chat_data.db Normal file

Binary file not shown.

684
backend/database.py Normal file
View File

@ -0,0 +1,684 @@
import sqlite3
import uuid
import json
from datetime import datetime
from typing import List, Dict, Optional, Tuple
class ChatDatabase:
def __init__(self, db_path: str = "chat_history.db"):
self.db_path = db_path
self.init_database()
def init_database(self):
"""Initialize the SQLite database with required tables"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
# Enable foreign keys
conn.execute("PRAGMA foreign_keys = ON")
# Sessions table
conn.execute('''
CREATE TABLE IF NOT EXISTS sessions (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
model_used TEXT NOT NULL,
message_count INTEGER DEFAULT 0
)
''')
# Messages table
conn.execute('''
CREATE TABLE IF NOT EXISTS messages (
id TEXT PRIMARY KEY,
session_id TEXT NOT NULL,
content TEXT NOT NULL,
sender TEXT NOT NULL CHECK (sender IN ('user', 'assistant')),
timestamp TEXT NOT NULL,
metadata TEXT DEFAULT '{}',
FOREIGN KEY (session_id) REFERENCES sessions (id) ON DELETE CASCADE
)
''')
# Create indexes for better performance
conn.execute('CREATE INDEX IF NOT EXISTS idx_messages_session_id ON messages(session_id)')
conn.execute('CREATE INDEX IF NOT EXISTS idx_messages_timestamp ON messages(timestamp)')
conn.execute('CREATE INDEX IF NOT EXISTS idx_sessions_updated_at ON sessions(updated_at)')
# Documents table
conn.execute('''
CREATE TABLE IF NOT EXISTS session_documents (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
file_path TEXT NOT NULL,
indexed INTEGER DEFAULT 0,
FOREIGN KEY (session_id) REFERENCES sessions (id) ON DELETE CASCADE
)
''')
conn.execute('CREATE INDEX IF NOT EXISTS idx_session_documents_session_id ON session_documents(session_id)')
# --- NEW: Index persistence tables ---
cursor.execute('''
CREATE TABLE IF NOT EXISTS indexes (
id TEXT PRIMARY KEY,
name TEXT UNIQUE,
description TEXT,
created_at TEXT,
updated_at TEXT,
vector_table_name TEXT,
metadata TEXT
)
''')
cursor.execute('''
CREATE TABLE IF NOT EXISTS index_documents (
id INTEGER PRIMARY KEY AUTOINCREMENT,
index_id TEXT,
original_filename TEXT,
stored_path TEXT,
FOREIGN KEY(index_id) REFERENCES indexes(id)
)
''')
cursor.execute('''
CREATE TABLE IF NOT EXISTS session_indexes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT,
index_id TEXT,
linked_at TEXT,
FOREIGN KEY(session_id) REFERENCES sessions(id),
FOREIGN KEY(index_id) REFERENCES indexes(id)
)
''')
conn.commit()
conn.close()
print("✅ Database initialized successfully")
def create_session(self, title: str, model: str) -> str:
"""Create a new chat session"""
session_id = str(uuid.uuid4())
now = datetime.now().isoformat()
conn = sqlite3.connect(self.db_path)
conn.execute('''
INSERT INTO sessions (id, title, created_at, updated_at, model_used)
VALUES (?, ?, ?, ?, ?)
''', (session_id, title, now, now, model))
conn.commit()
conn.close()
print(f"📝 Created new session: {session_id[:8]}... - {title}")
return session_id
def get_sessions(self, limit: int = 50) -> List[Dict]:
"""Get all chat sessions, ordered by most recent"""
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cursor = conn.execute('''
SELECT id, title, created_at, updated_at, model_used, message_count
FROM sessions
ORDER BY updated_at DESC
LIMIT ?
''', (limit,))
sessions = [dict(row) for row in cursor.fetchall()]
conn.close()
return sessions
def get_session(self, session_id: str) -> Optional[Dict]:
"""Get a specific session"""
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cursor = conn.execute('''
SELECT id, title, created_at, updated_at, model_used, message_count
FROM sessions
WHERE id = ?
''', (session_id,))
row = cursor.fetchone()
conn.close()
return dict(row) if row else None
def add_message(self, session_id: str, content: str, sender: str, metadata: Dict = None) -> str:
"""Add a message to a session"""
message_id = str(uuid.uuid4())
now = datetime.now().isoformat()
metadata_json = json.dumps(metadata or {})
conn = sqlite3.connect(self.db_path)
# Add the message
conn.execute('''
INSERT INTO messages (id, session_id, content, sender, timestamp, metadata)
VALUES (?, ?, ?, ?, ?, ?)
''', (message_id, session_id, content, sender, now, metadata_json))
# Update session timestamp and message count
conn.execute('''
UPDATE sessions
SET updated_at = ?,
message_count = message_count + 1
WHERE id = ?
''', (now, session_id))
conn.commit()
conn.close()
return message_id
def get_messages(self, session_id: str, limit: int = 100) -> List[Dict]:
"""Get all messages for a session"""
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cursor = conn.execute('''
SELECT id, content, sender, timestamp, metadata
FROM messages
WHERE session_id = ?
ORDER BY timestamp ASC
LIMIT ?
''', (session_id, limit))
messages = []
for row in cursor.fetchall():
message = dict(row)
message['metadata'] = json.loads(message['metadata'])
messages.append(message)
conn.close()
return messages
def get_conversation_history(self, session_id: str) -> List[Dict]:
"""Get conversation history in the format expected by Ollama"""
messages = self.get_messages(session_id)
history = []
for msg in messages:
history.append({
"role": msg["sender"],
"content": msg["content"]
})
return history
def update_session_title(self, session_id: str, title: str):
"""Update session title"""
conn = sqlite3.connect(self.db_path)
conn.execute('''
UPDATE sessions
SET title = ?, updated_at = ?
WHERE id = ?
''', (title, datetime.now().isoformat(), session_id))
conn.commit()
conn.close()
def delete_session(self, session_id: str) -> bool:
"""Delete a session and all its messages"""
conn = sqlite3.connect(self.db_path)
cursor = conn.execute('DELETE FROM sessions WHERE id = ?', (session_id,))
deleted = cursor.rowcount > 0
conn.commit()
conn.close()
if deleted:
print(f"🗑️ Deleted session: {session_id[:8]}...")
return deleted
def cleanup_empty_sessions(self) -> int:
"""Remove sessions with no messages"""
conn = sqlite3.connect(self.db_path)
# Find sessions with no messages
cursor = conn.execute('''
SELECT s.id FROM sessions s
LEFT JOIN messages m ON s.id = m.session_id
WHERE m.id IS NULL
''')
empty_sessions = [row[0] for row in cursor.fetchall()]
# Delete empty sessions
deleted_count = 0
for session_id in empty_sessions:
cursor = conn.execute('DELETE FROM sessions WHERE id = ?', (session_id,))
if cursor.rowcount > 0:
deleted_count += 1
print(f"🗑️ Cleaned up empty session: {session_id[:8]}...")
conn.commit()
conn.close()
if deleted_count > 0:
print(f"✨ Cleaned up {deleted_count} empty sessions")
return deleted_count
def get_stats(self) -> Dict:
"""Get database statistics"""
conn = sqlite3.connect(self.db_path)
# Get session count
cursor = conn.execute('SELECT COUNT(*) FROM sessions')
session_count = cursor.fetchone()[0]
# Get message count
cursor = conn.execute('SELECT COUNT(*) FROM messages')
message_count = cursor.fetchone()[0]
# Get most used model
cursor = conn.execute('''
SELECT model_used, COUNT(*) as count
FROM sessions
GROUP BY model_used
ORDER BY count DESC
LIMIT 1
''')
most_used_model = cursor.fetchone()
conn.close()
return {
"total_sessions": session_count,
"total_messages": message_count,
"most_used_model": most_used_model[0] if most_used_model else None
}
def add_document_to_session(self, session_id: str, file_path: str) -> int:
"""Adds a document file path to a session."""
conn = sqlite3.connect(self.db_path)
cursor = conn.execute(
"INSERT INTO session_documents (session_id, file_path) VALUES (?, ?)",
(session_id, file_path)
)
doc_id = cursor.lastrowid
conn.commit()
conn.close()
print(f"📄 Added document '{file_path}' to session {session_id[:8]}...")
return doc_id
def get_documents_for_session(self, session_id: str) -> List[str]:
"""Retrieves all document file paths for a given session."""
conn = sqlite3.connect(self.db_path)
cursor = conn.execute(
"SELECT file_path FROM session_documents WHERE session_id = ?",
(session_id,)
)
paths = [row[0] for row in cursor.fetchall()]
conn.close()
return paths
# -------- Index helpers ---------
def create_index(self, name: str, description: str|None = None, metadata: dict | None = None) -> str:
idx_id = str(uuid.uuid4())
created = datetime.now().isoformat()
vector_table = f"text_pages_{idx_id}"
conn = sqlite3.connect(self.db_path)
conn.execute('''
INSERT INTO indexes (id, name, description, created_at, updated_at, vector_table_name, metadata)
VALUES (?,?,?,?,?,?,?)
''', (idx_id, name, description, created, created, vector_table, json.dumps(metadata or {})))
conn.commit()
conn.close()
print(f"📂 Created new index '{name}' ({idx_id[:8]})")
return idx_id
def get_index(self, index_id: str) -> dict | None:
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cur = conn.execute('SELECT * FROM indexes WHERE id=?', (index_id,))
row = cur.fetchone()
if not row:
conn.close()
return None
idx = dict(row)
idx['metadata'] = json.loads(idx['metadata'] or '{}')
cur = conn.execute('SELECT original_filename, stored_path FROM index_documents WHERE index_id=?', (index_id,))
docs = [{'filename': r[0], 'stored_path': r[1]} for r in cur.fetchall()]
idx['documents'] = docs
conn.close()
return idx
def list_indexes(self) -> list[dict]:
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
rows = conn.execute('SELECT * FROM indexes').fetchall()
res = []
for r in rows:
item = dict(r)
item['metadata'] = json.loads(item['metadata'] or '{}')
# attach documents list for convenience
docs_cur = conn.execute('SELECT original_filename, stored_path FROM index_documents WHERE index_id=?', (item['id'],))
docs = [{'filename':d[0],'stored_path':d[1]} for d in docs_cur.fetchall()]
item['documents'] = docs
res.append(item)
conn.close()
return res
def add_document_to_index(self, index_id: str, filename: str, stored_path: str):
conn = sqlite3.connect(self.db_path)
conn.execute('INSERT INTO index_documents (index_id, original_filename, stored_path) VALUES (?,?,?)', (index_id, filename, stored_path))
conn.commit()
conn.close()
def link_index_to_session(self, session_id: str, index_id: str):
conn = sqlite3.connect(self.db_path)
conn.execute('INSERT INTO session_indexes (session_id, index_id, linked_at) VALUES (?,?,?)', (session_id, index_id, datetime.now().isoformat()))
conn.commit()
conn.close()
def get_indexes_for_session(self, session_id: str) -> list[str]:
conn = sqlite3.connect(self.db_path)
cursor = conn.execute('SELECT index_id FROM session_indexes WHERE session_id=? ORDER BY linked_at', (session_id,))
ids = [r[0] for r in cursor.fetchall()]
conn.close()
return ids
def delete_index(self, index_id: str) -> bool:
"""Delete an index and its related records (documents, session links). Returns True if deleted."""
conn = sqlite3.connect(self.db_path)
try:
# Get vector table name before deletion (optional, for LanceDB cleanup)
cur = conn.execute('SELECT vector_table_name FROM indexes WHERE id = ?', (index_id,))
row = cur.fetchone()
vector_table_name = row[0] if row else None
# Remove child rows first due to foreign-key constraints
conn.execute('DELETE FROM index_documents WHERE index_id = ?', (index_id,))
conn.execute('DELETE FROM session_indexes WHERE index_id = ?', (index_id,))
cursor = conn.execute('DELETE FROM indexes WHERE id = ?', (index_id,))
deleted = cursor.rowcount > 0
conn.commit()
finally:
conn.close()
if deleted:
print(f"🗑️ Deleted index {index_id[:8]}... and related records")
# Optional: attempt to drop LanceDB table if available
if vector_table_name:
try:
from rag_system.indexing.embedders import LanceDBManager
import os
db_path = os.getenv('LANCEDB_PATH') or './rag_system/index_store/lancedb'
ldb = LanceDBManager(db_path)
db = ldb.db
if hasattr(db, 'table_names') and vector_table_name in db.table_names():
db.drop_table(vector_table_name)
print(f"🚮 Dropped LanceDB table '{vector_table_name}'")
except Exception as e:
print(f"⚠️ Could not drop LanceDB table '{vector_table_name}': {e}")
return deleted
def update_index_metadata(self, index_id: str, updates: dict):
"""Merge new key/values into an index's metadata JSON column."""
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cur = conn.execute('SELECT metadata FROM indexes WHERE id=?', (index_id,))
row = cur.fetchone()
if row is None:
conn.close()
raise ValueError("Index not found")
existing = json.loads(row['metadata'] or '{}')
existing.update(updates)
conn.execute('UPDATE indexes SET metadata=?, updated_at=? WHERE id=?', (json.dumps(existing), datetime.now().isoformat(), index_id))
conn.commit()
conn.close()
def inspect_and_populate_index_metadata(self, index_id: str) -> dict:
"""
Inspect LanceDB table to extract metadata for older indexes.
Returns the inferred metadata or empty dict if inspection fails.
"""
try:
# Get index info
index_info = self.get_index(index_id)
if not index_info:
return {}
# Check if metadata is already populated
if index_info.get('metadata') and len(index_info['metadata']) > 0:
return index_info['metadata']
# Try to inspect the LanceDB table
vector_table_name = index_info.get('vector_table_name')
if not vector_table_name:
return {}
try:
# Try to import the RAG system modules
try:
from rag_system.indexing.embedders import LanceDBManager
import os
# Use the same path as the system
db_path = os.getenv('LANCEDB_PATH') or './rag_system/index_store/lancedb'
ldb = LanceDBManager(db_path)
# Check if table exists
if not hasattr(ldb.db, 'table_names') or vector_table_name not in ldb.db.table_names():
# Table doesn't exist - this means the index was never properly built
inferred_metadata = {
'status': 'incomplete',
'issue': 'Vector table not found - index may not have been built properly',
'vector_table_expected': vector_table_name,
'available_tables': list(ldb.db.table_names()) if hasattr(ldb.db, 'table_names') else [],
'metadata_inferred_at': datetime.now().isoformat(),
'metadata_source': 'lancedb_inspection'
}
self.update_index_metadata(index_id, inferred_metadata)
print(f"⚠️ Index {index_id[:8]}... appears incomplete - vector table missing")
return inferred_metadata
# Get table and inspect schema/data
table = ldb.db.open_table(vector_table_name)
# Get a sample record to inspect - use correct LanceDB API
try:
# Try to get sample data using proper LanceDB methods
sample_df = table.to_pandas()
if len(sample_df) == 0:
inferred_metadata = {
'status': 'empty',
'issue': 'Vector table exists but contains no data',
'metadata_inferred_at': datetime.now().isoformat(),
'metadata_source': 'lancedb_inspection'
}
self.update_index_metadata(index_id, inferred_metadata)
return inferred_metadata
# Take only first row for inspection
sample_df = sample_df.head(1)
except Exception as e:
print(f"⚠️ Could not read data from table {vector_table_name}: {e}")
return {}
# Infer metadata from table structure
inferred_metadata = {
'status': 'functional',
'total_chunks': len(table.to_pandas()), # Get total count
}
# Check vector dimensions
if 'vector' in sample_df.columns:
vector_data = sample_df['vector'].iloc[0]
if isinstance(vector_data, list):
inferred_metadata['vector_dimensions'] = len(vector_data)
# Try to infer embedding model from vector dimensions
dim_to_model = {
384: 'BAAI/bge-small-en-v1.5 (or similar)',
512: 'sentence-transformers/all-MiniLM-L6-v2 (or similar)',
768: 'BAAI/bge-base-en-v1.5 (or similar)',
1024: 'Qwen/Qwen3-Embedding-0.6B (or similar)',
1536: 'text-embedding-ada-002 (or similar)'
}
if len(vector_data) in dim_to_model:
inferred_metadata['embedding_model_inferred'] = dim_to_model[len(vector_data)]
# Try to parse metadata from sample record
if 'metadata' in sample_df.columns:
try:
sample_metadata = json.loads(sample_df['metadata'].iloc[0])
# Look for common metadata fields that might give us clues
if 'document_id' in sample_metadata:
inferred_metadata['has_document_structure'] = True
if 'chunk_index' in sample_metadata:
inferred_metadata['has_chunk_indexing'] = True
if 'original_text' in sample_metadata:
inferred_metadata['has_contextual_enrichment'] = True
inferred_metadata['retrieval_mode_inferred'] = 'hybrid (contextual enrichment detected)'
# Check for chunk size patterns
if 'text' in sample_df.columns:
text_length = len(sample_df['text'].iloc[0])
if text_length > 0:
inferred_metadata['sample_chunk_length'] = text_length
# Rough chunk size estimation
estimated_tokens = text_length // 4 # rough estimate: 4 chars per token
if estimated_tokens < 300:
inferred_metadata['chunk_size_inferred'] = '256 tokens (estimated)'
elif estimated_tokens < 600:
inferred_metadata['chunk_size_inferred'] = '512 tokens (estimated)'
else:
inferred_metadata['chunk_size_inferred'] = '1024+ tokens (estimated)'
except (json.JSONDecodeError, KeyError):
pass
# Check if FTS index exists
try:
indices = table.list_indices()
fts_exists = any('fts' in idx.name.lower() for idx in indices)
if fts_exists:
inferred_metadata['has_fts_index'] = True
inferred_metadata['retrieval_mode_inferred'] = 'hybrid (FTS + vector)'
else:
inferred_metadata['retrieval_mode_inferred'] = 'vector-only'
except:
pass
# Add inspection timestamp
inferred_metadata['metadata_inferred_at'] = datetime.now().isoformat()
inferred_metadata['metadata_source'] = 'lancedb_inspection'
# Update the database with inferred metadata
if inferred_metadata:
self.update_index_metadata(index_id, inferred_metadata)
print(f"🔍 Inferred metadata for index {index_id[:8]}...: {len(inferred_metadata)} fields")
return inferred_metadata
except ImportError as import_error:
# RAG system modules not available - provide basic fallback metadata
print(f"⚠️ RAG system modules not available for inspection: {import_error}")
# Check if this is actually a legacy index by looking at creation date
created_at = index_info.get('created_at', '')
is_recent = False
if created_at:
try:
from datetime import datetime, timedelta
created_date = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
# Consider indexes created in the last 30 days as "recent"
is_recent = created_date > datetime.now().replace(tzinfo=created_date.tzinfo) - timedelta(days=30)
except:
pass
# Provide basic fallback metadata with better status detection
if is_recent:
status = 'functional'
issue = 'Detailed configuration inspection requires RAG system modules, but index appears functional'
else:
status = 'legacy'
issue = 'This index was created before metadata tracking was implemented. Configuration details are not available.'
fallback_metadata = {
'status': status,
'issue': issue,
'metadata_inferred_at': datetime.now().isoformat(),
'metadata_source': 'fallback_inspection',
'documents_count': len(index_info.get('documents', [])),
'created_at': index_info.get('created_at', 'unknown'),
'inspection_limitation': 'Backend server cannot access full RAG system modules for detailed inspection'
}
# Try to infer some basic info from the vector table name
if vector_table_name:
fallback_metadata['vector_table_name'] = vector_table_name
fallback_metadata['note'] = 'Vector table exists but detailed inspection requires RAG system modules'
self.update_index_metadata(index_id, fallback_metadata)
status_msg = "recent but limited inspection" if is_recent else "legacy"
print(f"📝 Added fallback metadata for {status_msg} index {index_id[:8]}...")
return fallback_metadata
except Exception as e:
print(f"⚠️ Could not inspect LanceDB table for index {index_id[:8]}...: {e}")
return {}
except Exception as e:
print(f"⚠️ Failed to inspect index metadata for {index_id[:8]}...: {e}")
return {}
def generate_session_title(first_message: str, max_length: int = 50) -> str:
"""Generate a session title from the first message"""
# Clean up the message
title = first_message.strip()
# Remove common prefixes
prefixes = ["hey", "hi", "hello", "can you", "please", "i want", "i need"]
title_lower = title.lower()
for prefix in prefixes:
if title_lower.startswith(prefix):
title = title[len(prefix):].strip()
break
# Capitalize first letter
if title:
title = title[0].upper() + title[1:]
# Truncate if too long
if len(title) > max_length:
title = title[:max_length].strip() + "..."
# Fallback
if not title or len(title) < 3:
title = "New Chat"
return title
# Global database instance
db = ChatDatabase()
if __name__ == "__main__":
# Test the database
print("🧪 Testing database...")
# Create a test session
session_id = db.create_session("Test Chat", "llama3.2:latest")
# Add some messages
db.add_message(session_id, "Hello!", "user")
db.add_message(session_id, "Hi there! How can I help you?", "assistant")
# Get messages
messages = db.get_messages(session_id)
print(f"📨 Messages: {len(messages)}")
# Get sessions
sessions = db.get_sessions()
print(f"📋 Sessions: {len(sessions)}")
# Get stats
stats = db.get_stats()
print(f"📊 Stats: {stats}")
print("✅ Database test completed!")

200
backend/ollama_client.py Normal file
View File

@ -0,0 +1,200 @@
import requests
import json
import os
from typing import List, Dict
class OllamaClient:
def __init__(self, base_url: str = "http://localhost:11434"):
self.base_url = base_url
self.api_url = f"{base_url}/api"
def is_ollama_running(self) -> bool:
"""Check if Ollama server is running"""
try:
response = requests.get(f"{self.base_url}/api/tags", timeout=5)
return response.status_code == 200
except requests.exceptions.RequestException:
return False
def list_models(self) -> List[str]:
"""Get list of available models"""
try:
response = requests.get(f"{self.api_url}/tags")
if response.status_code == 200:
models = response.json().get("models", [])
return [model["name"] for model in models]
return []
except requests.exceptions.RequestException as e:
print(f"Error fetching models: {e}")
return []
def pull_model(self, model_name: str) -> bool:
"""Pull a model if not available"""
try:
response = requests.post(
f"{self.api_url}/pull",
json={"name": model_name},
stream=True
)
if response.status_code == 200:
print(f"Pulling model {model_name}...")
for line in response.iter_lines():
if line:
data = json.loads(line)
if "status" in data:
print(f"Status: {data['status']}")
if data.get("status") == "success":
return True
return True
return False
except requests.exceptions.RequestException as e:
print(f"Error pulling model: {e}")
return False
def chat(self, message: str, model: str = "llama3.2", conversation_history: List[Dict] = None, enable_thinking: bool = True) -> str:
"""Send a chat message to Ollama"""
if conversation_history is None:
conversation_history = []
# Add user message to conversation
messages = conversation_history + [{"role": "user", "content": message}]
try:
payload = {
"model": model,
"messages": messages,
"stream": False,
}
# Multiple approaches to disable thinking tokens
if not enable_thinking:
payload.update({
"think": False, # Native Ollama parameter
"options": {
"think": False,
"thinking": False,
"temperature": 0.7,
"top_p": 0.9
}
})
else:
payload["think"] = True
response = requests.post(
f"{self.api_url}/chat",
json=payload,
timeout=60
)
if response.status_code == 200:
result = response.json()
response_text = result["message"]["content"]
# Additional cleanup: remove any thinking tokens that might slip through
if not enable_thinking:
# Remove common thinking token patterns
import re
response_text = re.sub(r'<think>.*?</think>', '', response_text, flags=re.DOTALL | re.IGNORECASE)
response_text = re.sub(r'<thinking>.*?</thinking>', '', response_text, flags=re.DOTALL | re.IGNORECASE)
response_text = response_text.strip()
return response_text
else:
return f"Error: {response.status_code} - {response.text}"
except requests.exceptions.RequestException as e:
return f"Connection error: {e}"
def chat_stream(self, message: str, model: str = "llama3.2", conversation_history: List[Dict] = None, enable_thinking: bool = True):
"""Stream chat response from Ollama"""
if conversation_history is None:
conversation_history = []
messages = conversation_history + [{"role": "user", "content": message}]
try:
payload = {
"model": model,
"messages": messages,
"stream": True,
}
# Multiple approaches to disable thinking tokens
if not enable_thinking:
payload.update({
"think": False, # Native Ollama parameter
"options": {
"think": False,
"thinking": False,
"temperature": 0.7,
"top_p": 0.9
}
})
else:
payload["think"] = True
response = requests.post(
f"{self.api_url}/chat",
json=payload,
stream=True,
timeout=60
)
if response.status_code == 200:
for line in response.iter_lines():
if line:
try:
data = json.loads(line)
if "message" in data and "content" in data["message"]:
content = data["message"]["content"]
# Filter out thinking tokens in streaming mode
if not enable_thinking:
# Skip content that looks like thinking tokens
if '<think>' in content.lower() or '<thinking>' in content.lower():
continue
yield content
except json.JSONDecodeError:
continue
else:
yield f"Error: {response.status_code} - {response.text}"
except requests.exceptions.RequestException as e:
yield f"Connection error: {e}"
def main():
"""Test the Ollama client"""
client = OllamaClient()
# Check if Ollama is running
if not client.is_ollama_running():
print("❌ Ollama is not running. Please start Ollama first.")
print("Install: https://ollama.ai")
print("Run: ollama serve")
return
print("✅ Ollama is running!")
# List available models
models = client.list_models()
print(f"Available models: {models}")
# Try to use llama3.2, pull if needed
model_name = "llama3.2"
if model_name not in [m.split(":")[0] for m in models]:
print(f"Model {model_name} not found. Pulling...")
if client.pull_model(model_name):
print(f"✅ Model {model_name} pulled successfully!")
else:
print(f"❌ Failed to pull model {model_name}")
return
# Test chat
print("\n🤖 Testing chat...")
response = client.chat("Hello! Can you tell me a short joke?", model_name)
print(f"AI: {response}")
if __name__ == "__main__":
main()
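
A minimal usage sketch for the client above, assuming Ollama is running on localhost:11434 and a model tagged llama3.2 has been pulled:

# Stream a reply with thinking tokens disabled, carrying prior turns as history.
from backend.ollama_client import OllamaClient

client = OllamaClient()
if client.is_ollama_running():
    history = [{"role": "user", "content": "Hi"},
               {"role": "assistant", "content": "Hello! How can I help?"}]
    for chunk in client.chat_stream("Give me one fun fact.", model="llama3.2",
                                    conversation_history=history, enable_thinking=False):
        print(chunk, end="", flush=True)
    print()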

3
backend/requirements.txt Normal file
View File

@ -0,0 +1,3 @@
requests
python-dotenv
PyPDF2

1142
backend/server.py Normal file

File diff suppressed because it is too large

View File

@ -0,0 +1,216 @@
"""
Simple PDF Processing Service
Handles PDF upload and text extraction for RAG functionality
"""
import os
import uuid
from typing import List, Dict, Any
import PyPDF2
from io import BytesIO
import sqlite3
import json
from datetime import datetime
class SimplePDFProcessor:
def __init__(self, db_path: str = "chat_data.db"):
"""Initialize simple PDF processor with SQLite storage"""
self.db_path = db_path
self.init_database()
print("✅ Simple PDF processor initialized")
def init_database(self):
"""Initialize SQLite database for storing PDF content"""
conn = sqlite3.connect(self.db_path)
conn.execute('''
CREATE TABLE IF NOT EXISTS pdf_documents (
id TEXT PRIMARY KEY,
session_id TEXT NOT NULL,
filename TEXT NOT NULL,
content TEXT NOT NULL,
created_at TEXT NOT NULL
)
''')
conn.commit()
conn.close()
def extract_text_from_pdf(self, pdf_bytes: bytes) -> str:
"""Extract text from PDF bytes"""
try:
print(f"📄 Starting PDF text extraction ({len(pdf_bytes)} bytes)")
pdf_file = BytesIO(pdf_bytes)
pdf_reader = PyPDF2.PdfReader(pdf_file)
print(f"📖 PDF has {len(pdf_reader.pages)} pages")
text = ""
for page_num, page in enumerate(pdf_reader.pages):
print(f"📄 Processing page {page_num + 1}")
try:
page_text = page.extract_text()
if page_text.strip():
text += f"\n--- Page {page_num + 1} ---\n"
text += page_text + "\n"
print(f"✅ Page {page_num + 1}: extracted {len(page_text)} characters")
except Exception as page_error:
print(f"❌ Error on page {page_num + 1}: {str(page_error)}")
continue
print(f"📄 Total extracted text: {len(text)} characters")
return text.strip()
except Exception as e:
print(f"❌ Error extracting text from PDF: {str(e)}")
print(f"❌ Error type: {type(e).__name__}")
return ""
def process_pdf(self, pdf_bytes: bytes, filename: str, session_id: str) -> Dict[str, Any]:
"""Process a PDF file and store in database"""
print(f"📄 Processing PDF: {filename}")
# Extract text
text = self.extract_text_from_pdf(pdf_bytes)
if not text:
return {
"success": False,
"error": "Could not extract text from PDF",
"filename": filename
}
print(f"📝 Extracted {len(text)} characters from {filename}")
# Store in database
document_id = str(uuid.uuid4())
now = datetime.now().isoformat()
try:
conn = sqlite3.connect(self.db_path)
# Store document
conn.execute('''
INSERT INTO pdf_documents (id, session_id, filename, content, created_at)
VALUES (?, ?, ?, ?, ?)
''', (document_id, session_id, filename, text, now))
conn.commit()
conn.close()
print(f"💾 Stored document {filename} in database")
return {
"success": True,
"filename": filename,
"file_id": document_id,
"text_length": len(text)
}
except Exception as e:
print(f"❌ Error storing in database: {str(e)}")
return {
"success": False,
"error": f"Database storage failed: {str(e)}",
"filename": filename
}
def get_session_documents(self, session_id: str) -> List[Dict[str, Any]]:
"""Get all documents for a session"""
try:
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cursor = conn.execute('''
SELECT id, filename, created_at
FROM pdf_documents
WHERE session_id = ?
ORDER BY created_at DESC
''', (session_id,))
documents = [dict(row) for row in cursor.fetchall()]
conn.close()
return documents
except Exception as e:
print(f"❌ Error getting session documents: {str(e)}")
return []
def get_document_content(self, session_id: str) -> str:
"""Get all document content for a session (for LLM context)"""
try:
conn = sqlite3.connect(self.db_path)
cursor = conn.execute('''
SELECT filename, content
FROM pdf_documents
WHERE session_id = ?
ORDER BY created_at ASC
''', (session_id,))
rows = cursor.fetchall()
conn.close()
if not rows:
return ""
# Combine all document content
combined_content = ""
for filename, content in rows:
combined_content += f"\n\n=== Document: {filename} ===\n\n"
combined_content += content
return combined_content.strip()
except Exception as e:
print(f"❌ Error getting document content: {str(e)}")
return ""
def delete_session_documents(self, session_id: str) -> bool:
"""Delete all documents for a session"""
try:
conn = sqlite3.connect(self.db_path)
cursor = conn.execute('''
DELETE FROM pdf_documents
WHERE session_id = ?
''', (session_id,))
deleted_count = cursor.rowcount
conn.commit()
conn.close()
if deleted_count > 0:
print(f"🗑️ Deleted {deleted_count} documents for session {session_id[:8]}...")
return deleted_count > 0
except Exception as e:
print(f"❌ Error deleting session documents: {str(e)}")
return False
# Global instance
simple_pdf_processor = None
def initialize_simple_pdf_processor():
"""Initialize the global PDF processor"""
global simple_pdf_processor
try:
simple_pdf_processor = SimplePDFProcessor()
print("✅ Global PDF processor initialized")
except Exception as e:
print(f"❌ Failed to initialize PDF processor: {str(e)}")
simple_pdf_processor = None
def get_simple_pdf_processor():
"""Get the global PDF processor instance"""
global simple_pdf_processor
if simple_pdf_processor is None:
initialize_simple_pdf_processor()
return simple_pdf_processor
if __name__ == "__main__":
# Test the simple PDF processor
print("🧪 Testing simple PDF processor...")
processor = SimplePDFProcessor()
print("✅ Simple PDF processor test completed!")

155
backend/test_backend.py Normal file
View File

@ -0,0 +1,155 @@
#!/usr/bin/env python3
"""
Simple test script for the localGPT backend
"""
import requests
import json
import time
def test_health_endpoint():
"""Test the health endpoint"""
print("🔍 Testing health endpoint...")
try:
response = requests.get("http://localhost:8000/health", timeout=5)
if response.status_code == 200:
data = response.json()
print(f"✅ Health check passed")
print(f" Ollama running: {data['ollama_running']}")
print(f" Models available: {len(data['available_models'])}")
return True
else:
print(f"❌ Health check failed: {response.status_code}")
return False
except requests.exceptions.RequestException as e:
print(f"❌ Health check failed: {e}")
return False
def test_chat_endpoint():
"""Test the chat endpoint"""
print("\n💬 Testing chat endpoint...")
test_message = {
"message": "Say 'Hello World' and nothing else.",
"model": "llama3.2:latest"
}
try:
response = requests.post(
"http://localhost:8000/chat",
headers={"Content-Type": "application/json"},
json=test_message,
timeout=30
)
if response.status_code == 200:
data = response.json()
print(f"✅ Chat test passed")
print(f" Model: {data['model']}")
print(f" Response: {data['response']}")
print(f" Message count: {data['message_count']}")
return True
else:
print(f"❌ Chat test failed: {response.status_code}")
print(f" Response: {response.text}")
return False
except requests.exceptions.RequestException as e:
print(f"❌ Chat test failed: {e}")
return False
def test_conversation_history():
"""Test conversation with history"""
print("\n🗨️ Testing conversation history...")
# First message
conversation = []
message1 = {
"message": "My name is Alice. Remember this.",
"model": "llama3.2:latest",
"conversation_history": conversation
}
try:
response1 = requests.post(
"http://localhost:8000/chat",
headers={"Content-Type": "application/json"},
json=message1,
timeout=30
)
if response1.status_code == 200:
data1 = response1.json()
# Add to conversation history
conversation.append({"role": "user", "content": "My name is Alice. Remember this."})
conversation.append({"role": "assistant", "content": data1["response"]})
# Second message asking about the name
message2 = {
"message": "What is my name?",
"model": "llama3.2:latest",
"conversation_history": conversation
}
response2 = requests.post(
"http://localhost:8000/chat",
headers={"Content-Type": "application/json"},
json=message2,
timeout=30
)
if response2.status_code == 200:
data2 = response2.json()
print(f"✅ Conversation history test passed")
print(f" First response: {data1['response']}")
print(f" Second response: {data2['response']}")
# Check if the AI remembered the name
if "alice" in data2['response'].lower():
print(f"✅ AI correctly remembered the name!")
else:
print(f"⚠️ AI might not have remembered the name")
return True
else:
print(f"❌ Second message failed: {response2.status_code}")
return False
else:
print(f"❌ First message failed: {response1.status_code}")
return False
except requests.exceptions.RequestException as e:
print(f"❌ Conversation test failed: {e}")
return False
def main():
print("🧪 Testing localGPT Backend")
print("=" * 40)
# Test health endpoint
health_ok = test_health_endpoint()
if not health_ok:
print("\n❌ Backend server is not running or not healthy")
print(" Make sure to run: python server.py")
return
# Test basic chat
chat_ok = test_chat_endpoint()
if not chat_ok:
print("\n❌ Chat functionality is not working")
return
# Test conversation history
conversation_ok = test_conversation_history()
print("\n" + "=" * 40)
if health_ok and chat_ok and conversation_ok:
print("🎉 All tests passed! Backend is ready for frontend integration.")
else:
print("⚠️ Some tests failed. Check the issues above.")
print("\n🔗 Ready to connect to frontend at http://localhost:3000")
if __name__ == "__main__":
main()
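
For reference, a sketch of the /chat round-trip the tests above exercise; the field names are taken from the assertions in test_chat_endpoint, and the backend is assumed to be running on port 8000:

import requests

payload = {
    "message": "Say 'Hello World' and nothing else.",
    "model": "llama3.2:latest",
    "conversation_history": [],   # optional, same shape as in test_conversation_history
}
resp = requests.post("http://localhost:8000/chat", json=payload, timeout=30)
resp.raise_for_status()
data = resp.json()
print(data["model"], data["message_count"])
print(data["response"])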

View File

@ -0,0 +1,19 @@
{
"index_name": "Sample Batch Index",
"index_description": "Example batch index configuration",
"documents": [
"./rag_system/documents/invoice_1039.pdf",
"./rag_system/documents/invoice_1041.pdf"
],
"processing": {
"chunk_size": 512,
"chunk_overlap": 64,
"enable_enrich": true,
"enable_latechunk": true,
"enable_docling": true,
"embedding_model": "Qwen/Qwen3-Embedding-0.6B",
"generation_model": "qwen3:0.6b",
"retrieval_mode": "hybrid",
"window_size": 2
}
}
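
A sketch of consuming this configuration the same way the --batch path of create_index_script.py does; the filename batch_indexing_config.json is assumed:

import json, os

with open("batch_indexing_config.json") as f:
    cfg = json.load(f)

# Validate the listed documents before handing them to the indexing pipeline.
valid_docs = [d for d in cfg.get("documents", []) if os.path.exists(d)]
print(f"{cfg['index_name']}: {len(valid_docs)}/{len(cfg['documents'])} documents found")
print("chunk_size:", cfg["processing"]["chunk_size"])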

View File

@ -1,202 +0,0 @@
import os
# from dotenv import load_dotenv
from chromadb.config import Settings
# https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/excel.html?highlight=xlsx#microsoft-excel
from langchain.document_loaders import CSVLoader, PDFMinerLoader, TextLoader, UnstructuredExcelLoader, Docx2txtLoader
from langchain.document_loaders import UnstructuredFileLoader, UnstructuredMarkdownLoader
from langchain.document_loaders import UnstructuredHTMLLoader
# load_dotenv()
ROOT_DIRECTORY = os.path.dirname(os.path.realpath(__file__))
# Define the folder for storing database
SOURCE_DIRECTORY = f"{ROOT_DIRECTORY}/SOURCE_DOCUMENTS"
PERSIST_DIRECTORY = f"{ROOT_DIRECTORY}/DB"
MODELS_PATH = "./models"
# Can be changed to a specific number
INGEST_THREADS = os.cpu_count() or 8
# Define the Chroma settings
CHROMA_SETTINGS = Settings(
anonymized_telemetry=False,
is_persistent=True,
)
# Context Window and Max New Tokens
CONTEXT_WINDOW_SIZE = 8096
MAX_NEW_TOKENS = CONTEXT_WINDOW_SIZE # int(CONTEXT_WINDOW_SIZE/4)
#### If you get a "not enough space in the buffer" error, you should reduce the values below, start with half of the original values and keep halving the value until the error stops appearing
N_GPU_LAYERS = 100 # Llama-2-70B has 83 layers
N_BATCH = 512
### From experimenting with the Llama-2-7B-Chat-GGML model on 8GB VRAM, these values work:
# N_GPU_LAYERS = 20
# N_BATCH = 512
# https://python.langchain.com/en/latest/_modules/langchain/document_loaders/excel.html#UnstructuredExcelLoader
DOCUMENT_MAP = {
".html": UnstructuredHTMLLoader,
".txt": TextLoader,
".md": UnstructuredMarkdownLoader,
".py": TextLoader,
# ".pdf": PDFMinerLoader,
".pdf": UnstructuredFileLoader,
".csv": CSVLoader,
".xls": UnstructuredExcelLoader,
".xlsx": UnstructuredExcelLoader,
".docx": Docx2txtLoader,
".doc": Docx2txtLoader,
}
# Default Instructor Model
EMBEDDING_MODEL_NAME = "hkunlp/instructor-large" # Uses 1.5 GB of VRAM (High Accuracy with lower VRAM usage)
####
#### OTHER EMBEDDING MODEL OPTIONS
####
# EMBEDDING_MODEL_NAME = "hkunlp/instructor-xl" # Uses 5 GB of VRAM (Most Accurate of all models)
# EMBEDDING_MODEL_NAME = "intfloat/e5-large-v2" # Uses 1.5 GB of VRAM (A little less accurate than instructor-large)
# EMBEDDING_MODEL_NAME = "intfloat/e5-base-v2" # Uses 0.5 GB of VRAM (A good model for lower VRAM GPUs)
# EMBEDDING_MODEL_NAME = "all-MiniLM-L6-v2" # Uses 0.2 GB of VRAM (Less accurate but fastest - only requires 150mb of vram)
####
#### MULTILINGUAL EMBEDDING MODELS
####
# EMBEDDING_MODEL_NAME = "intfloat/multilingual-e5-large" # Uses 2.5 GB of VRAM
# EMBEDDING_MODEL_NAME = "intfloat/multilingual-e5-base" # Uses 1.2 GB of VRAM
#### SELECT AN OPEN SOURCE LLM (LARGE LANGUAGE MODEL)
# Select the Model ID and model_basename
# load the LLM for generating Natural Language responses
#### GPU VRAM Memory required for LLM Models (ONLY) by Billion Parameter value (B Model)
#### Does not include VRAM used by Embedding Models - which use an additional 2GB-7GB of VRAM depending on the model.
####
#### (B Model) (float32) (float16) (GPTQ 8bit) (GPTQ 4bit)
#### 7b 28 GB 14 GB 7 GB - 9 GB 3.5 GB - 5 GB
#### 13b 52 GB 26 GB 13 GB - 15 GB 6.5 GB - 8 GB
#### 32b 130 GB 65 GB 32.5 GB - 35 GB 16.25 GB - 19 GB
#### 65b 260.8 GB 130.4 GB 65.2 GB - 67 GB 32.6 GB - - 35 GB
# MODEL_ID = "TheBloke/Llama-2-7B-Chat-GGML"
# MODEL_BASENAME = "llama-2-7b-chat.ggmlv3.q4_0.bin"
####
#### (FOR GGUF MODELS)
####
# MODEL_ID = "TheBloke/Llama-2-13b-Chat-GGUF"
# MODEL_BASENAME = "llama-2-13b-chat.Q4_K_M.gguf"
# MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
# MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
# MODEL_ID = "QuantFactory/Meta-Llama-3-8B-Instruct-GGUF"
# MODEL_BASENAME = "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"
# Use mistral to run on hpu
# MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"
# LLAMA 3 # use for Apple Silicon
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
MODEL_BASENAME = None
# LLAMA 3 # use for NVIDIA GPUs
# MODEL_ID = "unsloth/llama-3-8b-bnb-4bit"
# MODEL_BASENAME = None
# MODEL_ID = "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
# MODEL_BASENAME = "mistral-7b-instruct-v0.1.Q8_0.gguf"
# MODEL_ID = "TheBloke/Llama-2-70b-Chat-GGUF"
# MODEL_BASENAME = "llama-2-70b-chat.Q4_K_M.gguf"
####
#### (FOR HF MODELS)
####
# MODEL_ID = "NousResearch/Llama-2-7b-chat-hf"
# MODEL_BASENAME = None
# MODEL_ID = "TheBloke/vicuna-7B-1.1-HF"
# MODEL_BASENAME = None
# MODEL_ID = "TheBloke/Wizard-Vicuna-7B-Uncensored-HF"
# MODEL_ID = "TheBloke/guanaco-7B-HF"
# MODEL_ID = 'NousResearch/Nous-Hermes-13b' # Requires ~ 23GB VRAM. Using STransformers
# alongside will 100% create OOM on 24GB cards.
# llm = load_model(device_type, model_id=model_id)
####
#### (FOR GPTQ QUANTIZED) Select a llm model based on your GPU and VRAM GB. Does not include Embedding Models VRAM usage.
####
##### 48GB VRAM Graphics Cards (RTX 6000, RTX A6000 and other 48GB VRAM GPUs) #####
### 65b GPTQ LLM Models for 48GB GPUs (*** With best embedding model: hkunlp/instructor-xl ***)
# MODEL_ID = "TheBloke/guanaco-65B-GPTQ"
# MODEL_BASENAME = "model.safetensors"
# MODEL_ID = "TheBloke/Airoboros-65B-GPT4-2.0-GPTQ"
# MODEL_BASENAME = "model.safetensors"
# MODEL_ID = "TheBloke/gpt4-alpaca-lora_mlp-65B-GPTQ"
# MODEL_BASENAME = "model.safetensors"
# MODEL_ID = "TheBloke/Upstage-Llama1-65B-Instruct-GPTQ"
# MODEL_BASENAME = "model.safetensors"
##### 24GB VRAM Graphics Cards (RTX 3090 - RTX 4090 (35% Faster) - RTX A5000 - RTX A5500) #####
### 13b GPTQ Models for 24GB GPUs (*** With best embedding model: hkunlp/instructor-xl ***)
# MODEL_ID = "TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ"
# MODEL_BASENAME = "Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors"
# MODEL_ID = "TheBloke/vicuna-13B-v1.5-GPTQ"
# MODEL_BASENAME = "model.safetensors"
# MODEL_ID = "TheBloke/Nous-Hermes-13B-GPTQ"
# MODEL_BASENAME = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"
# MODEL_ID = "TheBloke/WizardLM-13B-V1.2-GPTQ"
# MODEL_BASENAME = "gptq_model-4bit-128g.safetensors
### 30b GPTQ Models for 24GB GPUs (*** Requires using intfloat/e5-base-v2 instead of hkunlp/instructor-large as embedding model ***)
# MODEL_ID = "TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ"
# MODEL_BASENAME = "Wizard-Vicuna-30B-Uncensored-GPTQ-4bit--1g.act.order.safetensors"
# MODEL_ID = "TheBloke/WizardLM-30B-Uncensored-GPTQ"
# MODEL_BASENAME = "WizardLM-30B-Uncensored-GPTQ-4bit.act-order.safetensors"
##### 8-10GB VRAM Graphics Cards (RTX 3080 - RTX 3080 Ti - RTX 3070 Ti - 3060 Ti - RTX 2000 Series, Quadro RTX 4000, 5000, 6000) #####
### (*** Requires using intfloat/e5-small-v2 instead of hkunlp/instructor-large as embedding model ***)
### 7b GPTQ Models for 8GB GPUs
# MODEL_ID = "TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ"
# MODEL_BASENAME = "Wizard-Vicuna-7B-Uncensored-GPTQ-4bit-128g.no-act.order.safetensors"
# MODEL_ID = "TheBloke/WizardLM-7B-uncensored-GPTQ"
# MODEL_BASENAME = "WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors"
# MODEL_ID = "TheBloke/wizardLM-7B-GPTQ"
# MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"
####
#### (FOR GGML) (Quantized cpu+gpu+mps) models - check if they support llama.cpp
####
# MODEL_ID = "TheBloke/wizard-vicuna-13B-GGML"
# MODEL_BASENAME = "wizard-vicuna-13B.ggmlv3.q4_0.bin"
# MODEL_BASENAME = "wizard-vicuna-13B.ggmlv3.q6_K.bin"
# MODEL_BASENAME = "wizard-vicuna-13B.ggmlv3.q2_K.bin"
# MODEL_ID = "TheBloke/orca_mini_3B-GGML"
# MODEL_BASENAME = "orca-mini-3b.ggmlv3.q4_0.bin"
####
#### (FOR AWQ QUANTIZED) Select a llm model based on your GPU and VRAM GB. Does not include Embedding Models VRAM usage.
### (*** MODEL_BASENAME is not actually used but have to contain .awq so the correct model loading is used ***)
### (*** Compute capability 7.5 (sm75) and CUDA Toolkit 11.8+ are required ***)
####
# MODEL_ID = "TheBloke/Llama-2-7B-Chat-AWQ"
# MODEL_BASENAME = "model.safetensors.awq"

View File

@ -1,91 +0,0 @@
import os
import shutil
import click
import subprocess
from constants import (
DOCUMENT_MAP,
SOURCE_DIRECTORY
)
def logToFile(logentry):
file1 = open("crawl.log","a")
file1.write(logentry + "\n")
file1.close()
print(logentry + "\n")
@click.command()
@click.option(
"--device_type",
default="cuda",
type=click.Choice(
[
"cpu",
"cuda",
"ipu",
"xpu",
"mkldnn",
"opengl",
"opencl",
"ideep",
"hip",
"ve",
"fpga",
"ort",
"xla",
"lazy",
"vulkan",
"mps",
"meta",
"hpu",
"mtia",
],
),
help="Device to run on. (Default is cuda)",
)
@click.option(
"--landing_directory",
default="./LANDING_DOCUMENTS"
)
@click.option(
"--processed_directory",
default="./PROCESSED_DOCUMENTS"
)
@click.option(
"--error_directory",
default="./ERROR_DOCUMENTS"
)
@click.option(
"--unsupported_directory",
default="./UNSUPPORTED_DOCUMENTS"
)
def main(device_type, landing_directory, processed_directory, error_directory, unsupported_directory):
paths = []
os.makedirs(processed_directory, exist_ok=True)
os.makedirs(error_directory, exist_ok=True)
os.makedirs(unsupported_directory, exist_ok=True)
for root, _, files in os.walk(landing_directory):
for file_name in files:
file_extension = os.path.splitext(file_name)[1]
short_filename = os.path.basename(file_name)
if not os.path.isdir(root + "/" + file_name):
if file_extension in DOCUMENT_MAP.keys():
shutil.move(root + "/" + file_name, SOURCE_DIRECTORY+ "/" + short_filename)
logToFile("START: " + root + "/" + short_filename)
process = subprocess.Popen("python ingest.py --device_type=" + device_type, shell=True, stdout=subprocess.PIPE)
process.wait()
if process.returncode > 0:
shutil.move(SOURCE_DIRECTORY + "/" + short_filename, error_directory + "/" + short_filename)
logToFile("ERROR: " + root + "/" + short_filename)
else:
logToFile("VALID: " + root + "/" + short_filename)
shutil.move(SOURCE_DIRECTORY + "/" + short_filename, processed_directory+ "/" + short_filename)
else:
shutil.move(root + "/" + file_name, unsupported_directory+ "/" + short_filename)
if __name__ == "__main__":
main()

372
create_index_script.py Normal file
View File

@ -0,0 +1,372 @@
#!/usr/bin/env python3
"""
Interactive Index Creation Script for LocalGPT RAG System
This script provides a user-friendly interface for creating document indexes
using the LocalGPT RAG system. It supports both single documents and batch
processing of multiple documents.
Usage:
python create_index_script.py
python create_index_script.py --batch
python create_index_script.py --config custom_config.json
"""
import os
import sys
import json
import argparse
from typing import List, Optional
from pathlib import Path
# Add the project root to the path so we can import rag_system modules
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
try:
from rag_system.main import PIPELINE_CONFIGS, get_agent
from rag_system.pipelines.indexing_pipeline import IndexingPipeline
from rag_system.utils.ollama_client import OllamaClient
from backend.database import ChatDatabase
except ImportError as e:
print(f"❌ Error importing required modules: {e}")
print("Please ensure you're running this script from the project root directory.")
sys.exit(1)
class IndexCreator:
"""Interactive index creation utility."""
def __init__(self, config_path: Optional[str] = None):
"""Initialize the index creator with optional custom configuration."""
self.db = ChatDatabase()
self.config = self._load_config(config_path)
# Initialize Ollama client
self.ollama_client = OllamaClient()
self.ollama_config = {
"generation_model": "qwen3:0.6b",
"embedding_model": "qwen3:0.6b"
}
# Initialize indexing pipeline
self.pipeline = IndexingPipeline(
self.config,
self.ollama_client,
self.ollama_config
)
def _load_config(self, config_path: Optional[str] = None) -> dict:
"""Load configuration from file or use default."""
if config_path and os.path.exists(config_path):
try:
with open(config_path, 'r') as f:
return json.load(f)
except Exception as e:
print(f"⚠️ Error loading config from {config_path}: {e}")
print("Using default configuration...")
return PIPELINE_CONFIGS.get("default", {})
def get_user_input(self, prompt: str, default: str = "") -> str:
"""Get user input with optional default value."""
if default:
user_input = input(f"{prompt} [{default}]: ").strip()
return user_input if user_input else default
return input(f"{prompt}: ").strip()
def select_documents(self) -> List[str]:
"""Interactive document selection."""
print("\n📁 Document Selection")
print("=" * 50)
documents = []
while True:
print("\nOptions:")
print("1. Add a single document")
print("2. Add all documents from a directory")
print("3. Finish and proceed with selected documents")
print("4. Show selected documents")
choice = self.get_user_input("Select an option (1-4)", "1")
if choice == "1":
doc_path = self.get_user_input("Enter document path")
if os.path.exists(doc_path):
documents.append(os.path.abspath(doc_path))
print(f"✅ Added: {doc_path}")
else:
print(f"❌ File not found: {doc_path}")
elif choice == "2":
dir_path = self.get_user_input("Enter directory path")
if os.path.isdir(dir_path):
supported_extensions = ['.pdf', '.txt', '.docx', '.md']
found_docs = []
for ext in supported_extensions:
found_docs.extend(Path(dir_path).glob(f"*{ext}"))
found_docs.extend(Path(dir_path).glob(f"**/*{ext}"))
if found_docs:
print(f"Found {len(found_docs)} documents:")
for doc in found_docs:
print(f" - {doc}")
if self.get_user_input("Add all these documents? (y/n)", "y").lower() == 'y':
documents.extend([str(doc.absolute()) for doc in found_docs])
print(f"✅ Added {len(found_docs)} documents")
else:
print("❌ No supported documents found in directory")
else:
print(f"❌ Directory not found: {dir_path}")
elif choice == "3":
if documents:
break
else:
print("❌ No documents selected. Please add at least one document.")
elif choice == "4":
if documents:
print(f"\n📄 Selected documents ({len(documents)}):")
for i, doc in enumerate(documents, 1):
print(f" {i}. {doc}")
else:
print("No documents selected yet.")
else:
print("Invalid choice. Please select 1-4.")
return documents
def configure_processing(self) -> dict:
"""Interactive processing configuration."""
print("\n⚙️ Processing Configuration")
print("=" * 50)
print("Configure how documents will be processed:")
# Basic settings
chunk_size = int(self.get_user_input("Chunk size", "512"))
chunk_overlap = int(self.get_user_input("Chunk overlap", "64"))
# Advanced settings
print("\nAdvanced options:")
enable_enrich = self.get_user_input("Enable contextual enrichment? (y/n)", "y").lower() == 'y'
enable_latechunk = self.get_user_input("Enable late chunking? (y/n)", "y").lower() == 'y'
enable_docling = self.get_user_input("Enable Docling chunking? (y/n)", "y").lower() == 'y'
# Model selection
print("\nModel Configuration:")
embedding_model = self.get_user_input("Embedding model", "Qwen/Qwen3-Embedding-0.6B")
generation_model = self.get_user_input("Generation model", "qwen3:0.6b")
return {
"chunk_size": chunk_size,
"chunk_overlap": chunk_overlap,
"enable_enrich": enable_enrich,
"enable_latechunk": enable_latechunk,
"enable_docling": enable_docling,
"embedding_model": embedding_model,
"generation_model": generation_model,
"retrieval_mode": "hybrid",
"window_size": 2
}
def create_index_interactive(self) -> None:
"""Run the interactive index creation process."""
print("🚀 LocalGPT Index Creation Tool")
print("=" * 50)
# Get index details
index_name = self.get_user_input("Enter index name")
index_description = self.get_user_input("Enter index description (optional)")
# Select documents
documents = self.select_documents()
# Configure processing
processing_config = self.configure_processing()
# Confirm creation
print("\n📋 Index Summary")
print("=" * 50)
print(f"Name: {index_name}")
print(f"Description: {index_description or 'None'}")
print(f"Documents: {len(documents)}")
print(f"Chunk size: {processing_config['chunk_size']}")
print(f"Enrichment: {'Enabled' if processing_config['enable_enrich'] else 'Disabled'}")
print(f"Embedding model: {processing_config['embedding_model']}")
if self.get_user_input("\nProceed with index creation? (y/n)", "y").lower() != 'y':
print("❌ Index creation cancelled.")
return
# Create the index
try:
print("\n🔥 Creating index...")
# Create index record in database
index_id = self.db.create_index(
name=index_name,
description=index_description,
metadata=processing_config
)
# Add documents to index
for doc_path in documents:
filename = os.path.basename(doc_path)
self.db.add_document_to_index(index_id, filename, doc_path)
# Process documents through pipeline
print("📚 Processing documents...")
self.pipeline.process_documents(documents)
print(f"\n✅ Index '{index_name}' created successfully!")
print(f"Index ID: {index_id}")
print(f"Processed {len(documents)} documents")
# Test the index
if self.get_user_input("\nTest the index with a sample query? (y/n)", "y").lower() == 'y':
self.test_index(index_id)
except Exception as e:
print(f"❌ Error creating index: {e}")
import traceback
traceback.print_exc()
def test_index(self, index_id: str) -> None:
"""Test the created index with a sample query."""
try:
print("\n🧪 Testing Index")
print("=" * 50)
# Get agent for testing
agent = get_agent("default")
# Test query
test_query = self.get_user_input("Enter a test query", "What is this document about?")
print(f"\nProcessing query: {test_query}")
response = agent.run(test_query, table_name=f"text_pages_{index_id}")
print(f"\n🤖 Response:")
print(response)
except Exception as e:
print(f"❌ Error testing index: {e}")
def batch_create_from_config(self, config_file: str) -> None:
"""Create index from batch configuration file."""
try:
with open(config_file, 'r') as f:
batch_config = json.load(f)
index_name = batch_config.get("index_name", "Batch Index")
index_description = batch_config.get("index_description", "")
documents = batch_config.get("documents", [])
processing_config = batch_config.get("processing", {})
if not documents:
print("❌ No documents specified in batch configuration")
return
# Validate documents exist
valid_documents = []
for doc_path in documents:
if os.path.exists(doc_path):
valid_documents.append(doc_path)
else:
print(f"⚠️ Document not found: {doc_path}")
if not valid_documents:
print("❌ No valid documents found")
return
print(f"🚀 Creating batch index: {index_name}")
print(f"📄 Processing {len(valid_documents)} documents...")
# Create index
index_id = self.db.create_index(
name=index_name,
description=index_description,
metadata=processing_config
)
# Add documents
for doc_path in valid_documents:
filename = os.path.basename(doc_path)
self.db.add_document_to_index(index_id, filename, doc_path)
# Process documents
self.pipeline.process_documents(valid_documents)
print(f"✅ Batch index '{index_name}' created successfully!")
print(f"Index ID: {index_id}")
except Exception as e:
print(f"❌ Error creating batch index: {e}")
import traceback
traceback.print_exc()
def create_sample_batch_config():
"""Create a sample batch configuration file."""
sample_config = {
"index_name": "Sample Batch Index",
"index_description": "Example batch index configuration",
"documents": [
"./rag_system/documents/invoice_1039.pdf",
"./rag_system/documents/invoice_1041.pdf"
],
"processing": {
"chunk_size": 512,
"chunk_overlap": 64,
"enable_enrich": True,
"enable_latechunk": True,
"enable_docling": True,
"embedding_model": "Qwen/Qwen3-Embedding-0.6B",
"generation_model": "qwen3:0.6b",
"retrieval_mode": "hybrid",
"window_size": 2
}
}
with open("batch_indexing_config.json", "w") as f:
json.dump(sample_config, f, indent=2)
print("📄 Sample batch configuration created: batch_indexing_config.json")
def main():
"""Main entry point for the script."""
parser = argparse.ArgumentParser(description="LocalGPT Index Creation Tool")
parser.add_argument("--batch", help="Batch configuration file", type=str)
parser.add_argument("--config", help="Custom pipeline configuration file", type=str)
parser.add_argument("--create-sample", action="store_true", help="Create sample batch config")
args = parser.parse_args()
if args.create_sample:
create_sample_batch_config()
return
try:
creator = IndexCreator(config_path=args.config)
if args.batch:
creator.batch_create_from_config(args.batch)
else:
creator.create_index_interactive()
except KeyboardInterrupt:
print("\n\n❌ Operation cancelled by user.")
except Exception as e:
print(f"❌ Unexpected error: {e}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
main()
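
Beyond the CLI entry point, the class can also be driven programmatically; a sketch assuming the script's directory is on the Python path:

# Reuse IndexCreator without going through argparse.
from create_index_script import IndexCreator, create_sample_batch_config

create_sample_batch_config()          # writes batch_indexing_config.json
creator = IndexCreator()              # default pipeline configuration
creator.batch_create_from_config("batch_indexing_config.json")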

386
demo_batch_indexing.py Normal file
View File

@ -0,0 +1,386 @@
#!/usr/bin/env python3
"""
Demo Batch Indexing Script for LocalGPT RAG System
This script demonstrates how to perform batch indexing of multiple documents
using configuration files. It's designed to showcase the full capabilities
of the indexing pipeline with various configuration options.
Usage:
python demo_batch_indexing.py --config batch_indexing_config.json
python demo_batch_indexing.py --create-sample-config
python demo_batch_indexing.py --help
"""
import os
import sys
import json
import argparse
import time
import logging
from typing import List, Dict, Any, Optional
from pathlib import Path
from datetime import datetime
# Add the project root to the path so we can import rag_system modules
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
try:
from rag_system.main import PIPELINE_CONFIGS
from rag_system.pipelines.indexing_pipeline import IndexingPipeline
from rag_system.utils.ollama_client import OllamaClient
from backend.database import ChatDatabase
except ImportError as e:
print(f"❌ Error importing required modules: {e}")
print("Please ensure you're running this script from the project root directory.")
sys.exit(1)
# Configure logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s | %(levelname)-7s | %(name)s | %(message)s",
)
class BatchIndexingDemo:
"""Demonstration of batch indexing capabilities."""
def __init__(self, config_path: str):
"""Initialize the batch indexing demo."""
self.config_path = config_path
self.config = self._load_config()
self.db = ChatDatabase()
# Initialize Ollama client
self.ollama_client = OllamaClient()
# Initialize pipeline with merged configuration
self.pipeline_config = self._merge_configurations()
self.pipeline = IndexingPipeline(
self.pipeline_config,
self.ollama_client,
self.config.get("ollama_config", {
"generation_model": "qwen3:0.6b",
"embedding_model": "qwen3:0.6b"
})
)
def _load_config(self) -> Dict[str, Any]:
"""Load batch indexing configuration from file."""
try:
with open(self.config_path, 'r') as f:
config = json.load(f)
print(f"✅ Loaded configuration from {self.config_path}")
return config
except FileNotFoundError:
print(f"❌ Configuration file not found: {self.config_path}")
sys.exit(1)
except json.JSONDecodeError as e:
print(f"❌ Invalid JSON in configuration file: {e}")
sys.exit(1)
def _merge_configurations(self) -> Dict[str, Any]:
"""Merge batch config with default pipeline config."""
# Start with default pipeline configuration
merged_config = PIPELINE_CONFIGS.get("default", {}).copy()
# Override with batch-specific settings
batch_settings = self.config.get("pipeline_settings", {})
# Deep merge for nested dictionaries
def deep_merge(base: dict, override: dict) -> dict:
result = base.copy()
for key, value in override.items():
if key in result and isinstance(result[key], dict) and isinstance(value, dict):
result[key] = deep_merge(result[key], value)
else:
result[key] = value
return result
return deep_merge(merged_config, batch_settings)
def validate_documents(self, documents: List[str]) -> List[str]:
"""Validate and filter document paths."""
valid_documents = []
print(f"📋 Validating {len(documents)} documents...")
for doc_path in documents:
# Handle relative paths
if not os.path.isabs(doc_path):
doc_path = os.path.abspath(doc_path)
if os.path.exists(doc_path):
# Check file extension
ext = Path(doc_path).suffix.lower()
if ext in ['.pdf', '.txt', '.docx', '.md']:
valid_documents.append(doc_path)
print(f"{doc_path}")
else:
print(f" ⚠️ Unsupported file type: {doc_path}")
else:
print(f" ❌ File not found: {doc_path}")
print(f"📊 {len(valid_documents)} valid documents found")
return valid_documents
def create_indexes(self) -> List[str]:
"""Create multiple indexes based on configuration."""
indexes = self.config.get("indexes", [])
created_indexes = []
for index_config in indexes:
index_id = self.create_single_index(index_config)
if index_id:
created_indexes.append(index_id)
return created_indexes
def create_single_index(self, index_config: Dict[str, Any]) -> Optional[str]:
"""Create a single index from configuration."""
try:
# Extract index metadata
index_name = index_config.get("name", "Unnamed Index")
index_description = index_config.get("description", "")
documents = index_config.get("documents", [])
if not documents:
print(f"⚠️ No documents specified for index '{index_name}', skipping...")
return None
# Validate documents
valid_documents = self.validate_documents(documents)
if not valid_documents:
print(f"❌ No valid documents found for index '{index_name}'")
return None
print(f"\n🚀 Creating index: {index_name}")
print(f"📄 Processing {len(valid_documents)} documents")
# Create index record in database
index_metadata = {
"created_by": "demo_batch_indexing.py",
"created_at": datetime.now().isoformat(),
"document_count": len(valid_documents),
"config_used": index_config.get("processing_options", {})
}
index_id = self.db.create_index(
name=index_name,
description=index_description,
metadata=index_metadata
)
# Add documents to index
for doc_path in valid_documents:
filename = os.path.basename(doc_path)
self.db.add_document_to_index(index_id, filename, doc_path)
# Process documents through pipeline
start_time = time.time()
self.pipeline.process_documents(valid_documents)
processing_time = time.time() - start_time
print(f"✅ Index '{index_name}' created successfully!")
print(f" Index ID: {index_id}")
print(f" Processing time: {processing_time:.2f} seconds")
print(f" Documents processed: {len(valid_documents)}")
return index_id
except Exception as e:
print(f"❌ Error creating index '{index_name}': {e}")
import traceback
traceback.print_exc()
return None
def demonstrate_features(self):
"""Demonstrate various indexing features."""
print("\n🎯 Batch Indexing Demo Features:")
print("=" * 50)
# Show configuration
print(f"📋 Configuration file: {self.config_path}")
print(f"📊 Number of indexes to create: {len(self.config.get('indexes', []))}")
# Show pipeline settings
pipeline_settings = self.config.get("pipeline_settings", {})
if pipeline_settings:
print("\n⚙️ Pipeline Settings:")
for key, value in pipeline_settings.items():
print(f" {key}: {value}")
# Show model configuration
ollama_config = self.config.get("ollama_config", {})
if ollama_config:
print("\n🤖 Model Configuration:")
for key, value in ollama_config.items():
print(f" {key}: {value}")
def run_demo(self):
"""Run the complete batch indexing demo."""
print("🚀 LocalGPT Batch Indexing Demo")
print("=" * 50)
# Show demo features
self.demonstrate_features()
# Create indexes
print(f"\n📚 Starting batch indexing process...")
start_time = time.time()
created_indexes = self.create_indexes()
total_time = time.time() - start_time
# Summary
print(f"\n📊 Batch Indexing Summary")
print("=" * 50)
print(f"✅ Successfully created {len(created_indexes)} indexes")
print(f"⏱️ Total processing time: {total_time:.2f} seconds")
if created_indexes:
print(f"\n📋 Created Indexes:")
for i, index_id in enumerate(created_indexes, 1):
index_info = self.db.get_index(index_id)
if index_info:
print(f" {i}. {index_info['name']} ({index_id[:8]}...)")
print(f" Documents: {len(index_info.get('documents', []))}")
print(f"\n🎉 Demo completed successfully!")
print(f"💡 You can now use these indexes in the LocalGPT interface.")
def create_sample_config():
"""Create a comprehensive sample configuration file."""
sample_config = {
"description": "Demo batch indexing configuration showcasing various features",
"pipeline_settings": {
"embedding_model_name": "Qwen/Qwen3-Embedding-0.6B",
"indexing": {
"embedding_batch_size": 50,
"enrichment_batch_size": 25,
"enable_progress_tracking": True
},
"contextual_enricher": {
"enabled": True,
"window_size": 2,
"model_name": "qwen3:0.6b"
},
"chunking": {
"chunk_size": 512,
"chunk_overlap": 64,
"enable_latechunk": True,
"enable_docling": True
},
"retrievers": {
"dense": {
"enabled": True,
"lancedb_table_name": "demo_text_pages"
},
"bm25": {
"enabled": True,
"index_name": "demo_bm25_index"
}
},
"storage": {
"lancedb_uri": "./index_store/lancedb",
"bm25_path": "./index_store/bm25"
}
},
"ollama_config": {
"generation_model": "qwen3:0.6b",
"embedding_model": "qwen3:0.6b"
},
"indexes": [
{
"name": "Sample Invoice Collection",
"description": "Demo index containing sample invoice documents",
"documents": [
"./rag_system/documents/invoice_1039.pdf",
"./rag_system/documents/invoice_1041.pdf"
],
"processing_options": {
"chunk_size": 512,
"enable_enrichment": True,
"retrieval_mode": "hybrid"
}
},
{
"name": "Research Papers Demo",
"description": "Demo index for research papers and whitepapers",
"documents": [
"./rag_system/documents/Newwhitepaper_Agents2.pdf"
],
"processing_options": {
"chunk_size": 1024,
"enable_enrichment": True,
"retrieval_mode": "dense"
}
}
]
}
config_filename = "batch_indexing_config.json"
with open(config_filename, "w") as f:
json.dump(sample_config, f, indent=2)
print(f"✅ Sample configuration created: {config_filename}")
print(f"📝 Edit this file to customize your batch indexing setup")
print(f"🚀 Run: python demo_batch_indexing.py --config {config_filename}")
def main():
"""Main entry point for the demo script."""
parser = argparse.ArgumentParser(
description="LocalGPT Batch Indexing Demo",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
python demo_batch_indexing.py --config batch_indexing_config.json
python demo_batch_indexing.py --create-sample-config
This demo showcases the advanced batch indexing capabilities of LocalGPT,
including multi-index creation, advanced configuration options, and
comprehensive processing pipelines.
"""
)
parser.add_argument(
"--config",
type=str,
default="batch_indexing_config.json",
help="Path to batch indexing configuration file"
)
parser.add_argument(
"--create-sample-config",
action="store_true",
help="Create a sample configuration file"
)
args = parser.parse_args()
if args.create_sample_config:
create_sample_config()
return
if not os.path.exists(args.config):
print(f"❌ Configuration file not found: {args.config}")
print(f"💡 Create a sample config with: python {sys.argv[0]} --create-sample-config")
sys.exit(1)
try:
demo = BatchIndexingDemo(args.config)
demo.run_demo()
except KeyboardInterrupt:
print("\n\n❌ Demo cancelled by user.")
except Exception as e:
print(f"❌ Demo failed: {e}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
main()
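
The deep_merge helper inside _merge_configurations is what lets a batch file override only part of the default pipeline configuration; an illustration of the merge semantics with made-up keys, restating the same logic standalone:

def deep_merge(base: dict, override: dict) -> dict:
    # Nested dicts are merged key-by-key; everything else is overridden outright.
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

default = {"chunking": {"chunk_size": 512, "chunk_overlap": 64}, "retrieval_mode": "hybrid"}
batch = {"chunking": {"chunk_size": 1024}}
print(deep_merge(default, batch))
# {'chunking': {'chunk_size': 1024, 'chunk_overlap': 64}, 'retrieval_mode': 'hybrid'}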

View File

@ -0,0 +1,77 @@
services:
# RAG API server (connects to host Ollama)
rag-api:
build:
context: .
dockerfile: Dockerfile.rag-api
container_name: rag-api
ports:
- "8001:8001"
environment:
- OLLAMA_HOST=http://host.docker.internal:11434
- NODE_ENV=production
volumes:
- ./lancedb:/app/lancedb
- ./index_store:/app/index_store
- ./shared_uploads:/app/shared_uploads
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8001/models"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
# Backend API server
backend:
build:
context: .
dockerfile: Dockerfile.backend
container_name: rag-backend
ports:
- "8000:8000"
environment:
- NODE_ENV=production
- RAG_API_URL=http://rag-api:8001
volumes:
- ./backend/chat_data.db:/app/backend/chat_data.db
- ./shared_uploads:/app/shared_uploads
depends_on:
rag-api:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
# Frontend Next.js application
frontend:
build:
context: .
dockerfile: Dockerfile.frontend
container_name: rag-frontend
ports:
- "3000:3000"
environment:
- NODE_ENV=production
- NEXT_PUBLIC_API_URL=http://localhost:8000
depends_on:
backend:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3000"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
networks:
rag-network:
driver: bridge

103
docker-compose.yml Normal file
View File

@ -0,0 +1,103 @@
services:
# Ollama service for LLM inference (optional - can use host Ollama instead)
ollama:
image: ollama/ollama:latest
container_name: rag-ollama
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
environment:
- OLLAMA_HOST=0.0.0.0
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
profiles:
- with-ollama # Optional service - enable with --profile with-ollama
# RAG API server
rag-api:
build:
context: .
dockerfile: Dockerfile.rag-api
container_name: rag-api
ports:
- "8001:8001"
environment:
# Use host Ollama by default, or containerized Ollama if enabled
- OLLAMA_HOST=${OLLAMA_HOST:-http://host.docker.internal:11434}
- NODE_ENV=production
volumes:
- ./lancedb:/app/lancedb
- ./index_store:/app/index_store
- ./shared_uploads:/app/shared_uploads
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8001/models"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
# Backend API server
backend:
build:
context: .
dockerfile: Dockerfile.backend
container_name: rag-backend
ports:
- "8000:8000"
environment:
- NODE_ENV=production
- RAG_API_URL=http://rag-api:8001
volumes:
- ./backend/chat_data.db:/app/backend/chat_data.db
- ./shared_uploads:/app/shared_uploads
depends_on:
rag-api:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
# Frontend Next.js application
frontend:
build:
context: .
dockerfile: Dockerfile.frontend
container_name: rag-frontend
ports:
- "3000:3000"
environment:
- NODE_ENV=production
- NEXT_PUBLIC_API_URL=http://localhost:8000
depends_on:
backend:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3000"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
volumes:
ollama_data:
driver: local
networks:
rag-network:
driver: bridge
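
Once the stack is up, a readiness check from the host against the same endpoints the compose healthchecks use (ports as published above):

import requests

checks = {
    "rag-api": "http://localhost:8001/models",
    "backend": "http://localhost:8000/health",
    "frontend": "http://localhost:3000",
}
for name, url in checks.items():
    try:
        ok = requests.get(url, timeout=5).status_code == 200
    except requests.RequestException:
        ok = False
    print(f"{name}: {'healthy' if ok else 'not responding'}")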

11
docker.env Normal file
View File

@ -0,0 +1,11 @@
# Docker environment configuration
# Set this to use local Ollama instance running on host
OLLAMA_HOST=http://host.docker.internal:11434
# Alternative: Use containerized Ollama (uncomment and run with --profile with-ollama)
# OLLAMA_HOST=http://ollama:11434
# Other configuration
NODE_ENV=production
NEXT_PUBLIC_API_URL=http://localhost:8000
RAG_API_URL=http://rag-api:8001

16
eslint.config.mjs Normal file
View File

@ -0,0 +1,16 @@
import { dirname } from "path";
import { fileURLToPath } from "url";
import { FlatCompat } from "@eslint/eslintrc";
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const compat = new FlatCompat({
baseDirectory: __dirname,
});
const eslintConfig = [
...compat.extends("next/core-web-vitals", "next/typescript"),
];
export default eslintConfig;

View File

@ -1,40 +0,0 @@
import logging
import torch
from langchain.embeddings import HuggingFaceEmbeddings
from habana_frameworks.torch.utils.library_loader import load_habana_module
from optimum.habana.sentence_transformers.modeling_utils import (
adapt_sentence_transformers_to_gaudi,
)
from constants import EMBEDDING_MODEL_NAME
def load_embeddings():
"""Load HuggingFace Embeddings object onto Gaudi or CPU"""
load_habana_module()
if torch.hpu.is_available():
logging.info("Loading embedding model on hpu")
adapt_sentence_transformers_to_gaudi()
embeddings = HuggingFaceEmbeddings(
model_name=EMBEDDING_MODEL_NAME, model_kwargs={"device": "hpu"}
)
else:
logging.info("Loading embedding model on cpu")
embeddings = HuggingFaceEmbeddings(
model_name=EMBEDDING_MODEL_NAME, model_kwargs={"device": "cpu"}
)
return embeddings
def calculate_similarity(model, response, expected_answer):
"""Calculate similarity between response and expected answer using the model"""
response_embedding = model.client.encode(response, convert_to_tensor=True).squeeze()
expected_embedding = model.client.encode(
expected_answer, convert_to_tensor=True
).squeeze()
similarity_score = torch.nn.functional.cosine_similarity(
response_embedding, expected_embedding, dim=0
)
return similarity_score.item()

View File

@ -1,168 +0,0 @@
import copy
import os
import torch
from pathlib import Path
from typing import List
import habana_frameworks.torch.hpu as torch_hpu
from habana_frameworks.torch.hpu import wrap_in_hpu_graph
from huggingface_hub import snapshot_download
from optimum.habana.transformers.generation import MODELS_OPTIMIZED_WITH_STATIC_SHAPES
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi
from optimum.habana.utils import set_seed
from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline
from transformers.utils import is_offline_mode
def get_repo_root(model_name_or_path, local_rank=-1, token=None):
"""
Downloads the specified model checkpoint and returns the repository where it was downloaded.
"""
if Path(model_name_or_path).is_dir():
# If it is a local model, no need to download anything
return model_name_or_path
else:
# Checks if online or not
if is_offline_mode():
if local_rank == 0:
print("Offline mode: forcing local_files_only=True")
# Only download PyTorch weights by default
allow_patterns = ["*.bin"]
# Download only on first process
if local_rank in [-1, 0]:
cache_dir = snapshot_download(
model_name_or_path,
local_files_only=is_offline_mode(),
cache_dir=os.getenv("TRANSFORMERS_CACHE", None),
allow_patterns=allow_patterns,
max_workers=16,
token=token,
)
if local_rank == -1:
# If there is only one process, then the method is finished
return cache_dir
# Make all processes wait so that other processes can get the checkpoint directly from cache
torch.distributed.barrier()
return snapshot_download(
model_name_or_path,
local_files_only=is_offline_mode(),
cache_dir=os.getenv("TRANSFORMERS_CACHE", None),
allow_patterns=allow_patterns,
token=token,
)
def get_optimized_model_name(config):
for model_type in MODELS_OPTIMIZED_WITH_STATIC_SHAPES:
if model_type == config.model_type:
return model_type
return None
def model_is_optimized(config):
"""
Checks if the given config belongs to a model in optimum/habana/transformers/models, which has a
new input token_idx.
"""
return get_optimized_model_name(config) is not None
class GaudiTextGenerationPipeline(TextGenerationPipeline):
"""
An end-to-end text-generation pipeline that can used to initialize LangChain classes.
"""
def __init__(self, model_name_or_path=None, revision="main", **kwargs):
self.task = "text-generation"
self.device = "hpu"
# Tweak generation so that it runs faster on Gaudi
adapt_transformers_to_gaudi()
set_seed(27)
# Initialize tokenizer and define datatype
self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, revision=revision)
model_dtype = torch.bfloat16
# Intialize model
get_repo_root(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, revision=revision, torch_dtype=model_dtype)
model = model.eval().to(self.device)
is_optimized = model_is_optimized(model.config)
model = wrap_in_hpu_graph(model)
self.model = model
# Used for padding input to fixed length
self.tokenizer.padding_side = "left"
self.max_padding_length = kwargs.get("max_padding_length", self.model.config.max_position_embeddings)
# Define config params for llama and mistral models
if self.model.config.model_type in ["llama", "mistral"]:
self.model.generation_config.pad_token_id = 0
self.model.generation_config.bos_token_id = 1
self.model.generation_config.eos_token_id = 2
self.tokenizer.bos_token_id = self.model.generation_config.bos_token_id
self.tokenizer.eos_token_id = self.model.generation_config.eos_token_id
self.tokenizer.pad_token_id = self.model.generation_config.pad_token_id
self.tokenizer.pad_token = self.tokenizer.decode(self.tokenizer.pad_token_id)
self.tokenizer.eos_token = self.tokenizer.decode(self.tokenizer.eos_token_id)
self.tokenizer.bos_token = self.tokenizer.decode(self.tokenizer.bos_token_id)
# Applicable to models that do not have pad tokens
if self.tokenizer.pad_token is None:
self.tokenizer.pad_token = self.tokenizer.eos_token
self.model.generation_config.pad_token_id = self.model.generation_config.eos_token_id
# Edit generation configuration based on input arguments
self.generation_config = copy.deepcopy(self.model.generation_config)
self.generation_config.max_new_tokens = kwargs.get("max_new_tokens", 100)
self.generation_config.use_cache = kwargs.get("use_kv_cache", True)
self.generation_config.static_shapes = is_optimized
self.generation_config.do_sample = kwargs.get("do_sample", False)
self.generation_config.num_beams = kwargs.get("num_beams", 1)
self.generation_config.temperature = kwargs.get("temperature", 1.0)
self.generation_config.top_p = kwargs.get("top_p", 1.0)
self.generation_config.repetition_penalty = kwargs.get("repetition_penalty", 1.0)
self.generation_config.num_return_sequences = kwargs.get("num_return_sequences", 1)
self.generation_config.bad_words_ids = None
self.generation_config.force_words_ids = None
self.generation_config.ignore_eos = False
        # Define an empty post-process params dict as there is no postprocessing
self._postprocess_params = {}
# Warm-up hpu and compile computation graphs
self.compile_graph()
def __call__(self, prompt: List[str]):
"""
__call__ method of pipeline class
"""
# Tokenize input string
        model_inputs = self.tokenizer.encode_plus(
            prompt[0],
            return_tensors="pt",
            max_length=self.max_padding_length,
            padding="max_length",
            truncation=True,
        )
# Move tensors to hpu
for t in model_inputs:
if torch.is_tensor(model_inputs[t]):
model_inputs[t] = model_inputs[t].to(self.device)
# Call model's generate method
        output = self.model.generate(
            **model_inputs,
            generation_config=self.generation_config,
            lazy_mode=True,
            hpu_graphs=True,
            profiling_steps=0,
            profiling_warmup_steps=0,
        ).cpu()
# Decode and return result
output_text = self.tokenizer.decode(output[0], skip_special_tokens=True)
del output, model_inputs
return [{"generated_text": output_text}]
def compile_graph(self):
"""
Function to compile computation graphs and synchronize hpus.
"""
for _ in range(3):
self(["Here is my prompt"])
torch_hpu.synchronize()
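A minimal usage sketch of the pipeline above, assuming a Gaudi/HPU host with optimum-habana installed; the model id and generation settings are illustrative, not prescribed by this file:

```python
# Hedged example: exercise GaudiTextGenerationPipeline directly.
pipe = GaudiTextGenerationPipeline(
    model_name_or_path="meta-llama/Llama-2-7b-chat-hf",  # assumed model choice
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
# __call__ expects a list of prompts and generates from the first entry
result = pipe(["What is retrieval-augmented generation?"])
print(result[0]["generated_text"])
```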

ingest.py
@@ -1,185 +0,0 @@
import logging
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
import click
import torch
from langchain.docstore.document import Document
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from utils import get_embeddings
from constants import (
CHROMA_SETTINGS,
DOCUMENT_MAP,
EMBEDDING_MODEL_NAME,
INGEST_THREADS,
PERSIST_DIRECTORY,
SOURCE_DIRECTORY,
)
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
def file_log(logentry):
    with open("file_ingest.log", "a") as log_file:
        log_file.write(logentry + "\n")
    print(logentry + "\n")
def load_single_document(file_path: str) -> Document:
# Loads a single document from a file path
try:
file_extension = os.path.splitext(file_path)[1]
loader_class = DOCUMENT_MAP.get(file_extension)
if loader_class:
file_log(file_path + " loaded.")
loader = loader_class(file_path)
else:
file_log(file_path + " document type is undefined.")
raise ValueError("Document type is undefined")
return loader.load()[0]
except Exception as ex:
file_log("%s loading error: \n%s" % (file_path, ex))
return None
def load_document_batch(filepaths):
    logging.info("Loading document batch")
    # create a thread pool
    with ThreadPoolExecutor(len(filepaths)) as exe:
        # load files concurrently
        futures = [exe.submit(load_single_document, name) for name in filepaths]
        # collect data (load_single_document returns None for files that failed to load)
        data_list = [future.result() for future in futures]
        # return data and file paths
        return (data_list, filepaths)
def load_documents(source_dir: str) -> list[Document]:
# Loads all documents from the source documents directory, including nested folders
paths = []
for root, _, files in os.walk(source_dir):
for file_name in files:
print("Importing: " + file_name)
file_extension = os.path.splitext(file_name)[1]
source_file_path = os.path.join(root, file_name)
if file_extension in DOCUMENT_MAP.keys():
paths.append(source_file_path)
    # Have at least one worker and at most INGEST_THREADS workers
    n_workers = min(INGEST_THREADS, max(len(paths), 1))
    # Ensure a non-zero chunk size even when no documents were found
    chunksize = max(round(len(paths) / n_workers), 1)
docs = []
with ProcessPoolExecutor(n_workers) as executor:
futures = []
# split the load operations into chunks
for i in range(0, len(paths), chunksize):
# select a chunk of filenames
filepaths = paths[i : (i + chunksize)]
# submit the task
try:
future = executor.submit(load_document_batch, filepaths)
except Exception as ex:
file_log("executor task failed: %s" % (ex))
future = None
if future is not None:
futures.append(future)
# process all results
for future in as_completed(futures):
# open the file and load the data
try:
contents, _ = future.result()
docs.extend(contents)
except Exception as ex:
file_log("Exception: %s" % (ex))
return docs
def split_documents(documents: list[Document]) -> tuple[list[Document], list[Document]]:
# Splits documents for correct Text Splitter
text_docs, python_docs = [], []
for doc in documents:
if doc is not None:
file_extension = os.path.splitext(doc.metadata["source"])[1]
if file_extension == ".py":
python_docs.append(doc)
else:
text_docs.append(doc)
return text_docs, python_docs
@click.command()
@click.option(
"--device_type",
default="cuda" if torch.cuda.is_available() else "cpu",
type=click.Choice(
[
"cpu",
"cuda",
"ipu",
"xpu",
"mkldnn",
"opengl",
"opencl",
"ideep",
"hip",
"ve",
"fpga",
"ort",
"xla",
"lazy",
"vulkan",
"mps",
"meta",
"hpu",
"mtia",
],
),
help="Device to run on. (Default is cuda)",
)
def main(device_type):
# Load documents and split in chunks
logging.info(f"Loading documents from {SOURCE_DIRECTORY}")
documents = load_documents(SOURCE_DIRECTORY)
text_documents, python_documents = split_documents(documents)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
python_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.PYTHON, chunk_size=880, chunk_overlap=200
)
texts = text_splitter.split_documents(text_documents)
texts.extend(python_splitter.split_documents(python_documents))
logging.info(f"Loaded {len(documents)} documents from {SOURCE_DIRECTORY}")
logging.info(f"Split into {len(texts)} chunks of text")
"""
(1) Chooses an appropriate langchain library based on the enbedding model name. Matching code is contained within fun_localGPT.py.
(2) Provides additional arguments for instructor and BGE models to improve results, pursuant to the instructions contained on
their respective huggingface repository, project page or github repository.
"""
embeddings = get_embeddings(device_type)
logging.info(f"Loaded embeddings from {EMBEDDING_MODEL_NAME}")
db = Chroma.from_documents(
texts,
embeddings,
persist_directory=PERSIST_DIRECTORY,
client_settings=CHROMA_SETTINGS,
)
if __name__ == "__main__":
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(filename)s:%(lineno)s - %(message)s", level=logging.INFO
)
main()
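After ingestion completes, the persisted Chroma index can be reopened for retrieval. A minimal sketch, reusing the same constants.py and utils.py helpers imported above (the device choice and query string are illustrative):

```python
# Hedged sketch: reopen the index written by ingest.py and run a similarity search.
from langchain.vectorstores import Chroma

from constants import CHROMA_SETTINGS, PERSIST_DIRECTORY
from utils import get_embeddings

db = Chroma(
    persist_directory=PERSIST_DIRECTORY,
    embedding_function=get_embeddings("cpu"),  # illustrative device choice
    client_settings=CHROMA_SETTINGS,
)
docs = db.similarity_search("What is this document about?", k=4)
for doc in docs:
    print(doc.metadata["source"])
```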

@@ -1,213 +0,0 @@
import sys
import torch
if sys.platform != "darwin":
from auto_gptq import AutoGPTQForCausalLM
from huggingface_hub import hf_hub_download
from langchain.llms import LlamaCpp
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM, LlamaTokenizer, BitsAndBytesConfig
from constants import CONTEXT_WINDOW_SIZE, MAX_NEW_TOKENS, MODELS_PATH, N_BATCH, N_GPU_LAYERS
def load_quantized_model_gguf_ggml(model_id, model_basename, device_type, logging):
"""
Load a GGUF/GGML quantized model using LlamaCpp.
This function attempts to load a GGUF/GGML quantized model using the LlamaCpp library.
    If the model is of type GGML and a newer version of LLAMA-CPP (which no longer supports GGML) is installed,
    it logs a message indicating that LLAMA-CPP has dropped support for GGML.
Parameters:
- model_id (str): The identifier for the model on HuggingFace Hub.
- model_basename (str): The base name of the model file.
- device_type (str): The type of device where the model will run, e.g., 'mps', 'cuda', etc.
- logging (logging.Logger): Logger instance for logging messages.
Returns:
- LlamaCpp: An instance of the LlamaCpp model if successful, otherwise None.
Notes:
- The function uses the `hf_hub_download` function to download the model from the HuggingFace Hub.
- The number of GPU layers is set based on the device type.
"""
try:
logging.info("Using Llamacpp for GGUF/GGML quantized models")
model_path = hf_hub_download(
repo_id=model_id,
filename=model_basename,
resume_download=True,
cache_dir=MODELS_PATH,
)
kwargs = {
"model_path": model_path,
"n_ctx": CONTEXT_WINDOW_SIZE,
"max_tokens": MAX_NEW_TOKENS,
"n_batch": N_BATCH, # set this based on your GPU & CPU RAM
}
if device_type.lower() == "mps":
kwargs["n_gpu_layers"] = 1
if device_type.lower() == "cuda":
kwargs["n_gpu_layers"] = N_GPU_LAYERS # set this based on your GPU
return LlamaCpp(**kwargs)
    except TypeError:
        if "ggml" in model_basename:
            logging.info("If you were using a GGML model, LLAMA-CPP has dropped GGML support; use a GGUF model instead.")
        return None
def load_quantized_model_qptq(model_id, model_basename, device_type, logging):
"""
Load a GPTQ quantized model using AutoGPTQForCausalLM.
This function loads a quantized model that ends with GPTQ and may have variations
of .no-act.order or .safetensors in their HuggingFace repo.
It will not work for Macs, as AutoGPTQ only supports Linux and Windows:
- Nvidia CUDA (Windows and Linux)
- AMD ROCm (Linux only)
- CPU QiGen (Linux only, new and experimental)
Parameters:
- model_id (str): The identifier for the model on HuggingFace Hub.
- model_basename (str): The base name of the model file.
- device_type (str): The type of device where the model will run.
- logging (logging.Logger): Logger instance for logging messages.
Returns:
- model (AutoGPTQForCausalLM): The loaded quantized model.
- tokenizer (AutoTokenizer): The tokenizer associated with the model.
Notes:
- The function checks for the ".safetensors" ending in the model_basename and removes it if present.
"""
if sys.platform == "darwin":
logging.INFO("GPTQ models will NOT work on Mac devices. Please choose a different model.")
return None, None
# The code supports all huggingface models that ends with GPTQ and have some variation
# of .no-act.order or .safetensors in their HF repo.
logging.info("Using AutoGPTQForCausalLM for quantized models")
if ".safetensors" in model_basename:
# Remove the ".safetensors" ending if present
model_basename = model_basename.replace(".safetensors", "")
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
logging.info("Tokenizer loaded")
model = AutoGPTQForCausalLM.from_quantized(
model_id,
model_basename=model_basename,
use_safetensors=True,
trust_remote_code=True,
device_map="auto",
use_triton=False,
quantize_config=None,
)
return model, tokenizer
def load_full_model(model_id, model_basename, device_type, logging):
"""
Load a full model using either LlamaTokenizer or AutoModelForCausalLM.
This function loads a full model based on the specified device type.
If the device type is 'mps' or 'cpu', it uses LlamaTokenizer and LlamaForCausalLM.
Otherwise, it uses AutoModelForCausalLM.
Parameters:
- model_id (str): The identifier for the model on HuggingFace Hub.
- model_basename (str): The base name of the model file.
- device_type (str): The type of device where the model will run.
- logging (logging.Logger): Logger instance for logging messages.
Returns:
- model (Union[LlamaForCausalLM, AutoModelForCausalLM]): The loaded model.
- tokenizer (Union[LlamaTokenizer, AutoTokenizer]): The tokenizer associated with the model.
Notes:
- The function uses the `from_pretrained` method to load both the model and the tokenizer.
- Additional settings are provided for NVIDIA GPUs, such as loading in 4-bit and setting the compute dtype.
"""
if device_type.lower() in ["mps", "cpu", "hpu"]:
logging.info("Using AutoModelForCausalLM")
# tokenizer = LlamaTokenizer.from_pretrained(model_id, cache_dir="./models/")
# model = LlamaForCausalLM.from_pretrained(model_id, cache_dir="./models/")
model = AutoModelForCausalLM.from_pretrained(model_id,
# quantization_config=quantization_config,
# low_cpu_mem_usage=True,
# torch_dtype="auto",
torch_dtype=torch.bfloat16,
device_map="auto",
cache_dir="./models/")
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="./models/")
else:
logging.info("Using AutoModelForCausalLM for full models")
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="./models/")
logging.info("Tokenizer loaded")
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16
)
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="auto",
torch_dtype=torch.float16,
low_cpu_mem_usage=True,
cache_dir=MODELS_PATH,
trust_remote_code=True, # set these if you are using NVIDIA GPU
quantization_config=bnb_config
# load_in_4bit=True,
# bnb_4bit_quant_type="nf4",
# bnb_4bit_compute_dtype=torch.float16,
        # max_memory={0: "15GB"}, # Uncomment this line when you encounter CUDA out of memory errors
)
model.tie_weights()
return model, tokenizer
def load_quantized_model_awq(model_id, logging):
"""
    Load an AWQ quantized model using AutoModelForCausalLM.
This function loads a quantized model that ends with AWQ.
It will not work for Macs as AutoAWQ currently only supports Nvidia GPUs.
Parameters:
- model_id (str): The identifier for the model on HuggingFace Hub.
- logging (logging.Logger): Logger instance for logging messages.
Returns:
- model (AutoModelForCausalLM): The loaded quantized model.
- tokenizer (AutoTokenizer): The tokenizer associated with the model.
"""
if sys.platform == "darwin":
logging.INFO("AWQ models will NOT work on Mac devices. Please choose a different model.")
return None, None
# The code supports all huggingface models that ends with AWQ.
logging.info("Using AutoModelForCausalLM for AWQ quantized models")
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
logging.info("Tokenizer loaded")
model = AutoModelForCausalLM.from_pretrained(
model_id,
use_safetensors=True,
trust_remote_code=True,
device_map="auto",
)
return model, tokenizer
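This file only defines the individual loaders; below is a hedged sketch of how a caller might route a checkpoint to the right loader based on its basename (the dispatcher name and example patterns are assumptions, not part of this file):

```python
import logging

def pick_model_loader(model_id, model_basename, device_type):
    """Illustrative dispatcher: choose a loader from the checkpoint naming convention."""
    if model_basename is not None:
        name = model_basename.lower()
        if ".gguf" in name or ".ggml" in name:
            # returns a LlamaCpp LLM directly (no separate tokenizer)
            return load_quantized_model_gguf_ggml(model_id, model_basename, device_type, logging)
        if "awq" in name:
            return load_quantized_model_awq(model_id, logging)
        if "gptq" in name:
            return load_quantized_model_qptq(model_id, model_basename, device_type, logging)
    # unquantized / full-precision checkpoints
    return load_full_model(model_id, model_basename, device_type, logging)
```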

@@ -1,72 +0,0 @@
import argparse
import os
import sys
import tempfile
import requests
from flask import Flask, render_template, request
from werkzeug.utils import secure_filename
sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
app = Flask(__name__)
app.secret_key = "LeafmanZSecretKey"
API_HOST = "http://localhost:5110/api"
# PAGES #
@app.route("/", methods=["GET", "POST"])
def home_page():
if request.method == "POST":
if "user_prompt" in request.form:
user_prompt = request.form["user_prompt"]
print(f"User Prompt: {user_prompt}")
main_prompt_url = f"{API_HOST}/prompt_route"
response = requests.post(main_prompt_url, data={"user_prompt": user_prompt})
print(response.status_code) # print HTTP response status code for debugging
if response.status_code == 200:
# print(response.json()) # Print the JSON data from the response
return render_template("home.html", show_response_modal=True, response_dict=response.json())
elif "documents" in request.files:
delete_source_url = f"{API_HOST}/delete_source" # URL of the /api/delete_source endpoint
if request.form.get("action") == "reset":
response = requests.get(delete_source_url)
save_document_url = f"{API_HOST}/save_document"
run_ingest_url = f"{API_HOST}/run_ingest" # URL of the /api/run_ingest endpoint
files = request.files.getlist("documents")
for file in files:
print(file.filename)
filename = secure_filename(file.filename)
with tempfile.SpooledTemporaryFile() as f:
f.write(file.read())
f.seek(0)
response = requests.post(save_document_url, files={"document": (filename, f)})
print(response.status_code) # print HTTP response status code for debugging
# Make a GET request to the /api/run_ingest endpoint
response = requests.get(run_ingest_url)
print(response.status_code) # print HTTP response status code for debugging
# Display the form for GET request
return render_template(
"home.html",
show_response_modal=False,
response_dict={"Prompt": "None", "Answer": "None", "Sources": [("ewf", "wef")]},
)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--port", type=int, default=5111, help="Port to run the UI on. Defaults to 5111.")
parser.add_argument(
"--host",
type=str,
default="127.0.0.1",
help="Host to run the UI on. Defaults to 127.0.0.1. "
"Set to 0.0.0.0 to make the UI externally "
"accessible from other devices.",
)
args = parser.parse_args()
app.run(debug=False, host=args.host, port=args.port)
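The UI above only forwards form submissions; the same backend endpoint can be exercised directly. A small sketch, assuming the localGPT API server is already listening on localhost:5110 (the prompt text is illustrative):

```python
# Hedged example: call the prompt endpoint that the UI posts to.
import requests

resp = requests.post(
    "http://localhost:5110/api/prompt_route",
    data={"user_prompt": "Summarize the ingested documents."},
)
print(resp.status_code)
print(resp.json())
```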

(5 file diffs suppressed: too large or lines too long)

@@ -1,498 +0,0 @@
/*!
* Bootstrap Reboot v5.1.3 (https://getbootstrap.com/)
* Copyright 2011-2021 The Bootstrap Authors
* Copyright 2011-2021 Twitter, Inc.
* Licensed under MIT (https://github.com/twbs/bootstrap/blob/main/LICENSE)
* Forked from Normalize.css, licensed MIT (https://github.com/necolas/normalize.css/blob/master/LICENSE.md)
*/
:root {
--bs-blue: #0d6efd;
--bs-indigo: #6610f2;
--bs-purple: #6f42c1;
--bs-pink: #d63384;
--bs-red: #dc3545;
--bs-orange: #fd7e14;
--bs-yellow: #ffc107;
--bs-green: #198754;
--bs-teal: #20c997;
--bs-cyan: #0dcaf0;
--bs-white: #fff;
--bs-gray: #6c757d;
--bs-gray-dark: #343a40;
--bs-gray-100: #f8f9fa;
--bs-gray-200: #e9ecef;
--bs-gray-300: #dee2e6;
--bs-gray-400: #ced4da;
--bs-gray-500: #adb5bd;
--bs-gray-600: #6c757d;
--bs-gray-700: #495057;
--bs-gray-800: #343a40;
--bs-gray-900: #212529;
--bs-primary: #0d6efd;
--bs-secondary: #6c757d;
--bs-success: #198754;
--bs-info: #0dcaf0;
--bs-warning: #ffc107;
--bs-danger: #dc3545;
--bs-light: #f8f9fa;
--bs-dark: #212529;
--bs-primary-rgb: 13, 110, 253;
--bs-secondary-rgb: 108, 117, 125;
--bs-success-rgb: 25, 135, 84;
--bs-info-rgb: 13, 202, 240;
--bs-warning-rgb: 255, 193, 7;
--bs-danger-rgb: 220, 53, 69;
--bs-light-rgb: 248, 249, 250;
--bs-dark-rgb: 33, 37, 41;
--bs-white-rgb: 255, 255, 255;
--bs-black-rgb: 0, 0, 0;
--bs-body-color-rgb: 33, 37, 41;
--bs-body-bg-rgb: 255, 255, 255;
--bs-font-sans-serif: system-ui, -apple-system, "Segoe UI", Roboto,
"Helvetica Neue", Arial, "Noto Sans", "Liberation Sans", sans-serif,
"Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
--bs-font-monospace: SFMono-Regular, Menlo, Monaco, Consolas,
"Liberation Mono", "Courier New", monospace;
--bs-gradient: linear-gradient(
180deg,
rgba(255, 255, 255, 0.15),
rgba(255, 255, 255, 0)
);
--bs-body-font-family: var(--bs-font-sans-serif);
--bs-body-font-size: 1rem;
--bs-body-font-weight: 400;
--bs-body-line-height: 1.5;
--bs-body-color: #212529;
--bs-body-bg: #fff;
}
*,
*::before,
*::after {
box-sizing: border-box;
}
@media (prefers-reduced-motion: no-preference) {
:root {
scroll-behavior: smooth;
}
}
body {
margin: 0;
font-family: var(--bs-body-font-family);
font-size: var(--bs-body-font-size);
font-weight: var(--bs-body-font-weight);
line-height: var(--bs-body-line-height);
color: var(--bs-body-color);
text-align: var(--bs-body-text-align);
background-color: var(--bs-body-bg);
-webkit-text-size-adjust: 100%;
-webkit-tap-highlight-color: rgba(0, 0, 0, 0);
}
hr {
margin: 1rem 0;
color: inherit;
background-color: currentColor;
border: 0;
opacity: 0.25;
}
hr:not([size]) {
height: 1px;
}
h6,
h5,
h4,
h3,
h2,
h1 {
margin-top: 0;
margin-bottom: 0.5rem;
font-weight: 500;
line-height: 1.2;
}
h1 {
font-size: calc(1.375rem + 1.5vw);
}
@media (min-width: 1200px) {
h1 {
font-size: 2.5rem;
}
}
h2 {
font-size: calc(1.325rem + 0.9vw);
}
@media (min-width: 1200px) {
h2 {
font-size: 2rem;
}
}
h3 {
font-size: calc(1.3rem + 0.6vw);
}
@media (min-width: 1200px) {
h3 {
font-size: 1.75rem;
}
}
h4 {
font-size: calc(1.275rem + 0.3vw);
}
@media (min-width: 1200px) {
h4 {
font-size: 1.5rem;
}
}
h5 {
font-size: 1.25rem;
}
h6 {
font-size: 1rem;
}
p {
margin-top: 0;
margin-bottom: 1rem;
}
abbr[title],
abbr[data-bs-original-title] {
-webkit-text-decoration: underline dotted;
text-decoration: underline dotted;
cursor: help;
-webkit-text-decoration-skip-ink: none;
text-decoration-skip-ink: none;
}
address {
margin-bottom: 1rem;
font-style: normal;
line-height: inherit;
}
ol,
ul {
padding-left: 2rem;
}
ol,
ul,
dl {
margin-top: 0;
margin-bottom: 1rem;
}
ol ol,
ul ul,
ol ul,
ul ol {
margin-bottom: 0;
}
dt {
font-weight: 700;
}
dd {
margin-bottom: 0.5rem;
margin-left: 0;
}
blockquote {
margin: 0 0 1rem;
}
b,
strong {
font-weight: bolder;
}
small {
font-size: 0.875em;
}
mark {
padding: 0.2em;
background-color: #fcf8e3;
}
sub,
sup {
position: relative;
font-size: 0.75em;
line-height: 0;
vertical-align: baseline;
}
sub {
bottom: -0.25em;
}
sup {
top: -0.5em;
}
a {
color: #0d6efd;
text-decoration: underline;
}
a:hover {
color: #0a58ca;
}
a:not([href]):not([class]),
a:not([href]):not([class]):hover {
color: inherit;
text-decoration: none;
}
pre,
code,
kbd,
samp {
font-family: var(--bs-font-monospace);
font-size: 1em;
direction: ltr /* rtl:ignore */;
unicode-bidi: bidi-override;
}
pre {
display: block;
margin-top: 0;
margin-bottom: 1rem;
overflow: auto;
font-size: 0.875em;
}
pre code {
font-size: inherit;
color: inherit;
word-break: normal;
}
code {
font-size: 0.875em;
color: #d63384;
word-wrap: break-word;
}
a > code {
color: inherit;
}
kbd {
padding: 0.2rem 0.4rem;
font-size: 0.875em;
color: #fff;
background-color: #212529;
border-radius: 0.2rem;
}
kbd kbd {
padding: 0;
font-size: 1em;
font-weight: 700;
}
figure {
margin: 0 0 1rem;
}
img,
svg {
vertical-align: middle;
}
table {
caption-side: bottom;
border-collapse: collapse;
}
caption {
padding-top: 0.5rem;
padding-bottom: 0.5rem;
color: #6c757d;
text-align: left;
}
th {
text-align: inherit;
text-align: -webkit-match-parent;
}
thead,
tbody,
tfoot,
tr,
td,
th {
border-color: inherit;
border-style: solid;
border-width: 0;
}
label {
display: inline-block;
}
button {
border-radius: 0;
}
button:focus:not(:focus-visible) {
outline: 0;
}
input,
button,
select,
optgroup,
textarea {
margin: 0;
font-family: inherit;
font-size: inherit;
line-height: inherit;
}
button,
select {
text-transform: none;
}
[role="button"] {
cursor: pointer;
}
select {
word-wrap: normal;
}
select:disabled {
opacity: 1;
}
[list]::-webkit-calendar-picker-indicator {
display: none;
}
button,
[type="button"],
[type="reset"],
[type="submit"] {
-webkit-appearance: button;
}
button:not(:disabled),
[type="button"]:not(:disabled),
[type="reset"]:not(:disabled),
[type="submit"]:not(:disabled) {
cursor: pointer;
}
::-moz-focus-inner {
padding: 0;
border-style: none;
}
textarea {
resize: vertical;
}
fieldset {
min-width: 0;
padding: 0;
margin: 0;
border: 0;
}
legend {
float: left;
width: 100%;
padding: 0;
margin-bottom: 0.5rem;
font-size: calc(1.275rem + 0.3vw);
line-height: inherit;
}
@media (min-width: 1200px) {
legend {
font-size: 1.5rem;
}
}
legend + * {
clear: left;
}
::-webkit-datetime-edit-fields-wrapper,
::-webkit-datetime-edit-text,
::-webkit-datetime-edit-minute,
::-webkit-datetime-edit-hour-field,
::-webkit-datetime-edit-day-field,
::-webkit-datetime-edit-month-field,
::-webkit-datetime-edit-year-field {
padding: 0;
}
::-webkit-inner-spin-button {
height: auto;
}
[type="search"] {
outline-offset: -2px;
-webkit-appearance: textfield;
}
/* rtl:raw:
[type="tel"],
[type="url"],
[type="email"],
[type="number"] {
direction: ltr;
}
*/
::-webkit-search-decoration {
-webkit-appearance: none;
}
::-webkit-color-swatch-wrapper {
padding: 0;
}
::-webkit-file-upload-button {
font: inherit;
}
::file-selector-button {
font: inherit;
}
::-webkit-file-upload-button {
font: inherit;
-webkit-appearance: button;
}
output {
display: inline-block;
}
iframe {
border: 0;
}
summary {
display: list-item;
cursor: pointer;
}
progress {
vertical-align: baseline;
}
[hidden] {
display: none !important;
}
/*# sourceMappingURL=bootstrap-reboot.css.map */

(1 file diff suppressed: lines too long)

@@ -1,424 +0,0 @@
/*!
* Bootstrap Reboot v5.1.3 (https://getbootstrap.com/)
* Copyright 2011-2021 The Bootstrap Authors
* Copyright 2011-2021 Twitter, Inc.
* Licensed under MIT (https://github.com/twbs/bootstrap/blob/main/LICENSE)
* Forked from Normalize.css, licensed MIT (https://github.com/necolas/normalize.css/blob/master/LICENSE.md)
*/
:root {
--bs-blue: #0d6efd;
--bs-indigo: #6610f2;
--bs-purple: #6f42c1;
--bs-pink: #d63384;
--bs-red: #dc3545;
--bs-orange: #fd7e14;
--bs-yellow: #ffc107;
--bs-green: #198754;
--bs-teal: #20c997;
--bs-cyan: #0dcaf0;
--bs-white: #fff;
--bs-gray: #6c757d;
--bs-gray-dark: #343a40;
--bs-gray-100: #f8f9fa;
--bs-gray-200: #e9ecef;
--bs-gray-300: #dee2e6;
--bs-gray-400: #ced4da;
--bs-gray-500: #adb5bd;
--bs-gray-600: #6c757d;
--bs-gray-700: #495057;
--bs-gray-800: #343a40;
--bs-gray-900: #212529;
--bs-primary: #0d6efd;
--bs-secondary: #6c757d;
--bs-success: #198754;
--bs-info: #0dcaf0;
--bs-warning: #ffc107;
--bs-danger: #dc3545;
--bs-light: #f8f9fa;
--bs-dark: #212529;
--bs-primary-rgb: 13, 110, 253;
--bs-secondary-rgb: 108, 117, 125;
--bs-success-rgb: 25, 135, 84;
--bs-info-rgb: 13, 202, 240;
--bs-warning-rgb: 255, 193, 7;
--bs-danger-rgb: 220, 53, 69;
--bs-light-rgb: 248, 249, 250;
--bs-dark-rgb: 33, 37, 41;
--bs-white-rgb: 255, 255, 255;
--bs-black-rgb: 0, 0, 0;
--bs-body-color-rgb: 33, 37, 41;
--bs-body-bg-rgb: 255, 255, 255;
--bs-font-sans-serif: system-ui, -apple-system, "Segoe UI", Roboto,
"Helvetica Neue", Arial, "Noto Sans", "Liberation Sans", sans-serif,
"Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
--bs-font-monospace: SFMono-Regular, Menlo, Monaco, Consolas,
"Liberation Mono", "Courier New", monospace;
--bs-gradient: linear-gradient(
180deg,
rgba(255, 255, 255, 0.15),
rgba(255, 255, 255, 0)
);
--bs-body-font-family: var(--bs-font-sans-serif);
--bs-body-font-size: 1rem;
--bs-body-font-weight: 400;
--bs-body-line-height: 1.5;
--bs-body-color: #212529;
--bs-body-bg: #fff;
}
*,
::after,
::before {
box-sizing: border-box;
}
@media (prefers-reduced-motion: no-preference) {
:root {
scroll-behavior: smooth;
}
}
body {
margin: 0;
font-family: var(--bs-body-font-family);
font-size: var(--bs-body-font-size);
font-weight: var(--bs-body-font-weight);
line-height: var(--bs-body-line-height);
color: var(--bs-body-color);
text-align: var(--bs-body-text-align);
background-color: var(--bs-body-bg);
-webkit-text-size-adjust: 100%;
-webkit-tap-highlight-color: transparent;
}
hr {
margin: 1rem 0;
color: inherit;
background-color: currentColor;
border: 0;
opacity: 0.25;
}
hr:not([size]) {
height: 1px;
}
h1,
h2,
h3,
h4,
h5,
h6 {
margin-top: 0;
margin-bottom: 0.5rem;
font-weight: 500;
line-height: 1.2;
}
h1 {
font-size: calc(1.375rem + 1.5vw);
}
@media (min-width: 1200px) {
h1 {
font-size: 2.5rem;
}
}
h2 {
font-size: calc(1.325rem + 0.9vw);
}
@media (min-width: 1200px) {
h2 {
font-size: 2rem;
}
}
h3 {
font-size: calc(1.3rem + 0.6vw);
}
@media (min-width: 1200px) {
h3 {
font-size: 1.75rem;
}
}
h4 {
font-size: calc(1.275rem + 0.3vw);
}
@media (min-width: 1200px) {
h4 {
font-size: 1.5rem;
}
}
h5 {
font-size: 1.25rem;
}
h6 {
font-size: 1rem;
}
p {
margin-top: 0;
margin-bottom: 1rem;
}
abbr[data-bs-original-title],
abbr[title] {
-webkit-text-decoration: underline dotted;
text-decoration: underline dotted;
cursor: help;
-webkit-text-decoration-skip-ink: none;
text-decoration-skip-ink: none;
}
address {
margin-bottom: 1rem;
font-style: normal;
line-height: inherit;
}
ol,
ul {
padding-left: 2rem;
}
dl,
ol,
ul {
margin-top: 0;
margin-bottom: 1rem;
}
ol ol,
ol ul,
ul ol,
ul ul {
margin-bottom: 0;
}
dt {
font-weight: 700;
}
dd {
margin-bottom: 0.5rem;
margin-left: 0;
}
blockquote {
margin: 0 0 1rem;
}
b,
strong {
font-weight: bolder;
}
small {
font-size: 0.875em;
}
mark {
padding: 0.2em;
background-color: #fcf8e3;
}
sub,
sup {
position: relative;
font-size: 0.75em;
line-height: 0;
vertical-align: baseline;
}
sub {
bottom: -0.25em;
}
sup {
top: -0.5em;
}
a {
color: #0d6efd;
text-decoration: underline;
}
a:hover {
color: #0a58ca;
}
a:not([href]):not([class]),
a:not([href]):not([class]):hover {
color: inherit;
text-decoration: none;
}
code,
kbd,
pre,
samp {
font-family: var(--bs-font-monospace);
font-size: 1em;
direction: ltr;
unicode-bidi: bidi-override;
}
pre {
display: block;
margin-top: 0;
margin-bottom: 1rem;
overflow: auto;
font-size: 0.875em;
}
pre code {
font-size: inherit;
color: inherit;
word-break: normal;
}
code {
font-size: 0.875em;
color: #d63384;
word-wrap: break-word;
}
a > code {
color: inherit;
}
kbd {
padding: 0.2rem 0.4rem;
font-size: 0.875em;
color: #fff;
background-color: #212529;
border-radius: 0.2rem;
}
kbd kbd {
padding: 0;
font-size: 1em;
font-weight: 700;
}
figure {
margin: 0 0 1rem;
}
img,
svg {
vertical-align: middle;
}
table {
caption-side: bottom;
border-collapse: collapse;
}
caption {
padding-top: 0.5rem;
padding-bottom: 0.5rem;
color: #6c757d;
text-align: left;
}
th {
text-align: inherit;
text-align: -webkit-match-parent;
}
tbody,
td,
tfoot,
th,
thead,
tr {
border-color: inherit;
border-style: solid;
border-width: 0;
}
label {
display: inline-block;
}
button {
border-radius: 0;
}
button:focus:not(:focus-visible) {
outline: 0;
}
button,
input,
optgroup,
select,
textarea {
margin: 0;
font-family: inherit;
font-size: inherit;
line-height: inherit;
}
button,
select {
text-transform: none;
}
[role="button"] {
cursor: pointer;
}
select {
word-wrap: normal;
}
select:disabled {
opacity: 1;
}
[list]::-webkit-calendar-picker-indicator {
display: none;
}
[type="button"],
[type="reset"],
[type="submit"],
button {
-webkit-appearance: button;
}
[type="button"]:not(:disabled),
[type="reset"]:not(:disabled),
[type="submit"]:not(:disabled),
button:not(:disabled) {
cursor: pointer;
}
::-moz-focus-inner {
padding: 0;
border-style: none;
}
textarea {
resize: vertical;
}
fieldset {
min-width: 0;
padding: 0;
margin: 0;
border: 0;
}
legend {
float: left;
width: 100%;
padding: 0;
margin-bottom: 0.5rem;
font-size: calc(1.275rem + 0.3vw);
line-height: inherit;
}
@media (min-width: 1200px) {
legend {
font-size: 1.5rem;
}
}
legend + * {
clear: left;
}
::-webkit-datetime-edit-day-field,
::-webkit-datetime-edit-fields-wrapper,
::-webkit-datetime-edit-hour-field,
::-webkit-datetime-edit-minute,
::-webkit-datetime-edit-month-field,
::-webkit-datetime-edit-text,
::-webkit-datetime-edit-year-field {
padding: 0;
}
::-webkit-inner-spin-button {
height: auto;
}
[type="search"] {
outline-offset: -2px;
-webkit-appearance: textfield;
}
::-webkit-search-decoration {
-webkit-appearance: none;
}
::-webkit-color-swatch-wrapper {
padding: 0;
}
::-webkit-file-upload-button {
font: inherit;
}
::file-selector-button {
font: inherit;
}
::-webkit-file-upload-button {
font: inherit;
-webkit-appearance: button;
}
output {
display: inline-block;
}
iframe {
border: 0;
}
summary {
display: list-item;
cursor: pointer;
}
progress {
vertical-align: baseline;
}
[hidden] {
display: none !important;
}
/*# sourceMappingURL=bootstrap-reboot.min.css.map */

(1 file diff suppressed: lines too long)

@@ -1,495 +0,0 @@
/*!
* Bootstrap Reboot v5.1.3 (https://getbootstrap.com/)
* Copyright 2011-2021 The Bootstrap Authors
* Copyright 2011-2021 Twitter, Inc.
* Licensed under MIT (https://github.com/twbs/bootstrap/blob/main/LICENSE)
* Forked from Normalize.css, licensed MIT (https://github.com/necolas/normalize.css/blob/master/LICENSE.md)
*/
:root {
--bs-blue: #0d6efd;
--bs-indigo: #6610f2;
--bs-purple: #6f42c1;
--bs-pink: #d63384;
--bs-red: #dc3545;
--bs-orange: #fd7e14;
--bs-yellow: #ffc107;
--bs-green: #198754;
--bs-teal: #20c997;
--bs-cyan: #0dcaf0;
--bs-white: #fff;
--bs-gray: #6c757d;
--bs-gray-dark: #343a40;
--bs-gray-100: #f8f9fa;
--bs-gray-200: #e9ecef;
--bs-gray-300: #dee2e6;
--bs-gray-400: #ced4da;
--bs-gray-500: #adb5bd;
--bs-gray-600: #6c757d;
--bs-gray-700: #495057;
--bs-gray-800: #343a40;
--bs-gray-900: #212529;
--bs-primary: #0d6efd;
--bs-secondary: #6c757d;
--bs-success: #198754;
--bs-info: #0dcaf0;
--bs-warning: #ffc107;
--bs-danger: #dc3545;
--bs-light: #f8f9fa;
--bs-dark: #212529;
--bs-primary-rgb: 13, 110, 253;
--bs-secondary-rgb: 108, 117, 125;
--bs-success-rgb: 25, 135, 84;
--bs-info-rgb: 13, 202, 240;
--bs-warning-rgb: 255, 193, 7;
--bs-danger-rgb: 220, 53, 69;
--bs-light-rgb: 248, 249, 250;
--bs-dark-rgb: 33, 37, 41;
--bs-white-rgb: 255, 255, 255;
--bs-black-rgb: 0, 0, 0;
--bs-body-color-rgb: 33, 37, 41;
--bs-body-bg-rgb: 255, 255, 255;
--bs-font-sans-serif: system-ui, -apple-system, "Segoe UI", Roboto,
"Helvetica Neue", Arial, "Noto Sans", "Liberation Sans", sans-serif,
"Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
--bs-font-monospace: SFMono-Regular, Menlo, Monaco, Consolas,
"Liberation Mono", "Courier New", monospace;
--bs-gradient: linear-gradient(
180deg,
rgba(255, 255, 255, 0.15),
rgba(255, 255, 255, 0)
);
--bs-body-font-family: var(--bs-font-sans-serif);
--bs-body-font-size: 1rem;
--bs-body-font-weight: 400;
--bs-body-line-height: 1.5;
--bs-body-color: #212529;
--bs-body-bg: #fff;
}
*,
*::before,
*::after {
box-sizing: border-box;
}
@media (prefers-reduced-motion: no-preference) {
:root {
scroll-behavior: smooth;
}
}
body {
margin: 0;
font-family: var(--bs-body-font-family);
font-size: var(--bs-body-font-size);
font-weight: var(--bs-body-font-weight);
line-height: var(--bs-body-line-height);
color: var(--bs-body-color);
text-align: var(--bs-body-text-align);
background-color: var(--bs-body-bg);
-webkit-text-size-adjust: 100%;
-webkit-tap-highlight-color: rgba(0, 0, 0, 0);
}
hr {
margin: 1rem 0;
color: inherit;
background-color: currentColor;
border: 0;
opacity: 0.25;
}
hr:not([size]) {
height: 1px;
}
h6,
h5,
h4,
h3,
h2,
h1 {
margin-top: 0;
margin-bottom: 0.5rem;
font-weight: 500;
line-height: 1.2;
}
h1 {
font-size: calc(1.375rem + 1.5vw);
}
@media (min-width: 1200px) {
h1 {
font-size: 2.5rem;
}
}
h2 {
font-size: calc(1.325rem + 0.9vw);
}
@media (min-width: 1200px) {
h2 {
font-size: 2rem;
}
}
h3 {
font-size: calc(1.3rem + 0.6vw);
}
@media (min-width: 1200px) {
h3 {
font-size: 1.75rem;
}
}
h4 {
font-size: calc(1.275rem + 0.3vw);
}
@media (min-width: 1200px) {
h4 {
font-size: 1.5rem;
}
}
h5 {
font-size: 1.25rem;
}
h6 {
font-size: 1rem;
}
p {
margin-top: 0;
margin-bottom: 1rem;
}
abbr[title],
abbr[data-bs-original-title] {
-webkit-text-decoration: underline dotted;
text-decoration: underline dotted;
cursor: help;
-webkit-text-decoration-skip-ink: none;
text-decoration-skip-ink: none;
}
address {
margin-bottom: 1rem;
font-style: normal;
line-height: inherit;
}
ol,
ul {
padding-right: 2rem;
}
ol,
ul,
dl {
margin-top: 0;
margin-bottom: 1rem;
}
ol ol,
ul ul,
ol ul,
ul ol {
margin-bottom: 0;
}
dt {
font-weight: 700;
}
dd {
margin-bottom: 0.5rem;
margin-right: 0;
}
blockquote {
margin: 0 0 1rem;
}
b,
strong {
font-weight: bolder;
}
small {
font-size: 0.875em;
}
mark {
padding: 0.2em;
background-color: #fcf8e3;
}
sub,
sup {
position: relative;
font-size: 0.75em;
line-height: 0;
vertical-align: baseline;
}
sub {
bottom: -0.25em;
}
sup {
top: -0.5em;
}
a {
color: #0d6efd;
text-decoration: underline;
}
a:hover {
color: #0a58ca;
}
a:not([href]):not([class]),
a:not([href]):not([class]):hover {
color: inherit;
text-decoration: none;
}
pre,
code,
kbd,
samp {
font-family: var(--bs-font-monospace);
font-size: 1em;
direction: ltr;
unicode-bidi: bidi-override;
}
pre {
display: block;
margin-top: 0;
margin-bottom: 1rem;
overflow: auto;
font-size: 0.875em;
}
pre code {
font-size: inherit;
color: inherit;
word-break: normal;
}
code {
font-size: 0.875em;
color: #d63384;
word-wrap: break-word;
}
a > code {
color: inherit;
}
kbd {
padding: 0.2rem 0.4rem;
font-size: 0.875em;
color: #fff;
background-color: #212529;
border-radius: 0.2rem;
}
kbd kbd {
padding: 0;
font-size: 1em;
font-weight: 700;
}
figure {
margin: 0 0 1rem;
}
img,
svg {
vertical-align: middle;
}
table {
caption-side: bottom;
border-collapse: collapse;
}
caption {
padding-top: 0.5rem;
padding-bottom: 0.5rem;
color: #6c757d;
text-align: right;
}
th {
text-align: inherit;
text-align: -webkit-match-parent;
}
thead,
tbody,
tfoot,
tr,
td,
th {
border-color: inherit;
border-style: solid;
border-width: 0;
}
label {
display: inline-block;
}
button {
border-radius: 0;
}
button:focus:not(:focus-visible) {
outline: 0;
}
input,
button,
select,
optgroup,
textarea {
margin: 0;
font-family: inherit;
font-size: inherit;
line-height: inherit;
}
button,
select {
text-transform: none;
}
[role="button"] {
cursor: pointer;
}
select {
word-wrap: normal;
}
select:disabled {
opacity: 1;
}
[list]::-webkit-calendar-picker-indicator {
display: none;
}
button,
[type="button"],
[type="reset"],
[type="submit"] {
-webkit-appearance: button;
}
button:not(:disabled),
[type="button"]:not(:disabled),
[type="reset"]:not(:disabled),
[type="submit"]:not(:disabled) {
cursor: pointer;
}
::-moz-focus-inner {
padding: 0;
border-style: none;
}
textarea {
resize: vertical;
}
fieldset {
min-width: 0;
padding: 0;
margin: 0;
border: 0;
}
legend {
float: right;
width: 100%;
padding: 0;
margin-bottom: 0.5rem;
font-size: calc(1.275rem + 0.3vw);
line-height: inherit;
}
@media (min-width: 1200px) {
legend {
font-size: 1.5rem;
}
}
legend + * {
clear: right;
}
::-webkit-datetime-edit-fields-wrapper,
::-webkit-datetime-edit-text,
::-webkit-datetime-edit-minute,
::-webkit-datetime-edit-hour-field,
::-webkit-datetime-edit-day-field,
::-webkit-datetime-edit-month-field,
::-webkit-datetime-edit-year-field {
padding: 0;
}
::-webkit-inner-spin-button {
height: auto;
}
[type="search"] {
outline-offset: -2px;
-webkit-appearance: textfield;
}
[type="tel"],
[type="url"],
[type="email"],
[type="number"] {
direction: ltr;
}
::-webkit-search-decoration {
-webkit-appearance: none;
}
::-webkit-color-swatch-wrapper {
padding: 0;
}
::-webkit-file-upload-button {
font: inherit;
}
::file-selector-button {
font: inherit;
}
::-webkit-file-upload-button {
font: inherit;
-webkit-appearance: button;
}
output {
display: inline-block;
}
iframe {
border: 0;
}
summary {
display: list-item;
cursor: pointer;
}
progress {
vertical-align: baseline;
}
[hidden] {
display: none !important;
}
/*# sourceMappingURL=bootstrap-reboot.rtl.css.map */

(1 file diff suppressed: lines too long)

@@ -1,430 +0,0 @@
/*!
* Bootstrap Reboot v5.1.3 (https://getbootstrap.com/)
* Copyright 2011-2021 The Bootstrap Authors
* Copyright 2011-2021 Twitter, Inc.
* Licensed under MIT (https://github.com/twbs/bootstrap/blob/main/LICENSE)
* Forked from Normalize.css, licensed MIT (https://github.com/necolas/normalize.css/blob/master/LICENSE.md)
*/
:root {
--bs-blue: #0d6efd;
--bs-indigo: #6610f2;
--bs-purple: #6f42c1;
--bs-pink: #d63384;
--bs-red: #dc3545;
--bs-orange: #fd7e14;
--bs-yellow: #ffc107;
--bs-green: #198754;
--bs-teal: #20c997;
--bs-cyan: #0dcaf0;
--bs-white: #fff;
--bs-gray: #6c757d;
--bs-gray-dark: #343a40;
--bs-gray-100: #f8f9fa;
--bs-gray-200: #e9ecef;
--bs-gray-300: #dee2e6;
--bs-gray-400: #ced4da;
--bs-gray-500: #adb5bd;
--bs-gray-600: #6c757d;
--bs-gray-700: #495057;
--bs-gray-800: #343a40;
--bs-gray-900: #212529;
--bs-primary: #0d6efd;
--bs-secondary: #6c757d;
--bs-success: #198754;
--bs-info: #0dcaf0;
--bs-warning: #ffc107;
--bs-danger: #dc3545;
--bs-light: #f8f9fa;
--bs-dark: #212529;
--bs-primary-rgb: 13, 110, 253;
--bs-secondary-rgb: 108, 117, 125;
--bs-success-rgb: 25, 135, 84;
--bs-info-rgb: 13, 202, 240;
--bs-warning-rgb: 255, 193, 7;
--bs-danger-rgb: 220, 53, 69;
--bs-light-rgb: 248, 249, 250;
--bs-dark-rgb: 33, 37, 41;
--bs-white-rgb: 255, 255, 255;
--bs-black-rgb: 0, 0, 0;
--bs-body-color-rgb: 33, 37, 41;
--bs-body-bg-rgb: 255, 255, 255;
--bs-font-sans-serif: system-ui, -apple-system, "Segoe UI", Roboto,
"Helvetica Neue", Arial, "Noto Sans", "Liberation Sans", sans-serif,
"Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
--bs-font-monospace: SFMono-Regular, Menlo, Monaco, Consolas,
"Liberation Mono", "Courier New", monospace;
--bs-gradient: linear-gradient(
180deg,
rgba(255, 255, 255, 0.15),
rgba(255, 255, 255, 0)
);
--bs-body-font-family: var(--bs-font-sans-serif);
--bs-body-font-size: 1rem;
--bs-body-font-weight: 400;
--bs-body-line-height: 1.5;
--bs-body-color: #212529;
--bs-body-bg: #fff;
}
*,
::after,
::before {
box-sizing: border-box;
}
@media (prefers-reduced-motion: no-preference) {
:root {
scroll-behavior: smooth;
}
}
body {
margin: 0;
font-family: var(--bs-body-font-family);
font-size: var(--bs-body-font-size);
font-weight: var(--bs-body-font-weight);
line-height: var(--bs-body-line-height);
color: var(--bs-body-color);
text-align: var(--bs-body-text-align);
background-color: var(--bs-body-bg);
-webkit-text-size-adjust: 100%;
-webkit-tap-highlight-color: transparent;
}
hr {
margin: 1rem 0;
color: inherit;
background-color: currentColor;
border: 0;
opacity: 0.25;
}
hr:not([size]) {
height: 1px;
}
h1,
h2,
h3,
h4,
h5,
h6 {
margin-top: 0;
margin-bottom: 0.5rem;
font-weight: 500;
line-height: 1.2;
}
h1 {
font-size: calc(1.375rem + 1.5vw);
}
@media (min-width: 1200px) {
h1 {
font-size: 2.5rem;
}
}
h2 {
font-size: calc(1.325rem + 0.9vw);
}
@media (min-width: 1200px) {
h2 {
font-size: 2rem;
}
}
h3 {
font-size: calc(1.3rem + 0.6vw);
}
@media (min-width: 1200px) {
h3 {
font-size: 1.75rem;
}
}
h4 {
font-size: calc(1.275rem + 0.3vw);
}
@media (min-width: 1200px) {
h4 {
font-size: 1.5rem;
}
}
h5 {
font-size: 1.25rem;
}
h6 {
font-size: 1rem;
}
p {
margin-top: 0;
margin-bottom: 1rem;
}
abbr[data-bs-original-title],
abbr[title] {
-webkit-text-decoration: underline dotted;
text-decoration: underline dotted;
cursor: help;
-webkit-text-decoration-skip-ink: none;
text-decoration-skip-ink: none;
}
address {
margin-bottom: 1rem;
font-style: normal;
line-height: inherit;
}
ol,
ul {
padding-right: 2rem;
}
dl,
ol,
ul {
margin-top: 0;
margin-bottom: 1rem;
}
ol ol,
ol ul,
ul ol,
ul ul {
margin-bottom: 0;
}
dt {
font-weight: 700;
}
dd {
margin-bottom: 0.5rem;
margin-right: 0;
}
blockquote {
margin: 0 0 1rem;
}
b,
strong {
font-weight: bolder;
}
small {
font-size: 0.875em;
}
mark {
padding: 0.2em;
background-color: #fcf8e3;
}
sub,
sup {
position: relative;
font-size: 0.75em;
line-height: 0;
vertical-align: baseline;
}
sub {
bottom: -0.25em;
}
sup {
top: -0.5em;
}
a {
color: #0d6efd;
text-decoration: underline;
}
a:hover {
color: #0a58ca;
}
a:not([href]):not([class]),
a:not([href]):not([class]):hover {
color: inherit;
text-decoration: none;
}
code,
kbd,
pre,
samp {
font-family: var(--bs-font-monospace);
font-size: 1em;
direction: ltr;
unicode-bidi: bidi-override;
}
pre {
display: block;
margin-top: 0;
margin-bottom: 1rem;
overflow: auto;
font-size: 0.875em;
}
pre code {
font-size: inherit;
color: inherit;
word-break: normal;
}
code {
font-size: 0.875em;
color: #d63384;
word-wrap: break-word;
}
a > code {
color: inherit;
}
kbd {
padding: 0.2rem 0.4rem;
font-size: 0.875em;
color: #fff;
background-color: #212529;
border-radius: 0.2rem;
}
kbd kbd {
padding: 0;
font-size: 1em;
font-weight: 700;
}
figure {
margin: 0 0 1rem;
}
img,
svg {
vertical-align: middle;
}
table {
caption-side: bottom;
border-collapse: collapse;
}
caption {
padding-top: 0.5rem;
padding-bottom: 0.5rem;
color: #6c757d;
text-align: right;
}
th {
text-align: inherit;
text-align: -webkit-match-parent;
}
tbody,
td,
tfoot,
th,
thead,
tr {
border-color: inherit;
border-style: solid;
border-width: 0;
}
label {
display: inline-block;
}
button {
border-radius: 0;
}
button:focus:not(:focus-visible) {
outline: 0;
}
button,
input,
optgroup,
select,
textarea {
margin: 0;
font-family: inherit;
font-size: inherit;
line-height: inherit;
}
button,
select {
text-transform: none;
}
[role="button"] {
cursor: pointer;
}
select {
word-wrap: normal;
}
select:disabled {
opacity: 1;
}
[list]::-webkit-calendar-picker-indicator {
display: none;
}
[type="button"],
[type="reset"],
[type="submit"],
button {
-webkit-appearance: button;
}
[type="button"]:not(:disabled),
[type="reset"]:not(:disabled),
[type="submit"]:not(:disabled),
button:not(:disabled) {
cursor: pointer;
}
::-moz-focus-inner {
padding: 0;
border-style: none;
}
textarea {
resize: vertical;
}
fieldset {
min-width: 0;
padding: 0;
margin: 0;
border: 0;
}
legend {
float: right;
width: 100%;
padding: 0;
margin-bottom: 0.5rem;
font-size: calc(1.275rem + 0.3vw);
line-height: inherit;
}
@media (min-width: 1200px) {
legend {
font-size: 1.5rem;
}
}
legend + * {
clear: right;
}
::-webkit-datetime-edit-day-field,
::-webkit-datetime-edit-fields-wrapper,
::-webkit-datetime-edit-hour-field,
::-webkit-datetime-edit-minute,
::-webkit-datetime-edit-month-field,
::-webkit-datetime-edit-text,
::-webkit-datetime-edit-year-field {
padding: 0;
}
::-webkit-inner-spin-button {
height: auto;
}
[type="search"] {
outline-offset: -2px;
-webkit-appearance: textfield;
}
[type="email"],
[type="number"],
[type="tel"],
[type="url"] {
direction: ltr;
}
::-webkit-search-decoration {
-webkit-appearance: none;
}
::-webkit-color-swatch-wrapper {
padding: 0;
}
::-webkit-file-upload-button {
font: inherit;
}
::file-selector-button {
font: inherit;
}
::-webkit-file-upload-button {
font: inherit;
-webkit-appearance: button;
}
output {
display: inline-block;
}
iframe {
border: 0;
}
summary {
display: list-item;
cursor: pointer;
}
progress {
vertical-align: baseline;
}
[hidden] {
display: none !important;
}
/*# sourceMappingURL=bootstrap-reboot.rtl.min.css.map */

(19 file diffs suppressed: too large or lines too long)

Some files were not shown because too many files have changed in this diff.