Integrate multimodal RAG codebase

- Replaced existing localGPT codebase with multimodal RAG implementation
- Includes full-stack application with backend, frontend, and RAG system
- Added Docker support and comprehensive documentation
- Enhanced with multimodal capabilities for document processing
- Preserved git history for localGPT while integrating new functionality
PromptEngineer 2025-07-11 00:17:15 -07:00
parent 4e0d9e75e9
commit 2421514f3e
211 changed files with 32131 additions and 123680 deletions


@@ -1,4 +0,0 @@
*
!*.py
!requirements.txt
!SOURCE_DOCUMENTS


@@ -1,17 +0,0 @@
# http://editorconfig.org
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.{py,rst,ini}]
indent_style = space
indent_size = 4
[*.{html,css,scss,json,yml,xml}]
indent_style = space
indent_size = 2


@@ -1,4 +0,0 @@
[flake8]
exclude = docs
max-line-length = 119
extend-ignore = E203

13
.github/FUNDING.yml vendored

@@ -1,13 +0,0 @@
# These are supported funding model platforms
github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: promptengineering # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']

63
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file

@@ -0,0 +1,63 @@
---
name: Bug report
about: Create a report to help us improve LocalGPT
title: '[BUG] '
labels: 'bug'
assignees: ''
---
## 🐛 Bug Description
A clear and concise description of what the bug is.
## 🔄 Steps to Reproduce
1. Go to '...'
2. Click on '...'
3. Scroll down to '...'
4. See error
## ✅ Expected Behavior
A clear and concise description of what you expected to happen.
## ❌ Actual Behavior
A clear and concise description of what actually happened.
## 📸 Screenshots
If applicable, add screenshots to help explain your problem.
## 🖥️ Environment Information
**Desktop/Server:**
- OS: [e.g. macOS 13.4, Ubuntu 20.04, Windows 11]
- Python Version: [e.g. 3.11.5]
- Node.js Version: [e.g. 23.10.0]
- Ollama Version: [e.g. 0.9.5]
- Docker Version: [e.g. 24.0.6] (if using Docker)
**Browser (if web interface issue):**
- Browser: [e.g. Chrome, Safari, Firefox]
- Version: [e.g. 118.0.0.0]
## 📋 System Health Check
Please run `python system_health_check.py` and paste the output:
```
[Paste system health check output here]
```
## 📝 Error Logs
Please include relevant error messages or logs:
```
[Paste error logs here]
```
## 🔧 Configuration
- Deployment method: [Docker / Direct Python]
- Models used: [e.g. qwen3:0.6b, qwen3:8b]
- Document types: [e.g. PDF, DOCX, TXT]
## 📎 Additional Context
Add any other context about the problem here.
## 🤔 Possible Solution
If you have ideas for fixing the issue, please share them here.


@@ -0,0 +1,50 @@
---
name: Feature request
about: Suggest an idea for LocalGPT
title: '[FEATURE] '
labels: 'enhancement'
assignees: ''
---
## 🚀 Feature Request
### 📝 Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
### 💡 Describe the solution you'd like
A clear and concise description of what you want to happen.
### 🔄 Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
### 🎯 Use Case
Describe the specific use case or scenario where this feature would be valuable:
- Who would use this feature?
- When would they use it?
- How would it improve their workflow?
### 📋 Acceptance Criteria
What would need to be implemented for this feature to be considered complete?
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3
### 🏗️ Implementation Ideas
If you have ideas about how this could be implemented, please share:
- Which components would be affected?
- Any technical considerations?
- Potential challenges?
### 📊 Priority
How important is this feature to you?
- [ ] Critical - Blocking my use case
- [ ] High - Would significantly improve my workflow
- [ ] Medium - Nice to have
- [ ] Low - Minor improvement
### 📎 Additional Context
Add any other context, screenshots, mockups, or examples about the feature request here.
### 🔗 Related Issues
Link any related issues or discussions:

78
.github/pull_request_template.md vendored Normal file

@@ -0,0 +1,78 @@
## 📝 Description
Brief description of what this PR does.
Fixes #(issue number) <!-- If applicable -->
## 🎯 Type of Change
- [ ] 🐛 Bug fix (non-breaking change which fixes an issue)
- [ ] ✨ New feature (non-breaking change which adds functionality)
- [ ] 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] 📚 Documentation update
- [ ] 🧪 Test improvements
- [ ] 🔧 Code refactoring
- [ ] 🎨 UI/UX improvements
## 🧪 Testing
### Test Environment
- [ ] Tested with Docker deployment
- [ ] Tested with direct Python deployment
- [ ] Tested on macOS
- [ ] Tested on Linux
- [ ] Tested on Windows
### Test Cases
- [ ] All existing tests pass
- [ ] New tests added for new functionality
- [ ] Manual testing completed
- [ ] System health check passes
```bash
# Commands used for testing
python system_health_check.py
python run_system.py --health
# Add any specific test commands here
```
## 📋 Checklist
### Code Quality
- [ ] Code follows the project's coding standards
- [ ] Self-review of the code completed
- [ ] Code is properly commented
- [ ] Type hints added (Python)
- [ ] No console.log statements left in production code
### Documentation
- [ ] Documentation updated (if applicable)
- [ ] API documentation updated (if applicable)
- [ ] README updated (if applicable)
- [ ] CONTRIBUTING.md guidelines followed
### Dependencies
- [ ] No new dependencies added, or new dependencies are justified
- [ ] requirements.txt updated (if applicable)
- [ ] package.json updated (if applicable)
## 🖥️ Screenshots (if applicable)
Add screenshots to help reviewers understand the changes.
## 📊 Performance Impact
Describe any performance implications:
- [ ] No performance impact
- [ ] Performance improved
- [ ] Performance may be affected (explain below)
## 🔄 Migration Notes
If this is a breaking change, describe what users need to do:
- [ ] No migration needed
- [ ] Migration steps documented below
## 📎 Additional Notes
Any additional information that reviewers should know.


@@ -1,19 +0,0 @@
on: [push]
jobs:
precommit:
runs-on: ubuntu-latest
steps:
- name: Check out repository code
uses: actions/checkout@v3
- name: Cache Pre-Commit
uses: actions/cache@v3
with:
path: ~/.cache/pre-commit
key: ${{ runner.os }}-pre-commit-${{ hashFiles('.pre-commit-config.yaml') }}
restore-keys: |
${{ runner.os }}-pre-commit-pip
- name: Install pre-commit
run: pip install -q pre-commit
- name: Run pre-commit
run: pre-commit run --show-diff-on-failure --color=always --all-files

235
.gitignore vendored

@@ -1,169 +1,78 @@
# Ignore vscode
/.vscode
/DB
/models
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# dependencies
/node_modules
/.pnp
.pnp.*
.yarn/*
!.yarn/patches
!.yarn/plugins
!.yarn/releases
!.yarn/versions
# C extensions
*.so
# testing
/coverage
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# next.js
/.next/
/out/
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# production
/build
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
#MacOS
# misc
.DS_Store
SOURCE_DOCUMENTS/.DS_Store
*.pem
# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.pnpm-debug.log*
# env files (can opt-in for committing if needed)
.env*
# vercel
.vercel
# typescript
*.tsbuildinfo
next-env.d.ts
# Python
__pycache__/
*.pyc
# Local Data
/index_store
/shared_uploads
chat_history.db
*.pkl
# Backend generated files
backend/shared_uploads/
# Vector DB artefacts
lancedb/
index_store/overviews/
# Logs and runtime output
logs/
*.log
# SQLite or other database files
*.db
#backend/*.db
# backend/chat_history.db
backend/chroma_db/
backend/chroma_db/**
# Document and user-uploaded files (PDFs, images, etc.)
rag_system/documents/
*.pdf
# Ensure docker.env remains tracked
!docker.env
!backend/chat_data.db


@@ -1,49 +0,0 @@
default_stages: [commit]
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-json
- id: check-toml
- id: check-xml
- id: check-yaml
- id: debug-statements
- id: check-builtin-literals
- id: check-case-conflict
- id: detect-private-key
- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v3.0.0-alpha.9-for-vscode"
hooks:
- id: prettier
args: ["--tab-width", "2"]
- repo: https://github.com/asottile/pyupgrade
rev: v3.4.0
hooks:
- id: pyupgrade
args: [--py311-plus]
exclude: hooks/
- repo: https://github.com/psf/black
rev: 23.3.0
hooks:
- id: black
- repo: https://github.com/PyCQA/isort
rev: 5.12.0
hooks:
- id: isort
- repo: https://github.com/PyCQA/flake8
rev: 6.0.0
hooks:
- id: flake8
ci:
autoupdate_schedule: weekly
skip: []
submodules: false


@@ -1,17 +0,0 @@
# configure updates globally
# default: all
# allowed: all, insecure, False
update: all
# configure dependency pinning globally
# default: True
# allowed: True, False
pin: True
# add a label to pull requests, default is not set
# requires private repo permissions, even on public repos
# default: empty
label_prs: update
requirements:
- "requirements.txt"

1
3.20.2

@@ -1 +0,0 @@
Requirement already satisfied: protobuf in c:\users\kevin\anaconda3\lib\site-packages (4.24.4)


@@ -1,10 +0,0 @@
# Acknowledgments
Some code was taken from or inspired by other projects:
- [CookieCutter Django][cookiecutter-django]
- `pre-commit-config.yaml` is taken from there with almost no changes
- `github-actions.yml` is inspired by `gitlab-ci.yml`
- `.pyup.yml`, `.flake8`, `.editorconfig`, `pyproject.toml` are taken from there with minor changes,
[cookiecutter-django]: https://github.com/cookiecutter/cookiecutter-django


@@ -1,47 +1,457 @@
# How to Contribute
# Contributing to LocalGPT
Always happy to get issues identified and pull requests!
Thank you for your interest in contributing to LocalGPT! This guide will help you get started with contributing to our private document intelligence platform.
## General considerations
## 🚀 Quick Start for Contributors
1. Keep it small. The smaller the change, the more likely we are to accept.
2. Changes that fix a current issue get priority for review.
3. Check out [GitHub guide][submit-a-pr] if you've never created a pull request before.
### Prerequisites
- Python 3.8+ (we test with 3.11.5)
- Node.js 16+ (we test with 23.10.0)
- Git
- Ollama (for local AI models)
## Getting started
### Development Setup
1. Fork the repo
2. Clone your fork
3. Create a branch for your changes
1. **Fork and Clone**
```bash
# Fork the repository on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/multimodal_rag.git
cd multimodal_rag
# Add upstream remote
git remote add upstream https://github.com/PromtEngineer/multimodal_rag.git
```
This last step is very important: don't start developing from master; it'll cause pain if you need to send another change later.
2. **Set Up Development Environment**
```bash
# Install Python dependencies
pip install -r requirements.txt
# Install Node.js dependencies
npm install
# Install Ollama and models
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
TIP: If you're working on a GitHub issue, name your branch after the issue number, e.g. `issue-123-<ISSUE-NAME>`. This will help us keep track of what you're working on. If there is not an issue for what you're working on, create one first please. Someone else might be working on the same thing, or we might have a reason for not wanting to do it.
3. **Verify Setup**
```bash
# Run health check
python system_health_check.py
# Start development system
python run_system.py --mode dev
```
## Pre-commit
## 📋 Development Workflow
GitHub Actions runs the pre-commit hooks on your PR. If the hooks fail, you will need to fix them before your PR can be merged. Running the hooks locally before you push will save you a lot of time; to do that, install pre-commit on your local machine.
### Branch Strategy
We use a feature branch workflow:
- `main` - Production-ready code
- `docker` - Docker deployment features and documentation
- `feature/*` - New features
- `fix/*` - Bug fixes
- `docs/*` - Documentation updates
### Making Changes
1. **Create a Feature Branch**
```bash
# Update your main branch
git checkout main
git pull upstream main
# Create feature branch
git checkout -b feature/your-feature-name
```
2. **Make Your Changes**
- Follow our [coding standards](#coding-standards)
- Write tests for new functionality
- Update documentation as needed
3. **Test Your Changes**
```bash
# Run health checks
python system_health_check.py
# Test specific components
python -m pytest tests/ -v
# Test system integration
python run_system.py --health
```
4. **Commit Your Changes**
```bash
git add .
git commit -m "feat: add new feature description"
```
5. **Push and Create PR**
```bash
git push origin feature/your-feature-name
# Create pull request on GitHub
```
## 🎯 Types of Contributions
### 🐛 Bug Fixes
- Check existing issues first
- Include reproduction steps
- Add tests to prevent regression
### ✨ New Features
- Discuss in issues before implementing
- Follow existing architecture patterns
- Include comprehensive tests
- Update documentation
### 📚 Documentation
- Fix typos and improve clarity
- Add examples and use cases
- Update API documentation
- Improve setup guides
### 🧪 Testing
- Add unit tests
- Improve integration tests
- Add performance benchmarks
- Test edge cases
## 📝 Coding Standards
### Python Code Style
We follow PEP 8 with some modifications:
```python
# Use type hints
def process_document(file_path: str, config: Dict[str, Any]) -> ProcessingResult:
"""Process a document with the given configuration.
Args:
file_path: Path to the document file
config: Processing configuration dictionary
Returns:
ProcessingResult object with metadata and chunks
"""
pass
# Use descriptive variable names
embedding_model_name = "Qwen/Qwen3-Embedding-0.6B"
retrieval_results = retriever.search(query, top_k=20)
# Use dataclasses for structured data
@dataclass
class IndexingConfig:
embedding_batch_size: int = 50
enable_late_chunking: bool = True
chunk_size: int = 512
```
### TypeScript/React Code Style
```typescript
// Use TypeScript interfaces
interface ChatMessage {
id: string;
content: string;
role: 'user' | 'assistant';
timestamp: Date;
sources?: DocumentSource[];
}
// Use functional components with hooks
const ChatInterface: React.FC<ChatProps> = ({ sessionId }) => {
const [messages, setMessages] = useState<ChatMessage[]>([]);
const handleSendMessage = useCallback(async (content: string) => {
// Implementation
}, [sessionId]);
return (
<div className="chat-interface">
{/* Component JSX */}
</div>
);
};
```
### File Organization
```
rag_system/
├── agent/ # ReAct agent implementation
├── indexing/ # Document processing and indexing
├── retrieval/ # Search and retrieval components
├── pipelines/ # End-to-end processing pipelines
├── rerankers/ # Result reranking implementations
└── utils/ # Shared utilities
src/
├── components/ # React components
├── lib/ # Utility functions and API clients
└── app/ # Next.js app router pages
```
## 🧪 Testing Guidelines
### Unit Tests
```python
# Test file: tests/test_embeddings.py
import pytest
from rag_system.indexing.embedders import HuggingFaceEmbedder
def test_embedding_generation():
embedder = HuggingFaceEmbedder("sentence-transformers/all-MiniLM-L6-v2")
embeddings = embedder.create_embeddings(["test text"])
assert embeddings.shape[0] == 1
assert embeddings.shape[1] == 384 # Model dimension
assert embeddings.dtype == np.float32
```
### Integration Tests
```python
# Test file: tests/test_integration.py
def test_end_to_end_indexing():
"""Test complete document indexing pipeline."""
agent = get_agent("test")
result = agent.index_documents(["test_document.pdf"])
assert result.success
assert len(result.indexed_chunks) > 0
```
### Frontend Tests
```typescript
// Test file: src/components/__tests__/ChatInterface.test.tsx
import { render, screen, fireEvent } from '@testing-library/react';
import { ChatInterface } from '../ChatInterface';
test('sends message when form is submitted', async () => {
render(<ChatInterface sessionId="test-session" />);
const input = screen.getByPlaceholderText('Type your message...');
const button = screen.getByRole('button', { name: /send/i });
fireEvent.change(input, { target: { value: 'test message' } });
fireEvent.click(button);
expect(screen.getByText('test message')).toBeInTheDocument();
});
```
## 📖 Documentation Standards
### Code Documentation
```python
def create_index(
documents: List[str],
config: IndexingConfig,
progress_callback: Optional[Callable[[float], None]] = None
) -> IndexingResult:
"""Create a searchable index from documents.
This function processes documents through the complete indexing pipeline:
1. Text extraction and chunking
2. Embedding generation
3. Vector database storage
4. BM25 index creation
Args:
documents: List of document file paths to index
config: Indexing configuration with model settings and parameters
progress_callback: Optional callback function for progress updates
Returns:
IndexingResult containing success status, metrics, and any errors
Raises:
IndexingError: If document processing fails
ModelLoadError: If embedding model cannot be loaded
Example:
>>> config = IndexingConfig(embedding_batch_size=32)
>>> result = create_index(["doc1.pdf", "doc2.pdf"], config)
>>> print(f"Indexed {result.chunk_count} chunks")
"""
```
### API Documentation
```python
# Use OpenAPI/FastAPI documentation
@app.post("/chat", response_model=ChatResponse)
async def chat_endpoint(request: ChatRequest) -> ChatResponse:
"""Chat with indexed documents.
Send a natural language query and receive an AI-generated response
based on the indexed document collection.
- **query**: The user's question or prompt
- **session_id**: Chat session identifier
- **search_type**: Type of search (vector, hybrid, bm25)
- **retrieval_k**: Number of documents to retrieve
Returns a response with the AI-generated answer and source documents.
"""
```
## 🔧 Development Tools
### Recommended VS Code Extensions
```json
{
"recommendations": [
"ms-python.python",
"ms-python.pylint",
"ms-python.black-formatter",
"bradlc.vscode-tailwindcss",
"esbenp.prettier-vscode",
"ms-vscode.vscode-typescript-next"
]
}
```
### Pre-commit Hooks
```bash
# Install pre-commit
pip install pre-commit
```
Once installed, you need to add the pre-commit hooks to your local repo.
```shell
# Set up hooks
pre-commit install
```
Now, every time you commit, the hooks will run and check your code. If they fail, you will need to fix them before you can commit.
If you have already committed changes without the pre-commit hooks installed and do not want to reset and recommit, you can run the hooks across your local repo with the following command.
```shell
# Run manually
pre-commit run --all-files
```
### Development Scripts
```bash
# Lint Python code
python -m pylint rag_system/
# Format Python code
python -m black rag_system/
# Type check
python -m mypy rag_system/
# Lint TypeScript
npm run lint
# Format TypeScript
npm run format
```
## Help Us Improve This Documentation
If you find that something is missing or have suggestions for improvements, please submit a PR.
[submit-a-pr]: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request
## 🐛 Issue Reporting
### Bug Reports
When reporting bugs, please include:
1. **Environment Information**
```
- OS: macOS 13.4
- Python: 3.11.5
- Node.js: 23.10.0
- Ollama: 0.9.5
```
2. **Steps to Reproduce**
```
1. Start system with `python run_system.py`
2. Upload document via web interface
3. Ask question "What is this document about?"
4. Error occurs during response generation
```
3. **Expected vs Actual Behavior**
4. **Error Messages and Logs**
5. **Screenshots (if applicable)**
### Feature Requests
Include:
- **Use Case**: Why is this feature needed?
- **Proposed Solution**: How should it work?
- **Alternatives**: What other approaches were considered?
- **Additional Context**: Any relevant examples or references
## 📦 Release Process
### Version Numbering
We use semantic versioning (semver):
- `MAJOR.MINOR.PATCH`
- Major: Breaking changes
- Minor: New features (backward compatible)
- Patch: Bug fixes
### Release Checklist
- [ ] All tests pass
- [ ] Documentation updated
- [ ] Version bumped in relevant files
- [ ] Changelog updated
- [ ] Docker images built and tested
- [ ] Release notes prepared
## 🤝 Community Guidelines
### Code of Conduct
- Be respectful and inclusive
- Focus on constructive feedback
- Help others learn and grow
- Maintain professional communication
### Getting Help
- **GitHub Issues**: For bugs and feature requests
- **GitHub Discussions**: For questions and general discussion
- **Documentation**: Check existing docs first
- **Code Review**: Provide thoughtful, actionable feedback
## 🎯 Project Priorities
### Current Focus Areas
1. **Performance Optimization**: Improving indexing and retrieval speed
2. **Model Support**: Adding more embedding and generation models
3. **User Experience**: Enhancing the web interface
4. **Documentation**: Improving setup and usage guides
5. **Testing**: Expanding test coverage
### Architecture Goals
- **Modularity**: Components should be loosely coupled
- **Extensibility**: Easy to add new models and features
- **Performance**: Optimize for speed and memory usage
- **Reliability**: Robust error handling and recovery
- **Privacy**: Keep user data secure and local
## 📚 Additional Resources
### Learning Resources
- [RAG System Architecture Overview](Documentation/architecture_overview.md)
- [API Reference](Documentation/api_reference.md)
- [Deployment Guide](Documentation/deployment_guide.md)
- [Troubleshooting Guide](DOCKER_TROUBLESHOOTING.md)
### External References
- [LangChain Documentation](https://python.langchain.com/)
- [Ollama Documentation](https://ollama.ai/docs)
- [Next.js Documentation](https://nextjs.org/docs)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
---
## 🙏 Thank You!
Thank you for contributing to LocalGPT! Your contributions help make private document intelligence accessible to everyone.
For questions about contributing, please:
1. Check existing documentation
2. Search existing issues
3. Create a new issue with the `question` label
4. Join our community discussions
Happy coding! 🚀

340
DOCKER_README.md Normal file

@@ -0,0 +1,340 @@
# 🐳 LocalGPT Docker Deployment Guide
This guide covers running LocalGPT using Docker containers with local Ollama for optimal performance.
## 🚀 Quick Start
### Complete Setup (5 Minutes)
```bash
# 1. Install Ollama locally
curl -fsSL https://ollama.ai/install.sh | sh
# 2. Start Ollama server
ollama serve
# 3. Install required models (in another terminal)
ollama pull qwen3:0.6b
ollama pull qwen3:8b
# 4. Clone and start LocalGPT
git clone https://github.com/your-org/rag-system.git
cd rag-system
./start-docker.sh
# 5. Access the application
open http://localhost:3000
```
## 📋 Prerequisites
- **Docker Desktop** installed and running
- **Ollama** installed locally (required for best performance)
- **8GB+ RAM** (16GB recommended for larger models)
- **10GB+ free disk space**
## 🏗️ Architecture
### Current Setup (Local Ollama + Docker Containers)
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Frontend │────│ Backend │────│ RAG API │
│ (Container) │ │ (Container) │ │ (Container) │
│ Port: 3000 │ │ Port: 8000 │ │ Port: 8001 │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ API calls
┌─────────────────┐
│ Ollama │
│ (Local/Host) │
│ Port: 11434 │
└─────────────────┘
```
**Why Local Ollama?**
- ✅ Better performance (direct GPU access)
- ✅ Simpler setup (one less container)
- ✅ Easier model management
- ✅ More reliable connection
## 🛠️ Container Details
### Frontend Container (rag-frontend)
- **Image**: Custom Node.js 18 build
- **Port**: 3000
- **Purpose**: Next.js web interface
- **Health Check**: HTTP GET to /
- **Memory**: ~500MB
### Backend Container (rag-backend)
- **Image**: Custom Python 3.11 build
- **Port**: 8000
- **Purpose**: Session management, chat history, API gateway
- **Health Check**: HTTP GET to /health
- **Memory**: ~300MB
### RAG API Container (rag-api)
- **Image**: Custom Python 3.11 build
- **Port**: 8001
- **Purpose**: Document indexing, retrieval, AI processing
- **Health Check**: HTTP GET to /models
- **Memory**: ~2GB (varies with model usage)
## 📂 Volume Mounts & Data
### Persistent Data
- `./lancedb/` → Vector database storage
- `./index_store/` → Document indexes and metadata
- `./shared_uploads/` → Uploaded document files
- `./backend/chat_data.db` → SQLite chat history database
### Shared Between Containers
All containers share access to document storage and databases through bind mounts.
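To sanity-check or pre-create these host paths before the first start, a small Python sketch (purely illustrative, not part of the repo's scripts) could look like this:
```python
# Illustrative pre-flight helper: pre-create the bind-mounted host paths listed above.
from pathlib import Path

for name in ["lancedb", "index_store", "shared_uploads"]:
    Path(name).mkdir(parents=True, exist_ok=True)
    print(f"{name}/ -> ok")

# An empty chat_data.db is enough to start with (see the troubleshooting guide).
chat_db = Path("backend/chat_data.db")
chat_db.parent.mkdir(parents=True, exist_ok=True)
chat_db.touch(exist_ok=True)
print(f"{chat_db} -> ok")
```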
## 🔧 Configuration
### Environment Variables (docker.env)
```bash
# Ollama Configuration
OLLAMA_HOST=http://host.docker.internal:11434
# Service Configuration
NODE_ENV=production
RAG_API_URL=http://rag-api:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
# Database Paths (inside containers)
DATABASE_PATH=/app/backend/chat_data.db
LANCEDB_PATH=/app/lancedb
UPLOADS_PATH=/app/shared_uploads
```
### Model Configuration
The system uses these models by default:
- **Embedding**: `Qwen/Qwen3-Embedding-0.6B` (1024 dimensions)
- **Generation**: `qwen3:0.6b` (fast) or `qwen3:8b` (high quality)
- **Reranking**: Built-in cross-encoder
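To confirm the default generation models are actually installed in the local Ollama server, a short illustrative Python check against Ollama's `/api/tags` endpoint (model names as listed above) can be run before starting the stack:
```python
# Illustrative check: verify the default generation models exist in local Ollama.
import requests

REQUIRED = {"qwen3:0.6b", "qwen3:8b"}

tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
installed = {m["name"] for m in tags.get("models", [])}

missing = REQUIRED - installed
if missing:
    print("Missing models:", ", ".join(sorted(missing)), "- run `ollama pull <model>`")
else:
    print("✅ Default generation models are installed")
```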
## 🎯 Management Commands
### Start/Stop Services
```bash
# Start all services
./start-docker.sh
# Stop all services
./start-docker.sh stop
# Restart services
./start-docker.sh stop && ./start-docker.sh
```
### Monitor Services
```bash
# Check container status
./start-docker.sh status
docker compose ps
# View live logs
./start-docker.sh logs
docker compose logs -f
# View specific service logs
docker compose logs -f rag-api
docker compose logs -f backend
docker compose logs -f frontend
```
### Manual Docker Compose
```bash
# Start manually
docker compose --env-file docker.env up --build -d
# Stop manually
docker compose down
# Rebuild specific service
docker compose build --no-cache rag-api
docker compose up -d rag-api
```
### Health Checks
```bash
# Test all endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
```
## 🐞 Debugging
### Access Container Shells
```bash
# RAG API container (most debugging happens here)
docker compose exec rag-api bash
# Backend container
docker compose exec backend bash
# Frontend container
docker compose exec frontend sh
```
### Common Debug Commands
```bash
# Test RAG system initialization
docker compose exec rag-api python -c "
from rag_system.main import get_agent
agent = get_agent('default')
print('✅ RAG System OK')
"
# Test Ollama connection from container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
# Check environment variables
docker compose exec rag-api env | grep OLLAMA
# View Python packages
docker compose exec rag-api pip list | grep -E "(torch|transformers|lancedb)"
```
### Resource Monitoring
```bash
# Monitor container resources
docker stats
# Check disk usage
docker system df
df -h ./lancedb ./shared_uploads
# Check memory usage by service
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
```
## 🚨 Troubleshooting
### Common Issues
#### Container Won't Start
```bash
# Check logs for specific error
docker compose logs [service-name]
# Rebuild from scratch
./start-docker.sh stop
docker system prune -f
./start-docker.sh
# Check for port conflicts
lsof -i :3000 -i :8000 -i :8001
```
#### Can't Connect to Ollama
```bash
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Restart Ollama
pkill ollama
ollama serve
# Test from container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
```
#### Memory Issues
```bash
# Check memory usage
docker stats --no-stream
free -h # On host
# Increase Docker memory limit
# Docker Desktop → Settings → Resources → Memory → 8GB+
# Use smaller models
ollama pull qwen3:0.6b # Instead of qwen3:8b
```
#### Frontend Build Errors
```bash
# Clean build
docker compose build --no-cache frontend
docker compose up -d frontend
# Check frontend logs
docker compose logs frontend
```
#### Database/Storage Issues
```bash
# Check file permissions
ls -la backend/chat_data.db
ls -la lancedb/
# Reset permissions
chmod 664 backend/chat_data.db
chmod -R 755 lancedb/ shared_uploads/
# Test database access
docker compose exec backend sqlite3 /app/backend/chat_data.db ".tables"
```
### Performance Issues
#### Slow Response Times
- Use faster models: `qwen3:0.6b` instead of `qwen3:8b`
- Increase Docker memory allocation
- Ensure SSD storage for databases
- Monitor with `docker stats`
#### High Memory Usage
- Reduce batch sizes in configuration
- Use smaller embedding models
- Clear unused Docker resources: `docker system prune`
### Complete Reset
```bash
# Nuclear option - reset everything
./start-docker.sh stop
docker system prune -a --volumes
rm -rf lancedb/* shared_uploads/* backend/chat_data.db
./start-docker.sh
```
## 🏆 Success Criteria
Your Docker deployment is successful when:
- ✅ `./start-docker.sh status` shows all containers healthy
- ✅ All health checks pass (see commands above)
- ✅ You can access http://localhost:3000
- ✅ You can upload documents and create indexes
- ✅ You can chat with your documents
- ✅ No errors in container logs
### Performance Benchmarks
**Good Performance:**
- Container startup: < 2 minutes
- Index creation: < 2 min per 100MB document
- Query response: < 30 seconds
- Memory usage: < 4GB total containers
**Optimal Performance:**
- Container startup: < 1 minute
- Index creation: < 1 min per 100MB document
- Query response: < 10 seconds
- Memory usage: < 2GB total containers
## 📚 Additional Resources
- **Detailed Troubleshooting**: See `DOCKER_TROUBLESHOOTING.md`
- **Complete Documentation**: See `Documentation/docker_usage.md`
- **System Architecture**: See `Documentation/architecture_overview.md`
- **Direct Development**: See main `README.md` for non-Docker setup
---
**Happy Dockerizing! 🐳** Need help? Check the troubleshooting guide or open an issue.

604
DOCKER_TROUBLESHOOTING.md Normal file

@@ -0,0 +1,604 @@
# 🐳 Docker Troubleshooting Guide - LocalGPT
_Last updated: 2025-01-07_
This guide helps diagnose and fix Docker-related issues with LocalGPT's containerized deployment.
---
## 🏁 Quick Health Check
### System Status Check
```bash
# Check Docker daemon
docker version
# Check Ollama status
curl http://localhost:11434/api/tags
# Check containers
./start-docker.sh status
# Test all endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
```
### Expected Success Output
```
✅ Frontend OK
✅ Backend OK
✅ RAG API OK
✅ Ollama OK
```
---
## 🚨 Common Issues & Solutions
### 1. Docker Daemon Issues
#### Problem: "Cannot connect to Docker daemon"
```
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
```
#### Solution A: Restart Docker Desktop (macOS/Windows)
```bash
# Quit Docker Desktop completely
# macOS: Click Docker icon → "Quit Docker Desktop"
# Windows: Right-click Docker icon → "Quit Docker Desktop"
# Wait for it to fully shut down
sleep 10
# Start Docker Desktop
open -a Docker # macOS
# Windows: Click Docker Desktop from Start menu
# Wait for Docker to be ready (2-3 minutes)
docker version
```
#### Solution B: Linux Docker Service
```bash
# Check Docker service status
sudo systemctl status docker
# Restart Docker service
sudo systemctl restart docker
# Enable auto-start
sudo systemctl enable docker
# Test connection
docker version
```
#### Solution C: Hard Reset
```bash
# Kill all Docker processes
sudo pkill -f docker
# Remove socket files
sudo rm -f /var/run/docker.sock
sudo rm -f /Users/prompt/.docker/run/docker.sock # macOS
# Restart Docker Desktop
open -a Docker # macOS
```
### 2. Ollama Connection Issues
#### Problem: RAG API can't connect to Ollama
```
ConnectionError: Failed to connect to Ollama at http://host.docker.internal:11434
```
#### Solution A: Verify Ollama is Running
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
# If not running, start it
ollama serve
# Install required models
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
#### Solution B: Test from Container
```bash
# Test Ollama connection from RAG API container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
# If this fails, check Docker network settings
docker network ls
docker network inspect rag_system_old_default
```
#### Solution C: Alternative Ollama Host
```bash
# Edit docker.env to use different host
echo "OLLAMA_HOST=http://172.17.0.1:11434" >> docker.env
# Or use IP address
echo "OLLAMA_HOST=http://$(ipconfig getifaddr en0):11434" >> docker.env # macOS
```
### 3. Container Build Failures
#### Problem: Frontend build fails
```
ERROR: Failed to build frontend container
```
#### Solution: Clean Build
```bash
# Stop containers
./start-docker.sh stop
# Clean Docker cache
docker system prune -f
docker builder prune -f
# Rebuild frontend only
docker compose build --no-cache frontend
docker compose up -d frontend
# Check logs
docker compose logs frontend
```
#### Problem: Python package installation fails
```
ERROR: Could not install packages due to an EnvironmentError
```
#### Solution: Update Dependencies
```bash
# Check requirements file exists
ls -la requirements-docker.txt
# Test package installation locally
pip install -r requirements-docker.txt --dry-run
# Rebuild with updated base image
docker compose build --no-cache --pull rag-api
```
### 4. Port Conflicts
#### Problem: "Port already in use"
```
Error starting userland proxy: listen tcp4 0.0.0.0:3000: bind: address already in use
```
#### Solution: Find and Kill Conflicting Processes
```bash
# Check what's using the ports
lsof -i :3000 -i :8000 -i :8001
# Kill specific processes
pkill -f "npm run dev" # Frontend
pkill -f "server.py" # Backend
pkill -f "api_server" # RAG API
# Or kill by port
sudo kill -9 $(lsof -t -i:3000)
sudo kill -9 $(lsof -t -i:8000)
sudo kill -9 $(lsof -t -i:8001)
# Restart containers
./start-docker.sh
```
### 5. Memory Issues
#### Problem: Containers crash due to OOM (Out of Memory)
```
Container killed due to memory limit
```
#### Solution: Increase Docker Memory
```bash
# Check current memory usage
docker stats --no-stream
# Increase Docker Desktop memory allocation
# Docker Desktop → Settings → Resources → Memory → 8GB+
# Monitor memory usage
docker stats
# Use smaller models if needed
ollama pull qwen3:0.6b # Instead of qwen3:8b
```
#### Problem: System running slow
```bash
# Check host memory
free -h # Linux
vm_stat # macOS
# Clean up Docker resources
docker system prune -f
docker volume prune -f
```
### 6. Volume Mount Issues
#### Problem: Permission denied accessing files
```
Permission denied: /app/lancedb
```
#### Solution: Fix Permissions
```bash
# Create directories if they don't exist
mkdir -p lancedb index_store shared_uploads backend
# Fix permissions
chmod -R 755 lancedb index_store shared_uploads
chmod 664 backend/chat_data.db
# Check ownership
ls -la lancedb/ shared_uploads/ backend/
# Reset permissions if needed
sudo chown -R $USER:$USER lancedb shared_uploads backend
```
#### Problem: Database file not found
```
No such file or directory: '/app/backend/chat_data.db'
```
#### Solution: Initialize Database
```bash
# Create empty database file
touch backend/chat_data.db
# Or initialize with schema
python -c "
from backend.database import ChatDatabase
db = ChatDatabase()
db.init_database()
print('Database initialized')
"
# Restart containers
./start-docker.sh stop
./start-docker.sh
```
---
## 🔍 Advanced Debugging
### Container-Level Debugging
#### Access Container Shells
```bash
# RAG API container (most issues happen here)
docker compose exec rag-api bash
# Check environment variables
docker compose exec rag-api env | grep -E "(OLLAMA|RAG|NODE)"
# Test Python imports
docker compose exec rag-api python -c "
import sys
print('Python version:', sys.version)
from rag_system.main import get_agent
print('✅ RAG system imports work')
"
# Backend container
docker compose exec backend bash
python -c "
from backend.database import ChatDatabase
print('✅ Database imports work')
"
# Frontend container
docker compose exec frontend sh
npm --version
node --version
```
#### Check Container Resources
```bash
# Monitor real-time resource usage
docker stats
# Check individual container health
docker compose ps
docker inspect rag-api --format='{{.State.Health.Status}}'
# View container configurations
docker compose config
```
#### Network Debugging
```bash
# Check network connectivity
docker compose exec rag-api ping backend
docker compose exec backend ping rag-api
docker compose exec rag-api ping host.docker.internal
# Check DNS resolution
docker compose exec rag-api nslookup host.docker.internal
# Test HTTP connections
docker compose exec rag-api curl -v http://backend:8000/health
docker compose exec rag-api curl -v http://host.docker.internal:11434/api/tags
```
### Log Analysis
#### Container Logs
```bash
# View all logs
./start-docker.sh logs
# Follow specific service logs
docker compose logs -f rag-api
docker compose logs -f backend
docker compose logs -f frontend
# Search for errors
docker compose logs rag-api 2>&1 | grep -i error
docker compose logs backend 2>&1 | grep -i "traceback\|error"
# Save logs to file
docker compose logs > docker-debug.log 2>&1
```
#### System Logs
```bash
# Docker daemon logs (Linux)
journalctl -u docker.service -f
# macOS: Check Console app for Docker logs
# Windows: Check Event Viewer
```
---
## 🧪 Testing & Validation
### Manual Container Testing
#### Test Individual Containers
```bash
# Test RAG API alone
docker build -f Dockerfile.rag-api -t test-rag-api .
docker run --rm -p 8001:8001 -e OLLAMA_HOST=http://host.docker.internal:11434 test-rag-api &
sleep 30
curl http://localhost:8001/models
pkill -f test-rag-api
# Test Backend alone
docker build -f Dockerfile.backend -t test-backend .
docker run --rm -p 8000:8000 test-backend &
sleep 30
curl http://localhost:8000/health
pkill -f test-backend
```
#### Integration Testing
```bash
# Full system test
./start-docker.sh
# Wait for all services to be ready
sleep 60
# Test complete workflow
curl -X POST http://localhost:8000/sessions \
-H "Content-Type: application/json" \
-d '{"title": "Test Session"}'
# Test document upload (if you have a test PDF)
# curl -X POST http://localhost:8000/upload -F "file=@test.pdf"
# Clean up
./start-docker.sh stop
```
### Automated Testing Script
Create `test-docker-health.sh`:
```bash
#!/bin/bash
set -e
echo "🐳 Docker Health Test Starting..."
# Start containers
./start-docker.sh
# Wait for services
echo "⏳ Waiting for services to start..."
sleep 60
# Test endpoints
echo "🔍 Testing endpoints..."
curl -f http://localhost:3000 && echo "✅ Frontend OK" || echo "❌ Frontend FAIL"
curl -f http://localhost:8000/health && echo "✅ Backend OK" || echo "❌ Backend FAIL"
curl -f http://localhost:8001/models && echo "✅ RAG API OK" || echo "❌ RAG API FAIL"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK" || echo "❌ Ollama FAIL"
# Test container health
echo "🔍 Checking container health..."
docker compose ps
echo "🎉 Health test complete!"
```
---
## 🔄 Recovery Procedures
### Complete System Reset
#### Soft Reset
```bash
# Stop containers
./start-docker.sh stop
# Clean up Docker resources
docker system prune -f
# Restart containers
./start-docker.sh
```
#### Hard Reset (⚠️ Deletes all data)
```bash
# Stop everything
./start-docker.sh stop
# Remove all containers, images, and volumes
docker system prune -a --volumes
# Remove local data (CAUTION: This deletes all your documents and chat history)
rm -rf lancedb/* shared_uploads/* backend/chat_data.db
# Rebuild from scratch
./start-docker.sh
```
#### Selective Reset
Reset only specific components:
```bash
# Reset just the database
./start-docker.sh stop
rm backend/chat_data.db
./start-docker.sh
# Reset just vector storage
./start-docker.sh stop
rm -rf lancedb/*
./start-docker.sh
# Reset just uploaded documents
rm -rf shared_uploads/*
```
---
## 📊 Performance Optimization
### Resource Monitoring
```bash
# Monitor containers continuously
watch -n 5 'docker stats --no-stream'
# Check disk usage
docker system df
du -sh lancedb shared_uploads backend
# Monitor host resources
htop # Linux
top # macOS/Windows
```
### Performance Tuning
```bash
# Use smaller models for better performance
ollama pull qwen3:0.6b # Instead of qwen3:8b
# Reduce Docker memory if needed
# Docker Desktop → Settings → Resources → Memory
# Clean up regularly
docker system prune -f
docker volume prune -f
```
---
## 🆘 When All Else Fails
### Alternative Deployment Options
#### 1. Direct Development (No Docker)
```bash
# Stop Docker containers
./start-docker.sh stop
# Use direct development instead
python run_system.py
```
#### 2. Minimal Docker (RAG API only)
```bash
# Run only RAG API in Docker
docker build -f Dockerfile.rag-api -t rag-api .
docker run -p 8001:8001 rag-api
# Run other components directly
cd backend && python server.py &
npm run dev
```
#### 3. Hybrid Approach
```bash
# Run some services in Docker, others directly
docker compose up -d rag-api
cd backend && python server.py &
npm run dev
```
### Getting Help
#### Diagnostic Information to Collect
```bash
# System information
docker version
docker compose version
uname -a
# Container information
docker compose ps
docker compose config
# Resource information
docker stats --no-stream
docker system df
# Error logs
docker compose logs > docker-errors.log 2>&1
```
#### Support Channels
1. **Check GitHub Issues**: Search existing issues for similar problems
2. **Documentation**: Review the complete documentation in `Documentation/`
3. **Create Issue**: Include diagnostic information above
---
## ✅ Success Checklist
Your Docker deployment is working correctly when:
- ✅ `docker version` shows Docker is running
- ✅ `curl http://localhost:11434/api/tags` shows Ollama is accessible
- ✅ `./start-docker.sh status` shows all containers healthy
- ✅ All health check URLs return 200 OK
- ✅ You can access the frontend at http://localhost:3000
- ✅ You can create document indexes successfully
- ✅ You can chat with your documents
- ✅ No error messages in container logs
**If all boxes are checked, your Docker deployment is successful! 🎉**
---
**Still having issues?** Check the main `DOCKER_README.md` or create an issue with your diagnostic information.


@@ -1,21 +0,0 @@
# syntax=docker/dockerfile:1
# Build as `docker build . -t localgpt`, requires BuildKit.
# Run as `docker run -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all localgpt`, requires Nvidia container toolkit.
FROM nvidia/cuda:11.7.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y software-properties-common
RUN apt-get install -y g++-11 make python3 python-is-python3 pip
# only copy what's needed at every step to optimize layer cache
COPY ./requirements.txt .
# use BuildKit cache mount to drastically reduce redownloading from pip on repeated builds
RUN --mount=type=cache,target=/root/.cache CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --timeout 100 -r requirements.txt llama-cpp-python==0.1.83
COPY SOURCE_DOCUMENTS ./SOURCE_DOCUMENTS
COPY ingest.py constants.py ./
# Docker BuildKit does not support GPU during *docker build* time right now, only during *docker run*.
# See <https://github.com/moby/buildkit/issues/1436>.
# If this changes in the future you can `docker build --build-arg device_type=cuda . -t localgpt` (+GPU argument to be determined).
ARG device_type=cpu
RUN --mount=type=cache,target=/root/.cache python ingest.py --device_type $device_type
COPY . .
ENV device_type=cuda
CMD python run_localGPT.py --device_type $device_type

31
Dockerfile.backend Normal file

@@ -0,0 +1,31 @@
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install Python dependencies (using Docker-specific requirements)
COPY requirements-docker.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy backend code and dependencies
COPY backend/ ./backend/
COPY rag_system/ ./rag_system/
# Create necessary directories
RUN mkdir -p shared_uploads logs
# Expose port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Run the backend server
WORKDIR /app/backend
CMD ["python", "server.py"]

31
Dockerfile.frontend Normal file

@@ -0,0 +1,31 @@
FROM node:18-alpine
# Set working directory
WORKDIR /app
# Install dependencies (including dev dependencies for build)
COPY package.json package-lock.json ./
RUN npm ci
# Copy source code and configuration files
COPY src/ ./src/
COPY public/ ./public/
COPY next.config.ts ./
COPY tsconfig.json ./
COPY tailwind.config.js ./
COPY postcss.config.mjs ./
COPY eslint.config.mjs ./
# Build the application (skip linting for Docker)
ENV NEXT_LINT=false
RUN npm run build
# Expose port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:3000 || exit 1
# Start the application
CMD ["npm", "start"]

31
Dockerfile.rag-api Normal file

@@ -0,0 +1,31 @@
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install Python dependencies (using Docker-specific requirements)
COPY requirements-docker.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy RAG system code and backend dependencies
COPY rag_system/ ./rag_system/
COPY backend/ ./backend/
# Create necessary directories
RUN mkdir -p lancedb index_store shared_uploads logs
# Expose port
EXPOSE 8001
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:8001/models || exit 1
# Run the RAG API server
CMD ["python", "-m", "rag_system.api_server"]


@@ -1,45 +0,0 @@
FROM vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
ENV HABANA_VISIBLE_DEVICES=all
ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
ENV PT_HPU_LAZY_ACC_PAR_MODE=0
ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=1
# Install linux packages
ENV DEBIAN_FRONTEND="noninteractive" TZ=Etc/UTC
RUN apt-get update && apt-get install -y tzdata bash-completion python3-pip openssh-server \
vim git iputils-ping net-tools protobuf-compiler curl bc gawk tmux \
&& rm -rf /var/lib/apt/lists/*
# Add repo contents
ADD localGPT /root/localGPT
WORKDIR /root/localGPT
# Install python packages
RUN pip install --upgrade pip \
&& pip install langchain-experimental==0.0.62 \
&& pip install langchain==0.0.329 \
&& pip install protobuf==3.20.2 \
&& pip install grpcio-tools \
&& pip install pymilvus==2.4.0 \
&& pip install chromadb==0.5.15 \
&& pip install llama-cpp-python==0.1.66 \
&& pip install pdfminer.six==20221105 \
&& pip install transformers==4.43.1 \
&& pip install optimum[habana]==1.13.1 \
&& pip install InstructorEmbedding==1.0.1 \
&& pip install sentence-transformers==3.0.1 \
&& pip install faiss-cpu==1.7.4 \
&& pip install huggingface_hub==0.16.4 \
&& pip install protobuf==3.20.2 \
&& pip install auto-gptq==0.2.2 \
&& pip install docx2txt unstructured unstructured[pdf] urllib3 accelerate \
&& pip install bitsandbytes \
&& pip install click flask requests openpyxl \
&& pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.0 \
&& pip install python-multipart \
&& pip install fastapi \
&& pip install uvicorn \
&& pip install gptcache==0.1.43 \
&& pip install pypdf==4.3.1 \
&& pip install python-jose[cryptography]


@@ -0,0 +1,161 @@
# 📚 API Reference (Backend & RAG API)
_Last updated: 2025-01-07_
---
## Backend HTTP API (Python `backend/server.py`)
**Base URL**: `http://localhost:8000`
| Endpoint | Method | Description | Request Body | Success Response |
|----------|--------|-------------|--------------|------------------|
| `/health` | GET | Health probe incl. Ollama status & DB stats | | 200 JSON `{ status, ollama_running, available_models, database_stats }` |
| `/chat` | POST | Stateless chat (no session) | `{ message:str, model?:str, conversation_history?:[{role,content}]}` | 200 `{ response:str, model:str, message_count:int }` |
| `/sessions` | GET | List all sessions | | `{ sessions:ChatSession[], total:int }` |
| `/sessions` | POST | Create session | `{ title?:str, model?:str }` | 201 `{ session:ChatSession, session_id }` |
| `/sessions/<id>` | GET | Get session + msgs | | `{ session, messages }` |
| `/sessions/<id>` | DELETE | Delete session | | `{ message, deleted_session_id }` |
| `/sessions/<id>/rename` | POST | Rename session | `{ title:str }` | `{ message, session }` |
| `/sessions/<id>/messages` | POST | Session chat (builds history) | See ChatRequest + retrieval opts ▼ | `{ response, session, user_message_id, ai_message_id }` |
| `/sessions/<id>/documents` | GET | List uploaded docs | | `{ files:string[], file_count:int, session }` |
| `/sessions/<id>/upload` | POST multipart | Upload docs to session | field `files[]` | `{ message, uploaded_files, processing_results?, session_documents?, total_session_documents? }` |
| `/sessions/<id>/index` | POST | Trigger RAG indexing for session | `{ latechunk?, doclingChunk?, chunkSize?, ... }` | `{ message }` |
| `/sessions/<id>/indexes` | GET | List indexes linked to session | | `{ indexes, total }` |
| `/sessions/<sid>/indexes/<idxid>` | POST | Link index to session | | `{ message }` |
| `/sessions/cleanup` | GET | Remove empty sessions | | `{ message, cleanup_count }` |
| `/models` | GET | List generation / embedding models | | `{ generation_models:str[], embedding_models:str[] }` |
| `/indexes` | GET | List all indexes | | `{ indexes, total }` |
| `/indexes` | POST | Create index | `{ name:str, description?:str, metadata?:dict }` | `{ index_id }` |
| `/indexes/<id>` | GET | Get single index | | `{ index }` |
| `/indexes/<id>` | DELETE | Delete index | | `{ message, index_id }` |
| `/indexes/<id>/upload` | POST multipart | Upload docs to index | field `files[]` | `{ message, uploaded_files }` |
| `/indexes/<id>/build` | POST | Build / rebuild index (RAG) | `{ latechunk?, doclingChunk?, ...}` | 200 `{ response?, message?}` (idempotent) |
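As a usage illustration, a minimal Python client sketch against these endpoints (payload shapes taken from the table above and the canonical definitions below; error handling omitted):
```python
# Illustrative client sketch for the backend API; not part of the codebase.
import requests

BASE = "http://localhost:8000"

# Health probe
health = requests.get(f"{BASE}/health").json()
print(health["status"], health["ollama_running"])

# Create a session, then chat within it
session_id = requests.post(f"{BASE}/sessions", json={"title": "Test Session"}).json()["session_id"]

reply = requests.post(
    f"{BASE}/sessions/{session_id}/messages",
    json={"message": "What is this document about?", "searchType": "hybrid", "retrievalK": 10},
).json()
print(reply["response"])
```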
---
## RAG API (Python `rag_system/api_server.py`)
**Base URL**: `http://localhost:8001`
| Endpoint | Method | Description | Request Body | Success Response |
|----------|--------|-------------|--------------|------------------|
| `/chat` | POST | Run RAG query with full pipeline | See RAG ChatRequest ▼ | `{ answer:str, source_documents:[], reasoning?:str, confidence?:float }` |
| `/chat/stream` | POST | Run RAG query with SSE streaming | Same as /chat | Server-Sent Events stream |
| `/index` | POST | Index documents with full configuration | See Index Request ▼ | `{ message:str, indexed_files:[], table_name:str }` |
| `/models` | GET | List available models | | `{ generation_models:str[], embedding_models:str[] }` |
### RAG ChatRequest (Advanced Options)
```jsonc
{
"query": "string", // Required user question
"session_id": "string", // Optional for session context
"table_name": "string", // Optional specific index table
"compose_sub_answers": true, // Optional compose sub-answers
"query_decompose": true, // Optional decompose complex queries
"ai_rerank": false, // Optional AI-powered reranking
"context_expand": false, // Optional context expansion
"verify": true, // Optional answer verification
"retrieval_k": 20, // Optional number of chunks to retrieve
"context_window_size": 1, // Optional context window size
"reranker_top_k": 10, // Optional top-k after reranking
"search_type": "hybrid", // Optional "hybrid|dense|fts"
"dense_weight": 0.7, // Optional dense search weight (0-1)
"force_rag": false, // Optional bypass triage, force RAG
"provence_prune": false, // Optional sentence-level pruning
"provence_threshold": 0.8, // Optional pruning threshold
"model": "qwen3:8b" // Optional generation model override
}
```
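For example, a direct call to this endpoint from Python (illustrative sketch; only `query` is required, everything else falls back to defaults):
```python
# Illustrative direct call to the RAG API /chat endpoint.
import requests

resp = requests.post(
    "http://localhost:8001/chat",
    json={"query": "Summarize the indexed documents", "search_type": "hybrid"},
).json()

print(resp["answer"])
for doc in resp.get("source_documents", []):
    print("source:", doc)
```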
### Index Request (Document Indexing)
```jsonc
{
"file_paths": ["path1.pdf", "path2.pdf"], // Required files to index
"session_id": "string", // Required session identifier
"chunk_size": 512, // Optional chunk size (default: 512)
"chunk_overlap": 64, // Optional chunk overlap (default: 64)
"enable_latechunk": true, // Optional enable late chunking
"enable_docling_chunk": false, // Optional enable DocLing chunking
"retrieval_mode": "hybrid", // Optional "hybrid|dense|fts"
"window_size": 2, // Optional context window
"enable_enrich": true, // Optional enable enrichment
"embedding_model": "Qwen/Qwen3-Embedding-0.6B", // Optional embedding model
"enrich_model": "qwen3:0.6b", // Optional enrichment model
"overview_model_name": "qwen3:0.6b", // Optional overview model
"batch_size_embed": 50, // Optional embedding batch size
"batch_size_enrich": 25 // Optional enrichment batch size
}
```
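And a corresponding sketch of an `/index` call; the file path and session id are placeholders, and every field not set here keeps its documented default.

```python
import requests

index_request = {
    "file_paths": ["shared_uploads/report.pdf"],  # placeholder path
    "session_id": "demo-session",                 # placeholder session id
    "chunk_size": 512,
    "retrieval_mode": "hybrid",
}
resp = requests.post("http://localhost:8001/index", json=index_request)
print(resp.json().get("message"), resp.json().get("table_name"))
```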
> **Note on CORS**: All endpoints include the `Access-Control-Allow-Origin: *` header.
---
## Frontend Wrapper (`src/lib/api.ts`)
The React/Next.js frontend calls the backend via a typed wrapper. Important methods & payloads:
| Method | Backend Endpoint | Payload Shape |
|--------|------------------|---------------|
| `checkHealth()` | `/health` | |
| `sendMessage({ message, model?, conversation_history? })` | `/chat` | ChatRequest |
| `getSessions()` | `/sessions` | |
| `createSession(title?, model?)` | `/sessions` | |
| `getSession(sessionId)` | `/sessions/<id>` | |
| `sendSessionMessage(sessionId, message, opts)` | `/sessions/<id>/messages` | `ChatRequest + retrieval opts` |
| `uploadFiles(sessionId, files[])` | `/sessions/<id>/upload` | multipart |
| `indexDocuments(sessionId)` | `/sessions/<id>/index` | opts similar to buildIndex |
| `buildIndex(indexId, opts)` | `/indexes/<id>/build` | Index build options |
| `linkIndexToSession` | `/sessions/<sid>/indexes/<idx>` | |
---
## Payload Definitions (Canonical)
### ChatRequest (frontend ⇄ backend)
```jsonc
{
"message": "string", // Required raw user text
"model": "string", // Optional generation model id
"conversation_history": [ // Optional prior turn list
{ "role": "user|assistant", "content": "string" }
]
}
```
### Session Chat Extended Options
```jsonc
{
"composeSubAnswers": true,
"decompose": true,
"aiRerank": false,
"contextExpand": false,
"verify": true,
"retrievalK": 10,
"contextWindowSize": 5,
"rerankerTopK": 20,
"searchType": "fts|hybrid|dense",
"denseWeight": 0.75,
"force_rag": false
}
```
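For example, a session chat request combining the base ChatRequest with a few of these retrieval options might look like the sketch below (session id and question are placeholders; this calls the backend endpoint directly rather than going through the frontend wrapper):

```python
import requests

session_id = "REPLACE_WITH_SESSION_ID"  # returned by POST /sessions
body = {
    "message": "Summarise the uploaded document.",  # placeholder question
    "searchType": "hybrid",
    "retrievalK": 10,
    "aiRerank": True,
    "verify": True,
}
resp = requests.post(
    f"http://localhost:8000/sessions/{session_id}/messages", json=body
)
print(resp.json()["response"])
```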
### Index Build Options
```jsonc
{
"latechunk": true,
"doclingChunk": false,
"chunkSize": 512,
"chunkOverlap": 64,
"retrievalMode": "hybrid|dense|fts",
"windowSize": 2,
"enableEnrich": true,
"embeddingModel": "Qwen/Qwen3-Embedding-0.6B",
"enrichModel": "qwen3:0.6b",
"overviewModel": "qwen3:0.6b",
"batchSizeEmbed": 64,
"batchSizeEnrich": 32
}
```
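A build request using a subset of these options might look like the following sketch; the index id is a placeholder and unspecified options keep their defaults.

```python
import requests

index_id = "REPLACE_WITH_INDEX_ID"
build_opts = {
    "latechunk": True,
    "chunkSize": 512,
    "chunkOverlap": 64,
    "retrievalMode": "hybrid",
    "embeddingModel": "Qwen/Qwen3-Embedding-0.6B",
}
requests.post(f"http://localhost:8000/indexes/{index_id}/build", json=build_opts)
```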
---
_This reference is derived from static code analysis of `backend/server.py`, `rag_system/api_server.py`, and `src/lib/api.ts`. Keep it in sync with route or type changes._

View File

@ -0,0 +1,83 @@
# 🏗️ System Architecture Overview
_Last updated: 2025-07-06_
This document explains how data and control flow through the Advanced **RAG System** — from a user's browser all the way to model inference and back. It is intended as the **ground-truth reference** for engineers and integrators.
---
## 1. Bird's-Eye Diagram
```mermaid
flowchart LR
subgraph Client
U["👤 User (Browser)"]
FE["Next.js Front-end\nReact Components"]
U --> FE
end
subgraph Network
FE -->|HTTP/JSON| BE["Python HTTP Server\nbackend/server.py"]
end
subgraph Core["rag_system core package"]
BE --> LOOP["Agent Loop\n(rag_system/agent/loop.py)"]
BE --> IDX["Indexing Pipeline\n(pipelines/indexing_pipeline.py)"]
LOOP --> RP["Retrieval Pipeline\n(pipelines/retrieval_pipeline.py)"]
LOOP --> VER["Verifier (Grounding Check)"]
RP --> RET["Retrievers\nBM25 | Dense | Hybrid"]
RP --> RER["AI Reranker"]
RP --> SYNT["Answer Synthesiser"]
end
subgraph Storage
LDB[("LanceDB Vector Tables")]
SQL[("SQLite chat & metadata")]
end
subgraph Models
OLLAMA["Ollama Server\n(qwen3, etc.)"]
HF["HuggingFace Hosted\nEmbedding/Reranker Models"]
end
%% data edges
IDX -->|chunks & embeddings| LDB
RET -->|vector search| LDB
LOOP -->|LLM calls| OLLAMA
RP -->|LLM calls| OLLAMA
VER -->|LLM calls| OLLAMA
RP -->|rerank| HF
BE -->|CRUD| SQL
```
---
### Data-flow Narrative
1. **User** interacts with the Next.js UI; messages are posted via `src/lib/api.ts`.
2. **backend/server.py** receives JSON over HTTP, applies CORS, and proxies the request into `rag_system`.
3. **Agent Loop** decides (via _Triage_) whether to perform Retrieval-Augmented Generation (RAG) or direct LLM answering.
4. If RAG is chosen:
1. **Retrieval Pipeline** fetches candidates from **LanceDB** using BM25 + dense vectors.
2. **AI Reranker** (HF model) sorts snippets.
3. **Answer Synthesiser** calls **Ollama** to write the final answer.
5. Answers can be **Verified** for grounding (optional flag).
6. Index building is an offline path triggered from the UI: uploaded documents (PDF, etc.) are chunked, embedded, and stored in LanceDB.
---
## 2. Component Documents
The table below links to deep-dives for each major component.
| **Component** | **Documentation** |
|---------------|-------------------|
| Agent Loop | [`system_overview.md`](system_overview.md) |
| Indexing Pipeline | [`indexing_pipeline.md`](indexing_pipeline.md) |
| Retrieval Pipeline | [`retrieval_pipeline.md`](retrieval_pipeline.md) |
| Verifier | [`verifier.md`](verifier.md) |
| Triage System | [`triage_system.md`](triage_system.md) |
---
> **Change-management**: whenever architecture changes (new micro-service, different DB, etc.) update this overview diagram first, then individual component docs.

View File

@ -0,0 +1,598 @@
# 🚀 RAG System Deployment Guide
_Last updated: 2025-01-07_
This guide provides comprehensive instructions for deploying the RAG system using both Docker and direct development approaches.
---
## 🎯 Deployment Options
### Option 1: Docker Deployment (Production) 🐳
- **Best for**: Production environments, containerized deployments, scaling
- **Pros**: Isolated, reproducible, easy to manage
- **Cons**: Slightly more complex setup, resource overhead
### Option 2: Direct Development (Development) 💻
- **Best for**: Development, debugging, customization
- **Pros**: Direct access to code, faster iteration, easier debugging
- **Cons**: More dependencies to manage
---
## 1. Prerequisites
### 1.1 System Requirements
#### **Minimum Requirements**
- **CPU**: 4 cores, 2.5GHz+
- **RAM**: 8GB (16GB recommended)
- **Storage**: 50GB free space
- **OS**: Linux, macOS, or Windows with WSL2
#### **Recommended Requirements**
- **CPU**: 8+ cores, 3.0GHz+
- **RAM**: 32GB+ (for large models)
- **Storage**: 200GB+ SSD
- **GPU**: NVIDIA GPU with 8GB+ VRAM (optional, for acceleration)
### 1.2 Common Dependencies
**Both deployment methods require:**
```bash
# Ollama (required for both approaches)
curl -fsSL https://ollama.ai/install.sh | sh
# Git for cloning
git 2.30+
```
### 1.3 Docker-Specific Dependencies
**For Docker deployment:**
```bash
# Docker & Docker Compose
Docker Engine 24.0+
Docker Compose 2.20+
```
### 1.4 Direct Development Dependencies
**For direct development:**
```bash
# Python & Node.js
Python 3.8+
Node.js 16+
npm 8+
```
---
## 2. 🐳 Docker Deployment
### 2.1 Installation
#### **Step 1: Install Docker**
**Ubuntu/Debian:**
```bash
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
newgrp docker
# Install Docker Compose V2
sudo apt-get update
sudo apt-get install docker-compose-plugin
```
**macOS:**
```bash
# Install Docker Desktop
brew install --cask docker
# Or download from: https://www.docker.com/products/docker-desktop
```
**Windows:**
```bash
# Install Docker Desktop with WSL2 backend
# Download from: https://www.docker.com/products/docker-desktop
```
#### **Step 2: Clone Repository**
```bash
git clone https://github.com/your-org/rag-system.git
cd rag-system
```
#### **Step 3: Install Ollama**
```bash
# Install Ollama (runs locally even with Docker)
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama
ollama serve
# In another terminal, install models
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
#### **Step 4: Launch Docker System**
```bash
# Start all containers using the convenience script
./start-docker.sh
# Or manually:
docker compose --env-file docker.env up --build -d
```
#### **Step 5: Verify Deployment**
```bash
# Check container status
docker compose ps
# Test all endpoints
curl http://localhost:3000 # Frontend
curl http://localhost:8000/health # Backend
curl http://localhost:8001/models # RAG API
curl http://localhost:11434/api/tags # Ollama
```
### 2.2 Docker Management
#### **Container Operations**
```bash
# Start system
./start-docker.sh
# Stop system
./start-docker.sh stop
# View logs
./start-docker.sh logs
# Check status
./start-docker.sh status
# Manual Docker Compose commands
docker compose ps # Check status
docker compose logs -f # Follow logs
docker compose down # Stop all containers
docker compose up --build -d # Rebuild and restart
```
#### **Individual Container Management**
```bash
# Restart specific service
docker compose restart rag-api
# View specific service logs
docker compose logs -f backend
# Execute commands in container
docker compose exec rag-api python -c "print('Hello')"
```
---
## 3. 💻 Direct Development
### 3.1 Installation
#### **Step 1: Install Dependencies**
**Python Dependencies:**
```bash
# Clone repository
git clone https://github.com/your-org/rag-system.git
cd rag-system
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install Python packages
pip install -r requirements.txt
```
**Node.js Dependencies:**
```bash
# Install Node.js dependencies
npm install
```
#### **Step 2: Install and Configure Ollama**
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama
ollama serve
# In another terminal, install models
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
#### **Step 3: Launch System**
**Option A: Integrated Launcher (Recommended)**
```bash
# Start all components with one command
python run_system.py
```
**Option B: Manual Component Startup**
```bash
# Terminal 1: RAG API
python -m rag_system.api_server
# Terminal 2: Backend
cd backend && python server.py
# Terminal 3: Frontend
npm run dev
# Access at http://localhost:3000
```
#### **Step 4: Verify Installation**
```bash
# Check system health
python system_health_check.py
# Test endpoints
curl http://localhost:3000 # Frontend
curl http://localhost:8000/health # Backend
curl http://localhost:8001/models # RAG API
```
### 3.2 Direct Development Management
#### **System Operations**
```bash
# Start system
python run_system.py
# Check system health
python system_health_check.py
# Stop system
# Press Ctrl+C in terminal running run_system.py
```
#### **Individual Component Management**
```bash
# Start components individually
python -m rag_system.api_server # RAG API on port 8001
cd backend && python server.py # Backend on port 8000
npm run dev # Frontend on port 3000
# Development tools
npm run build # Build frontend for production
pip install -r requirements.txt --upgrade # Update Python packages
```
---
## 4. Architecture Comparison
### 4.1 Docker Architecture
```mermaid
graph TB
subgraph "Docker Containers"
Frontend[Frontend Container<br/>Next.js<br/>Port 3000]
Backend[Backend Container<br/>Python API<br/>Port 8000]
RAG[RAG API Container<br/>Document Processing<br/>Port 8001]
end
subgraph "Local System"
Ollama[Ollama Server<br/>Port 11434]
end
Frontend --> Backend
Backend --> RAG
RAG --> Ollama
```
### 4.2 Direct Development Architecture
```mermaid
graph TB
subgraph "Local Processes"
Frontend[Next.js Dev Server<br/>Port 3000]
Backend[Python Backend<br/>Port 8000]
RAG[RAG API<br/>Port 8001]
Ollama[Ollama Server<br/>Port 11434]
end
Frontend --> Backend
Backend --> RAG
RAG --> Ollama
```
---
## 5. Configuration
### 5.1 Environment Variables
#### **Docker Configuration (`docker.env`)**
```bash
# Ollama Configuration
OLLAMA_HOST=http://host.docker.internal:11434
# Service Configuration
NODE_ENV=production
RAG_API_URL=http://rag-api:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
```
#### **Direct Development Configuration**
```bash
# Environment variables are set automatically by run_system.py
# Override in environment if needed:
export OLLAMA_HOST=http://localhost:11434
export RAG_API_URL=http://localhost:8001
```
### 5.2 Model Configuration
#### **Default Models**
```python
# Embedding Models
EMBEDDING_MODELS = [
"Qwen/Qwen3-Embedding-0.6B", # Fast, 1024 dimensions
"Qwen/Qwen3-Embedding-4B", # High quality, 2048 dimensions
]
# Generation Models
GENERATION_MODELS = [
"qwen3:0.6b", # Fast responses
"qwen3:8b", # High quality
]
```
### 5.3 Performance Tuning
#### **Memory Settings**
```bash
# For Docker: Increase memory allocation
# Docker Desktop → Settings → Resources → Memory → 16GB+
# For Direct Development: Monitor with
htop # or top on macOS
```
#### **Model Settings**
```python
# Batch sizes (adjust based on available RAM)
EMBEDDING_BATCH_SIZE = 50 # Reduce if OOM
ENRICHMENT_BATCH_SIZE = 25 # Reduce if OOM
# Chunk settings
CHUNK_SIZE = 512 # Text chunk size
CHUNK_OVERLAP = 64 # Overlap between chunks
```
---
## 6. Operational Procedures
### 6.1 System Monitoring
#### **Health Checks**
```bash
# Comprehensive system check
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
```
#### **Performance Monitoring**
```bash
# Docker monitoring
docker stats
# Direct development monitoring
htop # Overall system
nvidia-smi # GPU usage (if available)
```
### 6.2 Log Management
#### **Docker Logs**
```bash
# All services
docker compose logs -f
# Specific service
docker compose logs -f rag-api
# Save logs to file
docker compose logs > system.log 2>&1
```
#### **Direct Development Logs**
```bash
# Logs are printed to terminal
# Redirect to file if needed:
python run_system.py > system.log 2>&1
```
### 6.3 Backup and Restore
#### **Data Backup**
```bash
# Create backup directory
mkdir -p backups/$(date +%Y%m%d)
# Backup databases and indexes
cp -r backend/chat_data.db backups/$(date +%Y%m%d)/
cp -r lancedb backups/$(date +%Y%m%d)/
cp -r index_store backups/$(date +%Y%m%d)/
# For Docker: also backup volumes
docker compose down
docker run --rm -v rag_system_old_ollama_data:/data -v $(pwd)/backups:/backup alpine tar czf /backup/ollama_models_$(date +%Y%m%d).tar.gz -C /data .
```
#### **Data Restore**
```bash
# Stop system
./start-docker.sh stop # Docker
# Or Ctrl+C for direct development
# Restore files
cp -r backups/YYYYMMDD/* ./
# Restart system
./start-docker.sh # Docker
python run_system.py # Direct development
```
---
## 7. Troubleshooting
### 7.1 Common Issues
#### **Port Conflicts**
```bash
# Check what's using ports
lsof -i :3000 -i :8000 -i :8001 -i :11434
# For Docker: Stop conflicting containers
./start-docker.sh stop
# For Direct: Kill processes
pkill -f "npm run dev"
pkill -f "server.py"
pkill -f "api_server"
```
#### **Docker Issues**
```bash
# Docker daemon not running
docker version # Check if daemon responds
# Restart Docker Desktop (macOS/Windows)
# Or restart docker service (Linux)
sudo systemctl restart docker
# Clear Docker cache
docker system prune -f
```
#### **Ollama Issues**
```bash
# Check Ollama status
curl http://localhost:11434/api/tags
# Restart Ollama
pkill ollama
ollama serve
# Reinstall models
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
### 7.2 Performance Issues
#### **Memory Problems**
```bash
# Check memory usage
free -h # Linux
vm_stat # macOS
docker stats # Docker containers
# Solutions:
# 1. Increase system RAM
# 2. Reduce batch sizes in configuration
# 3. Use smaller models (qwen3:0.6b instead of qwen3:8b)
```
#### **Slow Response Times**
```bash
# Check model loading
curl http://localhost:11434/api/tags
# Monitor component response times
time curl http://localhost:8001/models
# Solutions:
# 1. Use SSD storage
# 2. Increase CPU cores
# 3. Use GPU acceleration (if available)
```
---
## 8. Production Considerations
### 8.1 Security
#### **Network Security**
```bash
# Use reverse proxy (nginx/traefik) for production
# Enable HTTPS/TLS
# Restrict port access with firewall
```
#### **Data Security**
```bash
# Enable authentication in production
# Encrypt sensitive data
# Regular security updates
```
### 8.2 Scaling
#### **Horizontal Scaling**
```bash
# Use Docker Swarm or Kubernetes
# Load balance frontend and backend
# Scale RAG API instances based on load
```
#### **Resource Optimization**
```bash
# Use dedicated GPU nodes for AI workloads
# Implement model caching
# Optimize batch processing
```
---
## 9. Success Criteria
### 9.1 Deployment Verification
Your deployment is successful when:
- ✅ All health checks pass
- ✅ Frontend loads at http://localhost:3000
- ✅ You can create document indexes
- ✅ You can chat with uploaded documents
- ✅ No error messages in logs
### 9.2 Performance Benchmarks
**Acceptable Performance:**
- Index creation: < 2 minutes per 100MB document
- Query response: < 30 seconds for complex questions
- Memory usage: < 8GB total system memory
**Optimal Performance:**
- Index creation: < 1 minute per 100MB document
- Query response: < 10 seconds for complex questions
- Memory usage: < 16GB total system memory
---
**Happy Deploying! 🚀**

View File

@ -0,0 +1,543 @@
# 🐳 Docker Usage Guide - RAG System
_Last updated: 2025-01-07_
This guide provides practical Docker commands and procedures for running the RAG system in containerized environments with local Ollama.
---
## 📋 Prerequisites
### Required Setup
- Docker Desktop installed and running
- Ollama installed locally (even for Docker deployment)
- 8GB+ RAM available
### Architecture Overview
```
┌─────────────────────────────────────┐
│ Docker Containers │
├─────────────────────────────────────┤
│ Frontend (Port 3000) │
│ Backend (Port 8000) │
│ RAG API (Port 8001) │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Local System │
├─────────────────────────────────────┤
│ Ollama Server (Port 11434) │
└─────────────────────────────────────┘
```
---
## 1. Quick Start Commands
### Step 1: Clone and Setup
```bash
# Clone repository
git clone <your-repository-url>
cd rag_system_old
# Verify Docker is running
docker version
```
### Step 2: Install and Configure Ollama (Required)
**⚠️ Important**: Even with Docker, Ollama must be installed locally for optimal performance.
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama (in one terminal)
ollama serve
# Install required models (in another terminal)
ollama pull qwen3:0.6b # Fast model (650MB)
ollama pull qwen3:8b # High-quality model (4.7GB)
# Verify models are installed
ollama list
# Test Ollama connection
curl http://localhost:11434/api/tags
```
### Step 3: Start Docker Containers
```bash
# Start all containers
./start-docker.sh
# Stop all containers
./start-docker.sh stop
# View logs
./start-docker.sh logs
# Check status
./start-docker.sh status
# Restart containers
./start-docker.sh stop
./start-docker.sh
```
### 1.2 Service Access
Once running, access the system at:
- **Frontend**: http://localhost:3000
- **Backend API**: http://localhost:8000
- **RAG API**: http://localhost:8001
- **Ollama**: http://localhost:11434
---
## 2. Container Management
### 2.1 Using the Convenience Script
```bash
# Start all containers
./start-docker.sh
# Stop all containers
./start-docker.sh stop
# View logs
./start-docker.sh logs
# Check status
./start-docker.sh status
# Restart containers
./start-docker.sh stop
./start-docker.sh
```
### 2.2 Manual Docker Compose Commands
```bash
# Start all services
docker compose --env-file docker.env up --build -d
# Check status
docker compose ps
# View logs
docker compose logs -f
# Stop all services
docker compose down
# Force rebuild
docker compose build --no-cache
docker compose up --build -d
```
### 2.3 Individual Service Management
```bash
# Start specific service
docker compose up -d frontend
docker compose up -d backend
docker compose up -d rag-api
# Restart specific service
docker compose restart rag-api
# Stop specific service
docker compose stop backend
# View specific service logs
docker compose logs -f rag-api
```
---
## 3. Development Workflow
### 3.1 Code Changes
```bash
# After frontend changes
docker compose restart frontend
# After backend changes
docker compose restart backend
# After RAG system changes
docker compose restart rag-api
# Rebuild after dependency changes
docker compose build --no-cache rag-api
docker compose up -d rag-api
```
### 3.2 Debugging Containers
```bash
# Access container shell
docker compose exec frontend sh
docker compose exec backend bash
docker compose exec rag-api bash
# Run commands in container
docker compose exec rag-api python -c "from rag_system.main import get_agent; print('✅ RAG System OK')"
docker compose exec backend curl http://localhost:8000/health
# Check environment variables
docker compose exec rag-api env | grep OLLAMA
```
### 3.3 Development vs Production
```bash
# Development mode (if docker-compose.dev.yml exists)
docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d
# Production mode (default)
docker compose --env-file docker.env up -d
```
---
## 4. Logging & Monitoring
### 4.1 Log Management
```bash
# View all logs
docker compose logs
# View specific service logs
docker compose logs frontend
docker compose logs backend
docker compose logs rag-api
# Follow logs in real-time
docker compose logs -f
# View last N lines
docker compose logs --tail=100
# View logs with timestamps
docker compose logs -t
# Save logs to file
docker compose logs > system.log 2>&1
# View logs since specific time
docker compose logs --since=2h
docker compose logs --since=2025-01-01T00:00:00
```
### 4.2 System Monitoring
```bash
# Monitor resource usage
docker stats
# Monitor specific containers
docker stats rag-frontend rag-backend rag-api
# Check container health
docker compose ps
# System information
docker system info
docker system df
```
---
## 5. Ollama Integration
### 5.1 Ollama Setup
```bash
# Install Ollama (one-time setup)
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama server
ollama serve
# Check Ollama status
curl http://localhost:11434/api/tags
# Install models
ollama pull qwen3:0.6b # Fast model
ollama pull qwen3:8b # High-quality model
# List installed models
ollama list
```
### 5.2 Ollama Management
```bash
# Check model status from container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
# Test Ollama connection
curl -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{"model": "qwen3:0.6b", "prompt": "Hello", "stream": false}'
# Monitor Ollama logs (if running with logs)
# Ollama logs appear in the terminal where you ran 'ollama serve'
```
### 5.3 Model Management
```bash
# Update models
ollama pull qwen3:0.6b
ollama pull qwen3:8b
# Remove unused models
ollama rm old-model-name
# Check model information
ollama show qwen3:0.6b
```
---
## 6. Data Management
### 6.1 Volume Management
```bash
# List volumes
docker volume ls
# View volume usage
docker system df -v
# Backup volumes
docker run --rm -v rag_system_old_lancedb:/data -v $(pwd)/backup:/backup alpine tar czf /backup/lancedb_backup.tar.gz -C /data .
# Clean unused volumes
docker volume prune
```
### 6.2 Database Management
```bash
# Access SQLite database
docker compose exec backend sqlite3 /app/backend/chat_data.db
# Backup database
cp backend/chat_data.db backup/chat_data_$(date +%Y%m%d).db
# Check LanceDB tables from container
docker compose exec rag-api python -c "
import lancedb
db = lancedb.connect('/app/lancedb')
print('Tables:', db.table_names())
"
```
### 6.3 File Management
```bash
# Access shared files
docker compose exec rag-api ls -la /app/shared_uploads
# Copy files to/from containers
docker cp local_file.pdf rag-api:/app/shared_uploads/
docker cp rag-api:/app/shared_uploads/file.pdf ./local_file.pdf
# Check disk usage
docker compose exec rag-api df -h
```
---
## 7. Troubleshooting
### 7.1 Common Issues
#### Container Won't Start
```bash
# Check Docker daemon
docker version
# Check for port conflicts
lsof -i :3000 -i :8000 -i :8001
# Check container logs
docker compose logs [service-name]
# Restart Docker Desktop
# macOS/Windows: Restart Docker Desktop
# Linux: sudo systemctl restart docker
```
#### Ollama Connection Issues
```bash
# Check Ollama is running
curl http://localhost:11434/api/tags
# Restart Ollama
pkill ollama
ollama serve
# Check from container
docker compose exec rag-api curl http://host.docker.internal:11434/api/tags
```
#### Performance Issues
```bash
# Check resource usage
docker stats
# Increase Docker memory (Docker Desktop Settings)
# Recommended: 8GB+ for Docker
# Check container health
docker compose ps
```
### 7.2 Reset and Clean
```bash
# Stop everything
./start-docker.sh stop
# Clean containers and images
docker system prune -a
# Clean volumes (⚠️ deletes data)
docker volume prune
# Complete reset (⚠️ deletes everything)
docker compose down -v
docker system prune -a --volumes
```
### 7.3 Health Checks
```bash
# Comprehensive health check
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
# Check all container status
docker compose ps
# Test model loading
docker compose exec rag-api python -c "
from rag_system.main import get_agent
agent = get_agent('default')
print('✅ RAG System initialized successfully')
"
```
---
## 8. Advanced Usage
### 8.1 Production Deployment
```bash
# Use production environment
export NODE_ENV=production
# Start with resource limits
docker compose --env-file docker.env up -d
# Enable automatic restarts
docker update --restart unless-stopped $(docker ps -q)
```
### 8.2 Scaling
```bash
# Scale specific services
docker compose up -d --scale backend=2 --scale rag-api=2
# Use Docker Swarm for clustering
docker swarm init
docker stack deploy -c docker-compose.yml rag-system
```
### 8.3 Security
```bash
# Scan images for vulnerabilities
docker scout cves rag-frontend
docker scout cves rag-backend
docker scout cves rag-api
# Update base images
docker compose build --no-cache --pull
```
---
## 9. Configuration
### 9.1 Environment Variables
The system uses `docker.env` for configuration:
```bash
# Ollama configuration
OLLAMA_HOST=http://host.docker.internal:11434
# Service configuration
NODE_ENV=production
RAG_API_URL=http://rag-api:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
```
### 9.2 Custom Configuration
```bash
# Create custom environment file
cp docker.env docker.custom.env
# Edit custom configuration
nano docker.custom.env
# Use custom configuration
docker compose --env-file docker.custom.env up -d
```
---
## 10. Success Checklist
Your Docker deployment is successful when:
- ✅ All containers are running: `docker compose ps`
- ✅ Ollama is accessible: `curl http://localhost:11434/api/tags`
- ✅ Frontend loads: `curl http://localhost:3000`
- ✅ Backend responds: `curl http://localhost:8000/health`
- ✅ RAG API works: `curl http://localhost:8001/models`
- ✅ You can create indexes and chat with documents
### Performance Expectations
**Acceptable Performance:**
- Container startup: < 2 minutes
- Memory usage: < 4GB Docker containers + Ollama
- Response time: < 30 seconds for complex queries
**Optimal Performance:**
- Container startup: < 1 minute
- Memory usage: < 2GB Docker containers + Ollama
- Response time: < 10 seconds for complex queries
---
**Happy Containerizing! 🐳**

View File

@ -0,0 +1,87 @@
# RAG System Improvement Road-map
_Revision: 2025-07-05_
This document captures high-impact enhancements identified during the July 2025 code review. Items are grouped by theme and include a short rationale plus suggested implementation notes. **No code has been changed; this file is planning only.**
---
## 1. Retrieval Accuracy & Speed
| ID | Item | Rationale | Notes |
|----|------|-----------|-------|
| 1.1 | Late-chunk result merging | Returned snippets can be single late-chunks → fragmented. | After retrieval, gather sibling chunks (±1) and concatenate before reranking / display. |
| 1.2 | Tiered retrieval (ANN pre-filter) | Large indexes → LanceDB full scan can be slow. | Use in-memory FAISS/HNSW to narrow to top-N, then exact LanceDB search. |
| 1.3 | Dynamic fusion weights | Different corpora favour dense vs BM25 differently. | Learn weight on small validation set; store in index `metadata`. |
| 1.4 | Query expansion via KG | Use extracted entities to enrich queries. | Requires Graph-RAG path clean-up first. |
## 2. Routing / Triage
| ID | Item | Rationale |
|----|------|-----------|
| 2.1 | Embed + cache document overviews | LLM router costs tokens; cosine-similarity pre-check is cheaper. |
| 2.2 | Session-level routing memo | Avoid repeated LLM triage for follow-up queries. |
| 2.3 | Remove legacy pattern rules | Simplifies maintenance once overview & ML routing mature. |
## 3. Indexing Pipeline
| ID | Item | Rationale |
|----|------|-----------|
| 3.1 | Parallel document conversion | PDF→MD + chunking is serial today; speed gains possible. |
| 3.2 | Incremental indexing | Re-embedding whole corpus wastes time. |
| 3.3 | Auto GPU dtype selection | Use FP16 on CUDA / MPS for memory and speed. |
| 3.4 | Post-build health check | Catch broken indexes (dim mismatch etc.) early. |
## 4. Embedding Model Management
* **Registry file** mapping tag → dims/source/license. UI & backend validate against it.
* **Embedder pool** caches loaded HF/Ollama weights per model to save RAM.
## 5. Database & Storage
* LanceDB table GC for orphaned tables.
* Scheduled SQLite `VACUUM` when fragmentation > X %.
## 6. Observability & Ops
* JSON structured logging.
* `/metrics` endpoint for Prometheus.
* Deep health-probe (`/health/deep`) exercising end-to-end query.
## 7. Front-end UX
* SSE-driven progress bar for indexing.
* Matched-term highlighting in retrieved snippets.
* Preset buttons (Fast / Balanced / High-Recall) for retrieval settings.
## 8. Testing & CI
* Replace deleted BM25 tests with LanceDB hybrid tests.
* Integration test: build → query → assert ≥1 doc.
* GitHub Action that spins up Ollama, pulls small embedding model, runs smoke test.
## 9. Codebase Hygiene
* Graph-RAG integration (currently disabled, can be implemented if needed).
* Consolidate duplicate config keys (`embedding_model_name`, etc.).
* Run `mypy --strict`, pylint, and black in CI.
---
### 🧹 System Cleanup (Priority: **HIGH**)
Reduce complexity and improve maintainability.
* **✅ COMPLETED**: Remove experimental DSPy integration and unused modules (35+ files removed)
* **✅ COMPLETED**: Clean up duplicate or obsolete documentation files
* **✅ COMPLETED**: Remove unused import statements and dependencies
* **✅ COMPLETED**: Consolidate similar configuration files
* **✅ COMPLETED**: Remove broken or non-functional ReAct agent implementation
### Priority Matrix (suggested order)
1. **Critical reliability**: 3.4, 5.1, 9.2
2. **User-visible wins**: 1.1, 7.1, 7.2
3. **Performance**: 1.2, 3.1, 3.3
4. **Long-term maintainability**: 2.3, 9.1, 9.3
Feel free to rearrange based on team objectives and resource availability.

View File

@ -0,0 +1,665 @@
# 🗂️ Indexing Pipeline
_Implementation entry-point: `rag_system/pipelines/indexing_pipeline.py` + helpers in `indexing/` & `ingestion/`._
## Overview
Transforms raw documents (PDF, TXT, etc.) into search-ready **chunks** with embeddings, storing them in LanceDB and generating auxiliary assets (overviews, context summaries).
## High-Level Diagram
```mermaid
flowchart TD
A["Uploaded Files"] --> B{Converter}
B -->|PDF→text| C["Plain Text"]
C --> D{Chunker}
D -->|docling| D1[DocLing Chunking]
D -->|latechunk| D2[Late Chunking]
D -->|standard| D3[Fixed-size]
D1 & D2 & D3 --> E["Contextual Enricher"]
E -->|local ctx summary| F["Embedding Generator"]
F -->|vectors| G[(LanceDB Table)]
E --> H["Overview Builder"]
H -->|JSONL| OVR[[`index_store/overviews/<idx>.jsonl`]]
```
## Steps in Detail
| Step | Module | Key Classes | Notes |
|------|--------|------------|-------|
| Conversion | `ingestion/pdf_converter.py` | `PDFConverter` | Uses `Docling` library to extract text with structure preservation. |
| Chunking | `ingestion/chunking.py`, `indexing/latechunk.py`, `ingestion/docling_chunker.py` | `MarkdownRecursiveChunker`, `DoclingChunker` | Controlled by flags `latechunk`, `doclingChunk`, `chunkSize`, `chunkOverlap`. |
| Contextual Enrichment | `indexing/contextualizer.py` | `ContextualEnricher` | Generates per-chunk summaries (LLM call). |
| Embedding | `indexing/embedders.py`, `indexing/representations.py` | `QwenEmbedder`, `EmbeddingGenerator` | Batch size tunable (`batchSizeEmbed`). Uses Qwen3-Embedding models. |
| LanceDB Ingest | `index_store/lancedb/…` | | Each index has a dedicated table `text_pages_<index_id>`. |
| Overview | `indexing/overview_builder.py` | `OverviewBuilder` | First-N chunks summarised for triage routing. |
### Control Flow (Code)
1. **backend/server.py → handle_build_index()** collects the files and options and POSTs them to the `/index` endpoint of the advanced RAG API (local process).
2. **indexing_pipeline.IndexingPipeline.run()** orchestrates conversion → chunking → enrichment → embedding → storage.
3. Metadata (chunk_size, models, etc.) is stored in the SQLite `indexes` table.
## Configuration Flags
| Flag | Description | Default |
|------|-------------|---------|
| `latechunk` | Merge k adjacent sibling chunks at query time | false |
| `doclingChunk` | Use DocLing structural chunking | false |
| `chunkSize` / `chunkOverlap` | Standard fixed slicing | 512 / 64 |
| `enableEnrich` | Run contextual summaries | true |
| `embeddingModel` | Override embedder | `Qwen/Qwen3-Embedding-0.6B` |
| `overviewModel` | Model used in `OverviewBuilder` | `qwen3:0.6b` |
| `batchSizeEmbed / Enrich` | Batch sizes | 50 / 25 |
## Error Handling
* Duplicate LanceDB table ➟ now idempotent (commit `af99b38`).
* Failed PDF parse ➟ chunker skips file, logs warning.
## Extension Ideas
* Add OCR layer before PDF conversion.
* Store embeddings in Remote LanceDB instance (update URL in config).
## Detailed Implementation Analysis
### Pipeline Architecture Pattern
The `IndexingPipeline` uses a **sequential processing pattern** with parallel batch operations. Each stage processes all documents before moving to the next stage, enabling efficient memory usage and progress tracking.
```python
def run(self, file_paths: List[str]):
with timer("Complete Indexing Pipeline"):
# Stage 1: Document Processing & Chunking
all_chunks = []
doc_chunks_map = {}
# Stage 2: Contextual Enrichment (optional)
if self.contextual_enricher:
all_chunks = self.contextual_enricher.enrich_batch(all_chunks)
# Stage 3: Dense Indexing (embedding + storage)
if self.vector_indexer:
self.vector_indexer.index_chunks(all_chunks, table_name)
# Stage 4: Graph Extraction (optional)
if self.graph_extractor:
self.graph_extractor.extract_and_store(all_chunks)
```
### Document Processing Deep-Dive
#### PDF Conversion Strategy
```python
# PDFConverter uses Docling for robust text extraction with structure
def convert_to_markdown(self, file_path: str) -> List[Tuple[str, Dict, Any]]:
# Quick heuristic: if PDF has text layer, skip OCR for speed
use_ocr = not self._pdf_has_text(file_path)
converter = self.converter_ocr if use_ocr else self.converter_no_ocr
result = converter.convert(file_path)
markdown_content = result.document.export_to_markdown()
metadata = {"source": file_path}
# Return DoclingDocument object for advanced chunkers
return [(markdown_content, metadata, result.document)]
```
**Benefits**:
- Preserves document structure (headings, lists, tables)
- Automatic OCR fallback for image-based PDFs
- Maintains page-level metadata for source attribution
- Structured output supports advanced chunking strategies
#### Chunking Strategy Selection
```python
# Dynamic chunker selection based on config
chunker_mode = config.get("chunker_mode", "legacy")
if chunker_mode == "docling":
self.chunker = DoclingChunker(
max_tokens=chunk_size,
overlap=overlap_sentences,
tokenizer_model="Qwen/Qwen3-Embedding-0.6B"
)
else:
self.chunker = MarkdownRecursiveChunker(
max_chunk_size=chunk_size,
min_chunk_size=min(chunk_overlap, chunk_size // 4)
)
```
#### Recursive Markdown Chunking Algorithm
```python
def chunk(self, text: str, document_id: str, metadata: Dict) -> List[Dict]:
    # Priority hierarchy for splitting (structural separators first, words last)
    separators = [
        "\n\n# ",   # H1 headers (highest priority)
        "\n\n## ",  # H2 headers
        "\n\n### ", # H3 headers
        "\n\n",     # Paragraph breaks
        "\n",       # Line breaks
        ". ",       # Sentence boundaries
        " "         # Word boundaries (last resort)
    ]

    def split_recursive(segment: str, level: int) -> List[str]:
        # Base case: segment fits, or there is no finer separator left to try
        if len(segment) <= self.max_chunk_size or level >= len(separators):
            return [segment]
        # Split on the current separator and recurse into oversized parts
        pieces = []
        for part in segment.split(separators[level]):
            if part.strip():
                pieces.extend(split_recursive(part, level + 1))
        return pieces

    chunks = []
    for piece in split_recursive(text, 0):
        # Add overlap from the previous chunk for context continuity
        if chunks and self.chunk_overlap:
            piece = chunks[-1]["text"][-self.chunk_overlap:] + " " + piece
        chunks.append({
            "text": piece,
            "document_id": document_id,
            "metadata": {**metadata, "chunk_index": len(chunks)}
        })
    return chunks
```
### DocLing Chunking Implementation
#### Token-Aware Sentence Packing
```python
class DoclingChunker:
def __init__(self, max_tokens: int = 512, overlap: int = 1,
tokenizer_model: str = "Qwen/Qwen3-Embedding-0.6B"):
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_model)
self.max_tokens = max_tokens
self.overlap = overlap # sentences of overlap
def split_markdown(self, markdown: str, document_id: str, metadata: Dict):
sentences = self._sentence_split(markdown)
chunks = []
window = []
while sentences:
# Add sentences until token limit
while (sentences and
self._token_len(" ".join(window + [sentences[0]])) <= self.max_tokens):
window.append(sentences.pop(0))
if not window: # Single sentence > limit
window.append(sentences.pop(0))
# Create chunk
chunk_text = " ".join(window)
chunks.append({
"chunk_id": f"{document_id}_{len(chunks)}",
"text": chunk_text,
"metadata": {
**metadata,
"chunk_index": len(chunks),
"heading_path": metadata.get("heading_path", []),
"block_type": metadata.get("block_type", "paragraph")
}
})
# Add overlap for next chunk
if self.overlap and sentences:
overlap_sentences = window[-self.overlap:]
sentences = overlap_sentences + sentences
window = []
return chunks
```
#### Document Structure Preservation
```python
def chunk_document(self, doc, document_id: str, metadata: Dict):
"""Walk DoclingDocument tree and emit structured chunks."""
chunks = []
current_heading_path = []
buffer = []
# Process document elements in reading order
for txt_item in doc.texts:
role = getattr(txt_item, "role", None)
if role == "heading":
self._flush_buffer(buffer, chunks, current_heading_path)
level = getattr(txt_item, "level", 1)
# Update heading hierarchy
current_heading_path = current_heading_path[:level-1]
current_heading_path.append(txt_item.text.strip())
continue
# Accumulate text in token-aware buffer
text_piece = txt_item.text
if self._buffer_would_exceed_limit(buffer, text_piece):
self._flush_buffer(buffer, chunks, current_heading_path)
buffer.append(text_piece)
self._flush_buffer(buffer, chunks, current_heading_path)
return chunks
```
### Contextual Enrichment Implementation
#### Batch Processing Pattern
```python
class ContextualEnricher:
def enrich_batch(self, chunks: List[Dict]) -> List[Dict]:
enriched_chunks = []
# Process in batches to manage memory
for i in range(0, len(chunks), self.batch_size):
batch = chunks[i:i + self.batch_size]
# Parallel enrichment within batch
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
futures = [
executor.submit(self._enrich_single_chunk, chunk)
for chunk in batch
]
for future in concurrent.futures.as_completed(futures):
enriched_chunks.append(future.result())
return enriched_chunks
```
#### Contextual Prompt Engineering
```python
def _generate_context_summary(self, chunk_text: str, surrounding_context: str) -> str:
prompt = f"""
Analyze this text chunk and provide a concise summary that captures:
1. Main topics and key information
2. Context within the broader document
3. Relevance for search and retrieval
Document Context:
{surrounding_context}
Chunk to Analyze:
{chunk_text}
Summary (max 2 sentences):
"""
response = self.llm_client.complete(
prompt=prompt,
model=self.ollama_config["enrichment_model"] # qwen3:0.6b
)
return response.strip()
```
### Embedding Generation Pipeline
#### Model Selection Strategy
```python
def select_embedder(model_name: str, ollama_host: str = None):
"""Select appropriate embedder based on model name."""
if "Qwen3-Embedding" in model_name:
return QwenEmbedder(model_name=model_name)
elif "bge-" in model_name:
return BGEEmbedder(model_name=model_name)
elif ollama_host and model_name in ["nomic-embed-text"]:
return OllamaEmbedder(model_name=model_name, host=ollama_host)
else:
# Default to Qwen embedder
return QwenEmbedder(model_name="Qwen/Qwen3-Embedding-0.6B")
```
#### Batch Embedding Generation
```python
class QwenEmbedder:
def create_embeddings(self, texts: List[str]) -> np.ndarray:
"""Generate embeddings in batches for efficiency."""
embeddings = []
for i in range(0, len(texts), self.batch_size):
batch = texts[i:i + self.batch_size]
# Tokenize and encode
inputs = self.tokenizer(
batch,
padding=True,
truncation=True,
max_length=512,
return_tensors='pt'
)
with torch.no_grad():
outputs = self.model(**inputs)
# Mean pooling over token embeddings
batch_embeddings = outputs.last_hidden_state.mean(dim=1)
embeddings.append(batch_embeddings.cpu().numpy())
return np.vstack(embeddings)
```
### LanceDB Storage Implementation
#### Table Management Strategy
```python
class LanceDBManager:
def create_table_if_not_exists(self, table_name: str, schema: Schema):
"""Create LanceDB table with proper schema."""
try:
table = self.db.open_table(table_name)
print(f"Table {table_name} already exists")
return table
except FileNotFoundError:
# Table doesn't exist, create it
table = self.db.create_table(
table_name,
schema=schema,
mode="create"
)
print(f"Created new table: {table_name}")
return table
def index_chunks(self, chunks: List[Dict], table_name: str):
"""Store chunks with embeddings in LanceDB."""
table = self.get_table(table_name)
# Prepare data for insertion
records = []
for chunk in chunks:
record = {
"chunk_id": chunk["chunk_id"],
"text": chunk["text"],
"vector": chunk["embedding"].tolist(),
"metadata": json.dumps(chunk["metadata"]),
"document_id": chunk["metadata"]["document_id"],
"chunk_index": chunk["metadata"]["chunk_index"]
}
records.append(record)
# Batch insert
table.add(records)
# Create vector index for fast similarity search
table.create_index("vector", config=IvfPq(num_partitions=256))
```
### Overview Building for Query Routing
#### Document Summarization Strategy
```python
class OverviewBuilder:
def build_overview(self, chunks: List[Dict], document_id: str) -> Dict:
"""Generate document overview for query routing."""
# Take first N chunks for overview (usually most important)
sample_chunks = chunks[:self.max_chunks_for_overview]
combined_text = "\n\n".join([c["text"] for c in sample_chunks])
overview_prompt = f"""
Analyze this document and create a brief overview that includes:
1. Main topic and purpose
2. Key themes and concepts
3. Document type and domain
4. Relevant search keywords
Document text:
{combined_text}
Overview (max 3 sentences):
"""
overview = self.llm_client.complete(
prompt=overview_prompt,
model=self.overview_model # qwen3:0.6b for speed
)
return {
"document_id": document_id,
"overview": overview.strip(),
"chunk_count": len(chunks),
"keywords": self._extract_keywords(combined_text),
"created_at": datetime.now().isoformat()
}
def save_overview(self, overview: Dict):
"""Save overview to JSONL file for query routing."""
overview_path = f"./index_store/overviews/{overview['document_id']}.jsonl"
with open(overview_path, 'w') as f:
json.dump(overview, f)
```
### Performance Optimizations
#### Memory Management
```python
class IndexingPipeline:
def __init__(self, config: Dict, ollama_client: OllamaClient, ollama_config: Dict):
# Lazy initialization to save memory
self._pdf_converter = None
self._chunker = None
self._embedder = None
def _get_embedder(self):
"""Lazy load embedder to avoid memory overhead."""
if self._embedder is None:
model_name = self.config.get("embedding_model_name", "Qwen/Qwen3-Embedding-0.6B")
self._embedder = select_embedder(model_name)
return self._embedder
def process_document_batch(self, file_paths: List[str]):
"""Process documents in batches to manage memory."""
for batch_start in range(0, len(file_paths), self.batch_size):
batch = file_paths[batch_start:batch_start + self.batch_size]
# Process batch
self._process_batch(batch)
# Cleanup to free memory
if hasattr(self, '_embedder') and self._embedder:
self._embedder.cleanup()
```
#### Parallel Processing
```python
def run_parallel_processing(self, file_paths: List[str]):
"""Process multiple documents in parallel."""
with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
futures = []
for file_path in file_paths:
future = executor.submit(self._process_single_file, file_path)
futures.append(future)
# Collect results
results = []
for future in concurrent.futures.as_completed(futures):
try:
result = future.result(timeout=300) # 5 minute timeout
results.append(result)
except Exception as e:
print(f"Error processing file: {e}")
return results
```
### Error Handling and Recovery
#### Graceful Degradation
```python
def run(self, file_paths: List[str], table_name: str):
"""Main pipeline with comprehensive error handling."""
processed_files = []
failed_files = []
for file_path in file_paths:
try:
# Attempt processing
chunks = self._process_single_file(file_path)
if chunks:
# Store successfully processed chunks
self._store_chunks(chunks, table_name)
processed_files.append(file_path)
else:
print(f"⚠️ No chunks generated from {file_path}")
failed_files.append((file_path, "No chunks generated"))
except Exception as e:
print(f"❌ Error processing {file_path}: {e}")
failed_files.append((file_path, str(e)))
continue # Continue with other files
# Return summary
return {
"processed": len(processed_files),
"failed": len(failed_files),
"processed_files": processed_files,
"failed_files": failed_files
}
```
#### Recovery Mechanisms
```python
def recover_from_partial_failure(self, table_name: str, document_id: str):
"""Recover from partial indexing failures."""
try:
# Check what was already processed
table = self.db_manager.get_table(table_name)
existing_chunks = table.search().where(f"document_id = '{document_id}'").to_list()
if existing_chunks:
print(f"Found {len(existing_chunks)} existing chunks for {document_id}")
return True
# Cleanup partial data
self._cleanup_partial_data(table_name, document_id)
return False
except Exception as e:
print(f"Recovery failed: {e}")
return False
```
### Configuration and Customization
#### Pipeline Configuration Options
```python
DEFAULT_CONFIG = {
"chunking": {
"strategy": "docling", # "docling", "recursive", "fixed"
"max_tokens": 512,
"overlap": 64,
"min_chunk_size": 100
},
"embedding": {
"model_name": "Qwen/Qwen3-Embedding-0.6B",
"batch_size": 32,
"max_length": 512
},
"enrichment": {
"enabled": True,
"model": "qwen3:0.6b",
"batch_size": 16
},
"overview": {
"enabled": True,
"max_chunks": 5,
"model": "qwen3:0.6b"
},
"storage": {
"create_index": True,
"index_type": "IvfPq",
"num_partitions": 256
}
}
```
#### Custom Processing Hooks
```python
class IndexingPipeline:
def __init__(self, config: Dict, hooks: Dict = None):
self.hooks = hooks or {}
def _run_hook(self, hook_name: str, *args, **kwargs):
"""Execute custom processing hooks."""
if hook_name in self.hooks:
return self.hooks[hook_name](*args, **kwargs)
return None
def process_chunk(self, chunk: Dict) -> Dict:
"""Process single chunk with custom hooks."""
# Pre-processing hook
chunk = self._run_hook("pre_chunk_process", chunk) or chunk
# Standard processing
if self.contextual_enricher:
chunk = self.contextual_enricher.enrich_chunk(chunk)
# Post-processing hook
chunk = self._run_hook("post_chunk_process", chunk) or chunk
return chunk
```
---
## Current Implementation Status
### Completed Features ✅
- DocLing-based PDF processing with OCR fallback
- Multiple chunking strategies (DocLing, Recursive, Fixed-size)
- Qwen3-Embedding-0.6B integration
- Contextual enrichment with qwen3:0.6b
- LanceDB storage with vector indexing
- Overview generation for query routing
- Batch processing and parallel execution
- Comprehensive error handling
### In Development 🚧
- Graph extraction and knowledge graph building
- Multimodal processing for images and tables
- Advanced late-chunking optimization
- Distributed processing support
### Planned Features 📋
- Custom model fine-tuning pipeline
- Real-time incremental indexing
- Cross-document relationship extraction
- Advanced metadata enrichment
---
## Performance Benchmarks
| Document Type | Processing Speed | Memory Usage | Storage Efficiency |
|---------------|------------------|--------------|-------------------|
| Text PDFs | 2-5 pages/sec | 2-4GB | 1MB/100 pages |
| Image PDFs | 0.5-1 page/sec | 4-8GB | 2MB/100 pages |
| Technical Docs | 1-3 pages/sec | 3-6GB | 1.5MB/100 pages |
| Research Papers | 2-4 pages/sec | 2-4GB | 1.2MB/100 pages |
## Extension Points
### Custom Chunkers
```python
class CustomChunker(BaseChunker):
def chunk(self, text: str, document_id: str, metadata: Dict) -> List[Dict]:
# Implement custom chunking logic
pass
```
### Custom Embedders
```python
class CustomEmbedder(BaseEmbedder):
def create_embeddings(self, texts: List[str]) -> np.ndarray:
# Implement custom embedding generation
pass
```
### Custom Enrichers
```python
class CustomEnricher(BaseEnricher):
def enrich_chunk(self, chunk: Dict) -> Dict:
# Implement custom enrichment logic
pass
```
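As a concrete, hypothetical illustration of the first extension point, a chunker that packs a fixed number of sentences per chunk might look like the sketch below. `BaseChunker` is assumed to be the interface referenced above, and the regex-based sentence splitter is intentionally simplistic.

```python
import re
from typing import Dict, List

class SentenceWindowChunker(BaseChunker):
    """Hypothetical chunker: groups a fixed number of sentences per chunk."""

    def __init__(self, sentences_per_chunk: int = 5):
        self.sentences_per_chunk = sentences_per_chunk

    def chunk(self, text: str, document_id: str, metadata: Dict) -> List[Dict]:
        # Naive sentence split on ., ! or ? followed by whitespace
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        chunks = []
        for i in range(0, len(sentences), self.sentences_per_chunk):
            chunk_text = " ".join(sentences[i:i + self.sentences_per_chunk])
            chunks.append({
                "text": chunk_text,
                "document_id": document_id,
                "metadata": {**metadata, "chunk_index": len(chunks)},
            })
        return chunks
```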

View File

@ -0,0 +1,542 @@
# 📦 RAG System Installation Guide
_Last updated: 2025-01-07_
This guide provides step-by-step instructions for installing and setting up the RAG system using either Docker or direct development approaches.
---
## 🎯 Installation Options
### Option 1: Docker Deployment (Production Ready) 🐳
- **Best for**: Production environments, isolated setups, easy management
- **Requirements**: Docker Desktop + Local Ollama
- **Setup time**: ~10 minutes
### Option 2: Direct Development (Developer Friendly) 💻
- **Best for**: Development, customization, debugging
- **Requirements**: Python + Node.js + Ollama
- **Setup time**: ~15 minutes
---
## 1. Prerequisites
### 1.1 System Requirements
#### **Minimum Requirements**
- **CPU**: 4 cores, 2.5GHz+
- **RAM**: 8GB (16GB recommended)
- **Storage**: 50GB free space
- **OS**: macOS 10.15+, Ubuntu 20.04+, Windows 10+
#### **Recommended Requirements**
- **CPU**: 8+ cores, 3.0GHz+
- **RAM**: 32GB+ (for large models)
- **Storage**: 200GB+ SSD
- **GPU**: NVIDIA GPU with 8GB+ VRAM (optional)
### 1.2 Common Dependencies
**Required for both approaches:**
- **Ollama**: AI model runtime (always required)
- **Git**: 2.30+ for cloning repository
**Docker-specific:**
- **Docker Desktop**: 24.0+ with Docker Compose
**Direct Development-specific:**
- **Python**: 3.8+
- **Node.js**: 16+ with npm
---
## 2. Ollama Installation (Required for Both)
### 2.1 Install Ollama
#### **macOS/Linux:**
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Verify installation
ollama --version
```
#### **Windows:**
```bash
# Download from: https://ollama.ai/download
# Run the installer and follow setup wizard
```
### 2.2 Configure Ollama
```bash
# Start Ollama server
ollama serve
# In another terminal, install required models
ollama pull qwen3:0.6b # Fast model (650MB)
ollama pull qwen3:8b # High-quality model (4.7GB)
# Verify models are installed
ollama list
# Test Ollama
ollama run qwen3:0.6b "Hello, how are you?"
```
**⚠️ Important**: Keep Ollama running (`ollama serve`) for the entire setup process.
---
## 3. 🐳 Docker Installation & Setup
### 3.1 Install Docker
#### **macOS:**
```bash
# Install Docker Desktop via Homebrew
brew install --cask docker
# Or download from: https://www.docker.com/products/docker-desktop/
# Start Docker Desktop from Applications
# Verify installation
docker --version
docker compose version
```
#### **Ubuntu/Debian:**
```bash
# Update system
sudo apt-get update
# Install Docker using convenience script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Add user to docker group
sudo usermod -aG docker $USER
newgrp docker
# Install Docker Compose V2
sudo apt-get install docker-compose-plugin
# Verify installation
docker --version
docker compose version
```
#### **Windows:**
1. Download Docker Desktop from https://www.docker.com/products/docker-desktop/
2. Run installer and enable WSL 2 integration
3. Restart computer and start Docker Desktop
4. Verify in PowerShell: `docker --version`
### 3.2 Clone and Setup RAG System
```bash
# Clone repository
git clone <your-repository-url>
cd rag_system_old
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Start Docker containers
./start-docker.sh
# Wait for containers to start (2-3 minutes)
sleep 120
# Verify deployment
./start-docker.sh status
```
### 3.3 Test Docker Deployment
```bash
# Test all endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
# Access the application
open http://localhost:3000
```
---
## 4. 💻 Direct Development Setup
### 4.1 Install Development Dependencies
#### **Python Setup:**
```bash
# Clone repository
git clone https://github.com/your-org/rag-system.git
cd rag-system
# Create virtual environment (recommended)
python -m venv venv
# Activate virtual environment
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Install Python dependencies
pip install -r requirements.txt
# Verify Python setup
python -c "import torch; print('✅ PyTorch OK')"
python -c "import transformers; print('✅ Transformers OK')"
python -c "import lancedb; print('✅ LanceDB OK')"
```
#### **Node.js Setup:**
```bash
# Install Node.js dependencies
npm install
# Verify Node.js setup
node --version # Should be 16+
npm --version
npm list --depth=0
```
### 4.2 Start Direct Development
```bash
# Ensure Ollama is running
curl http://localhost:11434/api/tags
# Start all components with one command
python run_system.py
# Or start components manually in separate terminals:
# Terminal 1: python -m rag_system.api_server
# Terminal 2: cd backend && python server.py
# Terminal 3: npm run dev
```
### 4.3 Test Direct Development
```bash
# Check system health
python system_health_check.py
# Test endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
# Access the application
open http://localhost:3000
```
---
## 5. Detailed Installation Steps
### 5.1 Repository Setup
```bash
# Clone repository
git clone https://github.com/your-org/rag-system.git
cd rag-system
# Check repository structure
ls -la
# Create required directories
mkdir -p lancedb index_store shared_uploads logs backend
touch backend/chat_data.db
# Set permissions
chmod -R 755 lancedb index_store shared_uploads
chmod 664 backend/chat_data.db
```
### 5.2 Configuration
#### **Environment Variables**
For Docker (automatic via `docker.env`):
```bash
OLLAMA_HOST=http://host.docker.internal:11434
NODE_ENV=production
RAG_API_URL=http://rag-api:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
```
For Direct Development (set automatically by `run_system.py`):
```bash
OLLAMA_HOST=http://localhost:11434
RAG_API_URL=http://localhost:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
```
#### **Model Configuration**
The system defaults to these models:
- **Embedding**: `Qwen/Qwen3-Embedding-0.6B` (1024 dimensions)
- **Generation**: `qwen3:0.6b` for fast responses, `qwen3:8b` for quality
- **Reranking**: Built-in cross-encoder
### 5.3 Database Initialization
```bash
# Initialize SQLite database
python -c "
from backend.database import ChatDatabase
db = ChatDatabase()
db.init_database()
print('✅ Database initialized')
"
# Verify database
sqlite3 backend/chat_data.db ".tables"
```
---
## 6. Verification & Testing
### 6.1 System Health Checks
#### **Comprehensive Health Check:**
```bash
# For Docker deployment
./start-docker.sh status
docker compose ps
# For Direct development
python system_health_check.py
# Universal health check
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
```
#### **RAG System Test:**
```bash
# Test RAG system initialization
python -c "
from rag_system.main import get_agent
agent = get_agent('default')
print('✅ RAG System initialized successfully')
"
# Test embedding generation
python -c "
from rag_system.main import get_agent
agent = get_agent('default')
embedder = agent.retrieval_pipeline._get_text_embedder()
test_emb = embedder.create_embeddings(['Hello world'])
print(f'✅ Embedding generated: {test_emb.shape}')
"
```
### 6.2 Functional Testing
#### **Document Upload Test:**
1. Access http://localhost:3000
2. Click "Create New Index"
3. Upload a PDF document
4. Configure settings and build index
5. Test chat functionality
#### **API Testing:**
```bash
# Test session creation
curl -X POST http://localhost:8000/sessions \
-H "Content-Type: application/json" \
-d '{"title": "Test Session"}'
# Test models endpoint
curl http://localhost:8001/models
# Test health endpoints
curl http://localhost:8000/health
curl http://localhost:8001/health
```
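If you prefer scripting these checks, a small Python sketch using the `requests` library hits the same endpoints; the paths match the curl examples above, but the response payload shapes are not assumed here beyond the HTTP status code.
```python
import requests

BASE_BACKEND = "http://localhost:8000"
BASE_RAG = "http://localhost:8001"

def check(name: str, method: str, url: str, **kwargs) -> None:
    """Call an endpoint and report its HTTP status."""
    resp = requests.request(method, url, timeout=30, **kwargs)
    print(f"{name}: HTTP {resp.status_code}")
    resp.raise_for_status()

# Session creation (same payload as the curl example above)
check("Create session", "POST", f"{BASE_BACKEND}/sessions",
      json={"title": "Test Session"})

# Models and health endpoints
check("RAG models", "GET", f"{BASE_RAG}/models")
check("Backend health", "GET", f"{BASE_BACKEND}/health")
check("RAG health", "GET", f"{BASE_RAG}/health")
```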
---
## 7. Troubleshooting Installation
### 7.1 Common Issues
#### **Ollama Issues:**
```bash
# Ollama not responding
curl http://localhost:11434/api/tags
# If fails, restart Ollama
pkill ollama
ollama serve
# Reinstall models if needed
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
#### **Docker Issues:**
```bash
# Docker daemon not running
docker version
# Restart Docker Desktop (macOS/Windows)
# Or restart docker service (Linux)
sudo systemctl restart docker
# Clear Docker cache if build fails
docker system prune -f
```
#### **Python Issues:**
```bash
# Check Python version
python --version # Should be 3.8+
# Check virtual environment
which python
pip list | grep torch
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
```
#### **Node.js Issues:**
```bash
# Check Node version
node --version # Should be 16+
# Clear and reinstall
rm -rf node_modules package-lock.json
npm install
```
### 7.2 Performance Issues
#### **Memory Problems:**
```bash
# Check system memory
free -h # Linux
vm_stat # macOS
# For Docker: Increase memory allocation
# Docker Desktop → Settings → Resources → Memory → 8GB+
# Use smaller models
ollama pull qwen3:0.6b # Instead of qwen3:8b
```
#### **Slow Performance:**
- Use SSD storage for databases (`lancedb/`, `shared_uploads/`)
- Increase CPU cores if possible
- Close unnecessary applications
- Use smaller batch sizes in configuration
---
## 8. Post-Installation Setup
### 8.1 Model Optimization
```bash
# Install additional models (optional)
ollama pull nomic-embed-text # Alternative embedding model
ollama pull llama3.1:8b # Alternative generation model
# Test model switching
curl -X POST http://localhost:8001/chat \
-H "Content-Type: application/json" \
-d '{"query": "Hello", "model": "qwen3:8b"}'
```
### 8.2 Security Configuration
```bash
# Set proper file permissions
chmod 600 backend/chat_data.db # Restrict database access
chmod 700 lancedb/ # Restrict vector DB access
# Configure firewall (production)
sudo ufw allow 3000/tcp # Frontend
sudo ufw deny 8000/tcp # Backend (internal only)
sudo ufw deny 8001/tcp # RAG API (internal only)
```
### 8.3 Backup Setup
```bash
# Create backup script
cat > backup_system.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
# Backup databases and indexes
cp -r backend/chat_data.db "$BACKUP_DIR/"
cp -r lancedb "$BACKUP_DIR/"
cp -r index_store "$BACKUP_DIR/"
cp -r shared_uploads "$BACKUP_DIR/"
echo "Backup completed: $BACKUP_DIR"
EOF
chmod +x backup_system.sh
```
---
## 9. Success Criteria
### 9.1 Installation Complete When:
- ✅ All health checks pass without errors
- ✅ Frontend loads at http://localhost:3000
- ✅ All models are installed and responding
- ✅ You can create document indexes
- ✅ You can chat with uploaded documents
- ✅ No error messages in logs/terminal
### 9.2 Performance Benchmarks
**Acceptable Performance:**
- System startup: < 5 minutes
- Index creation: < 2 minutes per 100MB document
- Query response: < 30 seconds
- Memory usage: < 8GB total
**Optimal Performance:**
- System startup: < 2 minutes
- Index creation: < 1 minute per 100MB document
- Query response: < 10 seconds
- Memory usage: < 4GB total
---
## 10. Next Steps
### 10.1 Getting Started
1. **Upload Documents**: Create your first index with PDF documents
2. **Explore Features**: Try different query types and models
3. **Customize**: Adjust model settings and chunk sizes
4. **Scale**: Add more documents and create multiple indexes
### 10.2 Additional Resources
- **Quick Start**: See `Documentation/quick_start.md`
- **Docker Usage**: See `Documentation/docker_usage.md`
- **System Architecture**: See `Documentation/architecture_overview.md`
- **API Reference**: See `Documentation/api_reference.md`
---
**Congratulations! 🎉** Your RAG system is now ready to use. Visit http://localhost:3000 to start chatting with your documents.

View File

@ -0,0 +1,70 @@
# 📜 Prompt Inventory (Ground-Truth)
_All generation / verification prompts currently hard-coded in the codebase._
_Last updated: 2025-07-06_
> Edit process: if you change a prompt in code, please **update this file** or, once we migrate to the central registry, delete the entry here.
---
## 1. Indexing / Context Enrichment
| ID | File & Lines | Variable / Builder | Purpose |
|----|--------------|--------------------|---------|
| `overview_builder.default` | `rag_system/indexing/overview_builder.py` `12-21` | `DEFAULT_PROMPT` | Generate 1-paragraph document overview for search-time routing.
| `contextualizer.system` | `rag_system/indexing/contextualizer.py` `11` | `SYSTEM_PROMPT` | System instruction: explain summarisation role.
| `contextualizer.local_context` | same file `13-15` | `LOCAL_CONTEXT_PROMPT_TEMPLATE` | Human message wraps neighbouring chunks.
| `contextualizer.chunk` | same file `17-19` | `CHUNK_PROMPT_TEMPLATE` | Human message shows the target chunk.
| `graph_extractor.entities` | `rag_system/indexing/graph_extractor.py` `20-31` | `entity_prompt` | Ask LLM to list entities.
| `graph_extractor.relationships` | same file `53-64` | `relationship_prompt` | Ask LLM to list relationships.
## 2. Retrieval / Query Transformation
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `query_transformer.expand` | `rag_system/retrieval/query_transformer.py` `10-26` | Produce query rewrites (keywords, boolean). |
| `hyde.hypothetical_doc` | same `115-122` | HyDE hypothetical document generator. |
| `graph_query.translate` | same `124-140` | Translate user question to JSON KG query. |
## 3. Pipeline Answer Synthesis
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `retrieval_pipeline.synth_final` | `rag_system/pipelines/retrieval_pipeline.py` `217-256` | Turn verified facts into answer (with directives 1-6). |
## 4. Agent Classical Loop
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `agent.loop.initial_thought` | `rag_system/agent/loop.py` `157-180` | First LLM call to think about query. |
| `agent.loop.verify_path` | same `190-205` | Secondary thought loop. |
| `agent.loop.compose_sub` | same `506-542` | Compose answer from sub-answers. |
| `agent.loop.router` | same `648-660` | Decide which subsystem handles query. |
## 5. Verifier
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `verifier.fact_check` | `rag_system/agent/verifier.py` `18-58` | Strict JSON-format grounding verifier. |
## 6. Backend Router (Fast path)
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `backend.router` | `backend/server.py` `435-448` | Decide "RAG vs direct LLM" before heavy processing. |
## 7. Miscellaneous
| ID | File & Lines | Purpose |
|----|--------------|---------|
| `vision.placeholder` | `rag_system/utils/ollama_client.py` `169` | Dummy prompt for VLM colour check. |
---
### Missing / To-Do
1. Verify whether **ReActAgent.PROMPT_TEMPLATE** captures every placeholder; some earlier prompts may need an explicit ID when we move to the central registry.
2. Search TS/JS code once the backend prompts are ported (currently none).
---
**Next step:** create `rag_system/prompts/registry.yaml` and start moving each prompt above into a key-value entry with identical IDs. Update callers gradually using the helper proposed earlier.
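A minimal sketch of what such a registry and helper could look like; the file layout, the `get_prompt` name, and the flat ID-to-template mapping are proposals, not existing code.
```python
from pathlib import Path
import yaml  # PyYAML

_REGISTRY_PATH = Path("rag_system/prompts/registry.yaml")
_registry_cache = None

def get_prompt(prompt_id: str, **variables) -> str:
    """Look up a prompt template by ID (e.g. 'verifier.fact_check') and fill its placeholders."""
    global _registry_cache
    if _registry_cache is None:
        _registry_cache = yaml.safe_load(_REGISTRY_PATH.read_text())
    template = _registry_cache[prompt_id]
    # Note: templates containing literal braces would need escaping or a different fill method.
    return template.format(**variables)

# Example usage (assuming the entry exists in registry.yaml):
# prompt = get_prompt("contextualizer.chunk", chunk_text="...")
```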

View File

@ -0,0 +1,379 @@
# ⚡ Quick Start Guide - RAG System
_Get up and running in 5 minutes!_
---
## 🚀 Choose Your Deployment Method
### Option 1: Docker Deployment (Production Ready) 🐳
Best for: Production deployments, isolated environments, easy scaling
### Option 2: Direct Development (Developer Friendly) 💻
Best for: Development, customization, debugging, faster iteration
---
## 🐳 Docker Deployment
### Prerequisites
- Docker Desktop installed and running
- 8GB+ RAM available
- Internet connection
### Step 1: Clone and Setup
```bash
# Clone repository
git clone <your-repository-url>
cd rag-system
# Ensure Docker is running
docker version
```
### Step 2: Install Ollama Locally
**Even with Docker, Ollama runs locally for better performance:**
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama (in one terminal)
ollama serve
# Install models (in another terminal)
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
### Step 3: Start Docker Containers
```bash
# Start all containers
./start-docker.sh
# Or manually:
docker compose --env-file docker.env up --build -d
```
### Step 4: Verify Deployment
```bash
# Check container status
docker compose ps
# Test endpoints
curl http://localhost:3000 # Frontend
curl http://localhost:8000/health # Backend
curl http://localhost:8001/models # RAG API
```
### Step 5: Access Application
Open your browser to: **http://localhost:3000**
---
## 💻 Direct Development
### Prerequisites
- Python 3.8+
- Node.js 16+ and npm
- 8GB+ RAM available
### Step 1: Clone and Install Dependencies
```bash
# Clone repository
git clone <your-repository-url>
cd rag-system
# Install Python dependencies
pip install -r requirements.txt
# Install Node.js dependencies
npm install
```
### Step 2: Install and Configure Ollama
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama (in one terminal)
ollama serve
# Install models (in another terminal)
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
### Step 3: Start the System
```bash
# Start all components with one command
python run_system.py
```
**Or start components manually in separate terminals:**
```bash
# Terminal 1: RAG API
python -m rag_system.api_server
# Terminal 2: Backend
cd backend && python server.py
# Terminal 3: Frontend
npm run dev
```
### Step 4: Verify Installation
```bash
# Check system health
python system_health_check.py
# Test endpoints
curl http://localhost:3000 # Frontend
curl http://localhost:8000/health # Backend
curl http://localhost:8001/models # RAG API
```
### Step 5: Access Application
Open your browser to: **http://localhost:3000**
---
## 🎯 First Use Guide
### 1. Create a Chat Session
- Click "New Chat" in the interface
- Give your session a descriptive name
### 2. Upload Documents
- Click "Create New Index" button
- Upload PDF files from your computer
- Configure processing options:
- **Chunk Size**: 512 (recommended)
- **Embedding Model**: Qwen/Qwen3-Embedding-0.6B
- **Enable Enrichment**: Yes
- Click "Build Index" and wait for processing
### 3. Start Chatting
- Select your built index
- Ask questions about your documents:
- "What is this document about?"
- "Summarize the key points"
- "What are the main findings?"
- "Compare the arguments in section 3 and 5"
---
## 🔧 Management Commands
### Docker Commands
```bash
# Container management
./start-docker.sh # Start all containers
./start-docker.sh stop # Stop all containers
./start-docker.sh logs # View logs
./start-docker.sh status # Check status
# Manual Docker Compose
docker compose ps # Check status
docker compose logs -f # Follow logs
docker compose down # Stop containers
docker compose up --build -d # Rebuild and start
```
### Direct Development Commands
```bash
# System management
python run_system.py # Start all services
python system_health_check.py # Check system health
# Individual components
python -m rag_system.api_server # RAG API only
cd backend && python server.py # Backend only
npm run dev # Frontend only
# Stop: Press Ctrl+C in terminal running services
```
---
## 🆘 Quick Troubleshooting
### Docker Issues
**Containers not starting?**
```bash
# Check Docker daemon
docker version
# Restart Docker Desktop and try again
./start-docker.sh
```
**Port conflicts?**
```bash
# Check what's using ports
lsof -i :3000 -i :8000 -i :8001
# Stop conflicting processes
./start-docker.sh stop
```
### Direct Development Issues
**Import errors?**
```bash
# Check Python installation
python --version # Should be 3.8+
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
```
**Node.js errors?**
```bash
# Check Node version
node --version # Should be 16+
# Reinstall dependencies
rm -rf node_modules package-lock.json
npm install
```
### Common Issues
**Ollama not responding?**
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Restart Ollama
pkill ollama
ollama serve
```
**Out of memory?**
```bash
# Check memory usage
docker stats # For Docker
htop # For direct development
# Recommended: 16GB+ RAM for optimal performance
```
---
## 📊 System Verification
Run this comprehensive check:
```bash
# Check all endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
# For Docker: Check containers
docker compose ps
```
---
## 🎉 Success!
If you see:
- ✅ All services responding
- ✅ Frontend accessible at http://localhost:3000
- ✅ No error messages
You're ready to start using LocalGPT!
### What's Next?
1. **📚 Upload Documents**: Add your PDF files to create indexes
2. **💬 Start Chatting**: Ask questions about your documents
3. **🔧 Customize**: Explore different models and settings
4. **📖 Learn More**: Check the full documentation below
### 📁 Key Files
```
rag-system/
├── 🐳 start-docker.sh # Docker deployment script
├── 🏃 run_system.py # Direct development launcher
├── 🩺 system_health_check.py # System verification
├── 📋 requirements.txt # Python dependencies
├── 📦 package.json # Node.js dependencies
├── 📁 Documentation/ # Complete documentation
└── 📁 rag_system/ # Core system code
```
### 📖 Additional Resources
- **🏗️ Architecture**: See `Documentation/architecture_overview.md`
- **🔧 Configuration**: See `Documentation/system_overview.md`
- **🚀 Deployment**: See `Documentation/deployment_guide.md`
- **🐛 Troubleshooting**: See `DOCKER_TROUBLESHOOTING.md`
---
**Happy RAG-ing! 🚀**
---
## 🛠️ Indexing Scripts
The repository includes several convenient scripts for document indexing:
### Simple Index Creation Script
For quick document indexing without the UI:
```bash
# Basic usage
./simple_create_index.sh "Index Name" "document.pdf"
# Multiple documents
./simple_create_index.sh "Research Papers" "paper1.pdf" "paper2.pdf" "notes.txt"
# Using wildcards
./simple_create_index.sh "Invoice Collection" ./invoices/*.pdf
```
**Supported file types**: PDF, TXT, DOCX, MD
### Batch Indexing Script
For processing large document collections:
```bash
# Using the Python batch indexing script
python demo_batch_indexing.py
# Or using the direct indexing script
python create_index_script.py
```
These scripts automatically:
- ✅ Check prerequisites (Ollama running, Python dependencies)
- ✅ Validate document formats
- ✅ Create database entries
- ✅ Process documents with the RAG pipeline
- ✅ Generate searchable indexes
---

View File

@ -0,0 +1,616 @@
# 📥 Retrieval Pipeline
_Maps to `rag_system/pipelines/retrieval_pipeline.py` and helpers in `retrieval/`, `rerankers/`._
## Role
Given a **user query** and one or more indexed tables, retrieve the most relevant text chunks and synthesise an answer.
## Sub-components
| Stage | Module | Key Classes / Fns | Notes |
|-------|--------|-------------------|-------|
| Query Pre-processing | `retrieval/query_transformer.py` | `QueryTransformer`, `HyDEGenerator`, `GraphQueryTranslator` | Expands, rewrites, or translates the raw query. |
| Retrieval | `retrieval/retrievers.py` | `BM25Retriever`, `DenseRetriever`, `HybridRetriever` | Abstract over LanceDB vector + FTS search. |
| Reranking | `rerankers/reranker.py` | `ColBERTSmall`, fallback `bge-reranker` | Optionally improves result ordering. |
| Synthesis | `pipelines/retrieval_pipeline.py` | `_synthesize_final_answer()` | Calls LLM with evidence snippets. |
## End-to-End Flow
```mermaid
flowchart LR
Q["User Query"] --> XT["Query Transformer"]
XT -->|variants| RET_BM25
XT -->|variants| RET_DENSE
subgraph Retrieval
RET_BM25[BM25] --> MERGE
RET_DENSE[Dense Vector] --> MERGE
style RET_BM25 fill:#444,stroke:#ccc,color:#fff
style RET_DENSE fill:#444,stroke:#ccc,color:#fff
end
MERGE --> RERANK
RERANK --> K[["Top-K Chunks"]]
K --> SYNTH["Answer Synthesiser\n(LLM)"]
SYNTH --> A["Answer + Sources"]
```
### Narrative
1. **Query Transformer** may expand the query (keyword list, HyDE doc, KG translation) depending on `searchType`.
2. **Retrievers** execute BM25 and/or dense similarity against LanceDB. Combination controlled by `retrievalMode` and `denseWeight`.
3. **Reranker** (if `aiRerank=true` or hybrid search) scores snippets; top `rerankerTopK` chosen.
4. **Synthesiser** streams an LLM completion using the prompt described in `prompt_inventory.md` (`retrieval_pipeline.synth_final`).
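Putting the narrative together, here is a schematic sketch of how the four stages compose; the function names and signatures are illustrative stand-ins, not the actual `RetrievalPipeline` API.
```python
from typing import Callable, Dict, List

def answer_query(
    query: str,
    transform: Callable[[str], List[str]],        # query expansion / HyDE / KG translation
    retrieve: Callable[[str, int], List[Dict]],   # BM25 and/or dense search
    rerank: Callable[[str, List[Dict]], List[Dict]],
    synthesize: Callable[[str, List[Dict]], str], # LLM answer generation
    retrieval_k: int = 10,
    reranker_top_k: int = 20,
) -> str:
    """Illustrative composition of the retrieval pipeline stages."""
    candidates: List[Dict] = []
    for variant in transform(query):
        candidates.extend(retrieve(variant, retrieval_k))
    top_chunks = rerank(query, candidates)[:reranker_top_k]
    return synthesize(query, top_chunks)
```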
## Configuration Flags (passed from UI → backend)
| Flag | Default | Effect |
|------|---------|--------|
| `searchType` | `fts` | UI label (FTS / Dense / Hybrid). |
| `retrievalK` | 10 | Initial candidate count per retriever. |
| `contextWindowSize` | 5 | How many adjacent chunks to merge (late-chunk). |
| `rerankerTopK` | 20 | How many docs to pass into AI reranker. |
| `denseWeight` | 0.5 | When `hybrid`, linear mix weight. |
| `aiRerank` | bool | Toggle reranker. |
| `verify` | bool | If true, pass answer to **Verifier** component. |
## Interfaces
* Reads from **LanceDB** tables `text_pages_<index>`.
* Calls **Ollama** generation model specified in `PIPELINE_CONFIGS`.
* Exposes `RetrievalPipeline.answer_stream()` iterator consumed by SSE API.
## Extension Points
* Plug new retriever by inheriting `BaseRetriever` and registering in `retrievers.py`.
* Swap reranker model via `EXTERNAL_MODELS['reranker_model']`.
* Custom answer prompt can be overridden by passing `prompt_override` to `_synthesize_final_answer()` (not yet surfaced in UI).
## Detailed Implementation Analysis
### Core Architecture Pattern
The `RetrievalPipeline` uses **lazy initialization** for all components to avoid heavy memory usage during startup. Each component (embedder, retrievers, rerankers) is only loaded when first accessed via private `_get_*()` methods.
```python
def _get_text_embedder(self):
if self.text_embedder is None:
self.text_embedder = select_embedder(
self.config.get("embedding_model_name", "Qwen/Qwen3-Embedding-0.6B"),
self.ollama_config.get("host")
)
return self.text_embedder
```
### Thread Safety Implementation
**Critical Issue**: ColBERT reranker and model loading are not thread-safe. The system uses multiple locks:
```python
# Global locks to prevent race conditions
_rerank_lock: Lock = Lock() # Protects .rank() calls
_ai_reranker_init_lock: Lock = Lock() # Prevents concurrent model loading
_sentence_pruner_lock: Lock = Lock() # Serializes Provence model init
```
When multiple queries run in parallel, only one thread can initialize heavy models or perform reranking operations.
### Retrieval Strategy Deep-Dive
#### 1. Multi-Vector Dense Retrieval (`_get_dense_retriever()`)
```python
self.dense_retriever = MultiVectorRetriever(
db_manager, # LanceDB connection
text_embedder, # Qwen3-Embedding embedder
vision_model=None, # Optional multimodal
fusion_config={} # Score combination rules
)
```
**Process**:
1. Query → embedding vector (1024D for Qwen3-Embedding-0.6B)
2. LanceDB ANN search using IVF-PQ index
3. Cosine similarity scoring
4. Returns top-K with metadata
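In isolation, the LanceDB side of this step looks roughly like the following; the table name and the embedder call are placeholders, and the real retriever wraps this logic inside `MultiVectorRetriever`.
```python
import lancedb

def dense_search(query: str, embedder, table_name: str, k: int = 10):
    """Embed the query and run an ANN search against a LanceDB table."""
    query_vector = embedder.create_embeddings([query])[0]  # 1024-D for Qwen3-Embedding-0.6B
    db = lancedb.connect("./lancedb")
    table = db.open_table(table_name)
    # Cosine-distance ANN search over the index, returning chunk text plus metadata
    return (
        table.search(query_vector)
        .metric("cosine")
        .limit(k)
        .to_list()
    )
```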
#### 2. BM25 Full-Text Search (`_get_bm25_retriever()`)
```sql
-- Uses SQLite FTS5 under the hood
SELECT chunk_id, text, bm25(fts_table) AS score
FROM fts_table
WHERE fts_table MATCH ?
ORDER BY bm25(fts_table)
LIMIT ?
```
**Token Processing**:
- Stemming via Porter algorithm
- Stop-word removal
- N-gram tokenization (configurable)
#### 3. Hybrid Score Fusion
When both retrievers are enabled:
```python
final_score = (1 - dense_weight) * bm25_score + dense_weight * dense_score
```
Default `dense_weight = 0.7` favors semantic over lexical matching (updated from 0.5).
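Because BM25 and cosine scores live on different scales, a practical fusion step usually normalises each score list before mixing. A small sketch follows; the min-max normalisation is an assumption for illustration, not necessarily what the pipeline does internally.
```python
from typing import Dict, List

def fuse_scores(
    bm25_results: List[Dict],
    dense_results: List[Dict],
    dense_weight: float = 0.7,
) -> List[Dict]:
    """Min-max normalise each score list, then linearly combine by chunk_id."""

    def normalise(results: List[Dict]) -> Dict[str, float]:
        scores = [r["score"] for r in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        return {r["chunk_id"]: (r["score"] - lo) / span for r in results}

    bm25 = normalise(bm25_results) if bm25_results else {}
    dense = normalise(dense_results) if dense_results else {}
    fused = {
        cid: (1 - dense_weight) * bm25.get(cid, 0.0) + dense_weight * dense.get(cid, 0.0)
        for cid in set(bm25) | set(dense)
    }
    return sorted(
        ({"chunk_id": cid, "score": s} for cid, s in fused.items()),
        key=lambda r: r["score"],
        reverse=True,
    )
```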
### Late-Chunk Merging Algorithm
**Problem**: Small chunks lose context; large chunks dilute relevance.
**Solution**: Retrieve small chunks, then expand with neighbors.
```python
def _get_surrounding_chunks_lancedb(self, chunk, window_size):
start_index = max(0, chunk_index - window_size)
end_index = chunk_index + window_size
sql_filter = f"document_id = '{document_id}' AND chunk_index >= {start_index} AND chunk_index <= {end_index}"
results = tbl.search().where(sql_filter).to_list()
# Sort by chunk_index to maintain document order
return sorted(results, key=lambda x: x.get("chunk_index", 0))
```
**Benefits**:
- Maintains granular search precision
- Provides richer context for answer generation
- Configurable window size (default: 5 chunks = ~2500 tokens)
### AI Reranker Implementation
#### ColBERT Strategy (via rerankers-lib)
```python
from rerankers import Reranker
self.ai_reranker = Reranker("answerdotai/answerai-colbert-small-v1", model_type="colbert")
# Usage
scores = reranker.rank(query, [doc.text for doc in candidates])
```
**ColBERT Architecture**:
- **Query encoding**: Each token → 128D vector
- **Document encoding**: Each token → 128D vector
- **Interaction**: MaxSim between all query-doc token pairs
- **Advantage**: Fine-grained token-level matching
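For intuition, the MaxSim interaction reduces to a few lines of numpy. This is a toy example with random token embeddings, not the actual ColBERT encoder or weights.
```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style late interaction: sum over query tokens of the best-matching doc token."""
    # L2-normalise so dot products are cosine similarities
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sim = q @ d.T                        # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # MaxSim per query token, then sum

rng = np.random.default_rng(0)
query = rng.normal(size=(5, 128))   # 5 query tokens, 128-D (ColBERT-small dimension)
doc = rng.normal(size=(40, 128))    # 40 document tokens
print(maxsim_score(query, doc))
```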
#### Fallback: BGE Cross-Encoder
```python
# When ColBERT fails/unavailable
from sentence_transformers import CrossEncoder
model = CrossEncoder('BAAI/bge-reranker-base')
scores = model.predict([(query, doc.text) for doc in candidates])
```
### Answer Synthesis Pipeline
#### Prompt Engineering Pattern
```python
def _synthesize_final_answer(self, query: str, facts: str, *, event_callback=None):
prompt = f"""
You are an AI assistant specialised in answering questions from retrieved context.
Context you receive
• VERIFIED FACTS - text snippets retrieved from the user's documents.
• ORIGINAL QUESTION - the user's actual query.
Instructions
1. Evaluate each snippet for relevance to the ORIGINAL QUESTION
2. Synthesise an answer **using only information from relevant snippets**
3. If snippets contradict, mention the contradiction explicitly
4. If insufficient information: "I could not find that information in the provided documents."
5. Provide thorough, well-structured answer with relevant numbers/names
6. Do **not** introduce external knowledge
Retrieved Snippets
{facts}
ORIGINAL QUESTION: "{query}"
"""
response = self.llm_client.complete_stream(
prompt=prompt,
model=self.ollama_config["generation_model"] # qwen3:8b
)
for chunk in response:
if event_callback:
event_callback({"type": "answer_chunk", "content": chunk})
yield chunk
```
**Advanced Features**:
- **Source Attribution**: Automatic citation generation
- **Confidence Scoring**: Based on retrieval scores and snippet relevance
- **Answer Verification**: Optional grounding check via Verifier component
### Query Processing and Transformation
#### Query Decomposition
```python
class QueryDecomposer:
def decompose_query(self, query: str) -> List[str]:
"""Break complex queries into simpler sub-queries."""
decomposition_prompt = f"""
Break down this complex question into 2-4 simpler sub-questions that would help answer the original question.
Original question: {query}
Sub-questions:
1.
2.
3.
4.
"""
response = self.llm_client.complete(
prompt=decomposition_prompt,
model=self.enrichment_model # qwen3:0.6b for speed
)
# Parse response into list of sub-queries
return self._parse_subqueries(response)
```
#### HyDE (Hypothetical Document Embeddings)
```python
class HyDEGenerator:
def generate_hypothetical_doc(self, query: str) -> str:
"""Generate hypothetical document that would answer the query."""
hyde_prompt = f"""
Generate a hypothetical document passage that would perfectly answer this question:
Question: {query}
Hypothetical passage:
"""
response = self.llm_client.complete(
prompt=hyde_prompt,
model=self.enrichment_model
)
return response.strip()
```
### Caching and Performance Optimization
#### Semantic Query Caching
```python
from cachetools import TTLCache, LRUCache                # assumed cache helpers for this snippet
from sklearn.metrics.pairwise import cosine_similarity   # assumed similarity helper

class RetrievalPipeline:
def __init__(self, config, ollama_client, ollama_config):
# TTL cache for embeddings and results
self.query_cache = TTLCache(maxsize=100, ttl=300) # 5 min TTL
self.embedding_cache = LRUCache(maxsize=500)
self.semantic_threshold = 0.98 # Similarity threshold for cache hits
def get_cached_result(self, query: str, session_id: str = None) -> Optional[Dict]:
"""Check for semantically similar cached queries."""
query_embedding = self._get_text_embedder().create_embeddings([query])[0]
for cached_query, cached_data in self.query_cache.items():
cached_embedding = cached_data["embedding"]
similarity = cosine_similarity([query_embedding], [cached_embedding])[0][0]
if similarity > self.semantic_threshold:
# Check session scope if configured
if self.cache_scope == "session" and cached_data.get("session_id") != session_id:
continue
print(f"🎯 Cache hit: {similarity:.3f} similarity")
return cached_data["result"]
return None
```
#### Batch Processing Optimizations
```python
def process_query_batch(self, queries: List[str]) -> List[Dict]:
"""Process multiple queries efficiently."""
# Batch embed all queries
query_embeddings = self._get_text_embedder().create_embeddings(queries)
# Batch search
results = []
for i, query in enumerate(queries):
embedding = query_embeddings[i]
# Search with pre-computed embedding
dense_results = self._search_dense_with_embedding(embedding)
bm25_results = self._search_bm25(query)
# Combine and rerank
combined = self._combine_results(dense_results, bm25_results)
reranked = self._rerank_batch([query], [combined])[0]
results.append(reranked)
return results
```
### Advanced Search Features
#### Conversational Context Integration
```python
def answer_with_history(self, query: str, conversation_history: List[Dict], **kwargs):
"""Answer query with conversation context."""
# Build conversational context
context_prompt = self._build_conversation_context(conversation_history)
# Expand query with context
expanded_query = f"{context_prompt}\n\nCurrent question: {query}"
# Process with expanded context
return self.answer_stream(expanded_query, **kwargs)
def _build_conversation_context(self, history: List[Dict]) -> str:
"""Build context from conversation history."""
context_parts = []
for turn in history[-3:]: # Last 3 turns for context
if turn.get("role") == "user":
context_parts.append(f"Previous question: {turn['content']}")
elif turn.get("role") == "assistant":
# Extract key points from previous answers
context_parts.append(f"Previous context: {turn['content'][:200]}...")
return "\n".join(context_parts)
```
#### Multi-Index Search
```python
def search_multiple_indexes(self, query: str, index_ids: List[str], **kwargs):
"""Search across multiple document indexes."""
all_results = []
for index_id in index_ids:
table_name = f"text_pages_{index_id}"
try:
# Search individual index
index_results = self._search_single_index(query, table_name, **kwargs)
# Add index metadata
for result in index_results:
result["source_index"] = index_id
all_results.extend(index_results)
except Exception as e:
print(f"⚠️ Error searching index {index_id}: {e}")
continue
# Global reranking across all indexes
if len(all_results) > kwargs.get("retrieval_k", 20):
all_results = self._rerank_global(query, all_results, **kwargs)
return all_results
```
### Error Handling and Resilience
#### Graceful Degradation
```python
def answer_stream(self, query: str, **kwargs):
"""Main answer method with comprehensive error handling."""
try:
# Try full pipeline
return self._answer_stream_full_pipeline(query, **kwargs)
except Exception as e:
print(f"⚠️ Full pipeline failed: {e}")
try:
# Fallback: Dense-only search
kwargs["search_type"] = "dense"
kwargs["ai_rerank"] = False
return self._answer_stream_fallback(query, **kwargs)
except Exception as e2:
print(f"⚠️ Fallback failed: {e2}")
# Last resort: Direct LLM answer
return self._direct_llm_answer(query)
def _direct_llm_answer(self, query: str):
"""Direct LLM answer as last resort."""
prompt = f"""
The document retrieval system is temporarily unavailable.
Please provide a helpful response acknowledging this limitation.
User question: {query}
Response:
"""
response = self.llm_client.complete_stream(
prompt=prompt,
model=self.ollama_config["generation_model"]
)
yield "⚠️ Document search unavailable. Providing general response:\n\n"
for chunk in response:
yield chunk
```
#### Recovery Mechanisms
```python
def recover_from_embedding_failure(self, query: str, **kwargs):
"""Recover when embedding model fails."""
print("🔄 Attempting embedding model recovery...")
# Try to reinitialize embedder
try:
self.text_embedder = None # Clear failed instance
embedder = self._get_text_embedder() # Reinitialize
# Test with simple query
test_embedding = embedder.create_embeddings(["test"])
if test_embedding is not None:
print("✅ Embedding model recovered")
return True
except Exception as e:
print(f"❌ Recovery failed: {e}")
# Fallback to BM25-only search
kwargs["search_type"] = "bm25"
kwargs["ai_rerank"] = False
print("🔄 Falling back to keyword search only")
return False
```
### Performance Monitoring and Metrics
#### Query Performance Tracking
```python
import time
from contextlib import contextmanager

class PerformanceTracker:
def __init__(self):
self.metrics = {
"query_count": 0,
"avg_response_time": 0,
"cache_hit_rate": 0,
"error_rate": 0,
"embedding_time": 0,
"retrieval_time": 0,
"reranking_time": 0,
"synthesis_time": 0
}
    @contextmanager
    def track_query(self, query: str):
        """Context manager for tracking query performance."""
        start_time = time.time()
        try:
            yield
            # Success metrics: update the running average of response time
            duration = time.time() - start_time
            n = self.metrics["query_count"]
            self.metrics["avg_response_time"] = (
                (self.metrics["avg_response_time"] * n + duration) / (n + 1)
            )
        except Exception:
            # Error metrics: update the running error rate
            n = self.metrics["query_count"]
            self.metrics["error_rate"] = (self.metrics["error_rate"] * n + 1) / (n + 1)
            raise
        finally:
            # Count each query exactly once, whether it succeeded or failed
            self.metrics["query_count"] += 1
```
#### Resource Usage Monitoring
```python
def monitor_memory_usage(self):
"""Monitor memory usage of pipeline components."""
import psutil
import gc
process = psutil.Process()
memory_info = process.memory_info()
print(f"Memory Usage: {memory_info.rss / 1024 / 1024:.1f} MB")
# Component-specific monitoring
if hasattr(self, 'text_embedder') and self.text_embedder:
print(f"Embedder loaded: {type(self.text_embedder).__name__}")
if hasattr(self, 'ai_reranker') and self.ai_reranker:
print(f"Reranker loaded: {type(self.ai_reranker).__name__}")
# Suggest cleanup if memory usage is high
if memory_info.rss > 8 * 1024 * 1024 * 1024: # 8GB
print("⚠️ High memory usage detected - consider cleanup")
gc.collect()
```
---
## Configuration Reference
### Default Pipeline Configuration
```python
RETRIEVAL_CONFIG = {
"retriever": "multivector",
"search_type": "hybrid",
"retrieval_k": 20,
"reranker_top_k": 10,
"dense_weight": 0.7,
"late_chunking": {
"enabled": True,
"window_size": 5
},
"ai_rerank": True,
"verify_answers": False,
"cache_enabled": True,
"cache_ttl": 300,
"semantic_cache_threshold": 0.98
}
```
### Model Configuration
```python
MODEL_CONFIG = {
"embedding_model": "Qwen/Qwen3-Embedding-0.6B",
"generation_model": "qwen3:8b",
"enrichment_model": "qwen3:0.6b",
"reranker_model": "answerdotai/answerai-colbert-small-v1",
"fallback_reranker": "BAAI/bge-reranker-base"
}
```
### Performance Tuning
```python
PERFORMANCE_CONFIG = {
"batch_sizes": {
"embedding": 32,
"reranking": 16,
"synthesis": 1
},
"timeouts": {
"embedding": 30,
"retrieval": 60,
"reranking": 30,
"synthesis": 120
},
"memory_limits": {
"max_cache_size": 1000,
"max_results_per_query": 100,
"chunk_size_limit": 2048
}
}
```
## Extension Examples
### Custom Retriever Implementation
```python
class CustomRetriever(BaseRetriever):
def search(self, query: str, k: int = 10) -> List[Dict]:
"""Implement custom search logic."""
# Your custom retrieval implementation
pass
def get_embeddings(self, texts: List[str]) -> np.ndarray:
"""Generate embeddings for custom retrieval."""
# Your custom embedding logic
pass
```
### Custom Reranker Implementation
```python
class CustomReranker(BaseReranker):
def rank(self, query: str, documents: List[Dict]) -> List[Dict]:
"""Implement custom reranking logic."""
# Your custom reranking implementation
pass
```
### Custom Query Transformer
```python
class CustomQueryTransformer:
def transform(self, query: str, context: Dict = None) -> str:
"""Transform query based on context."""
# Your custom query transformation logic
pass
```

View File

@ -0,0 +1,429 @@
# 🏗️ RAG System - Complete System Overview
_Last updated: 2025-01-09_
This document provides a comprehensive overview of the Advanced Retrieval-Augmented Generation (RAG) System, covering its architecture, components, data flow, and operational characteristics.
---
## 1. System Architecture
### 1.1 High-Level Architecture
The RAG system implements a sophisticated 4-tier microservices architecture:
```mermaid
graph TB
subgraph "Client Layer"
Browser[👤 User Browser]
UI[Next.js Frontend<br/>React/TypeScript]
Browser --> UI
end
subgraph "API Gateway Layer"
Backend[Backend Server<br/>Python HTTP Server<br/>Port 8000]
UI -->|REST API| Backend
end
subgraph "Processing Layer"
RAG[RAG API Server<br/>Document Processing<br/>Port 8001]
Backend -->|Internal API| RAG
end
subgraph "LLM Service Layer"
Ollama[Ollama Server<br/>LLM Inference<br/>Port 11434]
RAG -->|Model Calls| Ollama
end
subgraph "Storage Layer"
SQLite[(SQLite Database<br/>Sessions & Metadata)]
LanceDB[(LanceDB<br/>Vector Embeddings)]
FileSystem[File System<br/>Documents & Indexes]
Backend --> SQLite
RAG --> LanceDB
RAG --> FileSystem
end
```
### 1.2 Component Breakdown
| Component | Technology | Port | Purpose |
|-----------|------------|------|---------|
| **Frontend** | Next.js 15, React 19, TypeScript | 3000 | User interface, chat interactions |
| **Backend** | Python 3.11, HTTP Server | 8000 | API gateway, session management, routing |
| **RAG API** | Python 3.11, Advanced NLP | 8001 | Document processing, retrieval, generation |
| **Ollama** | Go-based LLM server | 11434 | Local LLM inference (embedding, generation) |
| **SQLite** | Embedded database | - | Sessions, messages, index metadata |
| **LanceDB** | Vector database | - | Document embeddings, similarity search |
---
## 2. Core Functionality
### 2.1 Intelligent Dual-Layer Routing
The system's key innovation is its **dual-layer routing architecture** that optimizes both speed and intelligence:
#### **Layer 1: Speed Optimization Routing**
- **Location**: `backend/server.py`
- **Purpose**: Route simple queries to Direct LLM (~1.3s) vs complex queries to RAG Pipeline (~20s)
- **Decision Logic**: Pattern matching, keyword detection, query complexity analysis
```python
# Example routing decisions
"Hello!" → Direct LLM (greeting pattern)
"What does the document say about pricing?" → RAG Pipeline (document keyword)
"What's 2+2?" → Direct LLM (simple + short)
"Summarize the key findings from the report" → RAG Pipeline (complex + indicators)
```
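A condensed sketch of the kind of heuristic this layer applies is shown below; the patterns, keywords, and length threshold are illustrative, not the exact rules in `backend/server.py`.
```python
import re

GREETING_PATTERNS = re.compile(r"^\s*(hi|hello|hey|thanks?)\b", re.IGNORECASE)
DOCUMENT_KEYWORDS = ("document", "report", "summarize", "according to", "section")

def route_query(query: str, has_indexes: bool) -> str:
    """Return 'direct_llm' for trivial queries, otherwise defer to the RAG pipeline."""
    if not has_indexes:
        return "direct_llm"              # no documents linked to this session
    if GREETING_PATTERNS.match(query):
        return "direct_llm"              # greeting / small talk
    if any(kw in query.lower() for kw in DOCUMENT_KEYWORDS):
        return "rag_pipeline"            # explicit document reference
    if len(query.split()) <= 6:
        return "direct_llm"              # short, simple question
    return "rag_pipeline"

print(route_query("What does the document say about pricing?", has_indexes=True))
```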
#### **Layer 2: Intelligence Optimization Routing**
- **Location**: `rag_system/agent/loop.py`
- **Purpose**: Within RAG pipeline, route to optimal processing method
- **Methods**:
- `direct_answer`: General knowledge queries
- `rag_query`: Document-specific queries requiring retrieval
- `graph_query`: Entity relationship queries (future feature)
### 2.2 Document Processing Pipeline
#### **Indexing Process**
1. **Document Upload**: PDF files uploaded via web interface
2. **Text Extraction**: Docling library extracts text with layout preservation
3. **Chunking**: Intelligent chunking with configurable strategies (DocLing, Late Chunking, Standard)
4. **Embedding**: Text converted to vector embeddings using Qwen models
5. **Storage**: Vectors stored in LanceDB with metadata in SQLite
#### **Retrieval Process**
1. **Query Processing**: User query analyzed and contextualized
2. **Embedding**: Query converted to vector embedding
3. **Search**: Hybrid search combining vector similarity and BM25 keyword matching
4. **Reranking**: AI-powered reranking for relevance optimization
5. **Synthesis**: LLM generates final answer using retrieved context
### 2.3 Advanced Features
#### **Query Decomposition**
- Complex queries automatically broken into sub-queries
- Parallel processing of sub-queries for efficiency
- Intelligent composition of final answers
#### **Contextual Enrichment**
- Conversation history integration
- Context-aware query expansion
- Session-based memory management
#### **Verification System**
- Answer verification against source documents
- Confidence scoring and grounding checks
- Source attribution and citation
---
## 3. Data Architecture
### 3.1 Storage Systems
#### **SQLite Database** (`backend/chat_data.db`)
```sql
-- Core tables
sessions -- Chat sessions with metadata
messages -- Individual messages and responses
indexes -- Document index metadata
session_indexes -- Links sessions to their indexes
```
#### **LanceDB Vector Store** (`./lancedb/`)
```
tables/
├── text_pages_[uuid] -- Document text embeddings
├── image_pages_[uuid] -- Image embeddings (future)
└── metadata_[uuid] -- Document metadata
```
#### **File System** (`./index_store/`)
```
index_store/
├── overviews/ -- Document summaries for routing
├── bm25/ -- BM25 keyword indexes
└── graph/ -- Knowledge graph data
```
### 3.2 Data Flow
1. **Document Upload** → File System (`shared_uploads/`)
2. **Processing** → Embeddings stored in LanceDB
3. **Metadata** → Index info stored in SQLite
4. **Query** → Search LanceDB + SQLite coordination
5. **Response** → Message history stored in SQLite
---
## 4. Model Architecture
### 4.1 Configurable Model Pipeline
The system supports multiple embedding and generation models with automatic switching:
#### **Current Model Configuration**
```python
EXTERNAL_MODELS = {
"embedding_model": "Qwen/Qwen3-Embedding-0.6B", # 1024D
"reranker_model": "answerdotai/answerai-colbert-small-v1", # ColBERT reranker
"vision_model": "Qwen/Qwen-VL-Chat", # Vision model for multimodal
"fallback_reranker": "BAAI/bge-reranker-base", # Backup reranker
}
OLLAMA_CONFIG = {
"generation_model": "qwen3:8b", # High-quality generation
"enrichment_model": "qwen3:0.6b", # Fast enrichment/routing
"host": "http://localhost:11434"
}
```
#### **Model Switching**
- **Per-Session**: Each chat session can use different embedding models
- **Automatic**: System automatically switches models based on index metadata
- **Dynamic**: Models loaded just-in-time to optimize memory usage
### 4.2 Supported Models
#### **Embedding Models**
- `Qwen/Qwen3-Embedding-0.6B` (1024D) - Default, fast and high-quality
#### **Generation Models** (via Ollama)
- `qwen3:8b` - Primary generation model (high quality)
- `qwen3:0.6b` - Fast enrichment and routing model
#### **Reranking Models**
- `answerdotai/answerai-colbert-small-v1` - Primary ColBERT reranker
- `BAAI/bge-reranker-base` - Fallback cross-encoder reranker
#### **Vision Models** (Multimodal)
- `Qwen/Qwen-VL-Chat` - Vision-language model for image processing
---
## 5. Pipeline Configurations
### 5.1 Default Production Pipeline
```python
PIPELINE_CONFIGS = {
"default": {
"description": "Production-ready pipeline with hybrid search, AI reranking, and verification",
"storage": {
"lancedb_uri": "./lancedb",
"text_table_name": "text_pages_v3",
"bm25_path": "./index_store/bm25",
"graph_path": "./index_store/graph/knowledge_graph.gml"
},
"retrieval": {
"retriever": "multivector",
"search_type": "hybrid",
"late_chunking": {
"enabled": True,
"table_suffix": "_lc_v3"
},
"dense": {
"enabled": True,
"weight": 0.7
},
"bm25": {
"enabled": True,
"index_name": "rag_bm25_index"
}
},
"embedding_model_name": "Qwen/Qwen3-Embedding-0.6B",
"reranker": {
"enabled": True,
"model_name": "answerdotai/answerai-colbert-small-v1",
"top_k": 20
}
}
}
```
### 5.2 Processing Options
#### **Chunking Strategies**
- **Standard**: Fixed-size chunks with overlap
- **DocLing**: Structure-aware chunking using DocLing library
- **Late Chunking**: Small chunks expanded at query time
#### **Enrichment Options**
- **Contextual Enrichment**: AI-generated chunk summaries
- **Overview Building**: Document-level summaries for routing
- **Graph Extraction**: Entity and relationship extraction
---
## 6. Performance Characteristics
### 6.1 Response Times
| Operation | Time Range | Notes |
|-----------|------------|-------|
| Simple Chat | 1-3 seconds | Direct LLM, no retrieval |
| Document Query | 5-15 seconds | Includes retrieval and reranking |
| Complex Analysis | 15-30 seconds | Multi-step reasoning |
| Document Indexing | 2-5 min/100MB | Depends on enrichment settings |
### 6.2 Memory Usage
| Component | Memory Usage | Notes |
|-----------|--------------|-------|
| Embedding Model | 1-2GB | Qwen3-Embedding-0.6B |
| Generation Model | 8-16GB | qwen3:8b |
| Reranker Model | 500MB-1GB | ColBERT reranker |
| Database Cache | 500MB-2GB | LanceDB and SQLite |
### 6.3 Scalability
- **Concurrent Users**: 5-10 users with 16GB RAM
- **Document Capacity**: 10,000+ documents per index
- **Query Throughput**: 10-20 queries/minute per instance
- **Storage**: Approximately 1MB per 100 pages indexed
---
## 7. Security & Privacy
### 7.1 Data Privacy
- **Local Processing**: All AI models run locally via Ollama
- **No External Calls**: No data sent to external APIs
- **Document Isolation**: Documents stored locally with session-based access
- **User Isolation**: Each session maintains separate context
---
## 8. Configuration & Customization
### 8.1 Model Configuration
Models can be configured in `rag_system/main.py`:
```python
# Embedding model configuration
EXTERNAL_MODELS = {
"embedding_model": "Qwen/Qwen3-Embedding-0.6B", # Your preferred model
"reranker_model": "answerdotai/answerai-colbert-small-v1",
}
# Generation model configuration
OLLAMA_CONFIG = {
"generation_model": "qwen3:8b", # Your LLM model
"enrichment_model": "qwen3:0.6b", # Your fast model
}
```
### 8.2 Pipeline Configuration
Processing behavior configured in `PIPELINE_CONFIGS`:
```python
PIPELINE_CONFIGS = {
"retrieval": {
"search_type": "hybrid",
"dense": {"weight": 0.7},
"bm25": {"enabled": True}
},
"chunking": {
"chunk_size": 512,
"chunk_overlap": 64,
"enable_latechunk": True,
"enable_docling": True
}
}
```
### 8.3 UI Configuration
Frontend behavior configured in environment variables:
```bash
NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_ENABLE_STREAMING=true
NEXT_PUBLIC_MAX_FILE_SIZE=50MB
```
---
## 9. Monitoring & Observability
### 9.1 Logging System
- **Structured Logging**: JSON-formatted logs with timestamps
- **Log Levels**: DEBUG, INFO, WARNING, ERROR
- **Log Rotation**: Automatic log file rotation
- **Component Isolation**: Separate logs per service
### 9.2 Health Monitoring
- **Health Endpoints**: `/health` on all services
- **Service Dependencies**: Cascading health checks
- **Performance Metrics**: Response times, error rates
- **Resource Monitoring**: Memory, CPU, disk usage
### 9.3 Debugging Features
- **Debug Mode**: Detailed operation tracing
- **Query Inspection**: Step-by-step query processing
- **Model Switching Logs**: Embedding model change tracking
- **Error Reporting**: Comprehensive error context
---
## ⚙️ Configuration Modes
The system supports multiple configuration modes optimized for different use cases:
### **Default Mode** (`"default"`)
- **Description**: Production-ready pipeline with full features
- **Search**: Hybrid (dense + BM25) with 0.7 dense weight
- **Reranking**: AI-powered ColBERT reranker
- **Query Processing**: Query decomposition enabled
- **Verification**: Grounding verification enabled
- **Performance**: ~3-8 seconds per query
- **Memory**: ~10-16GB (with models loaded)
### **Fast Mode** (`"fast"`)
- **Description**: Speed-optimized pipeline with minimal overhead
- **Search**: Vector-only (no BM25, no late chunking)
- **Reranking**: Disabled
- **Query Processing**: Single-pass, no decomposition
- **Verification**: Disabled
- **Performance**: ~1-3 seconds per query
- **Memory**: ~8-12GB (with models loaded)
### **BM25 Mode** (`"bm25"`)
- **Description**: Traditional keyword-based search
- **Search**: BM25 only
- **Use Case**: Exact keyword matching, legacy compatibility
### **Graph RAG Mode** (`"graph_rag"`)
- **Description**: Knowledge graph integration (currently disabled)
- **Status**: Available for future implementation
- **Use Case**: Relationship-aware retrieval
---
## 10. Development & Extension
### 10.1 Architecture Principles
- **Modular Design**: Clear separation of concerns
- **Configuration-Driven**: Behavior controlled via config files
- **Lazy Loading**: Components loaded on-demand
- **Thread Safety**: Proper synchronization for concurrent access
### 10.2 Extension Points
- **Custom Retrievers**: Implement `BaseRetriever` interface
- **Custom Chunkers**: Extend chunking strategies
- **Custom Models**: Add new embedding or generation models
- **Custom Pipelines**: Create specialized processing workflows
### 10.3 Testing Strategy
- **Unit Tests**: Individual component testing
- **Integration Tests**: End-to-end workflow testing
- **Performance Tests**: Load and stress testing
- **Health Checks**: Automated system validation
---
> **Note**: This overview reflects the current implementation as of 2025-01-09. For the latest changes, check the git history and individual component documentation.

View File

@ -0,0 +1,60 @@
# 🔀 Triage / Routing System
_Maps to `rag_system/agent/loop.Agent._should_use_rag`, `_route_using_overviews`, and the fast-path router in `backend/server.py`._
## Purpose
Determine, for every incoming query, whether it should be answered by:
1. **Direct LLM Generation** (no retrieval) — faster, cheaper.
2. **Retrieval-Augmented Generation (RAG)** — when the answer likely requires document context.
## Decision Signals
| Signal | Source | Notes |
|--------|--------|-------|
| Keyword/regex check | `backend/server.py` (fast path) | Hard-coded quick wins (`what time`, `define`, etc.). |
| Index presence | SQLite (session → indexes) | If no indexes linked, direct LLM. |
| Overview routing | `_route_using_overviews()` | Uses document overviews and enrichment model to predict relevance. |
| LLM router prompt | `agent/loop.py` lines 648-665 | Final arbitrator (Ollama call, JSON output). |
## High-level Flow
```mermaid
flowchart TD
Q["Incoming Query"] --> S1{Session\nHas Indexes?}
S1 -- no --> LLM["Direct LLM Generation"]
S1 -- yes --> S2{Fast Regex\nHeuristics}
S2 -- match--> LLM
S2 -- no --> S3{Overview\nRelevance > τ?}
S3 -- low --> LLM
S3 -- high --> S4[LLM Router\n(prompt @648)]
S4 -- "route: RAG" --> RAG["Retrieval Pipeline"]
S4 -- "route: DIRECT" --> LLM
```
## Detailed Sequence (Code-level)
1. **backend/server.py**
* `handle_session_chat()` builds `router_prompt` (line ~435) and makes a **first-pass** decision before calling the heavy agent code.
2. **agent.loop._should_use_rag()**
* Re-evaluates using richer features (e.g., token count, query type).
3. **Overviews Phase** (`_route_using_overviews()`)
* Loads JSONL overviews file per index.
* Calls enrichment model (`qwen3:0.6b`) with prompt: _"Does this overview mention … ? "_ → returns yes/no.
4. **LLM Router** (prompt lines 648-665)
* JSON-only response `{ "route": "RAG" | "DIRECT" }`.
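Because the router must return strict JSON, callers typically parse defensively and fall back to RAG on any error, mirroring the failure modes listed below. A minimal sketch (the fallback policy is taken from this document, not read from the code):
```python
import json

def parse_router_decision(raw_response: str) -> str:
    """Parse the LLM router's JSON reply; default to RAG when parsing fails."""
    try:
        decision = json.loads(raw_response.strip())
        route = str(decision.get("route", "")).upper()
        if route in ("RAG", "DIRECT"):
            return route
    except (json.JSONDecodeError, AttributeError):
        pass
    # Safer default per the fallback policy: prefer retrieval when unsure
    return "RAG"

print(parse_router_decision('{"route": "DIRECT"}'))  # DIRECT
print(parse_router_decision("not json at all"))      # RAG
```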
## Interfaces & Dependencies
| Component | Calls / Data |
|-----------|--------------|
| SQLite `chat_sessions` | Reads `indexes` column to know linked index IDs. |
| LanceDB Overviews | Reads `index_store/overviews/<idx>.jsonl`. |
| `OllamaClient` | Generates LLM router decision. |
## Config Flags
* `PIPELINE_CONFIGS.triage.enabled` - global toggle.
* Env var `TRIAGE_OVERVIEW_THRESHOLD` - minimum similarity score required to prefer RAG (default 0.35).
## Failure / Fallback Modes
1. If overview file missing → skip to LLM router.
2. If LLM router errors → default to RAG (safer) but log warning.
---
_Keep this document updated whenever routing heuristics, thresholds, or prompt wording change._

49
Documentation/verifier.md Normal file
View File

@ -0,0 +1,49 @@
# ✅ Answer Verifier
_File: `rag_system/agent/verifier.py`_
## Objective
Assess whether an answer produced by RAG is **grounded** in the retrieved context snippets.
## Prompt (see `prompt_inventory.md` `verifier.fact_check`)
Strict JSON schema:
```jsonc
{
"verdict": "SUPPORTED" | "NOT_SUPPORTED" | "NEEDS_CLARIFICATION",
"is_grounded": true | false,
"reasoning": "< ≤30 words >",
"confidence_score": 0-100
}
```
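A small sketch of how a caller could validate this schema, treating any malformed reply as NOT_SUPPORTED, consistent with the failure modes listed below; the exact field handling is an assumption, not the verifier's actual parsing code.
```python
import json
from dataclasses import dataclass

@dataclass
class VerificationResult:
    verdict: str
    is_grounded: bool
    reasoning: str
    confidence_score: int

def parse_verdict(raw: str) -> VerificationResult:
    """Parse the verifier's JSON reply, defaulting to NOT_SUPPORTED on bad output."""
    fallback = VerificationResult("NOT_SUPPORTED", False, "invalid verifier output", 0)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if data.get("verdict") not in ("SUPPORTED", "NOT_SUPPORTED", "NEEDS_CLARIFICATION"):
        return fallback
    return VerificationResult(
        verdict=data["verdict"],
        is_grounded=bool(data.get("is_grounded", False)),
        reasoning=str(data.get("reasoning", "")),
        confidence_score=int(data.get("confidence_score", 0)),
    )
```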
## Sequence Diagram
```mermaid
sequenceDiagram
participant RP as Retrieval Pipeline
participant V as Verifier
participant LLM as Ollama
RP->>V: query, context, answer
V->>LLM: verification prompt
LLM-->>V: JSON verdict
V-->>RP: VerificationResult
```
## Usage Sites
| Caller | Code | When |
|--------|------|------|
| `RetrievalPipeline.answer_stream()` | `pipelines/retrieval_pipeline.py` | If `verify=true` flag from frontend. |
| `Agent.loop.run()` | fallback path | Experimental for composed answers. |
## Config
| Flag | Default | Meaning |
|------|---------|---------|
| `verify` | false | Frontend toggle; if true verifier runs. |
| `generation_model` | `qwen3:8b` | Same model as answer generation.
## Failure Modes
* If LLM returns invalid JSON → parse exception handled, result = NOT_SUPPORTED.
* If verification call times out → pipeline logs but still returns answer (unverified).
---
_Keep updated when schema or usage flags change._

201
LICENSE
View File

@ -1,201 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

816
README.md
View File

@ -1,346 +1,688 @@
# LocalGPT: Secure, Local Conversations with Your Documents 🌐
# LocalGPT - Private Document Intelligence Platform
<p align="center">
<a href="https://trendshift.io/repositories/2947" target="_blank"><img src="https://trendshift.io/api/badge/repositories/2947" alt="PromtEngineer%2FlocalGPT | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>
<div align="center">
[![GitHub Stars](https://img.shields.io/github/stars/PromtEngineer/localGPT?style=social)](https://github.com/PromtEngineer/localGPT/stargazers)
[![GitHub Forks](https://img.shields.io/github/forks/PromtEngineer/localGPT?style=social)](https://github.com/PromtEngineer/localGPT/network/members)
[![GitHub Issues](https://img.shields.io/github/issues/PromtEngineer/localGPT)](https://github.com/PromtEngineer/localGPT/issues)
[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/PromtEngineer/localGPT)](https://github.com/PromtEngineer/localGPT/pulls)
[![License](https://img.shields.io/github/license/PromtEngineer/localGPT)](https://github.com/PromtEngineer/localGPT/blob/main/LICENSE)
![LocalGPT Logo](https://img.shields.io/badge/LocalGPT-Private%20AI-blue?style=for-the-badge)
🚨🚨 You can run localGPT on a pre-configured [Virtual Machine](https://bit.ly/localGPT). Make sure to use the code: PromptEngineering to get 50% off. I will get a small commission!
**Transform your documents into intelligent, searchable knowledge with complete privacy**
**LocalGPT** is an open-source initiative that allows you to converse with your documents without compromising your privacy. With everything running locally, you can be assured that no data ever leaves your computer. Dive into the world of secure, local document interactions with LocalGPT.
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Docker](https://img.shields.io/badge/docker-supported-blue.svg)](https://www.docker.com/)
## Features 🌟
- **Utmost Privacy**: Your data remains on your computer, ensuring 100% security.
- **Versatile Model Support**: Seamlessly integrate a variety of open-source models, including HF, GPTQ, GGML, and GGUF.
- **Diverse Embeddings**: Choose from a range of open-source embeddings.
- **Reuse Your LLM**: Once downloaded, reuse your LLM without the need for repeated downloads.
- **Chat History**: Remembers your previous conversations (in a session).
- **API**: LocalGPT has an API that you can use for building RAG Applications.
- **Graphical Interface**: LocalGPT comes with two GUIs, one uses the API and the other is standalone (based on streamlit).
- **GPU, CPU, HPU & MPS Support**: Supports multiple platforms out of the box, Chat with your data using `CUDA`, `CPU`, `HPU (Intel® Gaudi®)` or `MPS` and more!
[Quick Start](#quick-start) • [Features](#features) • [Installation](#installation) • [Documentation](#documentation) • [API Reference](#api-reference)
## Dive Deeper with Our Videos 🎥
- [Detailed code-walkthrough](https://youtu.be/MlyoObdIHyo)
- [Llama-2 with LocalGPT](https://youtu.be/lbFmceo4D5E)
- [Adding Chat History](https://youtu.be/d7otIM_MCZs)
- [LocalGPT - Updated (09/17/2023)](https://youtu.be/G_prHSKX9d4)
</div>
## Technical Details 🛠️
By selecting the right local models and leveraging the power of `LangChain`, you can run the entire RAG pipeline locally, without any data leaving your environment, and with reasonable performance.
## 🚀 What is LocalGPT?
- `ingest.py` uses `LangChain` tools to parse the document and create embeddings locally using `InstructorEmbeddings`. It then stores the result in a local vector database using `Chroma` vector store.
- `run_localGPT.py` uses a local LLM to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs.
- You can replace this local LLM with any other LLM from HuggingFace. Make sure whatever LLM you select is in the HF format.
LocalGPT is a **private, local document intelligence platform** that allows you to chat with your documents using advanced AI models - all while keeping your data completely private and secure on your own infrastructure.
This project was inspired by the original [privateGPT](https://github.com/imartinez/privateGPT).
### 🎯 Key Benefits
## Built Using 🧩
- [LangChain](https://github.com/hwchase17/langchain)
- [HuggingFace LLMs](https://huggingface.co/models)
- [InstructorEmbeddings](https://instructor-embedding.github.io/)
- [LLAMACPP](https://github.com/abetlen/llama-cpp-python)
- [ChromaDB](https://www.trychroma.com/)
- [Streamlit](https://streamlit.io/)
- **🔒 Complete Privacy**: Your documents never leave your server
- **🧠 Advanced AI**: State-of-the-art RAG (Retrieval-Augmented Generation) with smart routing
- **📚 Multi-Format Support**: PDFs, Word docs, text files, and more
- **🔍 Intelligent Search**: Hybrid search combining semantic similarity and keyword matching
- **⚡ High Performance**: Optimized for speed with batch processing and caching
- **🐳 Easy Deployment**: Docker support for simple setup and scaling
# Environment Setup 🌍
---
1. 📥 Clone the repo using git:
## ✨ Features
```shell
git clone https://github.com/PromtEngineer/localGPT.git
### 📖 Document Processing
- **Multi-format Support**: PDF, DOCX, TXT, Markdown, and more
- **Smart Chunking**: Intelligent text segmentation with overlap optimization
- **Contextual Enrichment**: Enhanced document understanding with AI-generated context
- **Batch Processing**: Handle multiple documents simultaneously
### 🤖 AI-Powered Chat
- **Natural Language Queries**: Ask questions in plain English
- **Source Attribution**: Every answer includes document references
- **Smart Routing**: Automatically chooses the best approach for each query
- **Multiple AI Models**: Support for Ollama, OpenAI, and Hugging Face models
### 🔍 Advanced Search
- **Hybrid Search**: Combines semantic similarity with keyword matching
- **Vector Embeddings**: State-of-the-art embedding models for semantic understanding
- **BM25 Ranking**: Traditional information retrieval for precise keyword matching
- **Reranking**: AI-powered result refinement for better relevance
### 🛠️ Developer-Friendly
- **RESTful APIs**: Complete API access for integration
- **Real-time Progress**: Live updates during document processing
- **Flexible Configuration**: Customize models, chunk sizes, and search parameters
- **Extensible Architecture**: Plugin system for custom components
### 🎨 Modern Interface
- **Intuitive Web UI**: Clean, responsive design
- **Session Management**: Organize conversations by topic
- **Index Management**: Easy document collection management
- **Real-time Chat**: Streaming responses for immediate feedback
---
## 🚀 Quick Start
### Prerequisites
- Python 3.8 or higher (tested with Python 3.11.5)
- Node.js 16+ and npm (tested with Node.js 23.10.0, npm 10.9.2)
- Docker (optional, for containerized deployment)
- 8GB+ RAM (16GB+ recommended)
- Ollama (required for both deployment approaches)
### Option 1: Docker Deployment (Recommended for Production)
```bash
# Clone the repository
git clone https://github.com/yourusername/localgpt.git
cd localgpt
# Install Ollama locally (required even for Docker)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b
# Start Ollama
ollama serve
# Start with Docker (in a new terminal)
./start-docker.sh
# Access the application
open http://localhost:3000
```
2. 🐍 Install [conda](https://www.anaconda.com/download) for virtual environment management. Create and activate a new virtual environment.
**Docker Management Commands:**
```bash
# Check container status
docker compose ps
```shell
conda create -n localGPT python=3.10.0
conda activate localGPT
# View logs
docker compose logs -f
# Stop containers
./start-docker.sh stop
```
3. 🛠️ Install the dependencies using pip
### Option 2: Direct Development (Recommended for Development)
To set up your environment to run the code, first install all requirements:
```bash
# Clone the repository
git clone https://github.com/yourusername/localgpt.git
cd localgpt
```shell
# Install Python dependencies
pip install -r requirements.txt
# Install Node.js dependencies
npm install
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b
ollama serve
# Start the system (in a new terminal)
python run_system.py
# Access the application
open http://localhost:3000
```
***Installing LLAMA-CPP :***
**Direct Development Management:**
```bash
# Check system health (comprehensive diagnostics)
python system_health_check.py
LocalGPT uses [LlamaCpp-Python](https://github.com/abetlen/llama-cpp-python) for GGML (you will need llama-cpp-python <=0.1.76) and GGUF (llama-cpp-python >=0.1.83) models.
# Check service status
python run_system.py --health
To run the quantized Llama3 model, ensure you have llama-cpp-python version 0.2.62 or higher installed.
If you want to use BLAS or Metal with [llama-cpp](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal) you can set appropriate flags:
For `NVIDIA` GPUs support, use `cuBLAS`
```shell
# Example: cuBLAS
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
# Stop all services
python run_system.py --stop
# Or press Ctrl+C in the terminal running python run_system.py
```
For Apple Metal (`M1/M2`) support, use
### Option 3: Manual Component Startup
```shell
# Example: METAL
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```bash
# Terminal 1: Start Ollama
ollama serve
# Terminal 2: Start RAG API
python -m rag_system.api_server
# Terminal 3: Start Backend
cd backend && python server.py
# Terminal 4: Start Frontend
npm run dev
# Access at http://localhost:3000
```
For more details, please refer to [llama-cpp](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal)
## Docker 🐳
---
Installing the required packages for GPU inference on NVIDIA GPUs, like gcc 11 and CUDA 11, may cause conflicts with other packages in your system.
As an alternative to Conda, you can use Docker with the provided Dockerfile.
It includes CUDA; your system just needs Docker, BuildKit, your NVIDIA GPU driver, and the NVIDIA Container Toolkit.
Build it with `docker build -t localgpt .` (requires BuildKit).
Docker BuildKit does not currently support GPU access during *docker build*, only during *docker run*.
Run as `docker run -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all localgpt`.
For running the code on Intel® Gaudi® HPU, use the following Dockerfile - `Dockerfile_hpu`.
## 📋 Installation Guide
## Test dataset
### System Requirements
For testing, this repository comes with [Constitution of USA](https://constitutioncenter.org/media/files/constitution.pdf) as an example file to use.
| Component | Minimum | Recommended | Tested |
|-----------|---------|-------------|--------|
| Python | 3.8+ | 3.11+ | 3.11.5 |
| Node.js | 16+ | 18+ | 23.10.0 |
| RAM | 8GB | 16GB+ | 16GB+ |
| Storage | 10GB | 50GB+ | 50GB+ |
| CPU | 4 cores | 8+ cores | 8+ cores |
| GPU | Optional | NVIDIA GPU with 8GB+ VRAM | MPS (Apple Silicon) |
## Ingesting your OWN Data.
Put your files in the `SOURCE_DOCUMENTS` folder. You can put multiple folders within the `SOURCE_DOCUMENTS` folder and the code will recursively read your files.
### Detailed Installation
### Support file formats:
LocalGPT currently supports the following file formats and uses `LangChain` to load them. The code in `constants.py` uses a `DOCUMENT_MAP` dictionary to map a file format to the corresponding loader. To add support for another file format, simply add an entry to this dictionary mapping the file extension to the corresponding loader from [LangChain](https://python.langchain.com/docs/modules/data_connection/document_loaders/).
#### 1. Install System Dependencies
```shell
DOCUMENT_MAP = {
".txt": TextLoader,
".md": TextLoader,
".py": TextLoader,
".pdf": PDFMinerLoader,
".csv": CSVLoader,
".xls": UnstructuredExcelLoader,
".xlsx": UnstructuredExcelLoader,
".docx": Docx2txtLoader,
".doc": Docx2txtLoader,
**Ubuntu/Debian:**
```bash
sudo apt update
sudo apt install python3.8 python3-pip nodejs npm docker.io docker-compose
```
**macOS:**
```bash
brew install python@3.8 node npm docker docker-compose
```
**Windows:**
```bash
# Install Python 3.8+, Node.js, and Docker Desktop
# Then use PowerShell or WSL2
```
#### 2. Install AI Models
**Install Ollama (Recommended):**
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull recommended models
ollama pull qwen3:0.6b # Fast generation model
ollama pull qwen3:8b # High-quality generation model
```
#### 3. Configure Environment
```bash
# Copy environment template
cp .env.example .env
# Edit configuration
nano .env
```
**Key Configuration Options:**
```env
# AI Models
OLLAMA_HOST=http://localhost:11434
DEFAULT_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
DEFAULT_GENERATION_MODEL=qwen3:0.6b
# Database
DATABASE_PATH=./backend/chat_data.db
VECTOR_DB_PATH=./lancedb
# Server Settings
BACKEND_PORT=8000
FRONTEND_PORT=3000
```
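If your own scripts need to read the same settings, a minimal sketch is shown below (it assumes the `python-dotenv` package is installed; the variable names come from the template above, and the fallback defaults are illustrative):

```python
# Sketch: load LocalGPT settings from .env (assumes python-dotenv is available)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
GENERATION_MODEL = os.getenv("DEFAULT_GENERATION_MODEL", "qwen3:0.6b")
EMBEDDING_MODEL = os.getenv("DEFAULT_EMBEDDING_MODEL", "Qwen/Qwen3-Embedding-0.6B")
BACKEND_PORT = int(os.getenv("BACKEND_PORT", "8000"))

print(f"Ollama at {OLLAMA_HOST}; generating with {GENERATION_MODEL} on port {BACKEND_PORT}")
```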
#### 4. Initialize the System
```bash
# Run system health check
python system_health_check.py
# Initialize databases
python -c "from backend.database import ChatDatabase; ChatDatabase().init_database()"
# Test installation
python -c "from rag_system.main import get_agent; print('✅ Installation successful!')"
# Validate complete setup
python run_system.py --health
```
---
## 🎯 Getting Started
### 1. Create Your First Index
An **index** is a collection of processed documents that you can chat with.
#### Using the Web Interface:
1. Open http://localhost:3000
2. Click "Create New Index"
3. Upload your documents (PDF, DOCX, TXT)
4. Configure processing options
5. Click "Build Index"
#### Using Scripts:
```bash
# Simple script approach
./simple_create_index.sh "My Documents" "path/to/document.pdf"
# Interactive script
python create_index_script.py
```
#### Using API:
```bash
# Create index
curl -X POST http://localhost:8000/indexes \
-H "Content-Type: application/json" \
-d '{"name": "My Index", "description": "My documents"}'
# Upload documents
curl -X POST http://localhost:8000/indexes/INDEX_ID/upload \
-F "files=@document.pdf"
# Build index
curl -X POST http://localhost:8000/indexes/INDEX_ID/build
```
### 2. Start Chatting
Once your index is built:
1. **Create a Chat Session**: Click "New Chat" or use an existing session
2. **Select Your Index**: Choose which document collection to query
3. **Ask Questions**: Type natural language questions about your documents
4. **Get Answers**: Receive AI-generated responses with source citations
### 3. Advanced Features
#### Custom Model Configuration
```bash
# Use different models for different tasks
curl -X POST http://localhost:8000/sessions \
-H "Content-Type: application/json" \
-d '{
"title": "High Quality Session",
"model": "qwen3:8b",
"embedding_model": "Qwen/Qwen3-Embedding-4B"
}'
```
#### Batch Document Processing
```bash
# Process multiple documents at once
python demo_batch_indexing.py --config batch_indexing_config.json
```
#### API Integration
```python
import requests
# Chat with your documents via API
response = requests.post('http://localhost:8000/chat', json={
'query': 'What are the key findings in the research papers?',
'session_id': 'your-session-id',
'search_type': 'hybrid',
'retrieval_k': 20
})
print(response.json()['response'])
```
---
## 🔧 Configuration
### Model Configuration
LocalGPT supports multiple AI model providers:
#### Ollama Models (Local)
```python
OLLAMA_CONFIG = {
'host': 'http://localhost:11434',
'generation_model': 'qwen3:0.6b',
'embedding_model': 'nomic-embed-text'
}
```
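To sanity-check that the configured host and model respond before starting the full stack, you can hit Ollama's standard `/api/chat` endpoint directly; this is the same call the bundled `backend/ollama_client.py` makes, and the prompt here is only illustrative:

```python
# Sketch: verify the Ollama host/model from OLLAMA_CONFIG answer a chat request
import requests

OLLAMA_CONFIG = {"host": "http://localhost:11434", "generation_model": "qwen3:0.6b"}

resp = requests.post(
    f"{OLLAMA_CONFIG['host']}/api/chat",
    json={
        "model": OLLAMA_CONFIG["generation_model"],
        "messages": [{"role": "user", "content": "Reply with the single word: ready"}],
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```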
### Ingest
Run the following command to ingest all the data.
If you have `cuda` setup on your system.
```shell
python ingest.py
```
You will see an output like this:
<img width="1110" alt="Screenshot 2023-09-14 at 3 36 27 PM" src="https://github.com/PromtEngineer/localGPT/assets/134474669/c9274e9a-842c-49b9-8d95-606c3d80011f">
Use the device type argument to specify a given device.
To run on `cpu`
```sh
python ingest.py --device_type cpu
#### Hugging Face Models
```python
EXTERNAL_MODELS = {
'embedding': {
'Qwen/Qwen3-Embedding-0.6B': {'dimensions': 1024},
'Qwen/Qwen3-Embedding-4B': {'dimensions': 2048},
'Qwen/Qwen3-Embedding-8B': {'dimensions': 4096}
}
}
```
To run on `M1/M2`
### Processing Configuration
```sh
python ingest.py --device_type mps
```python
PIPELINE_CONFIGS = {
'default': {
'chunk_size': 512,
'chunk_overlap': 64,
'retrieval_mode': 'hybrid',
'window_size': 5,
'enable_enrich': True,
'latechunk': True,
'docling_chunk': True
},
'fast': {
'chunk_size': 256,
'chunk_overlap': 32,
'retrieval_mode': 'vector',
'enable_enrich': False
}
}
```
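The `chunk_size` and `chunk_overlap` values above control how documents are split before embedding. As a simplified illustration of what those two knobs do (the real pipeline uses Docling-aware and late chunking, so this word-window sketch is only an approximation):

```python
# Sketch: fixed-size chunking with overlap, approximated at the word level
from typing import List

def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> List[str]:
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if not window:
            break
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks

chunks = chunk_words("lorem ipsum " * 600, chunk_size=512, chunk_overlap=64)
print(f"{len(chunks)} chunks; adjacent chunks share 64 words of context")
```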
Use help for a full list of supported devices.
### Search Configuration
```sh
python ingest.py --help
```python
SEARCH_CONFIG = {
'hybrid': {
'dense_weight': 0.7,
'sparse_weight': 0.3,
'retrieval_k': 20,
'reranker_top_k': 10
}
}
```
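To make those weights concrete, here is a minimal sketch of how a hybrid retriever can fuse normalized dense (vector) and sparse (BM25) scores before handing the top results to the reranker; the function and document names are illustrative rather than the actual pipeline code:

```python
# Sketch: weighted fusion of normalized dense and sparse retrieval scores
from typing import Dict, List, Tuple

def fuse_scores(dense: Dict[str, float], sparse: Dict[str, float],
                dense_weight: float = 0.7, sparse_weight: float = 0.3,
                retrieval_k: int = 20) -> List[Tuple[str, float]]:
    def normalize(scores: Dict[str, float]) -> Dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        return {doc: (s - lo) / span for doc, s in scores.items()}

    dense_n, sparse_n = normalize(dense), normalize(sparse)
    fused = {doc: dense_weight * dense_n.get(doc, 0.0) + sparse_weight * sparse_n.get(doc, 0.0)
             for doc in set(dense_n) | set(sparse_n)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:retrieval_k]

# Chunks found by vector search vs. BM25; the fused top-k goes on to the reranker
print(fuse_scores({"chunk_a": 0.82, "chunk_b": 0.40}, {"chunk_b": 7.1, "chunk_c": 5.3}))
```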
This will create a new folder called `DB` and use it for the newly created vector store. You can ingest as many documents as you want, and all will be accumulated in the local embeddings database.
If you want to start from an empty database, delete the `DB` and reingest your documents.
---
Note: When you run this for the first time, it will need internet access to download the embedding model (default: `Instructor Embedding`). In the subsequent runs, no data will leave your local environment and you can ingest data without internet connection.
## 📚 Use Cases
## Ask questions to your documents, locally!
### 📊 Business Intelligence
- **Document Analysis**: Extract insights from reports, contracts, and presentations
- **Compliance**: Query regulatory documents and policies
- **Knowledge Management**: Build searchable company knowledge bases
In order to chat with your documents, run the following command (by default, it will run on `cuda`).
### 🔬 Research & Academia
- **Literature Review**: Analyze research papers and academic publications
- **Data Analysis**: Query experimental results and datasets
- **Collaboration**: Share findings with team members securely
```shell
python run_localGPT.py
```
You can also specify the device type just like `ingest.py`
### ⚖️ Legal & Compliance
- **Case Research**: Search through legal documents and precedents
- **Contract Analysis**: Extract key terms and obligations
- **Regulatory Compliance**: Query compliance requirements and guidelines
```shell
python run_localGPT.py --device_type mps # to run on Apple silicon
### 🏥 Healthcare
- **Medical Records**: Analyze patient data and treatment histories
- **Research**: Query medical literature and clinical studies
- **Compliance**: Navigate healthcare regulations and standards
### 💼 Personal Productivity
- **Document Organization**: Create searchable personal knowledge bases
- **Research**: Analyze books, articles, and reference materials
- **Learning**: Build interactive study materials from textbooks
---
## 🛠️ Troubleshooting
### Common Issues
#### Installation Problems
```bash
# Check Python version
python --version # Should be 3.8+
# Check dependencies
pip list | grep -E "(torch|transformers|lancedb)"
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
```
```shell
# To run on Intel® Gaudi® hpu
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2" # in constants.py
python run_localGPT.py --device_type hpu
#### Model Loading Issues
```bash
# Check Ollama status
ollama list
curl http://localhost:11434/api/tags
# Pull missing models
ollama pull qwen3:0.6b
```
This will load the ingested vector store and embedding model. You will be presented with a prompt:
#### Database Issues
```bash
# Check database connectivity
python -c "from backend.database import ChatDatabase; db = ChatDatabase(); print('✅ Database OK')"
```shell
> Enter a query:
# Reset database (WARNING: This deletes all data)
rm backend/chat_data.db
python -c "from backend.database import ChatDatabase; ChatDatabase().init_database()"
```
After typing your question, hit enter. LocalGPT will take some time depending on your hardware. You will get a response like the one below.
<img width="1312" alt="Screenshot 2023-09-14 at 3 33 19 PM" src="https://github.com/PromtEngineer/localGPT/assets/134474669/a7268de9-ade0-420b-a00b-ed12207dbe41">
#### Performance Issues
```bash
# Check system resources
python system_health_check.py
Once the answer is generated, you can then ask another question without re-running the script, just wait for the prompt again.
# Monitor memory usage
htop # or Task Manager on Windows
***Note:*** When you run this for the first time, it will need an internet connection to download the LLM (default: `TheBloke/Llama-2-7b-Chat-GGUF`). After that you can turn off your internet connection, and inference will still work. No data gets out of your local environment.
Type `exit` to finish the script.
### Extra Options with run_localGPT.py
You can use the `--show_sources` flag with `run_localGPT.py` to show which chunks were retrieved by the embedding model. By default, it will show 4 different sources/chunks. You can change the number of sources/chunks that are retrieved.
```shell
python run_localGPT.py --show_sources
# Optimize for low-memory systems
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
```
Another option is to enable chat history. ***Note***: This is disabled by default and can be enabled with the `--use_history` flag. Keep in mind that the context window is limited; enabling history consumes part of it and may cause it to overflow.
### Getting Help
```shell
python run_localGPT.py --use_history
1. **Check Logs**: Look at `logs/system.log` for detailed error messages
2. **System Health**: Run `python system_health_check.py`
3. **Documentation**: Check the [Technical Documentation](TECHNICAL_DOCS.md)
4. **GitHub Issues**: Report bugs and request features
5. **Community**: Join our Discord/Slack community
---
## 🔗 API Reference
### Core Endpoints
#### Chat API
```http
POST /chat
Content-Type: application/json
{
"query": "What are the main topics discussed?",
"session_id": "uuid",
"search_type": "hybrid",
"retrieval_k": 20
}
```
You can store user questions and model responses in a CSV file, `/local_chat_history/qa_log.csv`, by using the `--save_qa` flag. Every interaction will be stored.
#### Index Management
```http
# Create index
POST /indexes
{"name": "My Index", "description": "Description"}
```shell
python run_localGPT.py --save_qa
# Upload documents
POST /indexes/{id}/upload
Content-Type: multipart/form-data
# Build index
POST /indexes/{id}/build
# Get index status
GET /indexes/{id}
```
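The same lifecycle can be scripted end-to-end in Python. The endpoints mirror the block above; the field that carries the new index id in the JSON response (`index_id` below) is an assumption, so adjust it to whatever your server actually returns:

```python
# Sketch: create an index, upload a document, and trigger a build via the REST API
import requests

BASE = "http://localhost:8000"

index = requests.post(f"{BASE}/indexes",
                      json={"name": "Research Papers", "description": "Q3 papers"}).json()
index_id = index.get("index_id") or index.get("id")  # response field name assumed

with open("paper.pdf", "rb") as f:  # illustrative file name
    requests.post(f"{BASE}/indexes/{index_id}/upload", files={"files": f}).raise_for_status()

requests.post(f"{BASE}/indexes/{index_id}/build").raise_for_status()
print(requests.get(f"{BASE}/indexes/{index_id}").json())  # poll for build status
```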
# Run the Graphical User Interface
#### Session Management
```http
# Create session
POST /sessions
{"title": "My Session", "model": "qwen3:0.6b"}
1. Open `constants.py` in an editor of your choice and, depending on your choice, add the LLM you want to use. By default, the following model will be used:
# Get sessions
GET /sessions
```shell
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
```
# Link index to session
POST /sessions/{session_id}/indexes/{index_id}
```
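And the session side of the same workflow, sketched with the endpoints above (again, the id field names in the responses are assumptions):

```python
# Sketch: create a chat session and link an existing index to it
import requests

BASE = "http://localhost:8000"

session = requests.post(f"{BASE}/sessions",
                        json={"title": "Paper Q&A", "model": "qwen3:0.6b"}).json()
session_id = session.get("session_id") or session.get("id")  # response field name assumed

index_id = "your-index-id"  # id returned when the index was created
requests.post(f"{BASE}/sessions/{session_id}/indexes/{index_id}").raise_for_status()
print(f"Session {session_id} is now linked to index {index_id}")
```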
3. Open up a terminal and activate your python environment that contains the dependencies installed from requirements.txt.
### Advanced Features
4. Navigate to the `/LOCALGPT` directory.
#### Streaming Chat
```http
POST /chat/stream
Content-Type: application/json
5. Run the following command `python run_localGPT_API.py`. The API should begin to run.
{
"query": "Explain the methodology",
"session_id": "uuid",
"stream": true
}
```
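On the client side, the stream can be consumed incrementally. The wire format is an assumption in this sketch (newline-delimited JSON chunks, optionally with SSE `data:` framing, carrying a `response` or `token` field); check what your deployment actually emits and adjust the parsing:

```python
# Sketch: read a streamed chat response chunk by chunk (wire format assumed)
import json
import requests

with requests.post(
    "http://localhost:8000/chat/stream",
    json={"query": "Explain the methodology", "session_id": "your-session-id", "stream": True},
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        text = line.decode("utf-8")
        if text.startswith("data: "):  # tolerate SSE framing
            text = text[len("data: "):]
        chunk = json.loads(text)
        print(chunk.get("response") or chunk.get("token") or "", end="", flush=True)
```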
6. Wait until everything has loaded in. You should see something like `INFO:werkzeug:Press CTRL+C to quit`.
#### Batch Processing
```http
POST /batch/index
Content-Type: application/json
7. Open up a second terminal and activate the same python environment.
{
"file_paths": ["doc1.pdf", "doc2.pdf"],
"config": {
"chunk_size": 512,
"enable_enrich": true
}
}
```
8. Navigate to the `/LOCALGPT/localGPTUI` directory.
For complete API documentation, see [API_REFERENCE.md](API_REFERENCE.md).
9. Run the command `python localGPTUI.py`.
---
10. Open up a web browser and go to the address `http://localhost:5111/`.
## 🏗️ Architecture
LocalGPT is built with a modular, scalable architecture:
# How to select different LLM models?
```mermaid
graph TB
UI[Web Interface] --> API[Backend API]
API --> Agent[RAG Agent]
Agent --> Retrieval[Retrieval Pipeline]
Agent --> Generation[Generation Pipeline]
Retrieval --> Vector[Vector Search]
Retrieval --> BM25[BM25 Search]
Retrieval --> Rerank[Reranking]
Vector --> LanceDB[(LanceDB)]
BM25 --> BM25DB[(BM25 Index)]
Generation --> Ollama[Ollama Models]
Generation --> HF[Hugging Face Models]
API --> SQLite[(SQLite DB)]
```
To change the models you will need to set both `MODEL_ID` and `MODEL_BASENAME`.
### Key Components
1. Open up `constants.py` in the editor of your choice.
2. Change the `MODEL_ID` and `MODEL_BASENAME`. If you are using a quantized model (`GGML`, `GPTQ`, `GGUF`), you will need to provide `MODEL_BASENAME`. For unquantized models, set `MODEL_BASENAME` to `NONE`
5. There are a number of example models from HuggingFace that have already been tested, both original trained models (ending with HF or with a .bin file in their "Files and versions") and quantized models (ending with GPTQ or with .no-act-order or .safetensors files in their "Files and versions").
6. For models that end with HF or have a .bin file inside their "Files and versions" on their HuggingFace page.
- **Frontend**: React/Next.js web interface
- **Backend**: Python FastAPI server
- **RAG Agent**: Intelligent query routing and processing
- **Vector Database**: LanceDB for semantic search
- **Search Engine**: BM25 for keyword search
- **AI Models**: Ollama and Hugging Face integration
- Make sure you have a `MODEL_ID` selected. For example -> `MODEL_ID = "TheBloke/guanaco-7B-HF"`
- Go to the [HuggingFace Repo](https://huggingface.co/TheBloke/guanaco-7B-HF)
---
7. For models that contain GPTQ in their name and/or have a .no-act-order or .safetensors extension inside "Files and versions" on their HuggingFace page.
## 🤝 Contributing
- Make sure you have a `MODEL_ID` selected. For example -> model_id = `"TheBloke/wizardLM-7B-GPTQ"`
- Go to the corresponding [HuggingFace Repo](https://huggingface.co/TheBloke/wizardLM-7B-GPTQ) and select "Files and versions".
- Pick one of the model names and set it as `MODEL_BASENAME`. For example -> `MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"`
We welcome contributions from developers of all skill levels! LocalGPT is an open-source project that benefits from community involvement.
8. Follow the same steps for `GGUF` and `GGML` models.
### 🚀 Quick Start for Contributors
# GPU and VRAM Requirements
```bash
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/multimodal_rag.git
cd multimodal_rag
Below are the VRAM requirements for different models depending on their size (billions of parameters). The estimates in the table do not include the VRAM used by the embedding models, which use an additional 2 GB-7 GB of VRAM depending on the model (a rough rule-of-thumb estimator is sketched below the table).
# Set up development environment
pip install -r requirements.txt
npm install
| Model Size (B) | float32 | float16 | GPTQ 8bit | GPTQ 4bit |
| ------- | --------- | --------- | -------------- | ------------------ |
| 7B | 28 GB | 14 GB | 7 GB - 9 GB | 3.5 GB - 5 GB |
| 13B | 52 GB | 26 GB | 13 GB - 15 GB | 6.5 GB - 8 GB |
| 32B | 130 GB | 65 GB | 32.5 GB - 35 GB| 16.25 GB - 19 GB |
| 65B | 260.8 GB | 130.4 GB | 65.2 GB - 67 GB| 32.6 GB - 35 GB |
# Install Ollama and models
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b
# Verify setup
python system_health_check.py
python run_system.py --mode dev
```
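As a rough rule of thumb behind the VRAM table above, weight memory is roughly the parameter count multiplied by the bytes per parameter for the chosen precision; the sketch below reproduces that back-of-the-envelope arithmetic (actual usage also depends on context length, KV cache, and framework overhead):

```python
# Sketch: back-of-the-envelope VRAM estimate for LLM weights only
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "gptq_8bit": 1.0, "gptq_4bit": 0.5}

def estimate_weights_vram_gb(params_billion: float, precision: str) -> float:
    """Weights-only estimate; add 1-2 GB for runtime buffers plus 2-7 GB for the embedding model."""
    return params_billion * BYTES_PER_PARAM[precision]

for precision in BYTES_PER_PARAM:
    print(f"7B @ {precision}: ~{estimate_weights_vram_gb(7, precision):.1f} GB")
```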
# System Requirements
### 📋 How to Contribute
## Python Version
1. **🐛 Report Bugs**: Use our [bug report template](.github/ISSUE_TEMPLATE/bug_report.md)
2. **💡 Request Features**: Use our [feature request template](.github/ISSUE_TEMPLATE/feature_request.md)
3. **🔧 Submit Code**: Follow our [development workflow](CONTRIBUTING.md#development-workflow)
4. **📚 Improve Docs**: Help make our documentation better
To use this software, you must have Python 3.10 or later installed. Earlier versions of Python will not work.
### 🎯 Priority Areas
## C++ Compiler
- **Performance Optimization**: Improve indexing and retrieval speed
- **Model Integration**: Add support for new AI models
- **User Experience**: Enhance the web interface
- **Testing**: Expand test coverage
- **Documentation**: Improve setup and usage guides
If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.
### 📖 Detailed Guidelines
### For Windows 10/11
For comprehensive contributing guidelines, including:
- Development setup and workflow
- Coding standards and best practices
- Testing requirements
- Documentation standards
- Release process
To install a C++ compiler on Windows 10/11, follow these steps:
**👉 See our [CONTRIBUTING.md](CONTRIBUTING.md) guide**
1. Install Visual Studio 2022.
2. Make sure the following components are selected:
- Universal Windows Platform development
- C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the "gcc" component.
---
### NVIDIA Driver's Issues:
## 📄 License
Follow this [page](https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-22-04) to install NVIDIA Drivers.
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Star History
---
[![Star History Chart](https://api.star-history.com/svg?repos=PromtEngineer/localGPT&type=Date)](https://star-history.com/#PromtEngineer/localGPT&Date)
## 🙏 Acknowledgments
# Disclaimer
- **Ollama**: For providing excellent local AI model serving
- **LanceDB**: For high-performance vector database
- **Hugging Face**: For state-of-the-art AI models
- **React/Next.js**: For the modern web interface
- **FastAPI**: For the robust backend framework
This is a test project to validate the feasibility of a fully local solution for question answering using LLMs and vector embeddings. It is not production ready, and it is not meant to be used in production. Vicuna-7B is based on the Llama model, so it carries the original Llama license.
---
# Common Errors
## 📞 Support
- [Torch not compatible with CUDA enabled](https://github.com/pytorch/pytorch/issues/30664)
- **Documentation**: [Technical Docs](TECHNICAL_DOCS.md)
- **Issues**: [GitHub Issues](https://github.com/yourusername/localgpt/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/localgpt/discussions)
- **Email**: support@localgpt.com
- Get CUDA version
```shell
nvcc --version
```
```shell
nvidia-smi
```
- Try installing PyTorch depending on your CUDA version
```shell
conda install -c pytorch torchvision cudatoolkit=10.1 pytorch
```
- If it doesn't work, try reinstalling
```shell
pip uninstall torch
pip cache purge
pip install torch -f https://download.pytorch.org/whl/torch_stable.html
```
---
- [ERROR: pip's dependency resolver does not currently take into account all the packages that are installed](https://stackoverflow.com/questions/72672196/error-pips-dependency-resolver-does-not-currently-take-into-account-all-the-pa/76604141#76604141)
```shell
pip install h5py
pip install typing-extensions
pip install wheel
```
- [Failed to import transformers](https://github.com/huggingface/transformers/issues/11262)
- Try re-install
```shell
conda uninstall tokenizers transformers
pip install transformers
```
<div align="center">
**Made with ❤️ for private, intelligent document processing**
[⭐ Star us on GitHub](https://github.com/yourusername/localgpt) • [🐛 Report Bug](https://github.com/yourusername/localgpt/issues) • [💡 Request Feature](https://github.com/yourusername/localgpt/issues)
</div>

Binary file not shown.

93
backend/README.md Normal file
View File

@ -0,0 +1,93 @@
# localGPT Backend
Simple Python backend that connects your frontend to Ollama for local LLM chat.
## Prerequisites
1. **Install Ollama** (if not already installed):
```bash
# Visit https://ollama.ai or run:
curl -fsSL https://ollama.ai/install.sh | sh
```
2. **Start Ollama**:
```bash
ollama serve
```
3. **Pull a model** (optional, server will suggest if needed):
```bash
ollama pull llama3.2
```
## Setup
1. **Install Python dependencies**:
```bash
pip install -r requirements.txt
```
2. **Test Ollama connection**:
```bash
python ollama_client.py
```
3. **Start the backend server**:
```bash
python server.py
```
Server will run on `http://localhost:8000`
## API Endpoints
### Health Check
```bash
GET /health
```
Returns server status and available models.
### Chat
```bash
POST /chat
Content-Type: application/json
{
"message": "Hello!",
"model": "llama3.2:latest",
"conversation_history": []
}
```
Returns:
```json
{
"response": "Hello! How can I help you?",
"model": "llama3.2:latest",
"message_count": 1
}
```
## Testing
Test the chat endpoint:
```bash
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello!", "model": "llama3.2:latest"}'
```
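The same endpoint can be exercised from Python across multiple turns by carrying the conversation history forward; the request and response fields match the shapes documented above:

```python
# Sketch: multi-turn chat against the backend, passing conversation_history each turn
import requests

BASE = "http://localhost:8000"
history = []

for prompt in ["Hello!", "What did I just say?"]:
    resp = requests.post(f"{BASE}/chat", json={
        "message": prompt,
        "model": "llama3.2:latest",
        "conversation_history": history,
    }, timeout=120)
    resp.raise_for_status()
    answer = resp.json()["response"]
    history += [{"role": "user", "content": prompt},
                {"role": "assistant", "content": answer}]
    print(f"> {prompt}\n{answer}\n")
```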
## Frontend Integration
Your React frontend should connect to:
- **Backend**: `http://localhost:8000`
- **Chat endpoint**: `http://localhost:8000/chat`
## What's Next
This simple backend is ready for:
- ✅ **Real-time chat** with local LLMs
- 🔜 **Document upload** for RAG
- 🔜 **Vector database** integration
- 🔜 **Streaming responses**
- 🔜 **Chat history** persistence

BIN
backend/chat_data.db Normal file

Binary file not shown.

684
backend/database.py Normal file
View File

@ -0,0 +1,684 @@
import sqlite3
import uuid
import json
from datetime import datetime
from typing import List, Dict, Optional, Tuple
class ChatDatabase:
def __init__(self, db_path: str = "chat_history.db"):
self.db_path = db_path
self.init_database()
def init_database(self):
"""Initialize the SQLite database with required tables"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
# Enable foreign keys
conn.execute("PRAGMA foreign_keys = ON")
# Sessions table
conn.execute('''
CREATE TABLE IF NOT EXISTS sessions (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
model_used TEXT NOT NULL,
message_count INTEGER DEFAULT 0
)
''')
# Messages table
conn.execute('''
CREATE TABLE IF NOT EXISTS messages (
id TEXT PRIMARY KEY,
session_id TEXT NOT NULL,
content TEXT NOT NULL,
sender TEXT NOT NULL CHECK (sender IN ('user', 'assistant')),
timestamp TEXT NOT NULL,
metadata TEXT DEFAULT '{}',
FOREIGN KEY (session_id) REFERENCES sessions (id) ON DELETE CASCADE
)
''')
# Create indexes for better performance
conn.execute('CREATE INDEX IF NOT EXISTS idx_messages_session_id ON messages(session_id)')
conn.execute('CREATE INDEX IF NOT EXISTS idx_messages_timestamp ON messages(timestamp)')
conn.execute('CREATE INDEX IF NOT EXISTS idx_sessions_updated_at ON sessions(updated_at)')
# Documents table
conn.execute('''
CREATE TABLE IF NOT EXISTS session_documents (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
file_path TEXT NOT NULL,
indexed INTEGER DEFAULT 0,
FOREIGN KEY (session_id) REFERENCES sessions (id) ON DELETE CASCADE
)
''')
conn.execute('CREATE INDEX IF NOT EXISTS idx_session_documents_session_id ON session_documents(session_id)')
# --- NEW: Index persistence tables ---
cursor.execute('''
CREATE TABLE IF NOT EXISTS indexes (
id TEXT PRIMARY KEY,
name TEXT UNIQUE,
description TEXT,
created_at TEXT,
updated_at TEXT,
vector_table_name TEXT,
metadata TEXT
)
''')
cursor.execute('''
CREATE TABLE IF NOT EXISTS index_documents (
id INTEGER PRIMARY KEY AUTOINCREMENT,
index_id TEXT,
original_filename TEXT,
stored_path TEXT,
FOREIGN KEY(index_id) REFERENCES indexes(id)
)
''')
cursor.execute('''
CREATE TABLE IF NOT EXISTS session_indexes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT,
index_id TEXT,
linked_at TEXT,
FOREIGN KEY(session_id) REFERENCES sessions(id),
FOREIGN KEY(index_id) REFERENCES indexes(id)
)
''')
conn.commit()
conn.close()
print("✅ Database initialized successfully")
def create_session(self, title: str, model: str) -> str:
"""Create a new chat session"""
session_id = str(uuid.uuid4())
now = datetime.now().isoformat()
conn = sqlite3.connect(self.db_path)
conn.execute('''
INSERT INTO sessions (id, title, created_at, updated_at, model_used)
VALUES (?, ?, ?, ?, ?)
''', (session_id, title, now, now, model))
conn.commit()
conn.close()
print(f"📝 Created new session: {session_id[:8]}... - {title}")
return session_id
def get_sessions(self, limit: int = 50) -> List[Dict]:
"""Get all chat sessions, ordered by most recent"""
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cursor = conn.execute('''
SELECT id, title, created_at, updated_at, model_used, message_count
FROM sessions
ORDER BY updated_at DESC
LIMIT ?
''', (limit,))
sessions = [dict(row) for row in cursor.fetchall()]
conn.close()
return sessions
def get_session(self, session_id: str) -> Optional[Dict]:
"""Get a specific session"""
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cursor = conn.execute('''
SELECT id, title, created_at, updated_at, model_used, message_count
FROM sessions
WHERE id = ?
''', (session_id,))
row = cursor.fetchone()
conn.close()
return dict(row) if row else None
def add_message(self, session_id: str, content: str, sender: str, metadata: Dict = None) -> str:
"""Add a message to a session"""
message_id = str(uuid.uuid4())
now = datetime.now().isoformat()
metadata_json = json.dumps(metadata or {})
conn = sqlite3.connect(self.db_path)
# Add the message
conn.execute('''
INSERT INTO messages (id, session_id, content, sender, timestamp, metadata)
VALUES (?, ?, ?, ?, ?, ?)
''', (message_id, session_id, content, sender, now, metadata_json))
# Update session timestamp and message count
conn.execute('''
UPDATE sessions
SET updated_at = ?,
message_count = message_count + 1
WHERE id = ?
''', (now, session_id))
conn.commit()
conn.close()
return message_id
def get_messages(self, session_id: str, limit: int = 100) -> List[Dict]:
"""Get all messages for a session"""
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cursor = conn.execute('''
SELECT id, content, sender, timestamp, metadata
FROM messages
WHERE session_id = ?
ORDER BY timestamp ASC
LIMIT ?
''', (session_id, limit))
messages = []
for row in cursor.fetchall():
message = dict(row)
message['metadata'] = json.loads(message['metadata'])
messages.append(message)
conn.close()
return messages
def get_conversation_history(self, session_id: str) -> List[Dict]:
"""Get conversation history in the format expected by Ollama"""
messages = self.get_messages(session_id)
history = []
for msg in messages:
history.append({
"role": msg["sender"],
"content": msg["content"]
})
return history
def update_session_title(self, session_id: str, title: str):
"""Update session title"""
conn = sqlite3.connect(self.db_path)
conn.execute('''
UPDATE sessions
SET title = ?, updated_at = ?
WHERE id = ?
''', (title, datetime.now().isoformat(), session_id))
conn.commit()
conn.close()
def delete_session(self, session_id: str) -> bool:
"""Delete a session and all its messages"""
conn = sqlite3.connect(self.db_path)
cursor = conn.execute('DELETE FROM sessions WHERE id = ?', (session_id,))
deleted = cursor.rowcount > 0
conn.commit()
conn.close()
if deleted:
print(f"🗑️ Deleted session: {session_id[:8]}...")
return deleted
def cleanup_empty_sessions(self) -> int:
"""Remove sessions with no messages"""
conn = sqlite3.connect(self.db_path)
# Find sessions with no messages
cursor = conn.execute('''
SELECT s.id FROM sessions s
LEFT JOIN messages m ON s.id = m.session_id
WHERE m.id IS NULL
''')
empty_sessions = [row[0] for row in cursor.fetchall()]
# Delete empty sessions
deleted_count = 0
for session_id in empty_sessions:
cursor = conn.execute('DELETE FROM sessions WHERE id = ?', (session_id,))
if cursor.rowcount > 0:
deleted_count += 1
print(f"🗑️ Cleaned up empty session: {session_id[:8]}...")
conn.commit()
conn.close()
if deleted_count > 0:
print(f"✨ Cleaned up {deleted_count} empty sessions")
return deleted_count
def get_stats(self) -> Dict:
"""Get database statistics"""
conn = sqlite3.connect(self.db_path)
# Get session count
cursor = conn.execute('SELECT COUNT(*) FROM sessions')
session_count = cursor.fetchone()[0]
# Get message count
cursor = conn.execute('SELECT COUNT(*) FROM messages')
message_count = cursor.fetchone()[0]
# Get most used model
cursor = conn.execute('''
SELECT model_used, COUNT(*) as count
FROM sessions
GROUP BY model_used
ORDER BY count DESC
LIMIT 1
''')
most_used_model = cursor.fetchone()
conn.close()
return {
"total_sessions": session_count,
"total_messages": message_count,
"most_used_model": most_used_model[0] if most_used_model else None
}
def add_document_to_session(self, session_id: str, file_path: str) -> int:
"""Adds a document file path to a session."""
conn = sqlite3.connect(self.db_path)
cursor = conn.execute(
"INSERT INTO session_documents (session_id, file_path) VALUES (?, ?)",
(session_id, file_path)
)
doc_id = cursor.lastrowid
conn.commit()
conn.close()
print(f"📄 Added document '{file_path}' to session {session_id[:8]}...")
return doc_id
def get_documents_for_session(self, session_id: str) -> List[str]:
"""Retrieves all document file paths for a given session."""
conn = sqlite3.connect(self.db_path)
cursor = conn.execute(
"SELECT file_path FROM session_documents WHERE session_id = ?",
(session_id,)
)
paths = [row[0] for row in cursor.fetchall()]
conn.close()
return paths
# -------- Index helpers ---------
def create_index(self, name: str, description: str|None = None, metadata: dict | None = None) -> str:
idx_id = str(uuid.uuid4())
created = datetime.now().isoformat()
vector_table = f"text_pages_{idx_id}"
conn = sqlite3.connect(self.db_path)
conn.execute('''
INSERT INTO indexes (id, name, description, created_at, updated_at, vector_table_name, metadata)
VALUES (?,?,?,?,?,?,?)
''', (idx_id, name, description, created, created, vector_table, json.dumps(metadata or {})))
conn.commit()
conn.close()
print(f"📂 Created new index '{name}' ({idx_id[:8]})")
return idx_id
def get_index(self, index_id: str) -> dict | None:
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cur = conn.execute('SELECT * FROM indexes WHERE id=?', (index_id,))
row = cur.fetchone()
if not row:
conn.close()
return None
idx = dict(row)
idx['metadata'] = json.loads(idx['metadata'] or '{}')
cur = conn.execute('SELECT original_filename, stored_path FROM index_documents WHERE index_id=?', (index_id,))
docs = [{'filename': r[0], 'stored_path': r[1]} for r in cur.fetchall()]
idx['documents'] = docs
conn.close()
return idx
def list_indexes(self) -> list[dict]:
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
rows = conn.execute('SELECT * FROM indexes').fetchall()
res = []
for r in rows:
item = dict(r)
item['metadata'] = json.loads(item['metadata'] or '{}')
# attach documents list for convenience
docs_cur = conn.execute('SELECT original_filename, stored_path FROM index_documents WHERE index_id=?', (item['id'],))
docs = [{'filename':d[0],'stored_path':d[1]} for d in docs_cur.fetchall()]
item['documents'] = docs
res.append(item)
conn.close()
return res
def add_document_to_index(self, index_id: str, filename: str, stored_path: str):
conn = sqlite3.connect(self.db_path)
conn.execute('INSERT INTO index_documents (index_id, original_filename, stored_path) VALUES (?,?,?)', (index_id, filename, stored_path))
conn.commit()
conn.close()
def link_index_to_session(self, session_id: str, index_id: str):
conn = sqlite3.connect(self.db_path)
conn.execute('INSERT INTO session_indexes (session_id, index_id, linked_at) VALUES (?,?,?)', (session_id, index_id, datetime.now().isoformat()))
conn.commit()
conn.close()
def get_indexes_for_session(self, session_id: str) -> list[str]:
conn = sqlite3.connect(self.db_path)
cursor = conn.execute('SELECT index_id FROM session_indexes WHERE session_id=? ORDER BY linked_at', (session_id,))
ids = [r[0] for r in cursor.fetchall()]
conn.close()
return ids
def delete_index(self, index_id: str) -> bool:
"""Delete an index and its related records (documents, session links). Returns True if deleted."""
conn = sqlite3.connect(self.db_path)
try:
# Get vector table name before deletion (optional, for LanceDB cleanup)
cur = conn.execute('SELECT vector_table_name FROM indexes WHERE id = ?', (index_id,))
row = cur.fetchone()
vector_table_name = row[0] if row else None
# Remove child rows first due to foreign-key constraints
conn.execute('DELETE FROM index_documents WHERE index_id = ?', (index_id,))
conn.execute('DELETE FROM session_indexes WHERE index_id = ?', (index_id,))
cursor = conn.execute('DELETE FROM indexes WHERE id = ?', (index_id,))
deleted = cursor.rowcount > 0
conn.commit()
finally:
conn.close()
if deleted:
print(f"🗑️ Deleted index {index_id[:8]}... and related records")
# Optional: attempt to drop LanceDB table if available
if vector_table_name:
try:
from rag_system.indexing.embedders import LanceDBManager
import os
db_path = os.getenv('LANCEDB_PATH') or './rag_system/index_store/lancedb'
ldb = LanceDBManager(db_path)
db = ldb.db
if hasattr(db, 'table_names') and vector_table_name in db.table_names():
db.drop_table(vector_table_name)
print(f"🚮 Dropped LanceDB table '{vector_table_name}'")
except Exception as e:
print(f"⚠️ Could not drop LanceDB table '{vector_table_name}': {e}")
return deleted
def update_index_metadata(self, index_id: str, updates: dict):
"""Merge new key/values into an index's metadata JSON column."""
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cur = conn.execute('SELECT metadata FROM indexes WHERE id=?', (index_id,))
row = cur.fetchone()
if row is None:
conn.close()
raise ValueError("Index not found")
existing = json.loads(row['metadata'] or '{}')
existing.update(updates)
conn.execute('UPDATE indexes SET metadata=?, updated_at=? WHERE id=?', (json.dumps(existing), datetime.now().isoformat(), index_id))
conn.commit()
conn.close()
def inspect_and_populate_index_metadata(self, index_id: str) -> dict:
"""
Inspect LanceDB table to extract metadata for older indexes.
Returns the inferred metadata or empty dict if inspection fails.
"""
try:
# Get index info
index_info = self.get_index(index_id)
if not index_info:
return {}
# Check if metadata is already populated
if index_info.get('metadata') and len(index_info['metadata']) > 0:
return index_info['metadata']
# Try to inspect the LanceDB table
vector_table_name = index_info.get('vector_table_name')
if not vector_table_name:
return {}
try:
# Try to import the RAG system modules
try:
from rag_system.indexing.embedders import LanceDBManager
import os
# Use the same path as the system
db_path = os.getenv('LANCEDB_PATH') or './rag_system/index_store/lancedb'
ldb = LanceDBManager(db_path)
# Check if table exists
if not hasattr(ldb.db, 'table_names') or vector_table_name not in ldb.db.table_names():
# Table doesn't exist - this means the index was never properly built
inferred_metadata = {
'status': 'incomplete',
'issue': 'Vector table not found - index may not have been built properly',
'vector_table_expected': vector_table_name,
'available_tables': list(ldb.db.table_names()) if hasattr(ldb.db, 'table_names') else [],
'metadata_inferred_at': datetime.now().isoformat(),
'metadata_source': 'lancedb_inspection'
}
self.update_index_metadata(index_id, inferred_metadata)
print(f"⚠️ Index {index_id[:8]}... appears incomplete - vector table missing")
return inferred_metadata
# Get table and inspect schema/data
table = ldb.db.open_table(vector_table_name)
# Get a sample record to inspect - use correct LanceDB API
try:
# Try to get sample data using proper LanceDB methods
sample_df = table.to_pandas()
if len(sample_df) == 0:
inferred_metadata = {
'status': 'empty',
'issue': 'Vector table exists but contains no data',
'metadata_inferred_at': datetime.now().isoformat(),
'metadata_source': 'lancedb_inspection'
}
self.update_index_metadata(index_id, inferred_metadata)
return inferred_metadata
# Take only first row for inspection
sample_df = sample_df.head(1)
except Exception as e:
print(f"⚠️ Could not read data from table {vector_table_name}: {e}")
return {}
# Infer metadata from table structure
inferred_metadata = {
'status': 'functional',
'total_chunks': len(table.to_pandas()), # Get total count
}
# Check vector dimensions
if 'vector' in sample_df.columns:
vector_data = sample_df['vector'].iloc[0]
if isinstance(vector_data, list):
inferred_metadata['vector_dimensions'] = len(vector_data)
# Try to infer embedding model from vector dimensions
dim_to_model = {
384: 'BAAI/bge-small-en-v1.5 (or similar)',
512: 'sentence-transformers/all-MiniLM-L6-v2 (or similar)',
768: 'BAAI/bge-base-en-v1.5 (or similar)',
1024: 'Qwen/Qwen3-Embedding-0.6B (or similar)',
1536: 'text-embedding-ada-002 (or similar)'
}
if len(vector_data) in dim_to_model:
inferred_metadata['embedding_model_inferred'] = dim_to_model[len(vector_data)]
# Try to parse metadata from sample record
if 'metadata' in sample_df.columns:
try:
sample_metadata = json.loads(sample_df['metadata'].iloc[0])
# Look for common metadata fields that might give us clues
if 'document_id' in sample_metadata:
inferred_metadata['has_document_structure'] = True
if 'chunk_index' in sample_metadata:
inferred_metadata['has_chunk_indexing'] = True
if 'original_text' in sample_metadata:
inferred_metadata['has_contextual_enrichment'] = True
inferred_metadata['retrieval_mode_inferred'] = 'hybrid (contextual enrichment detected)'
# Check for chunk size patterns
if 'text' in sample_df.columns:
text_length = len(sample_df['text'].iloc[0])
if text_length > 0:
inferred_metadata['sample_chunk_length'] = text_length
# Rough chunk size estimation
estimated_tokens = text_length // 4 # rough estimate: 4 chars per token
if estimated_tokens < 300:
inferred_metadata['chunk_size_inferred'] = '256 tokens (estimated)'
elif estimated_tokens < 600:
inferred_metadata['chunk_size_inferred'] = '512 tokens (estimated)'
else:
inferred_metadata['chunk_size_inferred'] = '1024+ tokens (estimated)'
except (json.JSONDecodeError, KeyError):
pass
# Check if FTS index exists
try:
indices = table.list_indices()
fts_exists = any('fts' in idx.name.lower() for idx in indices)
if fts_exists:
inferred_metadata['has_fts_index'] = True
inferred_metadata['retrieval_mode_inferred'] = 'hybrid (FTS + vector)'
else:
inferred_metadata['retrieval_mode_inferred'] = 'vector-only'
except:
pass
# Add inspection timestamp
inferred_metadata['metadata_inferred_at'] = datetime.now().isoformat()
inferred_metadata['metadata_source'] = 'lancedb_inspection'
# Update the database with inferred metadata
if inferred_metadata:
self.update_index_metadata(index_id, inferred_metadata)
print(f"🔍 Inferred metadata for index {index_id[:8]}...: {len(inferred_metadata)} fields")
return inferred_metadata
except ImportError as import_error:
# RAG system modules not available - provide basic fallback metadata
print(f"⚠️ RAG system modules not available for inspection: {import_error}")
# Check if this is actually a legacy index by looking at creation date
created_at = index_info.get('created_at', '')
is_recent = False
if created_at:
try:
from datetime import datetime, timedelta
created_date = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
# Consider indexes created in the last 30 days as "recent"
is_recent = created_date > datetime.now().replace(tzinfo=created_date.tzinfo) - timedelta(days=30)
except:
pass
# Provide basic fallback metadata with better status detection
if is_recent:
status = 'functional'
issue = 'Detailed configuration inspection requires RAG system modules, but index appears functional'
else:
status = 'legacy'
issue = 'This index was created before metadata tracking was implemented. Configuration details are not available.'
fallback_metadata = {
'status': status,
'issue': issue,
'metadata_inferred_at': datetime.now().isoformat(),
'metadata_source': 'fallback_inspection',
'documents_count': len(index_info.get('documents', [])),
'created_at': index_info.get('created_at', 'unknown'),
'inspection_limitation': 'Backend server cannot access full RAG system modules for detailed inspection'
}
# Try to infer some basic info from the vector table name
if vector_table_name:
fallback_metadata['vector_table_name'] = vector_table_name
fallback_metadata['note'] = 'Vector table exists but detailed inspection requires RAG system modules'
self.update_index_metadata(index_id, fallback_metadata)
status_msg = "recent but limited inspection" if is_recent else "legacy"
print(f"📝 Added fallback metadata for {status_msg} index {index_id[:8]}...")
return fallback_metadata
except Exception as e:
print(f"⚠️ Could not inspect LanceDB table for index {index_id[:8]}...: {e}")
return {}
except Exception as e:
print(f"⚠️ Failed to inspect index metadata for {index_id[:8]}...: {e}")
return {}
def generate_session_title(first_message: str, max_length: int = 50) -> str:
"""Generate a session title from the first message"""
# Clean up the message
title = first_message.strip()
# Remove common prefixes
prefixes = ["hey", "hi", "hello", "can you", "please", "i want", "i need"]
title_lower = title.lower()
for prefix in prefixes:
if title_lower.startswith(prefix):
title = title[len(prefix):].strip()
break
# Capitalize first letter
if title:
title = title[0].upper() + title[1:]
# Truncate if too long
if len(title) > max_length:
title = title[:max_length].strip() + "..."
# Fallback
if not title or len(title) < 3:
title = "New Chat"
return title
# Global database instance
db = ChatDatabase()
if __name__ == "__main__":
# Test the database
print("🧪 Testing database...")
# Create a test session
session_id = db.create_session("Test Chat", "llama3.2:latest")
# Add some messages
db.add_message(session_id, "Hello!", "user")
db.add_message(session_id, "Hi there! How can I help you?", "assistant")
# Get messages
messages = db.get_messages(session_id)
print(f"📨 Messages: {len(messages)}")
# Get sessions
sessions = db.get_sessions()
print(f"📋 Sessions: {len(sessions)}")
# Get stats
stats = db.get_stats()
print(f"📊 Stats: {stats}")
print("✅ Database test completed!")

200
backend/ollama_client.py Normal file
View File

@ -0,0 +1,200 @@
import requests
import json
import os
from typing import List, Dict
class OllamaClient:
def __init__(self, base_url: str = "http://localhost:11434"):
self.base_url = base_url
self.api_url = f"{base_url}/api"
def is_ollama_running(self) -> bool:
"""Check if Ollama server is running"""
try:
response = requests.get(f"{self.base_url}/api/tags", timeout=5)
return response.status_code == 200
except requests.exceptions.RequestException:
return False
def list_models(self) -> List[str]:
"""Get list of available models"""
try:
response = requests.get(f"{self.api_url}/tags")
if response.status_code == 200:
models = response.json().get("models", [])
return [model["name"] for model in models]
return []
except requests.exceptions.RequestException as e:
print(f"Error fetching models: {e}")
return []
def pull_model(self, model_name: str) -> bool:
"""Pull a model if not available"""
try:
response = requests.post(
f"{self.api_url}/pull",
json={"name": model_name},
stream=True
)
if response.status_code == 200:
print(f"Pulling model {model_name}...")
for line in response.iter_lines():
if line:
data = json.loads(line)
if "status" in data:
print(f"Status: {data['status']}")
if data.get("status") == "success":
return True
return True
return False
except requests.exceptions.RequestException as e:
print(f"Error pulling model: {e}")
return False
def chat(self, message: str, model: str = "llama3.2", conversation_history: List[Dict] = None, enable_thinking: bool = True) -> str:
"""Send a chat message to Ollama"""
if conversation_history is None:
conversation_history = []
# Add user message to conversation
messages = conversation_history + [{"role": "user", "content": message}]
try:
payload = {
"model": model,
"messages": messages,
"stream": False,
}
# Multiple approaches to disable thinking tokens
if not enable_thinking:
payload.update({
"think": False, # Native Ollama parameter
"options": {
"think": False,
"thinking": False,
"temperature": 0.7,
"top_p": 0.9
}
})
else:
payload["think"] = True
response = requests.post(
f"{self.api_url}/chat",
json=payload,
timeout=60
)
if response.status_code == 200:
result = response.json()
response_text = result["message"]["content"]
# Additional cleanup: remove any thinking tokens that might slip through
if not enable_thinking:
# Remove common thinking token patterns
import re
response_text = re.sub(r'<think>.*?</think>', '', response_text, flags=re.DOTALL | re.IGNORECASE)
response_text = re.sub(r'<thinking>.*?</thinking>', '', response_text, flags=re.DOTALL | re.IGNORECASE)
response_text = response_text.strip()
return response_text
else:
return f"Error: {response.status_code} - {response.text}"
except requests.exceptions.RequestException as e:
return f"Connection error: {e}"
def chat_stream(self, message: str, model: str = "llama3.2", conversation_history: List[Dict] = None, enable_thinking: bool = True):
"""Stream chat response from Ollama"""
if conversation_history is None:
conversation_history = []
messages = conversation_history + [{"role": "user", "content": message}]
try:
payload = {
"model": model,
"messages": messages,
"stream": True,
}
# Multiple approaches to disable thinking tokens
if not enable_thinking:
payload.update({
"think": False, # Native Ollama parameter
"options": {
"think": False,
"thinking": False,
"temperature": 0.7,
"top_p": 0.9
}
})
else:
payload["think"] = True
response = requests.post(
f"{self.api_url}/chat",
json=payload,
stream=True,
timeout=60
)
if response.status_code == 200:
for line in response.iter_lines():
if line:
try:
data = json.loads(line)
if "message" in data and "content" in data["message"]:
content = data["message"]["content"]
# Filter out thinking tokens in streaming mode
if not enable_thinking:
# Skip content that looks like thinking tokens
if '<think>' in content.lower() or '<thinking>' in content.lower():
continue
yield content
except json.JSONDecodeError:
continue
else:
yield f"Error: {response.status_code} - {response.text}"
except requests.exceptions.RequestException as e:
yield f"Connection error: {e}"
def main():
"""Test the Ollama client"""
client = OllamaClient()
# Check if Ollama is running
if not client.is_ollama_running():
print("❌ Ollama is not running. Please start Ollama first.")
print("Install: https://ollama.ai")
print("Run: ollama serve")
return
print("✅ Ollama is running!")
# List available models
models = client.list_models()
print(f"Available models: {models}")
# Try to use llama3.2, pull if needed
model_name = "llama3.2"
if model_name not in [m.split(":")[0] for m in models]:
print(f"Model {model_name} not found. Pulling...")
if client.pull_model(model_name):
print(f"✅ Model {model_name} pulled successfully!")
else:
print(f"❌ Failed to pull model {model_name}")
return
# Test chat
print("\n🤖 Testing chat...")
response = client.chat("Hello! Can you tell me a short joke?", model_name)
print(f"AI: {response}")
if __name__ == "__main__":
main()
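
A minimal usage sketch for the client above, assuming Ollama is running on localhost:11434 and a model tagged llama3.2 has been pulled:

# Stream a reply with thinking tokens disabled, carrying prior turns as history.
from backend.ollama_client import OllamaClient

client = OllamaClient()
if client.is_ollama_running():
    history = [{"role": "user", "content": "Hi"},
               {"role": "assistant", "content": "Hello! How can I help?"}]
    for chunk in client.chat_stream("Give me one fun fact.", model="llama3.2",
                                    conversation_history=history, enable_thinking=False):
        print(chunk, end="", flush=True)
    print()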

3
backend/requirements.txt Normal file
View File

@ -0,0 +1,3 @@
requests
python-dotenv
PyPDF2

1142
backend/server.py Normal file

File diff suppressed because it is too large

View File

@ -0,0 +1,216 @@
"""
Simple PDF Processing Service
Handles PDF upload and text extraction for RAG functionality
"""
import os
import uuid
from typing import List, Dict, Any
import PyPDF2
from io import BytesIO
import sqlite3
import json
from datetime import datetime
class SimplePDFProcessor:
def __init__(self, db_path: str = "chat_data.db"):
"""Initialize simple PDF processor with SQLite storage"""
self.db_path = db_path
self.init_database()
print("✅ Simple PDF processor initialized")
def init_database(self):
"""Initialize SQLite database for storing PDF content"""
conn = sqlite3.connect(self.db_path)
conn.execute('''
CREATE TABLE IF NOT EXISTS pdf_documents (
id TEXT PRIMARY KEY,
session_id TEXT NOT NULL,
filename TEXT NOT NULL,
content TEXT NOT NULL,
created_at TEXT NOT NULL
)
''')
conn.commit()
conn.close()
def extract_text_from_pdf(self, pdf_bytes: bytes) -> str:
"""Extract text from PDF bytes"""
try:
print(f"📄 Starting PDF text extraction ({len(pdf_bytes)} bytes)")
pdf_file = BytesIO(pdf_bytes)
pdf_reader = PyPDF2.PdfReader(pdf_file)
print(f"📖 PDF has {len(pdf_reader.pages)} pages")
text = ""
for page_num, page in enumerate(pdf_reader.pages):
print(f"📄 Processing page {page_num + 1}")
try:
page_text = page.extract_text()
if page_text.strip():
text += f"\n--- Page {page_num + 1} ---\n"
text += page_text + "\n"
print(f"✅ Page {page_num + 1}: extracted {len(page_text)} characters")
except Exception as page_error:
print(f"❌ Error on page {page_num + 1}: {str(page_error)}")
continue
print(f"📄 Total extracted text: {len(text)} characters")
return text.strip()
except Exception as e:
print(f"❌ Error extracting text from PDF: {str(e)}")
print(f"❌ Error type: {type(e).__name__}")
return ""
def process_pdf(self, pdf_bytes: bytes, filename: str, session_id: str) -> Dict[str, Any]:
"""Process a PDF file and store in database"""
print(f"📄 Processing PDF: {filename}")
# Extract text
text = self.extract_text_from_pdf(pdf_bytes)
if not text:
return {
"success": False,
"error": "Could not extract text from PDF",
"filename": filename
}
print(f"📝 Extracted {len(text)} characters from {filename}")
# Store in database
document_id = str(uuid.uuid4())
now = datetime.now().isoformat()
try:
conn = sqlite3.connect(self.db_path)
# Store document
conn.execute('''
INSERT INTO pdf_documents (id, session_id, filename, content, created_at)
VALUES (?, ?, ?, ?, ?)
''', (document_id, session_id, filename, text, now))
conn.commit()
conn.close()
print(f"💾 Stored document {filename} in database")
return {
"success": True,
"filename": filename,
"file_id": document_id,
"text_length": len(text)
}
except Exception as e:
print(f"❌ Error storing in database: {str(e)}")
return {
"success": False,
"error": f"Database storage failed: {str(e)}",
"filename": filename
}
def get_session_documents(self, session_id: str) -> List[Dict[str, Any]]:
"""Get all documents for a session"""
try:
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cursor = conn.execute('''
SELECT id, filename, created_at
FROM pdf_documents
WHERE session_id = ?
ORDER BY created_at DESC
''', (session_id,))
documents = [dict(row) for row in cursor.fetchall()]
conn.close()
return documents
except Exception as e:
print(f"❌ Error getting session documents: {str(e)}")
return []
def get_document_content(self, session_id: str) -> str:
"""Get all document content for a session (for LLM context)"""
try:
conn = sqlite3.connect(self.db_path)
cursor = conn.execute('''
SELECT filename, content
FROM pdf_documents
WHERE session_id = ?
ORDER BY created_at ASC
''', (session_id,))
rows = cursor.fetchall()
conn.close()
if not rows:
return ""
# Combine all document content
combined_content = ""
for filename, content in rows:
combined_content += f"\n\n=== Document: {filename} ===\n\n"
combined_content += content
return combined_content.strip()
except Exception as e:
print(f"❌ Error getting document content: {str(e)}")
return ""
def delete_session_documents(self, session_id: str) -> bool:
"""Delete all documents for a session"""
try:
conn = sqlite3.connect(self.db_path)
cursor = conn.execute('''
DELETE FROM pdf_documents
WHERE session_id = ?
''', (session_id,))
deleted_count = cursor.rowcount
conn.commit()
conn.close()
if deleted_count > 0:
print(f"🗑️ Deleted {deleted_count} documents for session {session_id[:8]}...")
return deleted_count > 0
except Exception as e:
print(f"❌ Error deleting session documents: {str(e)}")
return False
# Global instance
simple_pdf_processor = None
def initialize_simple_pdf_processor():
"""Initialize the global PDF processor"""
global simple_pdf_processor
try:
simple_pdf_processor = SimplePDFProcessor()
print("✅ Global PDF processor initialized")
except Exception as e:
print(f"❌ Failed to initialize PDF processor: {str(e)}")
simple_pdf_processor = None
def get_simple_pdf_processor():
"""Get the global PDF processor instance"""
global simple_pdf_processor
if simple_pdf_processor is None:
initialize_simple_pdf_processor()
return simple_pdf_processor
if __name__ == "__main__":
# Test the simple PDF processor
print("🧪 Testing simple PDF processor...")
processor = SimplePDFProcessor()
print("✅ Simple PDF processor test completed!")

155
backend/test_backend.py Normal file
View File

@ -0,0 +1,155 @@
#!/usr/bin/env python3
"""
Simple test script for the localGPT backend
"""
import requests
import json
import time
def test_health_endpoint():
"""Test the health endpoint"""
print("🔍 Testing health endpoint...")
try:
response = requests.get("http://localhost:8000/health", timeout=5)
if response.status_code == 200:
data = response.json()
print(f"✅ Health check passed")
print(f" Ollama running: {data['ollama_running']}")
print(f" Models available: {len(data['available_models'])}")
return True
else:
print(f"❌ Health check failed: {response.status_code}")
return False
except requests.exceptions.RequestException as e:
print(f"❌ Health check failed: {e}")
return False
def test_chat_endpoint():
"""Test the chat endpoint"""
print("\n💬 Testing chat endpoint...")
test_message = {
"message": "Say 'Hello World' and nothing else.",
"model": "llama3.2:latest"
}
try:
response = requests.post(
"http://localhost:8000/chat",
headers={"Content-Type": "application/json"},
json=test_message,
timeout=30
)
if response.status_code == 200:
data = response.json()
print(f"✅ Chat test passed")
print(f" Model: {data['model']}")
print(f" Response: {data['response']}")
print(f" Message count: {data['message_count']}")
return True
else:
print(f"❌ Chat test failed: {response.status_code}")
print(f" Response: {response.text}")
return False
except requests.exceptions.RequestException as e:
print(f"❌ Chat test failed: {e}")
return False
def test_conversation_history():
"""Test conversation with history"""
print("\n🗨️ Testing conversation history...")
# First message
conversation = []
message1 = {
"message": "My name is Alice. Remember this.",
"model": "llama3.2:latest",
"conversation_history": conversation
}
try:
response1 = requests.post(
"http://localhost:8000/chat",
headers={"Content-Type": "application/json"},
json=message1,
timeout=30
)
if response1.status_code == 200:
data1 = response1.json()
# Add to conversation history
conversation.append({"role": "user", "content": "My name is Alice. Remember this."})
conversation.append({"role": "assistant", "content": data1["response"]})
# Second message asking about the name
message2 = {
"message": "What is my name?",
"model": "llama3.2:latest",
"conversation_history": conversation
}
response2 = requests.post(
"http://localhost:8000/chat",
headers={"Content-Type": "application/json"},
json=message2,
timeout=30
)
if response2.status_code == 200:
data2 = response2.json()
print(f"✅ Conversation history test passed")
print(f" First response: {data1['response']}")
print(f" Second response: {data2['response']}")
# Check if the AI remembered the name
if "alice" in data2['response'].lower():
print(f"✅ AI correctly remembered the name!")
else:
print(f"⚠️ AI might not have remembered the name")
return True
else:
print(f"❌ Second message failed: {response2.status_code}")
return False
else:
print(f"❌ First message failed: {response1.status_code}")
return False
except requests.exceptions.RequestException as e:
print(f"❌ Conversation test failed: {e}")
return False
def main():
print("🧪 Testing localGPT Backend")
print("=" * 40)
# Test health endpoint
health_ok = test_health_endpoint()
if not health_ok:
print("\n❌ Backend server is not running or not healthy")
print(" Make sure to run: python server.py")
return
# Test basic chat
chat_ok = test_chat_endpoint()
if not chat_ok:
print("\n❌ Chat functionality is not working")
return
# Test conversation history
conversation_ok = test_conversation_history()
print("\n" + "=" * 40)
if health_ok and chat_ok and conversation_ok:
print("🎉 All tests passed! Backend is ready for frontend integration.")
else:
print("⚠️ Some tests failed. Check the issues above.")
print("\n🔗 Ready to connect to frontend at http://localhost:3000")
if __name__ == "__main__":
main()
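
For reference, a sketch of the /chat round-trip the tests above exercise; the field names are taken from the assertions in test_chat_endpoint, and the backend is assumed to be running on port 8000:

import requests

payload = {
    "message": "Say 'Hello World' and nothing else.",
    "model": "llama3.2:latest",
    "conversation_history": [],   # optional, same shape as in test_conversation_history
}
resp = requests.post("http://localhost:8000/chat", json=payload, timeout=30)
resp.raise_for_status()
data = resp.json()
print(data["model"], data["message_count"])
print(data["response"])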

View File

@ -0,0 +1,19 @@
{
"index_name": "Sample Batch Index",
"index_description": "Example batch index configuration",
"documents": [
"./rag_system/documents/invoice_1039.pdf",
"./rag_system/documents/invoice_1041.pdf"
],
"processing": {
"chunk_size": 512,
"chunk_overlap": 64,
"enable_enrich": true,
"enable_latechunk": true,
"enable_docling": true,
"embedding_model": "Qwen/Qwen3-Embedding-0.6B",
"generation_model": "qwen3:0.6b",
"retrieval_mode": "hybrid",
"window_size": 2
}
}
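
A sketch of consuming this configuration the same way the --batch path of create_index_script.py does; the filename batch_indexing_config.json is assumed:

import json, os

with open("batch_indexing_config.json") as f:
    cfg = json.load(f)

# Validate the listed documents before handing them to the indexing pipeline.
valid_docs = [d for d in cfg.get("documents", []) if os.path.exists(d)]
print(f"{cfg['index_name']}: {len(valid_docs)}/{len(cfg['documents'])} documents found")
print("chunk_size:", cfg["processing"]["chunk_size"])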

View File

@ -1,202 +0,0 @@
import os
# from dotenv import load_dotenv
from chromadb.config import Settings
# https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/excel.html?highlight=xlsx#microsoft-excel
from langchain.document_loaders import CSVLoader, PDFMinerLoader, TextLoader, UnstructuredExcelLoader, Docx2txtLoader
from langchain.document_loaders import UnstructuredFileLoader, UnstructuredMarkdownLoader
from langchain.document_loaders import UnstructuredHTMLLoader
# load_dotenv()
ROOT_DIRECTORY = os.path.dirname(os.path.realpath(__file__))
# Define the folder for storing database
SOURCE_DIRECTORY = f"{ROOT_DIRECTORY}/SOURCE_DOCUMENTS"
PERSIST_DIRECTORY = f"{ROOT_DIRECTORY}/DB"
MODELS_PATH = "./models"
# Can be changed to a specific number
INGEST_THREADS = os.cpu_count() or 8
# Define the Chroma settings
CHROMA_SETTINGS = Settings(
anonymized_telemetry=False,
is_persistent=True,
)
# Context Window and Max New Tokens
CONTEXT_WINDOW_SIZE = 8096
MAX_NEW_TOKENS = CONTEXT_WINDOW_SIZE # int(CONTEXT_WINDOW_SIZE/4)
#### If you get a "not enough space in the buffer" error, you should reduce the values below, start with half of the original values and keep halving the value until the error stops appearing
N_GPU_LAYERS = 100 # Llama-2-70B has 83 layers
N_BATCH = 512
### From experimenting with the Llama-2-7B-Chat-GGML model on 8GB VRAM, these values work:
# N_GPU_LAYERS = 20
# N_BATCH = 512
# https://python.langchain.com/en/latest/_modules/langchain/document_loaders/excel.html#UnstructuredExcelLoader
DOCUMENT_MAP = {
".html": UnstructuredHTMLLoader,
".txt": TextLoader,
".md": UnstructuredMarkdownLoader,
".py": TextLoader,
# ".pdf": PDFMinerLoader,
".pdf": UnstructuredFileLoader,
".csv": CSVLoader,
".xls": UnstructuredExcelLoader,
".xlsx": UnstructuredExcelLoader,
".docx": Docx2txtLoader,
".doc": Docx2txtLoader,
}
# Default Instructor Model
EMBEDDING_MODEL_NAME = "hkunlp/instructor-large" # Uses 1.5 GB of VRAM (High Accuracy with lower VRAM usage)
####
#### OTHER EMBEDDING MODEL OPTIONS
####
# EMBEDDING_MODEL_NAME = "hkunlp/instructor-xl" # Uses 5 GB of VRAM (Most Accurate of all models)
# EMBEDDING_MODEL_NAME = "intfloat/e5-large-v2" # Uses 1.5 GB of VRAM (A little less accurate than instructor-large)
# EMBEDDING_MODEL_NAME = "intfloat/e5-base-v2" # Uses 0.5 GB of VRAM (A good model for lower VRAM GPUs)
# EMBEDDING_MODEL_NAME = "all-MiniLM-L6-v2" # Uses 0.2 GB of VRAM (Less accurate but fastest - only requires 150mb of vram)
####
#### MULTILINGUAL EMBEDDING MODELS
####
# EMBEDDING_MODEL_NAME = "intfloat/multilingual-e5-large" # Uses 2.5 GB of VRAM
# EMBEDDING_MODEL_NAME = "intfloat/multilingual-e5-base" # Uses 1.2 GB of VRAM
#### SELECT AN OPEN SOURCE LLM (LARGE LANGUAGE MODEL)
# Select the Model ID and model_basename
# load the LLM for generating Natural Language responses
#### GPU VRAM Memory required for LLM Models (ONLY) by Billion Parameter value (B Model)
#### Does not include VRAM used by Embedding Models - which use an additional 2GB-7GB of VRAM depending on the model.
####
#### (B Model) (float32) (float16) (GPTQ 8bit) (GPTQ 4bit)
#### 7b 28 GB 14 GB 7 GB - 9 GB 3.5 GB - 5 GB
#### 13b 52 GB 26 GB 13 GB - 15 GB 6.5 GB - 8 GB
#### 32b 130 GB 65 GB 32.5 GB - 35 GB 16.25 GB - 19 GB
#### 65b 260.8 GB 130.4 GB 65.2 GB - 67 GB 32.6 GB - - 35 GB
# MODEL_ID = "TheBloke/Llama-2-7B-Chat-GGML"
# MODEL_BASENAME = "llama-2-7b-chat.ggmlv3.q4_0.bin"
####
#### (FOR GGUF MODELS)
####
# MODEL_ID = "TheBloke/Llama-2-13b-Chat-GGUF"
# MODEL_BASENAME = "llama-2-13b-chat.Q4_K_M.gguf"
# MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
# MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
# MODEL_ID = "QuantFactory/Meta-Llama-3-8B-Instruct-GGUF"
# MODEL_BASENAME = "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"
# Use mistral to run on hpu
# MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"
# LLAMA 3 # use for Apple Silicon
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
MODEL_BASENAME = None
# LLAMA 3 # use for NVIDIA GPUs
# MODEL_ID = "unsloth/llama-3-8b-bnb-4bit"
# MODEL_BASENAME = None
# MODEL_ID = "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
# MODEL_BASENAME = "mistral-7b-instruct-v0.1.Q8_0.gguf"
# MODEL_ID = "TheBloke/Llama-2-70b-Chat-GGUF"
# MODEL_BASENAME = "llama-2-70b-chat.Q4_K_M.gguf"
####
#### (FOR HF MODELS)
####
# MODEL_ID = "NousResearch/Llama-2-7b-chat-hf"
# MODEL_BASENAME = None
# MODEL_ID = "TheBloke/vicuna-7B-1.1-HF"
# MODEL_BASENAME = None
# MODEL_ID = "TheBloke/Wizard-Vicuna-7B-Uncensored-HF"
# MODEL_ID = "TheBloke/guanaco-7B-HF"
# MODEL_ID = 'NousResearch/Nous-Hermes-13b' # Requires ~ 23GB VRAM. Using STransformers
# alongside will 100% create OOM on 24GB cards.
# llm = load_model(device_type, model_id=model_id)
####
#### (FOR GPTQ QUANTIZED) Select a llm model based on your GPU and VRAM GB. Does not include Embedding Models VRAM usage.
####
##### 48GB VRAM Graphics Cards (RTX 6000, RTX A6000 and other 48GB VRAM GPUs) #####
### 65b GPTQ LLM Models for 48GB GPUs (*** With best embedding model: hkunlp/instructor-xl ***)
# MODEL_ID = "TheBloke/guanaco-65B-GPTQ"
# MODEL_BASENAME = "model.safetensors"
# MODEL_ID = "TheBloke/Airoboros-65B-GPT4-2.0-GPTQ"
# MODEL_BASENAME = "model.safetensors"
# MODEL_ID = "TheBloke/gpt4-alpaca-lora_mlp-65B-GPTQ"
# MODEL_BASENAME = "model.safetensors"
# MODEL_ID = "TheBloke/Upstage-Llama1-65B-Instruct-GPTQ"
# MODEL_BASENAME = "model.safetensors"
##### 24GB VRAM Graphics Cards (RTX 3090 - RTX 4090 (35% Faster) - RTX A5000 - RTX A5500) #####
### 13b GPTQ Models for 24GB GPUs (*** With best embedding model: hkunlp/instructor-xl ***)
# MODEL_ID = "TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ"
# MODEL_BASENAME = "Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors"
# MODEL_ID = "TheBloke/vicuna-13B-v1.5-GPTQ"
# MODEL_BASENAME = "model.safetensors"
# MODEL_ID = "TheBloke/Nous-Hermes-13B-GPTQ"
# MODEL_BASENAME = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"
# MODEL_ID = "TheBloke/WizardLM-13B-V1.2-GPTQ"
# MODEL_BASENAME = "gptq_model-4bit-128g.safetensors
### 30b GPTQ Models for 24GB GPUs (*** Requires using intfloat/e5-base-v2 instead of hkunlp/instructor-large as embedding model ***)
# MODEL_ID = "TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ"
# MODEL_BASENAME = "Wizard-Vicuna-30B-Uncensored-GPTQ-4bit--1g.act.order.safetensors"
# MODEL_ID = "TheBloke/WizardLM-30B-Uncensored-GPTQ"
# MODEL_BASENAME = "WizardLM-30B-Uncensored-GPTQ-4bit.act-order.safetensors"
##### 8-10GB VRAM Graphics Cards (RTX 3080 - RTX 3080 Ti - RTX 3070 Ti - 3060 Ti - RTX 2000 Series, Quadro RTX 4000, 5000, 6000) #####
### (*** Requires using intfloat/e5-small-v2 instead of hkunlp/instructor-large as embedding model ***)
### 7b GPTQ Models for 8GB GPUs
# MODEL_ID = "TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ"
# MODEL_BASENAME = "Wizard-Vicuna-7B-Uncensored-GPTQ-4bit-128g.no-act.order.safetensors"
# MODEL_ID = "TheBloke/WizardLM-7B-uncensored-GPTQ"
# MODEL_BASENAME = "WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors"
# MODEL_ID = "TheBloke/wizardLM-7B-GPTQ"
# MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"
####
#### (FOR GGML) (Quantized cpu+gpu+mps) models - check if they support llama.cpp
####
# MODEL_ID = "TheBloke/wizard-vicuna-13B-GGML"
# MODEL_BASENAME = "wizard-vicuna-13B.ggmlv3.q4_0.bin"
# MODEL_BASENAME = "wizard-vicuna-13B.ggmlv3.q6_K.bin"
# MODEL_BASENAME = "wizard-vicuna-13B.ggmlv3.q2_K.bin"
# MODEL_ID = "TheBloke/orca_mini_3B-GGML"
# MODEL_BASENAME = "orca-mini-3b.ggmlv3.q4_0.bin"
####
#### (FOR AWQ QUANTIZED) Select a llm model based on your GPU and VRAM GB. Does not include Embedding Models VRAM usage.
### (*** MODEL_BASENAME is not actually used but have to contain .awq so the correct model loading is used ***)
### (*** Compute capability 7.5 (sm75) and CUDA Toolkit 11.8+ are required ***)
####
# MODEL_ID = "TheBloke/Llama-2-7B-Chat-AWQ"
# MODEL_BASENAME = "model.safetensors.awq"

View File

@ -1,91 +0,0 @@
import os
import shutil
import click
import subprocess
from constants import (
DOCUMENT_MAP,
SOURCE_DIRECTORY
)
def logToFile(logentry):
file1 = open("crawl.log","a")
file1.write(logentry + "\n")
file1.close()
print(logentry + "\n")
@click.command()
@click.option(
"--device_type",
default="cuda",
type=click.Choice(
[
"cpu",
"cuda",
"ipu",
"xpu",
"mkldnn",
"opengl",
"opencl",
"ideep",
"hip",
"ve",
"fpga",
"ort",
"xla",
"lazy",
"vulkan",
"mps",
"meta",
"hpu",
"mtia",
],
),
help="Device to run on. (Default is cuda)",
)
@click.option(
"--landing_directory",
default="./LANDING_DOCUMENTS"
)
@click.option(
"--processed_directory",
default="./PROCESSED_DOCUMENTS"
)
@click.option(
"--error_directory",
default="./ERROR_DOCUMENTS"
)
@click.option(
"--unsupported_directory",
default="./UNSUPPORTED_DOCUMENTS"
)
def main(device_type, landing_directory, processed_directory, error_directory, unsupported_directory):
paths = []
os.makedirs(processed_directory, exist_ok=True)
os.makedirs(error_directory, exist_ok=True)
os.makedirs(unsupported_directory, exist_ok=True)
for root, _, files in os.walk(landing_directory):
for file_name in files:
file_extension = os.path.splitext(file_name)[1]
short_filename = os.path.basename(file_name)
if not os.path.isdir(root + "/" + file_name):
if file_extension in DOCUMENT_MAP.keys():
shutil.move(root + "/" + file_name, SOURCE_DIRECTORY+ "/" + short_filename)
logToFile("START: " + root + "/" + short_filename)
process = subprocess.Popen("python ingest.py --device_type=" + device_type, shell=True, stdout=subprocess.PIPE)
process.wait()
if process.returncode > 0:
shutil.move(SOURCE_DIRECTORY + "/" + short_filename, error_directory + "/" + short_filename)
logToFile("ERROR: " + root + "/" + short_filename)
else:
logToFile("VALID: " + root + "/" + short_filename)
shutil.move(SOURCE_DIRECTORY + "/" + short_filename, processed_directory+ "/" + short_filename)
else:
shutil.move(root + "/" + file_name, unsupported_directory+ "/" + short_filename)
if __name__ == "__main__":
main()

372
create_index_script.py Normal file
View File

@ -0,0 +1,372 @@
#!/usr/bin/env python3
"""
Interactive Index Creation Script for LocalGPT RAG System
This script provides a user-friendly interface for creating document indexes
using the LocalGPT RAG system. It supports both single documents and batch
processing of multiple documents.
Usage:
python create_index_script.py
python create_index_script.py --batch
python create_index_script.py --config custom_config.json
"""
import os
import sys
import json
import argparse
from typing import List, Optional
from pathlib import Path
# Add the project root to the path so we can import rag_system modules
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
try:
from rag_system.main import PIPELINE_CONFIGS, get_agent
from rag_system.pipelines.indexing_pipeline import IndexingPipeline
from rag_system.utils.ollama_client import OllamaClient
from backend.database import ChatDatabase
except ImportError as e:
print(f"❌ Error importing required modules: {e}")
print("Please ensure you're running this script from the project root directory.")
sys.exit(1)
class IndexCreator:
"""Interactive index creation utility."""
def __init__(self, config_path: Optional[str] = None):
"""Initialize the index creator with optional custom configuration."""
self.db = ChatDatabase()
self.config = self._load_config(config_path)
# Initialize Ollama client
self.ollama_client = OllamaClient()
self.ollama_config = {
"generation_model": "qwen3:0.6b",
"embedding_model": "qwen3:0.6b"
}
# Initialize indexing pipeline
self.pipeline = IndexingPipeline(
self.config,
self.ollama_client,
self.ollama_config
)
def _load_config(self, config_path: Optional[str] = None) -> dict:
"""Load configuration from file or use default."""
if config_path and os.path.exists(config_path):
try:
with open(config_path, 'r') as f:
return json.load(f)
except Exception as e:
print(f"⚠️ Error loading config from {config_path}: {e}")
print("Using default configuration...")
return PIPELINE_CONFIGS.get("default", {})
def get_user_input(self, prompt: str, default: str = "") -> str:
"""Get user input with optional default value."""
if default:
user_input = input(f"{prompt} [{default}]: ").strip()
return user_input if user_input else default
return input(f"{prompt}: ").strip()
def select_documents(self) -> List[str]:
"""Interactive document selection."""
print("\n📁 Document Selection")
print("=" * 50)
documents = []
while True:
print("\nOptions:")
print("1. Add a single document")
print("2. Add all documents from a directory")
print("3. Finish and proceed with selected documents")
print("4. Show selected documents")
choice = self.get_user_input("Select an option (1-4)", "1")
if choice == "1":
doc_path = self.get_user_input("Enter document path")
if os.path.exists(doc_path):
documents.append(os.path.abspath(doc_path))
print(f"✅ Added: {doc_path}")
else:
print(f"❌ File not found: {doc_path}")
elif choice == "2":
dir_path = self.get_user_input("Enter directory path")
if os.path.isdir(dir_path):
supported_extensions = ['.pdf', '.txt', '.docx', '.md']
found_docs = []
for ext in supported_extensions:
found_docs.extend(Path(dir_path).glob(f"*{ext}"))
found_docs.extend(Path(dir_path).glob(f"**/*{ext}"))
if found_docs:
print(f"Found {len(found_docs)} documents:")
for doc in found_docs:
print(f" - {doc}")
if self.get_user_input("Add all these documents? (y/n)", "y").lower() == 'y':
documents.extend([str(doc.absolute()) for doc in found_docs])
print(f"✅ Added {len(found_docs)} documents")
else:
print("❌ No supported documents found in directory")
else:
print(f"❌ Directory not found: {dir_path}")
elif choice == "3":
if documents:
break
else:
print("❌ No documents selected. Please add at least one document.")
elif choice == "4":
if documents:
print(f"\n📄 Selected documents ({len(documents)}):")
for i, doc in enumerate(documents, 1):
print(f" {i}. {doc}")
else:
print("No documents selected yet.")
else:
print("Invalid choice. Please select 1-4.")
return documents
def configure_processing(self) -> dict:
"""Interactive processing configuration."""
print("\n⚙️ Processing Configuration")
print("=" * 50)
print("Configure how documents will be processed:")
# Basic settings
chunk_size = int(self.get_user_input("Chunk size", "512"))
chunk_overlap = int(self.get_user_input("Chunk overlap", "64"))
# Advanced settings
print("\nAdvanced options:")
enable_enrich = self.get_user_input("Enable contextual enrichment? (y/n)", "y").lower() == 'y'
enable_latechunk = self.get_user_input("Enable late chunking? (y/n)", "y").lower() == 'y'
enable_docling = self.get_user_input("Enable Docling chunking? (y/n)", "y").lower() == 'y'
# Model selection
print("\nModel Configuration:")
embedding_model = self.get_user_input("Embedding model", "Qwen/Qwen3-Embedding-0.6B")
generation_model = self.get_user_input("Generation model", "qwen3:0.6b")
return {
"chunk_size": chunk_size,
"chunk_overlap": chunk_overlap,
"enable_enrich": enable_enrich,
"enable_latechunk": enable_latechunk,
"enable_docling": enable_docling,
"embedding_model": embedding_model,
"generation_model": generation_model,
"retrieval_mode": "hybrid",
"window_size": 2
}
def create_index_interactive(self) -> None:
"""Run the interactive index creation process."""
print("🚀 LocalGPT Index Creation Tool")
print("=" * 50)
# Get index details
index_name = self.get_user_input("Enter index name")
index_description = self.get_user_input("Enter index description (optional)")
# Select documents
documents = self.select_documents()
# Configure processing
processing_config = self.configure_processing()
# Confirm creation
print("\n📋 Index Summary")
print("=" * 50)
print(f"Name: {index_name}")
print(f"Description: {index_description or 'None'}")
print(f"Documents: {len(documents)}")
print(f"Chunk size: {processing_config['chunk_size']}")
print(f"Enrichment: {'Enabled' if processing_config['enable_enrich'] else 'Disabled'}")
print(f"Embedding model: {processing_config['embedding_model']}")
if self.get_user_input("\nProceed with index creation? (y/n)", "y").lower() != 'y':
print("❌ Index creation cancelled.")
return
# Create the index
try:
print("\n🔥 Creating index...")
# Create index record in database
index_id = self.db.create_index(
name=index_name,
description=index_description,
metadata=processing_config
)
# Add documents to index
for doc_path in documents:
filename = os.path.basename(doc_path)
self.db.add_document_to_index(index_id, filename, doc_path)
# Process documents through pipeline
print("📚 Processing documents...")
self.pipeline.process_documents(documents)
print(f"\n✅ Index '{index_name}' created successfully!")
print(f"Index ID: {index_id}")
print(f"Processed {len(documents)} documents")
# Test the index
if self.get_user_input("\nTest the index with a sample query? (y/n)", "y").lower() == 'y':
self.test_index(index_id)
except Exception as e:
print(f"❌ Error creating index: {e}")
import traceback
traceback.print_exc()
def test_index(self, index_id: str) -> None:
"""Test the created index with a sample query."""
try:
print("\n🧪 Testing Index")
print("=" * 50)
# Get agent for testing
agent = get_agent("default")
# Test query
test_query = self.get_user_input("Enter a test query", "What is this document about?")
print(f"\nProcessing query: {test_query}")
response = agent.run(test_query, table_name=f"text_pages_{index_id}")
print(f"\n🤖 Response:")
print(response)
except Exception as e:
print(f"❌ Error testing index: {e}")
def batch_create_from_config(self, config_file: str) -> None:
"""Create index from batch configuration file."""
try:
with open(config_file, 'r') as f:
batch_config = json.load(f)
index_name = batch_config.get("index_name", "Batch Index")
index_description = batch_config.get("index_description", "")
documents = batch_config.get("documents", [])
processing_config = batch_config.get("processing", {})
if not documents:
print("❌ No documents specified in batch configuration")
return
# Validate documents exist
valid_documents = []
for doc_path in documents:
if os.path.exists(doc_path):
valid_documents.append(doc_path)
else:
print(f"⚠️ Document not found: {doc_path}")
if not valid_documents:
print("❌ No valid documents found")
return
print(f"🚀 Creating batch index: {index_name}")
print(f"📄 Processing {len(valid_documents)} documents...")
# Create index
index_id = self.db.create_index(
name=index_name,
description=index_description,
metadata=processing_config
)
# Add documents
for doc_path in valid_documents:
filename = os.path.basename(doc_path)
self.db.add_document_to_index(index_id, filename, doc_path)
# Process documents
self.pipeline.process_documents(valid_documents)
print(f"✅ Batch index '{index_name}' created successfully!")
print(f"Index ID: {index_id}")
except Exception as e:
print(f"❌ Error creating batch index: {e}")
import traceback
traceback.print_exc()
def create_sample_batch_config():
"""Create a sample batch configuration file."""
sample_config = {
"index_name": "Sample Batch Index",
"index_description": "Example batch index configuration",
"documents": [
"./rag_system/documents/invoice_1039.pdf",
"./rag_system/documents/invoice_1041.pdf"
],
"processing": {
"chunk_size": 512,
"chunk_overlap": 64,
"enable_enrich": True,
"enable_latechunk": True,
"enable_docling": True,
"embedding_model": "Qwen/Qwen3-Embedding-0.6B",
"generation_model": "qwen3:0.6b",
"retrieval_mode": "hybrid",
"window_size": 2
}
}
with open("batch_indexing_config.json", "w") as f:
json.dump(sample_config, f, indent=2)
print("📄 Sample batch configuration created: batch_indexing_config.json")
def main():
"""Main entry point for the script."""
parser = argparse.ArgumentParser(description="LocalGPT Index Creation Tool")
parser.add_argument("--batch", help="Batch configuration file", type=str)
parser.add_argument("--config", help="Custom pipeline configuration file", type=str)
parser.add_argument("--create-sample", action="store_true", help="Create sample batch config")
args = parser.parse_args()
if args.create_sample:
create_sample_batch_config()
return
try:
creator = IndexCreator(config_path=args.config)
if args.batch:
creator.batch_create_from_config(args.batch)
else:
creator.create_index_interactive()
except KeyboardInterrupt:
print("\n\n❌ Operation cancelled by user.")
except Exception as e:
print(f"❌ Unexpected error: {e}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
main()
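
Beyond the CLI entry point, the class can also be driven programmatically; a sketch assuming the script's directory is on the Python path:

# Reuse IndexCreator without going through argparse.
from create_index_script import IndexCreator, create_sample_batch_config

create_sample_batch_config()          # writes batch_indexing_config.json
creator = IndexCreator()              # default pipeline configuration
creator.batch_create_from_config("batch_indexing_config.json")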

386
demo_batch_indexing.py Normal file
View File

@ -0,0 +1,386 @@
#!/usr/bin/env python3
"""
Demo Batch Indexing Script for LocalGPT RAG System
This script demonstrates how to perform batch indexing of multiple documents
using configuration files. It's designed to showcase the full capabilities
of the indexing pipeline with various configuration options.
Usage:
python demo_batch_indexing.py --config batch_indexing_config.json
python demo_batch_indexing.py --create-sample-config
python demo_batch_indexing.py --help
"""
import os
import sys
import json
import argparse
import time
import logging
from typing import List, Dict, Any, Optional
from pathlib import Path
from datetime import datetime
# Add the project root to the path so we can import rag_system modules
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
try:
from rag_system.main import PIPELINE_CONFIGS
from rag_system.pipelines.indexing_pipeline import IndexingPipeline
from rag_system.utils.ollama_client import OllamaClient
from backend.database import ChatDatabase
except ImportError as e:
print(f"❌ Error importing required modules: {e}")
print("Please ensure you're running this script from the project root directory.")
sys.exit(1)
# Configure logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s | %(levelname)-7s | %(name)s | %(message)s",
)
class BatchIndexingDemo:
"""Demonstration of batch indexing capabilities."""
def __init__(self, config_path: str):
"""Initialize the batch indexing demo."""
self.config_path = config_path
self.config = self._load_config()
self.db = ChatDatabase()
# Initialize Ollama client
self.ollama_client = OllamaClient()
# Initialize pipeline with merged configuration
self.pipeline_config = self._merge_configurations()
self.pipeline = IndexingPipeline(
self.pipeline_config,
self.ollama_client,
self.config.get("ollama_config", {
"generation_model": "qwen3:0.6b",
"embedding_model": "qwen3:0.6b"
})
)
def _load_config(self) -> Dict[str, Any]:
"""Load batch indexing configuration from file."""
try:
with open(self.config_path, 'r') as f:
config = json.load(f)
print(f"✅ Loaded configuration from {self.config_path}")
return config
except FileNotFoundError:
print(f"❌ Configuration file not found: {self.config_path}")
sys.exit(1)
except json.JSONDecodeError as e:
print(f"❌ Invalid JSON in configuration file: {e}")
sys.exit(1)
def _merge_configurations(self) -> Dict[str, Any]:
"""Merge batch config with default pipeline config."""
# Start with default pipeline configuration
merged_config = PIPELINE_CONFIGS.get("default", {}).copy()
# Override with batch-specific settings
batch_settings = self.config.get("pipeline_settings", {})
# Deep merge for nested dictionaries
def deep_merge(base: dict, override: dict) -> dict:
result = base.copy()
for key, value in override.items():
if key in result and isinstance(result[key], dict) and isinstance(value, dict):
result[key] = deep_merge(result[key], value)
else:
result[key] = value
return result
return deep_merge(merged_config, batch_settings)
def validate_documents(self, documents: List[str]) -> List[str]:
"""Validate and filter document paths."""
valid_documents = []
print(f"📋 Validating {len(documents)} documents...")
for doc_path in documents:
# Handle relative paths
if not os.path.isabs(doc_path):
doc_path = os.path.abspath(doc_path)
if os.path.exists(doc_path):
# Check file extension
ext = Path(doc_path).suffix.lower()
if ext in ['.pdf', '.txt', '.docx', '.md']:
valid_documents.append(doc_path)
print(f"{doc_path}")
else:
print(f" ⚠️ Unsupported file type: {doc_path}")
else:
print(f" ❌ File not found: {doc_path}")
print(f"📊 {len(valid_documents)} valid documents found")
return valid_documents
def create_indexes(self) -> List[str]:
"""Create multiple indexes based on configuration."""
indexes = self.config.get("indexes", [])
created_indexes = []
for index_config in indexes:
index_id = self.create_single_index(index_config)
if index_id:
created_indexes.append(index_id)
return created_indexes
def create_single_index(self, index_config: Dict[str, Any]) -> Optional[str]:
"""Create a single index from configuration."""
try:
# Extract index metadata
index_name = index_config.get("name", "Unnamed Index")
index_description = index_config.get("description", "")
documents = index_config.get("documents", [])
if not documents:
print(f"⚠️ No documents specified for index '{index_name}', skipping...")
return None
# Validate documents
valid_documents = self.validate_documents(documents)
if not valid_documents:
print(f"❌ No valid documents found for index '{index_name}'")
return None
print(f"\n🚀 Creating index: {index_name}")
print(f"📄 Processing {len(valid_documents)} documents")
# Create index record in database
index_metadata = {
"created_by": "demo_batch_indexing.py",
"created_at": datetime.now().isoformat(),
"document_count": len(valid_documents),
"config_used": index_config.get("processing_options", {})
}
index_id = self.db.create_index(
name=index_name,
description=index_description,
metadata=index_metadata
)
# Add documents to index
for doc_path in valid_documents:
filename = os.path.basename(doc_path)
self.db.add_document_to_index(index_id, filename, doc_path)
# Process documents through pipeline
start_time = time.time()
self.pipeline.process_documents(valid_documents)
processing_time = time.time() - start_time
print(f"✅ Index '{index_name}' created successfully!")
print(f" Index ID: {index_id}")
print(f" Processing time: {processing_time:.2f} seconds")
print(f" Documents processed: {len(valid_documents)}")
return index_id
except Exception as e:
print(f"❌ Error creating index '{index_name}': {e}")
import traceback
traceback.print_exc()
return None
def demonstrate_features(self):
"""Demonstrate various indexing features."""
print("\n🎯 Batch Indexing Demo Features:")
print("=" * 50)
# Show configuration
print(f"📋 Configuration file: {self.config_path}")
print(f"📊 Number of indexes to create: {len(self.config.get('indexes', []))}")
# Show pipeline settings
pipeline_settings = self.config.get("pipeline_settings", {})
if pipeline_settings:
print("\n⚙️ Pipeline Settings:")
for key, value in pipeline_settings.items():
print(f" {key}: {value}")
# Show model configuration
ollama_config = self.config.get("ollama_config", {})
if ollama_config:
print("\n🤖 Model Configuration:")
for key, value in ollama_config.items():
print(f" {key}: {value}")
def run_demo(self):
"""Run the complete batch indexing demo."""
print("🚀 LocalGPT Batch Indexing Demo")
print("=" * 50)
# Show demo features
self.demonstrate_features()
# Create indexes
print(f"\n📚 Starting batch indexing process...")
start_time = time.time()
created_indexes = self.create_indexes()
total_time = time.time() - start_time
# Summary
print(f"\n📊 Batch Indexing Summary")
print("=" * 50)
print(f"✅ Successfully created {len(created_indexes)} indexes")
print(f"⏱️ Total processing time: {total_time:.2f} seconds")
if created_indexes:
print(f"\n📋 Created Indexes:")
for i, index_id in enumerate(created_indexes, 1):
index_info = self.db.get_index(index_id)
if index_info:
print(f" {i}. {index_info['name']} ({index_id[:8]}...)")
print(f" Documents: {len(index_info.get('documents', []))}")
print(f"\n🎉 Demo completed successfully!")
print(f"💡 You can now use these indexes in the LocalGPT interface.")
def create_sample_config():
"""Create a comprehensive sample configuration file."""
sample_config = {
"description": "Demo batch indexing configuration showcasing various features",
"pipeline_settings": {
"embedding_model_name": "Qwen/Qwen3-Embedding-0.6B",
"indexing": {
"embedding_batch_size": 50,
"enrichment_batch_size": 25,
"enable_progress_tracking": True
},
"contextual_enricher": {
"enabled": True,
"window_size": 2,
"model_name": "qwen3:0.6b"
},
"chunking": {
"chunk_size": 512,
"chunk_overlap": 64,
"enable_latechunk": True,
"enable_docling": True
},
"retrievers": {
"dense": {
"enabled": True,
"lancedb_table_name": "demo_text_pages"
},
"bm25": {
"enabled": True,
"index_name": "demo_bm25_index"
}
},
"storage": {
"lancedb_uri": "./index_store/lancedb",
"bm25_path": "./index_store/bm25"
}
},
"ollama_config": {
"generation_model": "qwen3:0.6b",
"embedding_model": "qwen3:0.6b"
},
"indexes": [
{
"name": "Sample Invoice Collection",
"description": "Demo index containing sample invoice documents",
"documents": [
"./rag_system/documents/invoice_1039.pdf",
"./rag_system/documents/invoice_1041.pdf"
],
"processing_options": {
"chunk_size": 512,
"enable_enrichment": True,
"retrieval_mode": "hybrid"
}
},
{
"name": "Research Papers Demo",
"description": "Demo index for research papers and whitepapers",
"documents": [
"./rag_system/documents/Newwhitepaper_Agents2.pdf"
],
"processing_options": {
"chunk_size": 1024,
"enable_enrichment": True,
"retrieval_mode": "dense"
}
}
]
}
config_filename = "batch_indexing_config.json"
with open(config_filename, "w") as f:
json.dump(sample_config, f, indent=2)
print(f"✅ Sample configuration created: {config_filename}")
print(f"📝 Edit this file to customize your batch indexing setup")
print(f"🚀 Run: python demo_batch_indexing.py --config {config_filename}")
def main():
"""Main entry point for the demo script."""
parser = argparse.ArgumentParser(
description="LocalGPT Batch Indexing Demo",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
python demo_batch_indexing.py --config batch_indexing_config.json
python demo_batch_indexing.py --create-sample-config
This demo showcases the advanced batch indexing capabilities of LocalGPT,
including multi-index creation, advanced configuration options, and
comprehensive processing pipelines.
"""
)
parser.add_argument(
"--config",
type=str,
default="batch_indexing_config.json",
help="Path to batch indexing configuration file"
)
parser.add_argument(
"--create-sample-config",
action="store_true",
help="Create a sample configuration file"
)
args = parser.parse_args()
if args.create_sample_config:
create_sample_config()
return
if not os.path.exists(args.config):
print(f"❌ Configuration file not found: {args.config}")
print(f"💡 Create a sample config with: python {sys.argv[0]} --create-sample-config")
sys.exit(1)
try:
demo = BatchIndexingDemo(args.config)
demo.run_demo()
except KeyboardInterrupt:
print("\n\n❌ Demo cancelled by user.")
except Exception as e:
print(f"❌ Demo failed: {e}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
main()
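
The deep_merge helper inside _merge_configurations is what lets a batch file override only part of the default pipeline configuration; an illustration of the merge semantics with made-up keys, restating the same logic standalone:

def deep_merge(base: dict, override: dict) -> dict:
    # Nested dicts are merged key-by-key; everything else is overridden outright.
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

default = {"chunking": {"chunk_size": 512, "chunk_overlap": 64}, "retrieval_mode": "hybrid"}
batch = {"chunking": {"chunk_size": 1024}}
print(deep_merge(default, batch))
# {'chunking': {'chunk_size': 1024, 'chunk_overlap': 64}, 'retrieval_mode': 'hybrid'}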

View File

@ -0,0 +1,77 @@
services:
# RAG API server (connects to host Ollama)
rag-api:
build:
context: .
dockerfile: Dockerfile.rag-api
container_name: rag-api
ports:
- "8001:8001"
environment:
- OLLAMA_HOST=http://host.docker.internal:11434
- NODE_ENV=production
volumes:
- ./lancedb:/app/lancedb
- ./index_store:/app/index_store
- ./shared_uploads:/app/shared_uploads
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8001/models"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
# Backend API server
backend:
build:
context: .
dockerfile: Dockerfile.backend
container_name: rag-backend
ports:
- "8000:8000"
environment:
- NODE_ENV=production
- RAG_API_URL=http://rag-api:8001
volumes:
- ./backend/chat_data.db:/app/backend/chat_data.db
- ./shared_uploads:/app/shared_uploads
depends_on:
rag-api:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
# Frontend Next.js application
frontend:
build:
context: .
dockerfile: Dockerfile.frontend
container_name: rag-frontend
ports:
- "3000:3000"
environment:
- NODE_ENV=production
- NEXT_PUBLIC_API_URL=http://localhost:8000
depends_on:
backend:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3000"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
networks:
rag-network:
driver: bridge

103
docker-compose.yml Normal file
View File

@ -0,0 +1,103 @@
services:
# Ollama service for LLM inference (optional - can use host Ollama instead)
ollama:
image: ollama/ollama:latest
container_name: rag-ollama
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
environment:
- OLLAMA_HOST=0.0.0.0
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
profiles:
- with-ollama # Optional service - enable with --profile with-ollama
# RAG API server
rag-api:
build:
context: .
dockerfile: Dockerfile.rag-api
container_name: rag-api
ports:
- "8001:8001"
environment:
# Use host Ollama by default, or containerized Ollama if enabled
- OLLAMA_HOST=${OLLAMA_HOST:-http://host.docker.internal:11434}
- NODE_ENV=production
volumes:
- ./lancedb:/app/lancedb
- ./index_store:/app/index_store
- ./shared_uploads:/app/shared_uploads
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8001/models"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
# Backend API server
backend:
build:
context: .
dockerfile: Dockerfile.backend
container_name: rag-backend
ports:
- "8000:8000"
environment:
- NODE_ENV=production
- RAG_API_URL=http://rag-api:8001
volumes:
- ./backend/chat_data.db:/app/backend/chat_data.db
- ./shared_uploads:/app/shared_uploads
depends_on:
rag-api:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
# Frontend Next.js application
frontend:
build:
context: .
dockerfile: Dockerfile.frontend
container_name: rag-frontend
ports:
- "3000:3000"
environment:
- NODE_ENV=production
- NEXT_PUBLIC_API_URL=http://localhost:8000
depends_on:
backend:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3000"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- rag-network
volumes:
ollama_data:
driver: local
networks:
rag-network:
driver: bridge
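
Once the stack is up, a readiness check from the host against the same endpoints the compose healthchecks use (ports as published above):

import requests

checks = {
    "rag-api": "http://localhost:8001/models",
    "backend": "http://localhost:8000/health",
    "frontend": "http://localhost:3000",
}
for name, url in checks.items():
    try:
        ok = requests.get(url, timeout=5).status_code == 200
    except requests.RequestException:
        ok = False
    print(f"{name}: {'healthy' if ok else 'not responding'}")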

11
docker.env Normal file
View File

@ -0,0 +1,11 @@
# Docker environment configuration
# Set this to use local Ollama instance running on host
OLLAMA_HOST=http://host.docker.internal:11434
# Alternative: Use containerized Ollama (uncomment and run with --profile with-ollama)
# OLLAMA_HOST=http://ollama:11434
# Other configuration
NODE_ENV=production
NEXT_PUBLIC_API_URL=http://localhost:8000
RAG_API_URL=http://rag-api:8001

16
eslint.config.mjs Normal file
View File

@ -0,0 +1,16 @@
import { dirname } from "path";
import { fileURLToPath } from "url";
import { FlatCompat } from "@eslint/eslintrc";
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const compat = new FlatCompat({
baseDirectory: __dirname,
});
const eslintConfig = [
...compat.extends("next/core-web-vitals", "next/typescript"),
];
export default eslintConfig;

View File

@ -1,40 +0,0 @@
import logging
import torch
from langchain.embeddings import HuggingFaceEmbeddings
from habana_frameworks.torch.utils.library_loader import load_habana_module
from optimum.habana.sentence_transformers.modeling_utils import (
adapt_sentence_transformers_to_gaudi,
)
from constants import EMBEDDING_MODEL_NAME
def load_embeddings():
"""Load HuggingFace Embeddings object onto Gaudi or CPU"""
load_habana_module()
if torch.hpu.is_available():
logging.info("Loading embedding model on hpu")
adapt_sentence_transformers_to_gaudi()
embeddings = HuggingFaceEmbeddings(
model_name=EMBEDDING_MODEL_NAME, model_kwargs={"device": "hpu"}
)
else:
logging.info("Loading embedding model on cpu")
embeddings = HuggingFaceEmbeddings(
model_name=EMBEDDING_MODEL_NAME, model_kwargs={"device": "cpu"}
)
return embeddings
def calculate_similarity(model, response, expected_answer):
"""Calculate similarity between response and expected answer using the model"""
response_embedding = model.client.encode(response, convert_to_tensor=True).squeeze()
expected_embedding = model.client.encode(
expected_answer, convert_to_tensor=True
).squeeze()
similarity_score = torch.nn.functional.cosine_similarity(
response_embedding, expected_embedding, dim=0
)
return similarity_score.item()

View File

@ -1,168 +0,0 @@
import copy
import os
import torch
from pathlib import Path
from typing import List
import habana_frameworks.torch.hpu as torch_hpu
from habana_frameworks.torch.hpu import wrap_in_hpu_graph
from huggingface_hub import snapshot_download
from optimum.habana.transformers.generation import MODELS_OPTIMIZED_WITH_STATIC_SHAPES
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi
from optimum.habana.utils import set_seed
from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline
from transformers.utils import is_offline_mode
def get_repo_root(model_name_or_path, local_rank=-1, token=None):
"""
Downloads the specified model checkpoint and returns the repository where it was downloaded.
"""
if Path(model_name_or_path).is_dir():
# If it is a local model, no need to download anything
return model_name_or_path
else:
# Checks if online or not
if is_offline_mode():
if local_rank == 0:
print("Offline mode: forcing local_files_only=True")
# Only download PyTorch weights by default
allow_patterns = ["*.bin"]
# Download only on first process
if local_rank in [-1, 0]:
cache_dir = snapshot_download(
model_name_or_path,
local_files_only=is_offline_mode(),
cache_dir=os.getenv("TRANSFORMERS_CACHE", None),
allow_patterns=allow_patterns,
max_workers=16,
token=token,
)
if local_rank == -1:
# If there is only one process, then the method is finished
return cache_dir
# Make all processes wait so that other processes can get the checkpoint directly from cache
torch.distributed.barrier()
return snapshot_download(
model_name_or_path,
local_files_only=is_offline_mode(),
cache_dir=os.getenv("TRANSFORMERS_CACHE", None),
allow_patterns=allow_patterns,
token=token,
)
def get_optimized_model_name(config):
for model_type in MODELS_OPTIMIZED_WITH_STATIC_SHAPES:
if model_type == config.model_type:
return model_type
return None
def model_is_optimized(config):
"""
Checks if the given config belongs to a model in optimum/habana/transformers/models, which has a
new input token_idx.
"""
return get_optimized_model_name(config) is not None
class GaudiTextGenerationPipeline(TextGenerationPipeline):
"""
An end-to-end text-generation pipeline that can used to initialize LangChain classes.
"""
def __init__(self, model_name_or_path=None, revision="main", **kwargs):
self.task = "text-generation"
self.device = "hpu"
# Tweak generation so that it runs faster on Gaudi
adapt_transformers_to_gaudi()
set_seed(27)
# Initialize tokenizer and define datatype
self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, revision=revision)
model_dtype = torch.bfloat16
# Intialize model
get_repo_root(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, revision=revision, torch_dtype=model_dtype)
model = model.eval().to(self.device)
is_optimized = model_is_optimized(model.config)
model = wrap_in_hpu_graph(model)
self.model = model
# Used for padding input to fixed length
self.tokenizer.padding_side = "left"
self.max_padding_length = kwargs.get("max_padding_length", self.model.config.max_position_embeddings)
# Define config params for llama and mistral models
if self.model.config.model_type in ["llama", "mistral"]:
self.model.generation_config.pad_token_id = 0
self.model.generation_config.bos_token_id = 1
self.model.generation_config.eos_token_id = 2
self.tokenizer.bos_token_id = self.model.generation_config.bos_token_id
self.tokenizer.eos_token_id = self.model.generation_config.eos_token_id
self.tokenizer.pad_token_id = self.model.generation_config.pad_token_id
self.tokenizer.pad_token = self.tokenizer.decode(self.tokenizer.pad_token_id)
self.tokenizer.eos_token = self.tokenizer.decode(self.tokenizer.eos_token_id)
self.tokenizer.bos_token = self.tokenizer.decode(self.tokenizer.bos_token_id)
# Applicable to models that do not have pad tokens
if self.tokenizer.pad_token is None:
self.tokenizer.pad_token = self.tokenizer.eos_token
self.model.generation_config.pad_token_id = self.model.generation_config.eos_token_id
# Edit generation configuration based on input arguments
self.generation_config = copy.deepcopy(self.model.generation_config)
self.generation_config.max_new_tokens = kwargs.get("max_new_tokens", 100)
self.generation_config.use_cache = kwargs.get("use_kv_cache", True)
self.generation_config.static_shapes = is_optimized
self.generation_config.do_sample = kwargs.get("do_sample", False)
self.generation_config.num_beams = kwargs.get("num_beams", 1)
self.generation_config.temperature = kwargs.get("temperature", 1.0)
self.generation_config.top_p = kwargs.get("top_p", 1.0)
self.generation_config.repetition_penalty = kwargs.get("repetition_penalty", 1.0)
self.generation_config.num_return_sequences = kwargs.get("num_return_sequences", 1)
self.generation_config.bad_words_ids = None
self.generation_config.force_words_ids = None
self.generation_config.ignore_eos = False
        # Define an empty post-process params dict as there is no postprocessing
self._postprocess_params = {}
# Warm-up hpu and compile computation graphs
self.compile_graph()
def __call__(self, prompt: List[str]):
"""
__call__ method of pipeline class
"""
# Tokenize input string
        model_inputs = self.tokenizer.encode_plus(
            prompt[0],
            return_tensors="pt",
            max_length=self.max_padding_length,
            padding="max_length",
            truncation=True,
        )
# Move tensors to hpu
for t in model_inputs:
if torch.is_tensor(model_inputs[t]):
model_inputs[t] = model_inputs[t].to(self.device)
# Call model's generate method
        output = self.model.generate(
            **model_inputs,
            generation_config=self.generation_config,
            lazy_mode=True,
            hpu_graphs=True,
            profiling_steps=0,
            profiling_warmup_steps=0,
        ).cpu()
# Decode and return result
output_text = self.tokenizer.decode(output[0], skip_special_tokens=True)
del output, model_inputs
return [{"generated_text": output_text}]
def compile_graph(self):
"""
Function to compile computation graphs and synchronize hpus.
"""
for _ in range(3):
self(["Here is my prompt"])
torch_hpu.synchronize()
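A minimal usage sketch of the pipeline above, assuming a Gaudi/HPU host with optimum-habana installed; the model id and generation settings are illustrative, not prescribed by this file:

```python
# Hedged example: exercise GaudiTextGenerationPipeline directly.
pipe = GaudiTextGenerationPipeline(
    model_name_or_path="meta-llama/Llama-2-7b-chat-hf",  # assumed model choice
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
# __call__ expects a list of prompts and generates from the first entry
result = pipe(["What is retrieval-augmented generation?"])
print(result[0]["generated_text"])
```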

ingest.py
@@ -1,185 +0,0 @@
import logging
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
import click
import torch
from langchain.docstore.document import Document
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from utils import get_embeddings
from constants import (
CHROMA_SETTINGS,
DOCUMENT_MAP,
EMBEDDING_MODEL_NAME,
INGEST_THREADS,
PERSIST_DIRECTORY,
SOURCE_DIRECTORY,
)
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
def file_log(logentry):
    with open("file_ingest.log", "a") as log_file:
        log_file.write(logentry + "\n")
    print(logentry + "\n")
def load_single_document(file_path: str) -> Document:
# Loads a single document from a file path
try:
file_extension = os.path.splitext(file_path)[1]
loader_class = DOCUMENT_MAP.get(file_extension)
if loader_class:
file_log(file_path + " loaded.")
loader = loader_class(file_path)
else:
file_log(file_path + " document type is undefined.")
raise ValueError("Document type is undefined")
return loader.load()[0]
except Exception as ex:
file_log("%s loading error: \n%s" % (file_path, ex))
return None
def load_document_batch(filepaths):
    logging.info("Loading document batch")
    # create a thread pool
    with ThreadPoolExecutor(len(filepaths)) as exe:
        # load files concurrently
        futures = [exe.submit(load_single_document, name) for name in filepaths]
        # collect data (load_single_document returns None for files that failed to load)
        data_list = [future.result() for future in futures]
        # return data and file paths
        return (data_list, filepaths)
def load_documents(source_dir: str) -> list[Document]:
# Loads all documents from the source documents directory, including nested folders
paths = []
for root, _, files in os.walk(source_dir):
for file_name in files:
print("Importing: " + file_name)
file_extension = os.path.splitext(file_name)[1]
source_file_path = os.path.join(root, file_name)
if file_extension in DOCUMENT_MAP.keys():
paths.append(source_file_path)
    # Have at least one worker and at most INGEST_THREADS workers
    n_workers = min(INGEST_THREADS, max(len(paths), 1))
    # Ensure a non-zero chunk size even when no documents were found
    chunksize = max(round(len(paths) / n_workers), 1)
docs = []
with ProcessPoolExecutor(n_workers) as executor:
futures = []
# split the load operations into chunks
for i in range(0, len(paths), chunksize):
# select a chunk of filenames
filepaths = paths[i : (i + chunksize)]
# submit the task
try:
future = executor.submit(load_document_batch, filepaths)
except Exception as ex:
file_log("executor task failed: %s" % (ex))
future = None
if future is not None:
futures.append(future)
# process all results
for future in as_completed(futures):
# open the file and load the data
try:
contents, _ = future.result()
docs.extend(contents)
except Exception as ex:
file_log("Exception: %s" % (ex))
return docs
def split_documents(documents: list[Document]) -> tuple[list[Document], list[Document]]:
# Splits documents for correct Text Splitter
text_docs, python_docs = [], []
for doc in documents:
if doc is not None:
file_extension = os.path.splitext(doc.metadata["source"])[1]
if file_extension == ".py":
python_docs.append(doc)
else:
text_docs.append(doc)
return text_docs, python_docs
@click.command()
@click.option(
"--device_type",
default="cuda" if torch.cuda.is_available() else "cpu",
type=click.Choice(
[
"cpu",
"cuda",
"ipu",
"xpu",
"mkldnn",
"opengl",
"opencl",
"ideep",
"hip",
"ve",
"fpga",
"ort",
"xla",
"lazy",
"vulkan",
"mps",
"meta",
"hpu",
"mtia",
],
),
help="Device to run on. (Default is cuda)",
)
def main(device_type):
# Load documents and split in chunks
logging.info(f"Loading documents from {SOURCE_DIRECTORY}")
documents = load_documents(SOURCE_DIRECTORY)
text_documents, python_documents = split_documents(documents)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
python_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.PYTHON, chunk_size=880, chunk_overlap=200
)
texts = text_splitter.split_documents(text_documents)
texts.extend(python_splitter.split_documents(python_documents))
logging.info(f"Loaded {len(documents)} documents from {SOURCE_DIRECTORY}")
logging.info(f"Split into {len(texts)} chunks of text")
"""
(1) Chooses an appropriate langchain library based on the enbedding model name. Matching code is contained within fun_localGPT.py.
(2) Provides additional arguments for instructor and BGE models to improve results, pursuant to the instructions contained on
their respective huggingface repository, project page or github repository.
"""
embeddings = get_embeddings(device_type)
logging.info(f"Loaded embeddings from {EMBEDDING_MODEL_NAME}")
db = Chroma.from_documents(
texts,
embeddings,
persist_directory=PERSIST_DIRECTORY,
client_settings=CHROMA_SETTINGS,
)
if __name__ == "__main__":
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(filename)s:%(lineno)s - %(message)s", level=logging.INFO
)
main()
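After ingestion completes, the persisted Chroma index can be reopened for retrieval. A minimal sketch, reusing the same constants.py and utils.py helpers imported above (the device choice and query string are illustrative):

```python
# Hedged sketch: reopen the index written by ingest.py and run a similarity search.
from langchain.vectorstores import Chroma

from constants import CHROMA_SETTINGS, PERSIST_DIRECTORY
from utils import get_embeddings

db = Chroma(
    persist_directory=PERSIST_DIRECTORY,
    embedding_function=get_embeddings("cpu"),  # illustrative device choice
    client_settings=CHROMA_SETTINGS,
)
docs = db.similarity_search("What is this document about?", k=4)
for doc in docs:
    print(doc.metadata["source"])
```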

@@ -1,213 +0,0 @@
import sys
import torch
if sys.platform != "darwin":
from auto_gptq import AutoGPTQForCausalLM
from huggingface_hub import hf_hub_download
from langchain.llms import LlamaCpp
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM, LlamaTokenizer, BitsAndBytesConfig
from constants import CONTEXT_WINDOW_SIZE, MAX_NEW_TOKENS, MODELS_PATH, N_BATCH, N_GPU_LAYERS
def load_quantized_model_gguf_ggml(model_id, model_basename, device_type, logging):
"""
Load a GGUF/GGML quantized model using LlamaCpp.
This function attempts to load a GGUF/GGML quantized model using the LlamaCpp library.
    If the model is of type GGML and a newer version of LLAMA-CPP (which no longer supports GGML) is installed,
    it logs a message indicating that LLAMA-CPP has dropped support for GGML.
Parameters:
- model_id (str): The identifier for the model on HuggingFace Hub.
- model_basename (str): The base name of the model file.
- device_type (str): The type of device where the model will run, e.g., 'mps', 'cuda', etc.
- logging (logging.Logger): Logger instance for logging messages.
Returns:
- LlamaCpp: An instance of the LlamaCpp model if successful, otherwise None.
Notes:
- The function uses the `hf_hub_download` function to download the model from the HuggingFace Hub.
- The number of GPU layers is set based on the device type.
"""
try:
logging.info("Using Llamacpp for GGUF/GGML quantized models")
model_path = hf_hub_download(
repo_id=model_id,
filename=model_basename,
resume_download=True,
cache_dir=MODELS_PATH,
)
kwargs = {
"model_path": model_path,
"n_ctx": CONTEXT_WINDOW_SIZE,
"max_tokens": MAX_NEW_TOKENS,
"n_batch": N_BATCH, # set this based on your GPU & CPU RAM
}
if device_type.lower() == "mps":
kwargs["n_gpu_layers"] = 1
if device_type.lower() == "cuda":
kwargs["n_gpu_layers"] = N_GPU_LAYERS # set this based on your GPU
return LlamaCpp(**kwargs)
    except TypeError:
        if "ggml" in model_basename:
            logging.info("If you were using a GGML model, LLAMA-CPP has dropped GGML support; use a GGUF model instead.")
        return None
def load_quantized_model_qptq(model_id, model_basename, device_type, logging):
"""
Load a GPTQ quantized model using AutoGPTQForCausalLM.
This function loads a quantized model that ends with GPTQ and may have variations
of .no-act.order or .safetensors in their HuggingFace repo.
It will not work for Macs, as AutoGPTQ only supports Linux and Windows:
- Nvidia CUDA (Windows and Linux)
- AMD ROCm (Linux only)
- CPU QiGen (Linux only, new and experimental)
Parameters:
- model_id (str): The identifier for the model on HuggingFace Hub.
- model_basename (str): The base name of the model file.
- device_type (str): The type of device where the model will run.
- logging (logging.Logger): Logger instance for logging messages.
Returns:
- model (AutoGPTQForCausalLM): The loaded quantized model.
- tokenizer (AutoTokenizer): The tokenizer associated with the model.
Notes:
- The function checks for the ".safetensors" ending in the model_basename and removes it if present.
"""
if sys.platform == "darwin":
logging.INFO("GPTQ models will NOT work on Mac devices. Please choose a different model.")
return None, None
# The code supports all huggingface models that ends with GPTQ and have some variation
# of .no-act.order or .safetensors in their HF repo.
logging.info("Using AutoGPTQForCausalLM for quantized models")
if ".safetensors" in model_basename:
# Remove the ".safetensors" ending if present
model_basename = model_basename.replace(".safetensors", "")
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
logging.info("Tokenizer loaded")
model = AutoGPTQForCausalLM.from_quantized(
model_id,
model_basename=model_basename,
use_safetensors=True,
trust_remote_code=True,
device_map="auto",
use_triton=False,
quantize_config=None,
)
return model, tokenizer
def load_full_model(model_id, model_basename, device_type, logging):
"""
Load a full model using either LlamaTokenizer or AutoModelForCausalLM.
This function loads a full model based on the specified device type.
If the device type is 'mps' or 'cpu', it uses LlamaTokenizer and LlamaForCausalLM.
Otherwise, it uses AutoModelForCausalLM.
Parameters:
- model_id (str): The identifier for the model on HuggingFace Hub.
- model_basename (str): The base name of the model file.
- device_type (str): The type of device where the model will run.
- logging (logging.Logger): Logger instance for logging messages.
Returns:
- model (Union[LlamaForCausalLM, AutoModelForCausalLM]): The loaded model.
- tokenizer (Union[LlamaTokenizer, AutoTokenizer]): The tokenizer associated with the model.
Notes:
- The function uses the `from_pretrained` method to load both the model and the tokenizer.
- Additional settings are provided for NVIDIA GPUs, such as loading in 4-bit and setting the compute dtype.
"""
if device_type.lower() in ["mps", "cpu", "hpu"]:
logging.info("Using AutoModelForCausalLM")
# tokenizer = LlamaTokenizer.from_pretrained(model_id, cache_dir="./models/")
# model = LlamaForCausalLM.from_pretrained(model_id, cache_dir="./models/")
model = AutoModelForCausalLM.from_pretrained(model_id,
# quantization_config=quantization_config,
# low_cpu_mem_usage=True,
# torch_dtype="auto",
torch_dtype=torch.bfloat16,
device_map="auto",
cache_dir="./models/")
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="./models/")
else:
logging.info("Using AutoModelForCausalLM for full models")
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="./models/")
logging.info("Tokenizer loaded")
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16
)
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="auto",
torch_dtype=torch.float16,
low_cpu_mem_usage=True,
cache_dir=MODELS_PATH,
trust_remote_code=True, # set these if you are using NVIDIA GPU
quantization_config=bnb_config
# load_in_4bit=True,
# bnb_4bit_quant_type="nf4",
# bnb_4bit_compute_dtype=torch.float16,
        # max_memory={0: "15GB"}, # Uncomment this line when you encounter CUDA out of memory errors
)
model.tie_weights()
return model, tokenizer
def load_quantized_model_awq(model_id, logging):
"""
    Load an AWQ quantized model using AutoModelForCausalLM.
This function loads a quantized model that ends with AWQ.
It will not work for Macs as AutoAWQ currently only supports Nvidia GPUs.
Parameters:
- model_id (str): The identifier for the model on HuggingFace Hub.
- logging (logging.Logger): Logger instance for logging messages.
Returns:
- model (AutoModelForCausalLM): The loaded quantized model.
- tokenizer (AutoTokenizer): The tokenizer associated with the model.
"""
if sys.platform == "darwin":
logging.INFO("AWQ models will NOT work on Mac devices. Please choose a different model.")
return None, None
# The code supports all huggingface models that ends with AWQ.
logging.info("Using AutoModelForCausalLM for AWQ quantized models")
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
logging.info("Tokenizer loaded")
model = AutoModelForCausalLM.from_pretrained(
model_id,
use_safetensors=True,
trust_remote_code=True,
device_map="auto",
)
return model, tokenizer
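This file only defines the individual loaders; below is a hedged sketch of how a caller might route a checkpoint to the right loader based on its basename (the dispatcher name and example patterns are assumptions, not part of this file):

```python
import logging

def pick_model_loader(model_id, model_basename, device_type):
    """Illustrative dispatcher: choose a loader from the checkpoint naming convention."""
    if model_basename is not None:
        name = model_basename.lower()
        if ".gguf" in name or ".ggml" in name:
            # returns a LlamaCpp LLM directly (no separate tokenizer)
            return load_quantized_model_gguf_ggml(model_id, model_basename, device_type, logging)
        if "awq" in name:
            return load_quantized_model_awq(model_id, logging)
        if "gptq" in name:
            return load_quantized_model_qptq(model_id, model_basename, device_type, logging)
    # unquantized / full-precision checkpoints
    return load_full_model(model_id, model_basename, device_type, logging)
```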

@@ -1,72 +0,0 @@
import argparse
import os
import sys
import tempfile
import requests
from flask import Flask, render_template, request
from werkzeug.utils import secure_filename
sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
app = Flask(__name__)
app.secret_key = "LeafmanZSecretKey"
API_HOST = "http://localhost:5110/api"
# PAGES #
@app.route("/", methods=["GET", "POST"])
def home_page():
if request.method == "POST":
if "user_prompt" in request.form:
user_prompt = request.form["user_prompt"]
print(f"User Prompt: {user_prompt}")
main_prompt_url = f"{API_HOST}/prompt_route"
response = requests.post(main_prompt_url, data={"user_prompt": user_prompt})
print(response.status_code) # print HTTP response status code for debugging
if response.status_code == 200:
# print(response.json()) # Print the JSON data from the response
return render_template("home.html", show_response_modal=True, response_dict=response.json())
elif "documents" in request.files:
delete_source_url = f"{API_HOST}/delete_source" # URL of the /api/delete_source endpoint
if request.form.get("action") == "reset":
response = requests.get(delete_source_url)
save_document_url = f"{API_HOST}/save_document"
run_ingest_url = f"{API_HOST}/run_ingest" # URL of the /api/run_ingest endpoint
files = request.files.getlist("documents")
for file in files:
print(file.filename)
filename = secure_filename(file.filename)
with tempfile.SpooledTemporaryFile() as f:
f.write(file.read())
f.seek(0)
response = requests.post(save_document_url, files={"document": (filename, f)})
print(response.status_code) # print HTTP response status code for debugging
# Make a GET request to the /api/run_ingest endpoint
response = requests.get(run_ingest_url)
print(response.status_code) # print HTTP response status code for debugging
# Display the form for GET request
return render_template(
"home.html",
show_response_modal=False,
response_dict={"Prompt": "None", "Answer": "None", "Sources": [("ewf", "wef")]},
)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--port", type=int, default=5111, help="Port to run the UI on. Defaults to 5111.")
parser.add_argument(
"--host",
type=str,
default="127.0.0.1",
help="Host to run the UI on. Defaults to 127.0.0.1. "
"Set to 0.0.0.0 to make the UI externally "
"accessible from other devices.",
)
args = parser.parse_args()
app.run(debug=False, host=args.host, port=args.port)
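The UI above only forwards form submissions; the same backend endpoint can be exercised directly. A small sketch, assuming the localGPT API server is already listening on localhost:5110 (the prompt text is illustrative):

```python
# Hedged example: call the prompt endpoint that the UI posts to.
import requests

resp = requests.post(
    "http://localhost:5110/api/prompt_route",
    data={"user_prompt": "Summarize the ingested documents."},
)
print(resp.status_code)
print(resp.json())
```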

(5 file diffs suppressed: too large or lines too long)

@@ -1,498 +0,0 @@
/*!
* Bootstrap Reboot v5.1.3 (https://getbootstrap.com/)
* Copyright 2011-2021 The Bootstrap Authors
* Copyright 2011-2021 Twitter, Inc.
* Licensed under MIT (https://github.com/twbs/bootstrap/blob/main/LICENSE)
* Forked from Normalize.css, licensed MIT (https://github.com/necolas/normalize.css/blob/master/LICENSE.md)
*/
:root {
--bs-blue: #0d6efd;
--bs-indigo: #6610f2;
--bs-purple: #6f42c1;
--bs-pink: #d63384;
--bs-red: #dc3545;
--bs-orange: #fd7e14;
--bs-yellow: #ffc107;
--bs-green: #198754;
--bs-teal: #20c997;
--bs-cyan: #0dcaf0;
--bs-white: #fff;
--bs-gray: #6c757d;
--bs-gray-dark: #343a40;
--bs-gray-100: #f8f9fa;
--bs-gray-200: #e9ecef;
--bs-gray-300: #dee2e6;
--bs-gray-400: #ced4da;
--bs-gray-500: #adb5bd;
--bs-gray-600: #6c757d;
--bs-gray-700: #495057;
--bs-gray-800: #343a40;
--bs-gray-900: #212529;
--bs-primary: #0d6efd;
--bs-secondary: #6c757d;
--bs-success: #198754;
--bs-info: #0dcaf0;
--bs-warning: #ffc107;
--bs-danger: #dc3545;
--bs-light: #f8f9fa;
--bs-dark: #212529;
--bs-primary-rgb: 13, 110, 253;
--bs-secondary-rgb: 108, 117, 125;
--bs-success-rgb: 25, 135, 84;
--bs-info-rgb: 13, 202, 240;
--bs-warning-rgb: 255, 193, 7;
--bs-danger-rgb: 220, 53, 69;
--bs-light-rgb: 248, 249, 250;
--bs-dark-rgb: 33, 37, 41;
--bs-white-rgb: 255, 255, 255;
--bs-black-rgb: 0, 0, 0;
--bs-body-color-rgb: 33, 37, 41;
--bs-body-bg-rgb: 255, 255, 255;
--bs-font-sans-serif: system-ui, -apple-system, "Segoe UI", Roboto,
"Helvetica Neue", Arial, "Noto Sans", "Liberation Sans", sans-serif,
"Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
--bs-font-monospace: SFMono-Regular, Menlo, Monaco, Consolas,
"Liberation Mono", "Courier New", monospace;
--bs-gradient: linear-gradient(
180deg,
rgba(255, 255, 255, 0.15),
rgba(255, 255, 255, 0)
);
--bs-body-font-family: var(--bs-font-sans-serif);
--bs-body-font-size: 1rem;
--bs-body-font-weight: 400;
--bs-body-line-height: 1.5;
--bs-body-color: #212529;
--bs-body-bg: #fff;
}
*,
*::before,
*::after {
box-sizing: border-box;
}
@media (prefers-reduced-motion: no-preference) {
:root {
scroll-behavior: smooth;
}
}
body {
margin: 0;
font-family: var(--bs-body-font-family);
font-size: var(--bs-body-font-size);
font-weight: var(--bs-body-font-weight);
line-height: var(--bs-body-line-height);
color: var(--bs-body-color);
text-align: var(--bs-body-text-align);
background-color: var(--bs-body-bg);
-webkit-text-size-adjust: 100%;
-webkit-tap-highlight-color: rgba(0, 0, 0, 0);
}
hr {
margin: 1rem 0;
color: inherit;
background-color: currentColor;
border: 0;
opacity: 0.25;
}
hr:not([size]) {
height: 1px;
}
h6,
h5,
h4,
h3,
h2,
h1 {
margin-top: 0;
margin-bottom: 0.5rem;
font-weight: 500;
line-height: 1.2;
}
h1 {
font-size: calc(1.375rem + 1.5vw);
}
@media (min-width: 1200px) {
h1 {
font-size: 2.5rem;
}
}
h2 {
font-size: calc(1.325rem + 0.9vw);
}
@media (min-width: 1200px) {
h2 {
font-size: 2rem;
}
}
h3 {
font-size: calc(1.3rem + 0.6vw);
}
@media (min-width: 1200px) {
h3 {
font-size: 1.75rem;
}
}
h4 {
font-size: calc(1.275rem + 0.3vw);
}
@media (min-width: 1200px) {
h4 {
font-size: 1.5rem;
}
}
h5 {
font-size: 1.25rem;
}
h6 {
font-size: 1rem;
}
p {
margin-top: 0;
margin-bottom: 1rem;
}
abbr[title],
abbr[data-bs-original-title] {
-webkit-text-decoration: underline dotted;
text-decoration: underline dotted;
cursor: help;
-webkit-text-decoration-skip-ink: none;
text-decoration-skip-ink: none;
}
address {
margin-bottom: 1rem;
font-style: normal;
line-height: inherit;
}
ol,
ul {
padding-left: 2rem;
}
ol,
ul,
dl {
margin-top: 0;
margin-bottom: 1rem;
}
ol ol,
ul ul,
ol ul,
ul ol {
margin-bottom: 0;
}
dt {
font-weight: 700;
}
dd {
margin-bottom: 0.5rem;
margin-left: 0;
}
blockquote {
margin: 0 0 1rem;
}
b,
strong {
font-weight: bolder;
}
small {
font-size: 0.875em;
}
mark {
padding: 0.2em;
background-color: #fcf8e3;
}
sub,
sup {
position: relative;
font-size: 0.75em;
line-height: 0;
vertical-align: baseline;
}
sub {
bottom: -0.25em;
}
sup {
top: -0.5em;
}
a {
color: #0d6efd;
text-decoration: underline;
}
a:hover {
color: #0a58ca;
}
a:not([href]):not([class]),
a:not([href]):not([class]):hover {
color: inherit;
text-decoration: none;
}
pre,
code,
kbd,
samp {
font-family: var(--bs-font-monospace);
font-size: 1em;
direction: ltr /* rtl:ignore */;
unicode-bidi: bidi-override;
}
pre {
display: block;
margin-top: 0;
margin-bottom: 1rem;
overflow: auto;
font-size: 0.875em;
}
pre code {
font-size: inherit;
color: inherit;
word-break: normal;
}
code {
font-size: 0.875em;
color: #d63384;
word-wrap: break-word;
}
a > code {
color: inherit;
}
kbd {
padding: 0.2rem 0.4rem;
font-size: 0.875em;
color: #fff;
background-color: #212529;
border-radius: 0.2rem;
}
kbd kbd {
padding: 0;
font-size: 1em;
font-weight: 700;
}
figure {
margin: 0 0 1rem;
}
img,
svg {
vertical-align: middle;
}
table {
caption-side: bottom;
border-collapse: collapse;
}
caption {
padding-top: 0.5rem;
padding-bottom: 0.5rem;
color: #6c757d;
text-align: left;
}
th {
text-align: inherit;
text-align: -webkit-match-parent;
}
thead,
tbody,
tfoot,
tr,
td,
th {
border-color: inherit;
border-style: solid;
border-width: 0;
}
label {
display: inline-block;
}
button {
border-radius: 0;
}
button:focus:not(:focus-visible) {
outline: 0;
}
input,
button,
select,
optgroup,
textarea {
margin: 0;
font-family: inherit;
font-size: inherit;
line-height: inherit;
}
button,
select {
text-transform: none;
}
[role="button"] {
cursor: pointer;
}
select {
word-wrap: normal;
}
select:disabled {
opacity: 1;
}
[list]::-webkit-calendar-picker-indicator {
display: none;
}
button,
[type="button"],
[type="reset"],
[type="submit"] {
-webkit-appearance: button;
}
button:not(:disabled),
[type="button"]:not(:disabled),
[type="reset"]:not(:disabled),
[type="submit"]:not(:disabled) {
cursor: pointer;
}
::-moz-focus-inner {
padding: 0;
border-style: none;
}
textarea {
resize: vertical;
}
fieldset {
min-width: 0;
padding: 0;
margin: 0;
border: 0;
}
legend {
float: left;
width: 100%;
padding: 0;
margin-bottom: 0.5rem;
font-size: calc(1.275rem + 0.3vw);
line-height: inherit;
}
@media (min-width: 1200px) {
legend {
font-size: 1.5rem;
}
}
legend + * {
clear: left;
}
::-webkit-datetime-edit-fields-wrapper,
::-webkit-datetime-edit-text,
::-webkit-datetime-edit-minute,
::-webkit-datetime-edit-hour-field,
::-webkit-datetime-edit-day-field,
::-webkit-datetime-edit-month-field,
::-webkit-datetime-edit-year-field {
padding: 0;
}
::-webkit-inner-spin-button {
height: auto;
}
[type="search"] {
outline-offset: -2px;
-webkit-appearance: textfield;
}
/* rtl:raw:
[type="tel"],
[type="url"],
[type="email"],
[type="number"] {
direction: ltr;
}
*/
::-webkit-search-decoration {
-webkit-appearance: none;
}
::-webkit-color-swatch-wrapper {
padding: 0;
}
::-webkit-file-upload-button {
font: inherit;
}
::file-selector-button {
font: inherit;
}
::-webkit-file-upload-button {
font: inherit;
-webkit-appearance: button;
}
output {
display: inline-block;
}
iframe {
border: 0;
}
summary {
display: list-item;
cursor: pointer;
}
progress {
vertical-align: baseline;
}
[hidden] {
display: none !important;
}
/*# sourceMappingURL=bootstrap-reboot.css.map */

(1 file diff suppressed: lines too long)

@@ -1,424 +0,0 @@
/*!
* Bootstrap Reboot v5.1.3 (https://getbootstrap.com/)
* Copyright 2011-2021 The Bootstrap Authors
* Copyright 2011-2021 Twitter, Inc.
* Licensed under MIT (https://github.com/twbs/bootstrap/blob/main/LICENSE)
* Forked from Normalize.css, licensed MIT (https://github.com/necolas/normalize.css/blob/master/LICENSE.md)
*/
:root {
--bs-blue: #0d6efd;
--bs-indigo: #6610f2;
--bs-purple: #6f42c1;
--bs-pink: #d63384;
--bs-red: #dc3545;
--bs-orange: #fd7e14;
--bs-yellow: #ffc107;
--bs-green: #198754;
--bs-teal: #20c997;
--bs-cyan: #0dcaf0;
--bs-white: #fff;
--bs-gray: #6c757d;
--bs-gray-dark: #343a40;
--bs-gray-100: #f8f9fa;
--bs-gray-200: #e9ecef;
--bs-gray-300: #dee2e6;
--bs-gray-400: #ced4da;
--bs-gray-500: #adb5bd;
--bs-gray-600: #6c757d;
--bs-gray-700: #495057;
--bs-gray-800: #343a40;
--bs-gray-900: #212529;
--bs-primary: #0d6efd;
--bs-secondary: #6c757d;
--bs-success: #198754;
--bs-info: #0dcaf0;
--bs-warning: #ffc107;
--bs-danger: #dc3545;
--bs-light: #f8f9fa;
--bs-dark: #212529;
--bs-primary-rgb: 13, 110, 253;
--bs-secondary-rgb: 108, 117, 125;
--bs-success-rgb: 25, 135, 84;
--bs-info-rgb: 13, 202, 240;
--bs-warning-rgb: 255, 193, 7;
--bs-danger-rgb: 220, 53, 69;
--bs-light-rgb: 248, 249, 250;
--bs-dark-rgb: 33, 37, 41;
--bs-white-rgb: 255, 255, 255;
--bs-black-rgb: 0, 0, 0;
--bs-body-color-rgb: 33, 37, 41;
--bs-body-bg-rgb: 255, 255, 255;
--bs-font-sans-serif: system-ui, -apple-system, "Segoe UI", Roboto,
"Helvetica Neue", Arial, "Noto Sans", "Liberation Sans", sans-serif,
"Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
--bs-font-monospace: SFMono-Regular, Menlo, Monaco, Consolas,
"Liberation Mono", "Courier New", monospace;
--bs-gradient: linear-gradient(
180deg,
rgba(255, 255, 255, 0.15),
rgba(255, 255, 255, 0)
);
--bs-body-font-family: var(--bs-font-sans-serif);
--bs-body-font-size: 1rem;
--bs-body-font-weight: 400;
--bs-body-line-height: 1.5;
--bs-body-color: #212529;
--bs-body-bg: #fff;
}
*,
::after,
::before {
box-sizing: border-box;
}
@media (prefers-reduced-motion: no-preference) {
:root {
scroll-behavior: smooth;
}
}
body {
margin: 0;
font-family: var(--bs-body-font-family);
font-size: var(--bs-body-font-size);
font-weight: var(--bs-body-font-weight);
line-height: var(--bs-body-line-height);
color: var(--bs-body-color);
text-align: var(--bs-body-text-align);
background-color: var(--bs-body-bg);
-webkit-text-size-adjust: 100%;
-webkit-tap-highlight-color: transparent;
}
hr {
margin: 1rem 0;
color: inherit;
background-color: currentColor;
border: 0;
opacity: 0.25;
}
hr:not([size]) {
height: 1px;
}
h1,
h2,
h3,
h4,
h5,
h6 {
margin-top: 0;
margin-bottom: 0.5rem;
font-weight: 500;
line-height: 1.2;
}
h1 {
font-size: calc(1.375rem + 1.5vw);
}
@media (min-width: 1200px) {
h1 {
font-size: 2.5rem;
}
}
h2 {
font-size: calc(1.325rem + 0.9vw);
}
@media (min-width: 1200px) {
h2 {
font-size: 2rem;
}
}
h3 {
font-size: calc(1.3rem + 0.6vw);
}
@media (min-width: 1200px) {
h3 {
font-size: 1.75rem;
}
}
h4 {
font-size: calc(1.275rem + 0.3vw);
}
@media (min-width: 1200px) {
h4 {
font-size: 1.5rem;
}
}
h5 {
font-size: 1.25rem;
}
h6 {
font-size: 1rem;
}
p {
margin-top: 0;
margin-bottom: 1rem;
}
abbr[data-bs-original-title],
abbr[title] {
-webkit-text-decoration: underline dotted;
text-decoration: underline dotted;
cursor: help;
-webkit-text-decoration-skip-ink: none;
text-decoration-skip-ink: none;
}
address {
margin-bottom: 1rem;
font-style: normal;
line-height: inherit;
}
ol,
ul {
padding-left: 2rem;
}
dl,
ol,
ul {
margin-top: 0;
margin-bottom: 1rem;
}
ol ol,
ol ul,
ul ol,
ul ul {
margin-bottom: 0;
}
dt {
font-weight: 700;
}
dd {
margin-bottom: 0.5rem;
margin-left: 0;
}
blockquote {
margin: 0 0 1rem;
}
b,
strong {
font-weight: bolder;
}
small {
font-size: 0.875em;
}
mark {
padding: 0.2em;
background-color: #fcf8e3;
}
sub,
sup {
position: relative;
font-size: 0.75em;
line-height: 0;
vertical-align: baseline;
}
sub {
bottom: -0.25em;
}
sup {
top: -0.5em;
}
a {
color: #0d6efd;
text-decoration: underline;
}
a:hover {
color: #0a58ca;
}
a:not([href]):not([class]),
a:not([href]):not([class]):hover {
color: inherit;
text-decoration: none;
}
code,
kbd,
pre,
samp {
font-family: var(--bs-font-monospace);
font-size: 1em;
direction: ltr;
unicode-bidi: bidi-override;
}
pre {
display: block;
margin-top: 0;
margin-bottom: 1rem;
overflow: auto;
font-size: 0.875em;
}
pre code {
font-size: inherit;
color: inherit;
word-break: normal;
}
code {
font-size: 0.875em;
color: #d63384;
word-wrap: break-word;
}
a > code {
color: inherit;
}
kbd {
padding: 0.2rem 0.4rem;
font-size: 0.875em;
color: #fff;
background-color: #212529;
border-radius: 0.2rem;
}
kbd kbd {
padding: 0;
font-size: 1em;
font-weight: 700;
}
figure {
margin: 0 0 1rem;
}
img,
svg {
vertical-align: middle;
}
table {
caption-side: bottom;
border-collapse: collapse;
}
caption {
padding-top: 0.5rem;
padding-bottom: 0.5rem;
color: #6c757d;
text-align: left;
}
th {
text-align: inherit;
text-align: -webkit-match-parent;
}
tbody,
td,
tfoot,
th,
thead,
tr {
border-color: inherit;
border-style: solid;
border-width: 0;
}
label {
display: inline-block;
}
button {
border-radius: 0;
}
button:focus:not(:focus-visible) {
outline: 0;
}
button,
input,
optgroup,
select,
textarea {
margin: 0;
font-family: inherit;
font-size: inherit;
line-height: inherit;
}
button,
select {
text-transform: none;
}
[role="button"] {
cursor: pointer;
}
select {
word-wrap: normal;
}
select:disabled {
opacity: 1;
}
[list]::-webkit-calendar-picker-indicator {
display: none;
}
[type="button"],
[type="reset"],
[type="submit"],
button {
-webkit-appearance: button;
}
[type="button"]:not(:disabled),
[type="reset"]:not(:disabled),
[type="submit"]:not(:disabled),
button:not(:disabled) {
cursor: pointer;
}
::-moz-focus-inner {
padding: 0;
border-style: none;
}
textarea {
resize: vertical;
}
fieldset {
min-width: 0;
padding: 0;
margin: 0;
border: 0;
}
legend {
float: left;
width: 100%;
padding: 0;
margin-bottom: 0.5rem;
font-size: calc(1.275rem + 0.3vw);
line-height: inherit;
}
@media (min-width: 1200px) {
legend {
font-size: 1.5rem;
}
}
legend + * {
clear: left;
}
::-webkit-datetime-edit-day-field,
::-webkit-datetime-edit-fields-wrapper,
::-webkit-datetime-edit-hour-field,
::-webkit-datetime-edit-minute,
::-webkit-datetime-edit-month-field,
::-webkit-datetime-edit-text,
::-webkit-datetime-edit-year-field {
padding: 0;
}
::-webkit-inner-spin-button {
height: auto;
}
[type="search"] {
outline-offset: -2px;
-webkit-appearance: textfield;
}
::-webkit-search-decoration {
-webkit-appearance: none;
}
::-webkit-color-swatch-wrapper {
padding: 0;
}
::-webkit-file-upload-button {
font: inherit;
}
::file-selector-button {
font: inherit;
}
::-webkit-file-upload-button {
font: inherit;
-webkit-appearance: button;
}
output {
display: inline-block;
}
iframe {
border: 0;
}
summary {
display: list-item;
cursor: pointer;
}
progress {
vertical-align: baseline;
}
[hidden] {
display: none !important;
}
/*# sourceMappingURL=bootstrap-reboot.min.css.map */

(1 file diff suppressed: lines too long)

@@ -1,495 +0,0 @@
/*!
* Bootstrap Reboot v5.1.3 (https://getbootstrap.com/)
* Copyright 2011-2021 The Bootstrap Authors
* Copyright 2011-2021 Twitter, Inc.
* Licensed under MIT (https://github.com/twbs/bootstrap/blob/main/LICENSE)
* Forked from Normalize.css, licensed MIT (https://github.com/necolas/normalize.css/blob/master/LICENSE.md)
*/
:root {
--bs-blue: #0d6efd;
--bs-indigo: #6610f2;
--bs-purple: #6f42c1;
--bs-pink: #d63384;
--bs-red: #dc3545;
--bs-orange: #fd7e14;
--bs-yellow: #ffc107;
--bs-green: #198754;
--bs-teal: #20c997;
--bs-cyan: #0dcaf0;
--bs-white: #fff;
--bs-gray: #6c757d;
--bs-gray-dark: #343a40;
--bs-gray-100: #f8f9fa;
--bs-gray-200: #e9ecef;
--bs-gray-300: #dee2e6;
--bs-gray-400: #ced4da;
--bs-gray-500: #adb5bd;
--bs-gray-600: #6c757d;
--bs-gray-700: #495057;
--bs-gray-800: #343a40;
--bs-gray-900: #212529;
--bs-primary: #0d6efd;
--bs-secondary: #6c757d;
--bs-success: #198754;
--bs-info: #0dcaf0;
--bs-warning: #ffc107;
--bs-danger: #dc3545;
--bs-light: #f8f9fa;
--bs-dark: #212529;
--bs-primary-rgb: 13, 110, 253;
--bs-secondary-rgb: 108, 117, 125;
--bs-success-rgb: 25, 135, 84;
--bs-info-rgb: 13, 202, 240;
--bs-warning-rgb: 255, 193, 7;
--bs-danger-rgb: 220, 53, 69;
--bs-light-rgb: 248, 249, 250;
--bs-dark-rgb: 33, 37, 41;
--bs-white-rgb: 255, 255, 255;
--bs-black-rgb: 0, 0, 0;
--bs-body-color-rgb: 33, 37, 41;
--bs-body-bg-rgb: 255, 255, 255;
--bs-font-sans-serif: system-ui, -apple-system, "Segoe UI", Roboto,
"Helvetica Neue", Arial, "Noto Sans", "Liberation Sans", sans-serif,
"Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
--bs-font-monospace: SFMono-Regular, Menlo, Monaco, Consolas,
"Liberation Mono", "Courier New", monospace;
--bs-gradient: linear-gradient(
180deg,
rgba(255, 255, 255, 0.15),
rgba(255, 255, 255, 0)
);
--bs-body-font-family: var(--bs-font-sans-serif);
--bs-body-font-size: 1rem;
--bs-body-font-weight: 400;
--bs-body-line-height: 1.5;
--bs-body-color: #212529;
--bs-body-bg: #fff;
}
*,
*::before,
*::after {
box-sizing: border-box;
}
@media (prefers-reduced-motion: no-preference) {
:root {
scroll-behavior: smooth;
}
}
body {
margin: 0;
font-family: var(--bs-body-font-family);
font-size: var(--bs-body-font-size);
font-weight: var(--bs-body-font-weight);
line-height: var(--bs-body-line-height);
color: var(--bs-body-color);
text-align: var(--bs-body-text-align);
background-color: var(--bs-body-bg);
-webkit-text-size-adjust: 100%;
-webkit-tap-highlight-color: rgba(0, 0, 0, 0);
}
hr {
margin: 1rem 0;
color: inherit;
background-color: currentColor;
border: 0;
opacity: 0.25;
}
hr:not([size]) {
height: 1px;
}
h6,
h5,
h4,
h3,
h2,
h1 {
margin-top: 0;
margin-bottom: 0.5rem;
font-weight: 500;
line-height: 1.2;
}
h1 {
font-size: calc(1.375rem + 1.5vw);
}
@media (min-width: 1200px) {
h1 {
font-size: 2.5rem;
}
}
h2 {
font-size: calc(1.325rem + 0.9vw);
}
@media (min-width: 1200px) {
h2 {
font-size: 2rem;
}
}
h3 {
font-size: calc(1.3rem + 0.6vw);
}
@media (min-width: 1200px) {
h3 {
font-size: 1.75rem;
}
}
h4 {
font-size: calc(1.275rem + 0.3vw);
}
@media (min-width: 1200px) {
h4 {
font-size: 1.5rem;
}
}
h5 {
font-size: 1.25rem;
}
h6 {
font-size: 1rem;
}
p {
margin-top: 0;
margin-bottom: 1rem;
}
abbr[title],
abbr[data-bs-original-title] {
-webkit-text-decoration: underline dotted;
text-decoration: underline dotted;
cursor: help;
-webkit-text-decoration-skip-ink: none;
text-decoration-skip-ink: none;
}
address {
margin-bottom: 1rem;
font-style: normal;
line-height: inherit;
}
ol,
ul {
padding-right: 2rem;
}
ol,
ul,
dl {
margin-top: 0;
margin-bottom: 1rem;
}
ol ol,
ul ul,
ol ul,
ul ol {
margin-bottom: 0;
}
dt {
font-weight: 700;
}
dd {
margin-bottom: 0.5rem;
margin-right: 0;
}
blockquote {
margin: 0 0 1rem;
}
b,
strong {
font-weight: bolder;
}
small {
font-size: 0.875em;
}
mark {
padding: 0.2em;
background-color: #fcf8e3;
}
sub,
sup {
position: relative;
font-size: 0.75em;
line-height: 0;
vertical-align: baseline;
}
sub {
bottom: -0.25em;
}
sup {
top: -0.5em;
}
a {
color: #0d6efd;
text-decoration: underline;
}
a:hover {
color: #0a58ca;
}
a:not([href]):not([class]),
a:not([href]):not([class]):hover {
color: inherit;
text-decoration: none;
}
pre,
code,
kbd,
samp {
font-family: var(--bs-font-monospace);
font-size: 1em;
direction: ltr;
unicode-bidi: bidi-override;
}
pre {
display: block;
margin-top: 0;
margin-bottom: 1rem;
overflow: auto;
font-size: 0.875em;
}
pre code {
font-size: inherit;
color: inherit;
word-break: normal;
}
code {
font-size: 0.875em;
color: #d63384;
word-wrap: break-word;
}
a > code {
color: inherit;
}
kbd {
padding: 0.2rem 0.4rem;
font-size: 0.875em;
color: #fff;
background-color: #212529;
border-radius: 0.2rem;
}
kbd kbd {
padding: 0;
font-size: 1em;
font-weight: 700;
}
figure {
margin: 0 0 1rem;
}
img,
svg {
vertical-align: middle;
}
table {
caption-side: bottom;
border-collapse: collapse;
}
caption {
padding-top: 0.5rem;
padding-bottom: 0.5rem;
color: #6c757d;
text-align: right;
}
th {
text-align: inherit;
text-align: -webkit-match-parent;
}
thead,
tbody,
tfoot,
tr,
td,
th {
border-color: inherit;
border-style: solid;
border-width: 0;
}
label {
display: inline-block;
}
button {
border-radius: 0;
}
button:focus:not(:focus-visible) {
outline: 0;
}
input,
button,
select,
optgroup,
textarea {
margin: 0;
font-family: inherit;
font-size: inherit;
line-height: inherit;
}
button,
select {
text-transform: none;
}
[role="button"] {
cursor: pointer;
}
select {
word-wrap: normal;
}
select:disabled {
opacity: 1;
}
[list]::-webkit-calendar-picker-indicator {
display: none;
}
button,
[type="button"],
[type="reset"],
[type="submit"] {
-webkit-appearance: button;
}
button:not(:disabled),
[type="button"]:not(:disabled),
[type="reset"]:not(:disabled),
[type="submit"]:not(:disabled) {
cursor: pointer;
}
::-moz-focus-inner {
padding: 0;
border-style: none;
}
textarea {
resize: vertical;
}
fieldset {
min-width: 0;
padding: 0;
margin: 0;
border: 0;
}
legend {
float: right;
width: 100%;
padding: 0;
margin-bottom: 0.5rem;
font-size: calc(1.275rem + 0.3vw);
line-height: inherit;
}
@media (min-width: 1200px) {
legend {
font-size: 1.5rem;
}
}
legend + * {
clear: right;
}
::-webkit-datetime-edit-fields-wrapper,
::-webkit-datetime-edit-text,
::-webkit-datetime-edit-minute,
::-webkit-datetime-edit-hour-field,
::-webkit-datetime-edit-day-field,
::-webkit-datetime-edit-month-field,
::-webkit-datetime-edit-year-field {
padding: 0;
}
::-webkit-inner-spin-button {
height: auto;
}
[type="search"] {
outline-offset: -2px;
-webkit-appearance: textfield;
}
[type="tel"],
[type="url"],
[type="email"],
[type="number"] {
direction: ltr;
}
::-webkit-search-decoration {
-webkit-appearance: none;
}
::-webkit-color-swatch-wrapper {
padding: 0;
}
::-webkit-file-upload-button {
font: inherit;
}
::file-selector-button {
font: inherit;
}
::-webkit-file-upload-button {
font: inherit;
-webkit-appearance: button;
}
output {
display: inline-block;
}
iframe {
border: 0;
}
summary {
display: list-item;
cursor: pointer;
}
progress {
vertical-align: baseline;
}
[hidden] {
display: none !important;
}
/*# sourceMappingURL=bootstrap-reboot.rtl.css.map */

(1 file diff suppressed: lines too long)

@@ -1,430 +0,0 @@
/*!
* Bootstrap Reboot v5.1.3 (https://getbootstrap.com/)
* Copyright 2011-2021 The Bootstrap Authors
* Copyright 2011-2021 Twitter, Inc.
* Licensed under MIT (https://github.com/twbs/bootstrap/blob/main/LICENSE)
* Forked from Normalize.css, licensed MIT (https://github.com/necolas/normalize.css/blob/master/LICENSE.md)
*/
:root {
--bs-blue: #0d6efd;
--bs-indigo: #6610f2;
--bs-purple: #6f42c1;
--bs-pink: #d63384;
--bs-red: #dc3545;
--bs-orange: #fd7e14;
--bs-yellow: #ffc107;
--bs-green: #198754;
--bs-teal: #20c997;
--bs-cyan: #0dcaf0;
--bs-white: #fff;
--bs-gray: #6c757d;
--bs-gray-dark: #343a40;
--bs-gray-100: #f8f9fa;
--bs-gray-200: #e9ecef;
--bs-gray-300: #dee2e6;
--bs-gray-400: #ced4da;
--bs-gray-500: #adb5bd;
--bs-gray-600: #6c757d;
--bs-gray-700: #495057;
--bs-gray-800: #343a40;
--bs-gray-900: #212529;
--bs-primary: #0d6efd;
--bs-secondary: #6c757d;
--bs-success: #198754;
--bs-info: #0dcaf0;
--bs-warning: #ffc107;
--bs-danger: #dc3545;
--bs-light: #f8f9fa;
--bs-dark: #212529;
--bs-primary-rgb: 13, 110, 253;
--bs-secondary-rgb: 108, 117, 125;
--bs-success-rgb: 25, 135, 84;
--bs-info-rgb: 13, 202, 240;
--bs-warning-rgb: 255, 193, 7;
--bs-danger-rgb: 220, 53, 69;
--bs-light-rgb: 248, 249, 250;
--bs-dark-rgb: 33, 37, 41;
--bs-white-rgb: 255, 255, 255;
--bs-black-rgb: 0, 0, 0;
--bs-body-color-rgb: 33, 37, 41;
--bs-body-bg-rgb: 255, 255, 255;
--bs-font-sans-serif: system-ui, -apple-system, "Segoe UI", Roboto,
"Helvetica Neue", Arial, "Noto Sans", "Liberation Sans", sans-serif,
"Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
--bs-font-monospace: SFMono-Regular, Menlo, Monaco, Consolas,
"Liberation Mono", "Courier New", monospace;
--bs-gradient: linear-gradient(
180deg,
rgba(255, 255, 255, 0.15),
rgba(255, 255, 255, 0)
);
--bs-body-font-family: var(--bs-font-sans-serif);
--bs-body-font-size: 1rem;
--bs-body-font-weight: 400;
--bs-body-line-height: 1.5;
--bs-body-color: #212529;
--bs-body-bg: #fff;
}
*,
::after,
::before {
box-sizing: border-box;
}
@media (prefers-reduced-motion: no-preference) {
:root {
scroll-behavior: smooth;
}
}
body {
margin: 0;
font-family: var(--bs-body-font-family);
font-size: var(--bs-body-font-size);
font-weight: var(--bs-body-font-weight);
line-height: var(--bs-body-line-height);
color: var(--bs-body-color);
text-align: var(--bs-body-text-align);
background-color: var(--bs-body-bg);
-webkit-text-size-adjust: 100%;
-webkit-tap-highlight-color: transparent;
}
hr {
margin: 1rem 0;
color: inherit;
background-color: currentColor;
border: 0;
opacity: 0.25;
}
hr:not([size]) {
height: 1px;
}
h1,
h2,
h3,
h4,
h5,
h6 {
margin-top: 0;
margin-bottom: 0.5rem;
font-weight: 500;
line-height: 1.2;
}
h1 {
font-size: calc(1.375rem + 1.5vw);
}
@media (min-width: 1200px) {
h1 {
font-size: 2.5rem;
}
}
h2 {
font-size: calc(1.325rem + 0.9vw);
}
@media (min-width: 1200px) {
h2 {
font-size: 2rem;
}
}
h3 {
font-size: calc(1.3rem + 0.6vw);
}
@media (min-width: 1200px) {
h3 {
font-size: 1.75rem;
}
}
h4 {
font-size: calc(1.275rem + 0.3vw);
}
@media (min-width: 1200px) {
h4 {
font-size: 1.5rem;
}
}
h5 {
font-size: 1.25rem;
}
h6 {
font-size: 1rem;
}
p {
margin-top: 0;
margin-bottom: 1rem;
}
abbr[data-bs-original-title],
abbr[title] {
-webkit-text-decoration: underline dotted;
text-decoration: underline dotted;
cursor: help;
-webkit-text-decoration-skip-ink: none;
text-decoration-skip-ink: none;
}
address {
margin-bottom: 1rem;
font-style: normal;
line-height: inherit;
}
ol,
ul {
padding-right: 2rem;
}
dl,
ol,
ul {
margin-top: 0;
margin-bottom: 1rem;
}
ol ol,
ol ul,
ul ol,
ul ul {
margin-bottom: 0;
}
dt {
font-weight: 700;
}
dd {
margin-bottom: 0.5rem;
margin-right: 0;
}
blockquote {
margin: 0 0 1rem;
}
b,
strong {
font-weight: bolder;
}
small {
font-size: 0.875em;
}
mark {
padding: 0.2em;
background-color: #fcf8e3;
}
sub,
sup {
position: relative;
font-size: 0.75em;
line-height: 0;
vertical-align: baseline;
}
sub {
bottom: -0.25em;
}
sup {
top: -0.5em;
}
a {
color: #0d6efd;
text-decoration: underline;
}
a:hover {
color: #0a58ca;
}
a:not([href]):not([class]),
a:not([href]):not([class]):hover {
color: inherit;
text-decoration: none;
}
code,
kbd,
pre,
samp {
font-family: var(--bs-font-monospace);
font-size: 1em;
direction: ltr;
unicode-bidi: bidi-override;
}
pre {
display: block;
margin-top: 0;
margin-bottom: 1rem;
overflow: auto;
font-size: 0.875em;
}
pre code {
font-size: inherit;
color: inherit;
word-break: normal;
}
code {
font-size: 0.875em;
color: #d63384;
word-wrap: break-word;
}
a > code {
color: inherit;
}
kbd {
padding: 0.2rem 0.4rem;
font-size: 0.875em;
color: #fff;
background-color: #212529;
border-radius: 0.2rem;
}
kbd kbd {
padding: 0;
font-size: 1em;
font-weight: 700;
}
figure {
margin: 0 0 1rem;
}
img,
svg {
vertical-align: middle;
}
table {
caption-side: bottom;
border-collapse: collapse;
}
caption {
padding-top: 0.5rem;
padding-bottom: 0.5rem;
color: #6c757d;
text-align: right;
}
th {
text-align: inherit;
text-align: -webkit-match-parent;
}
tbody,
td,
tfoot,
th,
thead,
tr {
border-color: inherit;
border-style: solid;
border-width: 0;
}
label {
display: inline-block;
}
button {
border-radius: 0;
}
button:focus:not(:focus-visible) {
outline: 0;
}
button,
input,
optgroup,
select,
textarea {
margin: 0;
font-family: inherit;
font-size: inherit;
line-height: inherit;
}
button,
select {
text-transform: none;
}
[role="button"] {
cursor: pointer;
}
select {
word-wrap: normal;
}
select:disabled {
opacity: 1;
}
[list]::-webkit-calendar-picker-indicator {
display: none;
}
[type="button"],
[type="reset"],
[type="submit"],
button {
-webkit-appearance: button;
}
[type="button"]:not(:disabled),
[type="reset"]:not(:disabled),
[type="submit"]:not(:disabled),
button:not(:disabled) {
cursor: pointer;
}
::-moz-focus-inner {
padding: 0;
border-style: none;
}
textarea {
resize: vertical;
}
fieldset {
min-width: 0;
padding: 0;
margin: 0;
border: 0;
}
legend {
float: right;
width: 100%;
padding: 0;
margin-bottom: 0.5rem;
font-size: calc(1.275rem + 0.3vw);
line-height: inherit;
}
@media (min-width: 1200px) {
legend {
font-size: 1.5rem;
}
}
legend + * {
clear: right;
}
::-webkit-datetime-edit-day-field,
::-webkit-datetime-edit-fields-wrapper,
::-webkit-datetime-edit-hour-field,
::-webkit-datetime-edit-minute,
::-webkit-datetime-edit-month-field,
::-webkit-datetime-edit-text,
::-webkit-datetime-edit-year-field {
padding: 0;
}
::-webkit-inner-spin-button {
height: auto;
}
[type="search"] {
outline-offset: -2px;
-webkit-appearance: textfield;
}
[type="email"],
[type="number"],
[type="tel"],
[type="url"] {
direction: ltr;
}
::-webkit-search-decoration {
-webkit-appearance: none;
}
::-webkit-color-swatch-wrapper {
padding: 0;
}
::-webkit-file-upload-button {
font: inherit;
}
::file-selector-button {
font: inherit;
}
::-webkit-file-upload-button {
font: inherit;
-webkit-appearance: button;
}
output {
display: inline-block;
}
iframe {
border: 0;
}
summary {
display: list-item;
cursor: pointer;
}
progress {
vertical-align: baseline;
}
[hidden] {
display: none !important;
}
/*# sourceMappingURL=bootstrap-reboot.rtl.min.css.map */

(19 file diffs suppressed: too large or lines too long)

Some files were not shown because too many files have changed in this diff.