# 🏗️ System Architecture Overview

Last updated: 2025-07-06

This document explains how data and control flow through the Advanced RAG System — from a user's browser all the way to model inference and back. It is intended as the ground-truth reference for engineers and integrators.


## 1. Bird's-Eye Diagram

```mermaid
flowchart LR
    subgraph Client
        U["👤  User (Browser)"]
        FE["Next.js Front-end\nReact Components"]
        U --> FE
    end

    subgraph Network
        FE -->|HTTP/JSON| BE["Python HTTP Server\nbackend/server.py"]
    end

    subgraph Core["rag_system core package"]
        BE --> LOOP["Agent Loop\n(rag_system/agent/loop.py)"]
        BE --> IDX["Indexing Pipeline\n(pipelines/indexing_pipeline.py)"]

        LOOP --> RP["Retrieval Pipeline\n(pipelines/retrieval_pipeline.py)"]
        LOOP --> VER["Verifier (Grounding Check)"]
        RP --> RET["Retrievers\nBM25 | Dense | Hybrid"]
        RP --> RER["AI Reranker"]
        RP --> SYNT["Answer Synthesiser"]
    end

    subgraph Storage
        LDB[("LanceDB Vector Tables")]
        SQL[("SQLite\nChat & Metadata")]
    end

    subgraph Models
        OLLAMA["Ollama Server\n(qwen3, etc.)"]
        HF["HuggingFace Hosted\nEmbedding/Reranker Models"]
    end

    %% data edges
    IDX -->|chunks & embeddings| LDB
    RET -->|vector search| LDB
    LOOP -->|LLM calls| OLLAMA
    RP -->|LLM calls| OLLAMA
    VER -->|LLM calls| OLLAMA
    RP -->|rerank| HF

    BE -->|CRUD| SQL
```

### Data-flow Narrative

1. The user interacts with the Next.js UI; messages are posted via `src/lib/api.ts`.
2. `backend/server.py` receives the JSON over HTTP, applies CORS, and proxies the request into `rag_system` (a minimal sketch follows this list).
3. The Agent Loop decides (via Triage) whether to perform Retrieval-Augmented Generation (RAG) or answer directly with the LLM.
4. If RAG is chosen (see the query-path sketch below):
   1. The Retrieval Pipeline fetches candidates from LanceDB using BM25 + dense vectors.
   2. The AI Reranker (an HF model) re-orders the snippets.
   3. The Answer Synthesiser calls Ollama to write the final answer.
5. Answers can optionally be verified for grounding (controlled by a flag).
6. Index-building is an offline path triggered from the UI: PDF and other document files are chunked, embedded, and stored in LanceDB (see the indexing sketch below).
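
To ground steps 1–2, here is a deliberately minimal sketch of the backend's role: accept JSON over HTTP, apply CORS, and hand the payload to the core package. The handler uses only the standard library; `run_agent_loop` is a hypothetical placeholder for the real entry point in `rag_system`, not its actual API.

```python
# Hypothetical, minimal version of backend/server.py's role (steps 1-2 above):
# accept JSON over HTTP, add CORS headers, and hand the payload to rag_system.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON body posted by the Next.js front-end.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        payload = json.loads(body)

        # Proxy into the core package (placeholder call, not the real API).
        answer = run_agent_loop(payload["message"])

        response = json.dumps({"answer": answer}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Access-Control-Allow-Origin", "*")  # CORS
        self.end_headers()
        self.wfile.write(response)


def run_agent_loop(message: str) -> str:
    """Placeholder for the rag_system agent loop entry point."""
    return f"(echo) {message}"


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), ChatHandler).serve_forever()
```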
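
Steps 3–5 can be summarised in a similar sketch. Everything here (`answer_query`, `retriever.search`, `reranker.rerank`, `llm.generate`) is an illustrative assumption about the shape of the loop in `rag_system/agent/loop.py`, not the project's actual interfaces.

```python
# Illustrative shape of the agent loop's query path (steps 3-5 above).
# All names and signatures are hypothetical, not the real rag_system API.

def answer_query(query: str, retriever, reranker, llm, verify: bool = False) -> str:
    # Step 3: triage -- ask the LLM whether document context is needed.
    triage = llm.generate(f"Answer yes or no: does this need document lookup? {query}")
    if not triage.strip().lower().startswith("yes"):
        return llm.generate(query)  # direct LLM answering, no retrieval

    # Step 4a: hybrid retrieval (BM25 + dense vectors) against LanceDB.
    candidates = retriever.search(query, top_k=50)

    # Step 4b: HF reranker re-orders the candidates; keep the best few.
    top_chunks = reranker.rerank(query, candidates)[:5]

    # Step 4c: answer synthesis -- Ollama writes the answer from the context.
    context = "\n\n".join(chunk.text for chunk in top_chunks)
    answer = llm.generate(f"Context:\n{context}\n\nQuestion: {query}")

    # Step 5 (optional): verifier checks the answer is grounded in the context.
    if verify:
        check = llm.generate(
            f"Answer yes or no: is this answer supported by the context?\n"
            f"Context:\n{context}\n\nAnswer:\n{answer}"
        )
        if not check.strip().lower().startswith("yes"):
            answer += "\n\n(Note: the verifier flagged this answer as possibly ungrounded.)"
    return answer
```

The structural point to take from this sketch: triage, synthesis, and verification are all LLM calls to Ollama, while reranking is a separate HF model call, matching the edges in the diagram above.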
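
Finally, a sketch of the offline indexing path (step 6). The `lancedb` and `sentence-transformers` calls below are real APIs, but the fixed-size chunking, table name, and schema are simplifying assumptions; the actual behaviour lives in `pipelines/indexing_pipeline.py`.

```python
# Simplified sketch of the offline indexing path (step 6 above).
# lancedb / sentence-transformers APIs are real; the chunking strategy,
# table name, and schema are assumptions, not the project's actual pipeline.
import lancedb
from sentence_transformers import SentenceTransformer


def index_document(text: str, doc_id: str, db_path: str = "./lancedb") -> None:
    # Naive fixed-size chunking; the real pipeline is more sophisticated.
    chunk_size = 500
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    # Embed each chunk with a HuggingFace sentence-embedding model.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = model.encode(chunks)

    # Store chunks and their vectors in a LanceDB table for vector search.
    rows = [
        {"doc_id": doc_id, "text": chunk, "vector": vec.tolist()}
        for chunk, vec in zip(chunks, vectors)
    ]
    db = lancedb.connect(db_path)
    if "chunks" in db.table_names():
        db.open_table("chunks").add(rows)
    else:
        db.create_table("chunks", data=rows)
```

At query time, the dense retriever can then run a vector search along the lines of `db.open_table("chunks").search(model.encode(query)).limit(50).to_list()`.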

## 2. Component Documents

The table below links to deep-dives for each major component.

| Component | Documentation |
| --- | --- |
| Agent Loop | `system_overview.md` |
| Indexing Pipeline | `indexing_pipeline.md` |
| Retrieval Pipeline | `retrieval_pipeline.md` |
| Verifier | `verifier.md` |
| Triage System | `triage_system.md` |

Change-management: whenever the architecture changes (a new micro-service, a different database, etc.), update this overview diagram first, then the individual component docs.