# Multimodal RAG System
This document provides a detailed overview of the multimodal Retrieval-Augmented Generation (RAG) system implemented in this directory. The system is designed to process and understand information from PDF documents, combining both textual and visual data to answer complex queries.
## 1. Overview
This RAG system is a sophisticated pipeline that leverages state-of-the-art open-source models to provide accurate, context-aware answers from a document corpus. Unlike traditional RAG systems that only process text, this implementation is fully multimodal. It extracts and indexes both text and images from PDFs, allowing a Vision Language Model (VLM) to reason over both modalities when generating a final answer.
The core capabilities include:
- Multimodal Indexing: Extracts text and images from PDFs and creates separate vector embeddings for each.
- Hybrid Retrieval: Combines dense vector search (for semantic similarity) with traditional keyword-based search (BM25) for robust retrieval; a fusion sketch follows this list.
- Advanced Reranking: Utilizes a powerful reranker model to improve the relevance of retrieved documents before they are passed to the generator.
- VLM-Powered Synthesis: Employs a Vision Language Model to synthesize the final answer, allowing it to analyze both the text and the images from the retrieved document chunks.
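The document does not pin down how the dense and BM25 result lists are merged; a common choice for this kind of hybrid retrieval is reciprocal rank fusion (RRF), sketched below. The `dense_hits`/`bm25_hits` inputs in the usage comment are hypothetical, not the project's actual variable names.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one.

    Each document's score is the sum of 1 / (k + rank) over every list
    it appears in, so items ranked highly by either the dense retriever
    or the BM25 retriever float to the top of the fused list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical usage: dense_hits and bm25_hits are lists of document IDs
# returned by the vector search and the keyword search respectively.
# fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```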
## 2. Architecture
The system is composed of several key Python modules that work together to form the RAG pipeline.
Key Modules:
- `main.py`: The main entry point for the application. It contains the configuration for all models and pipelines and orchestrates the indexing and retrieval processes.
- `rag_system/pipelines/`: Contains the high-level orchestration for indexing and retrieval.
  - `indexing_pipeline.py`: Manages the process of converting raw PDFs into indexed, searchable data.
  - `retrieval_pipeline.py`: Handles the end-to-end process of taking a user query, retrieving relevant information, and generating a final answer.
- `rag_system/indexing/`: Contains all modules related to data processing and indexing.
  - `multimodal.py`: Responsible for extracting text and images from PDFs and generating embeddings using the configured vision model (`colqwen2-v1.0`).
  - `representations.py`: Defines the text embedding model (`Qwen2-7B-instruct`) and other data representation generators.
  - `embedders.py`: Manages the connection to the LanceDB vector database and handles the indexing of vector embeddings.
- `rag_system/retrieval/`: Contains modules for retrieving and ranking documents.
  - `retrievers.py`: Implements the logic for searching the vector database to find relevant text and image chunks.
  - `reranker.py`: Contains the `QwenReranker` class, which re-ranks the retrieved documents for improved relevance.
- `rag_system/agent/`: Contains the `Agent` loop that interacts with the user and the RAG pipelines (a minimal loop sketch follows this list).
- `rag_system/utils/`: Contains utility clients, such as the `OllamaClient` for interacting with the Ollama server.
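As a rough picture of how these modules compose, the agent's interaction loop reduces to something like the sketch below. The `answer()` method name is an assumption about the `RetrievalPipeline` interface, not its actual API.

```python
def agent_loop(pipeline):
    """Minimal REPL sketch: read a query, answer it, repeat until 'quit'
    (the exit command described in section 5)."""
    while True:
        query = input("> ").strip()
        if query.lower() == "quit":
            break
        print(pipeline.answer(query))  # hypothetical RetrievalPipeline method
```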
Data Flow:
- Indexing (see the sketch after this list):
  - The `MultimodalProcessor` reads a PDF and splits it into pages.
  - For each page, it extracts the raw text and a full-page image.
  - The `QwenEmbedder` generates a vector embedding for the text.
  - The `LocalVisionModel` (using `colqwen2-v1.0`) generates a vector embedding for the image.
  - The `VectorIndexer` stores these embeddings in separate tables within a LanceDB database.
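Concretely, the per-page indexing step looks roughly like the sketch below. It uses PyMuPDF (`fitz`) for page extraction and LanceDB for storage, which matches the libraries named in this document; the `embed_text` and `embed_image` callables stand in for the Qwen and ColQwen2 embedders and are assumptions, as is the simplification of ColQwen2's multi-vector output to a single vector per page.

```python
import fitz  # PyMuPDF
import lancedb

def index_pdf(pdf_path, embed_text, embed_image, db_path="./lancedb"):
    """Minimal indexing sketch: one text row and one image row per page,
    stored in separate LanceDB tables as described above."""
    db = lancedb.connect(db_path)
    text_rows, image_rows = [], []
    for page_num, page in enumerate(fitz.open(pdf_path)):
        text = page.get_text()
        image_bytes = page.get_pixmap().tobytes("png")  # full-page image
        text_rows.append({"vector": embed_text(text), "text": text, "page": page_num})
        image_rows.append({"vector": embed_image(image_bytes), "page": page_num})
    db.create_table("text_chunks", data=text_rows, mode="overwrite")
    db.create_table("image_chunks", data=image_rows, mode="overwrite")
```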
- Retrieval (see the sketch after this list):
  - A user submits a query to the `Agent`.
  - The `RetrievalPipeline`'s `MultiVectorRetriever` searches both the text and image tables in LanceDB for relevant chunks.
  - The retrieved documents are passed to the `QwenReranker`, which re-orders them based on relevance to the query.
  - The top-ranked documents (containing both text and image references) are passed to the Vision Language Model (`qwen-vl`).
  - The VLM analyzes the text and images to extract key facts.
  - A final text generation model (`llama3`) synthesizes these facts into a coherent, human-readable answer.
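The query path can be sketched end to end as below. The LanceDB calls are real API usage, `embed_text` and `rerank` are the same hypothetical helpers as above, and the Ollama model tags come from this document's configuration; the real pipeline also forwards the retrieved page images to the VLM (via the `images` field of an Ollama chat message), which is omitted here for brevity.

```python
import lancedb
import ollama  # official Ollama Python client

def answer_query(query, embed_text, rerank, db_path="./lancedb", k=20, top_k=5):
    db = lancedb.connect(db_path)
    qvec = embed_text(query)
    # Search both modality tables for candidate chunks.
    text_hits = db.open_table("text_chunks").search(qvec).limit(k).to_list()
    image_hits = db.open_table("image_chunks").search(qvec).limit(k).to_list()
    # Re-order candidates by relevance and keep the best few.
    best = rerank(query, text_hits + image_hits)[:top_k]
    context = "\n\n".join(hit.get("text", "") for hit in best)
    # Stage 1: the VLM extracts key facts from the retrieved context.
    facts = ollama.chat(model="qwen2.5vl:7b", messages=[{
        "role": "user",
        "content": f"Extract the key facts relevant to: {query}\n\n{context}",
    }])
    # Stage 2: llama3 synthesizes the final answer from those facts.
    final = ollama.chat(model="llama3", messages=[{
        "role": "user",
        "content": f"Answer the question '{query}' using these facts:\n"
                   f"{facts['message']['content']}",
    }])
    return final["message"]["content"]
```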
## 3. Models
This system relies on a suite of powerful, open-source models.
| Component | Model | Framework | Purpose |
|---|---|---|---|
| Image Embedding | `vidore/colqwen2-v1.0` | `colpali` | Generates vector embeddings from images. |
| Text Embedding | `Qwen/Qwen2-7B-instruct` | `transformers` | Generates vector embeddings from text. |
| Reranker | `Qwen/Qwen-reranker` | `transformers` | Re-ranks retrieved documents for relevance. |
| Vision Language Model | `qwen2.5vl:7b` | Ollama | Extracts facts from text and images. |
| Text Generation | `llama3` | Ollama | Synthesizes the final answer. |
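For context on what the reranker does: a cross-encoder scores each (query, document) pair jointly instead of comparing precomputed embeddings, which is slower but more accurate than the first-stage retrieval. The sketch below uses sentence-transformers' `CrossEncoder` with a placeholder model as a stand-in; the project's `QwenReranker` is built on `transformers` directly, so its loading code will differ.

```python
from sentence_transformers import CrossEncoder

# Placeholder model; the project uses Qwen/Qwen-reranker via transformers.
_model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, hits):
    """Score each retrieved chunk against the query, best first.
    `hits` are LanceDB rows; image-only hits score on empty text here."""
    scores = _model.predict([(query, hit.get("text", "")) for hit in hits])
    ranked = sorted(zip(scores, hits), key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in ranked]
```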
## 4. Configuration

All system configurations are centralized in `main.py`.

- `OLLAMA_CONFIG`: Defines the models that will be run via the Ollama server. This includes the final text generation model and the Vision Language Model.
- `PIPELINE_CONFIGS`: Contains the configurations for both the `indexing` and `retrieval` pipelines. Here you can specify:
  - The paths for the LanceDB database and source documents.
  - The names of the tables to be used for text and image embeddings.
  - The Hugging Face model names for the text embedder, vision model, and reranker.
  - Parameters for the reranker and retrieval process (e.g., `top_k`, `retrieval_k`).
To change a model, simply update the corresponding model name in this configuration file.
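As a concrete but purely illustrative picture of what that file contains, the dictionaries might look like the sketch below; the key names, paths, and table names are assumptions, while the model names are the ones listed in section 3.

```python
# Illustrative shape only; see main.py for the authoritative values.
OLLAMA_CONFIG = {
    "generation_model": "llama3",    # final answer synthesis
    "vision_model": "qwen2.5vl:7b",  # fact extraction from text + images
    "host": "http://localhost:11434",
}

PIPELINE_CONFIGS = {
    "indexing": {
        "db_path": "./lancedb",
        "documents_dir": "rag_system/documents",
        "text_table": "text_chunks",
        "image_table": "image_chunks",
        "text_embedder": "Qwen/Qwen2-7B-instruct",
        "vision_model": "vidore/colqwen2-v1.0",
    },
    "retrieval": {
        "reranker_model": "Qwen/Qwen-reranker",
        "retrieval_k": 20,  # candidates fetched per table
        "top_k": 5,         # candidates kept after reranking
    },
}
```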
## 5. Usage
To run the system, you first need to ensure the required models are available.
Prerequisites:
- Install Dependencies: `pip install -r requirements.txt`
- Download Ollama Models: `ollama pull llama3` and `ollama pull qwen2.5vl:7b`
- Hugging Face Models: The `transformers` and `colpali` libraries will automatically download the required models the first time they are used. Ensure you have a stable internet connection.
Running the System:
- Execute the Main Script: `python rag_system/main.py`
- Indexing: The script will first run the indexing pipeline, processing any documents in the `rag_system/documents` directory and storing their embeddings in LanceDB.
- Querying: Once indexing is complete, the RAG agent will be ready. You can ask questions about the documents you have indexed, for example:
  > What was the revenue growth in Q3?
- Exit: To stop the agent, type `quit`.