- Add .txt and .md extensions to SUPPORTED_FORMATS mapping
- Add _convert_txt_to_markdown method for plain text files
- Support Docling's native MD InputFormat for Markdown files
- Add proper format detection and routing logic
- Preserve existing PDF OCR detection and multi-format support
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Rename PDFConverter to DocumentConverter with multi-format support
- Add SUPPORTED_FORMATS mapping for PDF, DOCX, HTML, HTM extensions
- Update indexing pipeline to use DocumentConverter
- Update file validation across all frontend components and scripts
- Preserve existing PDF OCR detection logic
- Add format-specific conversion methods for different document types
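The detection-and-routing idea can be sketched as an extension-to-handler table. `SUPPORTED_FORMATS` is named in the commit; the values and `route_document` helper below are hypothetical stand-ins for the real conversion methods:

```python
from pathlib import Path

# Hypothetical mapping; the real DocumentConverter maps extensions to
# format-specific conversion methods (and docling InputFormat values).
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".html": "html",
    ".htm": "html",
}

def route_document(path: str) -> str:
    """Pick a conversion route from the file extension, mirroring the
    format-detection logic the commit describes."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {ext}")
    return SUPPORTED_FORMATS[ext]
```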
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Add NaN and infinite value detection in QwenEmbedder and OllamaEmbedder
- Implement LanceDB table creation with on_bad_vectors='drop' parameter
- Add fallback strategy with on_bad_vectors='fill' and fill_value=0.0
- Add pre-filtering of chunks with invalid embeddings before indexing
- Add NaN validation to LateChunkEncoder
- Add detailed logging for skipped chunks and error handling
- Resolves LanceDB error: 'Vector column has NaNs' during indexing
This fix ensures robust handling of edge cases in embedding generation
and prevents indexing failures due to invalid vector values.
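The pre-filtering step can be sketched with NumPy's `isfinite`, which rejects both NaN and infinite components in one pass (function name and logging here are illustrative; LanceDB's `on_bad_vectors='drop'` / `'fill'` parameters then act as the second line of defense):

```python
import numpy as np

def filter_valid_embeddings(chunks, embeddings):
    """Drop chunks whose embedding contains NaN or infinite values before
    they reach LanceDB, logging how many were skipped."""
    kept_chunks, kept_vecs, skipped = [], [], 0
    for chunk, vec in zip(chunks, embeddings):
        vec = np.asarray(vec, dtype=np.float32)
        if np.all(np.isfinite(vec)):
            kept_chunks.append(chunk)
            kept_vecs.append(vec)
        else:
            skipped += 1
    if skipped:
        print(f"Skipped {skipped} chunk(s) with invalid embeddings")
    return kept_chunks, kept_vecs
```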
- Add environment auto-detection in ChatDatabase class
- Support both local development and Docker container paths
- Local development: uses 'backend/chat_data.db' (relative path)
- Docker containers: uses '/app/backend/chat_data.db' (absolute path)
- Maintain backward compatibility with explicit path overrides
- Update RAG API server to use auto-detection
This resolves the SQLite database connection error that occurred
when running LocalGPT in local development environments while
maintaining compatibility with Docker deployments.
Fixes: Database path hardcoded to Docker container path
Tested: Local development and Docker environment detection
Breaking: No breaking changes - maintains backward compatibility
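The auto-detection described above could look roughly like this. The exact heuristic `ChatDatabase` uses is not stated in the commit; checking for `/.dockerenv` (a file Docker creates inside containers) is one common approach, shown here as an assumption:

```python
import os

def resolve_chat_db_path(explicit_path=None):
    """Auto-detect the chat database path: Docker containers get the absolute
    /app path, local development a relative one. An explicit path wins."""
    if explicit_path:
        return explicit_path
    # /.dockerenv is created by Docker inside containers; its presence is a
    # common (if heuristic) signal that we are running containerized.
    if os.path.exists("/.dockerenv"):
        return "/app/backend/chat_data.db"
    return "backend/chat_data.db"
```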
- Modified OllamaClient to read OLLAMA_HOST environment variable
- Updated docker-compose.yml to pass OLLAMA_HOST to backend service
- Changed docker.env to use Docker gateway IP (172.18.0.1:11434)
- Configured Ollama service to bind to 0.0.0.0:11434 for container access
- Added test script to verify Ollama connectivity from within container
- All backend tests now pass including chat functionality
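The `OLLAMA_HOST` lookup can be sketched as follows (class body is illustrative, not the full client; the bare `host:port` handling matches the `172.18.0.1:11434` value in docker.env):

```python
import os

class OllamaClient:
    """Minimal sketch: resolve the Ollama base URL from OLLAMA_HOST,
    falling back to the local default when the variable is unset."""

    DEFAULT_HOST = "http://localhost:11434"

    def __init__(self, host=None):
        raw = host or os.environ.get("OLLAMA_HOST") or self.DEFAULT_HOST
        # Accept bare host:port values as well as full URLs.
        self.base_url = raw if raw.startswith("http") else f"http://{raw}"
```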
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Create comprehensive text normalization utility to clean up excessive newlines
- Apply normalization to streaming tokens in session-chat.tsx
- Apply normalization to rendered text in conversation-page.tsx
- Add test case demonstrating the fix for excessive empty lines
- Preserve proper markdown formatting while removing visual gaps
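The actual utility lives in the TypeScript frontend; the core normalization idea is sketched here in Python for illustration: collapse runs of three or more newlines to a single blank line, so paragraph breaks survive but larger visual gaps do not:

```python
import re

def normalize_text(text: str) -> str:
    """Collapse runs of three or more newlines down to two, preserving the
    ordinary paragraph breaks that Markdown rendering relies on."""
    # Normalize line endings first so \r\n sequences don't escape the pattern.
    text = text.replace("\r\n", "\n")
    # Treat lines containing only whitespace as empty lines.
    text = re.sub(r"\n[ \t]+\n", "\n\n", text)
    return re.sub(r"\n{3,}", "\n\n", text)
```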
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Update MarkdownRecursiveChunker to use tokenizer for token-based sizing
- Update DoclingChunker to use tokenizer with proper error handling
- Ensure IndexingPipeline passes tokenizer_model to both chunkers
- Update UI tooltips to reflect that both modes now use tokens
- Keep Docling as default for enhanced granularity features
- Add fallback to character-based approximation when tokenizer fails
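The tokenizer-with-fallback pattern can be sketched as below (function name and the ~4-characters-per-token heuristic are assumptions, not taken from the code; the heuristic is a common rule of thumb for English text):

```python
def count_tokens(text, tokenizer=None, chars_per_token=4.0):
    """Token-based sizing with a character-based fallback: use the tokenizer
    when one is available and working, otherwise approximate by assuming
    roughly 4 characters per token."""
    if tokenizer is not None:
        try:
            return len(tokenizer.encode(text))
        except Exception:
            pass  # tokenizer failed; fall through to the approximation
    return max(1, round(len(text) / chars_per_token))
```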
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Change default chunker_mode from 'legacy' to 'docling' for token-based chunking
- Update UI to reflect the new default, with Docling chunking enabled
- Improve tooltips to clarify token vs character chunking behavior
- Fixes issue where the 512-token setting used character-based chunking
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Updated Smart Routing description to remove mention of 'graph queries'
- Changed from 'RAG, direct LLM, or graph queries' to 'RAG and direct LLM responses'
- Addresses PR feedback that graph-related features are not yet enabled
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Document Query Decomposition with API examples
- Add Answer Verification system documentation
- Document Contextual Enrichment during indexing
- Add Late Chunking configuration details
- Document Multimodal Support with vision model integration
- Add detailed dependency information in installation section
- Improve batch processing documentation with real config examples
- Update Advanced Features section with comprehensive API examples
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Fix incorrect repository URL in Docker deployment section
- Update API endpoints to match actual backend implementation
- Add session-based chat endpoints and management
- Document real index management endpoints
- Update model configuration with actual OLLAMA_CONFIG and EXTERNAL_MODELS
- Replace generic pipeline configs with actual default/fast configurations
- Add system launcher documentation and service architecture
- Improve health monitoring and logging documentation
- Update environment variables to match code implementation
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Replaced existing localGPT codebase with multimodal RAG implementation
- Includes full-stack application with backend, frontend, and RAG system
- Added Docker support and comprehensive documentation
- Enhanced with multimodal capabilities for document processing
- Preserved git history for localGPT while integrating new functionality