- Add .txt and .md extensions to SUPPORTED_FORMATS mapping
- Add _convert_txt_to_markdown method for plain text files
- Support docling's native MD InputFormat for markdown files
- Add proper format detection and routing logic
- Preserve existing PDF OCR detection and multi-format support
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
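A minimal sketch of the extension mapping and routing the converter commits describe. Only the names SUPPORTED_FORMATS, DocumentConverter, and _convert_txt_to_markdown come from the commits; the dictionary values, the routing logic, and _convert_with_docling are assumptions, not the project's actual code.

```python
from pathlib import Path

# Hypothetical extension-to-format mapping; the real mapping's values
# (and whether it maps to strings at all) are assumptions.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".html": "html",
    ".htm": "html",
    ".txt": "txt",
    ".md": "md",
}


class DocumentConverter:
    def convert(self, path: str) -> str:
        ext = Path(path).suffix.lower()
        fmt = SUPPORTED_FORMATS.get(ext)
        if fmt is None:
            raise ValueError(f"Unsupported file extension: {ext}")
        if fmt == "txt":
            return self._convert_txt_to_markdown(path)
        if fmt == "md":
            # Markdown can pass through as-is (docling also accepts it
            # natively via its MD InputFormat).
            return Path(path).read_text(encoding="utf-8")
        return self._convert_with_docling(path)

    def _convert_txt_to_markdown(self, path: str) -> str:
        # Plain text needs no structural conversion beyond reading it in.
        return Path(path).read_text(encoding="utf-8")

    def _convert_with_docling(self, path: str) -> str:
        raise NotImplementedError  # placeholder for the docling-backed path
```

The key property is that every supported extension resolves through one table, so adding a format means adding one mapping entry plus one handler.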
- Rename PDFConverter to DocumentConverter with multi-format support
- Add SUPPORTED_FORMATS mapping for PDF, DOCX, HTML, HTM extensions
- Update indexing pipeline to use DocumentConverter
- Update file validation across all frontend components and scripts
- Preserve existing PDF OCR detection logic
- Add format-specific conversion methods for different document types
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Add NaN and infinite value detection in QwenEmbedder and OllamaEmbedder
- Implement LanceDB table creation with on_bad_vectors='drop' parameter
- Add fallback strategy with on_bad_vectors='fill' and fill_value=0.0
- Add pre-filtering of chunks with invalid embeddings before indexing
- Add NaN validation to LateChunkEncoder
- Add detailed logging for skipped chunks and error handling
- Resolves LanceDB error: 'Vector column has NaNs' during indexing
This fix ensures robust handling of edge cases in embedding generation
and prevents indexing failures due to invalid vector values.
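The pre-filtering step can be sketched as follows. The function name, chunk representation, and logging style are assumptions for illustration; only the NaN/infinite-value check itself is what the commit describes.

```python
import math


def filter_valid_chunks(chunks, embeddings):
    """Drop chunks whose embedding contains NaN or infinite values,
    logging each skip so failures are visible rather than silent."""
    valid_chunks, valid_embeddings = [], []
    for chunk, emb in zip(chunks, embeddings):
        if any(not math.isfinite(v) for v in emb):
            # In the real pipeline this would go through a logger.
            print(f"Skipping chunk with invalid embedding: {chunk[:40]!r}")
            continue
        valid_chunks.append(chunk)
        valid_embeddings.append(emb)
    return valid_chunks, valid_embeddings
```

On the LanceDB side, `on_bad_vectors="drop"` (with `on_bad_vectors="fill", fill_value=0.0` as the fallback) acts as a second line of defense at table creation, so vectors that slip past the pre-filter still cannot abort indexing.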
- Add environment auto-detection in ChatDatabase class
- Support both local development and Docker container paths
- Local development: uses 'backend/chat_data.db' (relative path)
- Docker containers: uses '/app/backend/chat_data.db' (absolute path)
- Maintain backward compatibility with explicit path overrides
- Update RAG API server to use auto-detection
This resolves the SQLite database connection error that occurred
when running LocalGPT in local development environments while
maintaining compatibility with Docker deployments.
Fixes: Database path hardcoded to Docker container path
Tested: Local development and Docker environment detection
Breaking: none (maintains backward compatibility)
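One way the auto-detection above could look. The `/.dockerenv` marker check is an assumption about how the environment is detected, and the function name is hypothetical; the two paths and the explicit-override precedence come from the commit.

```python
import os


def resolve_chat_db_path(explicit_path=None):
    """Pick the chat database path based on the runtime environment."""
    if explicit_path:
        return explicit_path               # explicit override always wins
    if os.path.exists("/.dockerenv"):      # common marker file inside Docker
        return "/app/backend/chat_data.db"  # absolute container path
    return "backend/chat_data.db"          # relative path for local dev
```

Checking the override first is what preserves backward compatibility: callers that already pass a path see no behavior change.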
- Update MarkdownRecursiveChunker to use tokenizer for token-based sizing
- Update DoclingChunker to use tokenizer with proper error handling
- Ensure IndexingPipeline passes tokenizer_model to both chunkers
- Update UI tooltips to reflect that both modes now use tokens
- Keep Docling as default for enhanced granularity features
- Add fallback to character-based approximation when tokenizer fails
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
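The token-counting-with-fallback behavior described above can be sketched like this. The function name and the 4-characters-per-token heuristic are assumptions; the commit specifies only that a failing tokenizer falls back to a character-based approximation.

```python
def count_tokens(text, tokenizer=None):
    """Count tokens via the tokenizer, falling back to a character-based
    approximation if no tokenizer is available or encoding fails."""
    if tokenizer is not None:
        try:
            return len(tokenizer.encode(text))
        except Exception:
            pass  # fall through to the approximation below
    # Rough heuristic: ~4 characters per token for English-like text.
    return max(1, len(text) // 4)
```

Both chunkers measuring size through one function like this is what makes a "512" setting mean 512 tokens in either mode, instead of 512 characters in one of them.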
- Change default chunker_mode from 'legacy' to 'docling' for token-based chunking
- Update UI to reflect the new default, with DoclingChunker enabled
- Improve tooltips to clarify token vs character chunking behavior
- Fixes issue where the 512-token setting used character-based chunking
Co-Authored-By: PromptEngineer <jnfarooq@outlook.com>
- Replace existing localGPT codebase with multimodal RAG implementation
- Include full-stack application with backend, frontend, and RAG system
- Add Docker support and comprehensive documentation
- Enhance with multimodal capabilities for document processing
- Preserve git history for localGPT while integrating new functionality