Commit Graph

397 Commits (37c534df212eefc0e6717c81635ef485478ca4b6)

Author SHA1 Message Date
yunqiqiliang 37c534df21
Merge 18230d12f9 into bd43ca6275 11 months ago
Asuka Minato ef51678c73
orm filter -> where (#22801)
Signed-off-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: Claude <noreply@anthropic.com>
11 months ago
wanttobeamaster 8278b39f85
fix tablestore full text search bug (#22853) 11 months ago
wanttobeamaster 1c3c40db69
fix: tablestore TypeError when vector is missing (#22843)
Co-authored-by: xiaozhiqing.xzq <xiaozhiqing.xzq@alibaba-inc.com>
11 months ago
wlleiiwang b4e152f775
FEAT: Tencent Vector search supports backward compatibility with the previous score calculation approach. (#22820)
Co-authored-by: wlleiiwang <wlleiiwang@tencent.com>
11 months ago
Asuka Minato 6d3e198c3c
Mapped column (#22644)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
11 months ago
wanttobeamaster a2048fd0f4
fix: tablestore vdb support metadata filter (#22774)
Co-authored-by: xiaozhiqing.xzq <xiaozhiqing.xzq@alibaba-inc.com>
11 months ago
issac2e 58d92970a9
Optimize tencent_vector knowledge base deletion error handling with batch processing support (#22726)
Co-authored-by: liuchen15 <liuchen15@gaotu.cn>
Co-authored-by: crazywoola <427733928@qq.com>
11 months ago
uply23333 ab012fe1a2
fix: improve document filtering in full text search(elasticsearch) (#22683) 11 months ago
8bitpd 9251a66a10
fix: update analyticdb vector to do filter by metadata (#22698)
Co-authored-by: xiaozeyu <xiaozeyu.xzy@alibaba-inc.com>
11 months ago
yunqiqiliang 18230d12f9 Auto-format: Fix code style for CI compliance
🤖 Automated formatting applied by CI test script
- Ensures 100% compliance with Python style guidelines
- No functional changes, only formatting improvements

Generated by: run_complete_ci_test.sh
11 months ago
yunqiqiliang c3851595d0 Fix MyPy type checking errors in ClickZetta vector implementation
- Add proper type annotations for Connection from clickzetta module
- Implement _ensure_connection() method to handle None connection checks
- Fix all database cursor access patterns to use proper null checking
- Add type annotation for result queue in _execute_write method
- Resolve factory method configuration issues with None value handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
yunqiqiliang f57fa13f1b fix: resolve CI linting issues and add missing newlines
- Fix all line length issues (120 character limit)
- Remove all trailing whitespace
- Add missing newlines at end of files
- Add CLICKZETTA_VOLUME_DIFY_PREFIX environment variable to docker-compose.yaml
- Ensure proper code formatting for all ClickZetta files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
yunqiqiliang 2de316c557 feat: add ClickZetta Volume storage support
- Add three volume types: User, Table, and External Volume
- Complete file operations: upload, download, delete, list, stream
- Intelligent configuration management with fallback to vector DB settings
- Simplified user experience with 'user' as default volume type
- Comprehensive error handling and logging
- Docker integration with updated compose files
- Integration tests for all volume types
- Disabled complex permission checking for stability

🎯 Features:
- User Volume: Personal/small team use, simple configuration
- Table Volume: Enterprise multi-tenant with smart routing
- External Volume: Data lake integration with external storage
- Flexible configuration with environment variable support
- Complete file lifecycle management

🔧 Technical:
- Reuses existing ClickZetta connection configuration
- Pydantic-based configuration validation
- Comprehensive error handling and logging
- Performance-optimized with connection reuse
- Clean integration with Dify's storage architecture

🚀 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
znn ed263aed9f
fix text splitter (#22596) 11 months ago
-LAN- 460a825ef1
refactor: decouple Node and NodeData (#22581)
Signed-off-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: QuantumGhost <obelisk.reg+git@gmail.com>
11 months ago
helojo e7d80bf7bf
Fix: the pict type picture was not processed in the docx (#19305)
Co-authored-by: zqgame <zqgame@zqgame.local>
11 months ago
yunqiqiliang 8e707cace9 Fix recall testing and search functionality for ClickZetta integration
- Fix double JSON encoding issue in metadata parsing for all search methods
- Remove unnecessary dataset_id filters since each dataset has its own table
- Add robust metadata parsing with fallback for JSON decode errors
- Ensure document_id is always present for Dify's format_retrieval_documents
- Clean up debug logging while preserving essential error logs
- Support vector search, full-text search, and hybrid search in recall testing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
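Note on commit 8e707cace9: the double JSON encoding fix with a fallback for decode errors could look roughly like the sketch below. The function name `parse_metadata` and the choice to fall back to an empty dict are assumptions made for illustration; they are not taken from the ClickZetta implementation.

```python
import json
from typing import Any, Dict


def parse_metadata(raw: Any) -> Dict[str, Any]:
    """Parse metadata that may have been JSON-encoded once or twice.

    Hypothetical helper: returns an empty dict when decoding fails.
    """
    if isinstance(raw, dict):
        return raw
    try:
        value = json.loads(raw)
        # A doubly encoded payload decodes to a string first; decode it again.
        if isinstance(value, str):
            value = json.loads(value)
    except (json.JSONDecodeError, TypeError):
        # Fallback: malformed or missing metadata should not break retrieval.
        return {}
    return value if isinstance(value, dict) else {}
```

A caller that needs `document_id` to always be present could apply `setdefault("document_id", ...)` to the returned dict.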
yunqiqiliang fcf8387f52 Fix SQL statement length issues and improve batch processing
- Add SQL length monitoring and automatic batch splitting
- Reduce default batch size from 100 to 20 to prevent large SQL statements
- Add detailed error logging for SQL execution failures
- Implement recursive batch splitting for oversized SQL statements
- Set 1MB limit for SQL statement length

This resolves issues where large batches create SQL statements that
exceed database limits, causing vector insertion failures.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
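Note on commit fcf8387f52: recursive batch splitting under a statement length limit might be sketched as follows. The 1MB limit comes from the commit message; the names `split_into_statements` and `build_sql` are hypothetical, and the real code may measure length differently.

```python
from typing import Callable, List, Sequence

MAX_SQL_BYTES = 1024 * 1024  # 1MB limit, as stated in the commit message


def split_into_statements(
    rows: Sequence,
    build_sql: Callable[[Sequence], str],
    max_bytes: int = MAX_SQL_BYTES,
) -> List[str]:
    """Halve the batch recursively until each statement fits the limit.

    Hypothetical helper: `build_sql` renders one INSERT for a batch of rows.
    """
    if not rows:
        return []
    sql = build_sql(rows)
    # A single row cannot be split further, so it is emitted as-is.
    if len(sql.encode("utf-8")) <= max_bytes or len(rows) == 1:
        return [sql]
    mid = len(rows) // 2
    return (
        split_into_statements(rows[:mid], build_sql, max_bytes)
        + split_into_statements(rows[mid:], build_sql, max_bytes)
    )
```

Row order is preserved, so the resulting statements can be executed in sequence.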
yunqiqiliang 8dea8766e9 Fix document content special characters causing SQL syntax errors
- Add specialized document content cleaning function
- Handle backticks, quotes, newlines, and control characters properly
- Replace problematic characters instead of just escaping them
- Normalize whitespace and remove control characters
- Fix "Syntax error at or near" issues from document content like shell commands

This resolves SQL syntax errors when documents contain shell scripts,
code snippets, or other text with special formatting characters.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
yunqiqiliang f116407045 Fix SQL injection vulnerabilities and character encoding issues
- Enhanced string escaping for SQL safety (backslashes, newlines, tabs)
- Added safe JSON formatting with ensure_ascii=True
- Implemented safe doc_id validation (alphanumeric + hyphens/underscores only)
- Protected all user input: document content, metadata, IDs, search queries
- Fixed potential SQL syntax errors from special characters in document content

This addresses "Syntax error at or near 'files'" errors that occur when
document content or metadata contains special characters that break SQL syntax.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
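Note on commit f116407045: the doc_id rule (alphanumeric plus hyphens and underscores only) can be expressed as an allow-list check, sketched below. The name `validate_doc_id` and the choice to raise `ValueError` are assumptions for illustration.

```python
import re

# Allow-list from the commit message: letters, digits, hyphens, underscores.
_SAFE_DOC_ID = re.compile(r"[A-Za-z0-9_-]+")


def validate_doc_id(doc_id: str) -> str:
    """Return doc_id unchanged if it is safe to embed in SQL, else raise.

    Hypothetical helper, not the function used in the commit.
    """
    if not isinstance(doc_id, str) or not _SAFE_DOC_ID.fullmatch(doc_id):
        raise ValueError(f"unsafe doc_id: {doc_id!r}")
    return doc_id
```

Where a database driver supports parameterized queries, those are generally preferable to validating and interpolating values by hand.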
yunqiqiliang 9c2bf2b30f Fix SQL syntax errors with vector formatting
- Add safe vector formatting function to handle special float values
- Handle NaN, infinity values in vector embeddings
- Prevent SQL syntax errors from malformed VECTOR() statements
- Use consistent vector formatting across all SQL operations

This fixes "Syntax error at or near '{'" errors that occur when
vector embeddings contain special float values during knowledge
base construction.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
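Note on commit 9c2bf2b30f: a safe vector formatter that handles NaN and infinity might look like the sketch below. Replacing non-finite values with 0.0 and rendering the vector as a bracketed, comma-separated literal are both assumptions; the commit message does not state which replacement value or literal syntax the ClickZetta code uses.

```python
import math
from typing import Iterable


def format_vector(vector: Iterable[float], fallback: float = 0.0) -> str:
    """Render an embedding as a bracketed literal with only finite floats.

    Hypothetical helper: NaN and +/-inf are replaced by `fallback`.
    """
    cleaned = [float(v) if math.isfinite(v) else fallback for v in vector]
    return "[" + ",".join(repr(x) for x in cleaned) + "]"
```

Using one formatter for every SQL statement keeps the vector literal consistent across inserts and searches.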
yunqiqiliang 1b7603deb1 Fix inverted index duplicate creation issue
- Add table existence check before creating indexes
- Improve error handling for ClickZetta specific error messages
- Remove duplicate _table_exists method definition
- Prevent high-frequency index creation attempts during bulk operations

This fixes the "already has index with the same type" errors during
large knowledge base construction with 600+ documents.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
yunqiqiliang c4d9e5c69f Add documentation section and clean up formatting
- Add clear section header for ClickZetta configuration
- Improve code organization and readability
- All lint checks should pass with latest fixes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
yunqiqiliang ed139a49a3 Fix code style issues for CI checks
- Remove unused imports (time, VectorType)
- Replace logger.error with logger.exception for exception handling
- Remove redundant exception objects from logging.exception calls
- Ensure all Python style checks pass

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
11 months ago
yunqiqiliang b201e5d502 feat: add Clickzetta vector database support
- Add ClickzettaVector implementation with write queue for concurrent safety
- Support vector similarity search using HNSW algorithm
- Support full-text search with inverted indexes
- Add comprehensive configuration and environment variables
- Add unit and integration tests
- Resolve dependency conflicts with clickzetta-connector-python 0.8.102

Co-authored-by: Claude <noreply@anthropic.com>
11 months ago
yihong d2933c2bfe
fix: drop dead code phase2 unused class (#22042)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
11 months ago
wanttobeamaster bf7b2c339b
tablestore vector support more method (#22225)
Co-authored-by: xiaozhiqing.xzq <xiaozhiqing.xzq@alibaba-inc.com>
11 months ago
Jacky Wu 3e96c0c468
fix: close session before doing long latency operation (#22306) 11 months ago
luckylhb90 a371390d6c
optimize: batch embedding and qdrant write_consistency_factor parameter (#21776)
Co-authored-by: hobo.l <hobo.l@binance.com>
11 months ago
wlleiiwang 89b52471fb
Optimize the memory usage of Tencent Vector Database (#22079)
Co-authored-by: wlleiiwang <wlleiiwang@tencent.com>
11 months ago
baonudesifeizhai 1c7404099d
fix: prevent timeout in file encoding detection for large files (#21453)
Co-authored-by: crazywoola <427733928@qq.com>
11 months ago
efrey kong 826bf25abf
Fix: prevent SQL errors when metadata filter Constant value is None or blank (#21803) 11 months ago
Dongyu Li 00f0b569cc
Feat/kb index (#20868)
Co-authored-by: twwu <twwu@dify.ai>
12 months ago
Jin 3e7f8bad56
fix: markdown_extractor lost chunks if it starts without a header (#21308) (#21309) 12 months ago
LiuBo 17fe62cf91
feat: add support for Matrixone database (#20714) 12 months ago
NeatGuyCoding 9835730278
Translation fix (#21194) 12 months ago
NeatGuyCoding 2eae7503e1
Minor Improvements for File Validation and Configuration Handling #21179 (#21171)
Co-authored-by: tech <cto@sb>
12 months ago
Ademílson Tonato 9e73e8b9e8
feat: add search endpoint for Firecrawl Integration (#20521)
Co-authored-by: crazywoola <427733928@qq.com>
12 months ago
Rain Wang 47e0f92c0f
Fixes #20748 KnowledgeRetrievalNode returns all external documents when reranker is disabled even if top-k is configured (#20762) 12 months ago
kazuya-awano 45c89bd6de
feat: add pagination to notion extractor (#20919) 12 months ago
kurokobo 4689e8953e
fix: shorten connection timeout to pypi.org for deprecation check for weaviate client (#21131) 12 months ago
Bowen Liang 366ddb05ae
test: run vdb test of oceanbase with docker compose in CI tests (#20945) 1 year ago
Bowen Liang 0f3d4d0b6e
chore: bump mypy to 1.16 (#20608) 1 year ago
QuantumGhost c439e82038
refactor(api): Decouple `ParameterExtractorNode` from `LLMNode` (#20843)
- Extract methods used by `ParameterExtractorNode` from `LLMNode` into a separate file.
- Convert `ParameterExtractorNode` into a subclass of `BaseNode`.
- Refactor code referencing the extracted methods to ensure functionality and clarity.
- Fixes the issue that `ParameterExtractorNode` returns error when executed.
- Fix relevant test cases.

Closes #20840.
1 year ago
yihong 65c7c01d90
fix: clean up two unreachable code (#20773)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
jefferyvvv 37c3283450
fix: opensearch vector search falls back to keyword search (#20723)
Co-authored-by: wenjun.gu <wenjun.gu@envision-energy.com>
1 year ago
jefferyvvv 4271602cfc
fix: opensearch metadata filtering returns empty (#20701)
Co-authored-by: wenjun.gu <wenjun.gu@envision-energy.com>
Co-authored-by: crazywoola <427733928@qq.com>
1 year ago
jefferyvvv 138ad6e8b3
fix: opensearch fulltext search with metadata filtering dsl error (#20702)
Co-authored-by: wenjun.gu <wenjun.gu@envision-energy.com>
1 year ago
kenwoodjw 01d500db14
fix: autocorrect everything in web (#20605)
Signed-off-by: kenwoodjw <blackxin55+@gmail.com>
1 year ago