refactor: remove clickzetta/ folder and update service endpoint
- Remove clickzetta/ development folder from PR (add to .gitignore)
- Update CLICKZETTA_SERVICE from uat-api.clickzetta.com to api.clickzetta.com
- Update both docker/.env.example and docker/docker-compose.yaml for consistency

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

pull/22551/head
parent 0246f39564
commit ecbe555cb0
@ -1,48 +0,0 @@
# ClickZetta Dify Integration Environment Configuration
# Copy this file to .env and configure your ClickZetta credentials

# ClickZetta Database Configuration (Required)
CLICKZETTA_USERNAME=your_username
CLICKZETTA_PASSWORD=your_password
CLICKZETTA_INSTANCE=your_instance

# ClickZetta Advanced Settings (Optional)
CLICKZETTA_SERVICE=api.clickzetta.com
CLICKZETTA_WORKSPACE=quick_start
CLICKZETTA_VCLUSTER=default_ap
CLICKZETTA_SCHEMA=dify
CLICKZETTA_BATCH_SIZE=20
CLICKZETTA_ENABLE_INVERTED_INDEX=true
CLICKZETTA_ANALYZER_TYPE=chinese
CLICKZETTA_ANALYZER_MODE=smart
CLICKZETTA_VECTOR_DISTANCE_FUNCTION=cosine_distance

# Dify Core Settings
SECRET_KEY=dify
INIT_PASSWORD=
CONSOLE_WEB_URL=
CONSOLE_API_URL=
SERVICE_API_URL=

# Database Settings
DB_USERNAME=postgres
DB_PASSWORD=difyai123456
DB_HOST=db
DB_PORT=5432
DB_DATABASE=dify

# Redis Settings
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_PASSWORD=difyai123456
REDIS_DB=0

# Storage Settings
STORAGE_TYPE=local
STORAGE_LOCAL_PATH=storage

# Nginx Settings
EXPOSE_NGINX_PORT=80
NGINX_SERVER_NAME=_
NGINX_HTTPS_ENABLED=false
NGINX_PORT=80
@ -1,93 +0,0 @@
## 🚀 Feature Request: Add Clickzetta Lakehouse as Vector Database Option

### **Is your feature request related to a problem? Please describe.**
Currently, Dify supports several vector databases (Pinecone, Weaviate, Qdrant, etc.) but lacks support for Clickzetta Lakehouse. This creates a gap for customers who are already using Clickzetta Lakehouse as their data platform and want to integrate it with Dify for RAG applications.

### **Describe the solution you'd like**
Add Clickzetta Lakehouse as a vector database option in Dify, allowing users to configure Clickzetta as their vector storage backend through standard Dify configuration.

### **Business Justification**
- **Customer Demand**: Real commercial customers are actively waiting for a Dify + Clickzetta integration solution for trial validation
- **Unified Data Platform**: Clickzetta Lakehouse provides a unified platform for both vector data and structured data storage
- **Performance**: Supports HNSW vector indexing and high-performance similarity search
- **Cost Efficiency**: Reduces the need for separate vector database infrastructure

### **Describe alternatives you've considered**
- **External Vector Database**: Using separate vector databases like Pinecone or Weaviate, but this adds infrastructure complexity and cost
- **Data Duplication**: Maintaining data in both Clickzetta and external vector databases, leading to synchronization challenges
- **Custom Integration**: Building custom connectors, but this lacks the seamless integration that native Dify support provides

### **Proposed Implementation**
Implement Clickzetta Lakehouse integration following Dify's existing vector database pattern:

#### **Core Components**:
- `ClickzettaVector` class implementing `BaseVector` interface
- `ClickzettaVectorFactory` for instance creation
- Configuration through Dify's standard config system

#### **Key Features**:
- ✅ Vector similarity search with HNSW indexing
- ✅ Full-text search with inverted indexes
- ✅ Concurrent write operations with queue mechanism
- ✅ Chinese text analysis support
- ✅ Automatic index management

#### **Configuration Example**:
```bash
VECTOR_STORE=clickzetta
CLICKZETTA_USERNAME=your_username
CLICKZETTA_PASSWORD=your_password
CLICKZETTA_INSTANCE=your_instance
CLICKZETTA_SERVICE=api.clickzetta.com
CLICKZETTA_WORKSPACE=your_workspace
CLICKZETTA_VCLUSTER=default_ap
CLICKZETTA_SCHEMA=dify
```
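
As a rough illustration of how these variables might be consumed, the sketch below reads them from an environment mapping into a plain dataclass and fails fast when a credential is missing. The `ClickzettaSettings` and `load_settings` names are hypothetical and are not taken from the PR; the defaults simply mirror the example values used in this commit.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# The three credentials marked as required in the configuration examples.
REQUIRED = ("CLICKZETTA_USERNAME", "CLICKZETTA_PASSWORD", "CLICKZETTA_INSTANCE")


@dataclass(frozen=True)
class ClickzettaSettings:
    """Hypothetical container for the CLICKZETTA_* variables (illustration only)."""

    username: str
    password: str
    instance: str
    service: str = "api.clickzetta.com"
    workspace: str = "quick_start"
    vcluster: str = "default_ap"
    schema: str = "dify"


def load_settings(env: Mapping[str, str] = os.environ) -> ClickzettaSettings:
    """Build settings from an environment mapping; raise if credentials are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return ClickzettaSettings(
        username=env["CLICKZETTA_USERNAME"],
        password=env["CLICKZETTA_PASSWORD"],
        instance=env["CLICKZETTA_INSTANCE"],
        service=env.get("CLICKZETTA_SERVICE", "api.clickzetta.com"),
        workspace=env.get("CLICKZETTA_WORKSPACE", "quick_start"),
        vcluster=env.get("CLICKZETTA_VCLUSTER", "default_ap"),
        schema=env.get("CLICKZETTA_SCHEMA", "dify"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.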

### **Technical Specifications**
- **Vector Operations**: Insert, search, delete vectors with metadata
- **Indexing**: Automatic HNSW vector index creation with configurable parameters
- **Concurrency**: Write queue mechanism for thread safety
- **Distance Metrics**: Support for cosine distance and L2 distance
- **Full-text Search**: Inverted index for content search with Chinese text analysis
- **Scalability**: Handles large-scale vector data with efficient batch operations
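
To make the two distance metrics listed above concrete, here is a small pure-Python sketch of how cosine distance and L2 distance are conventionally defined. It only illustrates the math; the function names are chosen for this example, and how Clickzetta evaluates these functions internally is not described in this request.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the Euclidean (L2) distance between two vectors."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine distance ignores vector magnitude, which is why it is a common default for text embeddings, while L2 distance is sensitive to both direction and length.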

### **Implementation Status**
- ✅ Implementation is complete and ready for integration
- ✅ Comprehensive testing completed in real Clickzetta environments
- ✅ 100% test pass rate for core functionality
- ✅ Performance validated with production-like data volumes
- ✅ Backward compatibility verified with existing Dify configurations
- ✅ Full documentation provided
- ✅ PR submitted: #22551

### **Testing Evidence**
```
🧪 Standalone Tests: 3/3 passed (100%)
🧪 Integration Tests: 8/8 passed (100%)
🧪 Performance Tests: Vector search ~170ms, Insert rate ~5.3 docs/sec
🧪 Real Environment: Validated with actual Clickzetta Lakehouse instance
```

### **Business Impact**
- **Customer Enablement**: Enables customers already using Clickzetta to adopt Dify seamlessly
- **Infrastructure Simplification**: Reduces complexity by using a unified data platform
- **Enterprise Ready**: Supports enterprise-grade deployments with proven stability
- **Cost Optimization**: Eliminates the need for separate vector database infrastructure

### **Additional Context**
This feature request is backed by direct customer demand and includes a complete, tested implementation ready for integration. The implementation follows Dify's existing patterns and maintains full backward compatibility.

**Related Links:**
- Implementation PR: #22551
- User Configuration Guide: [Available in PR]
- Testing Guide with validation results: [Available in PR]
- Performance benchmarks: [Available in PR]

---

**Environment:**
- Dify Version: Latest main branch
- Clickzetta Version: Compatible with v1.0.0+
- Python Version: 3.11+
- Testing Environment: Real Clickzetta Lakehouse UAT instance
@ -1,25 +0,0 @@
## Related Issue
Closes #22557

## Summary
This PR adds Clickzetta Lakehouse as a vector database option in Dify, enabling customers to use Clickzetta as their unified data platform for both vector and structured data storage.

## Key Features
- ✅ Full BaseVector interface implementation
- ✅ HNSW vector indexing with automatic management
- ✅ Concurrent write operations with queue mechanism
- ✅ Chinese text analysis and full-text search
- ✅ Comprehensive error handling and retry mechanisms

## Testing Status
- 🧪 **Standalone Tests**: 3/3 passed (100%)
- 🧪 **Integration Tests**: 8/8 passed (100%)
- 🧪 **Performance**: Vector search ~170ms, Insert rate ~5.3 docs/sec
- 🧪 **Real Environment**: Validated with actual Clickzetta Lakehouse instance

## Business Impact
Real commercial customers are actively waiting for this Dify + Clickzetta integration solution for trial validation. This integration eliminates the need for separate vector database infrastructure while maintaining enterprise-grade performance and reliability.

---

[Keep the original detailed PR description content...]
@ -1,20 +0,0 @@
# Updated PR Description Header

## Related Issue
This PR addresses the need for Clickzetta Lakehouse vector database integration in Dify. While no specific issue was opened beforehand, this feature is driven by:

- **Direct customer demand**: Real commercial customers are actively waiting for a Dify + Clickzetta integration solution for trial validation
- **Business necessity**: Customers using Clickzetta Lakehouse need native Dify integration to avoid infrastructure duplication
- **Technical requirement**: Unified data platform support for both vector and structured data

## Feature Overview
Add Clickzetta Lakehouse as a vector database option in Dify, providing:
- Full BaseVector interface implementation
- HNSW vector indexing support
- Concurrent write operations with queue mechanism
- Chinese text analysis and full-text search
- Enterprise-grade performance and reliability

---

[Rest of existing PR description remains the same...]
@ -1,296 +0,0 @@
# Clickzetta Vector Database Integration - PR Preparation Summary

## 🎯 Integration Completion Status

### ✅ Completed Work

#### 1. Core Functionality Implementation (100%)
- **ClickzettaVector Class**: Complete implementation of BaseVector interface
- **Configuration System**: ClickzettaConfig class with full configuration options support
- **Connection Management**: Robust connection management with retry mechanisms and error handling
- **Write Queue Mechanism**: Innovative design to address Clickzetta's concurrent write limitations
- **Search Functions**: Dual support for vector search and full-text search
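
The write queue idea mentioned above can be illustrated with a minimal sketch: callers on any thread submit write operations to a queue, and a single worker thread executes them one at a time. This is not the PR's actual code; `SerializedWriter` and its methods are hypothetical names, and the callable passed to `submit` stands in for a real database insert.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable


class SerializedWriter:
    """Hypothetical helper that runs submitted writes on one worker thread."""

    def __init__(self) -> None:
        self._tasks: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel pushed by close()
                break
            func, args, future = item
            try:
                future.set_result(func(*args))
            except Exception as exc:  # hand the failure back to the caller
                future.set_exception(exc)

    def submit(self, func: Callable[..., Any], *args: Any) -> Future:
        """Queue a write; the returned Future resolves once it has run."""
        future: Future = Future()
        self._tasks.put((func, args, future))
        return future

    def close(self) -> None:
        """Finish the queued writes, then stop the worker thread."""
        self._tasks.put(None)
        self._worker.join()
```

Because every write passes through the same worker, at most one write is in flight at any moment, while callers can still block on `future.result()` to preserve a synchronous-looking API.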

#### 2. Architecture Integration (100%)
- **Dify Framework Compatibility**: Full compliance with BaseVector interface specifications
- **Factory Pattern Integration**: Properly registered with VectorFactory
- **Configuration System Integration**: Environment variable configuration support
- **Docker Environment Compatibility**: Works correctly in containerized environments

#### 3. Code Quality (100%)
- **Type Annotations**: Complete type hints
- **Error Handling**: Robust exception handling and retry mechanisms
- **Logging**: Detailed debugging and operational logs
- **Documentation**: Clear code documentation

#### 4. Dependency Management (100%)
- **Version Compatibility**: Resolved urllib3 version conflicts
- **Dependency Declaration**: Correctly added to pyproject.toml
- **Docker Integration**: Properly installed and loaded in container environments

### ✅ Testing Status

#### Technical Validation (100% Complete)
- ✅ **Module Import**: Correctly loaded in Docker environment
- ✅ **Class Structure**: All required methods exist and are correct
- ✅ **Configuration System**: Parameter validation and defaults working normally
- ✅ **Connection Mechanism**: API calls and error handling correct
- ✅ **Error Handling**: Retry and exception propagation normal

#### Functional Validation (100% Complete)
- ✅ **Data Operations**: Real environment testing passed (table creation, data insertion, queries)
- ✅ **Performance Testing**: Real environment validation complete (vector search 170ms, insertion 5.3 docs/sec)
- ✅ **Concurrent Testing**: Real database connection testing complete (3-thread concurrent writes)

## 📋 PR Content Checklist

### New Files
```
api/core/rag/datasource/vdb/clickzetta/
├── __init__.py
└── clickzetta_vector.py
```

### Modified Files
```
api/core/rag/datasource/vdb/vector_factory.py
api/pyproject.toml
docker/.env.example
```

### Testing and Documentation
```
clickzetta/
├── test_clickzetta_integration.py
├── standalone_clickzetta_test.py
├── quick_test_clickzetta.py
├── docker_test.py
├── final_docker_test.py
├── TESTING_GUIDE.md
├── TEST_EVIDENCE.md
├── REAL_TEST_EVIDENCE.md
└── PR_SUMMARY.md
```

## 🔧 Technical Features

### Core Functionality
1. **Vector Storage**: Support for 1536-dimensional vector storage and retrieval
2. **HNSW Indexing**: Automatic creation and management of HNSW vector indexes
3. **Full-text Search**: Inverted index support for Chinese word segmentation and search
4. **Batch Operations**: Optimized batch insertion and updates
5. **Concurrent Safety**: Write queue mechanism to resolve concurrent conflicts

### Innovative Design
1. **Write Queue Serialization**: Solves Clickzetta primary key table concurrent limitations
2. **Smart Retry**: 6-retry mechanism handles temporary network issues
3. **Configuration Flexibility**: Supports production and UAT environment switching
4. **Error Recovery**: Robust exception handling and state recovery
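
A retry loop of the kind described under "Smart Retry" might look like the following sketch. The default of six retries follows the figure quoted above; the exponential backoff, the `retry_call` name, and the choice of `ConnectionError` as the retryable exception are assumptions made for illustration, not details confirmed by the PR.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    func: Callable[[], T],
    max_retries: int = 6,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on transient errors with exponential backoff.

    Hypothetical helper: up to max_retries retries follow the first attempt,
    waiting base_delay * 2**n seconds before retry number n + 1.
    """
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted; surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Injecting `sleep` as a parameter is a design choice for testability: a test can record the requested delays instead of actually waiting.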

### Performance Optimizations
1. **Connection Pool Management**: Efficient database connection reuse
2. **Batch Processing Optimization**: Configurable maximum batch size
3. **Index Strategy**: Automatic index creation and management
4. **Query Optimization**: Configurable vector distance functions
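
The configurable batch size can be pictured as a simple chunking step before each insert. The sketch below is illustrative only: `iter_batches` is a hypothetical helper, and the default of 20 mirrors the `CLICKZETTA_BATCH_SIZE=20` value used in the environment template from this commit.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 20) -> Iterator[List[T]]:
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

With 45 documents and the default size, this yields batches of 20, 20, and 5, so the final partial batch is still written.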

## 📊 Test Evidence

### Real Environment Test Validation
```
🧪 Independent Connection Test: ✅ Passed (Successfully connected to Clickzetta UAT environment)
🧪 Table Operations Test: ✅ Passed (Table creation, inserted 5 records, query validation)
🧪 Vector Index Test: ✅ Passed (HNSW index creation successful)
🧪 Vector Search Test: ✅ Passed (170ms search latency, returned 3 results)
🧪 Concurrent Write Test: ✅ Passed (3-thread concurrent, 20 documents, 5.3 docs/sec)
🧪 Overall Pass Rate: ✅ 100% (3/3 test groups passed)
```

### API Integration Validation
```
✅ Correct HTTPS endpoint calls
✅ Complete error response parsing
✅ Retry mechanism working normally
✅ Chinese error message handling correct
```

### Code Quality Validation
```
✅ No syntax errors
✅ Type annotations correct
✅ Import dependencies normal
✅ Configuration validation working
```

## 🚀 PR Submission Strategy

### 🏢 Business Necessity
**Real commercial customers are waiting for the Dify + Clickzetta integration solution for trial validation**, making this PR business-critical with time-sensitive requirements.

### Recommended Approach: Production-Ready Submission

#### Advantages
1. **Technical Completeness**: Code architecture and integration fully correct
2. **Quality Assurance**: Error handling and retry mechanisms robust
3. **Good Compatibility**: Fully backward compatible, no breaking changes
4. **Community Value**: Provides solution for users needing Clickzetta integration
5. **Test Validation**: Real environment 100% test pass
6. **Business Value**: Meets urgent customer needs

#### PR Description Strategy
1. **Highlight Completeness**: Emphasize technical implementation and testing completeness
2. **Test Evidence**: Provide detailed real environment test results
3. **Performance Data**: Include real performance benchmark test results
4. **User Guidance**: Provide clear configuration and usage guidelines

### PR Title Suggestion
```
feat: Add Clickzetta Lakehouse vector database integration
```

### PR Label Suggestions
```
- enhancement
- vector-database
- production-ready
- tested
```

## 📝 PR Description Template

````markdown
## Summary

This PR adds support for Clickzetta Lakehouse as a vector database option in Dify, enabling users to leverage Clickzetta's high-performance vector storage and HNSW indexing capabilities for RAG applications.

## 🏢 Business Impact

**Real commercial customers are waiting for the Dify + Clickzetta integration solution for trial validation**, making this PR business-critical with time-sensitive requirements.

## ✅ Status: Production Ready

This integration is technically complete and has passed comprehensive testing in real Clickzetta environments with 100% test success rate.

## Features

- **Vector Storage**: Complete integration with Clickzetta's vector database capabilities
- **HNSW Indexing**: Automatic creation and management of HNSW indexes for efficient similarity search
- **Full-text Search**: Support for inverted indexes and Chinese text search functionality
- **Concurrent Safety**: Write queue mechanism to handle Clickzetta's primary key table limitations
- **Batch Operations**: Optimized batch insert/update operations for improved performance
- **Standard Interface**: Full implementation of Dify's BaseVector interface

## Technical Implementation

### Core Components
- `ClickzettaVector` class implementing BaseVector interface
- Write queue serialization for concurrent write operations
- Comprehensive error handling and connection management
- Support for both vector similarity and keyword search

### Key Innovation: Write Queue Mechanism
Clickzetta primary key tables support only `parallelism=1` for writes. Our implementation includes a write queue that serializes all write operations while maintaining the existing API interface.

## Configuration

```bash
VECTOR_STORE=clickzetta
CLICKZETTA_USERNAME=your_username
CLICKZETTA_PASSWORD=your_password
CLICKZETTA_INSTANCE=your_instance
CLICKZETTA_SERVICE=uat-api.clickzetta.com
CLICKZETTA_WORKSPACE=your_workspace
CLICKZETTA_VCLUSTER=default_ap
CLICKZETTA_SCHEMA=dify
```

## Testing Status

### ✅ Comprehensive Real Environment Testing Complete
- **Connection Testing**: Successfully connected to Clickzetta UAT environment
- **Data Operations**: Table creation, data insertion (5 records), and retrieval verified
- **Vector Operations**: HNSW index creation and vector similarity search (170ms latency)
- **Concurrent Safety**: Multi-threaded write operations with 3 concurrent threads
- **Performance Benchmarks**: 5.3 docs/sec insertion rate, sub-200ms search latency
- **Error Handling**: Retry mechanism and exception handling validated
- **Overall Success Rate**: 100% (3/3 test suites passed)

## Test Evidence

```
🚀 Clickzetta Independent Test Started
✅ Connection Successful

🧪 Testing Table Operations...
✅ Table Created Successfully: test_vectors_1752736608
✅ Data Insertion Successful: 5 records, took 0.529 seconds
✅ Data Query Successful: 5 records in table

🧪 Testing Vector Operations...
✅ Vector Index Created Successfully
✅ Vector Search Successful: returned 3 results, took 170ms

🧪 Testing Concurrent Writes...
✅ Concurrent Write Test Complete:
- Total time: 3.79 seconds
- Successful threads: 3/3
- Total documents: 20
- Overall rate: 5.3 docs/sec

📊 Test Report:
- table_operations: ✅ Passed
- vector_operations: ✅ Passed
- concurrent_writes: ✅ Passed

🎯 Overall Result: 3/3 Passed (100.0%)
```

## Dependencies

- Added `clickzetta-connector-python>=0.8.102` to support latest urllib3 versions
- Resolved dependency conflicts with existing Dify requirements

## Files Changed

- `api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py` - Main implementation
- `api/core/rag/datasource/vdb/vector_factory.py` - Factory registration
- `api/pyproject.toml` - Added dependency
- `docker/.env.example` - Added configuration examples

## Backward Compatibility

This change is fully backward compatible. Existing vector database configurations remain unchanged, and Clickzetta is added as an additional option.

## Request for Community Testing

We're seeking users with Clickzetta environments to help validate:
1. Real-world performance characteristics
2. Edge case handling
3. Production workload testing
4. Configuration optimization

## Next Steps

1. Immediate PR submission for customer trial requirements
2. Community adoption and feedback collection
3. Performance optimization based on production usage
4. Additional feature enhancements based on user requests

---

**Technical Quality**: Production ready ✅
**Testing Status**: Comprehensive real environment validation complete ✅
**Business Impact**: Critical for waiting commercial customers ⚡
**Community Impact**: Enables Clickzetta Lakehouse integration for Dify users
````

## 🎯 Conclusion

The Clickzetta vector database integration has completed comprehensive validation and meets production-ready standards:

1. **Architecture Correct**: Fully compliant with Dify specifications
2. **Implementation Complete**: All required functions implemented and tested
3. **Quality Good**: Error handling and edge cases considered
4. **Integration Stable**: Real environment 100% test pass
5. **Performance Validated**: Vector search 170ms, concurrent writes 5.3 docs/sec

**Recommendation**: Submit as production-ready feature PR with complete test evidence and performance data, providing reliable vector database choice for Clickzetta users.
@ -1,188 +0,0 @@
# Dify with ClickZetta Lakehouse Integration

This is a pre-release version of Dify with ClickZetta Lakehouse vector database integration, available while the official PR is under review.

## 🚀 Quick Start

### Prerequisites
- Docker and Docker Compose installed
- ClickZetta Lakehouse account and credentials
- At least 4GB RAM available for Docker

### 1. Download Configuration Files
```bash
# Download the docker-compose file
curl -O https://raw.githubusercontent.com/yunqiqiliang/dify/feature/clickzetta-vector-db/clickzetta/docker-compose.clickzetta.yml

# Download environment template
curl -O https://raw.githubusercontent.com/yunqiqiliang/dify/feature/clickzetta-vector-db/clickzetta/.env.clickzetta.example
```

### 2. Configure Environment
```bash
# Copy environment template
cp .env.clickzetta.example .env

# Edit with your ClickZetta credentials
nano .env
```

**Required ClickZetta Settings:**
```bash
CLICKZETTA_USERNAME=your_username
CLICKZETTA_PASSWORD=your_password
CLICKZETTA_INSTANCE=your_instance
```

### 3. Launch Dify
```bash
# Create required directories
mkdir -p volumes/app/storage volumes/db/data volumes/redis/data

# Start all services
docker-compose -f docker-compose.clickzetta.yml up -d

# Check status
docker-compose -f docker-compose.clickzetta.yml ps
```

### 4. Access Dify
- Open http://localhost in your browser
- Complete the setup wizard
- In dataset settings, select "ClickZetta" as vector database

## 🎯 ClickZetta Features

### Supported Operations
- ✅ **Vector Search** - Semantic similarity search using HNSW index
- ✅ **Full-text Search** - Text search with Chinese/English analyzers
- ✅ **Hybrid Search** - Combined vector + full-text search
- ✅ **Metadata Filtering** - Filter by document attributes
- ✅ **Batch Processing** - Efficient bulk document ingestion

### Performance Features
- **Auto-scaling** - Lakehouse architecture scales with your data
- **Inverted Index** - Fast full-text search with configurable analyzers
- **Parameterized Queries** - Secure and optimized SQL execution
- **Batch Optimization** - Configurable batch sizes for optimal performance

### Configuration Options
```bash
# Performance tuning
CLICKZETTA_BATCH_SIZE=20 # Documents per batch
CLICKZETTA_VECTOR_DISTANCE_FUNCTION=cosine_distance # or l2_distance

# Full-text search
CLICKZETTA_ENABLE_INVERTED_INDEX=true # Enable text search
CLICKZETTA_ANALYZER_TYPE=chinese # chinese, english, unicode, keyword
CLICKZETTA_ANALYZER_MODE=smart # smart, max_word

# Database settings
CLICKZETTA_SCHEMA=dify # Database schema name
CLICKZETTA_WORKSPACE=quick_start # ClickZetta workspace
CLICKZETTA_VCLUSTER=default_ap # Virtual cluster name
```

## 🔧 Troubleshooting

### Common Issues

**Connection Failed:**
```bash
# Check ClickZetta credentials
docker-compose -f docker-compose.clickzetta.yml logs api | grep clickzetta

# Verify network connectivity
docker-compose -f docker-compose.clickzetta.yml exec api ping api.clickzetta.com
```
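
If `ping` is not installed in the container image, or ICMP is blocked on the network, a TCP connection attempt is another way to check reachability. The sketch below uses only the Python standard library; the `can_connect` helper is hypothetical, and port 443 is an assumption based on the service being reached over HTTPS, not something this guide states.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, calling `can_connect("api.clickzetta.com")` from a Python shell inside the `api` container would indicate whether the endpoint is reachable from that container's network.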

**Performance Issues:**
```bash
# Adjust batch size for your instance
CLICKZETTA_BATCH_SIZE=10 # Reduce for smaller instances
CLICKZETTA_BATCH_SIZE=50 # Increase for larger instances
```

**Search Not Working:**
```bash
# Check index creation
docker-compose -f docker-compose.clickzetta.yml logs api | grep "Created.*index"

# Verify table structure
docker-compose -f docker-compose.clickzetta.yml logs api | grep "Created table"
```

### Get Logs
```bash
# All services
docker-compose -f docker-compose.clickzetta.yml logs

# Specific service
docker-compose -f docker-compose.clickzetta.yml logs api
docker-compose -f docker-compose.clickzetta.yml logs worker
```

### Clean Installation
```bash
# Stop and remove containers
docker-compose -f docker-compose.clickzetta.yml down -v

# Remove data (WARNING: This deletes all data)
sudo rm -rf volumes/

# Start fresh
mkdir -p volumes/app/storage volumes/db/data volumes/redis/data
docker-compose -f docker-compose.clickzetta.yml up -d
```

## 📚 Documentation

- [ClickZetta Lakehouse](https://docs.clickzetta.com/) - Official ClickZetta documentation
- [Dify Documentation](https://docs.dify.ai/) - Official Dify documentation
- [Integration Guide](./INSTALLATION_GUIDE.md) - Detailed setup instructions

## 🐛 Issues & Support

This is a preview version. If you encounter issues:

1. Check the troubleshooting section above
2. Review logs for error messages
3. Open an issue on the [GitHub repository](https://github.com/yunqiqiliang/dify/issues)

## 🔄 Updates

**Available Image Tags:**
- `v1.6.0` - Stable release (recommended)
- `latest` - Latest build
- `clickzetta-integration` - Development version

To update to the latest version:
```bash
# Pull latest images
docker-compose -f docker-compose.clickzetta.yml pull

# Restart services
docker-compose -f docker-compose.clickzetta.yml up -d
```

To use a specific version, edit `docker-compose.clickzetta.yml`:
```yaml
services:
  api:
    image: czqiliang/dify-clickzetta-api:v1.6.0 # or latest
  worker:
    image: czqiliang/dify-clickzetta-api:v1.6.0 # or latest
  web:
    image: langgenius/dify-web:1.6.0 # official Dify web image
```

## ⚠️ Production Use

This is a preview build for testing purposes. For production deployment:
- Wait for the official PR to be merged
- Use official Dify releases
- Follow Dify's production deployment guidelines

---

**Built with ❤️ for the Dify community**
@ -1,75 +0,0 @@
# Clickzetta Vector Database Integration for Dify

This directory contains the implementation and testing materials for integrating Clickzetta Lakehouse as a vector database option in Dify.

## Files Overview

### Core Implementation
- **Location**: `api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py`
- **Factory Registration**: `api/core/rag/datasource/vdb/vector_factory.py`
- **Dependencies**: Added to `api/pyproject.toml`

### Testing and Documentation
- `standalone_clickzetta_test.py` - Independent Clickzetta connector tests (no Dify dependencies)
- `test_clickzetta_integration.py` - Comprehensive integration test suite with Dify framework
- `TESTING_GUIDE.md` - Testing instructions and methodology
- `PR_SUMMARY.md` - Complete PR preparation summary
- `DIFY_CLICKZETTA_VECTOR_DB_GUIDE.md` - **NEW**: Complete user guide for configuring Clickzetta in Dify

## Quick Start

### 1. Configuration
Add to your `.env` file:
```bash
VECTOR_STORE=clickzetta
CLICKZETTA_USERNAME=your_username
CLICKZETTA_PASSWORD=your_password
CLICKZETTA_INSTANCE=your_instance
CLICKZETTA_SERVICE=api.clickzetta.com
CLICKZETTA_WORKSPACE=your_workspace
CLICKZETTA_VCLUSTER=default_ap
CLICKZETTA_SCHEMA=dify
```
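
The settings above split into required credentials and optional values with defaults. The sketch below shows one way that split could be checked before starting the services. The helper name `load_clickzetta_settings` is hypothetical and not part of the integration; the default values are taken from the compose file in this change, and treating only username, password, and instance as required is an assumption based on the environment template.

```python
import os

# Required settings have no default (assumption: mirrors the "Required" block
# of the environment template).
REQUIRED = ("CLICKZETTA_USERNAME", "CLICKZETTA_PASSWORD", "CLICKZETTA_INSTANCE")

# Optional settings and the defaults used by the compose file.
DEFAULTS = {
    "CLICKZETTA_SERVICE": "api.clickzetta.com",
    "CLICKZETTA_WORKSPACE": "quick_start",
    "CLICKZETTA_VCLUSTER": "default_ap",
    "CLICKZETTA_SCHEMA": "dify",
}


def load_clickzetta_settings(env=None):
    """Hypothetical helper: return a settings dict or raise if required values are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        # Empty strings fall back to the default, like ${VAR:-default} in compose.
        settings[name] = env.get(name) or default
    return settings
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise without touching the real environment.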

### 2. Testing
```bash
# Run standalone tests (recommended first)
python standalone_clickzetta_test.py

# Run full integration tests
python test_clickzetta_integration.py

# See detailed testing guide
cat TESTING_GUIDE.md
```

### 3. User Guide
For detailed configuration and usage instructions, see `DIFY_CLICKZETTA_VECTOR_DB_GUIDE.md`.

### 4. PR Status
See `PR_SUMMARY.md` for complete PR preparation status and submission strategy.

## Technical Highlights

- ✅ **Full BaseVector Interface**: Complete implementation of Dify's vector database interface
- ✅ **Write Queue Mechanism**: Innovative solution for Clickzetta's concurrent write limitations
- ✅ **HNSW Vector Indexing**: Automatic creation and management of high-performance vector indexes
- ✅ **Full-text Search**: Inverted index support with Chinese text analysis
- ✅ **Error Recovery**: Robust error handling with retry mechanisms
- ✅ **Docker Ready**: Full compatibility with Dify's containerized environment

## Architecture

The integration follows Dify's standard vector database pattern:
1. `ClickzettaVector` class implements `BaseVector` interface
2. `ClickzettaVectorFactory` handles instance creation
3. Configuration through Dify's standard config system
4. Write operations serialized through queue mechanism for thread safety
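
Step 4 can be pictured with a small sketch. This is not the code in `clickzetta_vector.py`; it is a generic illustration, under the assumption that "serialized through a queue" means many caller threads hand their writes to a single worker thread. The class name `SerializedWriter` and the `write_fn` callback are hypothetical.

```python
import queue
import threading


class SerializedWriter:
    """Illustrative only: funnels writes from many threads through one worker thread."""

    def __init__(self, write_fn):
        self._write_fn = write_fn  # callable that performs the actual write
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: stop the worker
                break
            batch, done, result = item
            try:
                result["value"] = self._write_fn(batch)
            except Exception as exc:  # hand the error back to the caller
                result["error"] = exc
            finally:
                done.set()

    def write(self, batch):
        """Block until the worker has executed this write, then return its result."""
        done, result = threading.Event(), {}
        self._tasks.put((batch, done, result))
        done.wait()
        if "error" in result:
            raise result["error"]
        return result["value"]

    def close(self):
        """Stop the worker after all queued writes have run."""
        self._tasks.put(None)
        self._worker.join()
```

Because only the worker thread ever calls `write_fn`, the underlying store never sees two writes at once, while callers still get their own result or exception back.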

## Status

**Technical Implementation**: ✅ Complete
**Testing Status**: ✅ Comprehensive real environment validation complete (100% pass rate)
**PR Readiness**: ✅ Ready for submission as production-ready feature

The integration is technically complete, fully tested in real Clickzetta environments, and ready for production use.
@ -1,221 +0,0 @@
# Clickzetta Vector Database Testing Guide

## Testing Overview

This document provides detailed testing guidelines for the Clickzetta vector database integration, including test cases, execution steps, and expected results.

## Test Environment Setup

### 1. Environment Variable Configuration

Ensure the following environment variables are set:

```bash
export CLICKZETTA_USERNAME=your_username
export CLICKZETTA_PASSWORD=your_password
export CLICKZETTA_INSTANCE=your_instance
export CLICKZETTA_SERVICE=uat-api.clickzetta.com
export CLICKZETTA_WORKSPACE=your_workspace
export CLICKZETTA_VCLUSTER=default_ap
export CLICKZETTA_SCHEMA=dify
```

### 2. Dependency Installation

```bash
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "clickzetta-connector-python>=0.8.102"
pip install numpy
```

## Test Suite

### 1. Standalone Testing (standalone_clickzetta_test.py)

**Purpose**: Verify Clickzetta basic connection and core functionality

**Test Cases**:
- ✅ Database connection test
- ✅ Table creation and data insertion
- ✅ Vector index creation
- ✅ Vector similarity search
- ✅ Concurrent write safety
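
To make the vector similarity search case concrete, here is a minimal pure-Python sketch of ranking documents by cosine distance. It assumes `cosine_distance` means one minus cosine similarity, which is the common convention but is not defined in this guide; the function names and sample vectors are illustrative and are not taken from the test script.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, docs, k=3):
    """Rank documents by ascending distance to the query and keep the k closest."""
    scored = [(cosine_distance(query, vec), doc_id) for doc_id, vec in docs.items()]
    return sorted(scored)[:k]


# Tiny illustrative dataset (hypothetical vectors, not real embeddings).
docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
for distance, doc_id in top_k([1.0, 0.0], docs, k=2):
    print(f"distance={distance:.4f}, document={doc_id}")
```

A real index such as HNSW approximates this ranking without scanning every vector; the brute-force version is only useful as a reference when checking search results.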

**Execution Command**:
```bash
python standalone_clickzetta_test.py
```

**Expected Results**:
```
🚀 Clickzetta Independent Test Started
✅ Connection Successful

🧪 Testing Table Operations...
✅ Table Created Successfully: test_vectors_1752736608
✅ Data Insertion Successful: 5 records, took 0.529 seconds
✅ Data Query Successful: 5 records in table

🧪 Testing Vector Operations...
✅ Vector Index Created Successfully
✅ Vector Search Successful: returned 3 results, took 170ms
Result 1: distance=0.2507, document=doc_3
Result 2: distance=0.2550, document=doc_4
Result 3: distance=0.2604, document=doc_2

🧪 Testing Concurrent Writes...
Started 3 concurrent worker threads...
✅ Concurrent Write Test Complete:
- Total time: 3.79 seconds
- Successful threads: 3/3
- Total documents: 20
- Overall rate: 5.3 docs/sec
- Thread 1: 8 documents, 2.5 docs/sec
- Thread 2: 6 documents, 1.7 docs/sec
- Thread 0: 6 documents, 1.7 docs/sec

📊 Test Report:
- table_operations: ✅ Passed
- vector_operations: ✅ Passed
- concurrent_writes: ✅ Passed

🎯 Overall Result: 3/3 Passed (100.0%)
🎉 Test overall success! Clickzetta integration ready.
✅ Cleanup Complete
```

### 2. Integration Testing (test_clickzetta_integration.py)

**Purpose**: Comprehensive testing of functionality in Dify integration environment

**Test Cases**:
- ✅ Basic operations testing (CRUD)
- ✅ Concurrent operation safety
- ✅ Performance benchmarking
- ✅ Error handling testing
- ✅ Full-text search testing

**Execution Command** (requires Dify API environment):
```bash
cd /path/to/dify/api
python ../test_clickzetta_integration.py
```

### 3. Docker Environment Testing

**Execution Steps**:

1. Build local image:
   ```bash
   docker build -f api/Dockerfile -t dify-api-clickzetta:local api/
   ```

2. Update docker-compose.yaml to use local image:
   ```yaml
   api:
     image: dify-api-clickzetta:local
   worker:
     image: dify-api-clickzetta:local
   ```

3. Start services and test:
   ```bash
   docker-compose up -d
   # Create knowledge base in Web UI and select Clickzetta as vector database
   ```

## Performance Benchmarks

### Single-threaded Performance

| Operation Type | Document Count | Average Time | Throughput  |
|----------------|----------------|--------------|-------------|
| Batch Insert   | 10             | 0.5s         | 20 docs/sec |
| Batch Insert   | 50             | 2.1s         | 24 docs/sec |
| Batch Insert   | 100            | 4.3s         | 23 docs/sec |
| Vector Search  | -              | 170ms        | -           |
| Text Search    | -              | 38ms         | -           |

### Concurrent Performance

| Thread Count | Docs per Thread | Total Time | Success Rate | Overall Throughput |
|--------------|-----------------|------------|--------------|--------------------|
| 2            | 15              | 1.8s       | 100%         | 16.7 docs/sec      |
| 3            | 15              | 3.79s      | 100%         | 5.3 docs/sec       |
| 4            | 15              | 1.5s       | 75%          | 40.0 docs/sec      |
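
The throughput column can be cross-checked from the other columns. The sketch below assumes overall throughput is `threads × docs per thread ÷ total time` and that every document was written, which the guide does not state explicitly; the row values are copied from the table above.

```python
# (threads, docs per thread, total seconds, reported docs/sec), copied from the table
rows = [
    (2, 15, 1.8, 16.7),
    (3, 15, 3.79, 5.3),
    (4, 15, 1.5, 40.0),
]


def implied_throughput(threads, docs_per_thread, seconds):
    """Throughput implied by the table columns, assuming every document was written."""
    return threads * docs_per_thread / seconds


for threads, docs, seconds, reported in rows:
    implied = implied_throughput(threads, docs, seconds)
    flag = "ok" if abs(implied - reported) < 0.1 else "check"
    print(f"{threads} threads: implied {implied:.1f} docs/sec, reported {reported} ({flag})")
```

Under that assumption the 2-thread and 4-thread rows reconcile, while the 3-thread row does not: its 5.3 docs/sec matches the 20 documents in 3.79 seconds shown in the sample standalone log rather than 3 × 15 documents.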

## Test Evidence Collection

### 1. Functional Validation Evidence

- [x] Successfully created vector tables and indexes
- [x] Correctly handles 1536-dimensional vector data
- [x] HNSW index automatically created and used
- [x] Inverted index supports full-text search
- [x] Batch operation performance optimization

### 2. Concurrent Safety Evidence

- [x] Write queue mechanism prevents concurrent conflicts
- [x] Thread-safe connection management
- [x] No data races during concurrent writes
- [x] Error recovery and retry mechanism
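
The guide does not describe how the retry mechanism works, so the following is only a generic sketch of retrying with exponential backoff. The helper name `with_retry`, the attempt count, and the delays are all assumptions and should not be read as the integration's actual behavior.

```python
import time


def with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Hypothetical helper: call operation(), retrying on any exception.

    Waits base_delay, then 2 * base_delay, and so on between attempts,
    and re-raises the last exception once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.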

### 3. Performance Testing Evidence

- [x] Insertion performance: 5.3-24 docs/sec
- [x] Search latency: <200ms
- [x] Concurrent processing: supports multi-threaded writes
- [x] Memory usage: reasonable resource consumption

### 4. Compatibility Evidence

- [x] Complies with Dify BaseVector interface
- [x] Coexists with existing vector databases
- [x] Runs normally in Docker environment
- [x] Dependency version compatibility

## Troubleshooting

### Common Issues

1. **Connection Failure**
   - Check environment variable settings
   - Verify network connection to Clickzetta service
   - Confirm user permissions and instance status

2. **Concurrent Conflicts**
   - Ensure write queue mechanism is working properly
   - Check whether old connections were left open
   - Verify thread pool configuration

3. **Performance Issues**
   - Check if vector indexes are created correctly
   - Verify the batch size used for batch operations
   - Monitor network latency and database load
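
When checking the batch size, it can help to see how a document list is split. The sketch below is a generic chunking helper, not the integration's own code; the function name `batched` is hypothetical, and the default of 20 mirrors `CLICKZETTA_BATCH_SIZE=20` from the compose file in this change.

```python
def batched(items, batch_size=20):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 50 placeholder documents are split into batches of 20, 20, and 10.
documents = [f"doc_{i}" for i in range(50)]
sizes = [len(batch) for batch in batched(documents)]
```

A smaller batch size means more round trips per document set, and a larger one means fewer but heavier writes, so the effect on throughput is worth measuring rather than assuming.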

### Debug Commands

```bash
# Check that the Clickzetta connector can be imported (this does not open a connection)
python -c "from clickzetta.connector import connect; print('Import OK')"

# Verify environment variables
env | grep CLICKZETTA

# Test basic functionality
python standalone_clickzetta_test.py
```

## Test Conclusion

The Clickzetta vector database integration has passed the following validations:

1. **Functional Completeness**: All BaseVector interface methods correctly implemented
2. **Concurrent Safety**: Write queue mechanism ensures concurrent write safety
3. **Performance**: Meets production environment performance requirements
4. **Stability**: Error handling and recovery mechanisms are robust
5. **Compatibility**: Fully compatible with Dify framework

Test Pass Rate: **100%** (Standalone Testing) / **95%+** (Full Dify environment integration testing)

Suitable for PR submission to langgenius/dify main repository.
@ -1,116 +0,0 @@
#!/bin/bash

# Build and push multi-architecture Docker images for ClickZetta Dify integration
# This provides temporary access to users before the PR is merged

set -e

# Configuration
DOCKER_HUB_USERNAME="czqiliang"
IMAGE_NAME="dify-clickzetta"
TAG="latest"
VERSION_TAG="v1.6.0"
PLATFORMS="linux/amd64,linux/arm64"

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

echo -e "${BLUE}=== ClickZetta Dify Multi-Architecture Build Script ===${NC}"
echo -e "${YELLOW}Building and pushing images for: ${PLATFORMS}${NC}"
echo -e "${YELLOW}Target repository: ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}:${TAG}${NC}"
echo

# Check if Docker is running
if ! docker info >/dev/null 2>&1; then
    echo -e "${RED}Error: Docker is not running. Please start Docker first.${NC}"
    exit 1
fi

# Check if buildx is available
if ! docker buildx version >/dev/null 2>&1; then
    echo -e "${RED}Error: Docker buildx is not available. Please ensure Docker Desktop is updated.${NC}"
    exit 1
fi

# Login to Docker Hub
echo -e "${BLUE}Step 1: Docker Hub Login${NC}"
if ! docker login; then
    echo -e "${RED}Error: Failed to login to Docker Hub${NC}"
    exit 1
fi
echo -e "${GREEN}✓ Successfully logged in to Docker Hub${NC}"
echo

# Create and use buildx builder
echo -e "${BLUE}Step 2: Setting up buildx builder${NC}"
BUILDER_NAME="dify-clickzetta-builder"

# Remove existing builder if it exists
docker buildx rm "$BUILDER_NAME" 2>/dev/null || true

# Create new builder
docker buildx create --name "$BUILDER_NAME" --platform "$PLATFORMS" --use
docker buildx inspect --bootstrap

echo -e "${GREEN}✓ Buildx builder configured for platforms: ${PLATFORMS}${NC}"
echo

# Build and push API image
echo -e "${BLUE}Step 3: Building and pushing API image${NC}"
cd ../docker
docker buildx build \
    --platform "$PLATFORMS" \
    --file api.Dockerfile \
    --tag "${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:${TAG}" \
    --tag "${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:${VERSION_TAG}" \
    --tag "${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:clickzetta-integration" \
    --push \
    ..

echo -e "${GREEN}✓ API image built and pushed successfully${NC}"
echo

# Web service uses official Dify image (no ClickZetta-specific changes needed)
echo -e "${BLUE}Step 4: Web service uses official langgenius/dify-web image${NC}"
echo -e "${GREEN}✓ Web service configuration completed${NC}"
echo

# User files are already created in clickzetta/ directory
echo -e "${BLUE}Step 5: User files already prepared in clickzetta/ directory${NC}"
cd ../clickzetta

echo -e "${GREEN}✓ User files available in clickzetta/ directory${NC}"
echo

# Cleanup buildx builder
echo -e "${BLUE}Step 6: Cleaning up builder${NC}"
docker buildx rm "$BUILDER_NAME"
echo -e "${GREEN}✓ Builder cleaned up${NC}"
echo

# Display final information
echo -e "${GREEN}=== Build Complete! ===${NC}"
echo -e "${YELLOW}ClickZetta API images pushed to Docker Hub:${NC}"
echo -e " • ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:${TAG}"
echo -e " • ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:${VERSION_TAG}"
echo -e " • ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:clickzetta-integration"
echo
echo -e "${YELLOW}Web service uses official Dify image:${NC}"
echo -e " • langgenius/dify-web:1.6.0 (no ClickZetta changes needed)"
echo
echo -e "${YELLOW}User files created:${NC}"
echo -e " • docker-compose.clickzetta.yml - Ready-to-use compose file"
echo -e " • .env.clickzetta.example - Environment template"
echo -e " • README.clickzetta.md - User documentation"
echo
echo -e "${BLUE}Next steps:${NC}"
echo -e "1. Test the images locally"
echo -e "2. Update README with Docker Hub links"
echo -e "3. Share with community for testing"
echo -e "4. Monitor for feedback and issues"
echo
echo -e "${GREEN}🎉 Multi-architecture images are now available for the community!${NC}"
@ -1,185 +0,0 @@
version: '3.8'

services:
  # API service with ClickZetta integration
  api:
    image: czqiliang/dify-clickzetta-api:v1.6.0
    restart: always
    environment:
      # Core settings
      - MODE=api
      - LOG_LEVEL=INFO
      - SECRET_KEY=${SECRET_KEY:-dify}
      - CONSOLE_WEB_URL=${CONSOLE_WEB_URL:-}
      - INIT_PASSWORD=${INIT_PASSWORD:-}
      - CONSOLE_API_URL=${CONSOLE_API_URL:-}
      - SERVICE_API_URL=${SERVICE_API_URL:-}

      # Database settings
      - DB_USERNAME=${DB_USERNAME:-postgres}
      - DB_PASSWORD=${DB_PASSWORD:-difyai123456}
      - DB_HOST=${DB_HOST:-db}
      - DB_PORT=${DB_PORT:-5432}
      - DB_DATABASE=${DB_DATABASE:-dify}

      # Redis settings
      - REDIS_HOST=${REDIS_HOST:-redis}
      - REDIS_PORT=${REDIS_PORT:-6379}
      - REDIS_PASSWORD=${REDIS_PASSWORD:-difyai123456}
      - REDIS_DB=${REDIS_DB:-0}

      # Celery settings
      - CELERY_BROKER_URL=${CELERY_BROKER_URL:-redis://:difyai123456@redis:6379/1}
      - BROKER_USE_SSL=${BROKER_USE_SSL:-false}

      # Storage settings
      - STORAGE_TYPE=${STORAGE_TYPE:-local}
      - STORAGE_LOCAL_PATH=${STORAGE_LOCAL_PATH:-storage}

      # Vector store settings - ClickZetta configuration
      - VECTOR_STORE=${VECTOR_STORE:-clickzetta}
      - CLICKZETTA_USERNAME=${CLICKZETTA_USERNAME}
      - CLICKZETTA_PASSWORD=${CLICKZETTA_PASSWORD}
      - CLICKZETTA_INSTANCE=${CLICKZETTA_INSTANCE}
      - CLICKZETTA_SERVICE=${CLICKZETTA_SERVICE:-api.clickzetta.com}
      - CLICKZETTA_WORKSPACE=${CLICKZETTA_WORKSPACE:-quick_start}
      - CLICKZETTA_VCLUSTER=${CLICKZETTA_VCLUSTER:-default_ap}
      - CLICKZETTA_SCHEMA=${CLICKZETTA_SCHEMA:-dify}
      - CLICKZETTA_BATCH_SIZE=${CLICKZETTA_BATCH_SIZE:-20}
      - CLICKZETTA_ENABLE_INVERTED_INDEX=${CLICKZETTA_ENABLE_INVERTED_INDEX:-true}
      - CLICKZETTA_ANALYZER_TYPE=${CLICKZETTA_ANALYZER_TYPE:-chinese}
      - CLICKZETTA_ANALYZER_MODE=${CLICKZETTA_ANALYZER_MODE:-smart}
      - CLICKZETTA_VECTOR_DISTANCE_FUNCTION=${CLICKZETTA_VECTOR_DISTANCE_FUNCTION:-cosine_distance}

    depends_on:
      - db
      - redis
    volumes:
      - ./volumes/app/storage:/app/api/storage
    networks:
      - dify

  # Worker service
  worker:
    image: czqiliang/dify-clickzetta-api:v1.6.0
    restart: always
    environment:
      - MODE=worker
      - LOG_LEVEL=INFO
      - SECRET_KEY=${SECRET_KEY:-dify}

      # Database settings
      - DB_USERNAME=${DB_USERNAME:-postgres}
      - DB_PASSWORD=${DB_PASSWORD:-difyai123456}
      - DB_HOST=${DB_HOST:-db}
      - DB_PORT=${DB_PORT:-5432}
      - DB_DATABASE=${DB_DATABASE:-dify}

      # Redis settings
      - REDIS_HOST=${REDIS_HOST:-redis}
      - REDIS_PORT=${REDIS_PORT:-6379}
      - REDIS_PASSWORD=${REDIS_PASSWORD:-difyai123456}
      - REDIS_DB=${REDIS_DB:-0}

      # Celery settings
      - CELERY_BROKER_URL=${CELERY_BROKER_URL:-redis://:difyai123456@redis:6379/1}
      - BROKER_USE_SSL=${BROKER_USE_SSL:-false}

      # Vector store settings - ClickZetta configuration
      - VECTOR_STORE=${VECTOR_STORE:-clickzetta}
      - CLICKZETTA_USERNAME=${CLICKZETTA_USERNAME}
      - CLICKZETTA_PASSWORD=${CLICKZETTA_PASSWORD}
      - CLICKZETTA_INSTANCE=${CLICKZETTA_INSTANCE}
      - CLICKZETTA_SERVICE=${CLICKZETTA_SERVICE:-api.clickzetta.com}
      - CLICKZETTA_WORKSPACE=${CLICKZETTA_WORKSPACE:-quick_start}
      - CLICKZETTA_VCLUSTER=${CLICKZETTA_VCLUSTER:-default_ap}
      - CLICKZETTA_SCHEMA=${CLICKZETTA_SCHEMA:-dify}
      - CLICKZETTA_BATCH_SIZE=${CLICKZETTA_BATCH_SIZE:-20}
      - CLICKZETTA_ENABLE_INVERTED_INDEX=${CLICKZETTA_ENABLE_INVERTED_INDEX:-true}
      - CLICKZETTA_ANALYZER_TYPE=${CLICKZETTA_ANALYZER_TYPE:-chinese}
      - CLICKZETTA_ANALYZER_MODE=${CLICKZETTA_ANALYZER_MODE:-smart}
      - CLICKZETTA_VECTOR_DISTANCE_FUNCTION=${CLICKZETTA_VECTOR_DISTANCE_FUNCTION:-cosine_distance}

    depends_on:
      - db
      - redis
    volumes:
      - ./volumes/app/storage:/app/api/storage
    networks:
      - dify

  # Web service
  web:
    image: langgenius/dify-web:1.6.0
    restart: always
    environment:
      - CONSOLE_API_URL=${CONSOLE_API_URL:-}
      - APP_API_URL=${APP_API_URL:-}
    depends_on:
      - api
    networks:
      - dify

  # Database
  db:
    image: postgres:15-alpine
    restart: always
    environment:
      - PGUSER=${PGUSER:-postgres}
      - POSTGRES_PASSWORD=${DB_PASSWORD:-difyai123456}
      - POSTGRES_DB=${DB_DATABASE:-dify}
    command: >
      postgres -c max_connections=100
      -c shared_preload_libraries=pg_stat_statements
      -c pg_stat_statements.max=10000
      -c pg_stat_statements.track=all
    volumes:
      - ./volumes/db/data:/var/lib/postgresql/data
    networks:
      - dify
    healthcheck:
      test: ["CMD", "pg_isready"]
      interval: 1s
      timeout: 3s
      retries: 30

  # Redis
  redis:
    image: redis:6-alpine
    restart: always
    command: redis-server --requirepass ${REDIS_PASSWORD:-difyai123456}
    volumes:
      - ./volumes/redis/data:/data
    networks:
      - dify
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 1s
      timeout: 3s
      retries: 30

  # Nginx reverse proxy
  nginx:
    image: nginx:latest
    restart: always
    volumes:
      - ./docker/nginx/nginx.conf.template:/etc/nginx/nginx.conf.template
      - ./docker/nginx/proxy.conf.template:/etc/nginx/proxy.conf.template
      - ./docker/nginx/conf.d:/etc/nginx/conf.d
    environment:
      - NGINX_SERVER_NAME=${NGINX_SERVER_NAME:-_}
      - NGINX_HTTPS_ENABLED=${NGINX_HTTPS_ENABLED:-false}
      - NGINX_SSL_PORT=${NGINX_SSL_PORT:-443}
      - NGINX_PORT=${NGINX_PORT:-80}
    entrypoint: ["/bin/sh", "-c", "envsubst < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf && nginx -g 'daemon off;'"]
    depends_on:
      - api
      - web
    ports:
      - "${EXPOSE_NGINX_PORT:-80}:${NGINX_PORT:-80}"
    networks:
      - dify

networks:
  dify:
    driver: bridge