Add GitHub Issue template and creation guide for PR #22551
- Add comprehensive Issue template following GitHub best practices - Include business justification, technical specs, and testing evidence - Add step-by-step guide for creating and linking the issue - Address maintainer feedback requesting issue documentation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>pull/22551/head
parent
b1c6e638be
commit
1fddd9c1cc
@ -0,0 +1,93 @@
|
|||||||
|
## 🚀 Feature Request: Add Clickzetta Lakehouse as Vector Database Option
|
||||||
|
|
||||||
|
### **Is your feature request related to a problem? Please describe.**
|
||||||
|
Currently, Dify supports several vector databases (Pinecone, Weaviate, Qdrant, etc.) but lacks support for Clickzetta Lakehouse. This creates a gap for customers who are already using Clickzetta Lakehouse as their data platform and want to integrate it with Dify for RAG applications.
|
||||||
|
|
||||||
|
### **Describe the solution you'd like**
|
||||||
|
Add Clickzetta Lakehouse as a vector database option in Dify, allowing users to configure Clickzetta as their vector storage backend through standard Dify configuration.
|
||||||
|
|
||||||
|
### **Business Justification**
|
||||||
|
- **Customer Demand**: Real commercial customers are actively waiting for Dify + Clickzetta integration solution for trial validation
|
||||||
|
- **Unified Data Platform**: Clickzetta Lakehouse provides a unified platform for both vector data and structured data storage
|
||||||
|
- **Performance**: Supports HNSW vector indexing and high-performance similarity search
|
||||||
|
- **Cost Efficiency**: Reduces the need for separate vector database infrastructure
|
||||||
|
|
||||||
|
### **Describe alternatives you've considered**
|
||||||
|
- **External Vector Database**: Using separate vector databases like Pinecone or Weaviate, but this adds infrastructure complexity and cost
|
||||||
|
- **Data Duplication**: Maintaining data in both Clickzetta and external vector databases, leading to synchronization challenges
|
||||||
|
- **Custom Integration**: Building custom connectors, but this lacks the seamless integration that native Dify support provides
|
||||||
|
|
||||||
|
### **Proposed Implementation**
|
||||||
|
Implement Clickzetta Lakehouse integration following Dify's existing vector database pattern:
|
||||||
|
|
||||||
|
#### **Core Components**:
|
||||||
|
- `ClickzettaVector` class implementing `BaseVector` interface
|
||||||
|
- `ClickzettaVectorFactory` for instance creation
|
||||||
|
- Configuration through Dify's standard config system
|
||||||
|
|
||||||
|
#### **Key Features**:
|
||||||
|
- ✅ Vector similarity search with HNSW indexing
|
||||||
|
- ✅ Full-text search with inverted indexes
|
||||||
|
- ✅ Concurrent write operations with queue mechanism
|
||||||
|
- ✅ Chinese text analysis support
|
||||||
|
- ✅ Automatic index management
|
||||||
|
|
||||||
|
#### **Configuration Example**:
|
||||||
|
```bash
|
||||||
|
VECTOR_STORE=clickzetta
|
||||||
|
CLICKZETTA_USERNAME=your_username
|
||||||
|
CLICKZETTA_PASSWORD=your_password
|
||||||
|
CLICKZETTA_INSTANCE=your_instance
|
||||||
|
CLICKZETTA_SERVICE=api.clickzetta.com
|
||||||
|
CLICKZETTA_WORKSPACE=your_workspace
|
||||||
|
CLICKZETTA_VCLUSTER=default_ap
|
||||||
|
CLICKZETTA_SCHEMA=dify
|
||||||
|
```
|
||||||
|
|
||||||
|
### **Technical Specifications**
|
||||||
|
- **Vector Operations**: Insert, search, delete vectors with metadata
|
||||||
|
- **Indexing**: Automatic HNSW vector index creation with configurable parameters
|
||||||
|
- **Concurrency**: Write queue mechanism for thread safety
|
||||||
|
- **Distance Metrics**: Support for cosine distance and L2 distance
|
||||||
|
- **Full-text Search**: Inverted index for content search with Chinese text analysis
|
||||||
|
- **Scalability**: Handles large-scale vector data with efficient batch operations
|
||||||
|
|
||||||
|
### **Implementation Status**
|
||||||
|
- ✅ Implementation is complete and ready for integration
|
||||||
|
- ✅ Comprehensive testing completed in real Clickzetta environments
|
||||||
|
- ✅ 100% test pass rate for core functionality
|
||||||
|
- ✅ Performance validated with production-like data volumes
|
||||||
|
- ✅ Backward compatibility verified with existing Dify configurations
|
||||||
|
- ✅ Full documentation provided
|
||||||
|
- ✅ PR submitted: #22551
|
||||||
|
|
||||||
|
### **Testing Evidence**
|
||||||
|
```
|
||||||
|
🧪 Standalone Tests: 3/3 passed (100%)
|
||||||
|
🧪 Integration Tests: 8/8 passed (100%)
|
||||||
|
🧪 Performance Tests: Vector search ~170ms, Insert rate ~5.3 docs/sec
|
||||||
|
🧪 Real Environment: Validated with actual Clickzetta Lakehouse instance
|
||||||
|
```
|
||||||
|
|
||||||
|
### **Business Impact**
|
||||||
|
- **Customer Enablement**: Enables customers already using Clickzetta to adopt Dify seamlessly
|
||||||
|
- **Infrastructure Simplification**: Reduces complexity by using unified data platform
|
||||||
|
- **Enterprise Ready**: Supports enterprise-grade deployments with proven stability
|
||||||
|
- **Cost Optimization**: Eliminates need for separate vector database infrastructure
|
||||||
|
|
||||||
|
### **Additional Context**
|
||||||
|
This feature request is backed by direct customer demand and includes a complete, tested implementation ready for integration. The implementation follows Dify's existing patterns and maintains full backward compatibility.
|
||||||
|
|
||||||
|
**Related Links:**
|
||||||
|
- Implementation PR: #22551
|
||||||
|
- User Configuration Guide: [Available in PR]
|
||||||
|
- Testing Guide with validation results: [Available in PR]
|
||||||
|
- Performance benchmarks: [Available in PR]
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Environment:**
|
||||||
|
- Dify Version: Latest main branch
|
||||||
|
- Clickzetta Version: Compatible with v1.0.0+
|
||||||
|
- Python Version: 3.11+
|
||||||
|
- Testing Environment: Real Clickzetta Lakehouse UAT instance
|
||||||
@ -0,0 +1,20 @@
|
|||||||
|
# Updated PR Description Header
|
||||||
|
|
||||||
|
## Related Issue
|
||||||
|
This PR addresses the need for Clickzetta Lakehouse vector database integration in Dify. While no specific issue was opened beforehand, this feature is driven by:
|
||||||
|
|
||||||
|
- **Direct customer demand**: Real commercial customers are actively waiting for Dify + Clickzetta integration solution for trial validation
|
||||||
|
- **Business necessity**: Customers using Clickzetta Lakehouse need native Dify integration to avoid infrastructure duplication
|
||||||
|
- **Technical requirement**: Unified data platform support for both vector and structured data
|
||||||
|
|
||||||
|
## Feature Overview
|
||||||
|
Add Clickzetta Lakehouse as a vector database option in Dify, providing:
|
||||||
|
- Full BaseVector interface implementation
|
||||||
|
- HNSW vector indexing support
|
||||||
|
- Concurrent write operations with queue mechanism
|
||||||
|
- Chinese text analysis and full-text search
|
||||||
|
- Enterprise-grade performance and reliability
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
[Rest of existing PR description remains the same...]
|
||||||
Loading…
Reference in New Issue