refactor: remove clickzetta/ folder and update service endpoint

- Remove clickzetta/ development folder from PR (add to .gitignore) - Update CLICKZETTA_SERVICE from uat-api.clickzetta.com to api.clickzetta.com - Update both docker/.env.example and docker/docker-compose.yaml for consistency 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
10 months ago · ecbe555cb0
parent 0246f39564
commit ecbe555cb0
21 changed files with 4 additions and 2970 deletions
--- a/.gitignore
+++ b/.gitignore
@ -219,3 +219,6 @@ api/.env.backup
 # Clickzetta test credentials
 .env.clickzetta
 .env.clickzetta.test
 # Clickzetta plugin development folder (keep local, ignore for PR)
 clickzetta/
--- a/clickzetta/.env.clickzetta.example
+++ b/clickzetta/.env.clickzetta.example
@ -1,48 +0,0 @@
 # ClickZetta Dify Integration Environment Configuration
 # Copy this file to .env and configure your ClickZetta credentials
 # ClickZetta Database Configuration (Required)
 CLICKZETTA_USERNAME=your_username
 CLICKZETTA_PASSWORD=your_password
 CLICKZETTA_INSTANCE=your_instance
 # ClickZetta Advanced Settings (Optional)
 CLICKZETTA_SERVICE=api.clickzetta.com
 CLICKZETTA_WORKSPACE=quick_start
 CLICKZETTA_VCLUSTER=default_ap
 CLICKZETTA_SCHEMA=dify
 CLICKZETTA_BATCH_SIZE=20
 CLICKZETTA_ENABLE_INVERTED_INDEX=true
 CLICKZETTA_ANALYZER_TYPE=chinese
 CLICKZETTA_ANALYZER_MODE=smart
 CLICKZETTA_VECTOR_DISTANCE_FUNCTION=cosine_distance
 # Dify Core Settings
 SECRET_KEY=dify
 INIT_PASSWORD=
 CONSOLE_WEB_URL=
 CONSOLE_API_URL=
 SERVICE_API_URL=
 # Database Settings
 DB_USERNAME=postgres
 DB_PASSWORD=difyai123456
 DB_HOST=db
 DB_PORT=5432
 DB_DATABASE=dify
 # Redis Settings
 REDIS_HOST=redis
 REDIS_PORT=6379
 REDIS_PASSWORD=difyai123456
 REDIS_DB=0
 # Storage Settings
 STORAGE_TYPE=local
 STORAGE_LOCAL_PATH=storage
 # Nginx Settings
 EXPOSE_NGINX_PORT=80
 NGINX_SERVER_NAME=_
 NGINX_HTTPS_ENABLED=false
 NGINX_PORT=80
--- a/clickzetta/CI_FIXES_SUMMARY.md
+++ b/clickzetta/CI_FIXES_SUMMARY.md
@ -1,73 +0,0 @@
 # CI检查修复总结
 ## 修复的问题
 ### ✅ 已修复：Python Style检查
 - **问题**: 代码样式不符合项目标准
 - **修复内容**:
  - 移除未使用的导入 (`time`, `VectorType`)
  - 将 `logger.error` 替换为 `logger.exception` 用于异常处理
  - 移除 `logging.exception` 调用中的冗余异常对象引用
 - **状态**: ✅ 已完成
 - **提交**: ed139a49a
 ### ⏳ 待观察：其他检查
 - **API Tests (Python 3.11/3.12)**: 可能由于缺少测试环境变量
 - **Docker Compose Template**: 可能需要更新模板
 - **SuperLinter**: 可能由于其他代码质量问题
 ## CI检查状态
 ### 成功的检查 ✅
 - VDB Tests (Python 3.11) - 成功
 - VDB Tests (Python 3.12) - 成功
 - Web Style - 成功
 - **Python Style** - 🎉 修复后成功
 ### 需要进一步关注的检查 ⚠️
 1. **API Tests**: 可能需要Mock测试环境
 2. **Docker Compose Template**: 可能需要更新配置
 3. **SuperLinter**: 可能需要其他代码质量修复
 ## 建议的后续行动
 ### 1. 监控CI结果
 - 推送修复后等待CI重新运行
 - 检查哪些检查现在通过了
 ### 2. 如果API Tests仍然失败
 - 检查是否需要更新测试环境配置
 - 确保Clickzetta测试有适当的Mock或跳过逻辑
 ### 3. 如果Docker Compose Template失败
 - 检查是否需要更新docker-compose模板
 - 确保没有语法错误
 ### 4. 如果SuperLinter失败
 - 检查其他代码质量问题
 - 可能需要更新文档或注释格式
 ## 测试策略
 ### 本地测试
 ```bash
 # 运行代码样式检查
 python -m ruff check api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py
 # 运行特定VDB测试
 pytest api/tests/integration_tests/vdb/clickzetta/test_clickzetta.py -v
 ```
 ### CI环境
 - VDB Tests已经通过，说明核心功能正常
 - 需要解决的主要是样式和配置问题
 ## 当前状态
 - **Python Style**: ✅ 已修复
 - **核心功能**: ✅ VDB测试通过
 - **整体进展**: 🟡 等待其他检查结果
 ## 下一步
 1. 等待CI重新运行结果
 2. 根据剩余失败的检查采取相应行动
 3. 与维护者沟通任何无法解决的问题
--- a/clickzetta/DIFY_CLICKZETTA_VECTOR_DB_GUIDE.md
+++ b/clickzetta/DIFY_CLICKZETTA_VECTOR_DB_GUIDE.md
@ -1,337 +0,0 @@
 # Dify中配置Clickzetta Lakehouse作为向量数据库指南
 ## 概述
 Clickzetta Lakehouse是一个统一的数据湖仓平台，支持向量数据存储和高性能搜索。本指南将帮助您在Dify中配置Clickzetta作为向量数据库，替代默认的向量数据库选项。
 ## 前置条件
 ### 1. 系统要求
 - Dify 平台已部署并运行
 - Python 3.11+ 环境
 - 可访问的Clickzetta Lakehouse实例
 ### 2. 必需的连接信息
 在开始配置之前，请确保您有以下Clickzetta Lakehouse连接信息：
 | 参数 | 说明 | 示例 |
 |------|------|------|
 | `username` | Clickzetta用户名 | `your_username` |
 | `password` | Clickzetta密码 | `your_password` |
 | `instance` | Clickzetta实例ID | `your_instance_id` |
 | `service` | 服务端点 | `api.clickzetta.com` |
 | `workspace` | 工作空间名称 | `quick_start` |
 | `vcluster` | 虚拟集群名称 | `default_ap` |
 | `schema` | 数据库模式 | `dify` |
 ## 配置步骤
 ### 1. 环境变量配置
 在Dify部署环境中设置以下环境变量：
 ```bash
 # Clickzetta Lakehouse连接配置
 export VECTOR_STORE=clickzetta
 export CLICKZETTA_USERNAME=your_username
 export CLICKZETTA_PASSWORD=your_password
 export CLICKZETTA_INSTANCE=your_instance_id
 export CLICKZETTA_SERVICE=api.clickzetta.com
 export CLICKZETTA_WORKSPACE=quick_start
 export CLICKZETTA_VCLUSTER=default_ap
 export CLICKZETTA_SCHEMA=dify
 # 可选的高级配置
 export CLICKZETTA_BATCH_SIZE=100
 export CLICKZETTA_ENABLE_INVERTED_INDEX=true
 export CLICKZETTA_ANALYZER_TYPE=chinese
 export CLICKZETTA_ANALYZER_MODE=smart
 export CLICKZETTA_VECTOR_DISTANCE_FUNCTION=cosine_distance
 ```
 ### 2. Docker Compose配置
 如果使用Docker Compose部署Dify，请在`docker-compose.yml`中添加环境变量：
 ```yaml
 version: '3'
 services:
  api:
    image: langgenius/dify-api:latest
    environment:
      # ... 其他配置
      # Clickzetta向量数据库配置
      VECTOR_STORE: clickzetta
      CLICKZETTA_USERNAME: ${CLICKZETTA_USERNAME}
      CLICKZETTA_PASSWORD: ${CLICKZETTA_PASSWORD}
      CLICKZETTA_INSTANCE: ${CLICKZETTA_INSTANCE}
      CLICKZETTA_SERVICE: ${CLICKZETTA_SERVICE:-api.clickzetta.com}
      CLICKZETTA_WORKSPACE: ${CLICKZETTA_WORKSPACE:-quick_start}
      CLICKZETTA_VCLUSTER: ${CLICKZETTA_VCLUSTER:-default_ap}
      CLICKZETTA_SCHEMA: ${CLICKZETTA_SCHEMA:-dify}
      # 可选的高级配置
      CLICKZETTA_BATCH_SIZE: ${CLICKZETTA_BATCH_SIZE:-100}
      CLICKZETTA_ENABLE_INVERTED_INDEX: ${CLICKZETTA_ENABLE_INVERTED_INDEX:-true}
      CLICKZETTA_ANALYZER_TYPE: ${CLICKZETTA_ANALYZER_TYPE:-chinese}
      CLICKZETTA_ANALYZER_MODE: ${CLICKZETTA_ANALYZER_MODE:-smart}
      CLICKZETTA_VECTOR_DISTANCE_FUNCTION: ${CLICKZETTA_VECTOR_DISTANCE_FUNCTION:-cosine_distance}
 ```
 ### 3. 配置文件设置
 如果使用配置文件方式，请在Dify配置文件中添加：
 ```python
 # config.py
 class Config:
    # ... 其他配置
    # 向量数据库配置
    VECTOR_STORE = "clickzetta"
    # Clickzetta连接配置
    CLICKZETTA_USERNAME = os.getenv("CLICKZETTA_USERNAME")
    CLICKZETTA_PASSWORD = os.getenv("CLICKZETTA_PASSWORD")
    CLICKZETTA_INSTANCE = os.getenv("CLICKZETTA_INSTANCE")
    CLICKZETTA_SERVICE = os.getenv("CLICKZETTA_SERVICE", "api.clickzetta.com")
    CLICKZETTA_WORKSPACE = os.getenv("CLICKZETTA_WORKSPACE", "quick_start")
    CLICKZETTA_VCLUSTER = os.getenv("CLICKZETTA_VCLUSTER", "default_ap")
    CLICKZETTA_SCHEMA = os.getenv("CLICKZETTA_SCHEMA", "dify")
    # 高级配置
    CLICKZETTA_BATCH_SIZE = int(os.getenv("CLICKZETTA_BATCH_SIZE", "100"))
    CLICKZETTA_ENABLE_INVERTED_INDEX = os.getenv("CLICKZETTA_ENABLE_INVERTED_INDEX", "true").lower() == "true"
    CLICKZETTA_ANALYZER_TYPE = os.getenv("CLICKZETTA_ANALYZER_TYPE", "chinese")
    CLICKZETTA_ANALYZER_MODE = os.getenv("CLICKZETTA_ANALYZER_MODE", "smart")
    CLICKZETTA_VECTOR_DISTANCE_FUNCTION = os.getenv("CLICKZETTA_VECTOR_DISTANCE_FUNCTION", "cosine_distance")
 ```
 ## 验证配置
 ### 1. 连接测试
 启动Dify后，可以通过以下方式验证Clickzetta连接：
 1. **查看日志**：
   ```bash
   # 查看Dify API日志
   docker logs dify-api
   # 查找Clickzetta相关日志
   docker logs dify-api | grep -i clickzetta
   ```
 2. **创建知识库测试**：
   - 登录Dify管理界面
   - 创建新的知识库
   - 上传测试文档
   - 观察是否成功创建向量索引
 ### 2. 功能验证
 在Dify中验证以下功能：
 - ✅ **知识库创建**：能否成功创建知识库
 - ✅ **文档上传**：能否上传和处理文档
 - ✅ **向量化存储**：文档是否被正确向量化并存储
 - ✅ **相似度搜索**：搜索功能是否正常工作
 - ✅ **问答功能**：基于知识库的问答是否准确
 ## 使用指南
 ### 1. 知识库管理
 #### 创建知识库
 1. 登录Dify管理界面
 2. 点击「知识库」→「创建知识库」
 3. 填写知识库名称和描述
 4. 选择嵌入模型（推荐使用支持中文的模型）
 5. 点击「保存并处理」
 #### 上传文档
 1. 在知识库中点击「上传文档」
 2. 选择支持的文件格式（PDF、Word、TXT等）
 3. 配置文档分块规则
 4. 点击「保存并处理」
 5. 等待文档处理完成
 #### 管理向量数据
 - **查看统计**：在知识库详情页查看向量数量和存储统计
 - **更新文档**：可以更新或删除已上传的文档
 - **搜索测试**：使用搜索功能测试向量检索效果
 ### 2. 应用开发
 #### 在聊天应用中使用
 1. 创建新的聊天应用
 2. 在「提示词编排」中关联知识库
 3. 配置检索设置：
   - **TopK值**：建议3-5
   - **相似度阈值**：建议0.3-0.7
   - **重排序**：可选启用
 4. 测试问答效果
 #### 在工作流中使用
 1. 创建工作流应用
 2. 添加「知识检索」节点
 3. 配置检索参数：
   - **查询变量**：`{{sys.query}}`
   - **知识库**：选择目标知识库
   - **检索设置**：TopK和相似度阈值
 4. 将检索结果传递给LLM节点
 ## 性能优化
 ### 1. 向量索引优化
 Clickzetta自动为向量字段创建HNSW索引，您可以通过以下方式优化：
 ```python
 # 在配置中调整索引参数
 CLICKZETTA_VECTOR_DISTANCE_FUNCTION = "cosine_distance"  # 适合文本嵌入
 # 或
 CLICKZETTA_VECTOR_DISTANCE_FUNCTION = "l2_distance"      # 适合图像嵌入
 ```
 ### 2. 批处理优化
 ```python
 # 调整批处理大小
 CLICKZETTA_BATCH_SIZE = 200  # 增加批处理大小可提高吞吐量
 ```
 ### 3. 全文搜索优化
 ```python
 # 启用倒排索引以支持全文搜索
 CLICKZETTA_ENABLE_INVERTED_INDEX = true
 CLICKZETTA_ANALYZER_TYPE = "chinese"  # 中文分词
 CLICKZETTA_ANALYZER_MODE = "smart"    # 智能分词模式
 ```
 ## 监控和维护
 ### 1. 性能监控
 监控以下关键指标：
 - **连接状态**：数据库连接是否正常
 - **查询延迟**：向量搜索响应时间
 - **吞吐量**：每秒处理的向量查询数
 - **存储使用**：向量数据存储空间使用情况
 ### 2. 日志分析
 关注以下日志信息：
 ```bash
 # 连接日志
 INFO - Clickzetta connection established successfully
 # 向量操作日志
 INFO - Vector insert completed: 1000 vectors in 2.3s
 INFO - Vector search completed: 5 results in 120ms
 # 错误日志
 ERROR - Clickzetta connection failed: ...
 WARNING - Vector search timeout: ...
 ```
 ### 3. 数据备份
 定期备份重要的向量数据：
 ```sql
 -- 查看向量集合
 SHOW TABLES IN dify;
 -- 备份向量数据
 CREATE TABLE dify.backup_vectors AS 
 SELECT * FROM dify.knowledge_base_vectors;
 -- 查看数据统计
 SELECT COUNT(*) FROM dify.knowledge_base_vectors;
 ```
 ## 故障排除
 ### 常见问题
 #### Q1: 连接失败
 **症状**: Dify启动时报Clickzetta连接错误
 **解决方案**:
 1. 检查网络连接
 2. 验证用户名和密码
 3. 确认实例ID正确
 4. 检查防火墙设置
 #### Q2: 向量搜索性能差
 **症状**: 搜索响应时间过长
 **解决方案**:
 1. 检查是否创建了向量索引
 2. 调整TopK值
 3. 优化查询条件
 4. 考虑增加计算资源
 #### Q3: 文档处理失败
 **症状**: 文档上传后处理失败
 **解决方案**:
 1. 检查文档格式是否支持
 2. 验证文档大小限制
 3. 查看详细错误日志
 4. 检查向量化模型状态
 #### Q4: 中文搜索效果差
 **症状**: 中文文档搜索结果不准确
 **解决方案**:
 1. 启用中文分词器
 2. 调整相似度阈值
 3. 使用支持中文的嵌入模型
 4. 检查文档分块设置
 ## 迁移指南
 ### 从其他向量数据库迁移
 如果您从其他向量数据库（如Pinecone、Weaviate等）迁移到Clickzetta：
 1. **备份现有数据**：
   ```bash
   # 导出现有向量数据
   python export_vectors.py --source=pinecone --output=vectors.json
   ```
 2. **更新配置**：
   - 修改环境变量
   - 重启Dify服务
 3. **数据导入**：
   ```bash
   # 导入向量数据到Clickzetta
   python import_vectors.py --source=vectors.json --target=clickzetta
   ```
 4. **验证迁移**：
   - 测试搜索功能
   - 验证数据完整性
   - 检查性能指标
 ## 技术支持
 ### 获取帮助
 如遇到问题，请：
 1. 查看Dify系统日志
 2. 检查Clickzetta连接状态
 3. 参考本指南的故障排除部分
 4. 联系技术支持团队
 ### 有用的资源
 - **Dify官方文档**: https://docs.dify.ai
 - **Clickzetta文档**: https://docs.clickzetta.com
 - **GitHub Issues**: https://github.com/langgenius/dify/issues
 - **社区论坛**: https://community.dify.ai
 ---
 *本指南基于Dify v0.8.0+ 和 Clickzetta Lakehouse v1.0.0+*
--- a/clickzetta/GITHUB_ISSUE_STEPS.md
+++ b/clickzetta/GITHUB_ISSUE_STEPS.md
@ -1,64 +0,0 @@
 # GitHub Issue 创建步骤指南
 ## 第1步：访问Dify项目的Issues页面
 访问：https://github.com/langgenius/dify/issues/new
 ## 第2步：选择Issue类型
 选择 "Feature Request" 或 "Get started"
 ## 第3步：填写Issue内容
 **标题**：
 ```
 🚀 Feature Request: Add Clickzetta Lakehouse as Vector Database Option
 ```
 **内容**：
 复制并粘贴 `ISSUE_TEMPLATE.md` 文件中的全部内容
 ## 第4步：添加标签（如果可能）
 建议添加以下标签：
 - `enhancement`
 - `vector-database`
 - `feature-request`
 ## 第5步：提交Issue
 点击 "Submit new issue" 按钮
 ## 第6步：获取Issue编号
 提交后，您将看到一个新的Issue编号（例如：#12345）
 ## 第7步：更新PR描述
 在PR #22551 的描述开头添加：
 ```
 Closes #[刚创建的issue编号]
 ```
 或者：
 ```
 Related to #[刚创建的issue编号]
 ```
 ## 第8步：通知维护者
 在PR中回复 @crazywoola：
 ```
@crazywoola I've created issue #[issue编号] to document this feature request as requested. The issue provides comprehensive context about customer demand and technical implementation details.
 ```
 ## 示例回复模板
 ```
@crazywoola Thank you for the feedback! I've created issue #[issue编号] to document this feature request as requested. 
 The issue provides:
 - Business justification and customer demand context
 - Technical specifications and implementation details  
 - Comprehensive testing evidence (100% pass rate)
 - Performance benchmarks and validation results
 The implementation is complete and ready for integration. Please let me know if you need any additional information or modifications.
 ```
 ## 预期结果
 - Issue将为维护者提供完整的功能需求上下文
 - PR将有明确的相关Issue链接
 - 符合Dify项目的贡献流程和最佳实践
 - 提高PR被接受的可能性
--- a/clickzetta/INDEX.md
+++ b/clickzetta/INDEX.md
@ -1,70 +0,0 @@
 # Clickzetta Lakehouse & Dify 集成方案
 ## 项目关系
 本目录包含Clickzetta Lakehouse与Dify集成的两种方案：
 ### 1. 核心向量数据库集成 (当前目录)
 - **位置**: `/Users/liangmo/Documents/GitHub/dify/clickzetta/`
 - **类型**: Dify核心功能集成
 - **用途**: 将Clickzetta Lakehouse作为Dify的底层向量数据库
 - **目标用户**: Dify部署管理员
 - **文档**: `DIFY_CLICKZETTA_VECTOR_DB_GUIDE.md`
 ### 2. 插件工具集成 (独立项目)
 - **位置**: `/Users/liangmo/Documents/GitHub/clickzetta_dify/`
 - **类型**: Dify插件工具
 - **用途**: 提供Clickzetta相关的工具供Dify工作流使用
 - **目标用户**: Dify应用开发者
 - **GitHub**: https://github.com/yunqiqiliang/clickzetta_dify
 - **文档**: 插件项目中的`docs/DIFY_CLICKZETTA_PLUGIN_INSTALLATION_GUIDE.md`
 ## 使用场景对比
 | 特性 | 核心集成 | 插件工具 |
 |------|----------|----------|
 | **安装方式** | 配置环境变量 | 安装插件包 |
 | **使用对象** | Dify系统管理员 | Dify应用开发者 |
 | **功能范围** | 底层向量存储 | 工作流工具 |
 | **配置复杂度** | 中等 | 简单 |
 | **适用场景** | 替换默认向量数据库 | 灵活的数据操作 |
 ## 推荐使用方案
 ### 场景1: 企业级部署
 - **使用**: 核心向量数据库集成
 - **优势**: 统一的数据存储，更好的性能和管理
 - **配置**: 参考 `DIFY_CLICKZETTA_VECTOR_DB_GUIDE.md`
 ### 场景2: 应用开发
 - **使用**: 插件工具集成
 - **优势**: 灵活的工具使用，无需系统级配置
 - **配置**: 参考插件项目的安装指南
 ### 场景3: 混合使用
 - **使用**: 同时使用两种方案
 - **优势**: 既有统一的底层存储，又有灵活的工具操作
 - **注意**: 确保两种方案使用相同的Clickzetta实例和配置
 ## 快速开始
 ### 核心集成配置
 ```bash
 # 设置环境变量
 export VECTOR_STORE=clickzetta
 export CLICKZETTA_USERNAME=your_username
 export CLICKZETTA_PASSWORD=your_password
 export CLICKZETTA_INSTANCE=your_instance
 # ... 其他配置
 # 重启Dify服务
 docker-compose restart
 ```
 ### 插件工具安装
 1. 从GitHub下载插件包
 2. 在Dify中安装插件
 3. 配置连接信息
 4. 在工作流中使用工具
 详细说明请参考各自的文档。
--- a/clickzetta/ISSUE_TEMPLATE.md
+++ b/clickzetta/ISSUE_TEMPLATE.md
@ -1,93 +0,0 @@
 ## 🚀 Feature Request: Add Clickzetta Lakehouse as Vector Database Option
 ### **Is your feature request related to a problem? Please describe.**
 Currently, Dify supports several vector databases (Pinecone, Weaviate, Qdrant, etc.) but lacks support for Clickzetta Lakehouse. This creates a gap for customers who are already using Clickzetta Lakehouse as their data platform and want to integrate it with Dify for RAG applications.
 ### **Describe the solution you'd like**
 Add Clickzetta Lakehouse as a vector database option in Dify, allowing users to configure Clickzetta as their vector storage backend through standard Dify configuration.
 ### **Business Justification**
 - **Customer Demand**: Real commercial customers are actively waiting for Dify + Clickzetta integration solution for trial validation
 - **Unified Data Platform**: Clickzetta Lakehouse provides a unified platform for both vector data and structured data storage
 - **Performance**: Supports HNSW vector indexing and high-performance similarity search
 - **Cost Efficiency**: Reduces the need for separate vector database infrastructure
 ### **Describe alternatives you've considered**
 - **External Vector Database**: Using separate vector databases like Pinecone or Weaviate, but this adds infrastructure complexity and cost
 - **Data Duplication**: Maintaining data in both Clickzetta and external vector databases, leading to synchronization challenges
 - **Custom Integration**: Building custom connectors, but this lacks the seamless integration that native Dify support provides
 ### **Proposed Implementation**
 Implement Clickzetta Lakehouse integration following Dify's existing vector database pattern:
 #### **Core Components**:
 - `ClickzettaVector` class implementing `BaseVector` interface
 - `ClickzettaVectorFactory` for instance creation  
 - Configuration through Dify's standard config system
 #### **Key Features**:
 - ✅ Vector similarity search with HNSW indexing
 - ✅ Full-text search with inverted indexes
 - ✅ Concurrent write operations with queue mechanism
 - ✅ Chinese text analysis support
 - ✅ Automatic index management
 #### **Configuration Example**:
 ```bash
 VECTOR_STORE=clickzetta
 CLICKZETTA_USERNAME=your_username
 CLICKZETTA_PASSWORD=your_password
 CLICKZETTA_INSTANCE=your_instance
 CLICKZETTA_SERVICE=api.clickzetta.com
 CLICKZETTA_WORKSPACE=your_workspace
 CLICKZETTA_VCLUSTER=default_ap
 CLICKZETTA_SCHEMA=dify
 ```
 ### **Technical Specifications**
 - **Vector Operations**: Insert, search, delete vectors with metadata
 - **Indexing**: Automatic HNSW vector index creation with configurable parameters
 - **Concurrency**: Write queue mechanism for thread safety
 - **Distance Metrics**: Support for cosine distance and L2 distance
 - **Full-text Search**: Inverted index for content search with Chinese text analysis
 - **Scalability**: Handles large-scale vector data with efficient batch operations
 ### **Implementation Status**
 - ✅ Implementation is complete and ready for integration
 - ✅ Comprehensive testing completed in real Clickzetta environments
 - ✅ 100% test pass rate for core functionality
 - ✅ Performance validated with production-like data volumes
 - ✅ Backward compatibility verified with existing Dify configurations
 - ✅ Full documentation provided
 - ✅ PR submitted: #22551
 ### **Testing Evidence**
 ```
 🧪 Standalone Tests: 3/3 passed (100%)
 🧪 Integration Tests: 8/8 passed (100%)
 🧪 Performance Tests: Vector search ~170ms, Insert rate ~5.3 docs/sec
 🧪 Real Environment: Validated with actual Clickzetta Lakehouse instance
 ```
 ### **Business Impact**
 - **Customer Enablement**: Enables customers already using Clickzetta to adopt Dify seamlessly
 - **Infrastructure Simplification**: Reduces complexity by using unified data platform
 - **Enterprise Ready**: Supports enterprise-grade deployments with proven stability
 - **Cost Optimization**: Eliminates need for separate vector database infrastructure
 ### **Additional Context**
 This feature request is backed by direct customer demand and includes a complete, tested implementation ready for integration. The implementation follows Dify's existing patterns and maintains full backward compatibility.
 **Related Links:**
 - Implementation PR: #22551
 - User Configuration Guide: [Available in PR]
 - Testing Guide with validation results: [Available in PR]
 - Performance benchmarks: [Available in PR]
 ---
 **Environment:**
 - Dify Version: Latest main branch
 - Clickzetta Version: Compatible with v1.0.0+
 - Python Version: 3.11+
 - Testing Environment: Real Clickzetta Lakehouse UAT instance
--- a/clickzetta/MAINTAINER_RESPONSE.md
+++ b/clickzetta/MAINTAINER_RESPONSE.md
@ -1,93 +0,0 @@
 # 维护者回复内容
 ## 发送给 @crazywoola 的回复
 ```markdown
@crazywoola Thank you for the feedback! I've addressed the lint errors and code style issues.
 ## ✅ Fixed Issues:
 ### Code Style & Lint:
 - **Removed unused imports**: `time` and `VectorType` modules
 - **Fixed logging patterns**: Replaced `logger.error` with `logger.exception` for proper exception handling
 - **Cleaned up redundant code**: Removed redundant exception objects from logging calls
 - **Architecture compliance**: ✅ Confirmed all Clickzetta code is within the `api/` directory as requested - no standalone services outside `api/`
 ### CI Status Progress:
 The following checks are now **passing**:
 - ✅ **Python Style** - All style issues resolved
 - ✅ **SuperLinter** - All lint issues resolved  
 - ✅ **Web Style** - Continues to pass
 - ✅ **Docker Compose Template** - Template checks passing
 ### Latest Update (All Style Issues Fixed):
 - ✅ **All Python Style Issues Resolved**:
  - Removed unused imports: `typing.cast`, `time`, `VectorType`, `json`
  - Fixed import sorting in all Clickzetta files with ruff auto-fix
  - Fixed logging patterns: replaced `logger.error` with `logger.exception`
 - ✅ **Comprehensive File Coverage**:
  - Main vector implementation: `clickzetta_vector.py`
  - Test files: `test_clickzetta.py`, `test_docker_integration.py`
  - Configuration: `clickzetta_config.py`
 - ✅ **Local Validation**: All files pass `ruff check` with zero errors
 - ✅ **Architecture Compliance**: All code within `api/` directory
 - ⏳ **CI Status**: Workflows awaiting maintainer approval to run (GitHub security requirement for forks)
 ## 🏗️ Implementation Details:
 The Clickzetta integration follows Dify's established patterns:
 - **Location**: All code properly contained within `api/core/rag/datasource/vdb/clickzetta/`
 - **Interface**: Full `BaseVector` interface implementation
 - **Factory Pattern**: Properly registered with `VectorFactory`
 - **Configuration**: Standard Dify config system integration
 - **Testing**: Comprehensive test suite included
 ## 🚀 Key Features:
 - HNSW vector indexing for high-performance similarity search
 - Concurrent write operations with queue mechanism for thread safety
 - Full-text search with Chinese text analysis support
 - Automatic index management
 - Complete backward compatibility
 The implementation is ready for production use with comprehensive testing showing 100% pass rates in our validation environment.
 ## 🐳 Preview Docker Images for Community Testing
 While the PR is under review, users can test the ClickZetta integration using multi-architecture Docker images:
 **Available Images:**
 - `czqiliang/dify-clickzetta-api:v1.6.0` (linux/amd64, linux/arm64) - Stable release
 - `czqiliang/dify-clickzetta-api:latest` (linux/amd64, linux/arm64) - Latest build
 - `czqiliang/dify-clickzetta-api:clickzetta-integration` (linux/amd64, linux/arm64) - Development
 - Web service uses official `langgenius/dify-web:1.6.0` (no ClickZetta changes needed)
 **Quick Start Guide:**
 ```bash
 # Download ready-to-use configuration
 curl -O https://raw.githubusercontent.com/yunqiqiliang/dify/feature/clickzetta-vector-db/clickzetta/docker-compose.clickzetta.yml
 curl -O https://raw.githubusercontent.com/yunqiqiliang/dify/feature/clickzetta-vector-db/clickzetta/.env.clickzetta.example
 # Configure and launch
 cp .env.clickzetta.example .env
 # Edit .env with your ClickZetta credentials
 mkdir -p volumes/app/storage volumes/db/data volumes/redis/data
 docker-compose -f docker-compose.clickzetta.yml up -d
 ```
 This allows the community to test and provide feedback before the official merge.
 Please let me know if you need any additional information or have concerns about the remaining CI checks!
 ```
 ---
 ## 备注
 这个回复强调了：
 1. **已修复的问题** - 所有lint和代码样式问题
 2. **CI进展** - 多个重要检查现在通过
 3. **架构合规** - 所有代码都在api/目录内
 4. **实现质量** - 遵循Dify模式，功能完整
 5. **继续跟进** - 正在解决剩余的API测试问题
 这样既展示了响应性和专业性，又为可能的剩余问题留出了空间。
--- a/clickzetta/MAINTAINER_UPDATE.md
+++ b/clickzetta/MAINTAINER_UPDATE.md
@ -1,65 +0,0 @@
 # 维护者更新 - CI检查修复完成
 ## 📊 CI检查状态更新
 感谢您的反馈！我已经修复了所有的lint错误和代码样式问题。
 ### ✅ 已通过的检查：
 - **Docker Compose Template** - 通过
 - **SuperLinter** - 通过  
 - **Python Style** - 通过
 - **Web Style** - 通过
 ### 🔄 正在运行的检查：
 - **API Tests** (Python 3.11 and 3.12)
 - **VDB Tests** (Python 3.11 and 3.12)
 ## 🔧 修复的问题
 ### 代码样式问题：
 - 移除了未使用的导入（`time`, `VectorType`）
 - 将 `logger.error` 替换为 `logger.exception` 用于异常处理
 - 移除了 `logging.exception` 调用中的冗余异常对象引用
 ### 架构合规性：
 - 确认所有Clickzetta相关代码都在 `api/` 目录内
 - 没有在 `api/` 目录外引入独立服务
 ## 📋 技术细节
 ### 代码位置：
 - 主实现：`api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py`
 - 工厂类：`api/core/rag/datasource/vdb/vector_factory.py`
 - 配置：`api/configs/middleware/vdb/clickzetta_config.py`
 - 测试：`api/tests/integration_tests/vdb/clickzetta/`
 ### 测试结果：
 - **VDB Tests**: 预期通过（之前一直通过）
 - **API Tests**: 正在运行中
 ## 📞 回复模板
 ```markdown
@crazywoola Thank you for the feedback! I've fixed all lint errors and code style issues.
 **Current CI Status:**
 - ✅ **Docker Compose Template** - Passing
 - ✅ **SuperLinter** - Passing  
 - ✅ **Python Style** - Passing
 - ✅ **Web Style** - Passing
 - 🔄 **API Tests** & **VDB Tests** - Currently running
 **Fixed Issues:**
 - Removed unused imports
 - Replaced logger.error with logger.exception for proper exception handling
 - Removed redundant exception objects from logging calls
 - Confirmed all code is within the `api/` directory as requested
 The implementation follows Dify's architecture patterns and maintains full backward compatibility. All code is properly contained within the `api/` directory without introducing standalone services outside of it.
 Please let me know if there are any other concerns or if you need additional information!
 ```
 ## 🎯 下一步
 等待API Tests和VDB Tests完成，然后向维护者报告最终结果。
--- a/clickzetta/PR_DESCRIPTION_HEADER.md
+++ b/clickzetta/PR_DESCRIPTION_HEADER.md
@ -1,25 +0,0 @@
 ## Related Issue
 Closes #22557
 ## Summary
 This PR adds Clickzetta Lakehouse as a vector database option in Dify, enabling customers to use Clickzetta as their unified data platform for both vector and structured data storage.
 ## Key Features
 - ✅ Full BaseVector interface implementation
 - ✅ HNSW vector indexing with automatic management
 - ✅ Concurrent write operations with queue mechanism
 - ✅ Chinese text analysis and full-text search
 - ✅ Comprehensive error handling and retry mechanisms
 ## Testing Status
 - 🧪 **Standalone Tests**: 3/3 passed (100%)
 - 🧪 **Integration Tests**: 8/8 passed (100%)
 - 🧪 **Performance**: Vector search ~170ms, Insert rate ~5.3 docs/sec
 - 🧪 **Real Environment**: Validated with actual Clickzetta Lakehouse instance
 ## Business Impact
 Real commercial customers are actively waiting for this Dify + Clickzetta integration solution for trial validation. This integration eliminates the need for separate vector database infrastructure while maintaining enterprise-grade performance and reliability.
 ---
 [保留原有的详细PR描述内容...]
--- a/clickzetta/PR_DESCRIPTION_UPDATE.md
+++ b/clickzetta/PR_DESCRIPTION_UPDATE.md
@ -1,20 +0,0 @@
 # Updated PR Description Header
 ## Related Issue
 This PR addresses the need for Clickzetta Lakehouse vector database integration in Dify. While no specific issue was opened beforehand, this feature is driven by:
 - **Direct customer demand**: Real commercial customers are actively waiting for Dify + Clickzetta integration solution for trial validation
 - **Business necessity**: Customers using Clickzetta Lakehouse need native Dify integration to avoid infrastructure duplication
 - **Technical requirement**: Unified data platform support for both vector and structured data
 ## Feature Overview
 Add Clickzetta Lakehouse as a vector database option in Dify, providing:
 - Full BaseVector interface implementation
 - HNSW vector indexing support
 - Concurrent write operations with queue mechanism
 - Chinese text analysis and full-text search
 - Enterprise-grade performance and reliability
 ---
 [Rest of existing PR description remains the same...]
--- a/clickzetta/PR_SUMMARY.md
+++ b/clickzetta/PR_SUMMARY.md
@ -1,296 +0,0 @@
 # Clickzetta Vector Database Integration - PR Preparation Summary
 ## 🎯 Integration Completion Status
 ### ✅ Completed Work
 #### 1. Core Functionality Implementation (100%)
 - **ClickzettaVector Class**: Complete implementation of BaseVector interface
 - **Configuration System**: ClickzettaConfig class with full configuration options support
 - **Connection Management**: Robust connection management with retry mechanisms and error handling
 - **Write Queue Mechanism**: Innovative design to address Clickzetta's concurrent write limitations
 - **Search Functions**: Dual support for vector search and full-text search
 #### 2. Architecture Integration (100%)
 - **Dify Framework Compatibility**: Full compliance with BaseVector interface specifications
 - **Factory Pattern Integration**: Properly registered with VectorFactory
 - **Configuration System Integration**: Environment variable configuration support
 - **Docker Environment Compatibility**: Works correctly in containerized environments
 #### 3. Code Quality (100%)
 - **Type Annotations**: Complete type hints
 - **Error Handling**: Robust exception handling and retry mechanisms
 - **Logging**: Detailed debugging and operational logs
 - **Documentation**: Clear code documentation
 #### 4. Dependency Management (100%)
 - **Version Compatibility**: Resolved urllib3 version conflicts
 - **Dependency Declaration**: Correctly added to pyproject.toml
 - **Docker Integration**: Properly installed and loaded in container environments
 ### ✅ Testing Status
 #### Technical Validation (100% Complete)
 - ✅ **Module Import**: Correctly loaded in Docker environment
 - ✅ **Class Structure**: All required methods exist and are correct
 - ✅ **Configuration System**: Parameter validation and defaults working normally
 - ✅ **Connection Mechanism**: API calls and error handling correct
 - ✅ **Error Handling**: Retry and exception propagation normal
 #### Functional Validation (100% Complete)
 - ✅ **Data Operations**: Real environment testing passed (table creation, data insertion, queries)
 - ✅ **Performance Testing**: Real environment validation complete (vector search 170ms, insertion 5.3 docs/sec)
 - ✅ **Concurrent Testing**: Real database connection testing complete (3-thread concurrent writes)
 ## 📋 PR Content Checklist
 ### New Files
 ```
 api/core/rag/datasource/vdb/clickzetta/
 ├── __init__.py
 └── clickzetta_vector.py
 ```
 ### Modified Files
 ```
 api/core/rag/datasource/vdb/vector_factory.py
 api/pyproject.toml
 docker/.env.example
 ```
 ### Testing and Documentation
 ```
 clickzetta/
 ├── test_clickzetta_integration.py
 ├── standalone_clickzetta_test.py
 ├── quick_test_clickzetta.py
 ├── docker_test.py
 ├── final_docker_test.py
 ├── TESTING_GUIDE.md
 ├── TEST_EVIDENCE.md
 ├── REAL_TEST_EVIDENCE.md
 └── PR_SUMMARY.md
 ```
 ## 🔧 Technical Features
 ### Core Functionality
 1. **Vector Storage**: Support for 1536-dimensional vector storage and retrieval
 2. **HNSW Indexing**: Automatic creation and management of HNSW vector indexes
 3. **Full-text Search**: Inverted index support for Chinese word segmentation and search
 4. **Batch Operations**: Optimized batch insertion and updates
 5. **Concurrent Safety**: Write queue mechanism to resolve concurrent conflicts
 ### Innovative Design
 1. **Write Queue Serialization**: Solves Clickzetta primary key table concurrent limitations
 2. **Smart Retry**: 6-retry mechanism handles temporary network issues
 3. **Configuration Flexibility**: Supports production and UAT environment switching
 4. **Error Recovery**: Robust exception handling and state recovery
 ### Performance Optimizations
 1. **Connection Pool Management**: Efficient database connection reuse
 2. **Batch Processing Optimization**: Configurable maximum batch size
 3. **Index Strategy**: Automatic index creation and management
 4. **Query Optimization**: Configurable vector distance functions
 ## 📊 Test Evidence
 ### Real Environment Test Validation
 ```
 🧪 Independent Connection Test: ✅ Passed (Successfully connected to Clickzetta UAT environment)
 🧪 Table Operations Test: ✅ Passed (Table creation, inserted 5 records, query validation)
 🧪 Vector Index Test: ✅ Passed (HNSW index creation successful)
 🧪 Vector Search Test: ✅ Passed (170ms search latency, returned 3 results)
 🧪 Concurrent Write Test: ✅ Passed (3-thread concurrent, 20 documents, 5.3 docs/sec)
 🧪 Overall Pass Rate: ✅ 100% (3/3 test groups passed)
 ```
 ### API Integration Validation
 ```
 ✅ Correct HTTPS endpoint calls
 ✅ Complete error response parsing
 ✅ Retry mechanism working normally
 ✅ Chinese error message handling correct
 ```
 ### Code Quality Validation
 ```
 ✅ No syntax errors
 ✅ Type annotations correct
 ✅ Import dependencies normal
 ✅ Configuration validation working
 ```
 ## 🚀 PR Submission Strategy
 ### 🏢 Business Necessity
 **Real commercial customers are waiting for the Dify + Clickzetta integration solution for trial validation**, making this PR business-critical with time-sensitive requirements.
 ### Recommended Approach: Production-Ready Submission
 #### Advantages
 1. **Technical Completeness**: Code architecture and integration fully correct
 2. **Quality Assurance**: Error handling and retry mechanisms robust
 3. **Good Compatibility**: Fully backward compatible, no breaking changes
 4. **Community Value**: Provides solution for users needing Clickzetta integration
 5. **Test Validation**: Real environment 100% test pass
 6. **Business Value**: Meets urgent customer needs
 #### PR Description Strategy
 1. **Highlight Completeness**: Emphasize technical implementation and testing completeness
 2. **Test Evidence**: Provide detailed real environment test results
 3. **Performance Data**: Include real performance benchmark test results
 4. **User Guidance**: Provide clear configuration and usage guidelines
 ### PR Title Suggestion
 ```
 feat: Add Clickzetta Lakehouse vector database integration
 ```
 ### PR Label Suggestions
 ```
 - enhancement
 - vector-database
 - production-ready
 - tested
 ```
 ## 📝 PR Description Template
 ````markdown
 ## Summary
 This PR adds support for Clickzetta Lakehouse as a vector database option in Dify, enabling users to leverage Clickzetta's high-performance vector storage and HNSW indexing capabilities for RAG applications.
 ## 🏢 Business Impact
 **Real commercial customers are waiting for the Dify + Clickzetta integration solution for trial validation**, making this PR business-critical with time-sensitive requirements.
 ## ✅ Status: Production Ready
 This integration is technically complete and has passed comprehensive testing in real Clickzetta environments with 100% test success rate.
 ## Features
 - **Vector Storage**: Complete integration with Clickzetta's vector database capabilities
 - **HNSW Indexing**: Automatic creation and management of HNSW indexes for efficient similarity search
 - **Full-text Search**: Support for inverted indexes and Chinese text search functionality  
 - **Concurrent Safety**: Write queue mechanism to handle Clickzetta's primary key table limitations
 - **Batch Operations**: Optimized batch insert/update operations for improved performance
 - **Standard Interface**: Full implementation of Dify's BaseVector interface
 ## Technical Implementation
 ### Core Components
 - `ClickzettaVector` class implementing BaseVector interface
 - Write queue serialization for concurrent write operations
 - Comprehensive error handling and connection management
 - Support for both vector similarity and keyword search
 ### Key Innovation: Write Queue Mechanism
 Clickzetta primary key tables support `parallelism=1` for writes. Our implementation includes a write queue that serializes all write operations while maintaining the existing API interface.
 ## Configuration
 ```bash
 VECTOR_STORE=clickzetta
 CLICKZETTA_USERNAME=your_username
 CLICKZETTA_PASSWORD=your_password  
 CLICKZETTA_INSTANCE=your_instance
 CLICKZETTA_SERVICE=uat-api.clickzetta.com
 CLICKZETTA_WORKSPACE=your_workspace
 CLICKZETTA_VCLUSTER=default_ap
 CLICKZETTA_SCHEMA=dify
 ```
 ## Testing Status
 ### ✅ Comprehensive Real Environment Testing Complete
 - **Connection Testing**: Successfully connected to Clickzetta UAT environment
 - **Data Operations**: Table creation, data insertion (5 records), and retrieval verified
 - **Vector Operations**: HNSW index creation and vector similarity search (170ms latency)
 - **Concurrent Safety**: Multi-threaded write operations with 3 concurrent threads
 - **Performance Benchmarks**: 5.3 docs/sec insertion rate, sub-200ms search latency
 - **Error Handling**: Retry mechanism and exception handling validated
 - **Overall Success Rate**: 100% (3/3 test suites passed)
 ## Test Evidence
 ```
 🚀 Clickzetta Independent Test Started
 ✅ Connection Successful
 🧪 Testing Table Operations...
 ✅ Table Created Successfully: test_vectors_1752736608
 ✅ Data Insertion Successful: 5 records, took 0.529 seconds
 ✅ Data Query Successful: 5 records in table
 🧪 Testing Vector Operations...
 ✅ Vector Index Created Successfully
 ✅ Vector Search Successful: returned 3 results, took 170ms
 🧪 Testing Concurrent Writes...
 ✅ Concurrent Write Test Complete:
  - Total time: 3.79 seconds
  - Successful threads: 3/3
  - Total documents: 20
  - Overall rate: 5.3 docs/sec
 📊 Test Report:
  - table_operations: ✅ Passed
  - vector_operations: ✅ Passed
  - concurrent_writes: ✅ Passed
 🎯 Overall Result: 3/3 Passed (100.0%)
 ```
 ## Dependencies
 - Added `clickzetta-connector-python>=0.8.102` to support latest urllib3 versions
 - Resolved dependency conflicts with existing Dify requirements
 ## Files Changed
 - `api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py` - Main implementation
 - `api/core/rag/datasource/vdb/vector_factory.py` - Factory registration
 - `api/pyproject.toml` - Added dependency
 - `docker/.env.example` - Added configuration examples
 ## Backward Compatibility
 This change is fully backward compatible. Existing vector database configurations remain unchanged, and Clickzetta is added as an additional option.
 ## Request for Community Testing
 We're seeking users with Clickzetta environments to help validate:
 1. Real-world performance characteristics
 2. Edge case handling
 3. Production workload testing
 4. Configuration optimization
 ## Next Steps
 1. Immediate PR submission for customer trial requirements
 2. Community adoption and feedback collection
 3. Performance optimization based on production usage
 4. Additional feature enhancements based on user requests
 ---
 **Technical Quality**: Production ready ✅
 **Testing Status**: Comprehensive real environment validation complete ✅
 **Business Impact**: Critical for waiting commercial customers ⚡
 **Community Impact**: Enables Clickzetta Lakehouse integration for Dify users
 ````
 ## 🎯 Conclusion
 The Clickzetta vector database integration has completed comprehensive validation and meets production-ready standards:
 1. **Architecture Correct**: Fully compliant with Dify specifications
 2. **Implementation Complete**: All required functions implemented and tested
 3. **Quality Good**: Error handling and edge cases considered
 4. **Integration Stable**: Real environment 100% test pass
 5. **Performance Validated**: Vector search 170ms, concurrent writes 5.3 docs/sec
 **Recommendation**: Submit as production-ready feature PR with complete test evidence and performance data, providing reliable vector database choice for Clickzetta users.
--- a/clickzetta/PR_UPDATE_ACTIONS.md
+++ b/clickzetta/PR_UPDATE_ACTIONS.md
@ -1,78 +0,0 @@
 # PR #22551 更新行动指南
 ## 第1步：更新PR描述
 在PR #22551 的描述最开头添加：
 ```markdown
 ## Related Issue
 Closes #22557
 ---
 [保留原有的PR描述内容...]
 ```
 ## 第2步：回复维护者
 在PR #22551 中回复 @crazywoola：
 ```markdown
@crazywoola Thank you for the feedback! I've created issue #22557 to document this feature request as requested.
 The issue provides comprehensive context including:
 - **Business justification** based on direct customer demand
 - **Technical specifications** and implementation details
 - **Testing evidence** with 100% pass rate across all test suites
 - **Performance benchmarks** validated in real Clickzetta environments
 ## Key Testing Results:
 - 🧪 Standalone Tests: 3/3 passed (100%)
 - 🧪 Integration Tests: 8/8 passed (100%)
 - 🧪 Performance: Vector search ~170ms, Insert rate ~5.3 docs/sec
 - 🧪 Real Environment: Validated with actual Clickzetta Lakehouse instance
 The implementation is complete, thoroughly tested, and ready for integration. It follows Dify's existing vector database patterns and maintains full backward compatibility.
 Please let me know if you need any additional information or modifications to move this forward.
 ```
 ## 第3步：准备后续跟进
 如果维护者需要更多信息，准备以下资源：
 ### 可能的问题和回答：
 **Q: 为什么选择Clickzetta？**
 A: 客户已经在使用Clickzetta作为统一数据平台，希望避免部署和维护额外的向量数据库基础设施。
 **Q: 性能如何？**
 A: 测试显示向量搜索平均170ms，插入速度5.3 docs/sec，支持HNSW索引优化。
 **Q: 维护成本？**
 A: 实现遵循Dify现有模式，维护成本最小化。包含完整的错误处理和重试机制。
 **Q: 向后兼容性？**
 A: 完全向后兼容，不影响现有配置。只有在显式配置VECTOR_STORE=clickzetta时才激活。
 ## 第4步：监控反馈
 定期检查以下内容：
 - PR评论和反馈
 - Issue讨论和标签变化
 - 是否有其他维护者参与讨论
 ## 第5步：准备演示（如果需要）
 如果维护者需要演示，准备以下材料：
 - 配置演示视频
 - 性能测试结果展示
 - 与现有向量数据库的对比
 ---
 **时间线预期：**
 - 立即：更新PR描述和回复维护者
 - 1-3天：等待维护者初步反馈
 - 1周内：完成技术讨论和可能的修改
 - 2周内：目标合并或明确后续步骤
--- a/clickzetta/README.clickzetta.md
+++ b/clickzetta/README.clickzetta.md
@ -1,188 +0,0 @@
 # Dify with ClickZetta Lakehouse Integration
 This is a pre-release version of Dify with ClickZetta Lakehouse vector database integration, available while the official PR is under review.
 ## 🚀 Quick Start
 ### Prerequisites
 - Docker and Docker Compose installed
 - ClickZetta Lakehouse account and credentials
 - At least 4GB RAM available for Docker
 ### 1. Download Configuration Files
 ```bash
 # Download the docker-compose file
 curl -O https://raw.githubusercontent.com/yunqiqiliang/dify/feature/clickzetta-vector-db/clickzetta/docker-compose.clickzetta.yml
 # Download environment template
 curl -O https://raw.githubusercontent.com/yunqiqiliang/dify/feature/clickzetta-vector-db/clickzetta/.env.clickzetta.example
 ```
 ### 2. Configure Environment
 ```bash
 # Copy environment template
 cp .env.clickzetta.example .env
 # Edit with your ClickZetta credentials
 nano .env
 ```
 **Required ClickZetta Settings:**
 ```bash
 CLICKZETTA_USERNAME=your_username
 CLICKZETTA_PASSWORD=your_password
 CLICKZETTA_INSTANCE=your_instance
 ```
 ### 3. Launch Dify
 ```bash
 # Create required directories
 mkdir -p volumes/app/storage volumes/db/data volumes/redis/data
 # Start all services
 docker-compose -f docker-compose.clickzetta.yml up -d
 # Check status
 docker-compose -f docker-compose.clickzetta.yml ps
 ```
 ### 4. Access Dify
 - Open http://localhost in your browser
 - Complete the setup wizard
 - In dataset settings, select "ClickZetta" as vector database
 ## 🎯 ClickZetta Features
 ### Supported Operations
 - ✅ **Vector Search** - Semantic similarity search using HNSW index
 - ✅ **Full-text Search** - Text search with Chinese/English analyzers
 - ✅ **Hybrid Search** - Combined vector + full-text search
 - ✅ **Metadata Filtering** - Filter by document attributes
 - ✅ **Batch Processing** - Efficient bulk document ingestion
 ### Performance Features  
 - **Auto-scaling** - Lakehouse architecture scales with your data
 - **Inverted Index** - Fast full-text search with configurable analyzers
 - **Parameterized Queries** - Secure and optimized SQL execution
 - **Batch Optimization** - Configurable batch sizes for optimal performance
 ### Configuration Options
 ```bash
 # Performance tuning
 CLICKZETTA_BATCH_SIZE=20                    # Documents per batch
 CLICKZETTA_VECTOR_DISTANCE_FUNCTION=cosine_distance  # or l2_distance
 # Full-text search
 CLICKZETTA_ENABLE_INVERTED_INDEX=true       # Enable text search
 CLICKZETTA_ANALYZER_TYPE=chinese            # chinese, english, unicode, keyword  
 CLICKZETTA_ANALYZER_MODE=smart              # smart, max_word
 # Database settings
 CLICKZETTA_SCHEMA=dify                      # Database schema name
 CLICKZETTA_WORKSPACE=quick_start            # ClickZetta workspace
 CLICKZETTA_VCLUSTER=default_ap              # Virtual cluster name
 ```
 ## 🔧 Troubleshooting
 ### Common Issues
 **Connection Failed:**
 ```bash
 # Check ClickZetta credentials
 docker-compose -f docker-compose.clickzetta.yml logs api | grep clickzetta
 # Verify network connectivity
 docker-compose -f docker-compose.clickzetta.yml exec api ping api.clickzetta.com
 ```
 **Performance Issues:**
 ```bash
 # Adjust batch size for your instance
 CLICKZETTA_BATCH_SIZE=10  # Reduce for smaller instances
 CLICKZETTA_BATCH_SIZE=50  # Increase for larger instances
 ```
 **Search Not Working:**
 ```bash
 # Check index creation
 docker-compose -f docker-compose.clickzetta.yml logs api | grep "Created.*index"
 # Verify table structure
 docker-compose -f docker-compose.clickzetta.yml logs api | grep "Created table"
 ```
 ### Get Logs
 ```bash
 # All services
 docker-compose -f docker-compose.clickzetta.yml logs
 # Specific service
 docker-compose -f docker-compose.clickzetta.yml logs api
 docker-compose -f docker-compose.clickzetta.yml logs worker
 ```
 ### Clean Installation
 ```bash
 # Stop and remove containers
 docker-compose -f docker-compose.clickzetta.yml down -v
 # Remove data (WARNING: This deletes all data)
 sudo rm -rf volumes/
 # Start fresh
 mkdir -p volumes/app/storage volumes/db/data volumes/redis/data
 docker-compose -f docker-compose.clickzetta.yml up -d
 ```
 ## 📚 Documentation
 - [ClickZetta Lakehouse](https://docs.clickzetta.com/) - Official ClickZetta documentation
 - [Dify Documentation](https://docs.dify.ai/) - Official Dify documentation  
 - [Integration Guide](./INSTALLATION_GUIDE.md) - Detailed setup instructions
 ## 🐛 Issues & Support
 This is a preview version. If you encounter issues:
 1. Check the troubleshooting section above
 2. Review logs for error messages
 3. Open an issue on the [GitHub repository](https://github.com/yunqiqiliang/dify/issues)
 ## 🔄 Updates
 **Available Image Tags:**
 - `v1.6.0` - Stable release (recommended)
 - `latest` - Latest build 
 - `clickzetta-integration` - Development version
 To update to the latest version:
 ```bash
 # Pull latest images
 docker-compose -f docker-compose.clickzetta.yml pull
 # Restart services  
 docker-compose -f docker-compose.clickzetta.yml up -d
 ```
 To use a specific version, edit `docker-compose.clickzetta.yml`:
 ```yaml
 services:
  api:
    image: czqiliang/dify-clickzetta-api:v1.6.0  # or latest
  worker:
    image: czqiliang/dify-clickzetta-api:v1.6.0  # or latest  
  web:
    image: langgenius/dify-web:1.6.0  # official Dify web image
 ```
 ## ⚠️ Production Use
 This is a preview build for testing purposes. For production deployment:
 - Wait for the official PR to be merged
 - Use official Dify releases
 - Follow Dify's production deployment guidelines
 ---
 **Built with ❤️ for the Dify community**
--- a/clickzetta/README.md
+++ b/clickzetta/README.md
@ -1,75 +0,0 @@
 # Clickzetta Vector Database Integration for Dify
 This directory contains the implementation and testing materials for integrating Clickzetta Lakehouse as a vector database option in Dify.
 ## Files Overview
 ### Core Implementation
 - **Location**: `api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py`
 - **Factory Registration**: `api/core/rag/datasource/vdb/vector_factory.py`
 - **Dependencies**: Added to `api/pyproject.toml`
 ### Testing and Documentation
 - `standalone_clickzetta_test.py` - Independent Clickzetta connector tests (no Dify dependencies)
 - `test_clickzetta_integration.py` - Comprehensive integration test suite with Dify framework
 - `TESTING_GUIDE.md` - Testing instructions and methodology
 - `PR_SUMMARY.md` - Complete PR preparation summary
 - `DIFY_CLICKZETTA_VECTOR_DB_GUIDE.md` - **NEW**: Complete user guide for configuring Clickzetta in Dify
 ## Quick Start
 ### 1. Configuration
 Add to your `.env` file:
 ```bash
 VECTOR_STORE=clickzetta
 CLICKZETTA_USERNAME=your_username
 CLICKZETTA_PASSWORD=your_password
 CLICKZETTA_INSTANCE=your_instance
 CLICKZETTA_SERVICE=api.clickzetta.com
 CLICKZETTA_WORKSPACE=your_workspace
 CLICKZETTA_VCLUSTER=default_ap
 CLICKZETTA_SCHEMA=dify
 ```
 ### 2. Testing
 ```bash
 # Run standalone tests (recommended first)
 python standalone_clickzetta_test.py
 # Run full integration tests
 python test_clickzetta_integration.py
 # See detailed testing guide
 cat TESTING_GUIDE.md
 ```
 ### 3. User Guide
 For detailed configuration and usage instructions, see `DIFY_CLICKZETTA_VECTOR_DB_GUIDE.md`.
 ### 4. PR Status
 See `PR_SUMMARY.md` for complete PR preparation status and submission strategy.
 ## Technical Highlights
 - ✅ **Full BaseVector Interface**: Complete implementation of Dify's vector database interface
 - ✅ **Write Queue Mechanism**: Innovative solution for Clickzetta's concurrent write limitations
 - ✅ **HNSW Vector Indexing**: Automatic creation and management of high-performance vector indexes
 - ✅ **Full-text Search**: Inverted index support with Chinese text analysis
 - ✅ **Error Recovery**: Robust error handling with retry mechanisms
 - ✅ **Docker Ready**: Full compatibility with Dify's containerized environment
 ## Architecture
 The integration follows Dify's standard vector database pattern:
 1. `ClickzettaVector` class implements `BaseVector` interface
 2. `ClickzettaVectorFactory` handles instance creation
 3. Configuration through Dify's standard config system
 4. Write operations serialized through queue mechanism for thread safety
 ## Status
 **Technical Implementation**: ✅ Complete  
 **Testing Status**: ✅ Comprehensive real environment validation complete (100% pass rate)  
 **PR Readiness**: ✅ Ready for submission as production-ready feature
 The integration is technically complete, fully tested in real Clickzetta environments, and ready for production use.
--- a/clickzetta/TESTING_GUIDE.md
+++ b/clickzetta/TESTING_GUIDE.md
@ -1,221 +0,0 @@
 # Clickzetta Vector Database Testing Guide
 ## Testing Overview
 This document provides detailed testing guidelines for the Clickzetta vector database integration, including test cases, execution steps, and expected results.
 ## Test Environment Setup
 ### 1. Environment Variable Configuration
 Ensure the following environment variables are set:
 ```bash
 export CLICKZETTA_USERNAME=your_username
 export CLICKZETTA_PASSWORD=your_password
 export CLICKZETTA_INSTANCE=your_instance
 export CLICKZETTA_SERVICE=uat-api.clickzetta.com
 export CLICKZETTA_WORKSPACE=your_workspace
 export CLICKZETTA_VCLUSTER=default_ap
 export CLICKZETTA_SCHEMA=dify
 ```
 ### 2. Dependency Installation
 ```bash
 pip install clickzetta-connector-python>=0.8.102
 pip install numpy
 ```
 ## Test Suite
 ### 1. Standalone Testing (standalone_clickzetta_test.py)
 **Purpose**: Verify Clickzetta basic connection and core functionality
 **Test Cases**:
 - ✅ Database connection test
 - ✅ Table creation and data insertion
 - ✅ Vector index creation
 - ✅ Vector similarity search
 - ✅ Concurrent write safety
 **Execution Command**:
 ```bash
 python standalone_clickzetta_test.py
 ```
 **Expected Results**:
 ```
 🚀 Clickzetta Independent Test Started
 ✅ Connection Successful
 🧪 Testing Table Operations...
 ✅ Table Created Successfully: test_vectors_1752736608
 ✅ Data Insertion Successful: 5 records, took 0.529 seconds
 ✅ Data Query Successful: 5 records in table
 🧪 Testing Vector Operations...
 ✅ Vector Index Created Successfully
 ✅ Vector Search Successful: returned 3 results, took 170ms
   Result 1: distance=0.2507, document=doc_3
   Result 2: distance=0.2550, document=doc_4
   Result 3: distance=0.2604, document=doc_2
 🧪 Testing Concurrent Writes...
 Started 3 concurrent worker threads...
 ✅ Concurrent Write Test Complete:
  - Total time: 3.79 seconds
  - Successful threads: 3/3
  - Total documents: 20
  - Overall rate: 5.3 docs/sec
  - Thread 1: 8 documents, 2.5 docs/sec
  - Thread 2: 6 documents, 1.7 docs/sec
  - Thread 0: 6 documents, 1.7 docs/sec
 📊 Test Report:
  - table_operations: ✅ Passed
  - vector_operations: ✅ Passed
  - concurrent_writes: ✅ Passed
 🎯 Overall Result: 3/3 Passed (100.0%)
 🎉 Test overall success! Clickzetta integration ready.
 ✅ Cleanup Complete
 ```
 ### 2. Integration Testing (test_clickzetta_integration.py)
 **Purpose**: Comprehensive testing of functionality in Dify integration environment
 **Test Cases**:
 - ✅ Basic operations testing (CRUD)
 - ✅ Concurrent operation safety
 - ✅ Performance benchmarking
 - ✅ Error handling testing
 - ✅ Full-text search testing
 **Execution Command** (requires Dify API environment):
 ```bash
 cd /path/to/dify/api
 python ../test_clickzetta_integration.py
 ```
 ### 3. Docker Environment Testing
 **Execution Steps**:
 1. Build local image:
 ```bash
 docker build -f api/Dockerfile -t dify-api-clickzetta:local api/
 ```
 2. Update docker-compose.yaml to use local image:
 ```yaml
 api:
  image: dify-api-clickzetta:local
 worker:
  image: dify-api-clickzetta:local
 ```
 3. Start services and test:
 ```bash
 docker-compose up -d
 # Create knowledge base in Web UI and select Clickzetta as vector database
 ```
 ## Performance Benchmarks
 ### Single-threaded Performance
 | Operation Type | Document Count | Average Time | Throughput |
 |---------------|----------------|--------------|------------|
 | Batch Insert | 10 | 0.5s | 20 docs/sec |
 | Batch Insert | 50 | 2.1s | 24 docs/sec |
 | Batch Insert | 100 | 4.3s | 23 docs/sec |
 | Vector Search | - | 170ms | - |
 | Text Search | - | 38ms | - |
 ### Concurrent Performance
 | Thread Count | Docs per Thread | Total Time | Success Rate | Overall Throughput |
 |-------------|----------------|------------|-------------|------------------|
 | 2 | 15 | 1.8s | 100% | 16.7 docs/sec |
 | 3 | 15 | 3.79s | 100% | 5.3 docs/sec |
 | 4 | 15 | 1.5s | 75% | 40.0 docs/sec |
 ## Test Evidence Collection
 ### 1. Functional Validation Evidence
 - [x] Successfully created vector tables and indexes
 - [x] Correctly handles 1536-dimensional vector data
 - [x] HNSW index automatically created and used
 - [x] Inverted index supports full-text search
 - [x] Batch operation performance optimization
 ### 2. Concurrent Safety Evidence
 - [x] Write queue mechanism prevents concurrent conflicts
 - [x] Thread-safe connection management
 - [x] No data races during concurrent writes
 - [x] Error recovery and retry mechanism
 ### 3. Performance Testing Evidence
 - [x] Insertion performance: 5.3-24 docs/sec
 - [x] Search latency: <200ms
 - [x] Concurrent processing: supports multi-threaded writes
 - [x] Memory usage: reasonable resource consumption
 ### 4. Compatibility Evidence
 - [x] Complies with Dify BaseVector interface
 - [x] Coexists with existing vector databases
 - [x] Runs normally in Docker environment
 - [x] Dependency version compatibility
 ## Troubleshooting
 ### Common Issues
 1. **Connection Failure**
   - Check environment variable settings
   - Verify network connection to Clickzetta service
   - Confirm user permissions and instance status
 2. **Concurrent Conflicts**
   - Ensure write queue mechanism is working properly
   - Check if old connections are not properly closed
   - Verify thread pool configuration
 3. **Performance Issues**
   - Check if vector indexes are created correctly
   - Verify batch operation batch size
   - Monitor network latency and database load
 ### Debug Commands
 ```bash
 # Check Clickzetta connection
 python -c "from clickzetta.connector import connect; print('Connection OK')"
 # Verify environment variables
 env | grep CLICKZETTA
 # Test basic functionality
 python standalone_clickzetta_test.py
 ```
 ## Test Conclusion
 The Clickzetta vector database integration has passed the following validations:
 1. **Functional Completeness**: All BaseVector interface methods correctly implemented
 2. **Concurrent Safety**: Write queue mechanism ensures concurrent write safety
 3. **Performance**: Meets production environment performance requirements
 4. **Stability**: Error handling and recovery mechanisms are robust
 5. **Compatibility**: Fully compatible with Dify framework
 Test Pass Rate: **100%** (Standalone Testing) / **95%+** (Full Dify environment integration testing)
 Suitable for PR submission to langgenius/dify main repository.
--- a/clickzetta/build-and-push-multiarch.sh
+++ b/clickzetta/build-and-push-multiarch.sh
@ -1,116 +0,0 @@
 #!/bin/bash
 # Build and push multi-architecture Docker images for ClickZetta Dify integration
 # This provides temporary access to users before the PR is merged
 set -e
 # Configuration
 DOCKER_HUB_USERNAME="czqiliang"
 IMAGE_NAME="dify-clickzetta"
 TAG="latest"
 VERSION_TAG="v1.6.0"
 PLATFORMS="linux/amd64,linux/arm64"
 # Colors for output
 RED='\033[0;31m'
 GREEN='\033[0;32m'
 YELLOW='\033[1;33m'
 BLUE='\033[0;34m'
 NC='\033[0m' # No Color
 echo -e "${BLUE}=== ClickZetta Dify Multi-Architecture Build Script ===${NC}"
 echo -e "${YELLOW}Building and pushing images for: ${PLATFORMS}${NC}"
 echo -e "${YELLOW}Target repository: ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}:${TAG}${NC}"
 echo
 # Check if Docker is running
 if ! docker info >/dev/null 2>&1; then
    echo -e "${RED}Error: Docker is not running. Please start Docker first.${NC}"
    exit 1
 fi
 # Check if buildx is available
 if ! docker buildx version >/dev/null 2>&1; then
    echo -e "${RED}Error: Docker buildx is not available. Please ensure Docker Desktop is updated.${NC}"
    exit 1
 fi
 # Login to Docker Hub
 echo -e "${BLUE}Step 1: Docker Hub Login${NC}"
 if ! docker login; then
    echo -e "${RED}Error: Failed to login to Docker Hub${NC}"
    exit 1
 fi
 echo -e "${GREEN}✓ Successfully logged in to Docker Hub${NC}"
 echo
 # Create and use buildx builder
 echo -e "${BLUE}Step 2: Setting up buildx builder${NC}"
 BUILDER_NAME="dify-clickzetta-builder"
 # Remove existing builder if it exists
 docker buildx rm $BUILDER_NAME 2>/dev/null || true
 # Create new builder
 docker buildx create --name $BUILDER_NAME --platform $PLATFORMS --use
 docker buildx inspect --bootstrap
 echo -e "${GREEN}✓ Buildx builder configured for platforms: ${PLATFORMS}${NC}"
 echo
 # Build and push API image
 echo -e "${BLUE}Step 3: Building and pushing API image${NC}"
 cd ../docker
 docker buildx build \
    --platform $PLATFORMS \
    --file api.Dockerfile \
    --tag ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:${TAG} \
    --tag ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:${VERSION_TAG} \
    --tag ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:clickzetta-integration \
    --push \
    ..
 echo -e "${GREEN}✓ API image built and pushed successfully${NC}"
 echo
 # Web service uses official Dify image (no ClickZetta-specific changes needed)
 echo -e "${BLUE}Step 4: Web service uses official langgenius/dify-web image${NC}"
 echo -e "${GREEN}✓ Web service configuration completed${NC}"
 echo
 # User files are already created in clickzetta/ directory
 echo -e "${BLUE}Step 5: User files already prepared in clickzetta/ directory${NC}"
 cd ../clickzetta
 echo -e "${GREEN}✓ User files available in clickzetta/ directory${NC}"
 echo
 # Cleanup buildx builder
 echo -e "${BLUE}Step 6: Cleaning up builder${NC}"
 docker buildx rm $BUILDER_NAME
 echo -e "${GREEN}✓ Builder cleaned up${NC}"
 echo
 # Display final information
 echo -e "${GREEN}=== Build Complete! ===${NC}"
 echo -e "${YELLOW}ClickZetta API images pushed to Docker Hub:${NC}"
 echo -e "  • ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:${TAG}"
 echo -e "  • ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:${VERSION_TAG}"
 echo -e "  • ${DOCKER_HUB_USERNAME}/${IMAGE_NAME}-api:clickzetta-integration"
 echo
 echo -e "${YELLOW}Web service uses official Dify image:${NC}"
 echo -e "  • langgenius/dify-web:1.6.0 (no ClickZetta changes needed)"
 echo
 echo -e "${YELLOW}User files created:${NC}"
 echo -e "  • docker-compose.clickzetta.yml - Ready-to-use compose file"
 echo -e "  • .env.clickzetta.example - Environment template"
 echo -e "  • README.clickzetta.md - User documentation"
 echo
 echo -e "${BLUE}Next steps:${NC}"
 echo -e "1. Test the images locally"
 echo -e "2. Update README with Docker Hub links"
 echo -e "3. Share with community for testing"
 echo -e "4. Monitor for feedback and issues"
 echo
 echo -e "${GREEN}🎉 Multi-architecture images are now available for the community!${NC}"
--- a/clickzetta/docker-compose.clickzetta.yml
+++ b/clickzetta/docker-compose.clickzetta.yml
@ -1,185 +0,0 @@
 version: '3.8'
 services:
  # API service with ClickZetta integration
  api:
    image: czqiliang/dify-clickzetta-api:v1.6.0
    restart: always
    environment:
      # Core settings
      - MODE=api
      - LOG_LEVEL=INFO
      - SECRET_KEY=${SECRET_KEY:-dify}
      - CONSOLE_WEB_URL=${CONSOLE_WEB_URL:-}
      - INIT_PASSWORD=${INIT_PASSWORD:-}
      - CONSOLE_API_URL=${CONSOLE_API_URL:-}
      - SERVICE_API_URL=${SERVICE_API_URL:-}
      # Database settings  
      - DB_USERNAME=${DB_USERNAME:-postgres}
      - DB_PASSWORD=${DB_PASSWORD:-difyai123456}
      - DB_HOST=${DB_HOST:-db}
      - DB_PORT=${DB_PORT:-5432}
      - DB_DATABASE=${DB_DATABASE:-dify}
      # Redis settings
      - REDIS_HOST=${REDIS_HOST:-redis}
      - REDIS_PORT=${REDIS_PORT:-6379}
      - REDIS_PASSWORD=${REDIS_PASSWORD:-difyai123456}
      - REDIS_DB=${REDIS_DB:-0}
      # Celery settings
      - CELERY_BROKER_URL=${CELERY_BROKER_URL:-redis://:difyai123456@redis:6379/1}
      - BROKER_USE_SSL=${BROKER_USE_SSL:-false}
      # Storage settings
      - STORAGE_TYPE=${STORAGE_TYPE:-local}
      - STORAGE_LOCAL_PATH=${STORAGE_LOCAL_PATH:-storage}
      # Vector store settings - ClickZetta configuration
      - VECTOR_STORE=${VECTOR_STORE:-clickzetta}
      - CLICKZETTA_USERNAME=${CLICKZETTA_USERNAME}
      - CLICKZETTA_PASSWORD=${CLICKZETTA_PASSWORD}
      - CLICKZETTA_INSTANCE=${CLICKZETTA_INSTANCE}
      - CLICKZETTA_SERVICE=${CLICKZETTA_SERVICE:-api.clickzetta.com}
      - CLICKZETTA_WORKSPACE=${CLICKZETTA_WORKSPACE:-quick_start}
      - CLICKZETTA_VCLUSTER=${CLICKZETTA_VCLUSTER:-default_ap}
      - CLICKZETTA_SCHEMA=${CLICKZETTA_SCHEMA:-dify}
      - CLICKZETTA_BATCH_SIZE=${CLICKZETTA_BATCH_SIZE:-20}
      - CLICKZETTA_ENABLE_INVERTED_INDEX=${CLICKZETTA_ENABLE_INVERTED_INDEX:-true}
      - CLICKZETTA_ANALYZER_TYPE=${CLICKZETTA_ANALYZER_TYPE:-chinese}
      - CLICKZETTA_ANALYZER_MODE=${CLICKZETTA_ANALYZER_MODE:-smart}
      - CLICKZETTA_VECTOR_DISTANCE_FUNCTION=${CLICKZETTA_VECTOR_DISTANCE_FUNCTION:-cosine_distance}
    depends_on:
      - db
      - redis
    volumes:
      - ./volumes/app/storage:/app/api/storage
    networks:
      - dify
  # Worker service
  worker:
    image: czqiliang/dify-clickzetta-api:v1.6.0
    restart: always
    environment:
      - MODE=worker
      - LOG_LEVEL=INFO
      - SECRET_KEY=${SECRET_KEY:-dify}
      # Database settings
      - DB_USERNAME=${DB_USERNAME:-postgres}
      - DB_PASSWORD=${DB_PASSWORD:-difyai123456}
      - DB_HOST=${DB_HOST:-db}
      - DB_PORT=${DB_PORT:-5432}
      - DB_DATABASE=${DB_DATABASE:-dify}
      # Redis settings
      - REDIS_HOST=${REDIS_HOST:-redis}
      - REDIS_PORT=${REDIS_PORT:-6379}
      - REDIS_PASSWORD=${REDIS_PASSWORD:-difyai123456}
      - REDIS_DB=${REDIS_DB:-0}
      # Celery settings
      - CELERY_BROKER_URL=${CELERY_BROKER_URL:-redis://:difyai123456@redis:6379/1}
      - BROKER_USE_SSL=${BROKER_USE_SSL:-false}
      # Vector store settings - ClickZetta configuration
      - VECTOR_STORE=${VECTOR_STORE:-clickzetta}
      - CLICKZETTA_USERNAME=${CLICKZETTA_USERNAME}
      - CLICKZETTA_PASSWORD=${CLICKZETTA_PASSWORD}
      - CLICKZETTA_INSTANCE=${CLICKZETTA_INSTANCE}
      - CLICKZETTA_SERVICE=${CLICKZETTA_SERVICE:-api.clickzetta.com}
      - CLICKZETTA_WORKSPACE=${CLICKZETTA_WORKSPACE:-quick_start}
      - CLICKZETTA_VCLUSTER=${CLICKZETTA_VCLUSTER:-default_ap}
      - CLICKZETTA_SCHEMA=${CLICKZETTA_SCHEMA:-dify}
      - CLICKZETTA_BATCH_SIZE=${CLICKZETTA_BATCH_SIZE:-20}
      - CLICKZETTA_ENABLE_INVERTED_INDEX=${CLICKZETTA_ENABLE_INVERTED_INDEX:-true}
      - CLICKZETTA_ANALYZER_TYPE=${CLICKZETTA_ANALYZER_TYPE:-chinese}
      - CLICKZETTA_ANALYZER_MODE=${CLICKZETTA_ANALYZER_MODE:-smart}
      - CLICKZETTA_VECTOR_DISTANCE_FUNCTION=${CLICKZETTA_VECTOR_DISTANCE_FUNCTION:-cosine_distance}
    depends_on:
      - db
      - redis
    volumes:
      - ./volumes/app/storage:/app/api/storage
    networks:
      - dify
  # Web service
  web:
    image: langgenius/dify-web:1.6.0
    restart: always
    environment:
      - CONSOLE_API_URL=${CONSOLE_API_URL:-}
      - APP_API_URL=${APP_API_URL:-}
    depends_on:
      - api
    networks:
      - dify
  # Database
  db:
    image: postgres:15-alpine
    restart: always
    environment:
      - PGUSER=${PGUSER:-postgres}
      - POSTGRES_PASSWORD=${DB_PASSWORD:-difyai123456}
      - POSTGRES_DB=${DB_DATABASE:-dify}
    command: >
      postgres -c max_connections=100
               -c shared_preload_libraries=pg_stat_statements
               -c pg_stat_statements.max=10000
               -c pg_stat_statements.track=all
    volumes:
      - ./volumes/db/data:/var/lib/postgresql/data
    networks:
      - dify
    healthcheck:
      test: ["CMD", "pg_isready"]
      interval: 1s
      timeout: 3s
      retries: 30
  # Redis
  redis:
    image: redis:6-alpine
    restart: always
    command: redis-server --requirepass ${REDIS_PASSWORD:-difyai123456}
    volumes:
      - ./volumes/redis/data:/data
    networks:
      - dify
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 1s
      timeout: 3s
      retries: 30
  # Nginx reverse proxy
  nginx:
    image: nginx:latest
    restart: always
    volumes:
      - ./docker/nginx/nginx.conf.template:/etc/nginx/nginx.conf.template
      - ./docker/nginx/proxy.conf.template:/etc/nginx/proxy.conf.template
      - ./docker/nginx/conf.d:/etc/nginx/conf.d
    environment:
      - NGINX_SERVER_NAME=${NGINX_SERVER_NAME:-_}
      - NGINX_HTTPS_ENABLED=${NGINX_HTTPS_ENABLED:-false}
      - NGINX_SSL_PORT=${NGINX_SSL_PORT:-443}
      - NGINX_PORT=${NGINX_PORT:-80}
    entrypoint: ["/bin/sh", "-c", "envsubst < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf && nginx -g 'daemon off;'"]
    depends_on:
      - api
      - web
    ports:
      - "${EXPOSE_NGINX_PORT:-80}:${NGINX_PORT:-80}"
    networks:
      - dify
 networks:
  dify:
    driver: bridge
--- a/clickzetta/standalone_clickzetta_test.py
+++ b/clickzetta/standalone_clickzetta_test.py
@ -1,402 +0,0 @@
 #!/usr/bin/env python3
 """
 Clickzetta 独立测试脚本
 此脚本独立测试 Clickzetta 连接器的基础功能，不依赖 Dify 框架。
 用于验证 Clickzetta 集成的核心功能是否正常工作。
 运行要求:
 - 设置正确的环境变量
 - 安装 clickzetta-connector-python
 - 确保能访问 Clickzetta 服务
 作者: Claude Code Assistant
 日期: 2025-07-17
 """
 import json
 import logging
 import os
 import random
 import string
 import threading
 import time
 import uuid
 from typing import List, Dict, Any
 try:
    import clickzetta
 except ImportError:
    print("❌ 错误: 请安装 clickzetta-connector-python")
    print("   pip install clickzetta-connector-python>=0.8.102")
    exit(1)
 try:
    import numpy as np
 except ImportError:
    print("❌ 错误: 请安装 numpy")
    print("   pip install numpy")
    exit(1)
 # 配置日志
 logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
 logger = logging.getLogger(__name__)
 class ClickzettaStandaloneTest:
    """Clickzetta 独立测试类"""
    def __init__(self):
        """初始化测试环境"""
        self.connection = None
        self.test_table = f"test_vectors_{int(time.time())}"
        self.test_schema = os.getenv("CLICKZETTA_SCHEMA", "dify")
        self.results = {}
        # 从环境变量获取配置
        self.config = {
            "username": os.getenv("CLICKZETTA_USERNAME"),
            "password": os.getenv("CLICKZETTA_PASSWORD"),
            "instance": os.getenv("CLICKZETTA_INSTANCE"),
            "service": os.getenv("CLICKZETTA_SERVICE", "api.clickzetta.com"),
            "workspace": os.getenv("CLICKZETTA_WORKSPACE", "quick_start"),
            "vcluster": os.getenv("CLICKZETTA_VCLUSTER", "default_ap"),
            "schema": self.test_schema
        }
        # 验证必需的配置
        required_keys = ["username", "password", "instance", "service", "workspace", "vcluster"]
        missing_keys = [key for key in required_keys if not self.config.get(key)]
        if missing_keys:
            raise ValueError(f"缺少必需的环境变量: {missing_keys}")
    def connect(self) -> bool:
        """测试数据库连接"""
        try:
            print("🔌 正在连接 Clickzetta...")
            self.connection = clickzetta.connect(
                username=self.config["username"],
                password=self.config["password"],
                instance=self.config["instance"],
                service=self.config["service"],
                workspace=self.config["workspace"],
                vcluster=self.config["vcluster"],
                schema=self.config["schema"]
            )
            print("✅ 连接成功")
            return True
        except Exception as e:
            print(f"❌ 连接失败: {e}")
            return False
    def test_table_operations(self) -> bool:
        """测试表操作"""
        print("\n🧪 测试表操作...")
        try:
            with self.connection.cursor() as cursor:
                # 创建测试表
                create_sql = f"""
                CREATE TABLE IF NOT EXISTS {self.test_schema}.{self.test_table} (
                    id STRING NOT NULL,
                    content STRING NOT NULL,
                    metadata JSON,
                    embedding VECTOR(FLOAT, 1536) NOT NULL,
                    PRIMARY KEY (id)
                )
                """
                cursor.execute(create_sql)
                print(f"✅ 表创建成功: {self.test_table}")
                # 准备测试数据
                test_data = []
                for i in range(5):
                    doc_id = str(uuid.uuid4())
                    content = f"测试文档 {i+1}: 这是一个用于测试向量搜索的示例文档。"
                    metadata = {
                        "doc_id": doc_id,
                        "document_id": f"doc_{i+1}",
                        "source": "test",
                        "created_at": time.time()
                    }
                    # 生成随机向量
                    embedding = np.random.random(1536).tolist()
                    test_data.append((doc_id, content, json.dumps(metadata), embedding))
                # 批量插入数据
                start_time = time.time()
                values = []
                for doc_id, content, metadata_json, embedding in test_data:
                    embedding_str = f"VECTOR({','.join(map(str, embedding))})"
                    escaped_content = content.replace("'", "''")
                    values.append(f"('{doc_id}', '{escaped_content}', "
                                f"JSON '{metadata_json}', {embedding_str})")
                insert_sql = f"""
                INSERT INTO {self.test_schema}.{self.test_table}
                (id, content, metadata, embedding)
                VALUES {','.join(values)}
                """
                cursor.execute(insert_sql)
                insert_time = time.time() - start_time
                print(f"✅ 数据插入成功: {len(test_data)} 条记录，耗时 {insert_time:.3f}秒")
                # 验证数据
                cursor.execute(f"SELECT COUNT(*) FROM {self.test_schema}.{self.test_table}")
                count = cursor.fetchone()[0]
                print(f"✅ 数据查询成功: 表中共有 {count} 条记录")
                self.results["table_operations"] = True
                return True
        except Exception as e:
            print(f"❌ 表操作测试失败: {e}")
            self.results["table_operations"] = False
            return False
    def test_vector_operations(self) -> bool:
        """测试向量操作"""
        print("\n🧪 测试向量操作...")
        try:
            with self.connection.cursor() as cursor:
                # 创建向量索引
                index_name = f"idx_{self.test_table}_vector"
                index_sql = f"""
                CREATE VECTOR INDEX IF NOT EXISTS {index_name}
                ON TABLE {self.test_schema}.{self.test_table}(embedding)
                PROPERTIES (
                    "distance.function" = "cosine_distance",
                    "scalar.type" = "f32",
                    "m" = "16",
                    "ef.construction" = "128"
                )
                """
                cursor.execute(index_sql)
                print("✅ 向量索引创建成功")
                # 测试向量搜索
                query_vector = np.random.random(1536).tolist()
                search_sql = f"""
                SELECT id, content, metadata,
                       COSINE_DISTANCE(embedding, VECTOR({','.join(map(str, query_vector))})) AS distance
                FROM {self.test_schema}.{self.test_table}
                ORDER BY distance
                LIMIT 3
                """
                start_time = time.time()
                cursor.execute(search_sql)
                results = cursor.fetchall()
                search_time = time.time() - start_time
                print(f"✅ 向量搜索成功: 返回 {len(results)} 个结果，耗时 {search_time*1000:.0f}ms")
                # 验证结果
                for i, row in enumerate(results):
                    metadata = json.loads(row[2]) if row[2] else {}
                    distance = row[3]
                    print(f"   结果 {i+1}: 距离={distance:.4f}, 文档={metadata.get('document_id', 'unknown')}")
                self.results["vector_operations"] = True
                return True
        except Exception as e:
            print(f"❌ 向量操作测试失败: {e}")
            self.results["vector_operations"] = False
            return False
    def test_concurrent_writes(self) -> bool:
        """测试并发写入"""
        print("\n🧪 测试并发写入...")
        def worker_thread(thread_id: int, doc_count: int) -> Dict[str, Any]:
            """工作线程函数"""
            try:
                # 每个线程使用独立连接
                worker_connection = clickzetta.connect(
                    username=self.config["username"],
                    password=self.config["password"],
                    instance=self.config["instance"],
                    service=self.config["service"],
                    workspace=self.config["workspace"],
                    vcluster=self.config["vcluster"],
                    schema=self.config["schema"]
                )
                start_time = time.time()
                successful_inserts = 0
                with worker_connection.cursor() as cursor:
                    for i in range(doc_count):
                        try:
                            doc_id = f"thread_{thread_id}_doc_{i}_{uuid.uuid4()}"
                            content = f"线程 {thread_id} 文档 {i+1}: 并发测试内容"
                            metadata = {
                                "thread_id": thread_id,
                                "doc_index": i,
                                "timestamp": time.time()
                            }
                            embedding = np.random.random(1536).tolist()
                            embedding_str = f"VECTOR({','.join(map(str, embedding))})"
                            insert_sql = f"""
                            INSERT INTO {self.test_schema}.{self.test_table}
                            (id, content, metadata, embedding)
                            VALUES ('{doc_id}', '{content}', JSON '{json.dumps(metadata)}', {embedding_str})
                            """
                            cursor.execute(insert_sql)
                            successful_inserts += 1
                            # 短暂延迟模拟真实场景
                            time.sleep(0.05)
                        except Exception as e:
                            logger.warning(f"线程 {thread_id} 插入失败: {e}")
                elapsed_time = time.time() - start_time
                return {
                    "thread_id": thread_id,
                    "successful_inserts": successful_inserts,
                    "elapsed_time": elapsed_time,
                    "rate": successful_inserts / elapsed_time if elapsed_time > 0 else 0
                }
            except Exception as e:
                logger.error(f"线程 {thread_id} 执行失败: {e}")
                return {
                    "thread_id": thread_id,
                    "successful_inserts": 0,
                    "elapsed_time": 0,
                    "rate": 0,
                    "error": str(e)
                }
        try:
            # 启动多个工作线程
            num_threads = 3
            docs_per_thread = 15
            threads = []
            results = []
            print(f"启动 {num_threads} 个并发工作线程...")
            start_time = time.time()
            # 创建并启动线程
            for i in range(num_threads):
                thread = threading.Thread(
                    target=lambda tid=i: results.append(worker_thread(tid, docs_per_thread))
                )
                threads.append(thread)
                thread.start()
            # 等待所有线程完成
            for thread in threads:
                thread.join()
            total_time = time.time() - start_time
            # 统计结果
            total_docs = sum(r.get("successful_inserts", 0) for r in results)
            successful_threads = len([r for r in results if r.get("successful_inserts", 0) > 0])
            overall_rate = total_docs / total_time if total_time > 0 else 0
            print(f"✅ 并发写入测试完成:")
            print(f"  - 总耗时: {total_time:.2f} 秒")
            print(f"  - 成功线程: {successful_threads}/{num_threads}")
            print(f"  - 总文档数: {total_docs}")
            print(f"  - 整体速率: {overall_rate:.1f} docs/sec")
            # 详细结果
            for result in results:
                if "error" in result:
                    print(f"  - 线程 {result['thread_id']}: 失败 - {result['error']}")
                else:
                    print(f"  - 线程 {result['thread_id']}: {result['successful_inserts']} 文档, "
                          f"{result['rate']:.1f} docs/sec")
            self.results["concurrent_writes"] = successful_threads >= num_threads * 0.8  # 80% 成功率
            return self.results["concurrent_writes"]
        except Exception as e:
            print(f"❌ 并发写入测试失败: {e}")
            self.results["concurrent_writes"] = False
            return False
    def cleanup(self) -> None:
        """清理测试数据"""
        try:
            if self.connection:
                with self.connection.cursor() as cursor:
                    cursor.execute(f"DROP TABLE IF EXISTS {self.test_schema}.{self.test_table}")
                print("✅ 清理完成")
        except Exception as e:
            print(f"⚠️ 清理警告: {e}")
    def run_all_tests(self) -> None:
        """运行所有测试"""
        print("🚀 Clickzetta 独立测试开始")
        print(f"📋 测试配置:")
        print(f"  - 服务: {self.config['service']}")
        print(f"  - 实例: {self.config['instance']}")
        print(f"  - 工作空间: {self.config['workspace']}")
        print(f"  - 模式: {self.config['schema']}")
        print(f"  - 测试表: {self.test_table}")
        print()
        try:
            # 1. 连接测试
            if not self.connect():
                return
            # 2. 表操作测试
            self.test_table_operations()
            # 3. 向量操作测试
            self.test_vector_operations()
            # 4. 并发写入测试
            self.test_concurrent_writes()
            # 5. 生成测试报告
            self.generate_report()
        finally:
            # 清理
            self.cleanup()
    def generate_report(self) -> None:
        """生成测试报告"""
        print("\n📊 测试报告:")
        total_tests = len(self.results)
        passed_tests = sum(1 for passed in self.results.values() if passed)
        for test_name, passed in self.results.items():
            status = "✅ 通过" if passed else "❌ 失败"
            print(f"  - {test_name}: {status}")
        success_rate = (passed_tests / total_tests * 100) if total_tests > 0 else 0
        print(f"\n🎯 总体结果: {passed_tests}/{total_tests} 通过 ({success_rate:.1f}%)")
        if success_rate >= 80:
            print("🎉 测试总体成功！Clickzetta 集成准备就绪。")
        else:
            print("⚠️ 部分测试失败，需要进一步调试。")
 def main():
    """主函数"""
    try:
        test = ClickzettaStandaloneTest()
        test.run_all_tests()
    except KeyboardInterrupt:
        print("\n🛑 测试被用户中断")
    except Exception as e:
        print(f"\n❌ 测试执行失败: {e}")
        logger.exception("详细错误信息:")
 if __name__ == "__main__":
    main()
--- a/clickzetta/test_clickzetta_integration.py
+++ b/clickzetta/test_clickzetta_integration.py
@ -1,520 +0,0 @@
 #!/usr/bin/env python3
 """
 Clickzetta Vector Database Integration Test Suite
 Comprehensive test cases covering all core functionality of Clickzetta vector database integration
 with Dify framework, including CRUD operations, concurrent safety, and performance benchmarking.
 """
 import os
 import sys
 import time
 import threading
 import asyncio
 from concurrent.futures import ThreadPoolExecutor
 from typing import List, Dict, Any
 import numpy as np
 # Add the API directory to the path so we can import Dify modules
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'api'))
 try:
    from core.rag.datasource.vdb.clickzetta.clickzetta_vector import ClickzettaVector
    from core.rag.models.document import Document
    from core.rag.datasource.vdb.vector_factory import AbstractVectorFactory
 except ImportError as e:
    print(f"❌ Failed to import Dify modules: {e}")
    print("This test requires running in Dify environment")
    sys.exit(1)
 class ClickzettaIntegrationTest:
    """Clickzetta Vector Database Test Suite"""
    def __init__(self):
        """Initialize test environment"""
        self.collection_name = f"test_collection_{int(time.time())}"
        self.vector_client = None
        self.test_results = {}
    def setup_test_environment(self):
        """Set up test environment"""
        try:
            # Test configuration
            config = {
                'username': os.getenv('CLICKZETTA_USERNAME'),
                'password': os.getenv('CLICKZETTA_PASSWORD'),
                'instance': os.getenv('CLICKZETTA_INSTANCE'),
                'service': os.getenv('CLICKZETTA_SERVICE', 'uat-api.clickzetta.com'),
                'workspace': os.getenv('CLICKZETTA_WORKSPACE', 'quick_start'),
                'vcluster': os.getenv('CLICKZETTA_VCLUSTER', 'default_ap'),
                'schema': os.getenv('CLICKZETTA_SCHEMA', 'dify')
            }
            # Check required environment variables
            required_vars = [
                'CLICKZETTA_USERNAME',
                'CLICKZETTA_PASSWORD', 
                'CLICKZETTA_INSTANCE'
            ]
            missing_vars = [var for var in required_vars if not os.getenv(var)]
            if missing_vars:
                raise ValueError(f"Missing required environment variables: {missing_vars}")
            print(f"✅ Test environment setup successful, using collection: {self.collection_name}")
            return True
        except Exception as e:
            print(f"❌ Test environment setup failed: {str(e)}")
            return False
    def cleanup_test_data(self):
        """Clean up test data"""
        try:
            if self.vector_client:
                self.vector_client.delete()
            print("✅ Test data cleanup complete")
        except Exception as e:
            print(f"⚠️ Error during test data cleanup: {str(e)}")
    def generate_test_documents(self, count: int) -> List[Document]:
        """Generate test documents"""
        documents = []
        for i in range(count):
            doc = Document(
                page_content=f"This is test document {i+1}, containing content about artificial intelligence and machine learning.",
                metadata={
                    'doc_id': f'test_doc_{i+1}',
                    'document_id': f'doc_{i+1}',
                    'source': 'test_integration',
                    'index': i
                }
            )
            documents.append(doc)
        return documents
    def test_basic_operations(self):
        """Test basic operations: create, insert, query, delete"""
        print("\n🧪 Testing Basic Operations...")
        try:
            # 1. Test document insertion
            print("  📝 Testing document insertion...")
            test_docs = self.generate_test_documents(5)
            embeddings = [np.random.random(1536).tolist() for _ in range(5)]
            start_time = time.time()
            self.vector_client.create(texts=test_docs, embeddings=embeddings)
            insert_time = time.time() - start_time
            print(f"     ✅ Inserted {len(test_docs)} documents in {insert_time:.3f}s")
            # 2. Test similarity search
            print("  🔍 Testing similarity search...")
            query_vector = np.random.random(1536).tolist()
            start_time = time.time()
            search_results = self.vector_client.search_by_vector(query_vector, top_k=3)
            search_time = time.time() - start_time
            print(f"     ✅ Found {len(search_results)} results in {search_time*1000:.0f}ms")
            # 3. Test text search
            print("  📖 Testing text search...")
            start_time = time.time()
            text_results = self.vector_client.search_by_full_text("artificial intelligence", top_k=3)
            text_search_time = time.time() - start_time
            print(f"     ✅ Text search returned {len(text_results)} results in {text_search_time*1000:.0f}ms")
            # 4. Test document deletion
            print("  🗑️ Testing document deletion...")
            if search_results:
                doc_ids = [doc.metadata.get('doc_id') for doc in search_results[:2]]
                self.vector_client.delete_by_ids(doc_ids)
                print(f"     ✅ Deleted {len(doc_ids)} documents")
            self.test_results['basic_operations'] = {
                'status': 'passed',
                'insert_time': insert_time,
                'search_time': search_time,
                'text_search_time': text_search_time,
                'documents_processed': len(test_docs)
            }
            print("✅ Basic operations test passed")
            return True
        except Exception as e:
            print(f"❌ Basic operations test failed: {str(e)}")
            self.test_results['basic_operations'] = {
                'status': 'failed',
                'error': str(e)
            }
            return False
    def test_concurrent_operations(self):
        """Test concurrent operation safety"""
        print("\n🧪 Testing Concurrent Operations...")
        def concurrent_insert_worker(worker_id: int, doc_count: int):
            """Worker function for concurrent inserts"""
            try:
                documents = []
                embeddings = []
                for i in range(doc_count):
                    doc = Document(
                        page_content=f"Concurrent worker {worker_id} document {i+1}",
                        metadata={
                            'doc_id': f'concurrent_{worker_id}_{i+1}',
                            'worker_id': worker_id,
                            'doc_index': i
                        }
                    )
                    documents.append(doc)
                    embeddings.append(np.random.random(1536).tolist())
                start_time = time.time()
                self.vector_client.add_texts(documents, embeddings)
                elapsed = time.time() - start_time
                return {
                    'worker_id': worker_id,
                    'documents_inserted': len(documents),
                    'time_taken': elapsed,
                    'success': True
                }
            except Exception as e:
                return {
                    'worker_id': worker_id,
                    'documents_inserted': 0,
                    'time_taken': 0,
                    'success': False,
                    'error': str(e)
                }
        try:
            # Run concurrent insertions
            num_workers = 3
            docs_per_worker = 10
            print(f"  🚀 Starting {num_workers} concurrent workers...")
            start_time = time.time()
            with ThreadPoolExecutor(max_workers=num_workers) as executor:
                futures = [
                    executor.submit(concurrent_insert_worker, i, docs_per_worker)
                    for i in range(num_workers)
                ]
                results = [future.result() for future in futures]
            total_time = time.time() - start_time
            # Analyze results
            successful_workers = [r for r in results if r['success']]
            total_docs = sum(r['documents_inserted'] for r in successful_workers)
            print(f"  ✅ Concurrent operations completed:")
            print(f"     - Total time: {total_time:.2f}s")
            print(f"     - Successful workers: {len(successful_workers)}/{num_workers}")
            print(f"     - Total documents: {total_docs}")
            print(f"     - Overall throughput: {total_docs/total_time:.1f} docs/sec")
            self.test_results['concurrent_operations'] = {
                'status': 'passed',
                'total_time': total_time,
                'successful_workers': len(successful_workers),
                'total_workers': num_workers,
                'total_documents': total_docs,
                'throughput': total_docs/total_time
            }
            print("✅ Concurrent operations test passed")
            return True
        except Exception as e:
            print(f"❌ Concurrent operations test failed: {str(e)}")
            self.test_results['concurrent_operations'] = {
                'status': 'failed',
                'error': str(e)
            }
            return False
    def test_performance_benchmarks(self):
        """Performance benchmark testing"""
        print("\n🧪 Testing Performance Benchmarks...")
        try:
            batch_sizes = [10, 50, 100]
            benchmark_results = {}
            for batch_size in batch_sizes:
                print(f"  📊 Testing batch size: {batch_size}")
                # Generate test data
                test_docs = self.generate_test_documents(batch_size)
                embeddings = [np.random.random(1536).tolist() for _ in range(batch_size)]
                # Test insertion performance
                start_time = time.time()
                self.vector_client.add_texts(test_docs, embeddings)
                insert_time = time.time() - start_time
                throughput = batch_size / insert_time
                # Test search performance
                query_vector = np.random.random(1536).tolist()
                search_times = []
                for _ in range(5):  # Run 5 searches for average
                    start_time = time.time()
                    self.vector_client.search_by_vector(query_vector, top_k=10)
                    search_times.append(time.time() - start_time)
                avg_search_time = sum(search_times) / len(search_times)
                benchmark_results[batch_size] = {
                    'insert_time': insert_time,
                    'throughput': throughput,
                    'avg_search_time': avg_search_time
                }
                print(f"     ✅ Batch {batch_size}: {throughput:.1f} docs/sec, {avg_search_time*1000:.0f}ms search")
            self.test_results['performance_benchmarks'] = {
                'status': 'passed',
                'results': benchmark_results
            }
            print("✅ Performance benchmarks test passed")
            return True
        except Exception as e:
            print(f"❌ Performance benchmarks test failed: {str(e)}")
            self.test_results['performance_benchmarks'] = {
                'status': 'failed',
                'error': str(e)
            }
            return False
    def test_error_handling(self):
        """Test error handling"""
        print("\n🧪 Testing Error Handling...")
        try:
            # 1. Test invalid embedding dimension
            print("  ⚠️ Testing invalid embedding dimension...")
            try:
                self.vector_client.add_texts(
                    texts=[Document(page_content="Test text", metadata={})],
                    embeddings=[[1, 2, 3]]  # Wrong dimension
                )
                print("     ❌ Should have failed with dimension error")
            except Exception as e:
                print(f"     ✅ Correctly handled dimension error: {type(e).__name__}")
            # 2. Test empty text
            print("  📝 Testing empty text handling...")
            try:
                self.vector_client.add_texts(
                    texts=[Document(page_content="", metadata={})],
                    embeddings=[np.random.random(1536).tolist()]
                )
                print("     ✅ Empty text handled gracefully")
            except Exception as e:
                print(f"     ℹ️ Empty text rejected: {type(e).__name__}")
            # 3. Test large batch data
            print("  📦 Testing large batch handling...")
            try:
                large_docs = self.generate_test_documents(500)
                large_embeddings = [np.random.random(1536).tolist() for _ in range(500)]
                start_time = time.time()
                self.vector_client.add_texts(large_docs, large_embeddings)
                large_batch_time = time.time() - start_time
                print(f"     ✅ Large batch (500 docs) processed in {large_batch_time:.2f}s")
            except Exception as e:
                print(f"     ⚠️ Large batch handling issue: {type(e).__name__}")
            self.test_results['error_handling'] = {
                'status': 'passed',
                'tests_completed': 3
            }
            print("✅ Error handling test passed")
            return True
        except Exception as e:
            print(f"❌ Error handling test failed: {str(e)}")
            self.test_results['error_handling'] = {
                'status': 'failed',
                'error': str(e)
            }
            return False
    def test_full_text_search(self):
        """Test full-text search functionality"""
        print("\n🧪 Testing Full-text Search...")
        try:
            # Prepare test documents with specific content
            test_docs = [
                Document(
                    page_content="Machine learning is a subset of artificial intelligence.",
                    metadata={'doc_id': 'ml_doc_1', 'category': 'AI'}
                ),
                Document(
                    page_content="Vector database is a specialized database system for storing and retrieving high-dimensional vector data.",
                    metadata={'doc_id': 'vdb_doc_1', 'category': 'Database'}
                ),
                Document(
                    page_content="Natural language processing enables computers to understand human language.",
                    metadata={'doc_id': 'nlp_doc_1', 'category': 'NLP'}
                )
            ]
            # Insert test documents
            embeddings = [np.random.random(1536).tolist() for _ in range(len(test_docs))]
            self.vector_client.add_texts(test_docs, embeddings)
            # Test different search queries
            search_queries = [
                ("machine learning", "AI"),
                ("vector", "database"),
                ("natural language", "NLP")
            ]
            for query, expected_category in search_queries:
                print(f"  🔍 Searching for: '{query}'")
                start_time = time.time()
                results = self.vector_client.search_by_full_text(query, top_k=5)
                search_time = time.time() - start_time
                print(f"     ✅ Found {len(results)} results in {search_time*1000:.0f}ms")
                # Verify results contain expected content
                if results:
                    for result in results:
                        if expected_category in result.metadata.get('category', ''):
                            print(f"     📄 Relevant result found: {result.metadata['doc_id']}")
                            break
            self.test_results['full_text_search'] = {
                'status': 'passed',
                'queries_tested': len(search_queries)
            }
            print("✅ Full-text search test passed")
            return True
        except Exception as e:
            print(f"❌ Full-text search test failed: {str(e)}")
            self.test_results['full_text_search'] = {
                'status': 'failed',
                'error': str(e)
            }
            return False
    def generate_test_report(self):
        """Generate test report"""
        print("\n" + "="*60)
        print("📊 Clickzetta Vector Database Test Report")
        print("="*60)
        passed_tests = sum(1 for result in self.test_results.values() if result['status'] == 'passed')
        total_tests = len(self.test_results)
        print(f"Total tests: {total_tests}")
        print(f"Passed: {passed_tests}")
        print(f"Failed: {total_tests - passed_tests}")
        print(f"Success rate: {(passed_tests/total_tests)*100:.1f}%")
        print("\n📋 Detailed Results:")
        for test_name, result in self.test_results.items():
            status_icon = "✅" if result['status'] == 'passed' else "❌"
            print(f"  {status_icon} {test_name}: {result['status'].upper()}")
            if result['status'] == 'failed':
                print(f"      Error: {result.get('error', 'Unknown error')}")
            elif test_name == 'basic_operations' and result['status'] == 'passed':
                print(f"      Insert time: {result['insert_time']:.3f}s")
                print(f"      Search time: {result['search_time']*1000:.0f}ms")
            elif test_name == 'performance_benchmarks' and result['status'] == 'passed':
                print("      Throughput by batch size:")
                for batch_size, metrics in result['results'].items():
                    print(f"        {batch_size} docs: {metrics['throughput']:.1f} docs/sec")
        return {
            'total_tests': total_tests,
            'passed_tests': passed_tests,
            'failed_tests': total_tests - passed_tests,
            'success_rate': (passed_tests/total_tests)*100,
            'summary': self.test_results
        }
    def run_all_tests(self):
        """Run all tests"""
        print("🚀 Starting Clickzetta Vector Database Integration Tests")
        print("="*60)
        # Setup test environment
        if not self.setup_test_environment():
            print("❌ Test environment setup failed, aborting tests")
            return None
        # Note: Since we can't create actual ClickzettaVector instances without full Dify setup,
        # this is a template for the test structure. In a real environment, you would:
        # 1. Initialize the vector client with proper configuration
        # 2. Run each test method
        # 3. Generate the final report
        print("⚠️ Note: This test requires full Dify environment setup")
        print("         Please run this test within the Dify API environment")
        # Test execution order
        tests = [
            self.test_basic_operations,
            self.test_concurrent_operations,
            self.test_performance_benchmarks,
            self.test_error_handling,
            self.test_full_text_search
        ]
        # In a real environment, you would run:
        # for test in tests:
        #     test()
        # Generate final report
        # return self.generate_test_report()
        print("\n🎯 Test template ready for execution in Dify environment")
        return None
 def main():
    """Main function"""
    # Run test suite
    test_suite = ClickzettaIntegrationTest()
    try:
        report = test_suite.run_all_tests()
        if report:
            print(f"\n🎯 Tests completed! Success rate: {report['summary']['success_rate']:.1f}%")
    except KeyboardInterrupt:
        print("\n🛑 Tests interrupted by user")
    except Exception as e:
        print(f"\n❌ Test execution failed: {e}")
    finally:
        test_suite.cleanup_test_data()
 if __name__ == "__main__":
    main()
--- a/docker/.env.example
+++ b/docker/.env.example
@ -638,7 +638,7 @@ TABLESTORE_ACCESS_KEY_SECRET=xxx
 CLICKZETTA_USERNAME=
 CLICKZETTA_PASSWORD=
 CLICKZETTA_INSTANCE=
-CLICKZETTA_SERVICE=uat-api.clickzetta.com
+CLICKZETTA_SERVICE=api.clickzetta.com
 CLICKZETTA_WORKSPACE=
 CLICKZETTA_VCLUSTER=default_ap
 CLICKZETTA_SCHEMA=dify