update knowledge_retrieval logic

Signed-off-by: kenwoodjw <blackxin55+@gmail.com>
12 months ago · 9b0bf4cedf
parent 45abd17794
commit 9b0bf4cedf
7 changed files with 49 additions and 623 deletions
--- a/ARRAY_METADATA_FILTER_DEBUG.md
+++ b/ARRAY_METADATA_FILTER_DEBUG.md
@@ -1,151 +0,0 @@
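-# Array Metadata Filtering - Debug Guide
-
-## Problem Analysis
-
-Your screenshot shows that the selected field is `name`, a metadata field of type `string`. The array variable selector only appears when both of the following conditions are met:
-
-1. **The metadata field must be of type array** - the field currently shown is of type string
-2. **The workflow must contain an array-type variable** - this is what the selector lists
-
-## Debug Steps
-
-### 1. Create an array-type metadata field
-
-First, create an array-type metadata field in the knowledge base: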
-
-```bash
-# Create an array-type metadata field through the API
-curl -X POST \
-  http://localhost:3000/datasets/{dataset_id}/metadata \
-  -H "Authorization: Bearer your-api-key" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "type": "array",
-    "name": "job_ids"
-  }'
-```
-
-### 2. Add an array variable to the workflow
-
-Add an array-type variable to the workflow's start node:
-
-```json
-{
-  "key": "user_jobs",
-  "name": "User job ID list",
-  "type": "array",
-  "required": true,
-  "default": ["job1", "job2"]
-}
-```
-
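-### 3. Test procedure
-
-1. **Select an array-type field**: choose the `job_ids` field in the metadata filter condition
-2. **Check the variable selector**: it should now list array-type variables
-3. **Verify the operators**: confirm that the array operators `in`, `not in`, `contains`, and `not contains` are shown
-
-### 4. Add debug logging
-
-I have added debug logging to the code. Open the browser developer tools to view it:
-
-- `🔍 ConditionArray Debug` - shows the available array variables
-- `🔧 数组变量被选择` ("array variable selected") - shows the variable selection process
-
-### 5. Check the data flow
-
-**Current problem**:
-- Metadata field type: `string` (the `name` field)
-- Required field type: `array` (for example, the `job_ids` field)
-
-**Solution**:
-1. Create an array-type metadata field
-2. Select that field for filtering
-3. The variable selector will then show array-type variables
-
-## Verification
-
-### Check the metadata field list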
-```javascript
-// Run in the browser console
-console.log('Current metadata fields:', metadataList);
-```
-
-### Check the available variables
-```javascript
-// Check the array variables
-console.log('Available array variables:', availableArrayVars);
-console.log('Common array variables:', availableCommonArrayVars);
-```
-
-### Verify the variable filter logic
-```javascript
-// Check the variable filter
-const filterArrayVar = (varPayload) => {
-  return [
-    'arrayString', 
-    'arrayNumber', 
-    'arrayObject', 
-    'array'
-  ].includes(varPayload.type);
-};
-console.log('Filtered array variables:', variables.filter(filterArrayVar));
-```
-
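-## FAQ
-
-### Q1: Why are no array variables shown?
-**A**: The selected field is of type string. Array variables are only shown for array-type fields.
-
-### Q2: How do I create an array-type metadata field?
-**A**:
-1. Through the API: `POST /datasets/{id}/metadata` with `{"type": "array", "name": "field_name"}`
-2. Or in the knowledge base management UI
-
-### Q3: Why is the variable selector empty?
-**A**: Check that the workflow contains an array-type variable, and make sure its type is `arrayString`, `arrayNumber`, `arrayObject`, or `array`
-
-### Q4: How do I verify that the feature works?
-**A**:
-1. Create an array-type metadata field
-2. Add an array variable to the workflow's start node
-3. Select the array field for filtering
-4. Check whether the variable selector shows array variables
-
-## Example Configuration
-
-### Metadata field example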
-```json
-{
-  "id": "job_ids_field",
-  "name": "job_ids", 
-  "type": "array"
-}
-```
-
-### Workflow variable example
-```json
-{
-  "key": "target_jobs",
-  "name": "Target job list",
-  "type": "array",
-  "default": ["job1", "job2", "job3"]
-}
-```
-
-### Filter condition example
-```json
-{
-  "id": "condition_1",
-  "name": "job_ids",
-  "comparison_operator": "in",
-  "value": ["job1", "job2"]
-}
-```
-
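-## Next Steps
-
-1. First confirm whether an array-type metadata field exists
-2. If not, create one
-3. Make sure the workflow contains an array-type variable
-4. Select the array field and test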
--- a/ARRAY_METADATA_FILTER_FINAL_TEST.md
+++ b/ARRAY_METADATA_FILTER_FINAL_TEST.md
@@ -1,147 +0,0 @@
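-# Array Metadata Filtering - Final Test Guide
-
-## 🎯 Feature Overview
-
-Dify's knowledge retrieval node now supports using an **array variable as the value of a filter condition**, which enables filters such as:
-- `document_type in ["pdf", "docx", "txt"]`
-- `priority not in [1, 2, 3]`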
-
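-## ✅ Fixed Issues
-
-### 1. **ComparisonOperator import error**
-- **Problem**: `ReferenceError: ComparisonOperator is not defined`
-- **Fix**: Changed the import statement to import the enum value rather than only the type definition
-- **Files**: `condition-item.tsx`, `condition-operator.tsx`
-
-### 2. **Operator coverage**
-- **Problem**: string/number fields did not show the in/not in operators
-- **Fix**: Added the array operators for the basic types in `utils.ts`
-- **File**: `utils.ts`
-
-### 3. **Conditional rendering logic**
-- **Problem**: The in/not in operators did not use the array input component
-- **Fix**: Changed the conditional rendering logic to choose the component based on the operator type
-- **File**: `condition-item.tsx`
-
-### 4. **Array variable filter logic**
-- **Problem**: The array variable filter was too strict and missed some array types
-- **Fix**: Improved the filterArrayVar function to support all array types
-- **File**: `use-config.ts`
-
-### 5. **Variable type matching**
-- **Problem**: The ConditionVariableSelector type definition was too strict
-- **Fix**: Accept string-typed parameters and improve array type matching
-- **File**: `condition-variable-selector.tsx`
-
-### 6. **Data passing chain**
-- **Problem**: ConditionList did not pass the array-variable props down
-- **Fix**: Pass availableArrayVars and the related props
-- **File**: `condition-list/index.tsx`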
-
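-## 🧪 Test Steps
-
-### Step 1: Create a test workflow
-
-1. **Create a new workflow** containing the following nodes:
-   - **Start node**: input variable `query`
-   - **Code execution node**: outputs a string array
-   - **Knowledge retrieval node**: uses metadata filtering
-
-### Step 2: Configure the code execution node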
-
-```python
-def main() -> dict:
-    return {
-        "file_types": ["pdf", "docx", "txt"],
-        "priorities": [1, 2, 3],
-        "categories": ["tech", "business", "personal"]
-    }
-```
-
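-### Step 3: Configure the knowledge retrieval node
-
-1. **Add a dataset** (make sure it has metadata fields)
-2. **Set the metadata filtering mode** to "Manual"
-3. **Add a filter condition**:
-   - Select a string-type metadata field (for example, `document_type`)
-   - Select the `in` operator
-   - Switch to variable mode and select the code node's `file_types` output
-
-### Step 4: Verify the feature
-
-#### Frontend checks
-- [ ] The `in` and `not in` operator options are visible
-- [ ] Array-type variables can be selected
-- [ ] The UI displays the selected array variable correctly
-- [ ] The configuration saves and loads correctly
-
-#### Backend checks
-- [ ] The workflow runs without errors
-- [ ] Array variable values are parsed correctly
-- [ ] Filter results match expectations
-- [ ] Combinations of multiple conditions are supported
-
-## 🔍 Debug Log Check
-
-The debug log should now show: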
-
-```javascript
-🔍 ConditionArray Debug:
-  - valueMethod: variable
-  - isCommonVariable: undefined
-  - nodesOutputVars (array variables): [{ nodeId: 'code_node', vars: [...] }]  // no longer an empty array
-  - availableNodes: [{ id: 'code_node', data: {...} }]  // no longer an empty array
-  - commonVariables: []
-
-🔍 ConditionVariableSelector Debug:
-  - varType: array
-  - nodesOutputVars: [{ nodeId: 'code_node', vars: [...] }]  // should contain data
-  - availableNodes: [{ id: 'code_node', data: {...} }]  // should contain data
-```
-
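-## 🎯 Supported Array Types
-
-All of the following array types are now supported:
-- `array` - generic array
-- `array[string]` - string array
-- `array[number]` - number array
-- `array[object]` - object array
-- `array[file]` - file array
-- Any custom type whose name starts with `array`
-
-## 🚀 Example Use Cases
-
-### Scenario 1: Filter by document type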
-```
-document_type in {{code_node.file_types}}
-// where file_types = ["pdf", "docx", "txt"]
-```
-
-### Scenario 2: Exclude priorities
-```
-priority not in {{code_node.excluded_priorities}}
-// where excluded_priorities = [0, 10]
-```
-
-### Scenario 3: Combine multiple conditions
-```
-document_type in {{code_node.allowed_types}} AND
-created_date > "2024-01-01" AND
-priority not in {{code_node.excluded_priorities}}
-```
-
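-## ✅ Completion Status
-
-- [x] Frontend operator support
-- [x] Frontend conditional rendering fix
-- [x] Variable selector integration
-- [x] Import error fix
-- [x] Array type filter improvement
-- [x] Variable type matching fix
-- [x] Data passing chain fix
-- [x] Backend array handling support
-- [x] Type safety
-
-## 🎉 The feature is fully usable!
-
-You can now use array variables for metadata filtering in the knowledge retrieval node.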
--- a/ARRAY_METADATA_FILTER_README.md
+++ b/ARRAY_METADATA_FILTER_README.md
@@ -1,154 +0,0 @@
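-# Metadata Array Filtering Implementation
-
-## Feature Overview
-
-This implementation adds filtering on array-type metadata to Dify's knowledge retrieval system, addressing the request in GitHub Issue #16195.
-
-## Background
-
-Users of Dify's knowledge retrieval need to filter documents by an array containing specific `job_ids`, but the existing system only supports metadata filters on string, number, and time types and cannot match conditions against arrays.
-
-## Solution
-
-### 1. Frontend changes
-
-#### New array type support
-- Added the `array` type to the `MetadataFilteringVariableType` enum
-- Updated the `MetadataFilteringCondition` type to accept `string[]` values
-- Added dedicated operators for the array type: `in`, `not in`, `contains`, `not contains`, `empty`, `not empty`
-- Added an icon for the array type (the `RiListUnordered` icon)
-
-#### File changes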
-```typescript
-// web/app/components/workflow/nodes/knowledge-retrieval/types.ts
-export enum MetadataFilteringVariableType {
-  string = 'string',
-  number = 'number', 
-  time = 'time',
-  select = 'select',
-  array = 'array',  // newly added
-}
-
-export type MetadataFilteringCondition = {
-  id: string
-  name: string
-  comparison_operator: ComparisonOperator
-  value?: string | number | string[]  // array values supported
-}
-```
-
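-### 2. Backend changes
-
-#### Database query logic
-Array condition queries are implemented against the PostgreSQL JSONB field:
-
-**`in` operator logic:**
-- Checks whether the document's metadata field contains any value from the input array
-- Joins multiple LIKE conditions with OR
-
-**`not in` operator logic:**
-- Checks whether the document's metadata field contains none of the values in the input array
-- Joins multiple NOT LIKE conditions with AND
-
-#### File changes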
-```python
-# api/core/rag/retrieval/dataset_retrieval.py
-# api/core/workflow/nodes/knowledge_retrieval/knowledge_retrieval_node.py
-
-case "in":
-    if isinstance(value, (list, tuple)):
-        or_conditions = []
-        for i, v in enumerate(value):
-            param_key = f"{key_value}_{i}"
-            if isinstance(v, str):
-                or_conditions.append(
-                    (text(f"documents.doc_metadata ->> :{key} LIKE :{param_key}")).params(
-                        **{key: metadata_name, param_key: f'%"{v}"%'}
-                    )
-                )
-        if or_conditions:
-            filters.append(or_(*or_conditions))
-```
-
-## Usage Example
-
-### Scenario: filter documents by a job_ids array
-
-Suppose the documents have the following metadata:
-```json
-{
-  "doc1": {"job_ids": ["job1", "job2", "job3"]},
-  "doc2": {"job_ids": ["job2", "job4", "job5"]}, 
-  "doc3": {"job_ids": ["job6", "job7"]}
-}
-```
-
-### Query 1: documents that contain the specified job_ids
-```
-Condition: job_ids in ["job1", "job4"]
-Result: returns doc1 and doc2, because they contain job1 and job4 respectively
-```
-
-### Query 2: documents that do not contain the specified job_ids
-```
-Condition: job_ids not in ["job2", "job6"]
-Result: returns doc3 (if other documents exist that contain neither job2 nor job6)
-```
-
-## Corresponding SQL Queries
-
-### Inclusion query (in)
-```sql
-SELECT * FROM documents WHERE
-  doc_metadata ->> 'job_ids' LIKE '%"job1"%' OR
-  doc_metadata ->> 'job_ids' LIKE '%"job4"%';
-```
-
-### Exclusion query (not in)
-```sql  
-SELECT * FROM documents WHERE
-  doc_metadata ->> 'job_ids' NOT LIKE '%"job2"%' AND
-  doc_metadata ->> 'job_ids' NOT LIKE '%"job6"%';
-```
-
-## Testing
-
-Run the test script:
-```bash
-python test_array_metadata_filter.py
-```
-
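-This demonstrates how the array filtering works.
-
-## Technical Details
-
-### Data storage
-- Metadata is stored in a PostgreSQL JSONB field
-- Array values are stored in the JSON as string arrays: `["value1", "value2"]`
-- Partial matching uses the LIKE operator: `LIKE '%"value"%'`
-
-### Performance considerations
-- A database index is used: `db.Index("document_metadata_idx", "doc_metadata", postgresql_using="gin")`
-- JSONB fields support GIN indexes, which handle containment queries efficiently
-
-### Supported operators
-- `in`: checks whether the field contains any value from the array
-- `not in`: checks whether the field contains none of the values in the array
-- `contains`: checks whether the field contains a specific value
-- `not contains`: checks whether the field does not contain a specific value
-- `empty`: checks whether the field is empty
-- `not empty`: checks whether the field is not empty
-
-## Extensibility
-
-This implementation lays the groundwork for more complex array operations in the future, such as:
-- `all of`: checks whether the field contains every value in the array
-- `any of`: checks whether the field contains any value in the array (similar to the current `in`)
-- Array length comparison
-- Array intersection/union operations
-
-## Compatibility
-
-- Backward compatible: existing string, number, and time filters are unchanged
-- Database compatible: relies on PostgreSQL JSONB, so no additional schema changes are needed
-- API compatible: extends the existing metadata filtering API without breaking existing interfaces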
--- a/ARRAY_METADATA_FILTER_TEST.md
+++ b/ARRAY_METADATA_FILTER_TEST.md
@@ -1,108 +0,0 @@
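-# Array Metadata Filtering Tests
-
-## 🧪 Test Scenarios
-
-### 1. String field + array variable filter
-
-**Goal**: verify that a string-type metadata field can be filtered with `in`/`not in` using an array variable
-
-**Steps**:
-1. Create a workflow containing:
-   - Start node: input variable `filename`
-   - Code execution node: outputs the array `["doc1.pdf", "doc2.pdf", "doc3.pdf"]`
-   - Knowledge retrieval node: uses metadata filtering
-
-2. In the knowledge retrieval node:
-   - Select a string-type metadata field (for example, `document_name`)
-   - Select the `in` or `not in` operator
-   - For the value, select the code execution node's array output
-
-**Expected result**:
-- `in` and `not in` appear in the operator dropdown
-- An array-type variable can be selected as the filter value
-- Matching documents are filtered correctly at run time
-
-### 2. Number field + array variable filter
-
-**Goal**: verify that a number-type metadata field can be filtered using an array variable
-
-**Steps**:
-1. Create a code execution node that outputs the number array `[1, 2, 3]`
-2. In the knowledge retrieval node:
-   - Select a number-type metadata field (for example, `priority`)
-   - Select the `in` operator
-   - Select the array variable as the filter value
-
-**Expected result**: documents are filtered correctly by the number array
-
-### 3. Combined conditions
-
-**Goal**: verify that array filters combine with other conditions
-
-**Steps**:
-1. Set up multiple filter conditions:
-   - `document_type in ["pdf", "docx"]` (array filter)
-   - `created_date > "2024-01-01"` (regular filter)
-   - Logical operator: AND
-
-**Expected result**: all conditions are combined and executed correctly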
-
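-## 🔍 Verification Checklist
-
-### Frontend checks
-- [ ] The operator dropdown includes `in` and `not in`
-- [ ] The variable selector shows array-type variables
-- [ ] The UI renders the array input component correctly
-- [ ] The configuration saves and loads correctly
-
-### Backend checks
-- [ ] Array variable values are parsed correctly
-- [ ] Database query statements are generated correctly
-- [ ] Filter results are accurate
-- [ ] Error handling is thorough
-
-## 🐛 Known Issue Fixes
-
-### 1. ComparisonOperator import error
-**Problem**: `ReferenceError: ComparisonOperator is not defined`
-**Fix**: Changed the import statement to import the enum value rather than only the type definition
-
-### 2. Operator visibility
-**Problem**: string/number fields did not show the in/not in operators
-**Fix**: Added the array operators for the basic types in `utils.ts`
-
-### 3. Conditional rendering logic
-**Problem**: The in/not in operators did not use the array input component
-**Fix**: Changed the conditional rendering logic in `condition-item.tsx`
-
-## ✅ Feature Completion Status
-
-- [x] Frontend operator support
-- [x] Frontend conditional rendering
-- [x] Variable selector integration
-- [x] Import error fix
-- [x] Backend array handling
-- [x] Type safety
-
-## 🚀 Usage Example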
-
-```javascript
-// Example workflow configuration
-{
-  "metadata_filtering_conditions": {
-    "logical_operator": "and",
-    "conditions": [
-      {
-        "name": "document_type",
-        "comparison_operator": "in",
-        "value": "{{code_node.file_types}}"  // array variable
-      },
-      {
-        "name": "priority",
-        "comparison_operator": "not in", 
-        "value": "{{code_node.excluded_priorities}}"  // number array
-      }
-    ]
-  }
-}
-``` 
--- a/api/core/rag/retrieval/dataset_retrieval.py
+++ b/api/core/rag/retrieval/dataset_retrieval.py
@@ -1048,59 +1048,51 @@ class DatasetRetrieval:
                    filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Float) != value)
            case "in":
                if isinstance(value, list | tuple):
-                    # For arrays: check if metadata field contains any value from the input array
+                    # For arrays: check if metadata field (single value) is in the input array
                    or_conditions = []
                    for i, v in enumerate(value):
                        param_key = f"{key_value}_{i}"
                        if isinstance(v, str):
-                            or_conditions.append(
-                                (text(f"documents.doc_metadata ->> :{key} LIKE :{param_key}")).params(
-                                    **{key: metadata_name, param_key: f'%"{v}"%'}
-                                )
-                            )
+                            # For string type: exact match with quoted string
+                            or_conditions.append(DatasetDocument.doc_metadata[metadata_name] == f'"{v}"')
                        else:
+                            # For number type: exact match as numeric value
                            or_conditions.append(
-                                (text(f"documents.doc_metadata ->> :{key} = :{param_key}")).params(
-                                    **{key: metadata_name, param_key: str(v)}
-                                )
+                                sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Float) == v
                            )
                    if or_conditions:
                        filters.append(or_(*or_conditions))
                else:
-                    # Single value case
+                    # Single value case (backward compatibility)
                    if isinstance(value, str):
+                        filters.append(DatasetDocument.doc_metadata[metadata_name] == f'"{value}"')
+                    else:
                        filters.append(
-                            (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
-                                **{key: metadata_name, key_value: f'%"{value}"%'}
-                            )
+                            sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Float) == value
                        )
            case "not in":
                if isinstance(value, list | tuple):
-                    # For arrays: check if metadata field does not contain any value from the input array
+                    # For arrays: check if metadata field (single value) is not in the input array
                    and_conditions = []
                    for i, v in enumerate(value):
                        param_key = f"{key_value}_{i}"
                        if isinstance(v, str):
-                            and_conditions.append(
-                                (text(f"documents.doc_metadata ->> :{key} NOT LIKE :{param_key}")).params(
-                                    **{key: metadata_name, param_key: f'%"{v}"%'}
-                                )
-                            )
+                            # For string type: not equal to quoted string
+                            and_conditions.append(DatasetDocument.doc_metadata[metadata_name] != f'"{v}"')
                        else:
+                            # For number type: not equal to numeric value
                            and_conditions.append(
-                                (text(f"documents.doc_metadata ->> :{key} != :{param_key}")).params(
-                                    **{key: metadata_name, param_key: str(v)}
-                                )
+                                sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Float) != v
                            )
                    if and_conditions:
                        filters.append(and_(*and_conditions))
                else:
-                    # Single value case
+                    # Single value case (backward compatibility)
                    if isinstance(value, str):
+                        filters.append(DatasetDocument.doc_metadata[metadata_name] != f'"{value}"')
+                    else:
                        filters.append(
-                            (text(f"documents.doc_metadata ->> :{key} NOT LIKE :{key_value}")).params(
-                                **{key: metadata_name, key_value: f'%"{value}"%'}
-                            )
+                            sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Float) != value
                        )
            case "empty":
                filters.append(DatasetDocument.doc_metadata[metadata_name].is_(None))
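
Note on the hunk above: the removed branches matched strings with `LIKE '%"v"%'` against the field's text form, while the added branches compare the field for equality, which the new comments describe as a "single value" check. The following is a minimal pure-Python sketch of that behavioral difference, not the project's code. `as_text`, `old_in`, `new_in`, `old_not_in`, and `new_not_in` are hypothetical helpers, `as_text` only approximates PostgreSQL's `->>` operator, and only string values are modeled.

```python
import json


def as_text(field):
    """Rough stand-in for `doc_metadata ->> key`: strings come back unquoted,
    other JSON values come back as their JSON text (assumption)."""
    return field if isinstance(field, str) else json.dumps(field)


def old_in(field, values):
    # Removed logic: OR of LIKE '%"v"%' patterns over the field's text form.
    text = as_text(field)
    return any(f'"{v}"' in text for v in values if isinstance(v, str))


def new_in(field, values):
    # Added logic: OR of exact matches against a single string value.
    return any(field == v for v in values if isinstance(v, str))


def old_not_in(field, values):
    # Removed logic: AND of NOT LIKE '%"v"%' patterns.
    text = as_text(field)
    return all(f'"{v}"' not in text for v in values if isinstance(v, str))


def new_not_in(field, values):
    # Added logic: AND of inequality checks against a single string value.
    return all(field != v for v in values if isinstance(v, str))


# A scalar string field such as document_type = "pdf":
scalar_old = old_in("pdf", ["pdf", "docx"])  # the unquoted text never contains '"pdf"'
scalar_new = new_in("pdf", ["pdf", "docx"])  # exact match succeeds

# An array field such as job_ids = ["job1", "job2"]:
array_old = old_in(["job1", "job2"], ["job1", "job4"])  # substring match succeeds
array_new = new_in(["job1", "job2"], ["job1", "job4"])  # a list never equals a string
```

Under these assumptions the change trades array-valued metadata matching for exact matching on scalar string fields, which is consistent with the array-field documentation being deleted in the same commit.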
--- a/api/core/workflow/nodes/knowledge_retrieval/knowledge_retrieval_node.py
+++ b/api/core/workflow/nodes/knowledge_retrieval/knowledge_retrieval_node.py
@@ -376,7 +376,10 @@ class KnowledgeRetrievalNode(LLMNode):
                                elif expected_value.value_type == "string":  # type: ignore
                                    expected_value = re.sub(r"[\r\n\t]+", " ", expected_value.text).strip()  # type: ignore
                                elif expected_value.value_type in (
-                                    "array[number]", "array[string]", "array[object]", "array"
+                                    "array[number]",
+                                    "array[string]",
+                                    "array[object]",
+                                    "array",
                                ):  # type: ignore
                                    expected_value = expected_value.value  # type: ignore
                                else:
@@ -524,60 +527,48 @@ class KnowledgeRetrievalNode(LLMNode):
                    filters.append(sqlalchemy_cast(Document.doc_metadata[metadata_name].astext, Float) != value)
            case "in":
                if isinstance(value, list | tuple):
-                    # For arrays: check if metadata field contains any value from the input array
+                    # For arrays: check if metadata field (single value) is in the input array
                    or_conditions = []
                    for i, v in enumerate(value):
                        param_key = f"{key_value}_{i}"
                        if isinstance(v, str):
-                            or_conditions.append(
-                                (text(f"documents.doc_metadata ->> :{key} LIKE :{param_key}")).params(
-                                    **{key: metadata_name, param_key: f'%"{v}"%'}
-                                )
-                            )
+                            # For string type: exact match with quoted string
+                            or_conditions.append(Document.doc_metadata[metadata_name] == f'"{v}"')
                        else:
+                            # For number type: exact match as numeric value
                            or_conditions.append(
-                                (text(f"documents.doc_metadata ->> :{key} = :{param_key}")).params(
-                                    **{key: metadata_name, param_key: str(v)}
-                                )
+                                sqlalchemy_cast(Document.doc_metadata[metadata_name].astext, Float) == v
                            )
                    if or_conditions:
                        filters.append(or_(*or_conditions))
                else:
-                    # Single value case
+                    # Single value case (backward compatibility)
                    if isinstance(value, str):
-                        filters.append(
-                            (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
-                                **{key: metadata_name, key_value: f'%"{value}"%'}
-                            )
-                        )
+                        filters.append(Document.doc_metadata[metadata_name] == f'"{value}"')
+                    else:
+                        filters.append(sqlalchemy_cast(Document.doc_metadata[metadata_name].astext, Float) == value)
            case "not in":
                if isinstance(value, list | tuple):
-                    # For arrays: check if metadata field does not contain any value from the input array
+                    # For arrays: check if metadata field (single value) is not in the input array
                    and_conditions = []
                    for i, v in enumerate(value):
                        param_key = f"{key_value}_{i}"
                        if isinstance(v, str):
-                            and_conditions.append(
-                                (text(f"documents.doc_metadata ->> :{key} NOT LIKE :{param_key}")).params(
-                                    **{key: metadata_name, param_key: f'%"{v}"%'}
-                                )
-                            )
+                            # For string type: not equal to quoted string
+                            and_conditions.append(Document.doc_metadata[metadata_name] != f'"{v}"')
                        else:
+                            # For number type: not equal to numeric value
                            and_conditions.append(
-                                (text(f"documents.doc_metadata ->> :{key} != :{param_key}")).params(
-                                    **{key: metadata_name, param_key: str(v)}
-                                )
+                                sqlalchemy_cast(Document.doc_metadata[metadata_name].astext, Float) != v
                            )
                    if and_conditions:
                        filters.append(and_(*and_conditions))
                else:
-                    # Single value case
+                    # Single value case (backward compatibility)
                    if isinstance(value, str):
-                        filters.append(
-                            (text(f"documents.doc_metadata ->> :{key} NOT LIKE :{key_value}")).params(
-                                **{key: metadata_name, key_value: f'%"{value}"%'}
-                            )
-                        )
+                        filters.append(Document.doc_metadata[metadata_name] != f'"{value}"')
+                    else:
+                        filters.append(sqlalchemy_cast(Document.doc_metadata[metadata_name].astext, Float) != value)
            case "empty":
                filters.append(Document.doc_metadata[metadata_name].is_(None))
            case "not empty":
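
Note on the numeric branches in the two Python hunks: the removed code compared the field's text form with `str(v)`, while the added code casts the field's text to `Float` before comparing. A small sketch of why that matters, assuming the stored JSON number keeps its written form when rendered as text; `as_text`, `old_number_match`, and `new_number_match` are hypothetical names used only for illustration.

```python
import json


def as_text(field):
    """Rough stand-in for `doc_metadata ->> key` (assumption, see lead-in)."""
    return field if isinstance(field, str) else json.dumps(field)


def old_number_match(stored, v):
    # Removed logic: text comparison, `doc_metadata ->> key = str(v)`.
    return as_text(stored) == str(v)


def new_number_match(stored, v):
    # Added logic: numeric comparison after casting the text form to a float.
    return float(as_text(stored)) == v


# Metadata stored as 1.0, filter value 1:
text_based = old_number_match(1.0, 1)     # compares "1.0" with "1"
numeric_based = new_number_match(1.0, 1)  # compares 1.0 with 1
```

In this model the text comparison misses values that are numerically equal but written differently, while the cast-based comparison treats them as equal.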
--- a/web/app/components/workflow/nodes/knowledge-retrieval/components/metadata/condition-list/condition-array.tsx
+++ b/web/app/components/workflow/nodes/knowledge-retrieval/components/metadata/condition-list/condition-array.tsx
@@ -87,16 +87,19 @@ const ConditionArray = ({
      const trimmed = item.trim()
      if (trimmed === '') return null

-      // Try to convert to number if it's a valid number
-      const numericValue = Number(trimmed)
-      if (!isNaN(numericValue) && isFinite(numericValue))
-        return numericValue
+      // Try to convert to number if it's a valid number (only if it looks like a pure numeric value)
+      if (/^-?\d+(\.\d+)?$/.test(trimmed)) {
+        const numericValue = Number(trimmed)
+        if (!isNaN(numericValue) && isFinite(numericValue))
+          return numericValue
+      }

-      // Otherwise keep as string
-      return trimmed
+      // Otherwise keep as string (remove quotes if present)
+      return trimmed.replace(/^["']|["']$/g, '')
    }).filter(item => item !== null)

    console.log('🔧 常量数组值被设置:', arrayValues)
+    console.log('🔧 数组类型检测:', arrayValues.map(v => typeof v))
    onChange(arrayValues)
  }, [onChange])
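
Note on the `condition-array.tsx` hunk: the new parsing rule only converts items that look like plain decimal numbers and strips one leading and one trailing quote from everything else. The sketch below transliterates that rule to Python so it can be checked quickly. It is an illustration under assumptions, not the component's code: `parse_items` is a hypothetical name, the input is taken as an already-split list because the hunk does not show how the raw text is split, and returning `int` for values without a decimal point is a Python-side choice (JavaScript has a single number type).

```python
import re

# Same shape as the regex in the hunk: optional minus, digits, optional fraction.
_NUMERIC = re.compile(r"-?[0-9]+(\.[0-9]+)?")
# One quote character at the very start and/or the very end of the item.
_EDGE_QUOTES = re.compile(r"^[\"']|[\"']$")


def parse_items(raw_items):
    """Trim each item, drop empty ones, convert plain decimals to numbers,
    and keep everything else as a string with edge quotes removed."""
    parsed = []
    for item in raw_items:
        trimmed = item.strip()
        if trimmed == "":
            continue
        if _NUMERIC.fullmatch(trimmed):
            parsed.append(float(trimmed) if "." in trimmed else int(trimmed))
        else:
            parsed.append(_EDGE_QUOTES.sub("", trimmed))
    return parsed


result = parse_items([" 42 ", "-3.5", "1e3", '"pdf"', "'docx'", "", "  "])
quoted_number = parse_items(['"123"'])
```

Compared with the removed `Number(trimmed)` check, inputs such as `1e3` now stay strings, and a quoted item such as `"123"` is kept as the string `123` rather than becoming a number.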