301 Commits (abead647e226ca47aed2efa84c46bbffd1f6c987)

Author SHA1 Message Date
liuzhenghua 47a64610ca
Fix the issue of repeated escaping of quotes in hit test (#13477) 1 year ago
Ademílson Tonato d0a21086bd
refactor: Update Firecrawl API parameters and default settings (#13082) 1 year ago
Ademílson Tonato 6024d8a42d
refactor: Update Firecrawl to use v1 API (#12574)
Co-authored-by: Ademílson Tonato <ademilson.tonato@refurbed.com>
1 year ago
huangzhuo1949 4c3076f2a4
feat: add pg vector index (#12338)
Co-authored-by: huangzhuo <huangzhuo1@xiaomi.com>
1 year ago
Bowen Liang 166221d784
chore(lint): fix quotes for f-string formatting by bumping ruff to 0.9.x (#12702) 1 year ago
yihong 4e101604c3
fix: ruff check for True if ... else (#12576)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
CN-P5 cd257b91c5
Fix pandas indexing method for knowledge base imports (#12637) (#12638)
Co-authored-by: CN-P5 <heibai2006@qq.com>
1 year ago
YoungLH 040a3b782c
FEAT: support milvus to full text search (#11430)
Signed-off-by: YoungLH <974840768@qq.com>
1 year ago
Yingchun Lai 53bb37b749
fix: fix the incorrect plaintext file key when saving (#10429) 1 year ago
Hiroshi Fujita d2586278d6
Feat elasticsearch japanese (#12194) 1 year ago
Jyong 05bda6f38d
add tidb on qdrant redis lock (#12462) 1 year ago
huangzhuo1949 70698024f5
fix: empty delete bug (#12339)
Co-authored-by: huangzhuo <huangzhuo1@xiaomi.com>
1 year ago
Jyong b873e6349c
add child chunk preview number limit (#12309) 1 year ago
-LAN- 8d15c8cfbf
fix: improve error handling in NotionExtractor data fetching (#12182)
Signed-off-by: -LAN- <laipz8200@outlook.com>
1 year ago
-LAN- dae1b5a619
fix: import jieba.analyse (#12133)
Signed-off-by: -LAN- <laipz8200@outlook.com>
1 year ago
Jyong 811e4bd0cf
fix unstructured setting (#12116) 1 year ago
Jyong 84ac004772
py lint (#12102)
Signed-off-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: -LAN- <laipz8200@outlook.com>
1 year ago
Jyong 9231fdbf4c
Feat/support parent child chunk (#12092) 1 year ago
yihong 56e15d09a9
feat: mypy for all type check (#10921) 1 year ago
-LAN- 599d410d99
fix: validate reranking model attributes before processing (#11930)
Signed-off-by: -LAN- <laipz8200@outlook.com>
1 year ago
-LAN- 8c559d6231
fix(retrieval_service): avoid to use exception (#11925)
Signed-off-by: -LAN- <laipz8200@outlook.com>
1 year ago
yihong 7b03a0316d
fix: better memory usage from 800+ to 500+ (#11796)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
yihong 463fbe2680
fix: better gard nan value from numpy for issue #11827 (#11864)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
yihong 5a8a901560
fix: float values are not json for nan value close #11827 (#11840)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
Jiang ad17ff9a92
Lindorm vdb bug-fix (#11790)
Co-authored-by: jiangzhijie <jiangzhijie.jzj@alibaba-inc.com>
1 year ago
Bowen Liang 924b4fe742
test: run vdb tests on TiDB Vector with docker in CI tests (#11645) 1 year ago
yihong 22258fb0bf
fix: filter bug for keywork cause code can not reach (#11666)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
yihong 36cb25b341
fix: support mdx files close #11557 (#11565)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
Jiang 0d04cdc323
Lindorm vdb (#11574)
Co-authored-by: jiangzhijie <jiangzhijie.jzj@alibaba-inc.com>
1 year ago
Jyong 9b7adcd4d9
update tidb batch get endpoint to basic mode (#11426) 1 year ago
Jyong d7c1f43b49
fix tidb full-text-search vector missed (#11337) 1 year ago
Jyong c58d2fce89
roll back rerank topn setting (#11297) 1 year ago
yihong e686f12317
fix: better handle error (#11265)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
-LAN- 9601102885
fix(word_extractor): Fix type error and remove stream in ssrf_proxy (#11241)
Signed-off-by: -LAN- <laipz8200@outlook.com>
1 year ago
Cling_o3 f9c2aa7689
feat: add retireval_top_n to config in env (#11132) 1 year ago
kazuya-awano 2d6865d421
Ensure consistent float type for cached embedding return values (#10185) 1 year ago
yihong d7160ee563
fix: typo in upstashVector if id is always true, also fix some type hint (#11183)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
-LAN- 9789905a1f
chore(*): Removes debugging print statements (#11145)
Signed-off-by: -LAN- <laipz8200@outlook.com>
1 year ago
Bowen Liang 6c8e208ef3
chore: bump minimum supported Python version to 3.11 (#10386) 1 year ago
yihong ed55de888a
fix: rules should not be None for in (#10977)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
AkisAya cb0c55daa7
fix weight rerank of knowledge retrieval (#10931) 1 year ago
yihong 58a9d9eb9a
fix: better WeightRerankRunner run logic use O(1) and delete unused code (#10849)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
1 year ago
Zane 14f3d44c37
refactor: improve handling of leading punctuation removal (#10761) 1 year ago
8bitpd 873e9720e9
feat: AnalyticDB vector store supports invocation via SQL. (#10802)
Co-authored-by: 璟义 <yangshangpo.ysp@alibaba-inc.com>
1 year ago
Bowen Liang 51db59622c
chore(lint): cleanup repeated cause exception in logging.exception replaced by helpful message (#10425) 1 year ago
Jyong 0b2d51d859
add the index field for elasticsearch (#10592) 1 year ago
-LAN- a1543b7da0
fix(extractor): temporary file (#10543) 1 year ago
Leo.Wang c9f785e00f
Feat/tools/gitlab (#10407) 1 year ago
Bowen Liang 574c4a264f
chore(lint): Use logging.exception instead of logging.error (#10415) 1 year ago
Jyong 1024fc623e
fix the ssrf of docx file extractor external images (#10237) 1 year ago
Jiang 0c9e79cd67
Add Lindorm as a VDB choice (#10202)
Co-authored-by: jiangzhijie <jiangzhijie.jzj@alibaba-inc.com>
1 year ago
Shili Cao b61baa87ec
fix: avoid unexpected error when create knowledge base with baidu vector database and wenxin embedding model (#10130) 1 year ago
Jyong dad041c49f
fix issue: query is none when doing retrieval (#10129) 1 year ago
omr 11ca1bec0b
fix: optimize unique document filtering with set (#10082) 1 year ago
zhuhao 7433095240
chore: use dify_config.TIDB_SPEND_LIMIT instead of constant value (#10038) 1 year ago
Jyong 9ebd453b87
add rerank check when doing mutil-retrieval (#9998) 1 year ago
powerfool 878d13ef42
Added OceanBase as an option for the vector store in Dify (#10010) 1 year ago
Jyong 5580bcf870
add tidb spend limit config (#9999) 1 year ago
roadgoat19 c8ef9223e5
feat: couchbase integration (#6165)
Co-authored-by: crazywoola <427733928@qq.com>
Co-authored-by: Elliot Scribner <elliot.scribner@couchbase.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
Co-authored-by: Bowen Liang <bowenliang@apache.org>
1 year ago
Jyong f47177ecb4
add top_k for es full text search (#9963) 1 year ago
virgosoy 17cacf258e
fix: wrong element object (#9868) 2 years ago
Jyong 18106a4fc6
add tidb on qdrant type (#9831)
Co-authored-by: Zhaofeng Miao <522856232@qq.com>
2 years ago
Zixuan Cheng 88dec6ef2b
Added description for .ppt, specify the reason for unstructured.io (#9452)
Co-authored-by: crazywoola <427733928@qq.com>
2 years ago
Jyong 5f11fe521d
remove unstructured pdf extract (#9794) 2 years ago
Jyong 3e9d271b52
nltk security issue and upgrade unstructured (#9558) 2 years ago
ice yao ceb2c4f3ef
chore: reuse existing test functions with upstash vdb (#9679) 2 years ago
Zven 8e7a752b2a
feat: add upstash as a new vector database provider (#9644) 2 years ago
-LAN- 5f12c17355
fix(core): use CreatedByRole enum for role consistency (#9607) 2 years ago
Bowen Liang 4d9160ca9f
refactor: use dify_config to replace legacy usage of flask app's config (#9089) 2 years ago
-LAN- e61752bd3a
feat/enhance the multi-modal support (#8818) 2 years ago
ice yao 2155bba5b0
fix: update mismatch vector type (#9462) 2 years ago
zhuhao b90ad587c2
refactor: move the embedding to the rag module and abstract the rerank runner for extension (#9423) 2 years ago
zhuhao 86594851cb
refactor: update the default values of top-k parameter in vdb to be consistent (#9367) 2 years ago
Jyong 50635e9c15
Fix/economical knowledge retrieval (#9396) 2 years ago
zhuhao cd7ab6231f
refactor: Add an enumeration type and use the factory pattern to obtain the corresponding class (#9356) 2 years ago
ice yao d15ba3939d
Add Volcengine VikingDB as new vector provider (#9287) 2 years ago
zhuhao d97d3ff5fc
chore: add abstract decorator and output log when query embedding fails (#9264) 2 years ago
Shili Cao 2ec6ffe478
feat:support baidu vector db (#9185) 2 years ago
Jyong 42b02b3a5f
Fix/agent external knowledge retrieval (#9241) 2 years ago
Jyong 80b62d50f5
Fix/add es num_candidates (#9225) 2 years ago
Jyong cabdb4ef17
fix retrieval resource positon missed (#9155)
Co-authored-by: Bowen Liang <liangbowen@gf.com.cn>
2 years ago
Bowen Liang 240b66d737
chore: avoid implicit optional in type annotations of method (#8727) 2 years ago
Aurelius Huang 4585cffce1
fix: Compatible with special characters in pg full-text search. (#8921)
Co-authored-by: Aurelius Huang <cm.huang@aftership.com>
2 years ago
Jyong 9d221a5e19
external knowledge api (#8913)
Co-authored-by: Yi <yxiaoisme@gmail.com>
2 years ago
Zhaofeng Miao 369e1e6f58
feat(website-crawl): add jina reader as additional alternative for website crawling (#8761) 2 years ago
Bowen Liang 74f58f29f9
chore: bump ruff to 0.6.8 for fixing violation in SIM910 (#8869) 2 years ago
ice yao 27e33fb15c
chore: fix wrong VectorType match case (#8857) 2 years ago
zhuhao 55e6123db9
feat: add min-connection and max-connection for pgvector (#8841) 2 years ago
8bitpd 4c1063e1c5
fix: AnalyticdbVector retrieval scores (#8803) 2 years ago
zhuhao 008e0efeb0
refactor: update delete method as an abstract method (#8794) 2 years ago
crazywoola bf64ff215b
fix: . is missing in file_extension (#8736) 2 years ago
omr 8fd297f8b4
fix: redundant check for available_document_count (#8491) 2 years ago
Jyong 7e611ffbf3
multi-retrival use dataset's top-k (#8416) 2 years ago
Bowen Liang a1104ab97e
chore: refurish python code by applying Pylint linter rules (#8322) 2 years ago
Bowen Liang 6613b8f2e0
chore: fix unnecessary string concatation in single line (#8311) 2 years ago
-LAN- 08c486452f
fix: score_threshold handling in vector search methods (#8356) 2 years ago
Jyong 49cee773c5
fixed score threshold is none (#8342) 2 years ago
Bowen Liang 40fb4d16ef
chore: refurbish Python code by applying refurb linter rules (#8296) 2 years ago
Bowen Liang c69f5b07ba
chore: apply ruff E501 line-too-long linter rule (#8275)
Co-authored-by: -LAN- <laipz8200@outlook.com>
2 years ago
takatost 56c90e212a
fix(workflow): missing content in the answer node stream output during iterations (#8292)
Co-authored-by: -LAN- <laipz8200@outlook.com>
2 years ago