Commit Graph

60 Commits (7aaa1ff27080ff28bf856428ce6fb16a7ac7fd67)

Author SHA1 Message Date
Oliver Lee 176d91937d
fix 'NoneType' and new ContentType supported. (#4818) 2 years ago
Jyong 1b2d862973
add error msg for hit test (#4704) 2 years ago
yalei 026175c8f7
feat: update notion extractor (#3898)
Co-authored-by: duyalei <>
2 years ago
Jyong 24624491cd
add qdrant metadata.doc_id index when create qdrant collection (#4570) 2 years ago
Jyong 233c4150d1
support images and tables extract from docx (#4619) 2 years ago
Rain Chen c255a20d7c
allow to config max segmentation tokens length for RAG document using environment variable (#4375) 2 years ago
majian b5204111da
Add UNSTRUCTURED_API_KEY env support (#4369) 2 years ago
Bowen Liang 04ad46dd31
chore: skip unnecessary key checks prior to accessing a dictionary (#4497) 2 years ago
Charlie.Wei 97b65f9b4b
Optimize webscraper (#4392)
Co-authored-by: luowei <glpat-EjySCyNjWiLqAED-YmwM>
Co-authored-by: crazywoola <427733928@qq.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2 years ago
kerlion 98140ae5d9
fix the issue of MILVUS_DATABASE has no effect. (#4353) 2 years ago
LiuVaayne 875249eb00
Feat/vector db pgvector (#3879) 2 years ago
Bowen Liang 142814d451
chore: skip deprecated field_schema param in creating payload index on Qdrant (#3903) 2 years ago
Jyong 3e9dbe3e0a
add pgvecto_rs support and upgrade SQLAlchemy (#3833) 2 years ago
呆萌闷油瓶 0940f01634
enhancement:support Qdrant gRPC mode (#3929) 2 years ago
Bowen Liang 045827043d
test: improve vector store tests (#3855) 2 years ago
Bowen Liang 7919596a21
fix: UP031 style rule violation (#3866) 2 years ago
Bowen Liang 45dd1683fd
test: add tests covering all methods of vector store (#3849) 2 years ago
Jingpan Xiong 1be222af2e
fix: using api can not execute relyt vector database (#3766)
Co-authored-by: jingsi <jingsi@leadincloud.com>
2 years ago
呆萌闷油瓶 78988ed60e
fix:still enable SSL verification when using qdrant based on HTTP protocol (#3805) 2 years ago
Bowen Liang 9cec8c1750
test: add unit tests for vector stores of Milvus, Qdrant and Weaviate (#3688) 2 years ago
Jyong f257f2c396
Knowledge optimization (#3755)
Co-authored-by: crazywoola <427733928@qq.com>
Co-authored-by: JzoNg <jzongcode@gmail.com>
2 years ago
Bowen Liang 2867d29021
fix: milvus usage with create_collection (#3683) 2 years ago
Jingpan Xiong 33397836a5
feat: support relyt vector database (#3367)
Co-authored-by: jingsi <jingsi@leadincloud.com>
2 years ago
Jyong 0737e930cb
chore: remove Langchain tools import (#3407) 2 years ago
chenxu9741 ad65c891e7
add xls file suport (#3321) 2 years ago
LiuVaayne b00466f025
feat:api Add support for extracting EPUB files in ExtractProcessor (#3254)
Co-authored-by: crazywoola <427733928@qq.com>
2 years ago
Jyong 1f302990c6
add segment with keyword issue (#3351)
Co-authored-by: StyleZhang <jasonapring2015@outlook.com>
2 years ago
Jyong 6164604462
fix dataset retrival in dataset mode (#3334) 2 years ago
Jyong b6de97ad53
Remove langchain dataset retrival agent logic (#3311) 2 years ago
Jyong 8fcf459285
fix milvus database name parameter missed (#3229) 2 years ago
Leo Q 9c01bcb3e5
feat: support setting database used in Milvus (#3003) 2 years ago
Jyong 283979fc46
fix keyword index error when storage source is S3 (#3182) 2 years ago
takatost 7753ba2d37
FEAT: NEW WORKFLOW ENGINE (#3160)
Co-authored-by: Joel <iamjoel007@gmail.com>
Co-authored-by: Yeuoly <admin@srmxy.cn>
Co-authored-by: JzoNg <jzongcode@gmail.com>
Co-authored-by: StyleZhang <jasonapring2015@outlook.com>
Co-authored-by: jyong <jyong@dify.ai>
Co-authored-by: nite-knite <nkCoding@gmail.com>
Co-authored-by: jyong <718720800@qq.com>
2 years ago
Jyong 9eba6ffdd4
Optimize csv and excel extract (#3155)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Vikey Chen e4f686deb7
fix unstructured api,remove unused parameters (#3056) 2 years ago
Jyong a94d86da6d
add keyword table s3 storage support (#3065)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Jyong 84d118de07
add redis lock on create collection in multiple thread mode (#3054)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Jyong a6cd0f0e73
fix add segment when dataset and document is empty (#3021)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Jyong b0b0cc045f
add mutil-thread document embedding (#3016)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Qiwen Tong 180775a0ec
fix: init qdrant vector max recursion (#2909) 2 years ago
listeng 696efe494e
fix: Ignore some emtpy page_content when append to split_documents (#2898) 2 years ago
Weaxs 20bd49285b
excel: get keys from every sheet (#2796) 2 years ago
Jyong 8ba38e8e74
fix overlap and splitter optimization (#2742)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Bowen Liang b163545771
Use `python-docx` to extract docx files (#2654) 2 years ago
Jyong 31070ffbca
fix qa index processor tenant id is None error (#2713)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Charlie.Wei fa7ba30ba3
Fix rebuild index&csv parsing (#2705)
Co-authored-by: luowei <glpat-EjySCyNjWiLqAED-YmwM>
Co-authored-by: crazywoola <427733928@qq.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2 years ago
Jyong 3631e53ff0
Feat/add annotation migrate (#2675)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
waltcow f322d9bddb
Fix vdb merge error (#2650) 2 years ago
Bowen Liang 801d135390
generalize the generation of new collection name by dataset id (#2620) 2 years ago
takatost a4d86496e1
fix: notion extractor raise 'NoneType' object has no attribute 'curre… (#2608) 2 years ago