39 Commits (fix/note-node-zoom-issue)

Author SHA1 Message Date
Bowen Liang ccb6ddd840
chore: bump Ruff to 0.5.7 (#7174) 2 years ago
Jyong 12095f8cd6
extract docx filter comment element (#7092) 2 years ago
chenxu9741 72c75b75cf
feat: Add hyperlink parsing to the DOCX document. (#7017) 2 years ago
yanghx c53875ce8c
fix #6902 .docx handles images within tables and handles cross-column tables (#6951) 2 years ago
Jyong cf258b7a67
add xlsx support hyperlink extract (#6722) 2 years ago
Yeuoly 79cb23e8ac
security/SSRF vulns (#6682) 2 years ago
灰灰 5e4ac11df3
fix: code block segmentation problem of markdown document (#6465) 2 years ago
Poorandy c8f5dfcf17
refactor(rag): switch to dify_config. (#6410)
Co-authored-by: -LAN- <laipz8200@outlook.com>
2 years ago
tangyoha 0cbbaf3f68
fix: markdown proc will remove image (#5855) 2 years ago
Matri a9ee52f2d7
Fix/firecrawl parameters issue (#6213) 2 years ago
Aurelius Huang f546db5437
fix: document truncation and loss in notion document sync (#5631)
Co-authored-by: Aurelius Huang <cm.huang@aftership.com>
2 years ago
Jyong 43335b5c87
delete the deprecated method (#5612) 2 years ago
Bowen Liang 39c14ec7c1
improve: unify Excel files parsing in either xls or xlsx file format by Pandas (#4965) 2 years ago
takatost 12c815c597
fix: ExtractSetting optional value missing None as default val (#5238) 2 years ago
Jyong ba5f8afaa8
Feat/firecrawl data source (#5232)
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
Co-authored-by: chenhe <guchenhe@gmail.com>
Co-authored-by: takatost <takatost@gmail.com>
2 years ago
Bowen Liang f976740b57
improve: mordernizing validation by migrating pydantic from 1.x to 2.x (#4592) 2 years ago
Jyong 3b60c28b3a
deal the external image when extract docx image (#5024) 2 years ago
YC 9f8ca75a81
fixing a bug of handling header row when parsing xls file, and tune xls/xlsx parsing result to be more structured (#3600) 2 years ago
Bowen Liang 58db719a2c
dep: bump pandas from 1.x to 2.x (#4820) 2 years ago
Oliver Lee 176d91937d
fix 'NoneType' and new ContentType supported. (#4818) 2 years ago
yalei 026175c8f7
feat: update notion extractor (#3898)
Co-authored-by: duyalei <>
2 years ago
Jyong 233c4150d1
support images and tables extract from docx (#4619) 2 years ago
majian b5204111da
Add UNSTRUCTURED_API_KEY env support (#4369) 2 years ago
Charlie.Wei 97b65f9b4b
Optimize webscraper (#4392)
Co-authored-by: luowei <glpat-EjySCyNjWiLqAED-YmwM>
Co-authored-by: crazywoola <427733928@qq.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2 years ago
Bowen Liang 7919596a21
fix: UP031 style rule violation (#3866) 2 years ago
Jyong 0737e930cb
chore: remove Langchain tools import (#3407) 2 years ago
chenxu9741 ad65c891e7
add xls file suport (#3321) 2 years ago
LiuVaayne b00466f025
feat:api Add support for extracting EPUB files in ExtractProcessor (#3254)
Co-authored-by: crazywoola <427733928@qq.com>
2 years ago
Jyong 6164604462
fix dataset retrival in dataset mode (#3334) 2 years ago
Jyong 9eba6ffdd4
Optimize csv and excel extract (#3155)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Vikey Chen e4f686deb7
fix unstructured api,remove unused parameters (#3056) 2 years ago
Jyong b0b0cc045f
add mutil-thread document embedding (#3016)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Weaxs 20bd49285b
excel: get keys from every sheet (#2796) 2 years ago
Bowen Liang b163545771
Use `python-docx` to extract docx files (#2654) 2 years ago
Charlie.Wei fa7ba30ba3
Fix rebuild index&csv parsing (#2705)
Co-authored-by: luowei <glpat-EjySCyNjWiLqAED-YmwM>
Co-authored-by: crazywoola <427733928@qq.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2 years ago
takatost a4d86496e1
fix: notion extractor raise 'NoneType' object has no attribute 'curre… (#2608) 2 years ago
Jyong 5b953c1ef2
Fix some RAG bugs (#2570)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Jyong 91ea6fe4ee
Fix/langchain document schema (#2539)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago
Jyong 6c4e6bf1d6
Feat/dify rag (#2528)
Co-authored-by: jyong <jyong@dify.ai>
2 years ago