Update api/core/rag/extractor/pdf_extractor.py

Since page.extract_text() may return None when no text is found, consider adding a check before performing encoding operations to avoid potential AttributeError.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 year ago · 370a785d48
parent 236c9d64c3
commit 370a785d48
1 changed file with 1 addition and 1 deletion
--- a/api/core/rag/extractor/pdf_extractor.py
+++ b/api/core/rag/extractor/pdf_extractor.py
@@ -69,7 +69,7 @@ class PdfExtractor(BaseExtractor):
            with pdfplumber.open(file_obj) as pdf:
                for page_number, page in enumerate(pdf.pages):
                    # Extract text with layout preservation and encoding detection
-                    content = page.extract_text(layout=True)
+                    content = page.extract_text(layout=True) or ""
                    # Try to detect and fix encoding issues
                    try:
                        # First try to decode as UTF-8
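
The guard introduced by this change can be illustrated in isolation. The following is a minimal sketch, not the project's code: `FakePage` and `safe_page_text` are hypothetical names invented for the example, and `FakePage` only imitates a page whose `extract_text()` may return None, as the commit message describes. No PDF library is required to run it.

```python
from typing import Optional


class FakePage:
    """Hypothetical stand-in for a PDF page object (an assumption for this sketch)."""

    def __init__(self, text: Optional[str]) -> None:
        self._text = text

    def extract_text(self, layout: bool = False) -> Optional[str]:
        # Imitates an extractor that may hand back None for a page with no text.
        return self._text


def safe_page_text(page) -> str:
    # `or ""` maps None (and the empty string) to "", so later str methods
    # such as .encode() cannot raise AttributeError on a None value.
    return page.extract_text(layout=True) or ""


pages = [FakePage("Hello"), FakePage(None), FakePage("")]
texts = [safe_page_text(p) for p in pages]

# Without the guard, None.encode(...) would raise AttributeError here.
encoded = [t.encode("utf-8") for t in texts]
```

Because `or` falls back on any falsy value, an empty string also passes through as `""`, which is harmless here since the result is the same string either way.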