Merge branch 'feat/parent-child-retrieval' of https://github.com/langgenius/dify into feat/parent-child-retrieval

2 years ago · 78fff31e61
parent b18eb58770 8541153b15
commit 78fff31e61
359 changed files with 9391 additions and 2200 deletions
--- a/.github/workflows/db-migration-test.yml
+++ b/.github/workflows/db-migration-test.yml
@@ -48,6 +48,8 @@ jobs:
          cp .env.example .env

      - name: Run DB Migration
+        env:
+          DEBUG: true
        run: |
          cd api
          poetry run python -m flask upgrade-db
--- a/README.md
+++ b/README.md
@@ -147,6 +147,13 @@ Deploy Dify to Cloud Platform with a single click using [terraform](https://www.
 ##### Google Cloud
 - [Google Cloud Terraform by @sotazum](https://github.com/DeNA/dify-google-cloud-terraform)

+#### Using AWS CDK for Deployment
+
+Deploy Dify to AWS with [CDK](https://aws.amazon.com/cdk/)
+
+##### AWS 
+- [AWS CDK by @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## Contributing

 For those who'd like to contribute code, see our [Contribution Guide](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). 
--- a/README_AR.md
+++ b/README_AR.md
@@ -190,6 +190,13 @@ docker compose up -d
 ##### Google Cloud
 - [Google Cloud Terraform بواسطة @sotazum](https://github.com/DeNA/dify-google-cloud-terraform)

+#### استخدام AWS CDK للنشر
+
+انشر Dify على AWS باستخدام [CDK](https://aws.amazon.com/cdk/)
+
+##### AWS 
+- [AWS CDK بواسطة @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## المساهمة

 لأولئك الذين يرغبون في المساهمة، انظر إلى [دليل المساهمة](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md) لدينا. 
@@ -222,3 +229,10 @@ docker compose up -d
 ## الرخصة

 هذا المستودع متاح تحت [رخصة البرنامج الحر Dify](LICENSE)، والتي تعتبر بشكل أساسي Apache 2.0 مع بعض القيود الإضافية.
+## الكشف عن الأمان
+
+لحماية خصوصيتك، يرجى تجنب نشر مشكلات الأمان على GitHub. بدلاً من ذلك، أرسل أسئلتك إلى security@dify.ai وسنقدم لك إجابة أكثر تفصيلاً.
+
+## الرخصة
+
+هذا المستودع متاح تحت [رخصة البرنامج الحر Dify](LICENSE)، والتي تعتبر بشكل أساسي Apache 2.0 مع بعض القيود الإضافية.
--- a/README_CN.md
+++ b/README_CN.md
@@ -213,6 +213,13 @@ docker compose up -d
 ##### Google Cloud
 - [Google Cloud Terraform by @sotazum](https://github.com/DeNA/dify-google-cloud-terraform)

+#### 使用 AWS CDK 部署
+
+使用 [CDK](https://aws.amazon.com/cdk/) 将 Dify 部署到 AWS
+
+##### AWS 
+- [AWS CDK by @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## Star History

 [![Star History Chart](https://api.star-history.com/svg?repos=langgenius/dify&type=Date)](https://star-history.com/#langgenius/dify&Date)
--- a/README_ES.md
+++ b/README_ES.md
@@ -215,6 +215,13 @@ Despliega Dify en una plataforma en la nube con un solo clic utilizando [terrafo
 ##### Google Cloud
 - [Google Cloud Terraform por @sotazum](https://github.com/DeNA/dify-google-cloud-terraform)

+#### Usando AWS CDK para el Despliegue
+
+Despliegue Dify en AWS usando [CDK](https://aws.amazon.com/cdk/)
+
+##### AWS 
+- [AWS CDK por @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## Contribuir

 Para aquellos que deseen contribuir con código, consulten nuestra [Guía de contribución](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). 
@@ -248,3 +255,10 @@ Para proteger tu privacidad, evita publicar problemas de seguridad en GitHub. En
 ## Licencia

 Este repositorio está disponible bajo la [Licencia de Código Abierto de Dify](LICENSE), que es esencialmente Apache 2.0 con algunas restricciones adicionales.
+## Divulgación de Seguridad
+
+Para proteger tu privacidad, evita publicar problemas de seguridad en GitHub. En su lugar, envía tus preguntas a security@dify.ai y te proporcionaremos una respuesta más detallada.
+
+## Licencia
+
+Este repositorio está disponible bajo la [Licencia de Código Abierto de Dify](LICENSE), que es esencialmente Apache 2.0 con algunas restricciones adicionales.
--- a/README_FR.md
+++ b/README_FR.md
@@ -213,6 +213,13 @@ Déployez Dify sur une plateforme cloud en un clic en utilisant [terraform](http
 ##### Google Cloud
 - [Google Cloud Terraform par @sotazum](https://github.com/DeNA/dify-google-cloud-terraform)

+#### Utilisation d'AWS CDK pour le déploiement
+
+Déployez Dify sur AWS en utilisant [CDK](https://aws.amazon.com/cdk/)
+
+##### AWS 
+- [AWS CDK par @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## Contribuer

 Pour ceux qui souhaitent contribuer du code, consultez notre [Guide de contribution](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). 
@@ -246,3 +253,10 @@ Pour protéger votre vie privée, veuillez éviter de publier des problèmes de
 ## Licence

 Ce référentiel est disponible sous la [Licence open source Dify](LICENSE), qui est essentiellement l'Apache 2.0 avec quelques restrictions supplémentaires.
+## Divulgation de sécurité
+
+Pour protéger votre vie privée, veuillez éviter de publier des problèmes de sécurité sur GitHub. Au lieu de cela, envoyez vos questions à security@dify.ai et nous vous fournirons une réponse plus détaillée.
+
+## Licence
+
+Ce référentiel est disponible sous la [Licence open source Dify](LICENSE), qui est essentiellement l'Apache 2.0 avec quelques restrictions supplémentaires.
--- a/README_JA.md
+++ b/README_JA.md
@@ -212,6 +212,13 @@ docker compose up -d
 ##### Google Cloud
 - [@sotazumによるGoogle Cloud Terraform](https://github.com/DeNA/dify-google-cloud-terraform)

+#### AWS CDK を使用したデプロイ
+
+[CDK](https://aws.amazon.com/cdk/) を使用して、DifyをAWSにデプロイします
+
+##### AWS 
+- [@KevinZhaoによるAWS CDK](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## 貢献

 コードに貢献したい方は、[Contribution Guide](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md)を参照してください。
--- a/README_KL.md
+++ b/README_KL.md
@@ -213,6 +213,13 @@ wa'logh nIqHom neH ghun deployment toy'wI' [terraform](https://www.terraform.io/
 ##### Google Cloud
 - [Google Cloud Terraform qachlot @sotazum](https://github.com/DeNA/dify-google-cloud-terraform)

+#### AWS CDK atorlugh pilersitsineq
+
+wa'logh nIqHom neH ghun deployment toy'wI' [CDK](https://aws.amazon.com/cdk/) lo'laH.
+
+##### AWS 
+- [AWS CDK qachlot @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## Contributing

 For those who'd like to contribute code, see our [Contribution Guide](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). 
--- a/README_KR.md
+++ b/README_KR.md
@@ -205,6 +205,13 @@ Dify를 Kubernetes에 배포하고 프리미엄 스케일링 설정을 구성했
 ##### Google Cloud
 - [sotazum의 Google Cloud Terraform](https://github.com/DeNA/dify-google-cloud-terraform)

+#### AWS CDK를 사용한 배포
+
+[CDK](https://aws.amazon.com/cdk/)를 사용하여 AWS에 Dify 배포
+
+##### AWS 
+- [KevinZhao의 AWS CDK](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## 기여

 코드에 기여하고 싶은 분들은 [기여 가이드](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md)를 참조하세요.
--- a/README_PT.md
+++ b/README_PT.md
@@ -211,6 +211,13 @@ Implante o Dify na Plataforma Cloud com um único clique usando [terraform](http
 ##### Google Cloud
 - [Google Cloud Terraform por @sotazum](https://github.com/DeNA/dify-google-cloud-terraform)

+#### Usando AWS CDK para Implantação
+
+Implante o Dify na AWS usando [CDK](https://aws.amazon.com/cdk/)
+
+##### AWS 
+- [AWS CDK por @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## Contribuindo

 Para aqueles que desejam contribuir com código, veja nosso [Guia de Contribuição](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). 
--- a/README_SI.md
+++ b/README_SI.md
@@ -145,6 +145,13 @@ namestite Dify v Cloud Platform z enim klikom z uporabo [terraform](https://www.
 ##### Google Cloud
 - [Google Cloud Terraform by @sotazum](https://github.com/DeNA/dify-google-cloud-terraform)

+#### Uporaba AWS CDK za uvajanje
+
+Uvedite Dify v AWS z uporabo [CDK](https://aws.amazon.com/cdk/)
+
+##### AWS 
+- [AWS CDK by @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## Prispevam

 Za tiste, ki bi radi prispevali kodo, si oglejte naš vodnik za prispevke . Hkrati vas prosimo, da podprete Dify tako, da ga delite na družbenih medijih ter na dogodkih in konferencah. 
--- a/README_TR.md
+++ b/README_TR.md
@@ -211,6 +211,13 @@ Dify'ı bulut platformuna tek tıklamayla dağıtın [terraform](https://www.ter
 ##### Google Cloud
 - [Google Cloud Terraform tarafından @sotazum](https://github.com/DeNA/dify-google-cloud-terraform)

+#### AWS CDK ile Dağıtım
+
+[CDK](https://aws.amazon.com/cdk/) kullanarak Dify'ı AWS'ye dağıtın
+
+##### AWS 
+- [AWS CDK tarafından @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## Katkıda Bulunma

 Kod katkısında bulunmak isteyenler için [Katkı Kılavuzumuza](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md) bakabilirsiniz.
--- a/README_VI.md
+++ b/README_VI.md
@@ -207,6 +207,13 @@ Triển khai Dify lên nền tảng đám mây với một cú nhấp chuột b
 ##### Google Cloud
 - [Google Cloud Terraform bởi @sotazum](https://github.com/DeNA/dify-google-cloud-terraform)

+#### Sử dụng AWS CDK để Triển khai
+
+Triển khai Dify trên AWS bằng [CDK](https://aws.amazon.com/cdk/)
+
+##### AWS 
+- [AWS CDK bởi @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws)
+
 ## Đóng góp

 Đối với những người muốn đóng góp mã, xem [Hướng dẫn Đóng góp](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md) của chúng tôi. 
--- a/api/.env.example
+++ b/api/.env.example
@@ -329,6 +329,7 @@ NOTION_INTERNAL_SECRET=you-internal-secret
 ETL_TYPE=dify
 UNSTRUCTURED_API_URL=
 UNSTRUCTURED_API_KEY=
+SCARF_NO_ANALYTICS=true

 #ssrf
 SSRF_PROXY_HTTP_URL=
@@ -382,7 +383,7 @@ LOG_DATEFORMAT=%Y-%m-%d %H:%M:%S
 LOG_TZ=UTC

 # Indexing configuration
-INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH=1000
+INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH=4000

 # Workflow runtime configuration
 WORKFLOW_MAX_EXECUTION_STEPS=500
@@ -410,4 +411,5 @@ POSITION_PROVIDER_EXCLUDES=
 # Reset password token expiry minutes
 RESET_PASSWORD_TOKEN_EXPIRY_MINUTES=5

-CREATE_TIDB_SERVICE_JOB_ENABLED=false
+CREATE_TIDB_SERVICE_JOB_ENABLED=false
+
--- a/api/.ruff.toml
+++ b/api/.ruff.toml
@@ -0,0 +1,96 @@
+exclude = [
+    "migrations/*",
+]
+line-length = 120
+
+[format]
+quote-style = "double"
+
+[lint]
+preview = true
+select = [
+    "B", # flake8-bugbear rules
+    "C4", # flake8-comprehensions
+    "E", # pycodestyle E rules
+    "F", # pyflakes rules
+    "FURB", # refurb rules
+    "I", # isort rules
+    "N", # pep8-naming
+    "PT", # flake8-pytest-style rules
+    "PLC0208", # iteration-over-set
+    "PLC2801", # unnecessary-dunder-call
+    "PLC0414", # useless-import-alias
+    "PLE0604", # invalid-all-object
+    "PLE0605", # invalid-all-format
+    "PLR0402", # manual-from-import
+    "PLR1711", # useless-return
+    "PLR1714", # repeated-equality-comparison
+    "RUF013", # implicit-optional
+    "RUF019", # unnecessary-key-check
+    "RUF100", # unused-noqa
+    "RUF101", # redirected-noqa
+    "RUF200", # invalid-pyproject-toml
+    "RUF022", # unsorted-dunder-all
+    "S506", # unsafe-yaml-load
+    "SIM", # flake8-simplify rules
+    "TRY400", # error-instead-of-exception
+    "TRY401", # verbose-log-message
+    "UP", # pyupgrade rules
+    "W191", # tab-indentation
+    "W605", # invalid-escape-sequence
+]
+
+ignore = [
+    "E402", # module-import-not-at-top-of-file
+    "E711", # none-comparison
+    "E712", # true-false-comparison
+    "E721", # type-comparison
+    "E722", # bare-except
+    "E731", # lambda-assignment
+    "F821", # undefined-name
+    "F841", # unused-variable
+    "FURB113", # repeated-append
+    "FURB152", # math-constant
+    "UP007", # non-pep604-annotation
+    "UP032", # f-string
+    "B005", # strip-with-multi-characters
+    "B006", # mutable-argument-default
+    "B007", # unused-loop-control-variable
+    "B026", # star-arg-unpacking-after-keyword-arg
+    "B904", # raise-without-from-inside-except
+    "B905", # zip-without-explicit-strict
+    "N806", # non-lowercase-variable-in-function
+    "N815", # mixed-case-variable-in-class-scope
+    "PT011", # pytest-raises-too-broad
+    "SIM102", # collapsible-if
+    "SIM103", # needless-bool
+    "SIM105", # suppressible-exception
+    "SIM107", # return-in-try-except-finally
+    "SIM108", # if-else-block-instead-of-if-exp
+    "SIM113", # enumerate-for-loop
+    "SIM117", # multiple-with-statements
+    "SIM210", # if-expr-with-true-false
+    "SIM300", # yoda-conditions
+]
+
+[lint.per-file-ignores]
+"__init__.py" = [
+    "F401", # unused-import
+    "F811", # redefined-while-unused
+]
+"configs/*" = [
+    "N802", # invalid-function-name
+]
+"libs/gmpy2_pkcs10aep_cipher.py" = [
+    "N803", # invalid-argument-name
+]
+"tests/*" = [
+    "F811", # redefined-while-unused
+    "F401", # unused-import
+]
+
+[lint.pyflakes]
+extend-generics = [
+    "_pytest.monkeypatch",
+    "tests.integration_tests",
+]
--- a/api/Dockerfile
+++ b/api/Dockerfile
@@ -55,7 +55,7 @@ RUN apt-get update \
    && echo "deb http://deb.debian.org/debian testing main" > /etc/apt/sources.list \
    && apt-get update \
    # For Security
-    && apt-get install -y --no-install-recommends expat=2.6.4-1 libldap-2.5-0=2.5.18+dfsg-3+b1 perl=5.40.0-7 libsqlite3-0=3.46.1-1 zlib1g=1:1.3.dfsg+really1.3.1-1+b1 \
+    && apt-get install -y --no-install-recommends expat=2.6.4-1 libldap-2.5-0=2.5.18+dfsg-3+b1 perl=5.40.0-8 libsqlite3-0=3.46.1-1 zlib1g=1:1.3.dfsg+really1.3.1-1+b1 \
    # install a chinese font to support the use of tools like matplotlib
    && apt-get install -y fonts-noto-cjk \
    && apt-get autoremove -y \
--- a/api/app.py
+++ b/api/app.py
@@ -1,113 +1,13 @@
-import os
-import sys
-
-python_version = sys.version_info
-if not ((3, 11) <= python_version < (3, 13)):
-    print(f"Python 3.11 or 3.12 is required, current version is {python_version.major}.{python_version.minor}")
-    raise SystemExit(1)
-
-from configs import dify_config
-
-if not dify_config.DEBUG:
-    from gevent import monkey
-
-    monkey.patch_all()
-
-    import grpc.experimental.gevent
-
-    grpc.experimental.gevent.init_gevent()
-
-import json
-import threading
-import time
-import warnings
-
-from flask import Response
-
 from app_factory import create_app
+from libs import threadings_utils, version_utils

-# DO NOT REMOVE BELOW
-from events import event_handlers  # noqa: F401
-from extensions.ext_database import db
-
-# TODO: Find a way to avoid importing models here
-from models import account, dataset, model, source, task, tool, tools, web  # noqa: F401
-
-# DO NOT REMOVE ABOVE
-
-
-warnings.simplefilter("ignore", ResourceWarning)
-
-os.environ["TZ"] = "UTC"
-# windows platform not support tzset
-if hasattr(time, "tzset"):
-    time.tzset()
-
+# preparation before creating app
+version_utils.check_supported_python_version()
+threadings_utils.apply_gevent_threading_patch()

 # create app
 app = create_app()
 celery = app.extensions["celery"]

-if dify_config.TESTING:
-    print("App is running in TESTING mode")
-
-
-@app.after_request
-def after_request(response):
-    """Add Version headers to the response."""
-    response.headers.add("X-Version", dify_config.CURRENT_VERSION)
-    response.headers.add("X-Env", dify_config.DEPLOY_ENV)
-    return response
-
-
-@app.route("/health")
-def health():
-    return Response(
-        json.dumps({"pid": os.getpid(), "status": "ok", "version": dify_config.CURRENT_VERSION}),
-        status=200,
-        content_type="application/json",
-    )
-
-
-@app.route("/threads")
-def threads():
-    num_threads = threading.active_count()
-    threads = threading.enumerate()
-
-    thread_list = []
-    for thread in threads:
-        thread_name = thread.name
-        thread_id = thread.ident
-        is_alive = thread.is_alive()
-
-        thread_list.append(
-            {
-                "name": thread_name,
-                "id": thread_id,
-                "is_alive": is_alive,
-            }
-        )
-
-    return {
-        "pid": os.getpid(),
-        "thread_num": num_threads,
-        "threads": thread_list,
-    }
-
-
-@app.route("/db-pool-stat")
-def pool_stat():
-    engine = db.engine
-    return {
-        "pid": os.getpid(),
-        "pool_size": engine.pool.size(),
-        "checked_in_connections": engine.pool.checkedin(),
-        "checked_out_connections": engine.pool.checkedout(),
-        "overflow_connections": engine.pool.overflow(),
-        "connection_timeout": engine.pool.timeout(),
-        "recycle_time": db.engine.pool._recycle,
-    }
-
-
 if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)
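
Reviewer note: the interpreter check deleted from `app.py` above is now reached through `version_utils.check_supported_python_version()`. The helper's body is not shown in this part of the diff, so the following is only a minimal sketch reconstructed from the removed lines. The `version_info` parameter is an assumption added here so the check can be exercised without switching interpreters; the real helper may take no arguments.

```python
import sys


def check_supported_python_version(version_info=sys.version_info) -> None:
    """Exit with status 1 unless the interpreter is Python 3.11 or 3.12.

    Mirrors the range enforced by the lines removed from app.py.
    `version_info` is a hypothetical parameter added for testability.
    """
    major, minor = version_info[0], version_info[1]
    if not ((3, 11) <= (major, minor) < (3, 13)):
        print(f"Python 3.11 or 3.12 is required, current version is {major}.{minor}")
        raise SystemExit(1)
```

Calling the function before `create_app()`, as the new `app.py` does, keeps the failure early and independent of any Flask or extension import.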
--- a/api/app_factory.py
+++ b/api/app_factory.py
@@ -1,54 +1,15 @@
+import logging
 import os
+import time

 from configs import dify_config
-
-if not dify_config.DEBUG:
-    from gevent import monkey
-
-    monkey.patch_all()
-
-    import grpc.experimental.gevent
-
-    grpc.experimental.gevent.init_gevent()
-
-import json
-
-from flask import Flask, Response, request
-from flask_cors import CORS
-from werkzeug.exceptions import Unauthorized
-
-import contexts
-from commands import register_commands
-from configs import dify_config
-from extensions import (
-    ext_celery,
-    ext_code_based_extension,
-    ext_compress,
-    ext_database,
-    ext_hosting_provider,
-    ext_logging,
-    ext_login,
-    ext_mail,
-    ext_migrate,
-    ext_proxy_fix,
-    ext_redis,
-    ext_sentry,
-    ext_storage,
-)
-from extensions.ext_database import db
-from extensions.ext_login import login_manager
-from libs.passport import PassportService
-from services.account_service import AccountService
-
-
-class DifyApp(Flask):
-    pass
+from dify_app import DifyApp


 # ----------------------------
 # Application Factory Function
 # ----------------------------
-def create_flask_app_with_configs() -> Flask:
+def create_flask_app_with_configs() -> DifyApp:
    """
    create a raw flask app
    with configs loaded from .env file
@@ -68,111 +29,72 @@ def create_flask_app_with_configs() -> Flask:
    return dify_app


-def create_app() -> Flask:
+def create_app() -> DifyApp:
+    start_time = time.perf_counter()
    app = create_flask_app_with_configs()
-    app.secret_key = dify_config.SECRET_KEY
    initialize_extensions(app)
-    register_blueprints(app)
-    register_commands(app)
-
+    end_time = time.perf_counter()
+    if dify_config.DEBUG:
+        logging.info(f"Finished create_app ({round((end_time - start_time) * 1000, 2)} ms)")
    return app


-def initialize_extensions(app):
-    # Since the application instance is now created, pass it to each Flask
-    # extension instance to bind it to the Flask application instance (app)
-    ext_logging.init_app(app)
-    ext_compress.init_app(app)
-    ext_code_based_extension.init()
-    ext_database.init_app(app)
-    ext_migrate.init(app, db)
-    ext_redis.init_app(app)
-    ext_storage.init_app(app)
-    ext_celery.init_app(app)
-    ext_login.init_app(app)
-    ext_mail.init_app(app)
-    ext_hosting_provider.init_app(app)
-    ext_sentry.init_app(app)
-    ext_proxy_fix.init_app(app)
-
-
-# Flask-Login configuration
-@login_manager.request_loader
-def load_user_from_request(request_from_flask_login):
-    """Load user based on the request."""
-    if request.blueprint not in {"console", "inner_api"}:
-        return None
-    # Check if the user_id contains a dot, indicating the old format
-    auth_header = request.headers.get("Authorization", "")
-    if not auth_header:
-        auth_token = request.args.get("_token")
-        if not auth_token:
-            raise Unauthorized("Invalid Authorization token.")
-    else:
-        if " " not in auth_header:
-            raise Unauthorized("Invalid Authorization header format. Expected 'Bearer <api-key>' format.")
-        auth_scheme, auth_token = auth_header.split(None, 1)
-        auth_scheme = auth_scheme.lower()
-        if auth_scheme != "bearer":
-            raise Unauthorized("Invalid Authorization header format. Expected 'Bearer <api-key>' format.")
-
-    decoded = PassportService().verify(auth_token)
-    user_id = decoded.get("user_id")
-
-    logged_in_account = AccountService.load_logged_in_account(account_id=user_id)
-    if logged_in_account:
-        contexts.tenant_id.set(logged_in_account.current_tenant_id)
-    return logged_in_account
-
-
-@login_manager.unauthorized_handler
-def unauthorized_handler():
-    """Handle unauthorized requests."""
-    return Response(
-        json.dumps({"code": "unauthorized", "message": "Unauthorized."}),
-        status=401,
-        content_type="application/json",
-    )
-
-
-# register blueprint routers
-def register_blueprints(app):
-    from controllers.console import bp as console_app_bp
-    from controllers.files import bp as files_bp
-    from controllers.inner_api import bp as inner_api_bp
-    from controllers.service_api import bp as service_api_bp
-    from controllers.web import bp as web_bp
-
-    CORS(
-        service_api_bp,
-        allow_headers=["Content-Type", "Authorization", "X-App-Code"],
-        methods=["GET", "PUT", "POST", "DELETE", "OPTIONS", "PATCH"],
+def initialize_extensions(app: DifyApp):
+    from extensions import (
+        ext_app_metrics,
+        ext_blueprints,
+        ext_celery,
+        ext_code_based_extension,
+        ext_commands,
+        ext_compress,
+        ext_database,
+        ext_hosting_provider,
+        ext_import_modules,
+        ext_logging,
+        ext_login,
+        ext_mail,
+        ext_migrate,
+        ext_proxy_fix,
+        ext_redis,
+        ext_sentry,
+        ext_set_secretkey,
+        ext_storage,
+        ext_timezone,
+        ext_warnings,
    )
-    app.register_blueprint(service_api_bp)
-
-    CORS(
-        web_bp,
-        resources={r"/*": {"origins": dify_config.WEB_API_CORS_ALLOW_ORIGINS}},
-        supports_credentials=True,
-        allow_headers=["Content-Type", "Authorization", "X-App-Code"],
-        methods=["GET", "PUT", "POST", "DELETE", "OPTIONS", "PATCH"],
-        expose_headers=["X-Version", "X-Env"],
-    )
-
-    app.register_blueprint(web_bp)
-
-    CORS(
-        console_app_bp,
-        resources={r"/*": {"origins": dify_config.CONSOLE_CORS_ALLOW_ORIGINS}},
-        supports_credentials=True,
-        allow_headers=["Content-Type", "Authorization"],
-        methods=["GET", "PUT", "POST", "DELETE", "OPTIONS", "PATCH"],
-        expose_headers=["X-Version", "X-Env"],
-    )
-
-    app.register_blueprint(console_app_bp)
-
-    CORS(files_bp, allow_headers=["Content-Type"], methods=["GET", "PUT", "POST", "DELETE", "OPTIONS", "PATCH"])
-    app.register_blueprint(files_bp)

-    app.register_blueprint(inner_api_bp)
+    extensions = [
+        ext_timezone,
+        ext_logging,
+        ext_warnings,
+        ext_import_modules,
+        ext_set_secretkey,
+        ext_compress,
+        ext_code_based_extension,
+        ext_database,
+        ext_app_metrics,
+        ext_migrate,
+        ext_redis,
+        ext_storage,
+        ext_celery,
+        ext_login,
+        ext_mail,
+        ext_hosting_provider,
+        ext_sentry,
+        ext_proxy_fix,
+        ext_blueprints,
+        ext_commands,
+    ]
+    for ext in extensions:
+        short_name = ext.__name__.split(".")[-1]
+        is_enabled = ext.is_enabled() if hasattr(ext, "is_enabled") else True
+        if not is_enabled:
+            if dify_config.DEBUG:
+                logging.info(f"Skipped {short_name}")
+            continue
+
+        start_time = time.perf_counter()
+        ext.init_app(app)
+        end_time = time.perf_counter()
+        if dify_config.DEBUG:
+            logging.info(f"Loaded {short_name} ({round((end_time - start_time) * 1000, 2)} ms)")
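
Reviewer note: the new `initialize_extensions` loop treats every extension as a module that exposes `init_app(app)` and, optionally, `is_enabled()`. The sketch below isolates that contract with stand-in modules so the skip and timing behaviour can be checked without Flask. `load_extensions`, the `debug` flag, `ext_a`, and `ext_b` are hypothetical names for illustration and are not part of the commit.

```python
import logging
import time
from types import ModuleType, SimpleNamespace


def load_extensions(app, extensions, debug=False):
    """Call init_app on each enabled extension; return the short names loaded."""
    loaded = []
    for ext in extensions:
        short_name = ext.__name__.split(".")[-1]
        # A module without is_enabled() is treated as always enabled.
        is_enabled = ext.is_enabled() if hasattr(ext, "is_enabled") else True
        if not is_enabled:
            if debug:
                logging.info("Skipped %s", short_name)
            continue

        start_time = time.perf_counter()
        ext.init_app(app)
        elapsed_ms = round((time.perf_counter() - start_time) * 1000, 2)
        if debug:
            logging.info("Loaded %s (%s ms)", short_name, elapsed_ms)
        loaded.append(short_name)
    return loaded


# Stand-ins for real extension modules (hypothetical).
app = SimpleNamespace(calls=[])

ext_a = ModuleType("extensions.ext_a")
ext_a.init_app = lambda a: a.calls.append("a")

ext_b = ModuleType("extensions.ext_b")
ext_b.is_enabled = lambda: False  # disabled, so init_app must not run
ext_b.init_app = lambda a: a.calls.append("b")

loaded = load_extensions(app, [ext_a, ext_b])
```

Because the list is iterated in order, the position of an extension in it is its initialization order, which is why `ext_blueprints` and `ext_commands` sit at the end in the diff.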
--- a/api/commands.py
+++ b/api/commands.py
@@ -640,15 +640,3 @@ where sites.id is null limit 1000"""
                break

    click.echo(click.style("Fix for missing app-related sites completed successfully!", fg="green"))
-
-
-def register_commands(app):
-    app.cli.add_command(reset_password)
-    app.cli.add_command(reset_email)
-    app.cli.add_command(reset_encrypt_key_pair)
-    app.cli.add_command(vdb_migrate)
-    app.cli.add_command(convert_to_agent_apps)
-    app.cli.add_command(add_qdrant_doc_id_index)
-    app.cli.add_command(create_tenant)
-    app.cli.add_command(upgrade_db)
-    app.cli.add_command(fix_app_site_missing)
--- a/api/configs/deploy/__init__.py
+++ b/api/configs/deploy/__init__.py
@@ -17,11 +17,6 @@ class DeploymentConfig(BaseSettings):
        default=False,
    )

-    TESTING: bool = Field(
-        description="Enable testing mode for running automated tests",
-        default=False,
-    )
-
    EDITION: str = Field(
        description="Deployment edition of the application (e.g., 'SELF_HOSTED', 'CLOUD')",
        default="SELF_HOSTED",
--- a/api/configs/feature/__init__.py
+++ b/api/configs/feature/__init__.py
@@ -585,6 +585,11 @@ class RagEtlConfig(BaseSettings):
        default=None,
    )

+    SCARF_NO_ANALYTICS: Optional[str] = Field(
+        description="Whether to disable Scarf analytics in the Unstructured library.",
+        default="false",
+    )
+

 class DataSetConfig(BaseSettings):
    """
@@ -640,7 +645,7 @@ class IndexingConfig(BaseSettings):

    INDEXING_MAX_SEGMENTATION_TOKENS_LENGTH: PositiveInt = Field(
        description="Maximum token length for text segmentation during indexing",
-        default=1000,
+        default=4000,
    )


--- a/api/configs/packaging/__init__.py
+++ b/api/configs/packaging/__init__.py
@@ -9,7 +9,7 @@ class PackagingInfo(BaseSettings):

    CURRENT_VERSION: str = Field(
        description="Dify version",
-        default="0.12.0",
+        default="0.13.0",
    )

    COMMIT_SHA: str = Field(
--- a/api/constants/languages.py
+++ b/api/constants/languages.py
@@ -18,6 +18,7 @@ language_timezone_mapping = {
    "tr-TR": "Europe/Istanbul",
    "fa-IR": "Asia/Tehran",
    "sl-SI": "Europe/Ljubljana",
+    "th-TH": "Asia/Bangkok",
 }

 languages = list(language_timezone_mapping.keys())
--- a/api/controllers/console/app/workflow.py
+++ b/api/controllers/console/app/workflow.py
@@ -100,11 +100,11 @@ class DraftWorkflowApi(Resource):
        try:
            environment_variables_list = args.get("environment_variables") or []
            environment_variables = [
-                variable_factory.build_variable_from_mapping(obj) for obj in environment_variables_list
+                variable_factory.build_environment_variable_from_mapping(obj) for obj in environment_variables_list
            ]
            conversation_variables_list = args.get("conversation_variables") or []
            conversation_variables = [
-                variable_factory.build_variable_from_mapping(obj) for obj in conversation_variables_list
+                variable_factory.build_conversation_variable_from_mapping(obj) for obj in conversation_variables_list
            ]
            workflow = workflow_service.sync_draft_workflow(
                app_model=app_model,
@@ -382,7 +382,7 @@ class DefaultBlockConfigApi(Resource):
        filters = None
        if args.get("q"):
            try:
-                filters = json.loads(args.get("q"))
+                filters = json.loads(args.get("q", ""))
            except json.JSONDecodeError:
                raise ValueError("Invalid filters")
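
Reviewer note: the `args.get("q", "")` change above does not alter runtime behaviour, since the surrounding `if args.get("q")` already rules out a missing or empty value. One plausible reading is that the explicit default keeps the argument to `json.loads` typed as `str` rather than an optional value. A minimal standalone sketch of the same parsing logic follows; `parse_filters` is a hypothetical name and `args` is assumed to behave like a plain mapping.

```python
import json
from collections.abc import Mapping
from typing import Any


def parse_filters(args: Mapping[str, Any]) -> Any:
    """Return the decoded `q` filter, or None when `q` is absent or empty."""
    filters = None
    if args.get("q"):
        try:
            # The "" default only matters to the type checker here.
            filters = json.loads(args.get("q", ""))
        except json.JSONDecodeError:
            raise ValueError("Invalid filters")
    return filters
```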

--- a/api/controllers/console/auth/data_source_oauth.py
+++ b/api/controllers/console/auth/data_source_oauth.py
@@ -34,7 +34,6 @@ class OAuthDataSource(Resource):
        OAUTH_DATASOURCE_PROVIDERS = get_oauth_providers()
        with current_app.app_context():
            oauth_provider = OAUTH_DATASOURCE_PROVIDERS.get(provider)
-            print(vars(oauth_provider))
        if not oauth_provider:
            return {"error": "Invalid provider"}, 400
        if dify_config.NOTION_INTEGRATION_TYPE == "internal":
--- a/api/controllers/console/auth/oauth.py
+++ b/api/controllers/console/auth/oauth.py
@@ -52,7 +52,6 @@ class OAuthLogin(Resource):
        OAUTH_PROVIDERS = get_oauth_providers()
        with current_app.app_context():
            oauth_provider = OAUTH_PROVIDERS.get(provider)
-            print(vars(oauth_provider))
        if not oauth_provider:
            return {"error": "Invalid provider"}, 400

--- a/api/controllers/console/datasets/datasets_document.py
+++ b/api/controllers/console/datasets/datasets_document.py
@@ -106,6 +106,7 @@ class GetProcessRuleApi(Resource):
        # get default rules
        mode = DocumentService.DEFAULT_RULES["mode"]
        rules = DocumentService.DEFAULT_RULES["rules"]
+        limits = DocumentService.DEFAULT_RULES["limits"]
        if document_id:
            # get the latest process rule
            document = Document.query.get_or_404(document_id)
@@ -132,7 +133,7 @@ class GetProcessRuleApi(Resource):
                mode = dataset_process_rule.mode
                rules = dataset_process_rule.rules_dict

-        return {"mode": mode, "rules": rules}
+        return {"mode": mode, "rules": rules, "limits": limits}


 class DatasetDocumentListApi(Resource):
--- a/api/controllers/service_api/app/app.py
+++ b/api/controllers/service_api/app/app.py
@@ -48,7 +48,8 @@ class AppInfoApi(Resource):
    @validate_app_token
    def get(self, app_model: App):
        """Get app information"""
-        return {"name": app_model.name, "description": app_model.description}
+        tags = [tag.name for tag in app_model.tags]
+        return {"name": app_model.name, "description": app_model.description, "tags": tags}


 api.add_resource(AppParameterApi, "/parameters")
--- a/api/core/app/app_config/easy_ui_based_app/model_config/manager.py
+++ b/api/core/app/app_config/easy_ui_based_app/model_config/manager.py
@@ -1,3 +1,6 @@
+from collections.abc import Mapping
+from typing import Any
+
 from core.app.app_config.entities import ModelConfigEntity
 from core.model_runtime.entities.model_entities import ModelPropertyKey, ModelType
 from core.model_runtime.model_providers import model_provider_factory
@@ -36,7 +39,7 @@ class ModelConfigManager:
        )

    @classmethod
-    def validate_and_set_defaults(cls, tenant_id: str, config: dict) -> tuple[dict, list[str]]:
+    def validate_and_set_defaults(cls, tenant_id: str, config: Mapping[str, Any]) -> tuple[dict, list[str]]:
        """
        Validate and set defaults for model config

--- a/api/core/app/apps/advanced_chat/app_generator.py
+++ b/api/core/app/apps/advanced_chat/app_generator.py
@@ -2,8 +2,8 @@ import contextvars
 import logging
 import threading
 import uuid
-from collections.abc import Generator
-from typing import Any, Literal, Optional, Union, overload
+from collections.abc import Generator, Mapping
+from typing import Any, Optional, Union

 from flask import Flask, current_app
 from pydantic import ValidationError
@@ -23,6 +23,7 @@ from core.app.entities.app_invoke_entities import AdvancedChatAppGenerateEntity,
 from core.app.entities.task_entities import ChatbotAppBlockingResponse, ChatbotAppStreamResponse
 from core.model_runtime.errors.invoke import InvokeAuthorizationError, InvokeError
 from core.ops.ops_trace_manager import TraceQueueManager
+from core.prompt.utils.get_thread_messages_length import get_thread_messages_length
 from extensions.ext_database import db
 from factories import file_factory
 from models.account import Account
@@ -33,37 +34,17 @@ logger = logging.getLogger(__name__)


 class AdvancedChatAppGenerator(MessageBasedAppGenerator):
-    @overload
-    def generate(
-        self,
-        app_model: App,
-        workflow: Workflow,
-        user: Union[Account, EndUser],
-        args: dict,
-        invoke_from: InvokeFrom,
-        stream: Literal[True] = True,
-    ) -> Generator[str, None, None]: ...
+    _dialogue_count: int

-    @overload
    def generate(
        self,
        app_model: App,
        workflow: Workflow,
        user: Union[Account, EndUser],
-        args: dict,
+        args: Mapping[str, Any],
        invoke_from: InvokeFrom,
-        stream: Literal[False] = False,
-    ) -> dict: ...
-
-    def generate(
-        self,
-        app_model: App,
-        workflow: Workflow,
-        user: Union[Account, EndUser],
-        args: dict,
-        invoke_from: InvokeFrom,
-        stream: bool = True,
-    ) -> dict[str, Any] | Generator[str, Any, None]:
+        streaming: bool = True,
+    ) -> Mapping[str, Any] | Generator[str, None, None]:
        """
        Generate App response.

@ -134,7 +115,7 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
            files=file_objs,
            parent_message_id=args.get("parent_message_id") if invoke_from != InvokeFrom.SERVICE_API else UUID_NIL,
            user_id=user.id,
-            stream=stream,
+            stream=streaming,
            invoke_from=invoke_from,
            extras=extras,
            trace_manager=trace_manager,
@ -148,12 +129,12 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
            invoke_from=invoke_from,
            application_generate_entity=application_generate_entity,
            conversation=conversation,
-            stream=stream,
+            stream=streaming,
        )

    def single_iteration_generate(
-        self, app_model: App, workflow: Workflow, node_id: str, user: Account, args: dict, stream: bool = True
-    ) -> dict[str, Any] | Generator[str, Any, None]:
+        self, app_model: App, workflow: Workflow, node_id: str, user: Account, args: dict, streaming: bool = True
+    ) -> Mapping[str, Any] | Generator[str, None, None]:
        """
        Generate App response.

@ -182,7 +163,7 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
            query="",
            files=[],
            user_id=user.id,
-            stream=stream,
+            stream=streaming,
            invoke_from=InvokeFrom.DEBUGGER,
            extras={"auto_generate_conversation_name": False},
            single_iteration_run=AdvancedChatAppGenerateEntity.SingleIterationRunEntity(
@ -197,7 +178,7 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
            invoke_from=InvokeFrom.DEBUGGER,
            application_generate_entity=application_generate_entity,
            conversation=None,
-            stream=stream,
+            stream=streaming,
        )

    def _generate(
@ -209,7 +190,7 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
        application_generate_entity: AdvancedChatAppGenerateEntity,
        conversation: Optional[Conversation] = None,
        stream: bool = True,
-    ) -> dict[str, Any] | Generator[str, Any, None]:
+    ) -> Mapping[str, Any] | Generator[str, None, None]:
        """
        Generate App response.

@ -233,6 +214,9 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
            db.session.commit()
            db.session.refresh(conversation)

+        # get conversation dialogue count
+        self._dialogue_count = get_thread_messages_length(conversation.id)
+
        # init queue manager
        queue_manager = MessageBasedAppQueueManager(
            task_id=application_generate_entity.task_id,
@ -303,6 +287,7 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
                    queue_manager=queue_manager,
                    conversation=conversation,
                    message=message,
+                    dialogue_count=self._dialogue_count,
                )

                runner.run()
@ -356,6 +341,7 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
            message=message,
            user=user,
            stream=stream,
+            dialogue_count=self._dialogue_count,
        )

        try:
--- a/api/core/app/apps/advanced_chat/app_runner.py
+++ b/api/core/app/apps/advanced_chat/app_runner.py
@ -39,12 +39,14 @@ class AdvancedChatAppRunner(WorkflowBasedAppRunner):
        queue_manager: AppQueueManager,
        conversation: Conversation,
        message: Message,
+        dialogue_count: int,
    ) -> None:
        super().__init__(queue_manager)

        self.application_generate_entity = application_generate_entity
        self.conversation = conversation
        self.message = message
+        self._dialogue_count = dialogue_count

    def run(self) -> None:
        app_config = self.application_generate_entity.app_config
@ -122,19 +124,13 @@ class AdvancedChatAppRunner(WorkflowBasedAppRunner):

                session.commit()

-            # Increment dialogue count.
-            self.conversation.dialogue_count += 1
-
-            conversation_dialogue_count = self.conversation.dialogue_count
-            db.session.commit()
-
            # Create a variable pool.
            system_inputs = {
                SystemVariableKey.QUERY: query,
                SystemVariableKey.FILES: files,
                SystemVariableKey.CONVERSATION_ID: self.conversation.id,
                SystemVariableKey.USER_ID: user_id,
-                SystemVariableKey.DIALOGUE_COUNT: conversation_dialogue_count,
+                SystemVariableKey.DIALOGUE_COUNT: self._dialogue_count,
                SystemVariableKey.APP_ID: app_config.app_id,
                SystemVariableKey.WORKFLOW_ID: app_config.workflow_id,
                SystemVariableKey.WORKFLOW_RUN_ID: self.application_generate_entity.workflow_run_id,
--- a/api/core/app/apps/advanced_chat/generate_task_pipeline.py
+++ b/api/core/app/apps/advanced_chat/generate_task_pipeline.py
@ -88,6 +88,7 @@ class AdvancedChatAppGenerateTaskPipeline(BasedGenerateTaskPipeline, WorkflowCyc
        message: Message,
        user: Union[Account, EndUser],
        stream: bool,
+        dialogue_count: int,
    ) -> None:
        """
        Initialize AdvancedChatAppGenerateTaskPipeline.
@ -98,6 +99,7 @@ class AdvancedChatAppGenerateTaskPipeline(BasedGenerateTaskPipeline, WorkflowCyc
        :param message: message
        :param user: user
        :param stream: stream
+        :param dialogue_count: dialogue count
        """
        super().__init__(application_generate_entity, queue_manager, user, stream)

@ -114,7 +116,7 @@ class AdvancedChatAppGenerateTaskPipeline(BasedGenerateTaskPipeline, WorkflowCyc
            SystemVariableKey.FILES: application_generate_entity.files,
            SystemVariableKey.CONVERSATION_ID: conversation.id,
            SystemVariableKey.USER_ID: user_id,
-            SystemVariableKey.DIALOGUE_COUNT: conversation.dialogue_count,
+            SystemVariableKey.DIALOGUE_COUNT: dialogue_count,
            SystemVariableKey.APP_ID: application_generate_entity.app_config.app_id,
            SystemVariableKey.WORKFLOW_ID: workflow.id,
            SystemVariableKey.WORKFLOW_RUN_ID: application_generate_entity.workflow_run_id,
@ -125,6 +127,7 @@ class AdvancedChatAppGenerateTaskPipeline(BasedGenerateTaskPipeline, WorkflowCyc

        self._conversation_name_generate_thread = None
        self._recorded_files: list[Mapping[str, Any]] = []
+        self.total_tokens: int = 0

    def process(self):
        """
@ -358,6 +361,8 @@ class AdvancedChatAppGenerateTaskPipeline(BasedGenerateTaskPipeline, WorkflowCyc
                if not workflow_run:
                    raise Exception("Workflow run not initialized.")

+                # FIXME: quick fix for issue #11221; a better solution may exist.
+                self.total_tokens += event.metadata.get("total_tokens", 0) if event.metadata else 0
                yield self._workflow_iteration_completed_to_stream_response(
                    task_id=self._application_generate_entity.task_id, workflow_run=workflow_run, event=event
                )
@ -371,7 +376,7 @@ class AdvancedChatAppGenerateTaskPipeline(BasedGenerateTaskPipeline, WorkflowCyc
                workflow_run = self._handle_workflow_run_success(
                    workflow_run=workflow_run,
                    start_at=graph_runtime_state.start_at,
-                    total_tokens=graph_runtime_state.total_tokens,
+                    total_tokens=graph_runtime_state.total_tokens or self.total_tokens,
                    total_steps=graph_runtime_state.node_run_steps,
                    outputs=event.outputs,
                    conversation_id=self._conversation.id,
--- a/api/core/app/apps/agent_chat/app_config_manager.py
+++ b/api/core/app/apps/agent_chat/app_config_manager.py
@ -1,5 +1,6 @@
 import uuid
-from typing import Optional
+from collections.abc import Mapping
+from typing import Any, Optional

 from core.agent.entities import AgentEntity
 from core.app.app_config.base_app_config_manager import BaseAppConfigManager
@ -85,7 +86,7 @@ class AgentChatAppConfigManager(BaseAppConfigManager):
        return app_config

    @classmethod
-    def config_validate(cls, tenant_id: str, config: dict) -> dict:
+    def config_validate(cls, tenant_id: str, config: Mapping[str, Any]) -> dict:
        """
        Validate for agent chat app model config

--- a/api/core/app/apps/agent_chat/app_generator.py
+++ b/api/core/app/apps/agent_chat/app_generator.py
@ -1,8 +1,8 @@
 import logging
 import threading
 import uuid
-from collections.abc import Generator
-from typing import Any, Literal, Union, overload
+from collections.abc import Generator, Mapping
+from typing import Any, Union

 from flask import Flask, current_app
 from pydantic import ValidationError
@ -28,34 +28,15 @@ logger = logging.getLogger(__name__)


 class AgentChatAppGenerator(MessageBasedAppGenerator):
-    @overload
    def generate(
        self,
+        *,
        app_model: App,
        user: Union[Account, EndUser],
-        args: dict,
+        args: Mapping[str, Any],
        invoke_from: InvokeFrom,
-        stream: Literal[True] = True,
-    ) -> Generator[dict, None, None]: ...
-
-    @overload
-    def generate(
-        self,
-        app_model: App,
-        user: Union[Account, EndUser],
-        args: dict,
-        invoke_from: InvokeFrom,
-        stream: Literal[False] = False,
-    ) -> dict: ...
-
-    def generate(
-        self,
-        app_model: App,
-        user: Union[Account, EndUser],
-        args: Any,
-        invoke_from: InvokeFrom,
-        stream: bool = True,
-    ) -> Union[dict, Generator[dict, None, None]]:
+        streaming: bool = True,
+    ) -> Mapping[str, Any] | Generator[str, None, None]:
        """
        Generate App response.

@ -65,7 +46,7 @@ class AgentChatAppGenerator(MessageBasedAppGenerator):
        :param invoke_from: invoke from source
        :param stream: is stream
        """
-        if not stream:
+        if not streaming:
            raise ValueError("Agent Chat App does not support blocking mode")

        if not args.get("query"):
@ -96,7 +77,8 @@ class AgentChatAppGenerator(MessageBasedAppGenerator):

            # validate config
            override_model_config_dict = AgentChatAppConfigManager.config_validate(
-                tenant_id=app_model.tenant_id, config=args.get("model_config")
+                tenant_id=app_model.tenant_id,
+                config=args["model_config"],
            )

            # always enable retriever resource in debugger mode
@ -141,7 +123,7 @@ class AgentChatAppGenerator(MessageBasedAppGenerator):
            files=file_objs,
            parent_message_id=args.get("parent_message_id") if invoke_from != InvokeFrom.SERVICE_API else UUID_NIL,
            user_id=user.id,
-            stream=stream,
+            stream=streaming,
            invoke_from=invoke_from,
            extras=extras,
            call_depth=0,
@ -182,7 +164,7 @@ class AgentChatAppGenerator(MessageBasedAppGenerator):
            conversation=conversation,
            message=message,
            user=user,
-            stream=stream,
+            stream=streaming,
        )

        return AgentChatAppGenerateResponseConverter.convert(response=response, invoke_from=invoke_from)
--- a/api/core/app/apps/base_app_generate_response_converter.py
+++ b/api/core/app/apps/base_app_generate_response_converter.py
@ -1,6 +1,6 @@
 import logging
 from abc import ABC, abstractmethod
-from collections.abc import Generator
+from collections.abc import Generator, Mapping
 from typing import Any, Union

 from core.app.entities.app_invoke_entities import InvokeFrom
@ -14,8 +14,10 @@ class AppGenerateResponseConverter(ABC):

    @classmethod
    def convert(
-        cls, response: Union[AppBlockingResponse, Generator[AppStreamResponse, Any, None]], invoke_from: InvokeFrom
-    ) -> dict[str, Any] | Generator[str, Any, None]:
+        cls,
+        response: Union[AppBlockingResponse, Generator[AppStreamResponse, Any, None]],
+        invoke_from: InvokeFrom,
+    ) -> Mapping[str, Any] | Generator[str, None, None]:
        if invoke_from in {InvokeFrom.DEBUGGER, InvokeFrom.SERVICE_API}:
            if isinstance(response, AppBlockingResponse):
                return cls.convert_blocking_full_response(response)
--- a/api/core/app/apps/chat/app_generator.py
+++ b/api/core/app/apps/chat/app_generator.py
@ -55,7 +55,7 @@ class ChatAppGenerator(MessageBasedAppGenerator):
        user: Union[Account, EndUser],
        args: Any,
        invoke_from: InvokeFrom,
-        stream: bool = True,
+        streaming: bool = True,
    ) -> Union[dict, Generator[str, None, None]]:
        """
        Generate App response.
@ -142,7 +142,7 @@ class ChatAppGenerator(MessageBasedAppGenerator):
            invoke_from=invoke_from,
            extras=extras,
            trace_manager=trace_manager,
-            stream=stream,
+            stream=streaming,
        )

        # init generate records
@ -179,7 +179,7 @@ class ChatAppGenerator(MessageBasedAppGenerator):
            conversation=conversation,
            message=message,
            user=user,
-            stream=stream,
+            stream=streaming,
        )

        return ChatAppGenerateResponseConverter.convert(response=response, invoke_from=invoke_from)
--- a/api/core/app/apps/completion/app_generator.py
+++ b/api/core/app/apps/completion/app_generator.py
@ -50,7 +50,7 @@ class CompletionAppGenerator(MessageBasedAppGenerator):
    ) -> dict: ...

    def generate(
-        self, app_model: App, user: Union[Account, EndUser], args: Any, invoke_from: InvokeFrom, stream: bool = True
+        self, app_model: App, user: Union[Account, EndUser], args: Any, invoke_from: InvokeFrom, streaming: bool = True
    ) -> Union[dict, Generator[str, None, None]]:
        """
        Generate App response.
@ -119,7 +119,7 @@ class CompletionAppGenerator(MessageBasedAppGenerator):
            query=query,
            files=file_objs,
            user_id=user.id,
-            stream=stream,
+            stream=streaming,
            invoke_from=invoke_from,
            extras=extras,
            trace_manager=trace_manager,
@ -158,7 +158,7 @@ class CompletionAppGenerator(MessageBasedAppGenerator):
            conversation=conversation,
            message=message,
            user=user,
-            stream=stream,
+            stream=streaming,
        )

        return CompletionAppGenerateResponseConverter.convert(response=response, invoke_from=invoke_from)
--- a/api/core/app/apps/workflow/app_generator.py
+++ b/api/core/app/apps/workflow/app_generator.py
@ -3,7 +3,7 @@ import logging
 import threading
 import uuid
 from collections.abc import Generator, Mapping, Sequence
-from typing import Any, Literal, Optional, Union, overload
+from typing import Any, Optional, Union

 from flask import Flask, current_app
 from pydantic import ValidationError
@ -30,43 +30,18 @@ logger = logging.getLogger(__name__)


 class WorkflowAppGenerator(BaseAppGenerator):
-    @overload
-    def generate(
-        self,
-        app_model: App,
-        workflow: Workflow,
-        user: Union[Account, EndUser],
-        args: dict,
-        invoke_from: InvokeFrom,
-        stream: Literal[True] = True,
-        call_depth: int = 0,
-        workflow_thread_pool_id: Optional[str] = None,
-    ) -> Generator[str, None, None]: ...
-
-    @overload
-    def generate(
-        self,
-        app_model: App,
-        workflow: Workflow,
-        user: Union[Account, EndUser],
-        args: dict,
-        invoke_from: InvokeFrom,
-        stream: Literal[False] = False,
-        call_depth: int = 0,
-        workflow_thread_pool_id: Optional[str] = None,
-    ) -> dict: ...
-
    def generate(
        self,
+        *,
        app_model: App,
        workflow: Workflow,
-        user: Union[Account, EndUser],
+        user: Account | EndUser,
        args: Mapping[str, Any],
        invoke_from: InvokeFrom,
-        stream: bool = True,
+        streaming: bool = True,
        call_depth: int = 0,
        workflow_thread_pool_id: Optional[str] = None,
-    ):
+    ) -> Mapping[str, Any] | Generator[str, None, None]:
        files: Sequence[Mapping[str, Any]] = args.get("files") or []

        # parse files
@ -101,7 +76,7 @@ class WorkflowAppGenerator(BaseAppGenerator):
            ),
            files=system_files,
            user_id=user.id,
-            stream=stream,
+            stream=streaming,
            invoke_from=invoke_from,
            call_depth=call_depth,
            trace_manager=trace_manager,
@ -115,7 +90,7 @@ class WorkflowAppGenerator(BaseAppGenerator):
            user=user,
            application_generate_entity=application_generate_entity,
            invoke_from=invoke_from,
-            stream=stream,
+            streaming=streaming,
            workflow_thread_pool_id=workflow_thread_pool_id,
        )

@ -127,20 +102,9 @@ class WorkflowAppGenerator(BaseAppGenerator):
        user: Union[Account, EndUser],
        application_generate_entity: WorkflowAppGenerateEntity,
        invoke_from: InvokeFrom,
-        stream: bool = True,
+        streaming: bool = True,
        workflow_thread_pool_id: Optional[str] = None,
-    ) -> dict[str, Any] | Generator[str, None, None]:
-        """
-        Generate App response.
-
-        :param app_model: App
-        :param workflow: Workflow
-        :param user: account or end user
-        :param application_generate_entity: application generate entity
-        :param invoke_from: invoke from source
-        :param stream: is stream
-        :param workflow_thread_pool_id: workflow thread pool id
-        """
+    ) -> Mapping[str, Any] | Generator[str, None, None]:
        # init queue manager
        queue_manager = WorkflowAppQueueManager(
            task_id=application_generate_entity.task_id,
@ -169,14 +133,20 @@ class WorkflowAppGenerator(BaseAppGenerator):
            workflow=workflow,
            queue_manager=queue_manager,
            user=user,
-            stream=stream,
+            stream=streaming,
        )

        return WorkflowAppGenerateResponseConverter.convert(response=response, invoke_from=invoke_from)

    def single_iteration_generate(
-        self, app_model: App, workflow: Workflow, node_id: str, user: Account, args: dict, stream: bool = True
-    ) -> dict[str, Any] | Generator[str, Any, None]:
+        self,
+        app_model: App,
+        workflow: Workflow,
+        node_id: str,
+        user: Account,
+        args: Mapping[str, Any],
+        streaming: bool = True,
+    ) -> Mapping[str, Any] | Generator[str, None, None]:
        """
        Generate App response.

@ -203,7 +173,7 @@ class WorkflowAppGenerator(BaseAppGenerator):
            inputs={},
            files=[],
            user_id=user.id,
-            stream=stream,
+            stream=streaming,
            invoke_from=InvokeFrom.DEBUGGER,
            extras={"auto_generate_conversation_name": False},
            single_iteration_run=WorkflowAppGenerateEntity.SingleIterationRunEntity(
@ -218,7 +188,7 @@ class WorkflowAppGenerator(BaseAppGenerator):
            user=user,
            invoke_from=InvokeFrom.DEBUGGER,
            application_generate_entity=application_generate_entity,
-            stream=stream,
+            streaming=streaming,
        )

    def _generate_worker(
--- a/api/core/app/apps/workflow/generate_task_pipeline.py
+++ b/api/core/app/apps/workflow/generate_task_pipeline.py
@ -106,6 +106,7 @@ class WorkflowAppGenerateTaskPipeline(BasedGenerateTaskPipeline, WorkflowCycleMa

        self._task_state = WorkflowTaskState()
        self._wip_workflow_node_executions = {}
+        self.total_tokens: int = 0

    def process(self) -> Union[WorkflowAppBlockingResponse, Generator[WorkflowAppStreamResponse, None, None]]:
        """
@ -319,6 +320,8 @@ class WorkflowAppGenerateTaskPipeline(BasedGenerateTaskPipeline, WorkflowCycleMa
                if not workflow_run:
                    raise Exception("Workflow run not initialized.")

+                # FIXME: quick fix for issue #11221; a better solution may exist.
+                self.total_tokens += event.metadata.get("total_tokens", 0) if event.metadata else 0
                yield self._workflow_iteration_completed_to_stream_response(
                    task_id=self._application_generate_entity.task_id, workflow_run=workflow_run, event=event
                )
@ -332,7 +335,7 @@ class WorkflowAppGenerateTaskPipeline(BasedGenerateTaskPipeline, WorkflowCycleMa
                workflow_run = self._handle_workflow_run_success(
                    workflow_run=workflow_run,
                    start_at=graph_runtime_state.start_at,
-                    total_tokens=graph_runtime_state.total_tokens,
+                    total_tokens=graph_runtime_state.total_tokens or self.total_tokens,
                    total_steps=graph_runtime_state.node_run_steps,
                    outputs=event.outputs,
                    conversation_id=None,
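
The two hunks above add a per-pipeline token counter and fall back to it when the graph runtime state reports no tokens. The following is a minimal sketch of that fallback in isolation. `IterationCompleted` and `TokenTally` are hypothetical stand-ins for illustration, not names from the Dify codebase.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class IterationCompleted:
    """Hypothetical stand-in for an iteration-completed queue event."""

    metadata: Optional[Mapping[str, Any]] = None


class TokenTally:
    """Hypothetical helper mirroring the `self.total_tokens` counter in the diff."""

    def __init__(self) -> None:
        self.total_tokens: int = 0

    def on_iteration_completed(self, event: IterationCompleted) -> None:
        # Same expression as the diff: missing or empty metadata contributes 0.
        self.total_tokens += event.metadata.get("total_tokens", 0) if event.metadata else 0

    def resolve(self, runtime_total: int) -> int:
        # `or` uses the local tally only when the runtime total is falsy (0).
        return runtime_total or self.total_tokens
```

Because `or` tests truthiness, a non-zero runtime total always wins and the local tally is used only when the runtime total is 0.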
--- a/api/core/app/apps/workflow_app_runner.py
+++ b/api/core/app/apps/workflow_app_runner.py
@ -43,7 +43,7 @@ from core.workflow.graph_engine.entities.event import (
 )
 from core.workflow.graph_engine.entities.graph import Graph
 from core.workflow.nodes import NodeType
-from core.workflow.nodes.node_mapping import node_type_classes_mapping
+from core.workflow.nodes.node_mapping import NODE_TYPE_CLASSES_MAPPING
 from core.workflow.workflow_entry import WorkflowEntry
 from extensions.ext_database import db
 from models.model import App
@ -138,7 +138,8 @@ class WorkflowBasedAppRunner(AppRunner):

        # Get node class
        node_type = NodeType(iteration_node_config.get("data", {}).get("type"))
-        node_cls = node_type_classes_mapping[node_type]
+        node_version = iteration_node_config.get("data", {}).get("version", "1")
+        node_cls = NODE_TYPE_CLASSES_MAPPING[node_type][node_version]

        # init variable pool
        variable_pool = VariablePool(
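
The hunk above switches the node lookup from a flat mapping to one keyed by node type and then by version, defaulting to version "1". The sketch below shows that lookup on its own. The enum member, the placeholder class, and the mapping contents are assumptions for illustration; only the nested `[node_type][node_version]` access and the `"1"` default come from the diff.

```python
from enum import Enum
from typing import Any, Mapping


class NodeType(Enum):
    """Hypothetical subset of node types, for illustration only."""

    ITERATION = "iteration"


class IterationNodeV1:
    """Placeholder node class; not the real Dify implementation."""


# Assumed shape: node type -> version string -> node class.
NODE_TYPE_CLASSES_MAPPING: Mapping[NodeType, Mapping[str, type]] = {
    NodeType.ITERATION: {"1": IterationNodeV1},
}


def resolve_node_class(node_config: Mapping[str, Any]) -> type:
    data = node_config.get("data", {})
    node_type = NodeType(data.get("type"))
    # Configs without an explicit version fall back to "1", as in the diff.
    node_version = data.get("version", "1")
    return NODE_TYPE_CLASSES_MAPPING[node_type][node_version]
```

In this sketch an unknown version raises `KeyError`, so a caller that accepts untrusted configs would probably want to catch it and report the unsupported version.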
--- a/api/core/app/features/rate_limiting/rate_limit.py
+++ b/api/core/app/features/rate_limiting/rate_limit.py
@ -1,9 +1,9 @@
 import logging
 import time
 import uuid
-from collections.abc import Generator
+from collections.abc import Generator, Mapping
 from datetime import timedelta
-from typing import Optional, Union
+from typing import Any, Optional, Union

 from core.errors.error import AppInvokeQuotaExceededError
 from extensions.ext_redis import redis_client
@ -88,20 +88,17 @@ class RateLimit:
    def gen_request_key() -> str:
        return str(uuid.uuid4())

-    def generate(self, generator: Union[Generator, callable, dict], request_id: str):
-        if isinstance(generator, dict):
+    def generate(self, generator: Union[Generator[str, None, None], Mapping[str, Any]], request_id: str):
+        if isinstance(generator, Mapping):
            return generator
        else:
-            return RateLimitGenerator(self, generator, request_id)
+            return RateLimitGenerator(rate_limit=self, generator=generator, request_id=request_id)


 class RateLimitGenerator:
-    def __init__(self, rate_limit: RateLimit, generator: Union[Generator, callable], request_id: str):
+    def __init__(self, rate_limit: RateLimit, generator: Generator[str, None, None], request_id: str):
        self.rate_limit = rate_limit
-        if callable(generator):
-            self.generator = generator()
-        else:
-            self.generator = generator
+        self.generator = generator
        self.request_id = request_id
        self.closed = False
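
The `RateLimit.generate` change above returns blocking responses (any `Mapping`) unchanged and wraps only streaming generators. The sketch below shows that dispatch. `ClosingWrapper` and `wrap` are hypothetical names, and the wrapper only tracks exhaustion; it does not reproduce the real `RateLimitGenerator` bookkeeping.

```python
from collections.abc import Generator, Mapping
from typing import Any, Union


class ClosingWrapper:
    """Hypothetical stand-in for RateLimitGenerator: wraps a generator, tracks exhaustion."""

    def __init__(self, generator: Generator[str, None, None]) -> None:
        self.generator = generator
        self.closed = False

    def __iter__(self) -> "ClosingWrapper":
        return self

    def __next__(self) -> str:
        try:
            return next(self.generator)
        except StopIteration:
            # Mark the stream as finished once the underlying generator is exhausted.
            self.closed = True
            raise


def wrap(response: Union[Generator[str, None, None], Mapping[str, Any]]):
    # Checking against Mapping (not dict) also accepts read-only mappings.
    if isinstance(response, Mapping):
        return response
    return ClosingWrapper(response)
```

Testing against `Mapping` rather than `dict` matches the widened return type of the generators elsewhere in this commit, which now return `Mapping[str, Any]` for blocking calls.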

--- a/api/core/app/task_pipeline/workflow_cycle_manage.py
+++ b/api/core/app/task_pipeline/workflow_cycle_manage.py
@ -340,7 +340,7 @@ class WorkflowCycleManage:
                WorkflowNodeExecution.status: WorkflowNodeExecutionStatus.FAILED.value,
                WorkflowNodeExecution.error: event.error,
                WorkflowNodeExecution.inputs: json.dumps(inputs) if inputs else None,
-                WorkflowNodeExecution.process_data: json.dumps(event.process_data) if event.process_data else None,
+                WorkflowNodeExecution.process_data: json.dumps(process_data) if process_data else None,
                WorkflowNodeExecution.outputs: json.dumps(outputs) if outputs else None,
                WorkflowNodeExecution.finished_at: finished_at,
                WorkflowNodeExecution.elapsed_time: elapsed_time,
--- a/api/core/file/init.py
+++ b/api/core/file/init.py
@ -7,13 +7,13 @@ from .models import (
 )

 __all__ = [
+    "FILE_MODEL_IDENTITY",
+    "ArrayFileAttribute",
+    "File",
+    "FileAttribute",
+    "FileBelongsTo",
+    "FileTransferMethod",
    "FileType",
    "FileUploadConfig",
-    "FileTransferMethod",
-    "FileBelongsTo",
-    "File",
    "ImageConfig",
-    "FileAttribute",
-    "ArrayFileAttribute",
-    "FILE_MODEL_IDENTITY",
 ]
--- a/api/core/helper/ssrf_proxy.py
+++ b/api/core/helper/ssrf_proxy.py
@ -53,8 +53,6 @@ def make_request(method, url, max_retries=SSRF_DEFAULT_MAX_RETRIES, **kwargs):
                    response = client.request(method=method, url=url, **kwargs)

            if response.status_code not in STATUS_FORCELIST:
-                if stream:
-                    return response.iter_bytes()
                return response
            else:
                logging.warning(f"Received status code {response.status_code} for URL {url} which is in the force list")
--- a/api/core/llm_generator/output_parser/suggested_questions_after_answer.py
+++ b/api/core/llm_generator/output_parser/suggested_questions_after_answer.py
@ -15,6 +15,5 @@ class SuggestedQuestionsAfterAnswerOutputParser:
            json_obj = json.loads(action_match.group(0).strip())
        else:
            json_obj = []
-            print(f"Could not parse LLM output: {text}")

        return json_obj
--- a/api/core/model_runtime/entities/init.py
+++ b/api/core/model_runtime/entities/init.py
@ -18,25 +18,25 @@ from .message_entities import (
 from .model_entities import ModelPropertyKey

 __all__ = [
+    "AssistantPromptMessage",
+    "AudioPromptMessageContent",
+    "DocumentPromptMessageContent",
    "ImagePromptMessageContent",
-    "VideoPromptMessageContent",
-    "PromptMessage",
-    "PromptMessageRole",
+    "LLMResult",
+    "LLMResultChunk",
+    "LLMResultChunkDelta",
    "LLMUsage",
    "ModelPropertyKey",
-    "AssistantPromptMessage",
+    "PromptMessage",
    "PromptMessage",
    "PromptMessageContent",
+    "PromptMessageContentType",
    "PromptMessageRole",
+    "PromptMessageRole",
+    "PromptMessageTool",
    "SystemPromptMessage",
    "TextPromptMessageContent",
-    "UserPromptMessage",
-    "PromptMessageTool",
    "ToolPromptMessage",
-    "PromptMessageContentType",
-    "LLMResult",
-    "LLMResultChunk",
-    "LLMResultChunkDelta",
-    "AudioPromptMessageContent",
-    "DocumentPromptMessageContent",
+    "UserPromptMessage",
+    "VideoPromptMessageContent",
 ]
--- a/api/core/model_runtime/model_providers/anthropic/llm/llm.py
+++ b/api/core/model_runtime/model_providers/anthropic/llm/llm.py
@ -483,6 +483,10 @@ class AnthropicLargeLanguageModel(LargeLanguageModel):
                if isinstance(message, UserPromptMessage):
                    message = cast(UserPromptMessage, message)
                    if isinstance(message.content, str):
+                        # Skip empty user prompts (see #10013, #10520):
+                        # the Claude API can't handle user prompts containing only whitespace.
+                        if not message.content.strip():
+                            continue
                        message_dict = {"role": "user", "content": message.content}
                        prompt_message_dicts.append(message_dict)
                    else:
--- a/api/core/model_runtime/model_providers/azure_openai/llm/llm.py
+++ b/api/core/model_runtime/model_providers/azure_openai/llm/llm.py
@ -598,6 +598,9 @@ class AzureOpenAILargeLanguageModel(_CommonAzureOpenAI, LargeLanguageModel):
            # message = cast(AssistantPromptMessage, message)
            message_dict = {"role": "assistant", "content": message.content}
            if message.tool_calls:
+                # Azure with JSON schema enabled can't process content = "" in assistant messages; use None instead.
+                if not message.content:
+                    message_dict["content"] = None
                message_dict["tool_calls"] = [helper.dump_model(tool_call) for tool_call in message.tool_calls]
        elif isinstance(message, SystemPromptMessage):
            message = cast(SystemPromptMessage, message)
--- a/api/core/model_runtime/model_providers/azure_openai/tts/tts.py
+++ b/api/core/model_runtime/model_providers/azure_openai/tts/tts.py
@ -14,7 +14,7 @@ from core.model_runtime.model_providers.azure_openai._constant import TTS_BASE_M

 class AzureOpenAIText2SpeechModel(_CommonAzureOpenAI, TTSModel):
    """
-    Model class for OpenAI Speech to text model.
+    Model class for OpenAI text2speech model.
    """

    def _invoke(
--- a/api/core/model_runtime/model_providers/bedrock/llm/amazon.nova-lite-v1.yaml
+++ b/api/core/model_runtime/model_providers/bedrock/llm/amazon.nova-lite-v1.yaml
@ -0,0 +1,52 @@
+model: amazon.nova-lite-v1:0
+label:
+  en_US: Nova Lite V1
+model_type: llm
+features:
+  - agent-thought
+  - tool-call
+  - stream-tool-call
+model_properties:
+  mode: chat
+  context_size: 300000
+parameter_rules:
+  - name: max_new_tokens
+    use_template: max_tokens
+    required: true
+    default: 2048
+    min: 1
+    max: 5000
+  - name: temperature
+    use_template: temperature
+    required: false
+    type: float
+    default: 1
+    min: 0.0
+    max: 1.0
+    help:
+      zh_Hans: 生成内容的随机性。
+      en_US: The amount of randomness injected into the response.
+  - name: top_p
+    required: false
+    type: float
+    default: 0.999
+    min: 0.000
+    max: 1.000
+    help:
+      zh_Hans: 在核采样中，Anthropic Claude 按概率递减顺序计算每个后续标记的所有选项的累积分布，并在达到 top_p 指定的特定概率时将其切断。您应该更改温度或top_p，但不能同时更改两者。
+      en_US: In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by top_p. You should alter either temperature or top_p, but not both.
+  - name: top_k
+    required: false
+    type: int
+    default: 0
+    min: 0
+    # Note: the AWS docs are wrong here; the max value is 500.
+    max: 500
+    help:
+      zh_Hans: 对于每个后续标记，仅从前 K 个选项中进行采样。使用 top_k 删除长尾低概率响应。
+      en_US: Only sample from the top K options for each subsequent token. Use top_k to remove long tail low probability responses.
+pricing:
+  input: '0.00006'
+  output: '0.00024'
+  unit: '0.001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/bedrock/llm/amazon.nova-micro-v1.yaml
+++ b/api/core/model_runtime/model_providers/bedrock/llm/amazon.nova-micro-v1.yaml
@ -0,0 +1,52 @@
+model: amazon.nova-micro-v1:0
+label:
+  en_US: Nova Micro V1
+model_type: llm
+features:
+  - agent-thought
+  - tool-call
+  - stream-tool-call
+model_properties:
+  mode: chat
+  context_size: 128000
+parameter_rules:
+  - name: max_new_tokens
+    use_template: max_tokens
+    required: true
+    default: 2048
+    min: 1
+    max: 5000
+  - name: temperature
+    use_template: temperature
+    required: false
+    type: float
+    default: 1
+    min: 0.0
+    max: 1.0
+    help:
+      zh_Hans: 生成内容的随机性。
+      en_US: The amount of randomness injected into the response.
+  - name: top_p
+    required: false
+    type: float
+    default: 0.999
+    min: 0.000
+    max: 1.000
+    help:
+      zh_Hans: 在核采样中，模型按概率递减顺序计算每个后续标记的所有选项的累积分布，并在达到 top_p 指定的特定概率时将其切断。您应该更改温度或 top_p，但不能同时更改两者。
+      en_US: In nucleus sampling, the model computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by top_p. You should alter either temperature or top_p, but not both.
+  - name: top_k
+    required: false
+    type: int
+    default: 0
+    min: 0
+    # Note: the AWS docs are wrong here; the actual max value is 500
+    max: 500
+    help:
+      zh_Hans: 对于每个后续标记，仅从前 K 个选项中进行采样。使用 top_k 删除长尾低概率响应。
+      en_US: Only sample from the top K options for each subsequent token. Use top_k to remove long tail low probability responses.
+pricing:
+  input: '0.000035'
+  output: '0.00014'
+  unit: '0.001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/bedrock/llm/amazon.nova-pro-v1.yaml
+++ b/api/core/model_runtime/model_providers/bedrock/llm/amazon.nova-pro-v1.yaml
@ -0,0 +1,52 @@
+model: amazon.nova-pro-v1:0
+label:
+  en_US: Nova Pro V1
+model_type: llm
+features:
+  - agent-thought
+  - tool-call
+  - stream-tool-call
+model_properties:
+  mode: chat
+  context_size: 300000
+parameter_rules:
+  - name: max_new_tokens
+    use_template: max_tokens
+    required: true
+    default: 2048
+    min: 1
+    max: 5000
+  - name: temperature
+    use_template: temperature
+    required: false
+    type: float
+    default: 1
+    min: 0.0
+    max: 1.0
+    help:
+      zh_Hans: 生成内容的随机性。
+      en_US: The amount of randomness injected into the response.
+  - name: top_p
+    required: false
+    type: float
+    default: 0.999
+    min: 0.000
+    max: 1.000
+    help:
+      zh_Hans: 在核采样中，模型按概率递减顺序计算每个后续标记的所有选项的累积分布，并在达到 top_p 指定的特定概率时将其切断。您应该更改温度或 top_p，但不能同时更改两者。
+      en_US: In nucleus sampling, the model computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by top_p. You should alter either temperature or top_p, but not both.
+  - name: top_k
+    required: false
+    type: int
+    default: 0
+    min: 0
+    # Note: the AWS docs are wrong here; the actual max value is 500
+    max: 500
+    help:
+      zh_Hans: 对于每个后续标记，仅从前 K 个选项中进行采样。使用 top_k 删除长尾低概率响应。
+      en_US: Only sample from the top K options for each subsequent token. Use top_k to remove long tail low probability responses.
+pricing:
+  input: '0.0008'
+  output: '0.0032'
+  unit: '0.001'
+  currency: USD
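As a reading aid for the `pricing` blocks above, here is a minimal sketch of how the three fields might combine into a cost. It assumes the charge is `tokens * price * unit`, so that a `unit` of `0.001` means the listed price is per 1,000 tokens; the helper name `estimate_cost` is hypothetical and is not part of this commit.

```python
from decimal import Decimal


def estimate_cost(tokens: int, unit_price: str, unit: str) -> Decimal:
    """Estimate a charge, assuming cost = tokens * unit_price * unit."""
    # Decimal avoids the binary floating-point rounding that would creep in
    # with prices as small as 0.000035.
    return Decimal(tokens) * Decimal(unit_price) * Decimal(unit)


# Nova Pro input pricing from the YAML above: input '0.0008', unit '0.001'.
cost = estimate_cost(1_000_000, "0.0008", "0.001")
print(cost)
```

Under that assumption, one million Nova Pro input tokens would come to 0.8 USD.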
--- a/api/core/model_runtime/model_providers/bedrock/llm/llm.py
+++ b/api/core/model_runtime/model_providers/bedrock/llm/llm.py
@ -70,6 +70,8 @@ class BedrockLargeLanguageModel(LargeLanguageModel):
        {"prefix": "cohere.command-r", "support_system_prompts": True, "support_tool_use": True},
        {"prefix": "amazon.titan", "support_system_prompts": False, "support_tool_use": False},
        {"prefix": "ai21.jamba-1-5", "support_system_prompts": True, "support_tool_use": False},
+        {"prefix": "amazon.nova", "support_system_prompts": True, "support_tool_use": False},
+        {"prefix": "us.amazon.nova", "support_system_prompts": True, "support_tool_use": False},
    ]

    @staticmethod
@ -194,6 +196,13 @@ class BedrockLargeLanguageModel(LargeLanguageModel):
        if model_info["support_tool_use"] and tools:
            parameters["toolConfig"] = self._convert_converse_tool_config(tools=tools)
        try:
+            # for issue #10976
+            conversations_list = parameters["messages"]
+            # if two consecutive messages share the same role, merge them into one message
+            for i in range(len(conversations_list) - 2, -1, -1):
+                if conversations_list[i]["role"] == conversations_list[i + 1]["role"]:
+                    conversations_list[i]["content"].extend(conversations_list.pop(i + 1)["content"])
+
            if stream:
                response = bedrock_client.converse_stream(**parameters)
                return self._handle_converse_stream_response(
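The merge loop added for issue #10976 can be tried in isolation. The sketch below lifts it into a standalone function; the name `merge_consecutive_roles` and the sample conversation are illustrative assumptions, while the message shape (a `role` plus a list-valued `content`) follows what the hunk above operates on.

```python
def merge_consecutive_roles(messages: list[dict]) -> list[dict]:
    """Merge adjacent messages that share a role, in place, and return the list."""
    # Iterate from the end so that popping index i + 1 never shifts
    # an index that still has to be visited.
    for i in range(len(messages) - 2, -1, -1):
        if messages[i]["role"] == messages[i + 1]["role"]:
            messages[i]["content"].extend(messages.pop(i + 1)["content"])
    return messages


conversation = [
    {"role": "user", "content": [{"text": "Hello"}]},
    {"role": "user", "content": [{"text": "Are you there?"}]},
    {"role": "assistant", "content": [{"text": "Yes."}]},
]
merged = merge_consecutive_roles(conversation)
print(len(merged))  # 2
```

Note that the comparison is on `role` alone, so runs of assistant messages are merged as well as runs of user messages.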
--- a/api/core/model_runtime/model_providers/bedrock/llm/us.amazon.nova-lite-v1.yaml
+++ b/api/core/model_runtime/model_providers/bedrock/llm/us.amazon.nova-lite-v1.yaml
@ -0,0 +1,52 @@
+model: us.amazon.nova-lite-v1:0
+label:
+  en_US: Nova Lite V1 (US.Cross Region Inference)
+model_type: llm
+features:
+  - agent-thought
+  - tool-call
+  - stream-tool-call
+model_properties:
+  mode: chat
+  context_size: 300000
+parameter_rules:
+  - name: max_new_tokens
+    use_template: max_tokens
+    required: true
+    default: 2048
+    min: 1
+    max: 5000
+  - name: temperature
+    use_template: temperature
+    required: false
+    type: float
+    default: 1
+    min: 0.0
+    max: 1.0
+    help:
+      zh_Hans: 生成内容的随机性。
+      en_US: The amount of randomness injected into the response.
+  - name: top_p
+    required: false
+    type: float
+    default: 0.999
+    min: 0.000
+    max: 1.000
+    help:
+      zh_Hans: 在核采样中，模型按概率递减顺序计算每个后续标记的所有选项的累积分布，并在达到 top_p 指定的特定概率时将其切断。您应该更改温度或 top_p，但不能同时更改两者。
+      en_US: In nucleus sampling, the model computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by top_p. You should alter either temperature or top_p, but not both.
+  - name: top_k
+    required: false
+    type: int
+    default: 0
+    min: 0
+    # Note: the AWS docs are wrong here; the actual max value is 500
+    max: 500
+    help:
+      zh_Hans: 对于每个后续标记，仅从前 K 个选项中进行采样。使用 top_k 删除长尾低概率响应。
+      en_US: Only sample from the top K options for each subsequent token. Use top_k to remove long tail low probability responses.
+pricing:
+  input: '0.00006'
+  output: '0.00024'
+  unit: '0.001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/bedrock/llm/us.amazon.nova-micro-v1.yaml
+++ b/api/core/model_runtime/model_providers/bedrock/llm/us.amazon.nova-micro-v1.yaml
@ -0,0 +1,52 @@
+model: us.amazon.nova-micro-v1:0
+label:
+  en_US: Nova Micro V1 (US.Cross Region Inference)
+model_type: llm
+features:
+  - agent-thought
+  - tool-call
+  - stream-tool-call
+model_properties:
+  mode: chat
+  context_size: 128000
+parameter_rules:
+  - name: max_new_tokens
+    use_template: max_tokens
+    required: true
+    default: 2048
+    min: 1
+    max: 5000
+  - name: temperature
+    use_template: temperature
+    required: false
+    type: float
+    default: 1
+    min: 0.0
+    max: 1.0
+    help:
+      zh_Hans: 生成内容的随机性。
+      en_US: The amount of randomness injected into the response.
+  - name: top_p
+    required: false
+    type: float
+    default: 0.999
+    min: 0.000
+    max: 1.000
+    help:
+      zh_Hans: 在核采样中，模型按概率递减顺序计算每个后续标记的所有选项的累积分布，并在达到 top_p 指定的特定概率时将其切断。您应该更改温度或 top_p，但不能同时更改两者。
+      en_US: In nucleus sampling, the model computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by top_p. You should alter either temperature or top_p, but not both.
+  - name: top_k
+    required: false
+    type: int
+    default: 0
+    min: 0
+    # Note: the AWS docs are wrong here; the actual max value is 500
+    max: 500
+    help:
+      zh_Hans: 对于每个后续标记，仅从前 K 个选项中进行采样。使用 top_k 删除长尾低概率响应。
+      en_US: Only sample from the top K options for each subsequent token. Use top_k to remove long tail low probability responses.
+pricing:
+  input: '0.000035'
+  output: '0.00014'
+  unit: '0.001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/bedrock/llm/us.amazon.nova-pro-v1.yaml
+++ b/api/core/model_runtime/model_providers/bedrock/llm/us.amazon.nova-pro-v1.yaml
@ -0,0 +1,52 @@
+model: us.amazon.nova-pro-v1:0
+label:
+  en_US: Nova Pro V1 (US.Cross Region Inference)
+model_type: llm
+features:
+  - agent-thought
+  - tool-call
+  - stream-tool-call
+model_properties:
+  mode: chat
+  context_size: 300000
+parameter_rules:
+  - name: max_new_tokens
+    use_template: max_tokens
+    required: true
+    default: 2048
+    min: 1
+    max: 5000
+  - name: temperature
+    use_template: temperature
+    required: false
+    type: float
+    default: 1
+    min: 0.0
+    max: 1.0
+    help:
+      zh_Hans: 生成内容的随机性。
+      en_US: The amount of randomness injected into the response.
+  - name: top_p
+    required: false
+    type: float
+    default: 0.999
+    min: 0.000
+    max: 1.000
+    help:
+      zh_Hans: 在核采样中，模型按概率递减顺序计算每个后续标记的所有选项的累积分布，并在达到 top_p 指定的特定概率时将其切断。您应该更改温度或 top_p，但不能同时更改两者。
+      en_US: In nucleus sampling, the model computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by top_p. You should alter either temperature or top_p, but not both.
+  - name: top_k
+    required: false
+    type: int
+    default: 0
+    min: 0
+    # Note: the AWS docs are wrong here; the actual max value is 500
+    max: 500
+    help:
+      zh_Hans: 对于每个后续标记，仅从前 K 个选项中进行采样。使用 top_k 删除长尾低概率响应。
+      en_US: Only sample from the top K options for each subsequent token. Use top_k to remove long tail low probability responses.
+pricing:
+  input: '0.0008'
+  output: '0.0032'
+  unit: '0.001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/gitee_ai/llm/llm.py
+++ b/api/core/model_runtime/model_providers/gitee_ai/llm/llm.py
@ -32,12 +32,12 @@ class GiteeAILargeLanguageModel(OAIAPICompatLargeLanguageModel):
        return super()._invoke(model, credentials, prompt_messages, model_parameters, tools, stop, stream, user)

    def validate_credentials(self, model: str, credentials: dict) -> None:
-        self._add_custom_parameters(credentials, model, None)
+        self._add_custom_parameters(credentials, None)
        super().validate_credentials(model, credentials)

-    def _add_custom_parameters(self, credentials: dict, model: str, model_parameters: dict) -> None:
+    def _add_custom_parameters(self, credentials: dict, model: Optional[str]) -> None:
        if model is None:
-            model = "bge-large-zh-v1.5"
+            model = "Qwen2-72B-Instruct"

        model_identity = GiteeAILargeLanguageModel.MODEL_TO_IDENTITY.get(model, model)
        credentials["endpoint_url"] = f"https://ai.gitee.com/api/serverless/{model_identity}/"
@ -47,5 +47,7 @@ class GiteeAILargeLanguageModel(OAIAPICompatLargeLanguageModel):
            credentials["mode"] = LLMMode.CHAT.value

        schema = self.get_model_schema(model, credentials)
+        assert schema is not None, f"Model schema not found for model {model}"
+        assert schema.features is not None, f"Model features not found for model {model}"
        if ModelFeature.TOOL_CALL in schema.features or ModelFeature.MULTI_TOOL_CALL in schema.features:
            credentials["function_calling_type"] = "tool_call"
--- a/api/core/model_runtime/model_providers/gitee_ai/tts/tts.py
+++ b/api/core/model_runtime/model_providers/gitee_ai/tts/tts.py
@ -10,7 +10,7 @@ from core.model_runtime.model_providers.gitee_ai._common import _CommonGiteeAI

 class GiteeAIText2SpeechModel(_CommonGiteeAI, TTSModel):
    """
-    Model class for OpenAI Speech to text model.
+    Model class for Gitee AI text2speech model.
    """

    def _invoke(
--- a/api/core/model_runtime/model_providers/google/llm/llm.py
+++ b/api/core/model_runtime/model_providers/google/llm/llm.py
@ -254,8 +254,12 @@ class GoogleLargeLanguageModel(LargeLanguageModel):
        assistant_prompt_message = AssistantPromptMessage(content=response.text)

        # calculate num tokens
-        prompt_tokens = self.get_num_tokens(model, credentials, prompt_messages)
-        completion_tokens = self.get_num_tokens(model, credentials, [assistant_prompt_message])
+        if response.usage_metadata:
+            prompt_tokens = response.usage_metadata.prompt_token_count
+            completion_tokens = response.usage_metadata.candidates_token_count
+        else:
+            prompt_tokens = self.get_num_tokens(model, credentials, prompt_messages)
+            completion_tokens = self.get_num_tokens(model, credentials, [assistant_prompt_message])

        # transform usage
        usage = self._calc_response_usage(model, credentials, prompt_tokens, completion_tokens)
--- a/api/core/model_runtime/model_providers/moonshot/llm/llm.py
+++ b/api/core/model_runtime/model_providers/moonshot/llm/llm.py
@ -252,7 +252,7 @@ class MoonshotLargeLanguageModel(OAIAPICompatLargeLanguageModel):
                # ignore sse comments
                if chunk.startswith(":"):
                    continue
-                decoded_chunk = chunk.strip().lstrip("data: ").lstrip()
+                decoded_chunk = chunk.strip().removeprefix("data: ")
                chunk_json = None
                try:
                    chunk_json = json.loads(decoded_chunk)
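The switch from `lstrip("data: ")` to `removeprefix("data: ")` matters because `str.lstrip` treats its argument as a set of characters, not as a prefix. The sketch below uses made-up chunk strings to show the difference; `str.removeprefix` requires Python 3.9 or later.

```python
chunk = "data: attention"

# Old behaviour: strips every leading character found in the set
# {'d', 'a', 't', ':', ' '}, which also eats the start of the payload.
old = chunk.strip().lstrip("data: ").lstrip()

# New behaviour: removes the exact prefix once, if present.
new = chunk.strip().removeprefix("data: ")

print(old)  # ention
print(new)  # attention

# One behavioural difference worth knowing: a chunk with no space after
# the colon was stripped by the old code but is left untouched by the new one.
compact = 'data:{"ok": true}'
old_compact = compact.strip().lstrip("data: ").lstrip()
new_compact = compact.strip().removeprefix("data: ")
print(old_compact)  # {"ok": true}
print(new_compact)  # data:{"ok": true}
```

JSON payloads start with `{` or `[`, so the old code usually worked by accident; the second case suggests providers that emit `data:` without a trailing space may need separate handling.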
--- a/api/core/model_runtime/model_providers/openai/tts/tts.py
+++ b/api/core/model_runtime/model_providers/openai/tts/tts.py
@ -11,7 +11,7 @@ from core.model_runtime.model_providers.openai._common import _CommonOpenAI

 class OpenAIText2SpeechModel(_CommonOpenAI, TTSModel):
    """
-    Model class for OpenAI Speech to text model.
+    Model class for OpenAI text2speech model.
    """

    def _invoke(
--- a/api/core/model_runtime/model_providers/openai_api_compatible/llm/llm.py
+++ b/api/core/model_runtime/model_providers/openai_api_compatible/llm/llm.py
@ -462,7 +462,7 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):
                # ignore sse comments
                if chunk.startswith(":"):
                    continue
-                decoded_chunk = chunk.strip().lstrip("data: ").lstrip()
+                decoded_chunk = chunk.strip().removeprefix("data: ")
                if decoded_chunk == "[DONE]":  # Some providers return "data: [DONE]"
                    continue

--- a/api/core/model_runtime/model_providers/openai_api_compatible/openai_api_compatible.yaml
+++ b/api/core/model_runtime/model_providers/openai_api_compatible/openai_api_compatible.yaml
@ -9,6 +9,7 @@ supported_model_types:
  - text-embedding
  - speech2text
  - rerank
+  - tts
 configurate_methods:
  - customizable-model
 model_credential_schema:
@ -67,7 +68,7 @@ model_credential_schema:
        - variable: __model_type
          value: llm
      type: text-input
-      default: '4096'
+      default: "4096"
      placeholder:
        zh_Hans: 在此输入您的模型上下文长度
        en_US: Enter your Model context size
@ -80,7 +81,7 @@ model_credential_schema:
        - variable: __model_type
          value: text-embedding
      type: text-input
-      default: '4096'
+      default: "4096"
      placeholder:
        zh_Hans: 在此输入您的模型上下文长度
        en_US: Enter your Model context size
@ -93,7 +94,7 @@ model_credential_schema:
        - variable: __model_type
          value: rerank
      type: text-input
-      default: '4096'
+      default: "4096"
      placeholder:
        zh_Hans: 在此输入您的模型上下文长度
        en_US: Enter your Model context size
@ -104,7 +105,7 @@ model_credential_schema:
      show_on:
        - variable: __model_type
          value: llm
-      default: '4096'
+      default: "4096"
      type: text-input
    - variable: function_calling_type
      show_on:
@ -174,3 +175,19 @@ model_credential_schema:
          value: llm
      default: '\n\n'
      type: text-input
+    - variable: voices
+      show_on:
+        - variable: __model_type
+          value: tts
+      label:
+        en_US: Available Voices (comma-separated)
+        zh_Hans: 可用声音（用英文逗号分隔）
+      type: text-input
+      required: false
+      default: "alloy"
+      placeholder:
+        en_US: "alloy,echo,fable,onyx,nova,shimmer"
+        zh_Hans: "alloy,echo,fable,onyx,nova,shimmer"
+      help:
+        en_US: "List voice names separated by commas. First voice will be used as default."
+        zh_Hans: "用英文逗号分隔的声音列表。第一个声音将作为默认值。"
--- a/api/core/model_runtime/model_providers/openai_api_compatible/text_embedding/text_embedding.py
+++ b/api/core/model_runtime/model_providers/openai_api_compatible/text_embedding/text_embedding.py
@ -139,13 +139,17 @@ class OAICompatEmbeddingModel(_CommonOaiApiCompat, TextEmbeddingModel):
            if api_key:
                headers["Authorization"] = f"Bearer {api_key}"

-            endpoint_url = credentials.get("endpoint_url")
+            endpoint_url = credentials.get("endpoint_url", "")
            if not endpoint_url.endswith("/"):
                endpoint_url += "/"

            endpoint_url = urljoin(endpoint_url, "embeddings")

            payload = {"input": "ping", "model": model}
+            # For nvidia models, "input_type": "query" is required in the payload;
+            # see issue #11193 or NvidiaTextEmbeddingModel for details
+            if model.startswith("nvidia/"):
+                payload["input_type"] = "query"

            response = requests.post(url=endpoint_url, headers=headers, data=json.dumps(payload), timeout=(10, 300))

--- a/api/core/model_runtime/model_providers/openai_api_compatible/tts/init.py
+++ b/api/core/model_runtime/model_providers/openai_api_compatible/tts/init.py
--- a/api/core/model_runtime/model_providers/openai_api_compatible/tts/tts.py
+++ b/api/core/model_runtime/model_providers/openai_api_compatible/tts/tts.py
@ -0,0 +1,145 @@
+from collections.abc import Iterable
+from typing import Optional
+from urllib.parse import urljoin
+
+import requests
+
+from core.model_runtime.entities.common_entities import I18nObject
+from core.model_runtime.entities.model_entities import AIModelEntity, FetchFrom, ModelPropertyKey, ModelType
+from core.model_runtime.errors.invoke import InvokeBadRequestError
+from core.model_runtime.errors.validate import CredentialsValidateFailedError
+from core.model_runtime.model_providers.__base.tts_model import TTSModel
+from core.model_runtime.model_providers.openai_api_compatible._common import _CommonOaiApiCompat
+
+
+class OAICompatText2SpeechModel(_CommonOaiApiCompat, TTSModel):
+    """
+    Model class for OpenAI-compatible text2speech model.
+    """
+
+    def _invoke(
+        self,
+        model: str,
+        tenant_id: str,
+        credentials: dict,
+        content_text: str,
+        voice: str,
+        user: Optional[str] = None,
+    ) -> Iterable[bytes]:
+        """
+        Invoke TTS model
+
+        :param model: model name
+        :param tenant_id: user tenant id
+        :param credentials: model credentials
+        :param content_text: text content to be converted to speech
+        :param voice: model voice/speaker
+        :param user: unique user id
+        :return: audio data as bytes iterator
+        """
+        # Set up headers with authentication if provided
+        headers = {}
+        if api_key := credentials.get("api_key"):
+            headers["Authorization"] = f"Bearer {api_key}"
+
+        # Construct endpoint URL
+        endpoint_url = credentials.get("endpoint_url", "")
+        if not endpoint_url.endswith("/"):
+            endpoint_url += "/"
+        endpoint_url = urljoin(endpoint_url, "audio/speech")
+
+        # Get audio format from model properties
+        audio_format = self._get_model_audio_type(model, credentials)
+
+        # Split text into chunks if needed based on word limit
+        word_limit = self._get_model_word_limit(model, credentials)
+        sentences = self._split_text_into_sentences(content_text, word_limit)
+
+        for sentence in sentences:
+            # Prepare request payload
+            payload = {"model": model, "input": sentence, "voice": voice, "response_format": audio_format}
+
+            # Make POST request
+            response = requests.post(endpoint_url, headers=headers, json=payload, stream=True)
+
+            if response.status_code != 200:
+                raise InvokeBadRequestError(response.text)
+
+            # Stream the audio data
+            for chunk in response.iter_content(chunk_size=4096):
+                if chunk:
+                    yield chunk
+
+    def validate_credentials(self, model: str, credentials: dict) -> None:
+        """
+        Validate model credentials
+
+        :param model: model name
+        :param credentials: model credentials
+        :return:
+        """
+        try:
+            # Get default voice for validation
+            voice = self._get_model_default_voice(model, credentials)
+
+            # Test with a simple text
+            next(
+                self._invoke(
+                    model=model, tenant_id="validate", credentials=credentials, content_text="Test.", voice=voice
+                )
+            )
+        except Exception as ex:
+            raise CredentialsValidateFailedError(str(ex))
+
+    def get_customizable_model_schema(self, model: str, credentials: dict) -> Optional[AIModelEntity]:
+        """
+        Get customizable model schema
+        """
+        # Parse voices from comma-separated string
+        voice_names = credentials.get("voices", "alloy").strip().split(",")
+        voices = []
+
+        for voice in voice_names:
+            voice = voice.strip()
+            if not voice:
+                continue
+
+            # Use en-US for all voices
+            voices.append(
+                {
+                    "name": voice,
+                    "mode": voice,
+                    "language": "en-US",
+                }
+            )
+
+        # If no voices provided or all voices were empty strings, use 'alloy' as default
+        if not voices:
+            voices = [{"name": "Alloy", "mode": "alloy", "language": "en-US"}]
+
+        return AIModelEntity(
+            model=model,
+            label=I18nObject(en_US=model),
+            fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
+            model_type=ModelType.TTS,
+            model_properties={
+                ModelPropertyKey.AUDIO_TYPE: credentials.get("audio_type", "mp3"),
+                ModelPropertyKey.WORD_LIMIT: int(credentials.get("word_limit", 4096)),
+                ModelPropertyKey.DEFAULT_VOICE: voices[0]["mode"],
+                ModelPropertyKey.VOICES: voices,
+            },
+        )
+
+    def get_tts_model_voices(self, model: str, credentials: dict, language: Optional[str] = None) -> list:
+        """
+        Override base get_tts_model_voices to handle customizable voices
+        """
+        model_schema = self.get_customizable_model_schema(model, credentials)
+
+        if not model_schema or ModelPropertyKey.VOICES not in model_schema.model_properties:
+            raise ValueError("this model does not support voice")
+
+        voices = model_schema.model_properties[ModelPropertyKey.VOICES]
+
+        # Always return all voices regardless of language
+        return [{"name": d["name"], "value": d["mode"]} for d in voices]
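To make the voice handling in `get_customizable_model_schema` easier to follow, here is a standalone sketch of the same parsing rule: split the comma-separated credential, drop empty entries, and fall back to `alloy` when nothing usable remains. The function name `parse_voices` is hypothetical; the dictionary keys mirror those in the file above.

```python
def parse_voices(raw: str, default: str = "alloy") -> list[dict]:
    """Turn a comma-separated voice string into a list of voice entries."""
    voices = []
    for name in raw.strip().split(","):
        name = name.strip()
        if not name:
            continue  # skip empty entries such as "alloy,,echo"
        # Every voice is tagged en-US, as in the provider code.
        voices.append({"name": name, "mode": name, "language": "en-US"})

    # Fall back to the default voice when the input held no usable names.
    if not voices:
        voices = [{"name": default.capitalize(), "mode": default, "language": "en-US"}]
    return voices


parsed = parse_voices("alloy, echo,,nova")
print([v["mode"] for v in parsed])  # ['alloy', 'echo', 'nova']
```

The first entry of the returned list is what the provider code uses as the default voice.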
--- a/api/core/model_runtime/model_providers/stepfun/llm/llm.py
+++ b/api/core/model_runtime/model_providers/stepfun/llm/llm.py
@ -250,7 +250,7 @@ class StepfunLargeLanguageModel(OAIAPICompatLargeLanguageModel):
                # ignore sse comments
                if chunk.startswith(":"):
                    continue
-                decoded_chunk = chunk.strip().lstrip("data: ").lstrip()
+                decoded_chunk = chunk.strip().removeprefix("data: ")
                chunk_json = None
                try:
                    chunk_json = json.loads(decoded_chunk)
--- a/api/core/model_runtime/model_providers/volcengine_maas/legacy/volc_sdk/__init__.py
+++ b/api/core/model_runtime/model_providers/volcengine_maas/legacy/volc_sdk/__init__.py
@ -1,4 +1,4 @@
 from .common import ChatRole
 from .maas import MaasError, MaasService

-__all__ = ["MaasService", "ChatRole", "MaasError"]
+__all__ = ["ChatRole", "MaasError", "MaasService"]
--- a/api/core/model_runtime/model_providers/wenxin/rerank/rerank.py
+++ b/api/core/model_runtime/model_providers/wenxin/rerank/rerank.py
@ -17,7 +17,13 @@ class WenxinRerank(_CommonWenxin):
    def rerank(self, model: str, query: str, docs: list[str], top_n: Optional[int] = None):
        access_token = self._get_access_token()
        url = f"{self.api_bases[model]}?access_token={access_token}"
-
+        # For issue #11252
+        # for the wenxin rerank model, top_n must be less than or equal to the number of docs
+        if top_n is not None and top_n > len(docs):
+            top_n = len(docs)
+        # for wenxin Rerank model, query should not be an empty string
+        if query == "":
+            query = " "  # FIXME: this is a workaround for wenxin rerank model for better user experience.
        try:
            response = httpx.post(
                url,
@ -25,7 +31,11 @@ class WenxinRerank(_CommonWenxin):
                headers={"Content-Type": "application/json"},
            )
            response.raise_for_status()
-            return response.json()
+            data = response.json()
+            # wenxin error handling
+            if "error_code" in data:
+                raise InternalServerError(data["error_msg"])
+            return data
        except httpx.HTTPStatusError as e:
            raise InternalServerError(str(e))

@ -69,6 +79,9 @@ class WenxinRerankModel(RerankModel):
            results = wenxin_rerank.rerank(model, query, docs, top_n)

            rerank_documents = []
+            if "results" not in results:
+                raise ValueError("results key not found in response")
+
            for result in results["results"]:
                index = result["index"]
                if "document" in result:
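The guards added to the Wenxin rerank path for issue #11252 can be sketched without any network call. The helper names `normalize_rerank_args` and `check_payload` are hypothetical, and `RuntimeError` stands in for the provider's `InternalServerError`; the logic follows the hunks above.

```python
from typing import Optional


def normalize_rerank_args(query: str, docs: list[str], top_n: Optional[int] = None):
    """Clamp top_n to the number of docs and replace an empty query with a space."""
    if top_n is not None and top_n > len(docs):
        top_n = len(docs)
    if query == "":
        query = " "  # workaround: the service rejects an empty query string
    return query, top_n


def check_payload(data: dict) -> dict:
    """Raise if the decoded response body carries an error_code field."""
    # The hunk above checks the decoded body even after raise_for_status()
    # has passed, so an HTTP success status is not treated as sufficient.
    if "error_code" in data:
        raise RuntimeError(data["error_msg"])
    return data


print(normalize_rerank_args("", ["a", "b"], top_n=5))  # (' ', 2)
```

Checking the body separately from the HTTP status keeps an error payload from reaching the later `results["results"]` lookup.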
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm-4-0520.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm-4-0520.yaml
@ -8,6 +8,7 @@ features:
  - stream-tool-call
 model_properties:
  mode: chat
+  context_size: 131072
 parameter_rules:
  - name: temperature
    use_template: temperature
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm-4-air.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm-4-air.yaml
@ -8,6 +8,7 @@ features:
  - stream-tool-call
 model_properties:
  mode: chat
+  context_size: 131072
 parameter_rules:
  - name: temperature
    use_template: temperature
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm-4-airx.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm-4-airx.yaml
@ -8,6 +8,7 @@ features:
  - stream-tool-call
 model_properties:
  mode: chat
+  context_size: 8192
 parameter_rules:
  - name: temperature
    use_template: temperature
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm-4-flash.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm-4-flash.yaml
@ -8,6 +8,7 @@ features:
  - stream-tool-call
 model_properties:
  mode: chat
+  context_size: 131072
 parameter_rules:
  - name: temperature
    use_template: temperature
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm-4-flashx.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm-4-flashx.yaml
@ -8,6 +8,7 @@ features:
  - stream-tool-call
 model_properties:
  mode: chat
+  context_size: 131072
 parameter_rules:
  - name: temperature
    use_template: temperature
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm_3_turbo.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm_3_turbo.yaml
@ -8,6 +8,7 @@ features:
  - stream-tool-call
 model_properties:
  mode: chat
+  context_size: 131072
 parameter_rules:
  - name: temperature
    use_template: temperature
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm_4.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm_4.yaml
@ -8,6 +8,7 @@ features:
  - stream-tool-call
 model_properties:
  mode: chat
+  context_size: 131072
 parameter_rules:
  - name: temperature
    use_template: temperature
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm_4_long.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm_4_long.yaml
@ -8,7 +8,7 @@ features:
  - stream-tool-call
 model_properties:
  mode: chat
-  context_size: 10240
+  context_size: 1048576
 parameter_rules:
  - name: temperature
    use_template: temperature
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm_4_plus.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm_4_plus.yaml
@ -8,6 +8,7 @@ features:
  - stream-tool-call
 model_properties:
  mode: chat
+  context_size: 131072
 parameter_rules:
  - name: temperature
    use_template: temperature
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm_4v.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm_4v.yaml
@ -4,6 +4,7 @@ label:
 model_type: llm
 model_properties:
  mode: chat
+  context_size: 2048
 features:
  - vision
 parameter_rules:
--- a/api/core/model_runtime/model_providers/zhipuai/llm/glm_4v_plus.yaml
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/glm_4v_plus.yaml
@ -4,6 +4,7 @@ label:
 model_type: llm
 model_properties:
  mode: chat
+  context_size: 8192
 features:
  - vision
  - video
--- a/api/core/model_runtime/model_providers/zhipuai/llm/llm.py
+++ b/api/core/model_runtime/model_providers/zhipuai/llm/llm.py
@ -22,18 +22,6 @@ from core.model_runtime.model_providers.__base.large_language_model import Large
 from core.model_runtime.model_providers.zhipuai._common import _CommonZhipuaiAI
 from core.model_runtime.utils import helper

-GLM_JSON_MODE_PROMPT = """You should always follow the instructions and output a valid JSON object.
-The structure of the JSON object you can found in the instructions, use {"answer": "$your_answer"} as the default structure
-if you are not sure about the structure.
-
-And you should always end the block with a "```" to indicate the end of the JSON object.
-
-<instructions>
-{{instructions}}
-</instructions>
-
-```JSON"""  # noqa: E501
-

 class ZhipuAILargeLanguageModel(_CommonZhipuaiAI, LargeLanguageModel):
    def _invoke(
@ -64,42 +52,8 @@ class ZhipuAILargeLanguageModel(_CommonZhipuaiAI, LargeLanguageModel):
        credentials_kwargs = self._to_credential_kwargs(credentials)

        # invoke model
-        # stop = stop or []
-        # self._transform_json_prompts(model, credentials, prompt_messages, model_parameters, tools, stop, stream, user)
        return self._generate(model, credentials_kwargs, prompt_messages, model_parameters, tools, stop, stream, user)

-    # def _transform_json_prompts(self, model: str, credentials: dict,
-    #                             prompt_messages: list[PromptMessage], model_parameters: dict,
-    #                             tools: list[PromptMessageTool] | None = None, stop: list[str] | None = None,
-    #                             stream: bool = True, user: str | None = None) \
-    #                         -> None:
-    #     """
-    #     Transform json prompts to model prompts
-    #     """
-    #     if "}\n\n" not in stop:
-    #         stop.append("}\n\n")
-
-    #     # check if there is a system message
-    #     if len(prompt_messages) > 0 and isinstance(prompt_messages[0], SystemPromptMessage):
-    #         # override the system message
-    #         prompt_messages[0] = SystemPromptMessage(
-    #             content=GLM_JSON_MODE_PROMPT.replace("{{instructions}}", prompt_messages[0].content)
-    #         )
-    #     else:
-    #         # insert the system message
-    #         prompt_messages.insert(0, SystemPromptMessage(
-    #             content=GLM_JSON_MODE_PROMPT.replace("{{instructions}}", "Please output a valid JSON object.")
-    #         ))
-    #     # check if the last message is a user message
-    #     if len(prompt_messages) > 0 and isinstance(prompt_messages[-1], UserPromptMessage):
-    #         # add ```JSON\n to the last message
-    #         prompt_messages[-1].content += "\n```JSON\n"
-    #     else:
-    #         # append a user message
-    #         prompt_messages.append(UserPromptMessage(
-    #             content="```JSON\n"
-    #         ))
-
    def get_num_tokens(
        self,
        model: str,
@ -170,7 +124,7 @@ class ZhipuAILargeLanguageModel(_CommonZhipuaiAI, LargeLanguageModel):
        :return: full response or stream response chunk generator result
        """
        extra_model_kwargs = {}
-        # request to glm-4v-plus with stop words will always response "finish_reason":"network_error"
+        # request to glm-4v-plus with stop words will always respond "finish_reason":"network_error"
        if stop and model != "glm-4v-plus":
            extra_model_kwargs["stop"] = stop

@ -186,7 +140,7 @@ class ZhipuAILargeLanguageModel(_CommonZhipuaiAI, LargeLanguageModel):
        # resolve zhipuai model not support system message and user message, assistant message must be in sequence
        new_prompt_messages: list[PromptMessage] = []
        for prompt_message in prompt_messages:
-            copy_prompt_message = prompt_message.copy()
+            copy_prompt_message = prompt_message.model_copy()
            if copy_prompt_message.role in {PromptMessageRole.USER, PromptMessageRole.SYSTEM, PromptMessageRole.TOOL}:
                if isinstance(copy_prompt_message.content, list):
                    # check if model is 'glm-4v'
@ -238,59 +192,38 @@ class ZhipuAILargeLanguageModel(_CommonZhipuaiAI, LargeLanguageModel):
            params = self._construct_glm_4v_parameter(model, new_prompt_messages, model_parameters)
        else:
            params = {"model": model, "messages": [], **model_parameters}
-            # glm model
-            if not model.startswith("chatglm"):
-                for prompt_message in new_prompt_messages:
-                    if prompt_message.role == PromptMessageRole.TOOL:
+            for prompt_message in new_prompt_messages:
+                if prompt_message.role == PromptMessageRole.TOOL:
+                    params["messages"].append(
+                        {
+                            "role": "tool",
+                            "content": prompt_message.content,
+                            "tool_call_id": prompt_message.tool_call_id,
+                        }
+                    )
+                elif isinstance(prompt_message, AssistantPromptMessage):
+                    if prompt_message.tool_calls:
                        params["messages"].append(
                            {
-                                "role": "tool",
+                                "role": "assistant",
                                "content": prompt_message.content,
-                                "tool_call_id": prompt_message.tool_call_id,
+                                "tool_calls": [
+                                    {
+                                        "id": tool_call.id,
+                                        "type": tool_call.type,
+                                        "function": {
+                                            "name": tool_call.function.name,
+                                            "arguments": tool_call.function.arguments,
+                                        },
+                                    }
+                                    for tool_call in prompt_message.tool_calls
+                                ],
                            }
                        )
-                    elif isinstance(prompt_message, AssistantPromptMessage):
-                        if prompt_message.tool_calls:
-                            params["messages"].append(
-                                {
-                                    "role": "assistant",
-                                    "content": prompt_message.content,
-                                    "tool_calls": [
-                                        {
-                                            "id": tool_call.id,
-                                            "type": tool_call.type,
-                                            "function": {
-                                                "name": tool_call.function.name,
-                                                "arguments": tool_call.function.arguments,
-                                            },
-                                        }
-                                        for tool_call in prompt_message.tool_calls
-                                    ],
-                                }
-                            )
-                        else:
-                            params["messages"].append({"role": "assistant", "content": prompt_message.content})
                    else:
-                        params["messages"].append(
-                            {"role": prompt_message.role.value, "content": prompt_message.content}
-                        )
-            else:
-                # chatglm model
-                for prompt_message in new_prompt_messages:
-                    # merge system message to user message
-                    if prompt_message.role in {
-                        PromptMessageRole.SYSTEM,
-                        PromptMessageRole.TOOL,
-                        PromptMessageRole.USER,
-                    }:
-                        if len(params["messages"]) > 0 and params["messages"][-1]["role"] == "user":
-                            params["messages"][-1]["content"] += "\n\n" + prompt_message.content
-                        else:
-                            params["messages"].append({"role": "user", "content": prompt_message.content})
-                    else:
-                        params["messages"].append(
-                            {"role": prompt_message.role.value, "content": prompt_message.content}
-                        )
+                        params["messages"].append({"role": "assistant", "content": prompt_message.content})
+                else:
+                    params["messages"].append({"role": prompt_message.role.value, "content": prompt_message.content})

        if tools and len(tools) > 0:
            params["tools"] = [{"type": "function", "function": helper.dump_model(tool)} for tool in tools]
@ -406,7 +339,7 @@ class ZhipuAILargeLanguageModel(_CommonZhipuaiAI, LargeLanguageModel):
        Handle llm stream response

        :param model: model name
-        :param response: response
+        :param responses: response
        :param prompt_messages: prompt messages
        :return: llm response chunk generator result
        """
@ -505,7 +438,7 @@ class ZhipuAILargeLanguageModel(_CommonZhipuaiAI, LargeLanguageModel):
        if tools and len(tools) > 0:
            text += "\n\nTools:"
            for tool in tools:
-                text += f"\n{tool.json()}"
+                text += f"\n{tool.model_dump_json()}"

        # trim off the trailing ' ' that might come from the "Assistant: "
        return text.rstrip()
--- a/api/core/prompt/prompt_templates/advanced_prompt_templates.py
+++ b/api/core/prompt/prompt_templates/advanced_prompt_templates.py
@ -5,7 +5,7 @@ BAICHUAN_CONTEXT = "用户在与一个客观的助手对话。助手会尊重找
 CHAT_APP_COMPLETION_PROMPT_CONFIG = {
    "completion_prompt_config": {
        "prompt": {
-            "text": "{{#pre_prompt#}}\nHere is the chat histories between human and assistant, inside <histories></histories> XML tags.\n\n<histories>\n{{#histories#}}\n</histories>\n\n\nHuman: {{#query#}}\n\nAssistant: "  # noqa: E501
+            "text": "{{#pre_prompt#}}\nHere are the chat histories between human and assistant, inside <histories></histories> XML tags.\n\n<histories>\n{{#histories#}}\n</histories>\n\n\nHuman: {{#query#}}\n\nAssistant: "  # noqa: E501
        },
        "conversation_histories_role": {"user_prefix": "Human", "assistant_prefix": "Assistant"},
    },
--- a/api/core/prompt/utils/get_thread_messages_length.py
+++ b/api/core/prompt/utils/get_thread_messages_length.py
@ -0,0 +1,32 @@
+from core.prompt.utils.extract_thread_messages import extract_thread_messages
+from extensions.ext_database import db
+from models.model import Message
+
+
+def get_thread_messages_length(conversation_id: str) -> int:
+    """
+    Get the number of thread messages based on the parent message id.
+    """
+    # Fetch all messages related to the conversation
+    query = (
+        db.session.query(
+            Message.id,
+            Message.parent_message_id,
+            Message.answer,
+        )
+        .filter(
+            Message.conversation_id == conversation_id,
+        )
+        .order_by(Message.created_at.desc())
+    )
+
+    messages = query.all()
+
+    # Extract thread messages
+    thread_messages = extract_thread_messages(messages)
+
+    # Exclude the newly created message with an empty answer
+    if thread_messages and not thread_messages[0].answer:
+        thread_messages.pop(0)
+
+    return len(thread_messages)
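
Reviewer note: a minimal sketch of the counting logic in `get_thread_messages_length`, with the database query replaced by an in-memory list. The `Msg` dataclass and the `walk_thread` helper are hypothetical stand-ins. `extract_thread_messages` is not shown in this diff, so the parent-link walk below is an assumption about its behavior, not its actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    # Hypothetical stand-in for the (id, parent_message_id, answer) rows queried above.
    id: str
    parent_message_id: Optional[str]
    answer: str


def walk_thread(messages: list[Msg]) -> list[Msg]:
    """Assumed behavior: start at the newest message and follow parent links."""
    thread: list[Msg] = []
    next_id: Optional[str] = None
    for msg in messages:  # messages are ordered newest first
        if not thread or msg.id == next_id:
            thread.append(msg)
            next_id = msg.parent_message_id
    return thread


def thread_length(messages: list[Msg]) -> int:
    thread = walk_thread(messages)
    # Exclude the newly created message whose answer is still empty.
    if thread and not thread[0].answer:
        thread.pop(0)
    return len(thread)


rows = [
    Msg("m4", "m2", ""),        # newest, answer not generated yet
    Msg("m3", "m1", "other"),   # sibling branch, not part of the thread
    Msg("m2", "m1", "second"),
    Msg("m1", None, "first"),
]
print(thread_length(rows))
```

Under these assumptions the sibling branch `m3` is ignored and the pending message `m4` is dropped, so only answered messages on the current branch are counted.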
--- a/api/core/rag/datasource/retrieval_service.py
+++ b/api/core/rag/datasource/retrieval_service.py
@ -110,8 +110,12 @@ class RetrievalService:
                str(dataset.tenant_id), reranking_mode, reranking_model, weights, False
            )
            all_documents = data_post_processor.invoke(
-                query=query, documents=all_documents, score_threshold=score_threshold, top_n=top_k
+                query=query,
+                documents=all_documents,
+                score_threshold=score_threshold,
+                top_n=top_k,
            )
+
        return all_documents

    @classmethod
@ -178,7 +182,10 @@ class RetrievalService:
                        )
                        all_documents.extend(
                            data_post_processor.invoke(
-                                query=query, documents=documents, score_threshold=score_threshold, top_n=len(documents)
+                                query=query,
+                                documents=documents,
+                                score_threshold=score_threshold,
+                                top_n=len(documents),
                            )
                        )
                    else:
@ -220,7 +227,10 @@ class RetrievalService:
                        )
                        all_documents.extend(
                            data_post_processor.invoke(
-                                query=query, documents=documents, score_threshold=score_threshold, top_n=len(documents)
+                                query=query,
+                                documents=documents,
+                                score_threshold=score_threshold,
+                                top_n=len(documents),
                            )
                        )
                    else:
--- a/api/core/rag/datasource/vdb/oceanbase/oceanbase_vector.py
+++ b/api/core/rag/datasource/vdb/oceanbase/oceanbase_vector.py
@ -104,8 +104,7 @@ class OceanBaseVector(BaseVector):
                val = int(row[6])
                vals.append(val)
            if len(vals) == 0:
-                print("ob_vector_memory_limit_percentage not found in parameters.")
-                exit(1)
+                raise ValueError("ob_vector_memory_limit_percentage not found in parameters.")
            if any(val == 0 for val in vals):
                try:
                    self._client.perform_raw_text_sql("ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30")
@ -200,10 +199,10 @@ class OceanBaseVectorFactory(AbstractVectorFactory):
        return OceanBaseVector(
            collection_name,
            OceanBaseVectorConfig(
-                host=dify_config.OCEANBASE_VECTOR_HOST,
-                port=dify_config.OCEANBASE_VECTOR_PORT,
-                user=dify_config.OCEANBASE_VECTOR_USER,
+                host=dify_config.OCEANBASE_VECTOR_HOST or "",
+                port=dify_config.OCEANBASE_VECTOR_PORT or 0,
+                user=dify_config.OCEANBASE_VECTOR_USER or "",
                password=(dify_config.OCEANBASE_VECTOR_PASSWORD or ""),
-                database=dify_config.OCEANBASE_VECTOR_DATABASE,
+                database=dify_config.OCEANBASE_VECTOR_DATABASE or "",
            ),
        )
--- a/api/core/rag/datasource/vdb/oracle/oraclevector.py
+++ b/api/core/rag/datasource/vdb/oracle/oraclevector.py
@ -230,7 +230,6 @@ class OracleVector(BaseVector):
                except LookupError:
                    nltk.download("punkt")
                    nltk.download("stopwords")
-                    print("run download")
                e_str = re.sub(r"[^\w ]", "", query)
                all_tokens = nltk.word_tokenize(e_str)
                stop_words = stopwords.words("english")
--- a/api/core/rag/datasource/vdb/tidb_on_qdrant/tidb_on_qdrant_vector.py
+++ b/api/core/rag/datasource/vdb/tidb_on_qdrant/tidb_on_qdrant_vector.py
@ -375,7 +375,6 @@ class TidbOnQdrantVector(BaseVector):
        for result in results:
            if result:
                document = self._document_from_scored_point(result, Field.CONTENT_KEY.value, Field.METADATA_KEY.value)
-                document.metadata["vector"] = result.vector
                documents.append(document)

        return documents
@ -394,6 +393,7 @@ class TidbOnQdrantVector(BaseVector):
    ) -> Document:
        return Document(
            page_content=scored_point.payload.get(content_payload_key),
+            vector=scored_point.vector,
            metadata=scored_point.payload.get(metadata_payload_key) or {},
        )

--- a/api/core/rag/datasource/vdb/upstash/upstash_vector.py
+++ b/api/core/rag/datasource/vdb/upstash/upstash_vector.py
@ -64,7 +64,7 @@ class UpstashVector(BaseVector):
        item_ids = []
        for doc_id in ids:
            ids = self.get_ids_by_metadata_field("doc_id", doc_id)
-            if id:
+            if ids:
                item_ids += ids
        self._delete_by_ids(ids=item_ids)

@ -95,9 +95,10 @@ class UpstashVector(BaseVector):
            metadata = record.metadata
            text = record.data
            score = record.score
-            metadata["score"] = score
-            if score > score_threshold:
-                docs.append(Document(page_content=text, metadata=metadata))
+            if metadata is not None and text is not None:
+                metadata["score"] = score
+                if score > score_threshold:
+                    docs.append(Document(page_content=text, metadata=metadata))
        return docs

    def search_by_full_text(self, query: str, **kwargs: Any) -> list[Document]:
@ -123,7 +124,7 @@ class UpstashVectorFactory(AbstractVectorFactory):
        return UpstashVector(
            collection_name=collection_name,
            config=UpstashVectorConfig(
-                url=dify_config.UPSTASH_VECTOR_URL,
-                token=dify_config.UPSTASH_VECTOR_TOKEN,
+                url=dify_config.UPSTASH_VECTOR_URL or "",
+                token=dify_config.UPSTASH_VECTOR_TOKEN or "",
            ),
        )
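
Reviewer note: the `if id:` to `if ids:` change in this file replaces a check that could never fail. Inside that loop `id` is not a local variable, so it resolves to Python's built-in `id` function, which is always truthy. The sketch below illustrates the difference; `lookup` and `collect_ids` are hypothetical names, and the assumption that the lookup may return `None` or an empty list is mine, not something this diff confirms.

```python
from typing import Optional


def lookup(doc_id: str) -> Optional[list[str]]:
    # Hypothetical stand-in for get_ids_by_metadata_field; assumed to
    # return None or an empty list when nothing matches.
    table = {"doc-a": ["v1", "v2"], "doc-b": []}
    return table.get(doc_id)


def collect_ids(doc_ids: list[str]) -> list[str]:
    item_ids: list[str] = []
    for doc_id in doc_ids:
        found = lookup(doc_id)
        if found:  # skips both None and empty results
            item_ids += found
    return item_ids


# The built-in function object is always truthy, so `if id:` never filters.
print(bool(id))
print(collect_ids(["doc-a", "doc-b", "missing"]))
```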
--- a/api/core/rag/embedding/cached_embedding.py
+++ b/api/core/rag/embedding/cached_embedding.py
@ -102,7 +102,8 @@ class CacheEmbedding(Embeddings):
        embedding = redis_client.get(embedding_cache_key)
        if embedding:
            redis_client.expire(embedding_cache_key, 600)
-            return list(np.frombuffer(base64.b64decode(embedding), dtype="float"))
+            decoded_embedding = np.frombuffer(base64.b64decode(embedding), dtype="float")
+            return [float(x) for x in decoded_embedding]
        try:
            embedding_result = self._model_instance.invoke_text_embedding(
                texts=[text], user=self._user, input_type=EmbeddingInputType.QUERY
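
Reviewer note: `list(np.frombuffer(...))` yields NumPy scalar objects, whereas the new list comprehension yields plain Python `float`s, which is presumably the point of the change. The sketch below shows the same base64 round trip using only the standard library's `array` module in place of NumPy; `encode_embedding` and `decode_embedding` are hypothetical helper names, and the cache write format is assumed to be native float64 bytes.

```python
import base64
from array import array


def encode_embedding(vector: list[float]) -> bytes:
    # Pack as native float64 and base64-encode, mirroring an assumed cache write.
    return base64.b64encode(array("d", vector).tobytes())


def decode_embedding(cached: bytes) -> list[float]:
    # Decode the base64 payload back into float64 values.
    buf = array("d")
    buf.frombytes(base64.b64decode(cached))
    # Convert explicitly so callers always receive built-in floats.
    return [float(x) for x in buf]


cached = encode_embedding([0.25, -1.5, 3.0])
print(decode_embedding(cached))
```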
--- a/api/core/rag/extractor/word_extractor.py
+++ b/api/core/rag/extractor/word_extractor.py
@ -86,7 +86,7 @@ class WordExtractor(BaseExtractor):
                image_count += 1
                if rel.is_external:
                    url = rel.reltype
-                    response = ssrf_proxy.get(url, stream=True)
+                    response = ssrf_proxy.get(url)
                    if response.status_code == 200:
                        image_ext = mimetypes.guess_extension(response.headers["Content-Type"])
                        file_uuid = str(uuid.uuid4())
--- a/api/core/tools/provider/builtin/aws/tools/lambda_translate_utils.py
+++ b/api/core/tools/provider/builtin/aws/tools/lambda_translate_utils.py
@ -12,7 +12,7 @@ class LambdaTranslateUtilsTool(BuiltinTool):

    def _invoke_lambda(self, text_content, src_lang, dest_lang, model_id, dictionary_name, request_type, lambda_name):
        msg = {
-            "src_content": text_content,
+            "src_contents": [text_content],
            "src_lang": src_lang,
            "dest_lang": dest_lang,
            "dictionary_id": dictionary_name,
--- a/api/core/tools/provider/builtin/aws/tools/lambda_translate_utils.yaml
+++ b/api/core/tools/provider/builtin/aws/tools/lambda_translate_utils.yaml
@ -8,9 +8,9 @@ identity:
  icon: icon.svg
 description:
  human:
-    en_US: A util tools for LLM translation, extra deployment is needed on AWS. Please refer Github Repo - https://github.com/ybalbert001/dynamodb-rag
-    zh_Hans: 大语言模型翻译工具(专词映射获取)，需要在AWS上进行额外部署，可参考Github Repo - https://github.com/ybalbert001/dynamodb-rag
-    pt_BR: A util tools for LLM translation, specific Lambda Function deployment is needed on AWS. Please refer Github Repo - https://github.com/ybalbert001/dynamodb-rag
+    en_US: A util tools for LLM translation, extra deployment is needed on AWS. Please refer Github Repo - https://github.com/aws-samples/rag-based-translation-with-dynamodb-and-bedrock
+    zh_Hans: 大语言模型翻译工具(专词映射获取)，需要在AWS上进行额外部署，可参考Github Repo - https://github.com/aws-samples/rag-based-translation-with-dynamodb-and-bedrock
+    pt_BR: A util tools for LLM translation, specific Lambda Function deployment is needed on AWS. Please refer Github Repo - https://github.com/aws-samples/rag-based-translation-with-dynamodb-and-bedrock
  llm: A util tools for translation.
 parameters:
  - name: text_content
--- a/api/core/tools/provider/builtin/aws/tools/sagemaker_chinese_toxicity_detector.py
+++ b/api/core/tools/provider/builtin/aws/tools/sagemaker_chinese_toxicity_detector.py
@ -0,0 +1,67 @@
+import json
+from typing import Any, Union
+
+import boto3
+
+from core.tools.entities.tool_entities import ToolInvokeMessage
+from core.tools.tool.builtin_tool import BuiltinTool
+
+# Define the label mapping
+LABEL_MAPPING = {"LABEL_0": "SAFE", "LABEL_1": "NO_SAFE"}
+
+
+class ContentModerationTool(BuiltinTool):
+    sagemaker_client: Any = None
+    sagemaker_endpoint: str = None
+
+    def _invoke_sagemaker(self, payload: dict, endpoint: str):
+        response = self.sagemaker_client.invoke_endpoint(
+            EndpointName=endpoint,
+            Body=json.dumps(payload),
+            ContentType="application/json",
+        )
+        # Parse response
+        response_body = response["Body"].read().decode("utf8")
+
+        json_obj = json.loads(response_body)
+
+        # Handle nested JSON if present
+        if isinstance(json_obj, dict) and "body" in json_obj:
+            body_content = json.loads(json_obj["body"])
+            raw_label = body_content.get("label")
+        else:
+            raw_label = json_obj.get("label")
+
+        # Map the label and return it
+        result = LABEL_MAPPING.get(raw_label, "NO_SAFE")  # fall back to NO_SAFE for unknown labels
+        return result
+
+    def _invoke(
+        self,
+        user_id: str,
+        tool_parameters: dict[str, Any],
+    ) -> Union[ToolInvokeMessage, list[ToolInvokeMessage]]:
+        """
+        invoke tools
+        """
+        try:
+            if not self.sagemaker_client:
+                aws_region = tool_parameters.get("aws_region")
+                if aws_region:
+                    self.sagemaker_client = boto3.client("sagemaker-runtime", region_name=aws_region)
+                else:
+                    self.sagemaker_client = boto3.client("sagemaker-runtime")
+
+            if not self.sagemaker_endpoint:
+                self.sagemaker_endpoint = tool_parameters.get("sagemaker_endpoint")
+
+            content_text = tool_parameters.get("content_text")
+
+            payload = {"text": content_text}
+
+            result = self._invoke_sagemaker(payload, self.sagemaker_endpoint)
+
+            return self.create_text_message(text=result)
+
+        except Exception as e:
+            return self.create_text_message(f"Exception {str(e)}")
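
Reviewer note: the response handling in `_invoke_sagemaker` can be exercised without an endpoint by isolating it as a pure function. This is a sketch only; `parse_label` is a hypothetical name, and the two payload shapes used below (a flat object and an object with a JSON-encoded `body` string) are assumptions inferred from the branches in the code above.

```python
import json

LABEL_MAPPING = {"LABEL_0": "SAFE", "LABEL_1": "NO_SAFE"}


def parse_label(response_body: str) -> str:
    json_obj = json.loads(response_body)
    # Handle the nested case where the result sits in a JSON-encoded "body" string.
    if isinstance(json_obj, dict) and "body" in json_obj:
        raw_label = json.loads(json_obj["body"]).get("label")
    else:
        raw_label = json_obj.get("label")
    # Unknown or missing labels fall back to NO_SAFE.
    return LABEL_MAPPING.get(raw_label, "NO_SAFE")


flat = json.dumps({"label": "LABEL_0"})
nested = json.dumps({"body": json.dumps({"label": "LABEL_1"})})
print(parse_label(flat), parse_label(nested))
```

Defaulting to `NO_SAFE` means an unexpected label is treated as unsafe rather than silently passing moderation.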
--- a/api/core/tools/provider/builtin/aws/tools/sagemaker_chinese_toxicity_detector.yaml
+++ b/api/core/tools/provider/builtin/aws/tools/sagemaker_chinese_toxicity_detector.yaml
@ -0,0 +1,46 @@
+identity:
+  name: chinese_toxicity_detector
+  author: AWS
+  label:
+    en_US: Chinese Toxicity Detector
+    zh_Hans: 中文有害内容检测
+  icon: icon.svg
+description:
+  human:
+    en_US: A tool to detect Chinese toxicity
+    zh_Hans: 检测中文有害内容的工具
+  llm: A tool that checks if Chinese content is safe for work
+parameters:
+  - name: sagemaker_endpoint
+    type: string
+    required: true
+    label:
+      en_US: sagemaker endpoint for moderation
+      zh_Hans: 内容审核的SageMaker端点
+    human_description:
+      en_US: sagemaker endpoint for content moderation
+      zh_Hans: 内容审核的SageMaker端点
+    llm_description: sagemaker endpoint for content moderation
+    form: form
+  - name: content_text
+    type: string
+    required: true
+    label:
+      en_US: content text
+      zh_Hans: 待审核文本
+    human_description:
+      en_US: text content to be moderated
+      zh_Hans: 需要审核的文本内容
+    llm_description: text content to be moderated
+    form: llm
+  - name: aws_region
+    type: string
+    required: false
+    label:
+      en_US: region of sagemaker endpoint
+      zh_Hans: SageMaker 端点所在的region
+    human_description:
+      en_US: region of sagemaker endpoint
+      zh_Hans: SageMaker 端点所在的region
+    llm_description: region of sagemaker endpoint
+    form: form
--- a/api/core/tools/provider/builtin/aws/tools/transcribe_asr.py
+++ b/api/core/tools/provider/builtin/aws/tools/transcribe_asr.py
@ -0,0 +1,418 @@
+import json
+import logging
+import os
+import re
+import time
+import uuid
+from typing import Any, Union
+from urllib.parse import urlparse
+
+import boto3
+import requests
+from botocore.exceptions import ClientError
+from requests.exceptions import RequestException
+
+from core.tools.entities.tool_entities import ToolInvokeMessage
+from core.tools.tool.builtin_tool import BuiltinTool
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+
+LanguageCodeOptions = [
+    "af-ZA",
+    "ar-AE",
+    "ar-SA",
+    "da-DK",
+    "de-CH",
+    "de-DE",
+    "en-AB",
+    "en-AU",
+    "en-GB",
+    "en-IE",
+    "en-IN",
+    "en-US",
+    "en-WL",
+    "es-ES",
+    "es-US",
+    "fa-IR",
+    "fr-CA",
+    "fr-FR",
+    "he-IL",
+    "hi-IN",
+    "id-ID",
+    "it-IT",
+    "ja-JP",
+    "ko-KR",
+    "ms-MY",
+    "nl-NL",
+    "pt-BR",
+    "pt-PT",
+    "ru-RU",
+    "ta-IN",
+    "te-IN",
+    "tr-TR",
+    "zh-CN",
+    "zh-TW",
+    "th-TH",
+    "en-ZA",
+    "en-NZ",
+    "vi-VN",
+    "sv-SE",
+    "ab-GE",
+    "ast-ES",
+    "az-AZ",
+    "ba-RU",
+    "be-BY",
+    "bg-BG",
+    "bn-IN",
+    "bs-BA",
+    "ca-ES",
+    "ckb-IQ",
+    "ckb-IR",
+    "cs-CZ",
+    "cy-WL",
+    "el-GR",
+    "et-ET",
+    "eu-ES",
+    "fi-FI",
+    "gl-ES",
+    "gu-IN",
+    "ha-NG",
+    "hr-HR",
+    "hu-HU",
+    "hy-AM",
+    "is-IS",
+    "ka-GE",
+    "kab-DZ",
+    "kk-KZ",
+    "kn-IN",
+    "ky-KG",
+    "lg-IN",
+    "lt-LT",
+    "lv-LV",
+    "mhr-RU",
+    "mi-NZ",
+    "mk-MK",
+    "ml-IN",
+    "mn-MN",
+    "mr-IN",
+    "mt-MT",
+    "no-NO",
+    "or-IN",
+    "pa-IN",
+    "pl-PL",
+    "ps-AF",
+    "ro-RO",
+    "rw-RW",
+    "si-LK",
+    "sk-SK",
+    "sl-SI",
+    "so-SO",
+    "sr-RS",
+    "su-ID",
+    "sw-BI",
+    "sw-KE",
+    "sw-RW",
+    "sw-TZ",
+    "sw-UG",
+    "tl-PH",
+    "tt-RU",
+    "ug-CN",
+    "uk-UA",
+    "uz-UZ",
+    "wo-SN",
+    "zu-ZA",
+]
+
+MediaFormat = ["mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"]
+
+
+def is_url(text):
+    if not text:
+        return False
+    text = text.strip()
+    # Regular expression pattern for URL validation
+    pattern = re.compile(
+        r"^"  # Start of the string
+        r"(?:http|https)://"  # Protocol (http or https)
+        r"(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|"  # Domain
+        r"localhost|"  # localhost
+        r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"  # IP address
+        r"(?::\d+)?"  # Optional port
+        r"(?:/?|[/?]\S+)"  # Path
+        r"$",  # End of the string
+        re.IGNORECASE,
+    )
+    return bool(pattern.match(text))
+
+
+def upload_file_from_url_to_s3(s3_client, url, bucket_name, s3_key=None, max_retries=3):
+    """
+    Upload a file from a URL to an S3 bucket with retries and better error handling.
+
+    Parameters:
+    - s3_client
+    - url (str): The URL of the file to upload
+    - bucket_name (str): The name of the S3 bucket
+    - s3_key (str): The desired key (path) in S3. If None, will use the filename from URL
+    - max_retries (int): Maximum number of retry attempts
+
+    Returns:
+    - tuple: (str | None, str) - (S3 URI on success or None on failure, message)
+    """
+
+    # Validate inputs
+    if not url or not bucket_name:
+        return None, "URL and bucket name are required"
+
+    retry_count = 0
+    while retry_count < max_retries:
+        try:
+            # Download the file from URL
+            response = requests.get(url, stream=True, timeout=30)
+            response.raise_for_status()
+
+            # If s3_key is not provided, try to get filename from URL
+            if not s3_key:
+                parsed_url = urlparse(url)
+                filename = os.path.basename(parsed_url.path.split("/file-preview")[0])
+                s3_key = "transcribe-files/" + filename
+
+            # Upload the file to S3
+            s3_client.upload_fileobj(
+                response.raw,
+                bucket_name,
+                s3_key,
+                ExtraArgs={
+                    "ContentType": response.headers.get("content-type"),
+                    "ACL": "private",  # Ensure the uploaded file is private
+                },
+            )
+
+            return f"s3://{bucket_name}/{s3_key}", f"Successfully uploaded file to s3://{bucket_name}/{s3_key}"
+
+        except RequestException as e:
+            retry_count += 1
+            if retry_count == max_retries:
+                return None, f"Failed to download file from URL after {max_retries} attempts: {str(e)}"
+            continue
+
+        except ClientError as e:
+            return None, f"AWS S3 error: {str(e)}"
+
+        except Exception as e:
+            return None, f"Unexpected error: {str(e)}"
+
+    return None, "Maximum retries exceeded"
+
+
+class TranscribeTool(BuiltinTool):
+    s3_client: Any = None
+    transcribe_client: Any = None
+
+    """
+    Note that you must include one of LanguageCode, IdentifyLanguage,
+    or IdentifyMultipleLanguages in your request.
+    If you include more than one of these parameters, your transcription job fails.
+    """
+
+    def _transcribe_audio(self, audio_file_uri, file_type, **extra_args):
+        uuid_str = str(uuid.uuid4())
+        job_name = f"{int(time.time())}-{uuid_str}"
+        try:
+            # Start transcription job
+            response = self.transcribe_client.start_transcription_job(
+                TranscriptionJobName=job_name, Media={"MediaFileUri": audio_file_uri}, **extra_args
+            )
+
+            # Wait for the job to complete
+            while True:
+                status = self.transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
+                if status["TranscriptionJob"]["TranscriptionJobStatus"] in ["COMPLETED", "FAILED"]:
+                    break
+                time.sleep(5)
+
+            if status["TranscriptionJob"]["TranscriptionJobStatus"] == "COMPLETED":
+                return status["TranscriptionJob"]["Transcript"]["TranscriptFileUri"], None
+            else:
+                return None, f"Error: TranscriptionJobStatus:{status['TranscriptionJob']['TranscriptionJobStatus']} "
+
+        except Exception as e:
+            return None, f"Error: {str(e)}"
+
+    def _download_and_read_transcript(self, transcript_file_uri: str, max_retries: int = 3) -> tuple[str, str]:
+        """
+        Download and read the transcript file from the given URI.
+
+        Parameters:
+        - transcript_file_uri (str): The URI of the transcript file
+        - max_retries (int): Maximum number of retry attempts
+
+        Returns:
+        - tuple: (text, error) - (Transcribed text if successful, error message if failed)
+        """
+        retry_count = 0
+        while retry_count < max_retries:
+            try:
+                # Download the transcript file
+                response = requests.get(transcript_file_uri, timeout=30)
+                response.raise_for_status()
+
+                # Parse the JSON content
+                transcript_data = response.json()
+
+                # Check if speaker labels are present and enabled
+                has_speaker_labels = (
+                    "results" in transcript_data
+                    and "speaker_labels" in transcript_data["results"]
+                    and "segments" in transcript_data["results"]["speaker_labels"]
+                )
+
+                if has_speaker_labels:
+                    # Get speaker segments
+                    segments = transcript_data["results"]["speaker_labels"]["segments"]
+                    items = transcript_data["results"]["items"]
+
+                    # Create a mapping of start_time -> speaker_label
+                    time_to_speaker = {}
+                    for segment in segments:
+                        speaker_label = segment["speaker_label"]
+                        for item in segment["items"]:
+                            time_to_speaker[item["start_time"]] = speaker_label
+
+                    # Build transcript with speaker labels
+                    current_speaker = None
+                    transcript_parts = []
+
+                    for item in items:
+                        # Punctuation items carry no start_time; append them without a speaker lookup
+                        if item["type"] == "punctuation":
+                            transcript_parts.append(item["alternatives"][0]["content"])
+                            continue
+
+                        start_time = item["start_time"]
+                        speaker = time_to_speaker.get(start_time)
+
+                        if speaker != current_speaker:
+                            current_speaker = speaker
+                            transcript_parts.append(f"\n[{speaker}]: ")
+
+                        transcript_parts.append(item["alternatives"][0]["content"])
+
+                    return " ".join(transcript_parts).strip(), None
+                else:
+                    # Extract the transcription text
+                    # The transcript text is typically in the 'results' -> 'transcripts' array
+                    if "results" in transcript_data and "transcripts" in transcript_data["results"]:
+                        transcripts = transcript_data["results"]["transcripts"]
+                        if transcripts:
+                            # Combine all transcript segments
+                            full_text = " ".join(t.get("transcript", "") for t in transcripts)
+                            return full_text, None
+
+                return None, "No transcripts found in the response"
+
+            except requests.exceptions.RequestException as e:
+                retry_count += 1
+                if retry_count == max_retries:
+                    return None, f"Failed to download transcript file after {max_retries} attempts: {str(e)}"
+                continue
+
+            except json.JSONDecodeError as e:
+                return None, f"Failed to parse transcript JSON: {str(e)}"
+
+            except Exception as e:
+                return None, f"Unexpected error while processing transcript: {str(e)}"
+
+        return None, "Maximum retries exceeded"
+
+    def _invoke(
+        self,
+        user_id: str,
+        tool_parameters: dict[str, Any],
+    ) -> Union[ToolInvokeMessage, list[ToolInvokeMessage]]:
+        """
+        Upload the media file to S3, run an Amazon Transcribe job on it,
+        and return the transcript as a text message.
+        """
+        try:
+            if not self.transcribe_client:
+                aws_region = tool_parameters.get("aws_region")
+                if aws_region:
+                    self.transcribe_client = boto3.client("transcribe", region_name=aws_region)
+                    self.s3_client = boto3.client("s3", region_name=aws_region)
+                else:
+                    self.transcribe_client = boto3.client("transcribe")
+                    self.s3_client = boto3.client("s3")
+
+            file_url = tool_parameters.get("file_url")
+            file_type = tool_parameters.get("file_type")
+            language_code = tool_parameters.get("language_code")
+            identify_language = tool_parameters.get("identify_language", True)
+            identify_multiple_languages = tool_parameters.get("identify_multiple_languages", False)
+            language_options_str = tool_parameters.get("language_options")
+            s3_bucket_name = tool_parameters.get("s3_bucket_name")
+            ShowSpeakerLabels = tool_parameters.get("ShowSpeakerLabels", True)
+            MaxSpeakerLabels = tool_parameters.get("MaxSpeakerLabels", 2)
+
+            # Check the input params
+            if not s3_bucket_name:
+                return self.create_text_message(text="s3_bucket_name is required")
+            language_options = None
+            if language_options_str:
+                language_options = language_options_str.split("|")
+                for lang in language_options:
+                    if lang not in LanguageCodeOptions:
+                        return self.create_text_message(
+                            text=f"{lang} is not supported, should be one of {LanguageCodeOptions}"
+                        )
+            if language_code and language_code not in LanguageCodeOptions:
+                err_msg = f"language_code:{language_code} is not supported, should be one of {LanguageCodeOptions}"
+                return self.create_text_message(text=err_msg)
+
+            # Implicit string concatenation keeps source indentation out of the message
+            err_msg = (
+                f"identify_language:{identify_language}, "
+                f"identify_multiple_languages:{identify_multiple_languages}. "
+                "Note that you must include one of LanguageCode, IdentifyLanguage, "
+                "or IdentifyMultipleLanguages in your request. "
+                "If you include more than one of these parameters, "
+                "your transcription job fails."
+            )
+            if not language_code:
+                if identify_language and identify_multiple_languages:
+                    return self.create_text_message(text=err_msg)
+            else:
+                if identify_language or identify_multiple_languages:
+                    return self.create_text_message(text=err_msg)
+
+            extra_args = {
+                "IdentifyLanguage": identify_language,
+                "IdentifyMultipleLanguages": identify_multiple_languages,
+            }
+            if language_code:
+                extra_args["LanguageCode"] = language_code
+            if language_options:
+                extra_args["LanguageOptions"] = language_options
+            if ShowSpeakerLabels:
+                extra_args["Settings"] = {"ShowSpeakerLabels": ShowSpeakerLabels, "MaxSpeakerLabels": MaxSpeakerLabels}
+
+            # upload to s3 bucket
+            s3_path_result, error = upload_file_from_url_to_s3(self.s3_client, url=file_url, bucket_name=s3_bucket_name)
+            if not s3_path_result:
+                return self.create_text_message(text=error)
+
+            transcript_file_uri, error = self._transcribe_audio(
+                audio_file_uri=s3_path_result,
+                file_type=file_type,
+                **extra_args,
+            )
+            if not transcript_file_uri:
+                return self.create_text_message(text=error)
+
+            # Download and read the transcript
+            transcript_text, error = self._download_and_read_transcript(transcript_file_uri)
+            if not transcript_text:
+                return self.create_text_message(text=error)
+
+            return self.create_text_message(text=transcript_text)
+
+        except Exception as e:
+            return self.create_text_message(f"Exception {str(e)}")
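The speaker-label branch of `_download_and_read_transcript` above maps each word's `start_time` to a speaker and then walks the item list. The following is a minimal standalone sketch of that idea, not the code from this commit: the function name `build_speaker_transcript` and its two-argument signature are hypothetical, and it assumes the same item shape the diff reads (`type`, `start_time`, `alternatives[0]["content"]`). Unlike the diff, it attaches punctuation directly to the preceding word instead of joining every part with a space.

```python
from typing import Any


def build_speaker_transcript(
    segments: list[dict[str, Any]],
    items: list[dict[str, Any]],
) -> str:
    """Hypothetical helper: render one line per speaker turn."""
    # Map each word's start_time to the speaker of the segment that contains it.
    time_to_speaker: dict[str, str] = {}
    for segment in segments:
        for seg_item in segment["items"]:
            time_to_speaker[seg_item["start_time"]] = segment["speaker_label"]

    lines: list[str] = []
    current_speaker = None
    for item in items:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Attach punctuation to the previous word; drop it if nothing precedes it.
            if lines:
                lines[-1] += content
            continue
        speaker = time_to_speaker.get(item["start_time"])
        if speaker != current_speaker or not lines:
            # A new speaker turn starts a new line.
            current_speaker = speaker
            lines.append(f"[{speaker}]: {content}")
        else:
            lines[-1] += " " + content
    return "\n".join(lines)
```

Taking `segments` and `items` as separate arguments keeps the sketch independent of how the surrounding JSON document is nested.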
--- a/api/core/tools/provider/builtin/aws/tools/transcribe_asr.yaml
+++ b/api/core/tools/provider/builtin/aws/tools/transcribe_asr.yaml
@ -0,0 +1,133 @@
+identity:
+  name: transcribe_asr
+  author: AWS
+  label:
+    en_US: TranscribeASR
+    zh_Hans: Transcribe语音识别转录
+    pt_BR: TranscribeASR
+  icon: icon.svg
+description:
+  human:
+    en_US: A tool for ASR (Automatic Speech Recognition) - https://github.com/aws-samples/dify-aws-tool
+    zh_Hans: AWS 语音识别转录服务, 请参考 https://aws.amazon.com/cn/pm/transcribe/#Learn_More_About_Amazon_Transcribe
+    pt_BR: A tool for ASR (Automatic Speech Recognition).
+  llm: A tool for ASR (Automatic Speech Recognition).
+parameters:
+  - name: file_url
+    type: string
+    required: true
+    label:
+      en_US: URL of the video or audio file to transcribe
+      zh_Hans: 语音或者视频文件url
+      pt_BR: URL of the video or audio file to transcribe
+    human_description:
+      en_US: URL of the video or audio file to transcribe
+      zh_Hans: 语音或者视频文件url
+      pt_BR: URL of the video or audio file to transcribe
+    llm_description: URL of the video or audio file to transcribe
+    form: llm
+  - name: language_code
+    type: string
+    required: false
+    label:
+      en_US: Language Code
+      zh_Hans: 语言编码
+      pt_BR: Language Code
+    human_description:
+      en_US: The language code used to create your transcription job. Refer to https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html
+      zh_Hans: 语言编码,例如zh-CN, en-US 可参考 https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html
+      pt_BR: The language code used to create your transcription job. Refer to https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html
+    llm_description: The language code used to create your transcription job.
+    form: llm
+  - name: identify_language
+    type: boolean
+    default: true
+    required: false
+    label:
+      en_US: Automatically Identify Language
+      zh_Hans: 自动识别语言
+      pt_BR: Automatically Identify Language
+    human_description:
+      en_US: Automatically Identify Language
+      zh_Hans: 自动识别语言
+      pt_BR: Automatically Identify Language
+    llm_description: Enable automatic language identification
+    form: form
+  - name: identify_multiple_languages
+    type: boolean
+    required: false
+    label:
+      en_US: Automatically Identify Multiple Languages
+      zh_Hans: 自动识别多种语言
+      pt_BR: Automatically Identify Multiple Languages
+    human_description:
+      en_US: Automatically Identify Multiple Languages
+      zh_Hans: 自动识别多种语言
+      pt_BR: Automatically Identify Multiple Languages
+    llm_description: Enable automatic identification of multiple languages
+    form: form
+  - name: language_options
+    type: string
+    required: false
+    label:
+      en_US: Language Options
+      zh_Hans: 语言种类选项
+      pt_BR: Language Options
+    human_description:
+      en_US: Separated by |, e.g. zh-CN|en-US. You can specify two or more language codes that represent the languages you think may be present in your media
+      zh_Hans: 您可以指定两个或更多的语言代码来表示您认为可能出现在媒体中的语言。用｜分隔,如 zh-CN|en-US
+      pt_BR: Separated by |, e.g. zh-CN|en-US. You can specify two or more language codes that represent the languages you think may be present in your media
+    llm_description: Separated by |, e.g. zh-CN|en-US. You can specify two or more language codes that represent the languages you think may be present in your media
+    form: llm
+  - name: s3_bucket_name
+    type: string
+    required: true
+    label:
+      en_US: S3 bucket name
+      zh_Hans: s3 存储桶名称
+      pt_BR: S3 bucket name
+    human_description:
+      en_US: S3 bucket name used to store transcription files (without the s3:// prefix)
+      zh_Hans: s3 存储桶名称,用于存储转录文件  (不需要前缀 s3://)
+      pt_BR: S3 bucket name used to store transcription files (without the s3:// prefix)
+    llm_description: S3 bucket name used to store transcription files
+    form: form
+  - name: ShowSpeakerLabels
+    type: boolean
+    required: true
+    default: true
+    label:
+      en_US: ShowSpeakerLabels
+      zh_Hans: 显示说话人标签
+      pt_BR: ShowSpeakerLabels
+    human_description:
+      en_US: Enables speaker partitioning (diarization) in your transcription output
+      zh_Hans: 在转录输出中启用说话人分区（说话人分离）
+      pt_BR: Enables speaker partitioning (diarization) in your transcription output
+    llm_description: Enables speaker partitioning (diarization) in your transcription output
+    form: form
+  - name: MaxSpeakerLabels
+    type: number
+    required: true
+    default: 2
+    label:
+      en_US: MaxSpeakerLabels
+      zh_Hans: 说话人标签数量
+      pt_BR: MaxSpeakerLabels
+    human_description:
+      en_US: Specify the maximum number of speakers you want to partition in your media
+      zh_Hans: 指定您希望在媒体中划分的最多演讲者数量。
+      pt_BR: Specify the maximum number of speakers you want to partition in your media
+    llm_description: Specify the maximum number of speakers you want to partition in your media
+    form: form
+  - name: aws_region
+    type: string
+    required: false
+    label:
+      en_US: AWS Region
+      zh_Hans: AWS 区域
+    human_description:
+      en_US: Please enter the AWS region for the transcribe service, for example 'us-east-1'.
+      zh_Hans: 请输入Transcribe的 AWS 区域，例如 'us-east-1'。
+    llm_description: Please enter the AWS region for the transcribe service, for example 'us-east-1'.
+    form: form
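The parameters `language_code`, `identify_language`, and `identify_multiple_languages` defined above are mutually exclusive according to the tool's own error message: a request must include one of `LanguageCode`, `IdentifyLanguage`, or `IdentifyMultipleLanguages`, and the job fails if more than one is set. The sketch below shows one way to enforce that exactly-one rule up front. It is an illustration only: the function name `check_language_mode` and its messages are hypothetical, and it is stricter than the check in this commit, which does not reject the case where none of the three is set.

```python
from typing import Optional


def check_language_mode(
    language_code: Optional[str],
    identify_language: bool,
    identify_multiple_languages: bool,
) -> Optional[str]:
    """Hypothetical validator: return an error message, or None if exactly one mode is set."""
    # Collect the names of the modes that are actually enabled.
    chosen = [
        name
        for name, enabled in (
            ("LanguageCode", bool(language_code)),
            ("IdentifyLanguage", identify_language),
            ("IdentifyMultipleLanguages", identify_multiple_languages),
        )
        if enabled
    ]
    if len(chosen) == 1:
        return None
    if not chosen:
        return "Set one of LanguageCode, IdentifyLanguage, or IdentifyMultipleLanguages."
    return "Only one language mode may be set, got: " + ", ".join(chosen)
```

Returning the names of the conflicting modes makes the message actionable, which matters here because `identify_language` defaults to true and so conflicts with any explicit `language_code` unless it is switched off.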
--- a/api/core/tools/provider/builtin/gitlab/tools/gitlab_mergerequests.yaml
+++ b/api/core/tools/provider/builtin/gitlab/tools/gitlab_mergerequests.yaml
@ -6,9 +6,9 @@ identity:
    zh_Hans: GitLab 合并请求查询
 description:
  human:
-    en_US: A tool for query GitLab merge requests, Input should be a exists reposity or branch.
+    en_US: A tool for querying GitLab merge requests. Input should be an existing repository or branch.
    zh_Hans: 一个用于查询 GitLab 代码合并请求的工具，输入的内容应该是一个已存在的仓库名或者分支。
-  llm: A tool for query GitLab merge requests, Input should be a exists reposity or branch.
+  llm: A tool for querying GitLab merge requests. Input should be an existing repository or branch.
 parameters:
  - name: repository
    type: string
--- a/api/core/tools/tool/tool.py
+++ b/api/core/tools/tool/tool.py
@ -324,7 +324,12 @@ class Tool(BaseModel, ABC):
        :param blob: the blob
        :return: the blob message
        """
-        return ToolInvokeMessage(type=ToolInvokeMessage.MessageType.BLOB, message=blob, meta=meta, save_as=save_as)
+        return ToolInvokeMessage(
+            type=ToolInvokeMessage.MessageType.BLOB,
+            message=blob,
+            meta=meta or {},
+            save_as=save_as,
+        )

    def create_json_message(self, object: dict) -> ToolInvokeMessage:
        """
--- a/api/core/tools/tool/workflow_tool.py
+++ b/api/core/tools/tool/workflow_tool.py
@ -58,11 +58,11 @@ class WorkflowTool(Tool):
            user=self._get_user(user_id),
            args={"inputs": tool_parameters, "files": files},
            invoke_from=self.runtime.invoke_from,
-            stream=False,
+            streaming=False,
            call_depth=self.workflow_call_depth + 1,
            workflow_thread_pool_id=self.thread_pool_id,
        )
-
+        assert isinstance(result, dict)
        data = result.get("data", {})

        if data.get("error"):
--- a/api/core/variables/__init__.py
+++ b/api/core/variables/__init__.py
@ -32,32 +32,32 @@ from .variables import (
 )

 __all__ = [
-    "IntegerVariable",
-    "FloatVariable",
-    "ObjectVariable",
-    "SecretVariable",
-    "StringVariable",
-    "ArrayAnyVariable",
-    "Variable",
-    "SegmentType",
-    "SegmentGroup",
-    "Segment",
-    "NoneSegment",
-    "NoneVariable",
-    "IntegerSegment",
-    "FloatSegment",
-    "ObjectSegment",
    "ArrayAnySegment",
-    "StringSegment",
-    "ArrayStringVariable",
-    "ArrayNumberVariable",
-    "ArrayObjectVariable",
-    "ArraySegment",
+    "ArrayAnyVariable",
    "ArrayFileSegment",
+    "ArrayFileVariable",
    "ArrayNumberSegment",
+    "ArrayNumberVariable",
    "ArrayObjectSegment",
+    "ArrayObjectVariable",
+    "ArraySegment",
    "ArrayStringSegment",
+    "ArrayStringVariable",
    "FileSegment",
    "FileVariable",
-    "ArrayFileVariable",
+    "FloatSegment",
+    "FloatVariable",
+    "IntegerSegment",
+    "IntegerVariable",
+    "NoneSegment",
+    "NoneVariable",
+    "ObjectSegment",
+    "ObjectVariable",
+    "SecretVariable",
+    "Segment",
+    "SegmentGroup",
+    "SegmentType",
+    "StringSegment",
+    "StringVariable",
+    "Variable",
 ]