feat: Initial HA setup for Dify with Docker Compose

This commit introduces an initial configuration for deploying Dify in a more highly available (HA) manner using Docker Compose.

Key changes and features implemented:

1. **`docker-compose.ha.yml` Created:**
   * A new Docker Compose file (`docker-compose.ha.yml`) is provided.
   * Stateless services (`api`, `web`) are configured with `replicas: 2`. I also scaled another service to two replicas.
2. **Nginx Load Balancing:**
   * Nginx configurations (`docker/nginx-ha/`) are updated to use `upstream` blocks, enabling load balancing for `api`, `web`, and `plugin_daemon` services.
   * `docker-compose.ha.yml` now uses these HA-specific Nginx configs.
3. **Database (PostgreSQL) HA Guidance:**
   * The `db` service in `docker-compose.ha.yml` includes comments strongly recommending the use of an external, managed PostgreSQL for production HA.
   * Notes for the README have been drafted to highlight this and the SPoF nature of the in-compose DB.
4. **Redis HA with Sentinel:**
   * The `redis` service is configured as `redis-master`.
   * Three `redis-sentinel` services are added to `docker-compose.ha.yml`.
   * Sentinel configuration files are provided in `docker/redis-ha/`.
   * Guidance for configuring Dify to use Redis Sentinel is drafted for the README and .env files.
5. **Vector Store (Weaviate) HA Guidance:**
   * Comments in `docker-compose.ha.yml` for the `weaviate` service recommend using an external/managed vector database or a dedicated Weaviate cluster for HA.
   * Drafted notes explain that the in-compose Weaviate is an SPoF.
6. **Other Services Scaled/Reviewed:**
   * `sandbox` and `plugin_daemon` services are configured with `replicas: 2`.
   * `ssrf_proxy` remains a single instance, with comments noting this.
7. **Persistent Storage Guidance (Drafted):**
   * Extensive notes have been prepared for the README on the critical requirement of using shared network storage (NFS, cloud solutions, etc.) for all stateful services in an HA deployment.
8. **Documentation Placeholders:**
   * Several `*_ha_notes.md` files (`postgresql_ha_notes.md`, `redis_ha_notes.md`, `vector_store_ha_notes.md`, `other_services_ha_notes.md`, `persistent_storage_ha_notes.md`) have been created in the root directory. These contain detailed information intended for a comprehensive `README.ha.md` and an `.env.ha.example` file.

**Note:** This represents the work I've completed so far. The final `README.ha.md` and `.env.ha.example` files are yet to be assembled from the drafted notes. Further work would involve creating these consolidated documentation files.
12 months ago · dc60d19e18
parent 808aa4467c
commit dc60d19e18
14 changed files with 1716 additions and 0 deletions
--- a/docker-compose.ha.yml
+++ b/docker-compose.ha.yml
--- a/docker/nginx-ha/conf.d/default.conf.template
+++ b/docker/nginx-ha/conf.d/default.conf.template
@@ -0,0 +1,60 @@
+# Please do not directly edit this file. Instead, modify the .env variables related to NGINX configuration.
+
+upstream dify_api_servers {
+    server api:5001;
+}
+
+upstream dify_web_servers {
+    server web:3000;
+}
+
+upstream dify_plugin_daemon_servers {
+    server plugin_daemon:5002;
+}
+
+server {
+    listen ${NGINX_PORT};
+    server_name ${NGINX_SERVER_NAME};
+
+    location /console/api {
+      proxy_pass http://dify_api_servers;
+      include proxy.conf;
+    }
+
+    location /api {
+      proxy_pass http://dify_api_servers;
+      include proxy.conf;
+    }
+
+    location /v1 {
+      proxy_pass http://dify_api_servers;
+      include proxy.conf;
+    }
+
+    location /files {
+      proxy_pass http://dify_api_servers;
+      include proxy.conf;
+    }
+
+    location /explore {
+      proxy_pass http://dify_web_servers;
+      include proxy.conf;
+    }
+
+    location /e/ {
+      proxy_pass http://dify_plugin_daemon_servers;
+      proxy_set_header Dify-Hook-Url $scheme://$host$request_uri;
+      include proxy.conf;
+    }
+
+    location / {
+      proxy_pass http://dify_web_servers;
+      include proxy.conf;
+    }
+
+    # placeholder for acme challenge location
+    ${ACME_CHALLENGE_LOCATION}
+
+    # placeholder for https config defined in https.conf.template
+    ${HTTPS_CONFIG}
+}
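The `location` blocks in this template are all plain prefix matches, so nginx routes each request to the block with the longest matching prefix. The following is a minimal Python sketch of that selection, not nginx's actual implementation. The `ROUTES` table is copied from the template; `pick_upstream` is a hypothetical helper introduced only for illustration, and the ACME and HTTPS placeholders are not modelled.

```python
# Prefix -> upstream, mirroring the location blocks in default.conf.template.
ROUTES = {
    "/console/api": "dify_api_servers",
    "/api": "dify_api_servers",
    "/v1": "dify_api_servers",
    "/files": "dify_api_servers",
    "/explore": "dify_web_servers",
    "/e/": "dify_plugin_daemon_servers",
    "/": "dify_web_servers",
}


def pick_upstream(path: str) -> str:
    """Return the upstream of the longest location prefix that matches path.

    Hypothetical helper: assumes path starts with "/", so the catch-all
    "/" location always matches.
    """
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    return ROUTES[max(matches, key=len)]
```

One consequence worth checking against real traffic: because `location /api` has no trailing slash, a path such as `/apikeys` is also sent to `dify_api_servers`, while `/e` without the trailing slash falls through to the web upstream.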
--- a/docker/nginx-ha/docker-entrypoint.sh
+++ b/docker/nginx-ha/docker-entrypoint.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+
+HTTPS_CONFIG=''
+
+if [ "${NGINX_HTTPS_ENABLED}" = "true" ]; then
+    # Check if the certificate and key files for the specified domain exist
+    if [ -n "${CERTBOT_DOMAIN}" ] && \
+       [ -f "/etc/letsencrypt/live/${CERTBOT_DOMAIN}/${NGINX_SSL_CERT_FILENAME}" ] && \
+       [ -f "/etc/letsencrypt/live/${CERTBOT_DOMAIN}/${NGINX_SSL_CERT_KEY_FILENAME}" ]; then
+        SSL_CERTIFICATE_PATH="/etc/letsencrypt/live/${CERTBOT_DOMAIN}/${NGINX_SSL_CERT_FILENAME}"
+        SSL_CERTIFICATE_KEY_PATH="/etc/letsencrypt/live/${CERTBOT_DOMAIN}/${NGINX_SSL_CERT_KEY_FILENAME}"
+    else
+        SSL_CERTIFICATE_PATH="/etc/ssl/${NGINX_SSL_CERT_FILENAME}"
+        SSL_CERTIFICATE_KEY_PATH="/etc/ssl/${NGINX_SSL_CERT_KEY_FILENAME}"
+    fi
+    export SSL_CERTIFICATE_PATH
+    export SSL_CERTIFICATE_KEY_PATH
+
+    # set the HTTPS_CONFIG environment variable to the content of the https.conf.template
+    HTTPS_CONFIG=$(envsubst < /etc/nginx/https.conf.template)
+    export HTTPS_CONFIG
+    # Substitute the HTTPS_CONFIG in the default.conf.template with content from https.conf.template
+    envsubst '${HTTPS_CONFIG}' < /etc/nginx/conf.d/default.conf.template > /etc/nginx/conf.d/default.conf
+fi
+export HTTPS_CONFIG
+
+if [ "${NGINX_ENABLE_CERTBOT_CHALLENGE}" = "true" ]; then
+    ACME_CHALLENGE_LOCATION='location /.well-known/acme-challenge/ { root /var/www/html; }'
+else
+    ACME_CHALLENGE_LOCATION=''
+fi
+export ACME_CHALLENGE_LOCATION
+
+env_vars=$(printenv | cut -d= -f1 | sed 's/^/$/g' | paste -sd, -)
+
+envsubst "$env_vars" < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf
+envsubst "$env_vars" < /etc/nginx/proxy.conf.template > /etc/nginx/proxy.conf
+
+envsubst "$env_vars" < /etc/nginx/conf.d/default.conf.template > /etc/nginx/conf.d/default.conf
+
+# Start Nginx using the default entrypoint
+exec nginx -g 'daemon off;'
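The entrypoint passes `envsubst` an explicit list built from the names of the variables currently in the environment. Only those names are substituted, which is why nginx runtime variables such as `$host` and `$scheme` survive templating. The sketch below imitates that behaviour in Python under simplifying assumptions; `render` is a hypothetical stand-in for `envsubst "$env_vars"`, not a reimplementation of the GNU tool.

```python
import re

# Matches ${NAME} or $NAME, the two reference forms envsubst recognises.
_VAR = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def render(template: str, env: dict) -> str:
    """Substitute only variables present in env; leave all others untouched."""

    def replace(match):
        name = match.group(1) or match.group(2)
        # Unknown names (e.g. nginx's own $host) are kept verbatim.
        return env[name] if name in env else match.group(0)

    return _VAR.sub(replace, template)
```

The same mechanism has a sharp edge: if the container environment ever defines a variable whose name collides with an nginx variable (for example `host`), the corresponding `$host` in the templates would be replaced at startup.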
--- a/docker/nginx-ha/https.conf.template
+++ b/docker/nginx-ha/https.conf.template
@@ -0,0 +1,9 @@
+# Please do not directly edit this file. Instead, modify the .env variables related to NGINX configuration.
+
+listen ${NGINX_SSL_PORT} ssl;
+ssl_certificate ${SSL_CERTIFICATE_PATH};
+ssl_certificate_key ${SSL_CERTIFICATE_KEY_PATH};
+ssl_protocols ${NGINX_SSL_PROTOCOLS};
+ssl_prefer_server_ciphers on;
+ssl_session_cache shared:SSL:10m;
+ssl_session_timeout 10m;
--- a/docker/nginx-ha/nginx.conf.template
+++ b/docker/nginx-ha/nginx.conf.template
@@ -0,0 +1,34 @@
+# Please do not directly edit this file. Instead, modify the .env variables related to NGINX configuration.
+
+user  nginx;
+worker_processes  ${NGINX_WORKER_PROCESSES};
+
+error_log  /var/log/nginx/error.log notice;
+pid        /var/run/nginx.pid;
+
+
+events {
+    worker_connections  1024;
+}
+
+
+http {
+    include       /etc/nginx/mime.types;
+    default_type  application/octet-stream;
+
+    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
+                      '$status $body_bytes_sent "$http_referer" '
+                      '"$http_user_agent" "$http_x_forwarded_for"';
+
+    access_log  /var/log/nginx/access.log  main;
+
+    sendfile        on;
+    #tcp_nopush     on;
+
+    keepalive_timeout  ${NGINX_KEEPALIVE_TIMEOUT};
+
+    #gzip  on;
+    client_max_body_size ${NGINX_CLIENT_MAX_BODY_SIZE};
+
+    include /etc/nginx/conf.d/*.conf;
+}
--- a/docker/nginx-ha/proxy.conf.template
+++ b/docker/nginx-ha/proxy.conf.template
@@ -0,0 +1,11 @@
+# Please do not directly edit this file. Instead, modify the .env variables related to NGINX configuration.
+
+proxy_set_header Host $host;
+proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+proxy_set_header X-Forwarded-Proto $scheme;
+proxy_set_header X-Forwarded-Port $server_port;
+proxy_http_version 1.1;
+proxy_set_header Connection "";
+proxy_buffering off;
+proxy_read_timeout ${NGINX_PROXY_READ_TIMEOUT};
+proxy_send_timeout ${NGINX_PROXY_SEND_TIMEOUT};
--- a/docker/redis-ha/sentinel1.conf
+++ b/docker/redis-ha/sentinel1.conf
@@ -0,0 +1,7 @@
+port 26379
+sentinel monitor dify-master-group redis-master 6379 2
+sentinel down-after-milliseconds dify-master-group 5000
+sentinel parallel-syncs dify-master-group 1
+sentinel failover-timeout dify-master-group 15000
+# If your Redis master has a password, uncomment the following and set the literal password (Sentinel does not expand ${...} shell syntax):
+# sentinel auth-pass dify-master-group difyai123456
--- a/docker/redis-ha/sentinel2.conf
+++ b/docker/redis-ha/sentinel2.conf
@@ -0,0 +1,7 @@
+port 26379
+sentinel monitor dify-master-group redis-master 6379 2
+sentinel down-after-milliseconds dify-master-group 5000
+sentinel parallel-syncs dify-master-group 1
+sentinel failover-timeout dify-master-group 15000
+# If your Redis master has a password, uncomment the following and set the literal password (Sentinel does not expand ${...} shell syntax):
+# sentinel auth-pass dify-master-group difyai123456
--- a/docker/redis-ha/sentinel3.conf
+++ b/docker/redis-ha/sentinel3.conf
@@ -0,0 +1,7 @@
+port 26379
+sentinel monitor dify-master-group redis-master 6379 2
+sentinel down-after-milliseconds dify-master-group 5000
+sentinel parallel-syncs dify-master-group 1
+sentinel failover-timeout dify-master-group 15000
+# If your Redis master has a password, uncomment the following and set the literal password (Sentinel does not expand ${...} shell syntax):
+# sentinel auth-pass dify-master-group difyai123456
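All three sentinel files set a quorum of `2` in `sentinel monitor dify-master-group redis-master 6379 2`. In Redis Sentinel the quorum only governs when the master is flagged as down; starting a failover additionally needs authorization from a majority of all sentinels. The sketch below is a simplified model of that rule. `sentinel_can_fail_over` and its parameters are hypothetical names, and "reachable" here means sentinels that can talk to each other and agree the master is down.

```python
def sentinel_can_fail_over(total_sentinels: int, reachable: int, quorum: int) -> bool:
    """Simplified model: can the reachable sentinels start a failover?

    - quorum: sentinels that must agree the master is down (from `sentinel monitor`).
    - majority: more than half of ALL known sentinels must authorize the failover.
    """
    majority = total_sentinels // 2 + 1
    return reachable >= quorum and reachable >= majority
```

With the three sentinels in this commit, losing one sentinel still allows a failover decision, while losing two does not. Whether a failover can then succeed depends on a replica being available to promote, which this model does not cover.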
--- a/other_services_ha_notes.md
+++ b/other_services_ha_notes.md
@@ -0,0 +1,30 @@
+## Other Services in High Availability Setup
+
+### Sandbox Service (`sandbox`)
+
+*   **Role:** The `sandbox` service is responsible for executing untrusted code, such as Python tools, in an isolated environment. This is crucial for security and stability when integrating with external code or APIs.
+*   **HA Configuration:** In `docker-compose.ha.yml`, the `sandbox` service is configured with `replicas: 2`. Docker Swarm's built-in load balancing will distribute requests between these replicas. Ensure your `CODE_EXECUTION_ENDPOINT` in the `.env` file points to the service name (`http://sandbox:8194`) so Docker can handle the routing.
+
+### Plugin Daemon Service (`plugin_daemon`)
+
+*   **Role:** The `plugin_daemon` manages the lifecycle and execution of Dify plugins. This includes handling plugin installations, updates, and runtime operations. It also exposes webhook endpoints for plugins that require them.
+*   **HA Configuration:**
+    *   The `plugin_daemon` service is configured with `replicas: 2` in `docker-compose.ha.yml`.
+    *   **External Webhooks:** Incoming webhook calls (e.g., `/e/{hook_id}`) are load-balanced by the Nginx service, which is configured with an `upstream` block for `dify_plugin_daemon_servers`.
+    *   **Internal Calls:** Direct calls from other Dify services (like `api` or `worker`) to the `plugin_daemon` (e.g., for plugin execution) are load-balanced by Docker Swarm's internal DNS and load balancing.
+    *   Ensure `PLUGIN_DAEMON_URL` is set to `http://plugin_daemon:5002` for internal communication.
+
+### SSRF Proxy Service (`ssrf_proxy`)
+
+*   **Role:** The `ssrf_proxy` service (Squid) acts as an outbound proxy for requests made by other services, particularly the `sandbox` and potentially some tools or plugins. Its primary purpose is to mitigate Server-Side Request Forgery (SSRF) vulnerabilities by controlling and filtering outbound HTTP/HTTPS requests.
+*   **HA Configuration:**
+    *   In the provided `docker-compose.ha.yml`, `ssrf_proxy` runs as a **single instance**.
+    *   For most Dify deployments, a single `ssrf_proxy` instance is sufficient as the volume of proxied outbound traffic is typically not a bottleneck.
+    *   However, if your deployment involves extremely high volumes of outbound requests that *must* go through this proxy, or if the proxy itself becomes a critical point of failure for essential features, you might need to investigate advanced HA configurations for Squid (e.g., using multiple Squid instances with a load balancer, or features like CARP/VRRP if network setup allows). Such advanced setups are complex and outside the scope of this HA template. Ensure your `SSRF_PROXY_HTTP_URL` and `SSRF_PROXY_HTTPS_URL` correctly point to this service (e.g., `http://ssrf_proxy:3128`).
+
+**Note on Persistent Storage for `sandbox` and `plugin_daemon`:**
+
+*   `sandbox`: The `sandbox` service in the default configuration uses a volume for `/dependencies`. If your custom tools require persistent state *within the sandbox itself* across restarts or between replicas (which is generally not recommended for stateless sandboxed execution), you would need to consider shared storage solutions. However, for its primary role of code execution, it's typically stateless.
+*   `plugin_daemon`: The `plugin_daemon` uses a volume for `/app/storage` (mapped from `./volumes/plugin_daemon`). This is used for storing plugin packages and potentially other plugin-related data. In an HA setup with multiple `plugin_daemon` replicas, this local volume means each replica has its own storage.
+    *   For plugin installation and management, this is generally acceptable as plugins are usually installed/updated via API calls that would be coordinated through the Dify API service.
+    *   If plugins themselves require shared persistent state *across replicas of the plugin_daemon*, you would need to configure `PLUGIN_STORAGE_TYPE` to use a shared object storage solution (like S3, Azure Blob, etc.) instead of the default `local` storage. This is detailed in Dify's plugin storage documentation. Using `local` storage with multiple `plugin_daemon` replicas means each replica might have a slightly different set of downloaded plugin assets if installations occurred at different times or were handled by different replicas, though the core Dify database would track installed plugins centrally. It's generally recommended to use shared storage for plugins in an HA environment.
--- a/persistent_storage_ha_notes.md
+++ b/persistent_storage_ha_notes.md
@@ -0,0 +1,116 @@
+## Persistent Storage in a High Availability (HA) Dify Setup
+
+### 1. The Criticality of Shared Storage for HA
+
+In a High Availability (HA) environment, stateful services (those that need to read and write data to disk) must have their data stored on **shared, resilient network storage**. This storage must be accessible by all Docker Swarm nodes (or Kubernetes nodes, etc.) that could potentially run a replica of the service.
+
+When a container running a stateful service fails or is rescheduled to a different node, its replacement must be able to access the exact same data to ensure continuity and prevent data loss or inconsistency. Local host-path volumes are **not suitable** for HA because if the host goes down, the data on that host becomes unavailable.
+
+### 2. Services Requiring Shared Storage in `docker-compose.ha.yml`
+
+The following services in the provided `docker-compose.ha.yml` are stateful and their data volumes **must** be configured to use shared network storage for a true HA deployment:
+
+*   **`db` (PostgreSQL):**
+    *   **Volume:** `/var/lib/postgresql/data`
+    *   **Reason:** Contains all core Dify application data, user information, knowledge bases, chat histories, etc. This is the most critical data to protect.
+*   **`redis-master` (Redis):**
+    *   **Volume:** `/data`
+    *   **Reason:** Redis is often used as a cache, but it also serves as the Celery broker (message queue) and may back other features. If Redis persistence is enabled (as it is by default for the master in the provided setup), keep this data on shared storage so the master can recover its state when its container is restarted on a different node. Promotion of a replica during failover does not apply to the current single-master Sentinel setup, but it becomes relevant for more advanced Redis HA.
+*   **`weaviate` (or other vector stores):**
+    *   **Volume:** `/var/lib/weaviate` (for Weaviate)
+    *   **Reason:** Stores all vector embeddings for knowledge bases. Losing this data would require re-indexing all documents.
+*   **`api` and `worker` (Dify application file storage):**
+    *   **Volume:** `/app/api/storage`
+    *   **Reason:** This volume is used if you configure Dify with `STORAGE_TYPE=opendal` and `OPENDAL_SCHEME=fs`. It stores user-uploaded files (e.g., knowledge base documents, images in chat). For HA, all `api` and `worker` replicas must access the same file storage.
+    *   **Highly Recommended Alternative:** Use object storage like S3, Azure Blob, Google Cloud Storage, etc. See section 5.
+*   **`plugin_daemon`:**
+    *   **Volume:** `/app/storage` (mapped from `./volumes/plugin_daemon`)
+    *   **Reason:** Used for storing installed plugin packages and potentially other plugin-related data if `PLUGIN_STORAGE_TYPE=local`. If plugins are downloaded or managed by one replica, other replicas need access to the same plugin assets.
+    *   **Recommended Alternative for HA:** Configure `PLUGIN_STORAGE_TYPE` to use a shared object storage solution (S3, Azure Blob, etc.) as detailed in Dify's plugin documentation.
+*   **`sandbox`:**
+    *   **Volume 1:** `/dependencies`
+    *   **Volume 2:** `/conf`
+    *   **Reason:**
+        *   `/dependencies`: This volume is used to cache downloaded dependencies for sandboxed code execution. While it might be treated as a cache that can be rebuilt, sharing it could speed up cold starts for sandbox instances on new nodes. For true HA and consistent behavior, shared storage might be preferred if dependency resolution is time-consuming or complex.
+        *   `/conf`: Stores configuration for the sandbox environment. Changes here should be consistent across replicas.
+        *   In a strict HA setup, both might be better on shared storage, though `/dependencies` could be less critical if startup times are acceptable.
+
+### 3. How to Configure Shared Storage
+
+The default `docker-compose.ha.yml` uses host-relative paths (e.g., `./volumes/db/data:/var/lib/postgresql/data`) for volume mounts. This maps the container's data directory to a directory on the specific Docker host running the container. **This is NOT HA-compliant.**
+
+For a true HA deployment, these host paths **MUST be replaced** with one of the following shared storage strategies:
+
+*   **A. Named Volumes with External Storage Drivers:**
+    This is often the recommended approach with Docker Swarm. You define a named volume and configure a Docker storage driver that interfaces with your network storage solution (NFS, iSCSI, cloud provider block storage plugins like AWS EBS, Azure Disk, GCP Persistent Disk, etc.).
+
+    **Conceptual Example:**
+
+    ```yaml
+    # At the top-level of docker-compose.ha.yml
+    volumes:
+      postgres_data_ha:
+        driver: your-chosen-network-storage-driver # e.g., 'local' for NFS if pre-configured, or a cloud plugin
+        driver_opts:
+          # Options specific to your driver, e.g., for NFS:
+          # type: "nfs"
+          # o: "addr=nfs.example.com,rw,nfsvers=4,soft"
+          # device: ":/exports/dify_postgres_data"
+          # For cloud drivers, refer to their specific documentation.
+
+    services:
+      db:
+        # ... other db service config ...
+        volumes:
+          - postgres_data_ha:/var/lib/postgresql/data
+      
+      redis-master:
+        # ... other redis-master config ...
+        volumes:
+          - redis_data_ha:/data # Assuming 'redis_data_ha' is another named volume
+
+      # ... and similarly for weaviate, api/worker storage, plugin_daemon, sandbox ...
+    ```
+    You would need to create corresponding named volumes (e.g., `redis_data_ha`, `weaviate_data_ha`, etc.) for each stateful service. The specific `driver` and `driver_opts` will depend heavily on your chosen storage technology and Docker environment (Swarm, Kubernetes with a CSI driver, etc.).
+
+*   **B. Bind Mounting Pre-mounted Network Paths:**
+    Alternatively, you can pre-mount your network storage (e.g., an NFS share, GlusterFS mount, etc.) onto a consistent path on *all* Docker Swarm nodes that are part of the cluster. Then, you use this consistent host path as a bind mount in your `docker-compose.ha.yml`.
+
+    **Example:**
+    If you have an NFS share mounted at `/mnt/shared/dify/` on all your Docker hosts:
+
+    ```yaml
+    services:
+      db:
+        # ... other db service config ...
+        volumes:
+          - /mnt/shared/dify/postgres_data:/var/lib/postgresql/data
+      
+      redis-master:
+        # ... other redis-master config ...
+        volumes:
+          - /mnt/shared/dify/redis_data:/data
+
+      # ... and similarly for other stateful services ...
+    ```
+    This approach requires careful management of the underlying host mounts and ensuring they are always available before Docker services start.
+
+### 4. Consequences of Not Using Shared Storage
+
+If stateful services are run in an HA orchestrator (like Docker Swarm) without their data volumes on shared network storage:
+
+*   **Data Loss:** If a node running a service replica fails, and the orchestrator starts a new replica on a different node, the new replica will not have access to the data from the failed node. It will likely start with an empty or initialized data directory, leading to data loss.
+*   **Data Inconsistency:** Different replicas of a service might end up with different data sets if they are writing to local, non-shared volumes.
+*   **Stateful Failover Impossible:** True automatic failover of stateful services is not possible without shared storage.
+
+### 5. Alternative for Dify Application File Storage (`/app/api/storage`)
+
+For the Dify application's file storage (used by `api` and `worker` services when `OPENDAL_SCHEME=fs`), it is **highly recommended to use a dedicated object storage service** like AWS S3, Google Cloud Storage, Azure Blob Storage, MinIO, or other S3-compatible solutions. This is generally more scalable, resilient, and easier to manage for HA than filesystem-based shared storage for this specific purpose.
+
+Configure Dify to use object storage via the following environment variables:
+
+*   `STORAGE_TYPE=opendal` (or other types like `s3`, `azure-blob` depending on Dify version and specific adapter)
+*   `OPENDAL_SCHEME=<s3, azure, gcs, etc.>`
+*   And the relevant credentials and bucket information (e.g., `S3_ENDPOINT`, `S3_BUCKET_NAME`, `S3_ACCESS_KEY`, `S3_SECRET_KEY`, etc.).
+
+Using a managed object storage service offloads the complexity of storage HA to the cloud provider or your object storage solution. This is generally the preferred method for Dify's application file storage in an HA environment. The same applies to `PLUGIN_STORAGE_TYPE` for the `plugin_daemon` service.
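Section 3 of these notes distinguishes host-relative bind mounts (not HA-compliant), absolute bind mounts (acceptable only when the path is a network mount on every node), and named volumes backed by a storage driver. A small Python sketch of that classification for short-form Compose volume strings follows. `classify_volume` is a hypothetical helper; it inspects only the text of the spec and cannot verify that an absolute path is actually shared storage.

```python
def classify_volume(spec: str) -> str:
    """Classify a short-form Compose volume entry of the form source:target[:mode]."""
    if ":" not in spec:
        # Only a container path was given, so Docker creates an anonymous volume.
        return "anonymous volume"
    source = spec.split(":", 1)[0]
    if source.startswith(("./", "../", "~")):
        # Resolved on whichever host runs the container: not HA-compliant.
        return "host-relative bind mount"
    if source.startswith("/"):
        # Only safe for HA if this path is pre-mounted network storage on all nodes.
        return "absolute bind mount"
    return "named volume"
```

Running this over every `volumes:` entry of the stateful services gives a quick list of mounts that still point at `./volumes/...` and need to be migrated.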
--- a/postgresql_ha_notes.md
+++ b/postgresql_ha_notes.md
@@ -0,0 +1,17 @@
+## PostgreSQL High Availability Notes
+
+For a production-ready high-availability (HA) Dify setup, it is **strongly recommended to use an external, managed PostgreSQL database service**. Examples include AWS RDS, Google Cloud SQL, or Azure Database for PostgreSQL. These services offer built-in HA, automated backups, and easier maintenance compared to a self-managed database.
+
+Dify uses the following standard environment variables to connect to your PostgreSQL database. You will need to configure these in your `.env` file or deployment environment:
+
+*   `DB_HOST`: The hostname or IP address of your PostgreSQL server.
+*   `DB_PORT`: The port number your PostgreSQL server is listening on (typically 5432).
+*   `DB_USERNAME`: The username for connecting to the database.
+*   `DB_PASSWORD`: The password for the specified username.
+*   `DB_DATABASE`: The name of the database Dify will use.
+
+**Important:** The `docker-compose.ha.yml` file provided in this repository includes a single `db` service running PostgreSQL. While this is convenient for development or testing, it represents a **single point of failure** in an HA context. If this containerized PostgreSQL instance fails, your Dify application will become unavailable.
+
+Setting up a truly HA PostgreSQL cluster (e.g., with replication and failover) within Docker Compose is complex and generally not recommended for production environments. Such setups often require specialized knowledge and tools (like Patroni, Stolon, or pg_auto_failover) and can be brittle if not managed carefully.
+
+If you choose to use the single PostgreSQL instance provided in `docker-compose.ha.yml` for any reason, **ensure you have a robust and regularly tested data backup and recovery strategy in place.** Data loss can occur if the container or its volume is corrupted or accidentally deleted.
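To make the role of the five `DB_*` variables concrete, the sketch below combines them into a standard PostgreSQL connection URI. This is an illustration only: `build_postgres_uri` is a hypothetical helper, the host name in the usage example is invented, and the exact way Dify assembles its own connection string is not shown in this commit.

```python
from urllib.parse import quote


def build_postgres_uri(env: dict) -> str:
    """Combine DB_* settings into a postgresql:// URI (illustrative only)."""
    # Percent-encode credentials so characters like "@" or "/" cannot break the URI.
    user = quote(env["DB_USERNAME"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    host = env["DB_HOST"]
    port = env.get("DB_PORT", "5432")  # 5432 is the usual PostgreSQL port
    database = env["DB_DATABASE"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Example with a made-up managed-database endpoint.
example_env = {
    "DB_HOST": "db.example.internal",
    "DB_USERNAME": "dify",
    "DB_PASSWORD": "p@ss/word",
    "DB_DATABASE": "dify",
}
example_uri = build_postgres_uri(example_env)
```

Pointing `DB_HOST` at a managed service endpoint instead of the in-compose `db` service is the only change the application needs; the remaining variables keep their meaning.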
--- a/redis_ha_notes.md
+++ b/redis_ha_notes.md
@@ -0,0 +1,59 @@
+## Redis High Availability (HA) with Sentinel
+
+The `docker-compose.ha.yml` file includes a Redis setup configured for High Availability using Redis Sentinel. This setup consists of:
+
+*   One `redis-master` service: The primary Redis instance.
+*   Three `redis-sentinel-*` services: These sentinels monitor the master. If the master becomes unavailable, sentinels normally elect a new master. In this Docker Compose setup there is only one master candidate, so the sentinels mainly provide monitoring and a consistent connection endpoint for the application.
+
+### Alternative for Production: External Managed Redis
+
+For robust production HA, it is **strongly recommended to use an external, managed Redis service** (e.g., AWS ElastiCache, Google Cloud Memorystore, Azure Cache for Redis). These services typically offer better resilience, automated failover, and easier management than a self-managed Sentinel setup within Docker.
+
+If you use an external Redis service, you can comment out or remove the `redis-master` and `redis-sentinel-*` services from `docker-compose.ha.yml`. You will then configure Dify to connect directly to your managed Redis instance using its provided endpoint and credentials, ensuring `REDIS_USE_SENTINEL` is set to `false`.
+
+### Environment Variables for Sentinel Configuration
+
+To configure Dify to connect to Redis using the Sentinel setup provided in `docker-compose.ha.yml`, you need to set the following environment variables (e.g., in your `.env.ha.example` or deployment environment):
+
+*   `REDIS_HOST=redis-master`
+    *   Specifies the hostname for the Redis master. This should match the service name in `docker-compose.ha.yml`.
+*   `REDIS_PORT=6379`
+    *   The port Redis master is listening on.
+*   `REDIS_PASSWORD=${YOUR_REDIS_PASSWORD:-difyai123456}`
+    *   The password for your Redis master. This **must** be consistent across `redis-master` configuration, `sentinel auth-pass` in each sentinel's configuration file, and this environment variable.
+*   `REDIS_USE_SENTINEL=true`
+    *   Tells Dify to use the Sentinel protocol for connecting to Redis.
+*   `REDIS_SENTINELS=redis-sentinel-1:26379,redis-sentinel-2:26379,redis-sentinel-3:26379`
+    *   A comma-separated list of sentinel host:port pairs. These must match the service names and ports of the sentinel services in `docker-compose.ha.yml`.
+*   `REDIS_SENTINEL_SERVICE_NAME=dify-master-group`
+    *   The name of the master group defined in the sentinel configuration files (`sentinel monitor <name> ...`). This **must** match the name used in `docker/redis-ha/sentinel*.conf`.
+*   `REDIS_SENTINEL_PASSWORD=${YOUR_REDIS_PASSWORD:-difyai123456}`
+    *   The password used by Sentinels to authenticate with a password-protected Redis master. This should be the same as `REDIS_PASSWORD`. If the master does not have a password, this can be left blank, but `sentinel auth-pass` must also be commented out in sentinel configs.
+*   `CELERY_BROKER_URL=sentinel://redis-sentinel-1:26379/1;sentinel://redis-sentinel-2:26379/1;sentinel://redis-sentinel-3:26379/1`
+    *   The Celery broker URL configured for Sentinel. Kombu (Celery's transport library) expects one `sentinel://` URL per node, joined with semicolons rather than a comma-separated host list. The `/1` at the end of each URL specifies Redis database number 1. Adjust if needed. A password can also be embedded in each URL if needed, e.g., `sentinel://:${YOUR_REDIS_PASSWORD}@redis-sentinel-1:26379...`, but it's often better to rely on `CELERY_SENTINEL_PASSWORD` if the library supports it.
+*   `CELERY_USE_SENTINEL=true`
+    *   Enables Sentinel mode for Celery's Redis backend.
+*   `CELERY_SENTINEL_MASTER_NAME=dify-master-group`
+    *   The name of the master group Celery should look for, matching `REDIS_SENTINEL_SERVICE_NAME`.
+*   `CELERY_SENTINEL_PASSWORD=${YOUR_REDIS_PASSWORD:-difyai123456}`
+    *   Password for Celery to authenticate with Redis master via Sentinel.
+
+### Sentinel Configuration for Password Authentication
+
+If your `redis-master` is configured with a password (which it is by default with `command: redis-server --requirepass ${REDIS_PASSWORD:-difyai123456}` in `docker-compose.ha.yml`), you **must** uncomment and configure the `sentinel auth-pass` directive in each of the sentinel configuration files:
+
+*   `docker/redis-ha/sentinel1.conf`
+*   `docker/redis-ha/sentinel2.conf`
+*   `docker/redis-ha/sentinel3.conf`
+
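+Example line to uncomment. Sentinel reads its configuration file literally and does not expand `${...}` shell syntax, so unless the file is pre-processed with a tool such as `envsubst`, write the actual password and make sure it matches `REDIS_PASSWORD`:
+`sentinel auth-pass dify-master-group difyai123456`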
+
+**Consistency of `REDIS_PASSWORD` is crucial.** The same password should be used for:
+1.  The `redis-master` service's `command` in `docker-compose.ha.yml`.
+2.  The `sentinel auth-pass` directive in each `docker/redis-ha/sentinel*.conf` file.
+3.  The `REDIS_PASSWORD`, `REDIS_SENTINEL_PASSWORD`, and potentially `CELERY_SENTINEL_PASSWORD` environment variables used by the Dify application.
+
+This ensures that Sentinels can monitor the master and that the application can connect to Redis through Sentinel.
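The `REDIS_SENTINELS` value described in these notes is a comma-separated list of `host:port` pairs. A minimal Python sketch of turning that string into a list of `(host, port)` tuples follows, which is the shape that clients such as redis-py's `Sentinel` class take as their first argument. `parse_sentinels` is a hypothetical helper; how Dify itself parses the variable is not part of this commit.

```python
def parse_sentinels(value: str) -> list:
    """Parse 'host1:port1,host2:port2' into [(host1, port1), (host2, port2)]."""
    nodes = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and stray whitespace
        host, _, port = item.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {item!r}")
        nodes.append((host, int(port)))
    return nodes


sentinels = parse_sentinels(
    "redis-sentinel-1:26379,redis-sentinel-2:26379,redis-sentinel-3:26379"
)
```

The host names must match the sentinel service names in `docker-compose.ha.yml`, and the master group name passed to the client must match `dify-master-group` from the sentinel configuration files.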
--- a/vector_store_ha_notes.md
+++ b/vector_store_ha_notes.md
@@ -0,0 +1,28 @@
+## Vector Store High Availability (HA) Notes
+
+### Weaviate
+
+For a production-ready high-availability (HA) Dify setup that utilizes Weaviate as the vector store, it is **strongly recommended to use an external, managed vector database service or set up a dedicated Weaviate cluster**.
+
+*   **External Managed Service:** Consider cloud providers that offer managed Weaviate or other vector database services with built-in HA capabilities.
+*   **Dedicated Weaviate Cluster:** Refer to the [official Weaviate documentation](https://weaviate.io/developers/weaviate/concepts/cluster) for instructions on setting up a multi-node Weaviate cluster, typically on a platform like Kubernetes, for resilience and scalability.
+
+Dify uses the following environment variables to connect to your Weaviate instance. You will need to configure these in your `.env` file or deployment environment:
+
+*   `VECTOR_STORE=weaviate` (Ensure this is set to select Weaviate)
+*   `WEAVIATE_ENDPOINT=<your_external_weaviate_endpoint>`
+    *   Example: `http://your-weaviate-node1:8080` or the endpoint of your load balancer in front of the cluster.
+*   `WEAVIATE_API_KEY=<your_weaviate_api_key>`
+    *   Set this if your Weaviate instance or cluster requires API key authentication.
+
+**Important Note on the Provided `docker-compose.ha.yml`:**
+
+The `docker-compose.ha.yml` file includes a single `weaviate` service. While this is convenient for development or testing, it represents a **single point of failure** in an HA context. If this containerized Weaviate instance fails, operations relying on vector search (e.g., knowledge base retrieval) will be disrupted.
+
+Setting up a truly HA Weaviate cluster within Docker Compose is complex and generally not recommended for production due to the intricacies of networking, data replication, and sharding management in that environment.
+
+If you choose to use the single Weaviate instance provided in `docker-compose.ha.yml` for any reason, **ensure you have a robust and regularly tested data backup and recovery strategy for your vector embeddings.** Data loss can occur if the container or its volume is corrupted or accidentally deleted.
+
+### Other Vector Stores
+
+Dify supports various vector stores (e.g., Qdrant, Milvus, PGVector). If you choose a vector store other than Weaviate, you are responsible for investigating and implementing its specific high-availability mechanisms. Consult the official documentation for your chosen vector store for best practices on HA deployment. Ensure you update the relevant Dify environment variables (e.g., `QDRANT_URL`, `MILVUS_URI`, etc.) to point to your HA setup.