en_US:Controls the randomness and diversity of the generated text. Specifically, temperature controls how much the probability distribution over candidate tokens is smoothed during generation. A higher temperature flattens the peak of the distribution, so more low-probability tokens can be selected and the results are more diverse; a lower temperature sharpens the peak, so high-probability tokens are selected more often and the results are more deterministic.
en_US:Specifies the maximum number of tokens the model may generate. It sets an upper limit on the output length and does not guarantee that this many tokens are generated every time.
en_US:The probability threshold for nucleus (top-p) sampling during generation. For example, when the value is 0.8, only the smallest set of most likely tokens whose cumulative probability is greater than or equal to 0.8 is kept as the candidate set. The value range is (0, 1.0). The larger the value, the more random the output; the smaller the value, the more deterministic the output.
en_US:The size of the candidate set used for sampling during generation. For example, when the value is 50, only the 50 highest-scoring tokens at each generation step form the candidate set for random sampling. The larger the value, the more random the output; the smaller the value, the more deterministic the output.
en_US:The random number seed used during generation, which lets the user control the randomness of the generated content. Supports unsigned 64-bit integers; the default value is 1234. When a seed is set, the model tries to generate the same or similar results, but there is currently no guarantee that the results are identical every time.
en_US:Controls repetition in the generated text. Increasing repetition_penalty reduces repetition in the model's output. A value of 1.0 means no penalty.
en_US:Controls the randomness and diversity of the generated text. Specifically, temperature controls how much the probability distribution over candidate tokens is smoothed during generation. A higher temperature flattens the peak of the distribution, so more low-probability tokens can be selected and the results are more diverse; a lower temperature sharpens the peak, so high-probability tokens are selected more often and the results are more deterministic.
en_US:Specifies the maximum number of tokens the model may generate. It sets an upper limit on the output length and does not guarantee that this many tokens are generated every time.
en_US:The probability threshold for nucleus (top-p) sampling during generation. For example, when the value is 0.8, only the smallest set of most likely tokens whose cumulative probability is greater than or equal to 0.8 is kept as the candidate set. The value range is (0, 1.0). The larger the value, the more random the output; the smaller the value, the more deterministic the output.
en_US:The size of the candidate set used for sampling during generation. For example, when the value is 50, only the 50 highest-scoring tokens at each generation step form the candidate set for random sampling. The larger the value, the more random the output; the smaller the value, the more deterministic the output.
en_US:The random number seed used during generation, which lets the user control the randomness of the generated content. Supports unsigned 64-bit integers; the default value is 1234. When a seed is set, the model tries to generate the same or similar results, but there is currently no guarantee that the results are identical every time.
en_US:Controls repetition in the generated text. Increasing repetition_penalty reduces repetition in the model's output. A value of 1.0 means no penalty.
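The help strings above describe temperature, top-k, and top-p in words. As a rough illustration only, the following Python sketch shows how these three settings are commonly applied to a list of token scores. The function names are hypothetical and are not taken from this repository or from any model provider's implementation; a real provider may combine or order the steps differently.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_candidates(probs, k):
    """Return the indices of the k most likely tokens (hypothetical helper)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


def top_p_candidates(probs, p):
    """Return the smallest set of most likely tokens whose cumulative probability is >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    print(softmax_with_temperature(logits, temperature=0.5))  # sharper distribution
    print(softmax_with_temperature(logits, temperature=2.0))  # flatter distribution
    probs = [0.5, 0.25, 0.125, 0.125]
    print(top_k_candidates(probs, 2))
    print(top_p_candidates(probs, 0.8))
```

Sampling would then draw one token at random from the remaining candidates, after renormalizing their probabilities.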
markers = {main = "(python_version == \"3.11\" or python_version >= \"3.12\") and (platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\" or platform_python_implementation == \"CPython\")", tools = "(python_version == \"3.11\" or python_version >= \"3.12\") and (platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\")", vdb = "(python_version == \"3.11\" or python_version >= \"3.12\") and (platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\")"}
markers = {main = "(python_version == \"3.11\" or python_version >= \"3.12\") and (platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\" or platform_python_implementation == \"CPython\")", dev = "(python_version == \"3.11\" or python_version >= \"3.12\") and (platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\")", tools = "(python_version == \"3.11\" or python_version >= \"3.12\") and (platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\")", vdb = "(python_version == \"3.11\" or python_version >= \"3.12\") and (platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\")"}
[package.extras]
docs = ["Sphinx", "furo"]
@ -4585,7 +4545,7 @@ version = "2.2.0"
description = "Safely pass data to untrusted environments and back."
optional = false
python-versions = ">=3.8"
groups = ["main", "tools"]
groups = ["main", "dev", "tools"]
markers = "python_version == \"3.11\" or python_version >= \"3.12\""
Document metadata (required if doc_type is provided). Fields vary by doc_type:
For <code>book</code>:
- <code>title</code> Book title
- <code>language</code> Book language
- <code>author</code> Book author
- <code>publisher</code> Publisher name
- <code>publication_date</code> Publication date
- <code>isbn</code> ISBN number
- <code>category</code> Book category
For <code>web_page</code>:
- <code>title</code> Page title
- <code>url</code> Page URL
- <code>language</code> Page language
- <code>publish_date</code> Publish date
- <code>author/publisher</code> Author or publisher
- <code>topic/keywords</code> Topic or keywords
- <code>description</code> Page description
Please check [api/services/dataset_service.py](https://github.com/langgenius/dify/blob/main/api/services/dataset_service.py#L475) for more details on the fields required for each doc_type.
For doc_type "others", any valid JSON object is accepted
- <code>qa_model</code> Q&A Mode: Generates Q&A pairs for segmented documents and then embeds the questions
- <code>doc_type</code> Type of document (optional)
- <code>book</code> Book
Document records a book or publication
- <code>web_page</code> Web page
Document records web page content
- <code>paper</code> Academic paper/article
Document records academic paper or research article
- <code>social_media_post</code> Social media post
Content from social media posts
- <code>wikipedia_entry</code> Wikipedia entry
Content from Wikipedia entries
- <code>personal_document</code> Personal document
Documents related to personal content
- <code>business_document</code> Business document
Documents related to business content
- <code>im_chat_log</code> Chat log
Records of instant messaging chats
- <code>synced_from_notion</code> Notion document
Documents synchronized from Notion
- <code>synced_from_github</code> GitHub document
Documents synchronized from GitHub
- <code>others</code> Other document types
Other document types not listed above
- <code>doc_metadata</code> Document metadata (required if doc_type is provided)
Fields vary by doc_type:
For <code>book</code>:
- <code>title</code> Book title
Title of the book
- <code>language</code> Book language
Language of the book
- <code>author</code> Book author
Author of the book
- <code>publisher</code> Publisher name
Name of the publishing house
- <code>publication_date</code> Publication date
Date when the book was published
- <code>isbn</code> ISBN number
International Standard Book Number
- <code>category</code> Book category
Category or genre of the book
For <code>web_page</code>:
- <code>title</code> Page title
Title of the web page
- <code>url</code> Page URL
URL address of the web page
- <code>language</code> Page language
Language of the web page
- <code>publish_date</code> Publish date
Date when the web page was published
- <code>author/publisher</code> Author or publisher
Author or publisher of the web page
- <code>topic/keywords</code> Topic or keywords
Topics or keywords of the web page
- <code>description</code> Page description
Description of the web page content
Please check [api/services/dataset_service.py](https://github.com/langgenius/dify/blob/main/api/services/dataset_service.py#L475) for more details on the fields required for each doc_type.
For doc_type "others", any valid JSON object is accepted
- <code>doc_language</code> In Q&A mode, specify the language of the document, for example: <code>English</code>, <code>Chinese</code>
Document metadata (required if doc_type is provided). Fields vary by doc_type:
For <code>book</code>:
- <code>title</code> Book title
- <code>language</code> Book language
- <code>author</code> Book author
- <code>publisher</code> Publisher name
- <code>publication_date</code> Publication date
- <code>isbn</code> ISBN number
- <code>category</code> Book category
For <code>web_page</code>:
- <code>title</code> Page title
- <code>url</code> Page URL
- <code>language</code> Page language
- <code>publish_date</code> Publish date
- <code>author/publisher</code> Author or publisher
- <code>topic/keywords</code> Topic or keywords
- <code>description</code> Page description
Please check [api/services/dataset_service.py](https://github.com/langgenius/dify/blob/main/api/services/dataset_service.py#L475) for more details on the fields required for each doc_type.
For doc_type "others", any valid JSON object is accepted
- <code>separator</code> Segmentation identifier. Currently, only one delimiter is allowed. The default is <code>***</code>
- <code>max_tokens</code> Maximum length in tokens; must be shorter than the length of the parent chunk
- <code>chunk_overlap</code> Define the overlap between adjacent chunks (optional)
- <code>doc_type</code> Type of document (optional)
- <code>book</code> Book
Document records a book or publication
- <code>web_page</code> Web page
Document records web page content
- <code>paper</code> Academic paper/article
Document records academic paper or research article
- <code>social_media_post</code> Social media post
Content from social media posts
- <code>wikipedia_entry</code> Wikipedia entry
Content from Wikipedia entries
- <code>personal_document</code> Personal document
Documents related to personal content
- <code>business_document</code> Business document
Documents related to business content
- <code>im_chat_log</code> Chat log
Records of instant messaging chats
- <code>synced_from_notion</code> Notion document
Documents synchronized from Notion
- <code>synced_from_github</code> GitHub document
Documents synchronized from GitHub
- <code>others</code> Other document types
Other document types not listed above
- <code>doc_metadata</code> Document metadata (required if doc_type is provided)
Fields vary by doc_type:
For <code>book</code>:
- <code>title</code> Book title
Title of the book
- <code>language</code> Book language
Language of the book
- <code>author</code> Book author
Author of the book
- <code>publisher</code> Publisher name
Name of the publishing house
- <code>publication_date</code> Publication date
Date when the book was published
- <code>isbn</code> ISBN number
International Standard Book Number
- <code>category</code> Book category
Category or genre of the book
For <code>web_page</code>:
- <code>title</code> Page title
Title of the web page
- <code>url</code> Page URL
URL address of the web page
- <code>language</code> Page language
Language of the web page
- <code>publish_date</code> Publish date
Date when the web page was published
- <code>author/publisher</code> Author or publisher
Author or publisher of the web page
- <code>topic/keywords</code> Topic or keywords
Topics or keywords of the web page
- <code>description</code> Page description
Description of the web page content
Please check [api/services/dataset_service.py](https://github.com/langgenius/dify/blob/main/api/services/dataset_service.py#L475) for more details on the fields required for each doc_type.
For doc_type "others", any valid JSON object is accepted
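To make the relationship between `doc_type` and `doc_metadata` concrete, here is a minimal Python sketch that assembles the two fields for a request body. The field names are copied from the lists above; the helper names (`build_doc_fields`, `missing_listed_fields`) are hypothetical, and whether every listed field is mandatory is an assumption, so check `dataset_service.py` for the authoritative rules.

```python
import json

# Field names copied from the lists above. Treating all of them as
# expected for a given doc_type is an assumption of this sketch.
LISTED_FIELDS = {
    "book": ["title", "language", "author", "publisher",
             "publication_date", "isbn", "category"],
    "web_page": ["title", "url", "language", "publish_date",
                 "author/publisher", "topic/keywords", "description"],
}


def missing_listed_fields(doc_type, doc_metadata):
    """Return the listed fields for doc_type that are absent from doc_metadata."""
    return [f for f in LISTED_FIELDS.get(doc_type, []) if f not in doc_metadata]


def build_doc_fields(doc_type=None, doc_metadata=None):
    """Build the doc_type/doc_metadata part of a request body (hypothetical helper)."""
    if doc_type is None:
        return {}  # doc_type is optional
    if not isinstance(doc_metadata, dict):
        # doc_metadata is required once doc_type is provided
        raise ValueError("doc_metadata must be a JSON object when doc_type is set")
    json.dumps(doc_metadata)  # raises TypeError if the value is not JSON-serializable
    return {"doc_type": doc_type, "doc_metadata": doc_metadata}


if __name__ == "__main__":
    metadata = {"title": "Example Book", "language": "English", "author": "A. Writer"}
    print(json.dumps(build_doc_fields("book", metadata), indent=2))
    print(missing_listed_fields("book", metadata))
```

The resulting dictionary would be merged into the rest of the document-creation payload; for `others`, any JSON object passes the check above.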
@ -400,7 +400,7 @@ The text generation application offers non-session support and is ideal for tran
For text messages generated by Dify, simply pass the generated message-id directly. The backend will use the message-id to look up the corresponding content and synthesize the voice information directly. If both message_id and text are provided simultaneously, the message_id is given priority.
</Property>
<Property name='text' type='str' key='text'>
Speech generated content。
Speech generated content.
</Property>
<Property name='user' type='string' key='user'>
The user identifier, defined by the developer; it must be unique within the app.