## Predefined Model Integration

After completing the vendor integration, the next step is to integrate the vendor's models.

First, determine the type of model to be integrated and create the corresponding model type `module` under the respective vendor's directory.

The currently supported model types are:

- `llm` Text Generation Model
- `text_embedding` Text Embedding Model
- `rerank` Rerank Model
- `speech2text` Speech-to-Text
- `tts` Text-to-Speech
- `moderation` Moderation

Continuing with `Anthropic` as an example: `Anthropic` only supports LLMs, so create a `module` named `llm` under `model_providers.anthropic`.

For predefined models, first create a YAML file named after the model under the `llm` `module`, such as `claude-2.1.yaml`.
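
For reference, once the model configuration and (later) the implementation file are in place, the Anthropic vendor directory would contain roughly the following files. This is an illustrative layout; only the parts discussed in this document are shown.

```
model_providers/anthropic/
├── anthropic.yaml      # vendor configuration (from the vendor integration step)
├── anthropic.py        # vendor code (from the vendor integration step)
└── llm/
    ├── claude-2.1.yaml # predefined model configuration (this step)
    └── llm.py          # model call code (see "Implement the Model Call Code" below)
```
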
### Prepare Model YAML

```yaml
model: claude-2.1 # Model identifier
# Display name of the model, which can be set in en_US (English) or zh_Hans (Chinese).
# If zh_Hans is not set, en_US is used by default. The label can also be omitted,
# in which case the model identifier is used as the display name.
label:
  en_US: claude-2.1
model_type: llm # Model type; claude-2.1 is an LLM
features: # Supported features; agent-thought enables Agent reasoning, vision enables image understanding
  - agent-thought
model_properties: # Model properties
  mode: chat # LLM mode: complete for text completion models, chat for chat models
  context_size: 200000 # Maximum context size
parameter_rules: # Parameter rules for the model call; only LLMs require this
  - name: temperature # Parameter variable name
    # Five default configuration templates are provided: temperature/top_p/max_tokens/presence_penalty/frequency_penalty.
    # Setting the template variable name in use_template applies the default configuration
    # from entities.defaults.PARAMETER_RULE_TEMPLATE; any additional parameters set here
    # override that default configuration.
    use_template: temperature
  - name: top_p
    use_template: top_p
  - name: top_k
    label: # Display name of the parameter
      zh_Hans: 取样数量
      en_US: Top k
    type: int # Parameter type; supports float/int/string/boolean
    help: # Help text describing the parameter's purpose
      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
      en_US: Only sample from the top K options for each subsequent token.
    required: false # Whether the parameter is mandatory; can be omitted
  - name: max_tokens_to_sample
    use_template: max_tokens
    default: 4096 # Default value of the parameter
    min: 1 # Minimum value of the parameter; applies to float/int only
    max: 4096 # Maximum value of the parameter; applies to float/int only
pricing: # Pricing information
  input: '8.00' # Input unit price, i.e., prompt price
  output: '24.00' # Output unit price, i.e., response price
  unit: '0.000001' # Price unit, i.e., the prices above are per 1M tokens
  currency: USD # Price currency
```

It is recommended to prepare all model configurations before starting the implementation of the model code.

You can also refer to the YAML configuration files under the corresponding model type directories of other vendors in the `model_providers` directory. For the complete YAML rules, refer to: [Schema](schema.md#aimodelentity).

### Implement the Model Call Code

Next, create a Python file named `llm.py` under the `llm` `module` and write the implementation code in it.

Create an Anthropic LLM class named `AnthropicLargeLanguageModel` (or any other name) that inherits from the `__base.large_language_model.LargeLanguageModel` base class, and implement the methods listed below.
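
Before going through the individual methods, a minimal skeleton of `llm.py` might look like the following sketch. The import paths are assumptions and should be adjusted to the actual package layout of the runtime.

```python
from typing import Generator, Optional, Union

# Assumed import paths; adjust them to the actual package layout.
from model_providers.__base.large_language_model import LargeLanguageModel
from model_providers.entities.llm_entities import LLMResult
from model_providers.entities.message_entities import PromptMessage, PromptMessageTool


class AnthropicLargeLanguageModel(LargeLanguageModel):
    """Anthropic LLM implementation; the methods described below are added to this class."""
```
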
- LLM Call

Implement the core method for calling the LLM, supporting both streaming and synchronous responses.

```python
def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None,
            stream: bool = True, user: Optional[str] = None) \
        -> Union[LLMResult, Generator]:
    """
    Invoke large language model

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param model_parameters: model parameters
    :param tools: tools for tool calling
    :param stop: stop words
    :param stream: is stream response
    :param user: unique user id
    :return: full response or stream response chunk generator result
    """
```

Use two separate functions to return the data: one for the synchronous response and one for the streaming response. Python treats any function containing the `yield` keyword as a generator function whose return type is fixed to `Generator`, so the synchronous and streaming paths must be implemented separately, as shown below (the example uses simplified parameters; an actual implementation should follow the parameter list above):

```python
def _invoke(self, stream: bool, **kwargs) \
        -> Union[LLMResult, Generator]:
    if stream:
        return self._handle_stream_response(**kwargs)
    return self._handle_sync_response(**kwargs)


def _handle_stream_response(self, **kwargs) -> Generator:
    for chunk in response:
        yield chunk


def _handle_sync_response(self, **kwargs) -> LLMResult:
    return LLMResult(**response)
```

- Pre-compute Input Tokens

If the model does not provide an interface to precompute tokens, simply return 0.

```python
def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    """
    Get number of tokens for given prompt messages

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param tools: tools for tool calling
    :return:
    """
```

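For example, a minimal implementation that follows the "return 0" guidance above could look like this sketch; a local tokenizer-based estimate may be substituted where one is available:

```python
def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    # No token pre-count endpoint is used here, so 0 is returned; replace this with
    # a tokenizer-based estimate if the vendor or a local library provides one.
    return 0
```
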
- Validate Model Credentials

Similar to vendor credential validation, but specific to a single model.

```python
def validate_credentials(self, model: str, credentials: dict) -> None:
    """
    Validate model credentials

    :param model: model name
    :param credentials: model credentials
    :return:
    """
```

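A common pattern, sketched below under assumptions, is to issue a cheap test call with the supplied credentials and convert any failure into a credential-validation error. The `UserPromptMessage` and `CredentialsValidateFailedError` names are assumptions here and may differ in your codebase:

```python
def validate_credentials(self, model: str, credentials: dict) -> None:
    try:
        # Issue a minimal, inexpensive request; authentication or connectivity
        # problems surface as exceptions from the underlying call.
        self._invoke(model=model, credentials=credentials,
                     prompt_messages=[UserPromptMessage(content="ping")],
                     model_parameters={"max_tokens_to_sample": 5},
                     stream=False)
    except Exception as ex:
        # Re-raise as the runtime's credential-validation error (assumed class name).
        raise CredentialsValidateFailedError(str(ex))
```
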
- Map Invoke Errors

When a model call fails, map it to the specific `InvokeError` type required by the Runtime, so that Dify can handle different errors accordingly.

Runtime Errors:

- `InvokeConnectionError` Connection error
- `InvokeServerUnavailableError` Service provider unavailable
- `InvokeRateLimitError` Rate limit reached
- `InvokeAuthorizationError` Authorization failed
- `InvokeBadRequestError` Parameter error

```python
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    """
    Map model invoke error to unified error
    The key is the error type thrown to the caller
    The value is the error type thrown by the model,
    which needs to be converted into a unified error type for the caller.

    :return: Invoke error mapping
    """
```
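
For illustration, a mapping for the `anthropic` Python SDK might look like the sketch below. The SDK exception classes on the right-hand side are assumptions based on recent `anthropic` releases and should be adjusted to the client library actually in use:

```python
import anthropic  # at module level


@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    # Map SDK-specific exceptions to the runtime's unified InvokeError types.
    return {
        InvokeConnectionError: [anthropic.APIConnectionError, anthropic.APITimeoutError],
        InvokeServerUnavailableError: [anthropic.InternalServerError],
        InvokeRateLimitError: [anthropic.RateLimitError],
        InvokeAuthorizationError: [anthropic.AuthenticationError, anthropic.PermissionDeniedError],
        InvokeBadRequestError: [anthropic.BadRequestError],
    }
```
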
For interface method explanations, see: [Interfaces](./interfaces.md). For detailed implementation, refer to: [llm.py](https://github.com/langgenius/dify-runtime/blob/main/lib/model_providers/anthropic/llm/llm.py).