New Feature: Built-in Chain-of-Thought Support in Empower Functions Models Family

Introducing built-in Chain-of-Thought (CoT) support in empower functions models, the empower functions models could output their thought process along with the response.

June 3, 2024

•

4 minutes

We’re excited to share that we have launched an update to the family of Empower functions models, now featuring built-in Chain-of-Thought (CoT) support, the empower functions models could output their thought process along with the response. This can be easily toggled with a parameter in the request. This feature is available on both the Empower platform endpoints (doc) and our open-source model family (doc). Also we have the CoT enabled in our live demo so the assistant will display its thought process when the “thinking mode” is enabled.

What is Chain-of-Thought?

Chain-of-Thought (CoT) is a prompting technique that enhances complex reasoning in AI models by breaking down the reasoning process into intermediate steps. This method allows models to handle tasks that require multi-step thinking by explicitly generating and following a thought process before arriving at a final response. By doing so, CoT improves the accuracy and transparency of the model's outputs, because of the nature of causal inference of LLMs.

In function-calling use cases, CoT is typically utilized to analyze the intent of the user input to determine whether it’s appropriate to trigger functions or continue the conversation as usual. If it’s suitable to trigger functions, the model identifies the most appropriate function(s) to invoke. It checks if any required parameters are missing and cannot be inferred from the conversation context. Based on this analysis, the model triggers the functions or asks the user for follow-up information.

Below is a quick example of prompt used for the model to do CoT for function calling and a sample model response on the thought process:

Prompt:

To respond to the user's request, use relevant tools if available. Follow these steps:

Analyze the request to identify the appropriate tool to use.
Review the required parameters for the selected tool.
Determine if the user has provided all necessary parameters or if they can be inferred from the context. Carefully consider all provided information to support any inferred values.
If all required parameters are present or can be reasonably inferred, proceed to call the tool.
If any required parameter is missing, do not call the tool. Instead, ask the user for the missing information.
Do not request additional details for optional parameters if they are not provided.

Thinking response:

The user asked for the weather in San Francisco. The relevant tool to use is "get_current_weather," which requires the "location" parameter. The user provided the location directly as "San Francisco," so all required parameters are present, leading to the tool call with the argument "location" set to "San Francisco."

Model Level Chain-of-Thought Support

While it’s typical to implement CoT at the prompt level, this approach has two main drawbacks:

Performance: Additional instructions and tokens are needed to guide the CoT process, introducing overhead in terms of both cost and latency.
Reliability: Ensuring the model follows the correct format is challenging, especially for function calling, which involves a mix of JSON (function calls) and free text (thinking). This complexity makes streaming extremely difficult. There are tricks to mitigate this, such as adding an additional "explanation" parameter to the function definition, but this has limitations. When the explanation is generated, the model has already decided to trigger functions and which exact function(s) to trigger, so the improvement in accuracy is limited.

To address these drawbacks, we decided to enable CoT at the model level. Empower functions models have been trained with built-in CoT capability that can be enabled with a special prompt (less than 10 tokens in the internal system prompt). When CoT is enabled, Empower functions models will respond with their thought process within tags before the actual response (which will be a set of function calls or regular conversations). This approach provides the model with a full “thought process” before deciding whether to trigger any functions and which function(s) to trigger. We have fully supported streaming with CoT. Additionally, the model can function without CoT if the special prompt is not added.

How to Use?

Because this is model level feature, all you need to do is just add an “include_thinking” parameter when hitting the chat-completion API as the curl example below:

curl -XPOST 'https://app.empower.dev/api/v1/chat/completions' \-H 'Authorization: Bearer API_KEY' \-H 'Content-Type: application/json' \-d '
{
  "model": "empower-functions-medium",
  "include_thinking": true,
  "messages": [
    {
      "role": "user",
      "content": "How'\''s the weather in Paris and Tokyo?"
    }
  ],
  "temperature": 0.0,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g., San Francisco, CA"
            }
          },
          "required": [
            "location"
          ]
        }
      }
    }
  ]
}'

Output:

{
  "id": "",
  "created": 1717306979,
  "model": "empower-functions-medium",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "<thinking>The user asked for the current weather in Paris and Tokyo. The relevant tool to use is \"get_current_weather,\" which requires the \"location\" parameter. The user provided both locations directly, so the tool calls were made with \"Paris\" and \"Tokyo\" as the location values.</thinking>",
        "tool_calls": [
          {
            "id": "",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\":\"Paris\"}"
            }
          },
          {
            "id": "",
            "index": 1,
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\":\"Tokyo\"}"
            }
          }
        ]
      },
      "index": 0,
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 189,
    "completion_tokens": 149,
    "total_tokens": 338
  },
  "object": "chat.completion"
}

Full code examples can be found in this doc. And please refer to our github document for using this feature in our open-source model family.

Ready to start?

Deploy and serve your first fine-tuned LLM in 1 minute for free!

a black and white image of a black and white background