Fixing GLM4.5/Qwen3-Coder Tool Call Render Issue

by SLV Team

Introduction

Hey guys! Today, let's dive deep into a specific bug that affects the GLM4.5 and Qwen3-Coder models when dealing with tool calls. The issue causes a render failure because these models' chat templates expect tool call arguments in a specific format (Mappings), but they're receiving them as strings. Don't worry if this sounds too technical – we'll break it down so it's easy to follow. We'll explore the root cause, walk through reproducing the bug step by step, describe the expected behavior, and propose a fix. Let's get started and make sure your models are running smoothly!

Understanding the Bug

So, what's the big deal with this bug? Well, when you're working with models like GLM4.5 and Qwen3-Coder, you might use something called "tool calls." Think of these as special instructions or functions that the model can use to get more information or perform specific tasks. Now, these tools need arguments – the data they need to work with. The problem is that these models expect those arguments to be in a specific format, kind of like how you need to write your address in a certain way for the mail to arrive. Specifically, they expect what's called a "Mapping," which is like a dictionary in programming terms.

However, the way these arguments are currently being sent is as plain strings – serialized JSON text. It's like handing someone a sealed envelope when they asked for the letter laid out on the table! This mismatch causes the chat template to throw an error and fail to render the tool call. And this isn't a client mistake: the OAI standard specifies that the arguments field is sent and represented as a string.
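To make the mismatch concrete, here's a tiny standalone Python sketch of the two representations (the key names mirror the repro request later in this post):

import json

# What an OAI-style request carries: arguments as a JSON *string*
arguments_as_string = '{"expression": "4 x 90"}'

# What the chat template wants to iterate over: a Mapping (dict)
arguments_as_mapping = json.loads(arguments_as_string)

print(type(arguments_as_string))           # <class 'str'>
print(type(arguments_as_mapping))          # <class 'dict'>
print(arguments_as_mapping["expression"])  # 4 x 90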

This issue arises because the default chat templates for GLM4.5 and Qwen3-Coder models are designed to handle tool call arguments as Mappings. When the arguments are passed as strings, the rendering process fails, leading to an interruption in the model's completion. This is a crucial issue to address, as it directly impacts the functionality and reliability of these models in real-world applications.

Root Cause Analysis

To really understand this bug, we need to dig into why this format mismatch is happening. The issue stems from how TabbyAPI handles tool call arguments. According to the OAI standard, the arguments field is sent as a string. However, the default chat templates for GLM4.5 and Qwen3-Coder expect these arguments to already be a Mapping by the time the template is rendered.

Essentially, the arguments are getting lost in translation. It's like ordering food in a different language – the message isn't getting through clearly. This is further emphasized by the recommendations from Transformers and SGLang, which suggest converting the arguments field to a Mapping before initiating the render. This conversion is crucial for ensuring compatibility between the format in which the arguments are sent and the format in which the models expect to receive them.

The discrepancy highlights a need for a conversion step within the TabbyAPI to ensure that the arguments are correctly formatted before being passed to the models. Identifying the root cause is the first step towards implementing a robust solution that aligns with both the OAI standard and the specific requirements of the GLM4.5 and Qwen3-Coder models.

Reproducing the Bug

Okay, so how can you see this bug in action? It's actually pretty straightforward. The easiest way to reproduce this issue is by using a curl command. Think of curl as a tool that lets you send messages to a server – in this case, your model.

Here’s the command you can use:

#!/bin/bash

curl http://localhost:19999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -d '{
  "model": "Kwaipilot-KAT-Dev",
  "tools": [
    {
      "type": "function", 
      "function": {
        "name": "tool_calculate_post", 
        "description": "Calculates/evaluates the given expression.", 
        "parameters": {
          "type": "object", 
          "properties": {
            "expression": {
              "type": "string", 
              "title": "Expression", 
              "description": ""
            }
          },
          "required": ["expression"]
        }
      }
    }
  ], 
  "messages": [
    {
      "role": "user", 
      "content": "Use the calculator tool to compute `4 x 90` and provide a nice little answer block with the result."
    },
    {
      "role": "assistant",
      "content": "", 
      "tool_calls": [
        {
          "index": 0, 
          "id": "632619312", 
          "type": "function", 
          "function": {
            "name": "tool_calculate_post", 
            "arguments": "{\"expression\": \"4 x 90\"}"
          }
        }
      ]
    },
    {
      "role": "tool", 
      "tool_call_id": "632619312", 
      "content": 360
    }
  ]
}'

Make sure to replace http://localhost:19999 with your actual host and port, and $YOUR_API_KEY with your API key. This command sends a request to your model asking it to use a calculator tool to compute 4 x 90. When you run this, you should see the render failure in your logs.

This curl command simulates a chat completion request with a tool call. The model is instructed to use the tool_calculate_post function to compute the result of 4 x 90. The arguments for this tool are passed as a string, which triggers the bug in GLM4.5 and Qwen3-Coder models. By running this command, you can consistently reproduce the issue and verify the effectiveness of any proposed solutions.
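If you'd rather script the reproduction from Python, here's a rough requests-based equivalent of the curl call above – same hypothetical host, port, model name, and API key environment variable, so adjust them to your setup:

import os

import requests

payload = {
    "model": "Kwaipilot-KAT-Dev",
    "tools": [{
        "type": "function",
        "function": {
            "name": "tool_calculate_post",
            "description": "Calculates/evaluates the given expression.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "title": "Expression", "description": ""}
                },
                "required": ["expression"],
            },
        },
    }],
    "messages": [
        {"role": "user",
         "content": "Use the calculator tool to compute `4 x 90` and provide a nice little answer block with the result."},
        {"role": "assistant", "content": "", "tool_calls": [{
            "index": 0, "id": "632619312", "type": "function",
            "function": {"name": "tool_calculate_post",
                         # Arguments as a JSON *string* – this is what trips the template
                         "arguments": '{"expression": "4 x 90"}'},
        }]},
        {"role": "tool", "tool_call_id": "632619312", "content": 360},
    ],
}

resp = requests.post(
    "http://localhost:19999/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ.get('YOUR_API_KEY', '')}"},
    json=payload,
)
print(resp.status_code)  # 500 while the bug is present
print(resp.text)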

Expected Behavior

So, what should happen when everything is working correctly? Ideally, the model should use the tool call result properly and continue the completion without any hiccups. In our example, the model should recognize the tool call, use the calculator tool to compute 4 x 90, and then provide a nice answer block with the result.

Instead of crashing, the model should seamlessly integrate the tool's response into its output. This means that the model should be able to parse the arguments, execute the tool, receive the result, and incorporate it into the ongoing conversation or task. The expected behavior is a smooth, uninterrupted completion that leverages the tool call to enhance the model's capabilities.

When the bug is present, the model throws an error and stops the completion process, which is far from ideal. The goal is to ensure that the model not only avoids the error but also effectively utilizes the tool call to provide accurate and relevant responses.

Analyzing the Logs

To really get a handle on what's going wrong, let's take a look at the logs. Logs are like a behind-the-scenes record of what's happening in your application. They can give you clues about errors and help you pinpoint the exact location of the problem.

Here’s a snippet of the logs you might see when this bug occurs:

2025-10-24 12:03:53.991 INFO:     Body: {'model': 'Kwaipilot-KAT-Dev', 'tools': [{'type': 'function', 'function': {'name': 'tool_calculate_post', 'description': 
'Calculates/evaluates the given expression.', 'parameters': {'type': 'object', 'properties': {'expression': {'type': 'string', 'title': 'Expression', 'description': 
''}}, 'required': ['expression']}}}], 'messages': [{'role': 'user', 'content': 'Use the calculator tool to compute `4 x 90` and provide a nice little answer block 
with the result.'}, {'role': 'assistant', 'content': '', 'tool_calls': [{'index': 0, 'id': '632619312', 'type': 'function', 'function': {'name': 
'tool_calculate_post', 'arguments': '{"expression": "4 x 90"}'}}]}, {'role': 'tool', 'tool_call_id': '632619312', 'content': 360}]}
2025-10-24 12:03:53.995 WARNING:  Unable to switch model to Kwaipilot-KAT-Dev because "inline_model_loading" is not True in config.yml.
2025-10-24 12:03:53.998 INFO:     127.0.0.1:45888 - "POST /v1/chat/completions HTTP/1.1" 500
2025-10-24 12:03:54.003 ERROR:    Exception in ASGI application
...
2025-10-24 12:03:54.003 ERROR:    TypeError: Can only get item pairs from a mapping.

Look for lines that start with ERROR or WARNING. In this case, the key line is TypeError: Can only get item pairs from a mapping. This tells you exactly what's going wrong: the chat template is expecting a Mapping (like a dictionary) for the tool call arguments, but it's getting a string.

By examining the logs, you can trace the error back to the point where the model tries to render the tool call arguments. This detailed traceback is invaluable for identifying the specific code section that needs modification. The logs not only confirm the presence of the bug but also provide the necessary information to develop a targeted solution.
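If you want to see where that exact TypeError comes from without loading a model, here's a minimal standalone Jinja2 sketch. The real chat templates are more elaborate, but the failing construct boils down to iterating over the arguments with the items filter (available in Jinja2 3.1+), which refuses anything that isn't a Mapping:

from jinja2 import Environment

env = Environment()
# Chat templates iterate over the tool call arguments roughly like this:
template = env.from_string(
    "{% for key, value in arguments | items %}{{ key }}={{ value }}{% endfor %}"
)

# A Mapping renders fine:
print(template.render(arguments={"expression": "4 x 90"}))  # expression=4 x 90

# A JSON string raises the error from the logs above:
template.render(arguments='{"expression": "4 x 90"}')
# TypeError: Can only get item pairs from a mapping.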

Proposed Solution

Alright, let's talk solutions! The main idea here is to convert the arguments field from a string to a Mapping before it gets to the model. This way, we're giving the model what it expects, and everyone's happy.

Transformers, and by extension SGLang, recommend converting this arguments field to a Mapping before kicking off the render. A natural place for that conversion in TabbyAPI would be around endpoints/OAI/utils/chat_completion.py:243.

Specifically, you could add a step in the format_messages_with_template function to parse the arguments string into a dictionary. This can be done using Python's built-in json.loads() function. Here’s a rough idea of what the code might look like:

import json

def format_messages_with_template(messages, template_vars):
    for message in messages:
        for tool_call in message.get("tool_calls") or []:
            arguments = tool_call["function"]["arguments"]
            # Only convert if the arguments haven't been parsed already
            if isinstance(arguments, str):
                try:
                    tool_call["function"]["arguments"] = json.loads(arguments)
                except json.JSONDecodeError:
                    # Leave the value as-is if it isn't valid JSON
                    pass
    # rest of the function

This code snippet iterates through the messages, checks for tool calls, and attempts to parse each string arguments value into a dictionary. If the string is valid JSON, it replaces the arguments field with the parsed dictionary. If the value is already a Mapping, or the string isn't valid JSON, it's left untouched, so the conversion step itself never breaks the request.
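As a quick sanity check – assuming the sketch above is defined standalone; the real TabbyAPI function does more than this – you can feed it the assistant message from the repro request and confirm the arguments come out as a dict:

# The assistant message from the repro request, arguments still a string
message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{
        "index": 0,
        "id": "632619312",
        "type": "function",
        "function": {
            "name": "tool_calculate_post",
            "arguments": '{"expression": "4 x 90"}',
        },
    }],
}

format_messages_with_template([message], {})
print(type(message["tool_calls"][0]["function"]["arguments"]))  # <class 'dict'>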

By implementing this conversion, we ensure that the GLM4.5 and Qwen3-Coder models receive the arguments in the format they expect, resolving the render failure. This solution aligns with the recommendations from Transformers and SGLang and addresses the root cause of the bug.

Alternative Solutions

While converting the arguments to a Mapping in chat_completion.py is a solid approach, there are a couple of other ways we could tackle this issue. One alternative is to use a fromjson Jinja filter.

Jinja filters are like little functions that you can use in your templates to modify data. In this case, a fromjson filter would take a JSON string and convert it into a Python dictionary. This approach is similar to the one proposed in a rejected Transformers PR. Although the PR was rejected, the idea itself is sound.

TabbyAPI already maintains a collection of its own Jinja prompts, so shipping custom glm4.5 and qwen3-coder templates that make use of fromjson isn't impossible, but solving it at endpoints/OAI/utils/chat_completion.py:243 or so still seems preferable. The template route would mean modifying the Jinja templates used by GLM4.5 and Qwen3-Coder to apply the fromjson filter when rendering tool call arguments. While this method works, it's a bit more to maintain, since the change has to live in every affected template.
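For completeness, here's a minimal sketch of what the filter approach could look like. Registering fromjson is essentially just exposing json.loads under a filter name, along the lines of the rejected Transformers PR; the template line is illustrative, not GLM4.5's actual template:

import json

from jinja2 import Environment

env = Environment()
# Register a fromjson filter, as the rejected Transformers PR proposed
env.filters["fromjson"] = json.loads

# A template can then parse the string right before iterating over it:
template = env.from_string(
    "{% for key, value in arguments | fromjson | items %}{{ key }}={{ value }}{% endfor %}"
)
print(template.render(arguments='{"expression": "4 x 90"}'))  # expression=4 x 90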

Another potential solution could involve creating custom chat templates for GLM4.5 and Qwen3-Coder that are specifically designed to handle string arguments. This would entail adjusting the templates to parse the string arguments directly, rather than expecting a Mapping. However, this approach might introduce inconsistencies and make the system harder to maintain in the long run.

Conclusion

So, there you have it! We've taken a detailed look at the GLM4.5/Qwen3-Coder tool call rendering failure, explored the root cause, showed how to reproduce it, and discussed a solution. By converting the tool call arguments to a Mapping before rendering, we can fix this bug and ensure that these models work smoothly with tool calls.

This fix not only resolves the immediate issue but also enhances the overall robustness and reliability of the system. By ensuring that the model receives the arguments in the expected format, we prevent unexpected errors and improve the model's ability to leverage tool calls effectively.

Remember, understanding and addressing bugs like these is crucial for building robust and reliable AI systems. By diving deep into the problem and exploring potential solutions, we can make sure our models are working their best. If you're facing this issue, give the proposed solution a try, and let's keep pushing the boundaries of what these models can do! Keep an eye on updates and best practices from the community to stay ahead of the game. Happy coding, and let's keep those models running smoothly!