
---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3.5-35B-A3B
pipeline_tag: text-generation
tags:
- qwen
- chat-template
- jinja
- qwen3.5
- llama-cpp
- open-webui
- vllm
- tool-calling
- streaming
---
# Qwen 3.5 Jinja Chat Template v1 (attuned by Barubary)

A Jinja chat template for all Qwen 3.5 models on llama.cpp, Open WebUI, vLLM, Ollama, LM Studio, and any OAI-compatible endpoint.

I created this template for some current projects built on these models.

It contains 21 fixes over the official Qwen 3.5 chat template, addressing bugs that are still open upstream as of March 2026.

## Active Bug Reports Fixed

This template directly addresses the following community-reported bugs:

| Bug Report | Platform | Fix |
|---|---|---|
| Tool calling chat template is broken | HuggingFace | Fix 6 |
| Parallel tool calls interleaving | GitHub | Fix 15 |
| KV-cache reuse breaks with `enable_thinking=false` | GitHub | Fix 12 |
| Cannot close thinking via `enable_thinking: false` | GitHub | Fixes 1, 19 |
| Missing `reasoning_content` in tool calling | GitHub | Fix 13 |
| LM Studio parser breaks Qwen3.5 tool calling | Reddit | Fixes 18, 19 |
| Qwen3.5 27B getting stuck in loops | Reddit | Fix 17 |
| Template problem | HuggingFace | Fixes 6, 7 |

## All 21 Fixes

Each fix is labeled inline in the template source (e.g., `{#- FIX6 #}`).
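As an illustration of the labeling style, the Fix 13 guard looks roughly like this (a paraphrased sketch, not the verbatim template source; `message` is assumed to be the message loop variable):

```jinja
{#- FIX13: only render reasoning when it is actually present #}
{%- if message.reasoning_content is defined and message.reasoning_content is not none %}
<think>
{{ message.reasoning_content }}
</think>
{%- endif %}
```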

| # | Fix | What It Solves |
|---|---|---|
| 1 | Safe defaults for `add_vision_id` / `enable_thinking` | Crashes when config vars are not passed |
| 2 | Precomputed `_last_idx` for the `namespace()` constructor | llama.cpp minja parser compatibility |
| 3 | `developer` role handled | Claude Code / Codex / OpenCode support |
| 4 | System/developer split before the main loop | Duplicate system messages |
| 5 | `item.type` checked before the `in item` key test | Type-check ordering bug |
| 6 | `arguments.items()` replaces bare key iteration | Tool calling crash (HF discussion #4) |
| 7 | `safe` filter removed | llama.cpp compatibility |
| 8 | Explicit `tojson`/`string` if/else | No chained filters; prevents double-escaping |
| 9 | String arguments passed through | OAI-compatible proxy support |
| 10 | `tc` alias avoids shadowing the `tool_call` loop var | Variable scoping bug |
| 11 | `ns2` namespace replaces `loop.previtem` / `loop.nextitem` | llama.cpp minja doesn't support loop helpers |
| 12 | `enable_thinking` applied to in-context assistant turns | KV-cache reuse bug (GitHub #1826) |
| 13 | `reasoning_content is defined` + `is not none` guard | Missing `reasoning_content` (GitHub #26) |
| 14 | `loop.index0 >` (not `>=`) for assistant thinking scope | Off-by-one in thinking block placement |
| 15 | Parallel tool calls: `\n\n` delimiter between blocks | Parallel tool call interleaving (GitHub #7117) |
| 16 | Configurable truncation guard for long tool args/responses | Context overflow from massive tool outputs |
| 17 | Deep agent loops: graceful fallback to index 0 | Agent loops crashing after 5+ hops |
| 18 | Streaming compat: clean newline boundaries on all XML tags | LM Studio parser breaks (Reddit) |
| 19 | Auto-disable thinking when tools are active | `<tool_call>` leaks into `<think>` blocks |
| 20 | Unknown roles: graceful fallback to the `user` role | Planner/critic/custom roles crash |
| 21 | Flattened nesting depth; `_has_tools` precomputed | llama.cpp minja stability |

## Feature Comparison

Features this template provides that the official Qwen, Unsloth, and bartowski templates lack or only partially support:

- Parallel tool call separation
- Auto-disable thinking with tools
- Deep agent loop fallback
- Unknown role graceful fallback
- Configurable truncation guards
- Streaming-safe XML boundaries (partial support elsewhere)
- Developer role support
- `arguments.items()` fix
- `reasoning_content` guard (partial support elsewhere)

## Usage

### llama-server (llama.cpp)

```bash
llama-server \
  -m Qwen3.5-35B-A3B-*.gguf \
  --jinja -fa \
  --chat-template-file chat_template.jinja \
  -c 32768 -ngl 99 \
  --temp 0.6 --top-k 20 --top-p 0.8 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --host 0.0.0.0 --port 8080
```

### Open WebUI

Mount the template via Docker:

```yaml
volumes:
  - ./chat_template.jinja:/templates/chat_template.jinja:ro
command: >
  --chat-template-file /templates/chat_template.jinja
```

### vLLM

```bash
vllm serve Qwen/Qwen3.5-35B-A3B \
  --chat-template ./chat_template.jinja
```

### Ollama

Copy `chat_template.jinja` into your Modelfile, or use it with a compatible frontend.

## Configuration

Pass variables via `--chat-template-kwargs`:

```json
{
  "enable_thinking": true,
  "auto_disable_thinking_with_tools": true,
  "add_vision_id": false,
  "max_tool_arg_chars": 0,
  "max_tool_response_chars": 8192
}
```
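These kwargs can also be sent per request through the OpenAI-compatible API. vLLM accepts a `chat_template_kwargs` field in the chat-completions request body (support and field name on other runtimes is an assumption); a minimal sketch of building such a request:

```python
import json

# Sketch of a per-request override via the OpenAI-compatible API.
# "chat_template_kwargs" is accepted by vLLM's /v1/chat/completions;
# whether other runtimes honor it is an assumption.
payload = {
    "model": "Qwen/Qwen3.5-35B-A3B",
    "messages": [{"role": "user", "content": "Summarize this repo."}],
    "chat_template_kwargs": {
        "enable_thinking": False,
        "max_tool_response_chars": 8192,
    },
}

# Serialize and POST this as the request body.
body = json.dumps(payload)
```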
| Variable | Default | Description |
|---|---|---|
| `enable_thinking` | `true` | Controls `<think>` mode |
| `auto_disable_thinking_with_tools` | `true` | Auto-disables thinking when tools are provided, preventing `<tool_call>` bleed into `<think>` blocks |
| `add_vision_id` | `false` | Prefix images/videos with "Picture N:" / "Video N:" |
| `max_tool_arg_chars` | `0` (unlimited) | Truncate tool arguments beyond this length |
| `max_tool_response_chars` | `0` (unlimited) | Truncate tool responses beyond this length |
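The truncation guards behave like this Python sketch (an equivalent model of the Jinja logic, assuming the template appends a marker when it cuts a string; the exact marker text is an assumption):

```python
def truncate_guard(text: str, max_chars: int) -> str:
    """Model of Fix 16's guard: a max of 0 means unlimited, matching the defaults."""
    if max_chars <= 0 or len(text) <= max_chars:
        return text
    # Marker text is illustrative; the template's actual marker may differ.
    return text[:max_chars] + " ... [truncated]"
```

With the default of `0`, tool arguments and responses pass through untouched; set a positive limit only when massive tool outputs threaten to overflow the context window.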

## Before / After

### Tool Call Bleed Bug (Fix 19)

Before (official template):

```
<think>
The user wants me to search for...
<tool_call>          ← WRONG: tool call inside think block
<function=search>
```

After (this template):

```
<think>

</think>

<tool_call>          ← Correct: thinking auto-disabled when tools present
<function=search>
```
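Conceptually, Fix 19 reduces to a guard of this shape (a paraphrased sketch; the actual template tracks the flag in a `namespace()` because a plain `set` does not escape `if` scope in Jinja):

```jinja
{#- FIX19 (sketch): suppress thinking when tools are in play #}
{%- set think = namespace(enabled=enable_thinking) %}
{%- if tools is defined and tools and auto_disable_thinking_with_tools %}
    {%- set think.enabled = false %}
{%- endif %}
```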

### Parallel Tool Calls (Fix 15)

Before (official template):

```
<tool_call><function=multiply>...</function></tool_call><tool_call><function=add>...</function></tool_call>
```

After (this template):

```
<tool_call>
<function=multiply>
...
</function>
</tool_call>

<tool_call>
<function=add>
...
</function>
</tool_call>
```
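With the `\n\n` separation (Fix 15) and newline-aligned tags (Fix 18), a downstream parser can recover each call with a simple pattern. A sketch, where the argument payloads are invented placeholders:

```python
import re

# Two parallel tool calls, formatted the way this template emits them.
output = """<tool_call>
<function=multiply>
{"a": 6, "b": 7}
</function>
</tool_call>

<tool_call>
<function=add>
{"a": 1, "b": 2}
</function>
</tool_call>"""

# Newline-aligned tags make each block independently matchable.
pattern = r"<tool_call>\n<function=(\w+)>\n(.*?)\n</function>\n</tool_call>"
calls = re.findall(pattern, output, re.DOTALL)
# calls -> [("multiply", '{"a": 6, "b": 7}'), ("add", '{"a": 1, "b": 2}')]
```

The run-together "before" form, by contrast, forces parsers to stream-scan for tag boundaries mid-line, which is exactly where the LM Studio parser breaks.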

## Compatible Models

Tested and compatible with all Qwen 3.5 models:

- Qwen3.5-35B-A3B (all quants)
- Qwen3.5-27B-A3B
- Qwen3.5-14B-A3B
- Qwen3.5-9B
- Qwen3.5-4B
- Qwen3.5-Coder series

Also backward-compatible with Qwen3 32B.

## Tested Platforms

- llama.cpp (b4242+)
- Open WebUI (v0.4.8+)
- vLLM (v0.6.4+)
- Ollama (v0.5.0+)
- LM Studio (v0.3.5+)
- Text Generation WebUI (oobabooga)

## Credits

Base template architecture: Qwen team (official Qwen3.5 chat template)

All 21 fixes: Barubary (original implementations)

## License

Apache 2.0 (same as the official Qwen3.5 template)

## Contributing

Found a bug? Open an issue with:

- Minimal reproduction case
- Error logs
- Model and runtime versions

Pull requests welcome.