at a glance

GLM-4.7 FlashQwen3.5-35B-A3B
providerZhipu AIAlibaba
parameters730B total / 3B active (MoE)35B total / 3B active (MoE)
context window128k tokens256k tokens

benchmarks

Pricing (per 1M tokens) ?
GLM-4.7 Flash
$0.07 in / $0.40 out
Qwen3.5-35B-A3B
$0.11 in / $0.85 out
GPQA Diamond (graduate science) ?
GLM-4.7 Flash
75.2%
Qwen3.5-35B-A3B
84.2%
SWE-bench Verified (software engineering) ?
GLM-4.7 Flash
59.2%
Qwen3.5-35B-A3B
69.2%
TAU2-Bench (agentic tool use) ?
GLM-4.7 Flash
79.5%
Qwen3.5-35B-A3B
81.2%
Terminal Bench 2 (shell tasks) ?
GLM-4.7 Flash
64.0%*
Qwen3.5-35B-A3B
40.5%
GLM-4.7 Flash Qwen3.5-35B-A3B bold score = winner

what are these models?

GLM-4.7 Flash is Zhipu AI’s fast inference model from the GLM-4.7 family. Zhipu AI is a Chinese AI research lab; the “Flash” variant is optimized for speed and cost-efficiency. It shows particularly strong performance on terminal and CLI benchmarks.

Qwen3.5-35B-A3B is a Mixture-of-Experts model from Alibaba’s Qwen3.5 series — 35B total parameters, 3B active per forward pass. It is open-weight under Apache 2.0, giving it strong coverage across knowledge, coding, and reasoning tasks at low inference cost.

benchmark breakdown

Qwen3.5-35B-A3B leads on most tasks. GPQA Diamond (84.2% vs 75.2%), SWE-bench Verified (69.2% vs 59.2%), and TAU2-Bench (81.2% vs 79.5%) all favor the Qwen MoE model. The science and coding gaps are significant (9 points each).

GLM-4.7 Flash wins on terminal tasks. The Terminal Bench 2 gap is striking: 64.0%* vs 40.5%. For CLI and shell automation, GLM-4.7 Flash is the stronger model — by nearly 24 points.

Note: The GLM-4.7 Flash Terminal Bench 2 score (64.0%*) may reflect a specific evaluation setup or self-reported conditions — factor this into your assessment if terminal task performance is critical to your use case.

what people are saying

when to use GLM-4.7 Flash

  • terminal and CLI task automation is your primary use case
  • you need a fast, efficient inference model for cost-sensitive workloads
  • you want a model optimized for Chinese-language tasks (Zhipu’s specialty)

when to use Qwen3.5-35B-A3B

  • scientific reasoning, software engineering, or agentic tool use are priorities
  • you want open weights for self-hosting or fine-tuning
  • you need Apache 2.0 licensing for commercial use
  • cost efficiency is important — 3B active parameters makes it cheap to serve

scaling performance efficiently with fine-tuning

Both models are strong candidates for domain-specific fine-tuning, but Qwen3.5-35B-A3B stands out as a particularly powerful foundation. Its open weights (Apache 2.0) and MoE architecture enable a compelling dynamic: ~35B-level knowledge capacity at roughly 3B inference cost when deployed as a fine-tuned specialist.

In areas where GLM-4.7 Flash currently leads — such as terminal and tool-driven workflows — fine-tuning Qwen3.5-35B-A3B on your own tool-use trajectories can rapidly close the gap and often deliver comparable performance with greater flexibility.

frequently asked questions

which model should i use for coding?

qwen3.5-35b-a3b — it leads by 10 points on swe-bench verified (69.2% vs 59.2%).

which is better for shell and terminal automation?

glm-4.7 flash — its terminal bench 2 score (64.0%*) is nearly 24 points higher than qwen3.5-35b-a3b.

what does the asterisk mean on glm’s terminal bench score?

the * indicates the score may have been measured under specific evaluation conditions or is self-reported. verify independently if terminal task performance is critical for your deployment.

can i self-host qwen3.5-35b-a3b?

yes — it’s open-weight under apache 2.0, running at ~3b active parameters per token.