2026-04-25 · Investigation report

Toolcaller Pricing Investigation

Five candidate models tested on identical sales conversations. GLM-5.1 picked for production at ~$59 per 1,000 conversations.

Result: openrouter/z-ai/glm-5.1 wins on price/reliability/quality balance. Production deployed at $0.059 per conversation. Caching pass-through via OpenRouter works for GLM and DeepSeek-direct routes (~66% hit rate). DeepSeek-via-:nitro failed: empty-choices errors mid-loop and no caching. GLM-4.6 occasionally (~1 in 5 runs) acknowledges contact but doesn't dispatch the lead in-conversation — the follow-up timer recovers it, so this is observed behavior rather than blocking.

Methodology

Results

openrouter/z-ai/glm-5.1 (production)
  $0.059/conv · $59 per 1k · lead: clean (1/1) · reasoning: very good (KB-grounded)
  Caching passes through OpenRouter, ~66% hit rate.

openrouter/z-ai/glm-4.6
  $0.012–0.013/conv · ~$13 per 1k · lead: ~1 in 5 runs misses send_lead in-conversation (timer recovery covers it) · reasoning: very good (slightly behind 5.1 / GPT-5 in our flow)
  Cheaper than 5.1; lead-loss is recoverable, not blocking.

openai/gpt-5.4
  $0.063/conv · $63 per 1k · lead: clean (1/1) · reasoning: very good
  OpenAI auto-caching; cache reads at 10% of input rate.

anthropic/claude-sonnet-4.6 (cached)
  $0.118/conv · $118 per 1k · lead: clean (1/1) · reasoning: very good
  Explicit cache_control markers; 90% off cached reads.

openrouter/deepseek/deepseek-v4-pro:nitro
  cost: flaky (n/a) · lead: testing failed
  Empty-choices errors mid-loop; no caching pass-through.
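The :nitro failure mode (an empty choices array returned mid-loop) is cheap to guard against on any route. A minimal defensive sketch, assuming an OpenAI-compatible chat-completions response dict; the function names and retry policy here are illustrative, not lifted from our codebase:

```python
def extract_message(resp: dict):
    """Return the assistant message dict from an OpenAI-compatible
    chat-completions response, or None when the provider returned an
    empty (or missing) choices array."""
    choices = resp.get("choices") or []
    if not choices:
        return None
    return choices[0].get("message")


def call_with_retry(send, payload: dict, attempts: int = 3):
    """Retry the whole call when choices comes back empty. 'send' is
    any callable that posts the payload and returns the parsed JSON."""
    for _ in range(attempts):
        msg = extract_message(send(payload))
        if msg is not None:
            return msg
    raise RuntimeError("provider kept returning empty choices")
```

Retrying the whole call (rather than crashing the tool loop) is what the follow-up timer effectively does for us today, just at a much coarser granularity.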

Caching findings

Provider                Mechanism                Discount     OpenRouter pass-through
OpenAI (GPT-5 family)   automatic                90%          yes
OpenAI (GPT-4.1 / 4o)   automatic                75% / 50%    yes
Anthropic               explicit cache_control   90%          yes (markers passed through)
DeepSeek                automatic                ~90%         only via DeepSeek's own API; not via Together/Fireworks
Z.AI GLM                automatic                ~80%         yes (~66% effective rate observed for GLM-5.1)
Moonshot Kimi / Qwen    varies                   unknown      undocumented
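Anthropic is the only row that needs request-side changes. A hedged sketch of what those explicit markers look like in a payload, using Anthropic's cache_control content-part convention as passed through OpenRouter (the helper name is ours for illustration; check the provider docs for current field semantics):

```python
def anthropic_cached_payload(stable_prefix: str, state_snapshot: str,
                             user_msg: str) -> dict:
    """Build a chat payload where only the stable system prefix carries
    a cache_control marker; the volatile state block stays unmarked."""
    return {
        "model": "anthropic/claude-sonnet-4.6",
        "messages": [
            {   # stable prefix: marked cacheable, reads billed at the discount
                "role": "system",
                "content": [{
                    "type": "text",
                    "text": stable_prefix,
                    "cache_control": {"type": "ephemeral"},
                }],
            },
            {   # volatile per-turn state: unmarked, never cached
                "role": "system",
                "content": state_snapshot,
            },
            {"role": "user", "content": user_msg},
        ],
    }
```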

For our agent, the stable cacheable system prefix is ~3,500–4,300 tokens (engine preamble + project Prompt1 + tools schema). The volatile suffix (per-turn state snapshot) doesn't cache, by design — it's split into a separate system block so its variation doesn't bust the prefix.
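The economics of that prefix/suffix split reduce to expected-cost arithmetic. A sketch (the rate and token figures in the test are illustrative placeholders, not our actual pricing):

```python
def expected_input_cost(prefix_tok: int, suffix_tok: int,
                        rate_per_mtok: float, cache_discount: float,
                        hit_rate: float) -> float:
    """Expected input cost per call when only the stable prefix can hit
    the cache. On a hit, prefix tokens bill at (1 - discount); the
    volatile suffix always bills at the full rate."""
    full = (prefix_tok + suffix_tok) * rate_per_mtok / 1e6
    hit = (prefix_tok * (1 - cache_discount) + suffix_tok) * rate_per_mtok / 1e6
    return hit_rate * hit + (1 - hit_rate) * full
```

With a ~4,000-token prefix, an ~80% discount, and a ~66% hit rate, the expected cost lands between the full-rate and cache-perfect figures, which is exactly the shape of the $59 vs ~$50 per-1k gap discussed under the cache-miss pattern.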

OpenRouter cache-miss pattern

Caching pass-through via OpenRouter is non-deterministic because OpenRouter load-balances across multiple upstreams (Together, Fireworks, DeepInfra, the model author's own API). Different upstreams have different caches; if a call lands on a different upstream than the previous one, the prefix is cold there and pays full rate.

Observed in production GLM-5.1 traces: typically one cache miss per session (cost variance ~17%). Net effect: paying $59/1k vs the cache-perfect floor of ~$50/1k. Acceptable for now; addressable later via OpenRouter's provider.order request field if it becomes a pattern.
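If the miss pattern worsens, pinning the upstream is a one-field change. A sketch of the request body using OpenRouter's provider routing options; the provider slug in order is an illustrative guess, so verify the exact slug against the routing docs before relying on it:

```python
payload = {
    "model": "z-ai/glm-5.1",
    "messages": [{"role": "user", "content": "ping"}],
    "provider": {
        # Always try the same upstream first so its prefix cache stays warm.
        "order": ["z-ai"],
        # Fail fast rather than silently landing on a cold upstream.
        "allow_fallbacks": False,
    },
}
```

The trade-off: with allow_fallbacks disabled, an upstream outage becomes a hard error instead of a cache miss, so this only makes sense once the miss cost demonstrably exceeds the availability risk.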

Per-turn cost characteristics (production GLM-5.1, 6-turn session)

turn  user msg                                      $/turn
─────────────────────────────────────────────────────────────
  1   "need an apartment appraisal"                  $0.0108
  2   "Barnaul, registering an inheritance"          $0.0102
  3   "where do I get the EGRN extract?"             $0.0105
  4   <file: passport.pdf>                           $0.0000  (file_ack)
  5   "passport"                                     $0.0120
  6   "got it. yes, the number is current" → send_lead   $0.0157
─────────────────────────────────────────────────────────────
TOTAL                                                $0.0591

Reliability observations

Recommendations

Use case                    Model
Production default          openrouter/z-ai/glm-5.1
Premium tier (paying ~7×)   anthropic/claude-sonnet-4.6 with caching
Mid-tier alternative        openai/gpt-5.4
Cheaper alternative         openrouter/z-ai/glm-4.6 (about 5× cheaper than 5.1; the in-conversation lead-loss caveat above applies, timer recovery handles it)
Avoid                       any :nitro variant (no caching pass-through)

Reproducibility

To re-run this investigation:

# Edit config.dev.json's models.toolcaller, walk a 5-turn session via CLI:
bin/aichat-cli --config config.dev.json --project abo22

# Sum cost from the trace:
grep '^usage:' traces/1/<latest>.log | sed 's/.*cost=\$\([0-9.]*\).*/\1/' | paste -sd+ | bc

The usage: and cost= lines are written by internal/agent/toolagent.go (debugDumpBlocks + the per-call usage line). Token counts are exact; OpenRouter cost= is exact for routes that return it.