2026-04-25 · Architecture changelog

Toolcaller Refactor

3-stage state machine → single tool-calling loop. Production now runs on GLM-5.1 at ~$59 per 1,000 conversations — about half of what cached Claude Sonnet would cost on the same architecture.

TL;DR for the business side: The agent's internal architecture changed today. Customer experience is at least as good as before (replies are slightly more natural, dispatch logic is unchanged). Cost per conversation on the new architecture is roughly half of what cached Claude Sonnet would cost — this is a model-choice comparison, not a before/after of the old 3-stage bot. From an operations standpoint: there's now one prompt to write per project instead of three, and the model decides when to act rather than a hand-tuned heuristic.

Before vs. now

Before — 3 stages

User message arrives
   ↓
Stage 1
  rewrite query (LLM)
  3× knowledge-base search
  URL-score counter
  URL-determination LLM call
  reply (Prompt1)
   ↓
Stage 2
  fetch service docs
  extract price (LLM)
  3× parallel extractions
  reply (Prompt2)
   ↓
Stage 3
  generate summary (LLM)
  closing reply (Prompt3)
  send to CRM

4 prompts per project, hand-coded stage transitions, up to 7 LLM calls per turn.

Now — one tool-using loop

User message arrives
   ↓
Single LLM loop (max 8 iterations)
  → may search the KB (hybrid_search)
  → may save what it learned (set_state)
  → may dispatch the lead (send_lead)
  → ends with a reply

Tools:
  hybrid_search, set_state,
  get_state, send_lead

The model itself decides
when to call which tool.

1 prompt per project, model owns every decision, 2–4 LLM calls per turn typical.
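The loop above can be sketched as follows. This is a hedged illustration, not the engine's actual code: `llm.complete`, the response shape, and the tool registry are all hypothetical names.

```python
# Hedged sketch of the single tool-calling loop; names are illustrative.

MAX_ITERATIONS = 8  # matches the cap described above

def run_turn(llm, tools, messages):
    """One customer turn: iterate until the model answers or the cap is hit."""
    for _ in range(MAX_ITERATIONS):
        response = llm.complete(messages, tools=tools)
        if response.tool_call is None:
            return response.text  # the model chose to reply to the customer
        # Execute the requested tool (hybrid_search / set_state / get_state /
        # send_lead) and feed the result back for the next iteration.
        handler = tools[response.tool_call.name]
        result = handler(**response.tool_call.args)
        messages.append({"role": "tool",
                        "name": response.tool_call.name,
                        "content": result})
    # Safety valve: never loop forever if the model keeps calling tools.
    return "A specialist will follow up with you shortly."
```

Typical turns finish in 2–4 iterations (one or two tool calls plus the reply); the cap of 8 is only the hard stop.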

Customer-facing impact

Customers should not notice a regression; if anything, replies are slightly more natural, and dispatch logic is unchanged.

One known edge case: on rare turns a smaller open-source model can acknowledge the customer's contact but skip actually dispatching the lead. The follow-up timer catches these silently and dispatches the lead later. Not observed in production with GLM-5.1, the current production model.
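The safety net described above can be sketched like this; all names are illustrative, not the production implementation. If the model saved a contact via set_state but never called send_lead this turn, a timer dispatches the lead later.

```python
# Hedged sketch of the follow-up safety net (all names illustrative).

FOLLOW_UP_DELAY_S = 15 * 60  # illustrative delay, not the production value

def arm_follow_up(state, lead_dispatched, schedule, send_lead):
    """Arm a silent dispatch timer when a contact was captured but the
    lead was not sent this turn. Returns True if a timer was scheduled."""
    if state.get("contact") and not lead_dispatched:
        schedule(FOLLOW_UP_DELAY_S, send_lead, state)
        return True
    return False
```

The customer sees nothing; the lead simply arrives in the CRM a little later than usual.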

Operations: how to write Prompt1

The single Prompt1 now drives the agent's behavior. The engine has built-in knowledge of the four tools (search, save, get-state, dispatch); the project prompt's job is to tell the model what the business does, what to gather from the customer, and how to behave.

Recommended structure (~3,500–4,500 chars)

Identity & tone (1–2 paragraphs)

WHAT TO GATHER (numbered list of fields)
  1) service type
  2) object
  3) city
  4) contact (phone or email) — required before dispatch
  5) documents — required / nice-to-have
  6) any other business-specific fields

DIALOG LOGIC (numbered steps)
  1. Identify the service type first.
  2. Once known, briefly explain value, give a price ballpark from KB.
  3. Ask for documents.
  4. Ask for contact.
  5. Dispatch when [criteria].
  6. After dispatch — short closing message.

WHEN TO DISPATCH
  - Minimum: contact + 3 key params.
  - Or: customer explicitly asks for a human.

DON'T-DOS
  - Don't quote prices not in the knowledge base.
  - Don't promise specific dates or handle payment.
  - Don't invent manager names.

How the engine maps natural language to tool calls

You write business-level instructions in plain language. The engine translates phrases into the right tool calls automatically:

Natural-language phrase in Prompt1 → engine maps to:

  "hand off to a specialist", "transfer the lead", "dispatch" → send_lead
  "consult the knowledge base", "rely on facts", "check the site" → hybrid_search
  "remember", "save", "note", "capture" → set_state
  "city", "budget", "deadline", "service type" (anything in WHAT TO GATHER) → set_state(extras=…) with stable English keys

You do not need to write technical instructions like "call set_state with {contact: {method: 'phone', value: ...}}" in your prompt. The engine handles all of that. Mentioning tool names or argument schemas in Prompt1 actually hurts — it confuses the model when your wording competes with the engine's.
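To make the division of labor concrete, here is a hypothetical sketch of the tool declarations the engine registers with the model; the schemas are illustrative, not the real ones. Prompt1 never mentions any of this: the model links "transfer the lead" in your prompt to send_lead purely through the tool's own description.

```python
# Hypothetical tool declarations owned by the engine (schemas illustrative).
# Prompt1 stays in plain business language; these live in the engine only.

BUILT_IN_TOOLS = [
    {"name": "hybrid_search",
     "description": "Search the project knowledge base, e.g. for prices.",
     "parameters": {"query": "string"}},
    {"name": "set_state",
     "description": "Save a fact learned from the customer "
                    "(contact, city, service type, extras).",
     "parameters": {"extras": "object"}},
    {"name": "get_state",
     "description": "Read back what has been saved so far.",
     "parameters": {}},
    {"name": "send_lead",
     "description": "Dispatch the gathered lead to the CRM.",
     "parameters": {"summary": "string"}},
]
```

This is why duplicating tool names in Prompt1 backfires: the model already has one authoritative description per tool, and a second competing description in the prompt only adds noise.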

Common Prompt1 mistakes

Mentioning tool names or argument schemas Engine handles these. Use plain business language: "transfer the lead", not "call send_lead(summary)".
Vague dispatch criteria Be specific: "Minimum: contact + service type + object + city". Don't say "when you have enough info".
Putting price tables inline Prices live in the knowledge base. Use price_url to point at the price page; the agent will look up specific prices via hybrid_search when asked.
Length over signal Prompts > 5,000 chars start hurting attention. Aim for 3,500–4,500.
Forgetting the start-reply for new clients The /start command's first message comes from the project's start-reply field, not Prompt1.

Testing changes

Update Prompt1 directly from the project's Telegram lead group with /prompt1 <text>. No bot restart needed. The full command reference is in the operations guide. After updating, just walk a representative scenario with the bot in chat and check whether it gathers the right fields and dispatches the lead at the right moment.

What changed in the codebase

Added

Removed

Refactored

Pricing impact

On the new tool-calling architecture, GLM-5.1 (production, ~$0.059/conv) costs about half of cached Claude Sonnet (~$0.118/conv) at the same or better quality. To be clear: this is a model comparison on the new architecture, not a measurement of "new vs. old 3-stage bot" — the old system used a different model and different prompt layout, so a direct $/conv comparison wouldn't be apples-to-apples. See the pricing investigation for the full comparison.
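The headline numbers work out as follows; a trivial check using the per-conversation figures quoted above.

```python
# Sanity check on the quoted pricing: $59 per 1,000 conversations for GLM-5.1
# versus ~$0.118/conv for cached Claude Sonnet on the same architecture.
glm_per_conv = 59 / 1000        # $0.059 per conversation
sonnet_per_conv = 0.118         # cached Claude Sonnet, same tool-calling loop
ratio = glm_per_conv / sonnet_per_conv  # ~0.5, i.e. "about half"
```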