Before vs. now
Before — 3 stages
User message arrives
↓
Stage 1
rewrite query (LLM)
3× knowledge-base search
URL-score counter
URL-determination LLM call
reply (Prompt1)
↓
Stage 2
fetch service docs
extract price (LLM)
3× parallel extractions
reply (Prompt2)
↓
Stage 3
generate summary (LLM)
closing reply (Prompt3)
send to CRM
4 prompts per project, hand-coded stage transitions, up to 7 LLM calls per turn.
Now — one tool-using loop
User message arrives
↓
Single LLM loop (max 8 iterations)
→ may search the KB (hybrid_search)
→ may save what it learned (set_state)
→ may dispatch the lead (send_lead)
→ ends with a reply
Tools:
hybrid_search, set_state,
get_state, send_lead
The model itself decides
when to call which tool.
1 prompt per project, model owns every decision, 2–4 LLM calls per turn typical.
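The loop above can be sketched in a few lines of Go. This is an illustrative sketch, not the engine's actual code — the names (`toolCall`, `step`, `runLoop`) and the string-based tool results are assumptions made for brevity:

```go
package main

import "fmt"

// toolCall is a simplified stand-in for one tool invocation requested by the model.
type toolCall struct {
	Name string
	Args map[string]string
}

// step is a stand-in for one LLM response: either tool calls to run, or a final reply.
type step struct {
	Calls []toolCall
	Reply string
}

// runLoop drives the agent: ask the model, execute any requested tools,
// feed the results back, and stop on a plain reply or after maxIters iterations.
func runLoop(model func(history []string) step, tools map[string]func(map[string]string) string, maxIters int) string {
	var history []string
	for i := 0; i < maxIters; i++ {
		s := model(history)
		if len(s.Calls) == 0 {
			return s.Reply // model chose to answer the user directly
		}
		for _, c := range s.Calls {
			if fn, ok := tools[c.Name]; ok {
				// Append the tool result so the next model call can see it.
				history = append(history, fmt.Sprintf("%s -> %s", c.Name, fn(c.Args)))
			}
		}
	}
	return "(max iterations reached)"
}

func main() {
	// Scripted fake model: search the KB once, then reply.
	fake := func(h []string) step {
		if len(h) == 0 {
			return step{Calls: []toolCall{{Name: "hybrid_search", Args: map[string]string{"query": "price"}}}}
		}
		return step{Reply: "Here is the price range from our knowledge base."}
	}
	tools := map[string]func(map[string]string) string{
		"hybrid_search": func(args map[string]string) string { return "kb results for " + args["query"] },
	}
	fmt.Println(runLoop(fake, tools, 8))
}
```

The key property is that the model, not hand-coded stage logic, decides when the loop ends: a turn with a clear question can resolve in two calls, while a vague one can spend extra iterations searching.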
Customer-facing impact
Customers should not notice a regression. Subtle improvements:
- Conversations feel more natural. No rigid "Stage 1 only asks clarifying questions" wall — the bot can answer a price question right away if the question is clear.
- Faster on simple turns. File uploads, "thanks", etc. now skip the full search-and-respond pipeline.
- More specific replies. The model can search the knowledge base an extra time if the question warrants it, instead of always doing exactly three searches.
- Lead format unchanged. The lead group still receives the same structured payload (contact, city, summary, files, history).
Operations: how to write Prompt1
Recommended structure (~3,500–4,500 chars)
Identity & tone (1–2 paragraphs)
WHAT TO GATHER (numbered list of fields)
1) service type
2) object
3) city
4) contact (phone or email) — required before dispatch
5) documents — required / nice-to-have
6) any other business-specific fields
DIALOG LOGIC (numbered steps)
1. Identify the service type first.
2. Once known, briefly explain value, give a price ballpark from KB.
3. Ask for documents.
4. Ask for contact.
5. Dispatch when [criteria].
6. After dispatch — short closing message.
WHEN TO DISPATCH
- Minimum: contact + 3 key params.
- Or: customer explicitly asks for a human.
DON'T-DOS
- Don't quote prices not in the knowledge base.
- Don't promise specific dates or handle payment.
- Don't invent manager names.
How the engine maps natural language to tool calls
You write business-level instructions in plain language. The engine translates phrases into the right tool calls automatically:
| Natural-language phrase in Prompt1 | Engine maps to |
|---|---|
| "hand off to a specialist", "transfer the lead", "dispatch" | send_lead |
| "consult the knowledge base", "rely on facts", "check the site" | hybrid_search |
| "remember", "save", "note", "capture" | set_state |
| "city", "budget", "deadline", "service type" (anything in WHAT TO GATHER) | set_state(extras=…) with stable English keys |
You do not need to write technical instructions like "call set_state with {contact: {method: 'phone', value: ...}}" in your prompt. The engine handles all of that. Mentioning tool names or argument schemas in Prompt1 actually hurts — it confuses the model when your wording competes with the engine's.
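For context, a tool declaration the engine might send to the model looks roughly like this. The struct and field names below are illustrative (the engine's real types are in internal/llm/toolcaller.go and may differ); only the four tool names come from this document:

```go
package main

import "fmt"

// toolSpec mirrors the shape of a JSON tool declaration sent to the model.
// Field names here are an assumption, not the engine's actual schema types.
type toolSpec struct {
	Name        string            `json:"name"`
	Description string            `json:"description"`
	Params      map[string]string `json:"parameters"` // param name -> type, simplified
}

// engineTools lists the four tools the agent loop exposes.
func engineTools() []toolSpec {
	return []toolSpec{
		{Name: "hybrid_search", Description: "Search the project knowledge base.", Params: map[string]string{"query": "string"}},
		{Name: "set_state", Description: "Save a gathered field under a stable English key.", Params: map[string]string{"key": "string", "value": "string"}},
		{Name: "get_state", Description: "Read previously saved fields.", Params: map[string]string{}},
		{Name: "send_lead", Description: "Dispatch the structured lead to the CRM / lead group.", Params: map[string]string{"summary": "string"}},
	}
}

func main() {
	for _, t := range engineTools() {
		fmt.Println(t.Name, "-", t.Description)
	}
}
```

Because these declarations already describe each tool to the model, restating them in Prompt1 is redundant at best and contradictory at worst — which is exactly why the guidance above says to stick to business language.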
Common Prompt1 mistakes
| Mistake | Fix |
|---|---|
| Mentioning tool names or argument schemas | Engine handles these. Use plain business language: "transfer the lead", not "call send_lead(summary)". |
| Vague dispatch criteria | Be specific: "Minimum: contact + service type + object + city". Don't say "when you have enough info". |
| Putting price tables inline | Prices live in the knowledge base. Use price_url to point at the price page; the agent will look up specific prices via hybrid_search when asked. |
| Length over signal | Prompts > 5,000 chars start hurting attention. Aim for 3,500–4,500. |
| Forgetting the start-reply for new clients | The /start command's first message comes from the project's start-reply field, not Prompt1. |
Testing changes
Update Prompt1 directly from the project's Telegram lead group with /prompt1 <text>. No bot restart needed. The full command reference is in the operations guide. After updating, just walk a representative scenario with the bot in chat and check whether it gathers the right fields and dispatches the lead at the right moment.
What changed in the codebase
Added
- internal/llm/toolcaller.go — the tool-calling abstraction with two backends (Anthropic native, OpenAI tools spec).
- internal/agent/toolagent.go — the runtime loop: build system prompt, dispatch tools, persist blocks.
- migrations/012_chat_history_blocks.sql — adds a JSONB column for full assistant + tool-call sequences per turn.
- migrations/013_drop_3stage_remnants.sql — drops dead columns + JSONB keys.
Removed
- internal/agent/stages.go (~970 lines of 3-stage flow).
- Timer's separate classifyStatus and generateFollowUp LLM calls — timer pings now go through the same agent loop with a synthetic [TIMER PING] turn.
- Prompt2, Prompt3, PricePrompt fields and the corresponding Telegram commands.
- Per-project URL-determination thresholds.
- ~46 dead i18n keys.
Refactored
- System prompt is now split into a stable cacheable prefix (engine preamble + project Prompt1 + tools schema) and a volatile suffix (per-turn state snapshot). Anthropic prompt caching is wired automatically; OpenAI / DeepSeek / GLM caching also benefits.
- ChatState lost URLsCounter, PriceSummary, ContactRequested. Gained Extras, ClientStatus (hot/cold), LeadSummary.
- Cold + last-interval timer behavior: now dispatches the lead instead of dropping the sequence (configurable later).
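The prefix/suffix split can be sketched as follows. This is a minimal illustration of the idea, assuming string concatenation; the function name and signature are hypothetical, and the real assembly lives in internal/agent/toolagent.go:

```go
package main

import (
	"fmt"
	"strings"
)

// buildSystemPrompt splits the system prompt into a stable, cacheable prefix
// (engine preamble + project Prompt1 + tool schemas) and a volatile suffix
// (the per-turn state snapshot). Only the prefix is eligible for provider-side
// prompt caching, so it must stay byte-identical across turns.
func buildSystemPrompt(preamble, prompt1, toolSchemas, stateSnapshot string) (prefix, suffix string) {
	prefix = strings.Join([]string{preamble, prompt1, toolSchemas}, "\n\n")
	suffix = "Current state:\n" + stateSnapshot
	return prefix, suffix
}

func main() {
	prefix, suffix := buildSystemPrompt(
		"You are a lead-qualification assistant.",
		"Identity & tone... WHAT TO GATHER...",
		"hybrid_search, set_state, get_state, send_lead",
		"city=Berlin; service=consulting",
	)
	fmt.Println("cacheable bytes:", len(prefix))
	fmt.Println(suffix)
}
```

Keeping the volatile state out of the prefix is what makes the caching pay off: providers cache on exact prefix match, so a single changed byte near the top would invalidate the whole cached span every turn.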
Pricing impact
On the new tool-calling architecture, GLM-5.1 (production, ~$0.059/conv) costs about half of cached Claude Sonnet (~$0.118/conv) at the same or better quality. To be clear: this is a model comparison on the new architecture, not a measurement of "new vs. old 3-stage bot" — the old system used a different model and different prompt layout, so a direct $/conv comparison wouldn't be apples-to-apples. See the pricing investigation for the full comparison.