aichat-go

⚙

Tool-calling agent

The model owns the conversation flow via four tools: hybrid_search, set_state, get_state, send_lead. No hand-tuned thresholds, no stage transitions.

🔍

Hybrid Search

Dense embeddings + BM25 sparse vectors with Qdrant DBSF fusion. The agent model is the reranker — raw fused results go straight into the tool-call response.

⚡

Multi-backend ToolCaller

Anthropic Messages API or OpenAI tools spec (covers OpenRouter routing to GLM, DeepSeek, Kimi, etc.). Multi-key pool with round-robin rotation and prompt caching where supported.

🕑

Smart Follow-Ups

4-step timer sequence (5m / 15m / 40m / 24h) with LLM-driven client classification. Automatically detects cold, hot, and finished conversations.

🏠

Multi-Tenant

Full project isolation with composite keys. Each tenant gets its own bot, collections, prompts, and lead group while sharing infrastructure.

🚀

Incremental Crawler

4-phase pipeline (fetch / chunk / embed / upload) with per-URL resume. Supports HTML, PDF, DOCX, XLSX, TXT. Semantic chunking with LLM topic labels.

Core Concepts

Project: A tenant — one business with its own bot, knowledge base, prompts, and CRM target. All data paths use (chat_id, project_id) composite keys.

Stage: Either StageActive (in-progress conversation) or StageFinal (lead dispatched, conversation locked). Derived from the Finished field — no separate stage field to get out of sync.
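
A minimal sketch of what a derived stage could look like in Go. Only the names StageActive, StageFinal, Finished, and CurrentStage() come from this page; the struct layout and method body are assumptions made for illustration.

```go
package main

import "fmt"

// Stage is the derived conversation stage.
type Stage int

const (
	StageActive Stage = iota // conversation in progress
	StageFinal               // lead dispatched, conversation locked
)

// ChatState holds only the fields this sketch needs (assumed layout).
type ChatState struct {
	Finished bool
	LeadSent bool
}

// CurrentStage computes the stage from Finished instead of reading a stored field.
func (s ChatState) CurrentStage() Stage {
	if s.Finished {
		return StageFinal
	}
	return StageActive
}

func main() {
	fmt.Println(ChatState{}.CurrentStage() == StageActive)
	fmt.Println(ChatState{Finished: true}.CurrentStage() == StageFinal)
}
```

Because the stage is computed on every read, there is no second field that could disagree with Finished.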

Knowledge Base: Per-project Qdrant collections containing chunked website content with both dense (embedding) and sparse (BM25) vectors.

Determined URL: A service page URL the agent commits to via set_state(determined_url=...). Used as a filter for follow-up hybrid_search calls.

Lead: The output: contact info + conversation summary + chat history + files, sent to the CRM lead group.

Timer: Automated follow-up scheduler. When a user goes silent the timer fires the agent's tool-calling loop with a synthetic [TIMER PING] turn; the agent decides whether to send a nudge or dispatch the lead.

ToolCaller: The LLM client used by the agent. Multi-turn tool-using conversations. Two backends: Anthropic Messages API native, and OpenAI tools spec (covers OpenRouter-routed GLM, DeepSeek, Kimi, etc.).

Architecture Overview

End-to-end data flow from Telegram webhook to CRM lead dispatch. The agent runs a tool-calling loop — the model itself decides when to search, what to commit to, and when to dispatch the lead.

Telegram (webhook POST) → normalize → Gateway (TelegramUpdate → Message) → Agent (tool-calling loop)

Agent → tool: hybrid_search → Hybrid Search (Qdrant + BM25) → Qdrant (dense + sparse), PostgreSQL (BM25 index)
Agent → reason / reply → ToolCaller (Anthropic / OpenAI tools) → OpenRouter (GLM / DeepSeek / Kimi / ...)
Agent → tool: send_lead → CRM Output (lead group) → Telegram (group messages)

Key design decisions: Stage is derived (not stored) from the Finished field. The agent owns flow via four tools (hybrid_search, set_state, get_state, send_lead), with no hand-tuned thresholds. Metrics are applied via decoration (zero Prometheus imports in core packages). All data paths use composite keys (chat_id, project_id) for tenant isolation.
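
The "metrics via decoration" decision can be illustrated with a small wrapper. This is a sketch under assumptions: the LLMClient and Recorder interfaces, echoClient, and countRecorder are hypothetical stand-ins, and only the name InstrumentedLLMClient appears in the project tree.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// LLMClient is a hypothetical one-method view of a core-package client.
type LLMClient interface {
	Complete(prompt string) (string, error)
}

// Recorder is all the wrapper knows about metrics. A Prometheus-backed
// implementation would live in the metrics package, not in core code.
type Recorder interface {
	Observe(op string, elapsed time.Duration, err error)
}

// InstrumentedLLMClient decorates any LLMClient with timing and error reporting.
type InstrumentedLLMClient struct {
	next LLMClient
	rec  Recorder
}

func (c InstrumentedLLMClient) Complete(prompt string) (string, error) {
	start := time.Now()
	out, err := c.next.Complete(prompt)
	c.rec.Observe("complete", time.Since(start), err)
	return out, err
}

// echoClient and countRecorder are test doubles for the demo.
type echoClient struct{}

func (echoClient) Complete(prompt string) (string, error) {
	if prompt == "" {
		return "", errors.New("empty prompt")
	}
	return "echo: " + prompt, nil
}

type countRecorder struct{ calls, failures int }

func (r *countRecorder) Observe(_ string, _ time.Duration, err error) {
	r.calls++
	if err != nil {
		r.failures++
	}
}

func main() {
	rec := &countRecorder{}
	var client LLMClient = InstrumentedLLMClient{next: echoClient{}, rec: rec}
	reply, _ := client.Complete("hello")
	_, err := client.Complete("")
	fmt.Println(reply, err, rec.calls, rec.failures)
}
```

The core type only sees the LLMClient interface, so the wrapper can be added or removed at wiring time without touching core packages.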

Component details

Component	Location	Purpose
Gateway	`internal/gateway/`	Normalizes TelegramUpdate into `domain.Message` (text, caption, files, sender)
Agent	`internal/agent/`	Tool-calling loop: rebuilds the message thread from `chat_history.blocks`, dispatches tool calls, returns the final text reply
ToolCaller	`internal/llm/toolcaller*.go`	Multi-backend tool-calling client: Anthropic Messages API + OpenAI tools spec (covers OpenRouter routing). Multi-key pool with rate-limit fallback.
Provider (extract / embedding)	`internal/llm/openai.go`, `claude.go`	Single-shot completions for crawler chunk topics, timer status classification, and embeddings. Routed via the `extract` / `embedding` tiers.
Hybrid Search	`internal/search/`	Dense embeddings + BM25 sparse vectors with Qdrant DBSF fusion. The agent model is the reranker — raw fused results go straight back as the `hybrid_search` tool response.
CRM	`internal/crm/`	Dispatches leads to Telegram group (summary + history + media). Two-phase persist guarantees idempotency.
Timer	`internal/timer/`	Automated follow-ups. Fires the agent's tool-calling loop with a synthetic `[TIMER PING]` turn; the agent decides what to do.
Crawler	`internal/crawler/`	4-phase pipeline: fetch, chunk, embed, upload to Qdrant
Metrics	`internal/metrics/`	Prometheus wrappers (zero-coupled with core packages)
Admin	`internal/admin/`	Admin group command handler for cross-project management
i18n	`internal/i18n/`	Localization (ru/en) with compile-time embedded locale files

Request lifecycle (webhook to response)

Every incoming Telegram message follows this path:

Webhook reception — POST /webhook/{tg_api_key} received on the public HTTP server (:8080). The API key in the URL routes to the correct project bundle.
Gateway normalization — TelegramUpdate is converted to domain.Message with text, caption, files, and sender info extracted.
State loading — Chat state loaded from PostgreSQL using composite key (chat_id, project_id).
Command check — /start resets state and returns greeting. /debug toggles debug mode.
File collection — Any file attachments are accumulated into state. Files without text get an immediate acknowledgement without LLM calls.
Stage dispatch — Based on the derived stage (CurrentStage()), the message goes either to the canned post-finish acknowledgement (StageFinal) or into the tool-calling loop (StageActive).
State persist — Updated state written back to PostgreSQL.
Timer reset — Follow-up timer reset to step 0 for this chat.

Project structure tree

cmd/
  bot/         # Main bot binary
    main.go     # Wiring: DB, tiered LLM, per-project bundles, HTTP server
    config.go   # JSON config loading with env var fallback
  crawler/     # Knowledge base indexer CLI
    main.go     # Subcommands: init-project, seed, add-urls, register
  cli/         # Local readline chat interface (testing)
    main.go     # Same agent pipeline, file-based CRM output

internal/
  domain/      types.go        # Stage, Message, ChatState, Lead, Block
  agent/       agent.go        # HandleMessage entrypoint
                toolagent.go    # Tool-calling loop, tool dispatch, system prompt
                commands.go     # /start, /debug
                lead_commands.go # Lead group: /prompt, /stats
                trace.go        # Per-turn trace records for debugging
  llm/         toolcaller.go               # ToolCaller interface, Block/Message types
                toolcaller_anthropic.go     # Anthropic Messages API backend
                toolcaller_openai.go        # OpenAI tools spec backend (covers OpenRouter)
                toolcaller_chain.go         # Multi-backend fallback chain
                tiered.go       # TieredClient (extract / embedding tiers)
                openai.go       # OpenAI-compatible provider (extract, embeddings)
                claude.go       # Claude provider (extract)
                embedding_adapter.go # Model-specific text formatting
  search/      hybrid.go       # Qdrant DBSF fusion (dense + BM25)
                bm25.go         # BM25Encoder, Tokenize
  db/          postgres.go     # PostgreSQL implementation
  gateway/     telegram.go     # TelegramUpdate normalization
  telegram/    client.go       # HTTP client, retry on 429
  crm/         telegram.go     # TelegramCRM: summary + history + media
  timer/       timer.go        # Scheduler, sequences, LLM classification
  metrics/     metrics.go      # Metric definitions
                llm.go          # InstrumentedLLMClient wrapper
  crawler/     pipeline.go     # Resumable 4-phase pipeline
                chunker.go      # Semantic + simple chunking
                indexer.go      # Qdrant upsert
  i18n/        i18n.go         # T(), Tf(), embedded locales
                locales/    ru.json, en.json

migrations/  001-013 (latest: 012_chat_history_blocks.sql, 013_drop_3stage_remnants.sql)
nix/         module.nix, example-configuration.nix
flake.nix    Makefile

Extension points

Extension	How	Location
New LLM Provider	Implement the Provider interface	`internal/llm/`
New CRM Backend	Implement CRM interface (multi-CRM dispatcher)	`internal/crm/`
New Gateway	Normalize to domain.Message + webhook route	`internal/gateway/`
Custom Search	Implement KnowledgeBase interface	`internal/search/`

Conversation Flow

The agent runs a single tool-calling loop per user turn. The model decides when to search, what to commit to via set_state, and when to dispatch the lead via send_lead.

HandleMessage
  ├─ if state.Finished → canned acknowledgement (post-finish lock)
  └─ else runToolLoop(state, history, userMsg)
        msgs = rebuildMessages(history)         // replays prior tool_use / tool_result blocks
        msgs += user(userMsg)
        loop (cap 8 iterations):
          response = toolcaller.Call(systemPrefix, systemSuffix, msgs, tools)
          msgs += assistant(response.blocks)
          if no tool_use blocks:
              return last text block as the user-facing reply
          for each tool_use:
              result = dispatch(tool_use)        // hybrid_search / set_state / get_state / send_lead
              msgs += user(tool_result(id, result))
        // overflow → canned safety reply, log loudly
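
The pseudocode above can be sketched as runnable Go. Everything here is a simplified assumption: the Block, Message, and ToolCaller shapes are reduced stand-ins for the real types, the scripted caller replaces a real model, and tool results from one response are batched into a single user message.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a simplified content block (assumed shape).
type Block struct {
	Type   string // "text", "tool_use", or "tool_result"
	Text   string // reply text or tool result payload
	ToolID string // pairs a tool_result with its tool_use
	Tool   string // tool name for tool_use blocks
}

type Message struct {
	Role   string
	Blocks []Block
}

// ToolCaller is a hypothetical one-method stand-in for the real interface.
type ToolCaller interface {
	Call(msgs []Message) ([]Block, error)
}

const maxIterations = 8 // loop cap from the pseudocode

var errOverflow = errors.New("tool loop overflow")

func runToolLoop(tc ToolCaller, msgs []Message, dispatch func(Block) string) (string, error) {
	for i := 0; i < maxIterations; i++ {
		blocks, err := tc.Call(msgs)
		if err != nil {
			return "", err
		}
		msgs = append(msgs, Message{Role: "assistant", Blocks: blocks})

		var reply string
		var results []Block
		for _, b := range blocks {
			switch b.Type {
			case "text":
				reply = b.Text // keep the last text block
			case "tool_use":
				results = append(results, Block{Type: "tool_result", ToolID: b.ToolID, Text: dispatch(b)})
			}
		}
		if len(results) == 0 {
			return reply, nil // no tool calls: this is the user-facing reply
		}
		msgs = append(msgs, Message{Role: "user", Blocks: results})
	}
	return "", errOverflow // caller sends the canned safety reply
}

// scripted replays canned responses and repeats the last one forever.
type scripted struct {
	turns [][]Block
	next  int
}

func (s *scripted) Call(_ []Message) ([]Block, error) {
	turn := s.turns[s.next]
	if s.next < len(s.turns)-1 {
		s.next++
	}
	return turn, nil
}

func main() {
	model := &scripted{turns: [][]Block{
		{{Type: "tool_use", ToolID: "t1", Tool: "hybrid_search"}},
		{{Type: "text", Text: "We can appraise your flat next week."}},
	}}
	reply, err := runToolLoop(model, nil, func(b Block) string { return "results for " + b.Tool })
	fmt.Println(reply, err)
}
```

A model that never stops calling tools hits the iteration cap and surfaces errOverflow, which is where the canned safety reply would be sent.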

Tools

Tool	Purpose	Args
`hybrid_search`	Search the project KB. Returns top-k DBSF-fused (dense + BM25) chunks. The model is the reranker.	`query`, `top_k` (capped at 3 server-side), optional `url_filter`
`set_state`	Persist anything the model wants to remember across turns. Three flat string args (no nested objects, so weak toolcaller models can’t malform the JSON).	`notes` (free-form scratchpad — full rewrite each call), `determined_url`, `client_status` (hot/cold)
`get_state`	Re-read current `ChatState` mid-turn. Belt-and-braces — the same data is in the system suffix.	none
`send_lead`	Dispatch the conversation as a lead to CRM. Idempotent — if `LeadSent=true` already, returns ok without re-sending.	`summary`

System prompt structure

The system prompt is split into two cacheable parts:

Prefix (stable, cached): the engine preamble (agent.toolcaller_preamble — explains the tools and how to map natural-language project instructions to tool calls), then the project-editable prompt1, then the tools schema.
Suffix (volatile, breaks cache): the per-turn state snapshot — DeterminedURL, contact, extras, files, optional PriceURL hint, ClientStatus.

Anthropic backends mark the prefix with cache_control: ephemeral (90% input discount on cache hits). OpenAI / DeepSeek / GLM (via OpenRouter) auto-cache long stable prefixes.

Two-phase lead persist details

Finished=true is written to the database before the CRM send, and LeadSent=true is written after. This prevents duplicate leads from concurrent webhook requests or timer fires reading stale state. If the bot crashes between the two persists, the timer system retries the CRM send on next fire or on restart (via chatStore.ListFinishedUnsent).

Finished=true (persist to DB) → CRM send → LeadSent=true (persist to DB)

A crash between the two persists leaves Finished=true, LeadSent=false; the timer retries the CRM send on the next fire or on restart.
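
A minimal sketch of the two-phase persist, assuming hypothetical Store and CRM interfaces and in-memory test doubles. It only illustrates the ordering of the two writes around the send, not the real implementation.

```go
package main

import (
	"errors"
	"fmt"
)

type ChatState struct {
	Finished bool
	LeadSent bool
}

// Store and CRM are hypothetical minimal interfaces for this sketch.
type Store interface{ Save(ChatState) error }
type CRM interface{ SendLead(summary string) error }

// dispatchLead persists Finished=true before the CRM send and LeadSent=true after it.
func dispatchLead(st *ChatState, db Store, crm CRM, summary string) error {
	if st.LeadSent {
		return nil // idempotent: the lead was already delivered
	}
	st.Finished = true
	if err := db.Save(*st); err != nil {
		return err
	}
	if err := crm.SendLead(summary); err != nil {
		return err // state stays Finished=true, LeadSent=false for a later retry
	}
	st.LeadSent = true
	return db.Save(*st)
}

type memStore struct{ saved []ChatState }

func (m *memStore) Save(s ChatState) error {
	m.saved = append(m.saved, s)
	return nil
}

// flakyCRM fails a configurable number of times before succeeding.
type flakyCRM struct{ failures, sent int }

func (c *flakyCRM) SendLead(string) error {
	if c.failures > 0 {
		c.failures--
		return errors.New("crm unavailable")
	}
	c.sent++
	return nil
}

func main() {
	st, db, crm := &ChatState{}, &memStore{}, &flakyCRM{failures: 1}
	fmt.Println(dispatchLead(st, db, crm, "summary"), *st) // first attempt fails
	fmt.Println(dispatchLead(st, db, crm, "summary"), *st) // retry succeeds
	fmt.Println(crm.sent)
}
```

The intermediate state (Finished set, LeadSent unset) is exactly what a retry path can query for.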

Tool-turn persistence (chat_history.blocks)

The full assistant / tool-result block sequence for each turn is stored in chat_history.blocks (JSONB). On the next turn, rebuildMessages replays these blocks back to the model so it sees its own prior tool_use and tool_result exchanges. Falls back to plain question / reply for legacy rows where blocks is empty.

File upload handling

When a user sends a file without text, the bot immediately acknowledges it ("Accepted! Let me know when you're done.") without calling the LLM. Files are accumulated on ChatState.Files and attached to the lead when send_lead fires.

Media files in the lead are grouped by type for clean presentation:

Photos + Videos — sent as a media album (Telegram groups them visually)
Documents — sent as a document album
Voice / Video notes — sent individually (Telegram does not support albums for these)
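
The grouping rule above can be sketched as a small partition function. The File type and the kind labels ("photo", "video_note", and so on) are assumptions for illustration, as is routing unknown kinds to individual sends.

```go
package main

import "fmt"

// File is a hypothetical attachment record; Kind labels are assumed.
type File struct {
	Kind string // "photo", "video", "document", "voice", "video_note"
	ID   string
}

// LeadMedia holds the three delivery groups described above.
type LeadMedia struct {
	MediaAlbum    []File // photos + videos, sent as one album
	DocumentAlbum []File // documents, sent as one album
	Individual    []File // voice and video notes, sent one by one
}

func groupMedia(files []File) LeadMedia {
	var out LeadMedia
	for _, f := range files {
		switch f.Kind {
		case "photo", "video":
			out.MediaAlbum = append(out.MediaAlbum, f)
		case "document":
			out.DocumentAlbum = append(out.DocumentAlbum, f)
		default:
			// Voice, video notes, and anything unrecognized go out individually.
			out.Individual = append(out.Individual, f)
		}
	}
	return out
}

func main() {
	groups := groupMedia([]File{
		{Kind: "photo", ID: "p1"},
		{Kind: "document", ID: "d1"},
		{Kind: "voice", ID: "v1"},
		{Kind: "video", ID: "p2"},
	})
	fmt.Println(len(groups.MediaAlbum), len(groups.DocumentAlbum), len(groups.Individual))
}
```

Telegram's sendMediaGroup accepts 2 to 10 items per call, so a real sender would also need to split longer albums and handle single-item groups.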

Contact sharing button

On the first turn that mentions contact collection, the bot sends a Telegram request_contact keyboard button alongside its reply. This lets users share their phone number with one tap.

Sent only once per conversation
Shared contact lands in ChatState.SharedContact (not in history) for privacy
Included in the dispatched lead

Timer-driven dispatch

The timer scheduler runs the same tool-calling loop with a synthetic [TIMER PING] user turn. The agent decides whether to send a follow-up message or call send_lead itself. At the last interval (24h) the timer dispatches the lead unconditionally for any unresolved chat — regardless of client_status — so no lead is lost. Hot clients with contact in hand should be dispatched earlier by the agent itself (the prompt encourages send_lead after 1–2 pings when there's enough info).

When the safety-net dispatch fires and the agent never called send_lead itself, state.LeadSummary is empty. The timer then asks the agent to summarize the dialog via Agent.SummarizeForLead — a one-shot Complete call (no tools) on the summary tier (cheap mid-tier, e.g. gemma-4-26b-a4b-it). It renders the dialog as a Client/Bot transcript plus the agent’s Notes scratchpad and produces a 1–2 paragraph summary for the human reviewer, so the lead post never arrives blank.

Notes scratchpad (set_state)

set_state(notes="...") is the agent’s free-form text scratchpad. It replaced the earlier structured contact{method,value} + extras{...} schema. The model writes plain text, full rewrite each call — whatever it omits is lost. Conventions live in the tool description, not the schema:

name: Виктор
contact: phone +79130001234
service: оценка квартиры
city: Барнаул
date: на следующей неделе

The scratchpad is rendered into the system suffix on every turn so the model sees its own state, and it’s the input to Agent.SummarizeForLead at 24h dispatch. The flat-string shape is deliberate: weak toolcaller models (e.g. glm-5.1) drop closing braces on nested-object set_state args, which used to silently lose phone numbers and trip the tool-loop overflow guard.

Lead-as-living-document (post-send refresh)

After send_lead fires, the chat doesn’t go silent. The customer can keep typing for up to five more messages; each one is appended to chat_history and the existing CRM lead artifact is refreshed in place — for Telegram, editMessageText on the summary post + editMessageMedia on the chat-history .txt attachment. The summary text stays frozen, but a history updated: <ts> footer appears so the operator notices.

CRM artifact references live in a dedicated lead_dispatches table (chat_id, project_id, provider) — one row per CRM. Multi-CRM by construction: a chat dispatched to both Telegram and Bitrix gets two rows, and the post-send refresh fans UpdateLead across all of them.

Per-chat batching queue

Inbound messages don’t hit HandleMessage directly. Each gateway (Telegram webhook, CLI loop) calls agent.Enqueue, which persists the user message as an orphan row in chat_history (Question filled, Reply empty) and pushes onto a per-chat cheggaaa/mb queue. One worker goroutine per (chat_id, project_id) drains the queue serially.

What this gives:

No double-dispatch race: single worker per chat — the structural fix for the goroutine race that previously stomped state and produced two CRM posts for one customer.
Burst coalescing: messages that arrive while the worker is mid-LLM accumulate in the queue and get processed as the next batch. The model sees one combined user turn, not three sequential ones.
Crash recovery: orphan rows that survive a process kill get picked up by the next batch — no message lost.

Replies flow back through a per-project agent.Replier callback (cmd/bot wires it to tgClient.SendMessage; cmd/cli prints to stdout). Workers exit after five minutes idle and respawn on the next message.
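
Burst coalescing can be sketched with a plain buffered channel. This is an illustration only: the real queue is cheggaaa/mb, and drainBatch is a hypothetical helper that blocks for the first message and then takes whatever else is already waiting.

```go
package main

import (
	"fmt"
	"strings"
)

// drainBatch waits for one message, then collects everything already queued
// without blocking again. It returns nil if the queue is closed and empty.
func drainBatch(queue <-chan string) []string {
	first, ok := <-queue
	if !ok {
		return nil
	}
	batch := []string{first}
	for {
		select {
		case msg, ok := <-queue:
			if !ok {
				return batch
			}
			batch = append(batch, msg)
		default:
			return batch // nothing else waiting: process what we have
		}
	}
}

func main() {
	queue := make(chan string, 8)
	queue <- "hi"
	queue <- "I need a flat appraisal"
	queue <- "in Barnaul"

	batch := drainBatch(queue)
	// The worker would hand the model one combined user turn.
	fmt.Println(len(batch))
	fmt.Println(strings.Join(batch, "\n"))
}
```

A single worker goroutine per (chat_id, project_id) calling such a helper in a loop processes each chat serially while still merging bursts.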

Search Pipeline

The agent calls hybrid_search as a tool. The tool returns Qdrant DBSF-fused dense + BM25 results raw — the agent model is the reranker.

1. Tool call: `hybrid_search`

The agent picks the query string itself (no separate query-rewrite LLM call). Optionally restricts results to a single URL via url_filter — typically passed once determined_url is set, or with the price_url when the customer asks about pricing.

2. Hybrid Search

Qdrant prefetch with both dense embeddings (cosine similarity, 1024 dims via Matryoshka truncation) and BM25 sparse vectors. Qdrant's built-in DBSF (Distribution-Based Score Fusion) merges results, preserving absolute relevance signal unlike rank-based RRF. Each prefetch retrieves limit * 2 candidates for the fusion algorithm.

3. Server-side cap

Top 3 results are returned to the model regardless of the requested top_k. The model judges which (if any) of those three are relevant for the current turn. No reranker call, no neighbor expansion — the loop is small and fast.

Visual Search Flow

Agent calls hybrid_search(query, url_filter?)
→ Dense embeddings (cosine, 1024 dims) + BM25 sparse (keyword matching)
→ DBSF fusion (Distribution-Based Score Fusion; preserves absolute relevance)
→ Top 3 results returned to the agent
→ Agent reasons over results and decides what to do next

DBSF vs RRF: DBSF normalizes each scorer's output based on its actual distribution, so a high cosine-similarity result retains its absolute relevance signal. RRF (Reciprocal Rank Fusion) is purely rank-based and loses this information.
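
The difference can be made concrete with a toy comparison. This is a sketch, not Qdrant's code: it assumes the normalization Qdrant's documentation describes for DBSF (mean ± 3 standard deviations as the limits), uses the population standard deviation, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// dbsfNormalize maps one scorer's results into [0, 1], using mean ± 3σ of that
// scorer's own score distribution as the limits.
func dbsfNormalize(scores []float64) []float64 {
	if len(scores) == 0 {
		return nil
	}
	n := float64(len(scores))
	var mean float64
	for _, s := range scores {
		mean += s
	}
	mean /= n
	var variance float64
	for _, s := range scores {
		variance += (s - mean) * (s - mean)
	}
	std := math.Sqrt(variance / n)
	lo, hi := mean-3*std, mean+3*std

	out := make([]float64, len(scores))
	for i, s := range scores {
		if hi == lo {
			out[i] = 0.5 // all scores identical: no spread to normalize against
			continue
		}
		out[i] = math.Min(1, math.Max(0, (s-lo)/(hi-lo)))
	}
	return out
}

// rrf is the Reciprocal Rank Fusion contribution for a 1-based rank.
func rrf(rank int, k float64) float64 {
	return 1 / (k + float64(rank))
}

func main() {
	clearWinner := dbsfNormalize([]float64{0.95, 0.40, 0.38})
	closeCall := dbsfNormalize([]float64{0.41, 0.40, 0.38})
	// DBSF: the top hit scores higher when it stands out from the rest.
	fmt.Println(clearWinner[0] > closeCall[0])
	// RRF: the top hit gets the same contribution in both lists.
	fmt.Println(rrf(1, 60))
}
```

In fusion, each point's normalized scores from the dense and sparse prefetches are summed; with RRF only the ranks would enter the sum.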

Top-k cap: 3 · Embedding dims: 1024 · Fusion: DBSF · Min score: 0.5

Qdrant collections per project

Collection	Vectors	Purpose
`{project}_content_hybrid`	Dense + Sparse	Chunked content with topic-prepended embeddings. The agent's `hybrid_search` tool reads this.
`{project}_source`	Dense only	Full page content. Retained for crawler bookkeeping; not consulted by the runtime agent path.

BM25 implementation details

Standard BM25 formula: IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl/avgDL))
Parameters: k1=1.5, b=0.75
Tokenizer: Unicode-aware (Cyrillic + Latin), lowercased, split on non-letter/digit
Persistence: vocabulary, IDF, avgDL, numDocs stored as JSONB in PostgreSQL (bm25_indexes table)
In-memory cache: loaded from DB on first use per project, invalidated on crawler updates
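
The tokenizer rule and the per-term formula above can be sketched as follows. The function names are hypothetical, and IDF is passed in as a precomputed value because its exact formula is not given here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

const (
	bm25K1 = 1.5
	bm25B  = 0.75
)

// tokenize lowercases the text and splits on anything that is not a letter or digit.
// unicode.IsLetter covers both Cyrillic and Latin scripts.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// termWeight computes IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl/avgDL)).
func termWeight(idf, tf, dl, avgDL float64) float64 {
	return idf * (tf * (bm25K1 + 1)) / (tf + bm25K1*(1-bm25B+bm25B*dl/avgDL))
}

func main() {
	fmt.Println(tokenize("Оценка квартиры, Barnaul-2024!"))
	// Same term frequency, but the longer document is penalized.
	fmt.Println(termWeight(2.0, 1, 50, 100) > termWeight(2.0, 1, 200, 100))
}
```

For a document of average length (dl = avgDL) with tf = 1, the fraction reduces to 1 and the weight equals the IDF.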

LLM Calls Map

The system has two LLM clients: the agent's ToolCaller (multi-turn tool-calling for the customer-facing loop) and the legacy Provider (single-shot completions for crawler topics, timer classification, embeddings).

Clients

Client	Config	Purpose	Backends
ToolCaller	`models.toolcaller`	The agent's tool-calling loop. Multi-turn conversation with tool calls + tool results. The model owns conversation flow.	Anthropic Messages API native; OpenAI tools spec (covers OpenRouter routing to GLM, DeepSeek, Kimi, etc.)
Provider (extract)	`models.extract`	Single-shot completion for crawler chunk topics, timer status classification, follow-up text.	OpenAI-compatible; Claude
Provider (embedding)	`models.embedding`	Vector embeddings for hybrid search.	OpenAI-compatible (OpenRouter, native)

Where each client fires

Where	Client	Purpose
`internal/agent/toolagent.go` · `runToolLoop`	ToolCaller	Main agent loop — one call per loop iteration; the model returns either a final reply or one or more tool_use blocks.
`internal/agent/toolagent.go` · `GeneratePing`	ToolCaller	Same loop, fired by the timer with a synthetic `[TIMER PING]` turn.
`internal/timer/timer.go` · `classifyStatus`	Provider (extract)	Classify silent client as `COLD` / `HOT`.
`internal/timer/timer.go` · `generateFollowUp`	Provider (extract)	Compose follow-up message text.
`internal/crawler/chunker.go` · `chunkSemantic`	Provider (extract)	Per-page boundary detection during indexing.
`internal/llm/embedding_adapter.go`	Provider (embedding)	Vector embeddings for chunks (indexing) and queries (search).

System prompt structure (cacheable)

Prefix (stable, cached): engine preamble (agent.toolcaller_preamble) + project prompt1 + tools schema. Anthropic backends mark this with cache_control: ephemeral for a 90% input discount on cache hits. OpenAI / DeepSeek / GLM auto-cache long stable prefixes (~50% discount when cached_tokens > 0).
Suffix (volatile, breaks cache): per-turn state snapshot — DeterminedURL, contact, extras, files, optional PriceURL hint, ClientStatus.

Provider routing & retry

Each ToolCaller / Provider chain has independent retry and fallback:

Attempt 1 → rate-limit/transient? → backoff (1s + jitter) → retry
Attempt 2 → rate-limit/transient? → backoff (2s + jitter) → retry
Attempt 3 → rate-limit/transient? → fall through
Non-transient error? → fall through immediately
All exhausted → AllProvidersExhaustedError

Backoff formula: 2^attempt seconds (attempt counted from 0, matching the 1s and 2s steps above) + 10% jitter, capped at 60s (or at the provider's Retry-After value, if that is shorter). The key pool rotates API keys round-robin.
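
One way to read the backoff formula, as a hedged sketch: attempt is assumed to be counted from 0 (so the first two waits are 1s and 2s, as in the list above), jitter is taken as up to 10% of the base delay, and the cap is the smaller of 60s and Retry-After. The function name and signature are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoff returns the wait before the next retry.
// jitter is a random fraction in [0, 1), passed in so the function stays deterministic.
// retryAfter is the provider's Retry-After value, or 0 if the header was absent.
func backoff(attempt int, retryAfter time.Duration, jitter float64) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	if attempt > 6 {
		attempt = 6 // 2^6 s already exceeds the 60s cap
	}
	base := time.Duration(1<<uint(attempt)) * time.Second
	d := base + time.Duration(float64(base)*0.10*jitter)

	limit := 60 * time.Second
	if retryAfter > 0 && retryAfter < limit {
		limit = retryAfter
	}
	if d > limit {
		d = limit
	}
	return d
}

func main() {
	for attempt := 0; attempt < 3; attempt++ {
		fmt.Println(attempt, backoff(attempt, 0, rand.Float64()))
	}
	fmt.Println(backoff(5, 5*time.Second, 0)) // Retry-After shorter than the computed delay
}
```

Passing the jitter in as a parameter keeps the function easy to test; production code would draw it from a random source on each call.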

Transient classification: rate limits (429) are transient. For OpenAI-tools backends we also treat embedded errors and empty choices arrays as transient (return RateLimitError) — observed when DeepSeek/GLM upstream returns a malformed success.

Embedding Adapters

AdaptedProvider wraps the embedding provider and formats text based on mode (document vs query). Adapter selected automatically by model name. E5-instruct / Qwen3-Embedding models get Instruct: ...\nQuery: ... prefix for queries. Token limit auto-split: when a chunk exceeds the model's token limit, text is split by sentence, each half embedded recursively, and vectors averaged + L2-normalized.
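
The final step of the auto-split (average the part vectors, then L2-normalize) can be sketched as below. meanL2 is a hypothetical helper; it assumes all vectors have the same length and leaves the sentence splitting and the embedding calls out.

```go
package main

import (
	"fmt"
	"math"
)

// meanL2 averages the given vectors component-wise and scales the result to unit length.
// All vectors are assumed to have the same dimension.
func meanL2(vectors ...[]float64) []float64 {
	if len(vectors) == 0 {
		return nil
	}
	out := make([]float64, len(vectors[0]))
	for _, v := range vectors {
		for i := range out {
			out[i] += v[i]
		}
	}
	var norm float64
	for i := range out {
		out[i] /= float64(len(vectors))
		norm += out[i] * out[i]
	}
	norm = math.Sqrt(norm)
	if norm == 0 {
		return out // zero vector: nothing to normalize
	}
	for i := range out {
		out[i] /= norm
	}
	return out
}

func main() {
	firstHalf := []float64{1, 0}
	secondHalf := []float64{0, 1}
	fmt.Println(meanL2(firstHalf, secondHalf))
}
```

Re-normalizing matters because the average of two unit vectors is generally shorter than unit length, which would distort cosine-based comparisons that assume normalized inputs.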

Crawler Pipeline

4-phase incremental pipeline that fetches, chunks, embeds, and uploads content to Qdrant. Intermediate results are saved per-URL for resume.

1. Fetch Pages

HTTP GET with parallel workers (default: 3). Supports HTML, PDF, DOCX, XLSX, TXT via auto-detection. Each page saved to 1_pages/{urlhash}.json. Skip logic: pages with existing files are skipped on re-run.

2. Semantic Chunk

LLM boundary detection: outputs TOPIC: X | STARTS: Y anchor phrases per page (extract tier, temp 0.1). Original text is sliced at detected boundaries. Each chunk gets a topic label. Saved to 2_chunks/{urlhash}.json. Falls back to paragraph-based splitting with --no-llm-chunk.

3. Batch Embed

32 chunks per API call. Topic prepended before embedding: "Topic: X\n\n{text}". BM25 sparse vectors rebuilt from all chunks (IDF requires full corpus). Saved to 3_embeds/{urlhash}.json + bm25.json. Token limit auto-split handles oversized chunks.

4. Upload to Qdrant

Qdrant upsert in batches of 50 points. Source pages uploaded with real embeddings to {name}_source. Chunked content to {name}_content_hybrid. BM25 index persisted to PostgreSQL. Pipeline data archived to project-archives/{name}.tar.gz.

Incremental behavior: Each step checks for existing per-URL files before processing. Re-running the pipeline only processes missing content. Delete a step's folder to force re-processing from that point. File key: sha256(url)[:16].
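
A sketch of the file key, under one assumption worth stating loudly: sha256(url)[:16] is read here as the first 16 hexadecimal characters of the digest (it could also mean the first 16 bytes). urlKey and stepFile are hypothetical helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// urlKey returns the first 16 hex characters of the URL's SHA-256 digest.
func urlKey(url string) string {
	sum := sha256.Sum256([]byte(url))
	return hex.EncodeToString(sum[:])[:16]
}

// stepFile builds the per-URL file path for one pipeline step, e.g. "1_pages".
func stepFile(projectDir, step, url string) string {
	return path.Join(projectDir, step, urlKey(url)+".json")
}

func main() {
	url := "https://example.com/pricing"
	fmt.Println(urlKey(url))
	fmt.Println(stepFile("new-projects/myproject", "1_pages", url))
}
```

A resume check then amounts to testing whether the file returned by stepFile already exists before doing the work for that URL.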

Storage layout

new-projects/{project-name}/
  1_pages/{urlhash}.json    # single Page per URL
  2_chunks/{urlhash}.json   # []Chunk per URL
  3_embeds/{urlhash}.json   # []EmbeddedChunk per URL
  3_embeds/bm25.json        # BM25 encoder snapshot

Document type support

Format	Library	Notes
HTML	go-readability v2	Firefox Reader View algorithm. Fallback: DOM walk (main → article → body, stripping nav/header/footer/script)
PDF	ledongthuc/pdf	Text extraction. Scanned PDFs with no text layer are skipped.
DOCX	fumiama/go-docx	Paragraph and table text extraction
XLSX	xuri/excelize	All sheets as tab-separated text
TXT	—	Plain-text body served verbatim (no HTML parsing)

Semantic chunking vs simple chunking

Semantic Chunking (default)

One LLM call per page (extract tier, temp 0.1) detects natural topic boundaries. The LLM outputs anchor phrases in the format TOPIC: X | STARTS: Y. The original text is sliced at the detected boundaries, preserving the exact source text without LLM paraphrasing. Each chunk receives a topic label for embedding.
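
The slicing step can be sketched like this. It assumes one `TOPIC: X | STARTS: Y` pair per output line and that each anchor phrase appears verbatim in the page text; sliceByAnchors and the Chunk type are hypothetical, and text before the first anchor is simply dropped in this version.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type Chunk struct{ Topic, Text string }

// sliceByAnchors cuts the original text at the positions of the anchor phrases,
// so chunk bodies are exact source text rather than model output.
func sliceByAnchors(text, llmOutput string) []Chunk {
	type anchor struct {
		topic string
		pos   int
	}
	var anchors []anchor
	for _, line := range strings.Split(llmOutput, "\n") {
		left, right, ok := strings.Cut(line, "|")
		if !ok {
			continue
		}
		left, right = strings.TrimSpace(left), strings.TrimSpace(right)
		if !strings.HasPrefix(left, "TOPIC:") || !strings.HasPrefix(right, "STARTS:") {
			continue
		}
		topic := strings.TrimSpace(strings.TrimPrefix(left, "TOPIC:"))
		start := strings.TrimSpace(strings.TrimPrefix(right, "STARTS:"))
		if start == "" {
			continue
		}
		pos := strings.Index(text, start)
		if pos < 0 {
			continue // anchor not found verbatim in the source; skip it
		}
		anchors = append(anchors, anchor{topic: topic, pos: pos})
	}
	sort.Slice(anchors, func(i, j int) bool { return anchors[i].pos < anchors[j].pos })

	var chunks []Chunk
	for i, a := range anchors {
		end := len(text)
		if i+1 < len(anchors) {
			end = anchors[i+1].pos
		}
		if body := strings.TrimSpace(text[a.pos:end]); body != "" {
			chunks = append(chunks, Chunk{Topic: a.topic, Text: body})
		}
	}
	return chunks
}

func main() {
	page := "We appraise flats. Fast and legal.\n\nPrices start at 3000. Discounts apply."
	out := "TOPIC: Services | STARTS: We appraise\nTOPIC: Pricing | STARTS: Prices start"
	for _, c := range sliceByAnchors(page, out) {
		fmt.Printf("%s: %q\n", c.Topic, c.Text)
	}
}
```

Because the model only returns short anchors, a malformed or paraphrased line costs one boundary rather than corrupting the chunk text.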

Simple Chunking (`--no-llm-chunk`)

Paragraph and heading-aware splitting. Text is split on double newlines and markdown headings. Short paragraphs are merged up to maxChars (default 1500). Chunks get numbered topic labels ("Part 1", "Part 2", etc.).
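
A minimal sketch of the fallback splitter described above. It is an approximation: simpleChunks is a hypothetical name, a paragraph starting with "#" is treated as a markdown heading that opens a new chunk, and a single paragraph longer than maxChars is kept whole rather than split further.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

type Chunk struct{ Topic, Text string }

// simpleChunks splits on blank lines and merges paragraphs up to maxChars runes.
func simpleChunks(text string, maxChars int) []Chunk {
	var chunks []Chunk
	var parts []string // paragraphs of the chunk being built
	size := 0          // rune count of parts once joined with blank lines

	flush := func() {
		if len(parts) == 0 {
			return
		}
		chunks = append(chunks, Chunk{
			Topic: fmt.Sprintf("Part %d", len(chunks)+1),
			Text:  strings.Join(parts, "\n\n"),
		})
		parts, size = nil, 0
	}

	for _, p := range strings.Split(text, "\n\n") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		n := utf8.RuneCountInString(p)
		// A heading or an overflow closes the current chunk.
		if strings.HasPrefix(p, "#") || (len(parts) > 0 && size+2+n > maxChars) {
			flush()
		}
		if len(parts) > 0 {
			size += 2 // the blank-line separator
		}
		parts = append(parts, p)
		size += n
	}
	flush()
	return chunks
}

func main() {
	doc := "# Services\n\nWe appraise flats.\n\nFast and legal.\n\n# Pricing\n\nFrom 3000."
	for _, c := range simpleChunks(doc, 1500) {
		fmt.Printf("%s: %q\n", c.Topic, c.Text)
	}
}
```

Counting runes instead of bytes keeps the limit meaningful for Cyrillic text, where each letter takes two bytes in UTF-8.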

Topic prepending strategy

At index time, each chunk's text is prepended with its topic label before embedding:

Before: "Our basic plan starts at $99/month with unlimited support."
After:  "Topic: Pricing plans\n\nOur basic plan starts at $99/month with unlimited support."

The topic acts as a semantic anchor — the embedding now captures the chunk's theme, not just its surface content. A query about "costs" will match closer to a chunk anchored with "Pricing plans" even if the chunk text never mentions "costs".

Search queries do not need the topic prefix — the embedding space naturally aligns.

CLI usage examples

# Create a new project and seed immediately
bin/aichat-crawler init-project \
  --name myproject \
  --tg-api-key "123456:ABC-DEF" \
  --tg-lead-group -1001234567890 \
  --language en \
  --prompt1 "You are a sales consultant..." \
  --start-reply "Welcome! How can I help?" \
  --sitemap-url "https://example.com/sitemap.xml"

# Seed a project later (separate from creation)
bin/aichat-crawler seed \
  --project-id "uuid-from-init-output" \
  --project-name myproject \
  --sitemap-url "https://example.com/sitemap.xml"

# Add URLs to an existing project
bin/aichat-crawler add-urls \
  --project-name myproject \
  --urls "https://example.com/new-page1,https://example.com/new-page2"

# Set prompts from files
bin/aichat-crawler set \
  --project-name myproject prompt1 @prompts/prompt1.txt

Timer System

Automated follow-up sequences that fire when a user goes silent. LLM-driven classification decides the action at each step.

Follow-Up Timeline

1. 5 min: light ping
2. 15 min: second check-in
3. 40 min: third check-in
4. 24 hours: last interval; auto-dispatches the lead for any unresolved chat

Per-Fire Logic

Timer fires → run the agent's tool-calling loop with a synthetic [TIMER PING] turn. Three outcomes are possible:

Agent calls send_lead: lead dispatched, conversation locked. No closing text is sent; the conversation is over.
Agent replies with text: the follow-up message is sent to the user and the timer advances to the next interval.
Agent stays silent: no message is sent and the timer advances. At the last interval (24h) the lead is dispatched anyway, regardless of client_status.

Lifecycle details

Reset — Every incoming message resets the timer for that chat, starting the sequence from step 0.
Fire — Fetches fresh chat state and history. Skips if already Finished + LeadSent. Otherwise invokes the agent's tool-calling loop with a synthetic [TIMER PING] turn (Agent.GeneratePing). The agent decides what to do.
Last-interval safety net — At the last interval (24h) the scheduler dispatches the lead unconditionally for every unresolved chat, regardless of client_status. This is the catch-all so no lead is lost — hot clients should usually be dispatched earlier by the agent itself (the prompt nudges send_lead after 1–2 pings when contact is in hand), but if the agent never makes that call we still ship the lead at 24h with whatever was gathered.
Persist — Timer state is written to PostgreSQL with the next trigger time. On restart, Reload() recovers all active timers, calculating remaining delay and firing overdue ones immediately.
Lead retry — If a timer finds Finished=true, LeadSent=false, it retries the CRM send instead of running the agent.
Cancel — When a user sends a new message, the old timer is cancelled (goroutine killed + DB record updated).
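
The interval sequence and the reload arithmetic can be sketched in a few lines. The names below (intervals, isLastStep, remainingDelay, demoNow) are hypothetical; only the four durations and the zero-based step numbering come from this page.

```go
package main

import (
	"fmt"
	"time"
)

// intervals is the 4-step follow-up sequence, indexed by step (starting at 0).
var intervals = []time.Duration{5 * time.Minute, 15 * time.Minute, 40 * time.Minute, 24 * time.Hour}

// isLastStep reports whether step is the 24h safety-net interval.
func isLastStep(step int) bool {
	return step == len(intervals)-1
}

// remainingDelay is what a reload would wait for a persisted trigger time.
// Overdue timers get 0, meaning "fire immediately".
func remainingDelay(triggerAt, now time.Time) time.Duration {
	if d := triggerAt.Sub(now); d > 0 {
		return d
	}
	return 0
}

// demoNow is a fixed clock value so the example output does not depend on wall time.
var demoNow = time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)

func main() {
	pending := demoNow.Add(3 * time.Minute)
	overdue := demoNow.Add(-10 * time.Minute)
	fmt.Println(remainingDelay(pending, demoNow), remainingDelay(overdue, demoNow), isLastStep(3))
}
```

Persisting the absolute trigger time rather than the remaining duration is what makes this recovery a simple subtraction after a restart.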

Goroutine safety: Each chat/project pair gets one goroutine. A current == self identity check prevents a superseded goroutine from cleaning up a newer timer's state.
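
The identity check can be sketched with a mutex-guarded map of timer pointers. Scheduler, chatKey, Reset, and cleanup are hypothetical names; the point is only the pointer comparison before deletion.

```go
package main

import (
	"fmt"
	"sync"
)

type chatKey struct {
	ChatID    int64
	ProjectID string
}

// timer stands in for the per-chat goroutine's handle.
type timer struct{ step int }

type Scheduler struct {
	mu     sync.Mutex
	active map[chatKey]*timer
}

func NewScheduler() *Scheduler {
	return &Scheduler{active: make(map[chatKey]*timer)}
}

// Reset installs a fresh timer for the chat, superseding any older one.
func (s *Scheduler) Reset(k chatKey) *timer {
	s.mu.Lock()
	defer s.mu.Unlock()
	t := &timer{}
	s.active[k] = t
	return t
}

// cleanup removes the map entry only if self is still the current timer.
// A superseded goroutine calling this leaves the newer timer untouched.
func (s *Scheduler) cleanup(k chatKey, self *timer) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.active[k] != self {
		return false
	}
	delete(s.active, k)
	return true
}

func main() {
	s := NewScheduler()
	k := chatKey{ChatID: 42, ProjectID: "myproject"}
	old := s.Reset(k)
	current := s.Reset(k) // a new user message arrived and reset the timer
	fmt.Println(s.cleanup(k, old), s.cleanup(k, current))
}
```

Without the comparison, the old goroutine's deferred cleanup could delete the entry that the newer timer had just registered.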

Multi-Tenancy

Every data path uses composite keys (chat_id, project_id) for tenant isolation. Each project gets its own resources while sharing infrastructure.

Per project (isolated): Telegram bot token, Qdrant collections, system prompts (JSONB), lead group (CRM target), BM25 sparse encoder, agent instance, language locale.

Shared (infrastructure): PostgreSQL instance, Qdrant instance, LLM provider pool, timer scheduler, HTTP servers, metrics endpoint.

Data isolation details

Data	Table/Collection	Key
Chat state	`chat_states`	`PRIMARY KEY (chat_id, project_id)`
Chat history	`chat_history`	`WHERE chat_id AND project_id`
Timers	`timers`	`PRIMARY KEY (chat_id, project_id)`
BM25 index	`bm25_indexes`	`PRIMARY KEY (project_id)`
Search vectors	Qdrant	`{project_name}_content_hybrid`, `{project_name}_source`
Webhook routing	URL path	`POST /webhook/{tg_api_key}`

Runtime registration

Project bundles (Telegram client, agent, CRM, KB) live in an in-memory map. New projects can be registered at runtime via two paths:

Admin group — /project_init command creates the project, seeds KB, and registers the bundle instantly. No restart needed.
Internal API — POST /api/register-project/{id} for CLI or external tools that seed the DB independently.
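
A sketch of an in-memory bundle map with webhook routing by API key. The Registry and Bundle types, their methods, and webhookKey are assumptions for illustration; the real bundle also carries the Telegram client, agent, CRM, and KB.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Bundle is a placeholder for the per-project bundle.
type Bundle struct{ ProjectName string }

// Registry maps a bot's API key to its project bundle and is safe for concurrent use.
type Registry struct {
	mu    sync.RWMutex
	byKey map[string]*Bundle
}

func NewRegistry() *Registry {
	return &Registry{byKey: make(map[string]*Bundle)}
}

// Register adds or replaces a bundle at runtime; no restart is needed.
func (r *Registry) Register(tgAPIKey string, b *Bundle) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byKey[tgAPIKey] = b
}

func (r *Registry) Lookup(tgAPIKey string) (*Bundle, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	b, ok := r.byKey[tgAPIKey]
	return b, ok
}

// webhookKey extracts {tg_api_key} from a /webhook/{tg_api_key} path.
func webhookKey(path string) (string, bool) {
	key := strings.TrimPrefix(path, "/webhook/")
	if key == path || key == "" || strings.Contains(key, "/") {
		return "", false
	}
	return key, true
}

// ServeHTTP answers 404 for unknown keys and 200 for registered projects.
func (r *Registry) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	key, ok := webhookKey(req.URL.Path)
	if !ok {
		http.NotFound(w, req)
		return
	}
	if _, found := r.Lookup(key); !found {
		http.NotFound(w, req)
		return
	}
	w.WriteHeader(http.StatusOK) // a real handler would enqueue the update here
}

func main() {
	reg := NewRegistry()
	reg.Register("123456:ABC-DEF", &Bundle{ProjectName: "myproject"})
	for _, target := range []string{"/webhook/123456:ABC-DEF", "/webhook/unknown"} {
		rec := httptest.NewRecorder()
		reg.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, target, nil))
		fmt.Println(target, rec.Code)
	}
}
```

Both registration paths described above would end in the same Register call; the read-write lock lets webhook lookups proceed concurrently while registrations are rare.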

Project onboarding flow (Telegram)

Step 1: Create a bot in BotFather, copy the token.

Step 2: Send one command in the admin project's lead group:

/project_init myproject sitemap https://example.com/sitemap.xml 123456:AAH...token

The system verifies the token, creates the project, seeds the KB, and replies with progress:

> Project "myproject" created (bot: @myproject_bot). Seeding started...
> Scraping completed: 42 pages
> Chunking completed: 156 chunks
> Embeddings completed: 156 vectors
> Upload completed: 42 pages, 156 chunks indexed
> Project "myproject" init finished!
> Test the bot: https://t.me/myproject_bot
> To create the CRM group, click:
> https://t.me/myproject_bot?startgroup=connect_myproject

Step 3: Click the deep link. Telegram opens the "create group" UI with the bot pre-added. The bot auto-detects the group and connects it as the lead group. No group ID needed — discovered automatically via the /start connect_{name} command.

Admin commands reference

Command	Description
`/project_init <name> sitemap <url> <token>`	Create project, crawl sitemap, seed KB
`/project_init <name> urls <url1,url2> <token>`	Create project from individual URLs
`/project_prompt1 <name> <text>`	Update the project's system prompt
`/project_price_url <name> <url>`	Set price URL hint (used by the agent's `hybrid_search`)
`/project_stats <name> [period]`	Show chat statistics
`/project_info <name>`	Show project details
`/project_add_urls <name> <urls>`	Add URLs to KB
`/project_delete_urls <name> <urls>`	Delete URL chunks from KB
`/project_delete <name>`	Soft-delete project
`/project_restore <name>`	Restore soft-deleted project

Lead group commands (per project)

Command	Description
`/help`	Show all available commands
`/prompt`	Show current system prompt
`/prompt1 [text]`	Show or update the project's system prompt
`/price_url [url]`	Show or set the price URL hint
`/stats [period]`	Show chat statistics (e.g. `1 week`, `3 days`)
`/info`	Show project details

User chat commands

Command	Description
`/start`	Reset conversation state and return the project's greeting
`/debug`	Toggle debug mode (appends trace info to bot replies)

Database schema

The migrations/ directory holds migrations 001 through 013; the first three are shown here:

001_initial.sql

CREATE TABLE projects (
  id          UUID PRIMARY KEY,
  name        TEXT UNIQUE,
  tg_lead_group BIGINT,
  tg_api_key  TEXT UNIQUE,
  prompts     JSONB,
  language    TEXT DEFAULT 'ru'
);

CREATE TABLE chat_history (
  id          UUID PRIMARY KEY,
  chat_id     BIGINT,
  project_id  UUID REFERENCES projects(id),
  question    TEXT,
  reply       TEXT,
  dbinfo      JSONB,
  created_at  TIMESTAMPTZ
);

CREATE TABLE chat_states (
  chat_id     BIGINT,
  project_id  UUID REFERENCES projects(id),
  state       JSONB,
  PRIMARY KEY (chat_id, project_id)
);

002_bm25_index.sql

CREATE TABLE bm25_indexes (
  project_id  UUID PRIMARY KEY REFERENCES projects(id),
  vocab       JSONB,
  idf         JSONB,
  avg_dl      FLOAT,
  num_docs    INT,
  updated_at  TIMESTAMPTZ
);

003_timers.sql

CREATE TABLE timers (
  chat_id     BIGINT,
  project_id  UUID REFERENCES projects(id),
  step        INT,
  trigger_at  TIMESTAMPTZ,
  PRIMARY KEY (chat_id, project_id)
);

Key indexes

idx_chat_history_chat_id on (chat_id, created_at) — history lookup
idx_chat_states_project on (project_id) — project-level queries
idx_chat_states_state_gin GIN on (state) — JSONB queries on state

CRM Lead Output

When a lead is dispatched, the following is sent to the project's Telegram lead group.

1. Forward last message: the user's last message is forwarded so managers can click the profile to start a direct conversation.
2. Summary message: LLM-generated summary containing lead ID, name, city, service, contact method, shared contact (if available), client status (finished/hot/cold), and file count.
3. Chat history document: full timestamped Q/A transcript uploaded as lead{chatID}.txt, sent as a reply to the summary message.
4. Media files: grouped by type (photos and videos as one album, documents as another, voice/video notes individually). All are sent as replies to the summary message.

Contact persistence: Contacts are extracted once and reused across turns. Shared contacts from the Telegram button are stored in chat state (not in history) and included in the lead summary when dispatched.

CRM interface design

type CRM interface {
    SendLead(ctx context.Context, lead domain.Lead) (string, error)
    Name() string
}

Current implementation: TelegramCRM. The interface is designed for future CRM backends (Bitrix, AmoCRM) via the MultiCRM dispatcher pattern.

Deployment

NixOS-native deployment with systemd service hardening, sops-nix secrets, and Cloudflare Tunnel.

Service Dependency Chain

postgresql → aichat-migrate → aichat-bot
qdrant → aichat-bot (aichat-bot also depends on qdrant)

Security Hardening

Setting	Value	Purpose
`NoNewPrivileges`	`true`	Prevent privilege escalation
`ProtectSystem`	`strict`	Read-only filesystem except allowed paths
`ProtectHome`	`true`	No access to /home
`PrivateTmp`	`true`	Isolated /tmp
`MemoryMax`	`512M`	Memory limit (OOM protection)

NixOS module configuration

services.aichat = {
  enable = true;
  configFile = "/run/secrets/aichat-config.json";
  listenAddr = ":8080";

  # Database
  enablePostgres = true;
  dbName = "aichat";
  postgresPort = 5432;

  # Qdrant
  enableQdrant = true;
  qdrantPort = 6333;
  qdrantDataDir = "/var/lib/qdrant";

  # Backup
  enableBackup = true;
  backupDir = "/var/backup/aichat";
  backupRetentionDays = 14;
};

Backup & secrets

Backup

Daily systemd timer with RandomizedDelaySec=1h
pg_dump + Qdrant snapshots
14-day retention with automatic cleanup

Secrets (sops-nix)

Encrypted in secrets/production.yaml using age keys
Decrypted at boot on the target machine using its SSH host key
Keys: aichat_config (full JSON config), cloudflared_creds (tunnel credentials)

Quick deploy commands

# Deploy code changes to production
make prod-deploy

# SSH tunnels for crawler access to prod DB + Qdrant
make prod-tunnel
# PostgreSQL on localhost:15432, Qdrant on localhost:16333

# Seed a project on prod (while tunnel is open)
./bin/aichat-crawler --config /tmp/config.prod-tunnel.json init-project \
  --name myproject --tg-api-key "TOKEN" --language ru \
  --no-llm-chunk --workers 1 --urls "URL1,URL2,..."

# Fetch trace files from prod
ssh root@server "cat /var/lib/aichat/traces/{chatID}/{dialogID}.log"

Webhook setup

After the bot is running and accessible via HTTPS, register the webhook with Telegram:

curl -X POST "https://api.telegram.org/bot{TG_API_KEY}/setWebhook" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-domain.com/webhook/{TG_API_KEY}"}'

The webhook URL must exactly match https://{host}/webhook/{tg_api_key} where tg_api_key is the bot token from the projects table. The API key in the URL routes incoming updates to the correct project bundle.
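The path-based routing described above can be sketched as follows. The helper name `tokenFromPath` and the in-memory token-to-project map are hypothetical; the real bot resolves the token against the projects table.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenFromPath extracts the bot token from "/webhook/{tg_api_key}".
// It reports false when the path does not have exactly that shape.
func tokenFromPath(path string) (string, bool) {
	token, ok := strings.CutPrefix(path, "/webhook/")
	if !ok || token == "" || strings.Contains(token, "/") {
		return "", false
	}
	return token, true
}

func main() {
	// Hypothetical lookup table standing in for the projects table.
	projects := map[string]string{"123456:ABC": "myproject"}

	for _, p := range []string{"/webhook/123456:ABC", "/webhook/", "/other/123456:ABC"} {
		token, ok := tokenFromPath(p)
		if !ok {
			fmt.Printf("%q: not a webhook path\n", p)
			continue
		}
		name, found := projects[token]
		fmt.Printf("%q: project=%q found=%v\n", p, name, found)
	}
}
```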

Manual / Docker deployment

Prerequisites: Go 1.22+, PostgreSQL 16, Qdrant

# Build all binaries
make build
# produces bin/aichat-bot, bin/aichat-crawler, bin/aichat-cli

# Run migrations
make migrate DATABASE_URL="postgres://user:pass@localhost/aichat"

# Run the bot with JSON config
bin/aichat-bot --config config.prod.json

# Or with environment variables
DATABASE_URL="postgres://user:pass@localhost/aichat" \
OPENAI_API_KEYS="sk-xxx" \
bin/aichat-bot

Cloudflare Tunnel setup

Production uses Cloudflare Tunnel to route HTTPS traffic to the bot without exposing ports. The tunnel routes aichat.example.com to localhost:8080 on the server.

Tunnel credentials managed via sops-nix (encrypted at rest)
Configured in nix/production.nix
No need for TLS certificates or reverse proxy configuration
Telegram webhooks point to the tunnel URL

Localization

User-facing strings and LLM prompt templates are localized via JSON locale files embedded at compile time.

Aspect	Details
Supported languages	`ru` (default), `en`
Locale files	`internal/i18n/locales/ru.json`, `internal/i18n/locales/en.json`
API	`i18n.T(lang, key)` for strings, `i18n.Tf(lang, key, args...)` for formatted strings
Fallback chain	Requested lang → `"ru"` → key itself
Coverage	36+ keys: agent replies, LLM prompts, history formatting, timer messages, CRM lead formatting, chunker prompts
Per-project	Set in `projects.language` column, propagates to agent, CRM, and timer
Adding a locale	Create `internal/i18n/locales/{code}.json` with the same keys — auto-loaded via `//go:embed`
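The fallback chain in the table can be illustrated with a small lookup function. This is a sketch under assumptions: the in-memory `locales` map and its sample keys stand in for the embedded JSON files, and the real `i18n.T` may differ in detail.

```go
package main

import "fmt"

// locales stands in for the embedded JSON locale files; keys are made up.
var locales = map[string]map[string]string{
	"ru": {"greeting": "Здравствуйте!", "lead.sent": "Заявка отправлена"},
	"en": {"greeting": "Hello!"},
}

const defaultLang = "ru"

// T resolves a key using the chain: requested lang, then "ru", then the key itself.
func T(lang, key string) string {
	if s, ok := locales[lang][key]; ok {
		return s
	}
	if s, ok := locales[defaultLang][key]; ok {
		return s
	}
	return key
}

func main() {
	fmt.Println(T("en", "greeting"))  // found in the requested locale
	fmt.Println(T("en", "lead.sent")) // missing in en, falls back to ru
	fmt.Println(T("de", "unknown"))   // missing everywhere, returns the key
}
```

Returning the key itself as the last resort keeps a missing translation visible in the output instead of producing an empty message.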

Monitoring

Prometheus metrics exposed at GET /metrics on the internal server (localhost:9090 by default). Metrics are applied via decoration — core packages have zero Prometheus imports.

LLM Metrics

Metric	Type	Labels	Description
`aichat_llm_requests_total`	counter	provider, method, status	Total LLM API calls
`aichat_llm_request_duration_seconds`	histogram	provider, method	LLM call latency (0.1s-51s buckets)

Telegram Metrics

Metric	Type	Labels	Description
`aichat_telegram_requests_total`	counter	method, status	Telegram API calls
`aichat_telegram_request_duration_seconds`	histogram	method	Telegram API latency

Webhook & Agent Metrics

Metric	Type	Labels	Description
`aichat_webhook_requests_total`	counter	project, status	Inbound webhook requests
`aichat_webhook_request_duration_seconds`	histogram	project	Webhook processing time
`aichat_agent_messages_total`	counter	project, stage	Messages processed by stage

Timer & CRM Metrics

Metric	Type	Labels	Description
`aichat_timer_active_sequences`	gauge	—	Currently active timer sequences
`aichat_timer_fires_total`	counter	action	Timer fire outcomes (follow_up, lead, skip)
`aichat_crm_leads_total`	counter	project, status	Leads sent to CRM

Alert recommendations

Condition	PromQL Query
LLM errors spiking	`rate(aichat_llm_requests_total{status="error"}[5m]) > 0.1`
LLM latency high	`histogram_quantile(0.95, rate(aichat_llm_request_duration_seconds_bucket[5m])) > 30`
Webhook errors	`rate(aichat_webhook_requests_total{status="error"}[5m]) > 0.05`
Timers stuck	`rate(aichat_timer_fires_total[1h]) == 0` when `aichat_timer_active_sequences > 0`
CRM failures	`rate(aichat_crm_leads_total{status="error"}[5m]) > 0`

Grafana dashboard tips

Group LLM metrics by provider to compare provider performance and error rates
Track aichat_agent_messages_total by stage to see funnel conversion rates (active → final)
Watch aichat_timer_active_sequences as a proxy for total active conversations
Use aichat_webhook_request_duration_seconds p95 to spot slow responses (typical: 2-5s including LLM + search)
Monitor aichat_crm_leads_total by project to track lead generation per tenant

Label reference

Label	Values	Used In
`provider`	openai, claude, together, openrouter, tiered	LLM metrics
`method`	complete, embed (LLM); sendMessage, sendMediaGroup, etc. (Telegram)	LLM, Telegram metrics
`status`	ok, error	All counter metrics
`project`	Project name string	Webhook, Agent, CRM metrics
`stage`	`active` (in-progress conversation) or `final` (lead dispatched)	Agent metrics
`action`	follow_up, lead, skip	Timer metrics

Configuration Reference

Configuration is read from a JSON file (--config config.json), with environment variables as a fallback. JSON config is recommended for production.

Annotated config.example.json

{
  // PostgreSQL connection string
  "database_url": "postgresql://user:pass@localhost/aichat",

  // Qdrant vector database
  "qdrant": {
    "url": "http://localhost:6333",
    "api_key": ""
  },

  // Public HTTP server (webhooks)
  "listen_addr": ":8080",
  // Internal HTTP server (health, metrics, register-project)
  "internal_addr": "localhost:9090",
  // Base URL for webhook registration (must be HTTPS in production)
  "webhook_url": "https://bot.example.com",

  // LLM providers: any OpenAI-compatible API
  "providers": {
    "openrouter": {
      "keys": ["sk-or-v1-your-key"],
      "base_url": "https://openrouter.ai/api/v1"
    },
    "together": {
      "keys": ["your-together-key"],
      "base_url": "https://api.together.xyz/v1"
    }
  },

  // Model slots: "provider/model" format (split on first /)
  "models": {
    "toolcaller": "openrouter/z-ai/glm-5.1",                                // agent's tool-calling loop
    "extract":    "openrouter/mistralai/mistral-small-3.1-24b-instruct",    // crawler topics, timer classification
    "embedding":  "openrouter/qwen/qwen3-embedding-8b"                       // vectors
  },

  // Crawler-specific settings
  "crawler": {
    "chunk_size": 1500,       // chars per chunk (simple mode)
    "workers": 4,             // parallel crawl workers
    "no_llm_chunk": false,     // true = paragraph-based, false = LLM semantic
    "embedding_dim": 1024     // Matryoshka truncation dimension
  },

  "debug": false
}

Environment variable fallback

Variable	Required	Default	Description
`DATABASE_URL`	Yes	—	PostgreSQL connection string
`OPENAI_API_KEYS`	At least one	—	Comma-separated OpenAI API keys
`CLAUDE_API_KEYS`	At least one	—	Comma-separated Claude API keys
`QDRANT_URL`	No	`http://localhost:6333`	Qdrant REST API URL
`QDRANT_API_KEY`	No	—	Qdrant authentication key
`LISTEN_ADDR`	No	`:8080`	Public HTTP server address
`INTERNAL_ADDR`	No	`localhost:9090`	Internal HTTP server address
`DEBUG`	No	—	Enable debug logging (JSON)
`AGENT_TRACE`	No	—	Set to `1` for per-dialog trace files

Model slots explained

Model identifiers use "provider/model" format. The provider name is split on the first / character and matched against the providers map.
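The split-on-first-slash rule matters because model names can themselves contain slashes. A minimal sketch, with `splitModel` as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// splitModel separates "provider/model" at the first slash only,
// so the model part may keep further slashes.
func splitModel(id string) (string, string, error) {
	provider, model, ok := strings.Cut(id, "/")
	if !ok || provider == "" || model == "" {
		return "", "", fmt.Errorf("model %q is not in provider/model form", id)
	}
	return provider, model, nil
}

func main() {
	for _, id := range []string{"openrouter/z-ai/glm-5.1", "anthropic/claude-sonnet-4-6", "no-slash"} {
		provider, model, err := splitModel(id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("provider=%s model=%s\n", provider, model)
	}
}
```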

Slot	Config Key	Purpose	Notes
toolcaller	`models.toolcaller`	The agent's tool-calling loop	MUST be a tool-capable model. Tested in production: `anthropic/claude-sonnet-4-6`, `openai/gpt-5.4`, `openrouter/z-ai/glm-5.1` (current default), `openrouter/z-ai/glm-4.6`.
extract	`models.extract`	Crawler chunk topics, timer status classification, follow-up text	Smaller, faster model. Single-shot completions.
embedding	`models.embedding`	Vector embeddings for search	Batch API, Matryoshka truncation to configured dim

Extract model constraint: Must return content in choices[].message.content field. Thinking/reasoning models that use a reasoning field will not work for the extract slot.
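The constraint can be made concrete with a small response check. This sketch assumes an OpenAI-style chat completion body; the struct and function names are illustrative, and the `reasoning` field in the sample is only an example of what a reasoning model might return instead of content.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// chatResponse models only the fields the extract slot relies on.
type chatResponse struct {
	Choices []struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
	} `json:"choices"`
}

// extractContent returns choices[0].message.content or an error when it is empty.
func extractContent(body []byte) (string, error) {
	var r chatResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", fmt.Errorf("decode response: %w", err)
	}
	if len(r.Choices) == 0 || r.Choices[0].Message.Content == "" {
		return "", errors.New("empty choices[0].message.content")
	}
	return r.Choices[0].Message.Content, nil
}

func main() {
	good := []byte(`{"choices":[{"message":{"content":"hot"}}]}`)
	// A reply that carries its text only in a separate field fails the check.
	bad := []byte(`{"choices":[{"message":{"content":null,"reasoning":"thinking"}}]}`)

	fmt.Println(extractContent(good))
	fmt.Println(extractContent(bad))
}
```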

Key Thresholds Reference

All configurable and hardcoded constants that control system behavior.

Agent Thresholds

Constant	Value	Purpose	Defined In
Tool-loop iteration cap	8	Max ToolCaller round-trips per user turn before falling back to a canned safety reply	`internal/agent/toolagent.go`
`hybrid_search` top-k	3	Hard server-side cap on results returned to the model regardless of requested top_k	`internal/agent/toolagent.go`
`minSearchScore`	0.5	Minimum DBSF fusion score to include a result	`internal/search/hybrid.go`
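How the top-k cap and the score floor interact can be sketched as below. The `Hit` type and `capResults` function are hypothetical, and the sketch assumes hits arrive sorted by descending fusion score; only the values 3 and 0.5 come from the table.

```go
package main

import "fmt"

const (
	maxTopK        = 3   // server-side cap from the table above
	minSearchScore = 0.5 // minimum fusion score from the table above
)

// Hit is an illustrative search result.
type Hit struct {
	URL   string
	Score float64
}

// capResults drops low-scoring hits and enforces the cap,
// whatever top_k the model asked for.
func capResults(hits []Hit, requestedTopK int) []Hit {
	k := requestedTopK
	if k <= 0 || k > maxTopK {
		k = maxTopK
	}
	out := make([]Hit, 0, k)
	for _, h := range hits {
		if h.Score < minSearchScore {
			continue
		}
		out = append(out, h)
		if len(out) == k {
			break
		}
	}
	return out
}

func main() {
	hits := []Hit{
		{"/a", 0.91}, {"/b", 0.74}, {"/c", 0.66}, {"/d", 0.58}, {"/e", 0.31},
	}
	fmt.Println(len(capResults(hits, 10))) // request exceeds the cap
	fmt.Println(len(capResults(hits, 2)))  // smaller requests are honored
}
```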

Timer Intervals

Step	Delay	Behavior
1	5 minutes	First check-in — agent decides whether to ping or stay silent
2	15 minutes	Second check-in
3	40 minutes	Third check-in
4	24 hours	Last interval — safety net. The lead is dispatched unconditionally for any unresolved chat (regardless of `client_status`) so no lead is lost.

Crawler Constants

Constant	Value	Purpose	Defined In
`defaultEmbeddingDim`	1536	Default embedding dimension (text-embedding-3-small)	`internal/crawler/indexer.go`
`upsertBatch`	50	Points per Qdrant upsert batch	`internal/crawler/indexer.go`
Default chunk size	1500 chars	Simple chunking mode character limit	`internal/crawler/chunker.go`
Embed batch size	32 chunks	Chunks per embedding API call	`internal/crawler/pipeline.go`
Default workers	3	Parallel crawl workers	`cmd/crawler/main.go`

LLM Temperatures

Temperature	Used For
0.1	Semantic chunking (crawler)
0.3	ToolCaller default; timer status classification; follow-up generation

Some tool-capable models (GPT-5, o-series) ignore explicit temperature and use a fixed value — the OpenAI tools backend handles this automatically.

LLM Provider

Constant	Value	Purpose
Max backoff	60 seconds	Cap on exponential backoff for 429 retries
Backoff formula	`2^attempt + 10% jitter`	Exponential with jitter, respects Retry-After header
Retries per chain	3	Max 429 retries before falling through to next provider

Telegram

Constant	Value	Purpose
Max message length	4096 chars	Telegram API limit; messages auto-chunked at this boundary
Retry on 429	Yes	Respects Retry-After header from Telegram API

Performance Reference

Metric	Typical value
Webhook latency	2-5s
Search (Qdrant)	~500ms
Storage per turn	~1KB
Storage per vector	~8KB

Webhook latency includes one or more ToolCaller round-trips plus search. Search latency covers Qdrant DBSF fusion. Storage estimates: ~1KB per conversation turn, ~100KB per 10K chunks, ~8KB per dense vector (1024 dims).

Build and test commands

# Build all three binaries
make build        # produces bin/aichat-bot, bin/aichat-crawler, bin/aichat-cli

# Run tests
make test         # go test ./...

# Lint
make lint         # go vet ./...

# Local development services (PostgreSQL + Qdrant via Docker)
make dev-services

# Run the CLI interface for testing (no Telegram needed)
bin/aichat-cli --config config.dev.json --project myproject

aichat-go

Multi-tenant AI Sales Bot Platform
Built with Go, Qdrant, PostgreSQL, and OpenRouter

3 Entry Points: bot, crawler, cli | 4 Tools: hybrid_search, set_state, get_state, send_lead | 2 Locales (ru/en)

This documentation is generated from source code and docs/*.md files.
