aichat-go

Multi-tenant AI Sales Bot Platform

Tool-Calling Agent
Hybrid Vector + BM25 Search
Multi-Tenant Isolation
Automated Follow-Ups
Go Qdrant PostgreSQL OpenRouter NixOS

Tool-calling agent

The model owns the conversation flow via four tools: hybrid_search, set_state, get_state, send_lead. No hand-tuned thresholds, no stage transitions.

🔍

Hybrid Search

Dense embeddings + BM25 sparse vectors with Qdrant DBSF fusion. The agent model is the reranker — raw fused results go straight into the tool-call response.

Multi-backend ToolCaller

Anthropic Messages API or OpenAI tools spec (covers OpenRouter routing to GLM, DeepSeek, Kimi, etc.). Multi-key pool with round-robin rotation and prompt caching where supported.

🕑

Smart Follow-Ups

4-step timer sequence (5m / 15m / 40m / 24h) with LLM-driven client classification. Automatically detects cold, hot, and finished conversations.

🏠

Multi-Tenant

Full project isolation with composite keys. Each tenant gets its own bot, collections, prompts, and lead group while sharing infrastructure.

🚀

Incremental Crawler

4-phase pipeline (fetch / chunk / embed / upload) with per-URL resume. Supports HTML, PDF, DOCX, XLSX, TXT. Semantic chunking with LLM topic labels.

Core Concepts

Project
A tenant — one business with its own bot, knowledge base, prompts, and CRM target. All data paths use (chat_id, project_id) composite keys.
Stage
Either StageActive (in-progress conversation) or StageFinal (lead dispatched, conversation locked). Derived from the Finished field — no separate stage field to get out of sync.
Knowledge Base
Per-project Qdrant collections containing chunked website content with both dense (embedding) and sparse (BM25) vectors.
Determined URL
A service page URL the agent commits to via set_state(determined_url=...). Used as a filter for follow-up hybrid_search calls.
Lead
The output: contact info + conversation summary + chat history + files, sent to the CRM lead group.
Timer
Automated follow-up scheduler. When a user goes silent the timer fires the agent's tool-calling loop with a synthetic [TIMER PING] turn; the agent decides whether to send a nudge or dispatch the lead.
ToolCaller
The LLM client used by the agent. Multi-turn tool-using conversations. Two backends: Anthropic Messages API native, and OpenAI tools spec (covers OpenRouter-routed GLM, DeepSeek, Kimi, etc.).
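
To make the Stage definition above concrete, here is a minimal Go sketch of deriving the stage from the Finished field. Only the names StageActive, StageFinal, Finished, LeadSent and CurrentStage() come from this page; the trimmed struct and the method body are assumptions for illustration.

```go
package main

import "fmt"

// Stage is the derived conversation stage; it is never stored on its own.
type Stage int

const (
	StageActive Stage = iota // conversation in progress
	StageFinal               // lead dispatched, conversation locked
)

// ChatState is trimmed to the two flags this sketch needs (assumption:
// the real struct carries many more fields).
type ChatState struct {
	Finished bool
	LeadSent bool
}

// CurrentStage derives the stage from Finished, so there is no second
// field that could drift out of sync with it.
func (s ChatState) CurrentStage() Stage {
	if s.Finished {
		return StageFinal
	}
	return StageActive
}

func main() {
	fmt.Println(ChatState{}.CurrentStage() == StageActive)
	fmt.Println(ChatState{Finished: true}.CurrentStage() == StageFinal)
}
```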

Architecture Overview

End-to-end data flow from Telegram webhook to CRM lead dispatch. The agent runs a tool-calling loop — the model itself decides when to search, what to commit to, and when to dispatch the lead.

Telegram (Webhook POST)
  │  normalize
Gateway (TelegramUpdate → Message)
  │
Agent (tool-calling loop)
  ├─ tool: hybrid_search → Hybrid Search (Qdrant + BM25)
  │     ├─ Qdrant (dense + sparse)
  │     └─ PostgreSQL (BM25 index)
  ├─ reason / reply → ToolCaller (Anthropic / OpenAI tools) → OpenRouter (GLM / DeepSeek / Kimi / ...)
  └─ tool: send_lead → CRM Output (lead group) → Telegram (group messages)
Key design decisions: State is derived (not stored) from the Finished field. The agent owns flow via four tools (hybrid_search, set_state, get_state, send_lead) — no hand-tuned thresholds. Metrics are applied via decoration (zero Prometheus imports in core packages). All data paths use composite keys (chat_id, project_id) for tenant isolation.
Component details
Component | Location | Purpose
Gateway | internal/gateway/ | Normalizes TelegramUpdate into domain.Message (text, caption, files, sender)
Agent | internal/agent/ | Tool-calling loop: rebuilds the message thread from chat_history.blocks, dispatches tool calls, returns the final text reply
ToolCaller | internal/llm/toolcaller*.go | Multi-backend tool-calling client: Anthropic Messages API + OpenAI tools spec (covers OpenRouter routing). Multi-key pool with rate-limit fallback.
Provider (extract / embedding) | internal/llm/openai.go, claude.go | Single-shot completions for crawler chunk topics, timer status classification, and embeddings. Routed via the extract / embedding tiers.
Hybrid Search | internal/search/ | Dense embeddings + BM25 sparse vectors with Qdrant DBSF fusion. The agent model is the reranker: raw fused results go straight back as the hybrid_search tool response.
CRM | internal/crm/ | Dispatches leads to Telegram group (summary + history + media). Two-phase persist guarantees idempotency.
Timer | internal/timer/ | Automated follow-ups. Fires the agent's tool-calling loop with a synthetic [TIMER PING] turn; the agent decides what to do.
Crawler | internal/crawler/ | 4-phase pipeline: fetch, chunk, embed, upload to Qdrant
Metrics | internal/metrics/ | Prometheus wrappers (zero-coupled with core packages)
Admin | internal/admin/ | Admin group command handler for cross-project management
i18n | internal/i18n/ | Localization (ru/en) with compile-time embedded locale files
Request lifecycle (webhook to response)

Every incoming Telegram message follows this path:

  1. Webhook reception: POST /webhook/{tg_api_key} is received on the public HTTP server (:8080). The API key in the URL routes to the correct project bundle.
  2. Gateway normalization: TelegramUpdate is converted to domain.Message with text, caption, files, and sender info extracted.
  3. State loading: chat state is loaded from PostgreSQL using the composite key (chat_id, project_id).
  4. Command check: /start resets state and returns the greeting. /debug toggles debug mode.
  5. File collection: any file attachments are accumulated into state. Files without text get an immediate acknowledgement without LLM calls.
  6. Stage dispatch: based on derived state (CurrentStage()), the message is routed to the appropriate stage handler.
  7. State persist: updated state is written back to PostgreSQL.
  8. Timer reset: the follow-up timer is reset to step 0 for this chat.
Project structure tree
cmd/
  bot/         # Main bot binary
    main.go     # Wiring: DB, tiered LLM, per-project bundles, HTTP server
    config.go   # JSON config loading with env var fallback
  crawler/     # Knowledge base indexer CLI
    main.go     # Subcommands: init-project, seed, add-urls, register
  cli/         # Local readline chat interface (testing)
    main.go     # Same agent pipeline, file-based CRM output

internal/
  domain/      types.go        # Stage, Message, ChatState, Lead, Block
  agent/       agent.go        # HandleMessage entrypoint
                toolagent.go    # Tool-calling loop, tool dispatch, system prompt
                commands.go     # /start, /debug
                lead_commands.go # Lead group: /prompt, /stats
                trace.go        # Per-turn trace records for debugging
  llm/         toolcaller.go               # ToolCaller interface, Block/Message types
                toolcaller_anthropic.go     # Anthropic Messages API backend
                toolcaller_openai.go        # OpenAI tools spec backend (covers OpenRouter)
                toolcaller_chain.go         # Multi-backend fallback chain
                tiered.go       # TieredClient (extract / embedding tiers)
                openai.go       # OpenAI-compatible provider (extract, embeddings)
                claude.go       # Claude provider (extract)
                embedding_adapter.go # Model-specific text formatting
  search/      hybrid.go       # Qdrant DBSF fusion (dense + BM25)
                bm25.go         # BM25Encoder, Tokenize
  db/          postgres.go     # PostgreSQL implementation
  gateway/     telegram.go     # TelegramUpdate normalization
  telegram/    client.go       # HTTP client, retry on 429
  crm/         telegram.go     # TelegramCRM: summary + history + media
  timer/       timer.go        # Scheduler, sequences, LLM classification
  metrics/     metrics.go      # Metric definitions
                llm.go          # InstrumentedLLMClient wrapper
  crawler/     pipeline.go     # Resumable 4-phase pipeline
                chunker.go      # Semantic + simple chunking
                indexer.go      # Qdrant upsert
  i18n/        i18n.go         # T(), Tf(), embedded locales
                locales/    ru.json, en.json

migrations/  001-013 (latest: 012_chat_history_blocks.sql, 013_drop_3stage_remnants.sql)
nix/         module.nix, example-configuration.nix
flake.nix    Makefile
Extension points
New LLM Provider: implement the Provider interface (internal/llm/)
New CRM Backend: implement the CRM interface, dispatched through the multi-CRM dispatcher (internal/crm/)
New Gateway: normalize to domain.Message and add a webhook route (internal/gateway/)
Custom Search: implement the KnowledgeBase interface (internal/search/)

Conversation Flow

The agent runs a single tool-calling loop per user turn. The model decides when to search, what to commit to via set_state, and when to dispatch the lead via send_lead.

HandleMessage
  ├─ if state.Finished → canned acknowledgement (post-finish lock)
  └─ else runToolLoop(state, history, userMsg)
        msgs = rebuildMessages(history)         // replays prior tool_use / tool_result blocks
        msgs += user(userMsg)
        loop (cap 8 iterations):
          response = toolcaller.Call(systemPrefix, systemSuffix, msgs, tools)
          msgs += assistant(response.blocks)
          if no tool_use blocks:
              return last text block as the user-facing reply
          for each tool_use:
              result = dispatch(tool_use)        // hybrid_search / set_state / get_state / send_lead
              msgs += user(tool_result(id, result))
        // overflow → canned safety reply, log loudly
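
The flow above can be sketched in Go roughly as follows. This is an illustrative reduction, not the project's code: the one-method ToolCaller interface, the Block and Message shapes, and the scripted fake are all assumptions, and the system prefix/suffix and tools arguments are omitted. Unlike the outline, the sketch returns all tool results of one iteration in a single user message.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is one content block: "text", "tool_use", or "tool_result".
type Block struct {
	Type   string
	Text   string // reply text, or the tool result payload
	ToolID string // pairs a tool_result with its tool_use
	Tool   string // tool name for tool_use blocks
}

// Message is one turn in the thread.
type Message struct {
	Role   string
	Blocks []Block
}

// ToolCaller is a hypothetical one-method stand-in for the real interface.
type ToolCaller interface {
	Call(msgs []Message) ([]Block, error)
}

const maxIterations = 8 // iteration cap from the flow above

var errOverflow = errors.New("tool loop exceeded iteration cap")

// runToolLoop calls the model until it answers without requesting a tool.
func runToolLoop(tc ToolCaller, msgs []Message, dispatch func(Block) string) (string, error) {
	for i := 0; i < maxIterations; i++ {
		blocks, err := tc.Call(msgs)
		if err != nil {
			return "", err
		}
		msgs = append(msgs, Message{Role: "assistant", Blocks: blocks})

		var reply string
		var results []Block
		for _, b := range blocks {
			switch b.Type {
			case "text":
				reply = b.Text // keep the last text block
			case "tool_use":
				results = append(results, Block{Type: "tool_result", ToolID: b.ToolID, Text: dispatch(b)})
			}
		}
		if len(results) == 0 {
			return reply, nil // no tool calls: this is the user-facing reply
		}
		msgs = append(msgs, Message{Role: "user", Blocks: results})
	}
	return "", errOverflow // caller falls back to the canned safety reply
}

// scripted is a fake ToolCaller that replays fixed turns, repeating the last.
type scripted struct {
	turns [][]Block
	n     int
}

func (s *scripted) Call(_ []Message) ([]Block, error) {
	i := s.n
	if i >= len(s.turns) {
		i = len(s.turns) - 1
	}
	s.n++
	return s.turns[i], nil
}

func main() {
	tc := &scripted{turns: [][]Block{
		{{Type: "tool_use", ToolID: "t1", Tool: "hybrid_search"}},
		{{Type: "text", Text: "We can help with that."}},
	}}
	reply, err := runToolLoop(tc, nil, func(b Block) string { return "result of " + b.Tool })
	fmt.Println(reply, err)
}
```

The iteration cap is what turns a model that never stops calling tools into a bounded failure rather than a hung worker.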

Tools

Tool | Purpose | Args
hybrid_search | Search the project KB. Returns top-k DBSF-fused (dense + BM25) chunks. The model is the reranker. | query, top_k (capped at 3 server-side), optional url_filter
set_state | Persist anything the model wants to remember across turns. Three flat string args (no nested objects, so weak toolcaller models can’t malform the JSON). | notes (free-form scratchpad, full rewrite each call), determined_url, client_status (hot/cold)
get_state | Re-read current ChatState mid-turn. Belt-and-braces: the same data is in the system suffix. | none
send_lead | Dispatch the conversation as a lead to CRM. Idempotent: if LeadSent=true already, returns ok without re-sending. | summary

System prompt structure

The system prompt is split into two cacheable parts:

  • Prefix (stable, cached): the engine preamble (agent.toolcaller_preamble — explains the tools and how to map natural-language project instructions to tool calls), then the project-editable prompt1, then the tools schema.
  • Suffix (volatile, breaks cache): the per-turn state snapshot — DeterminedURL, contact, extras, files, optional PriceURL hint, ClientStatus.

Anthropic backends mark the prefix with cache_control: ephemeral (90% input discount on cache hits). OpenAI / DeepSeek / GLM (via OpenRouter) auto-cache long stable prefixes.

Two-phase lead persist details

Finished=true is written to the database before the CRM send, and LeadSent=true is written after. This prevents duplicate leads from concurrent webhook requests or timer fires reading stale state. If the bot crashes between the two persists, the timer system retries the CRM send on next fire or on restart (via chatStore.ListFinishedUnsent).

Finished=true (persist to DB) → CRM send → LeadSent=true (persist to DB)

Crash between these two states → timer retries the CRM send on next fire or restart
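
A minimal Go sketch of the two-phase persist, assuming hypothetical Store and Sender interfaces and in-memory fakes (none of these names come from the codebase). It shows why a failed or interrupted CRM send leaves the chat in the retryable Finished=true, LeadSent=false state.

```go
package main

import (
	"errors"
	"fmt"
)

type ChatState struct {
	Finished bool
	LeadSent bool
}

// Store and Sender are hypothetical stand-ins for the chat store and CRM.
type Store interface{ Save(ChatState) error }
type Sender interface{ Send() error }

// dispatchLead persists Finished=true, sends, then persists LeadSent=true.
func dispatchLead(st *ChatState, store Store, crm Sender) error {
	if st.LeadSent {
		return nil // already delivered: idempotent no-op
	}
	if !st.Finished {
		next := *st
		next.Finished = true
		if err := store.Save(next); err != nil { // phase 1: lock the chat
			return err
		}
		*st = next
	}
	if err := crm.Send(); err != nil {
		return err // Finished=true, LeadSent=false: a later retry resends
	}
	next := *st
	next.LeadSent = true
	if err := store.Save(next); err != nil { // phase 2: mark delivered
		return err
	}
	*st = next
	return nil
}

// memStore records every saved snapshot.
type memStore struct{ saved []ChatState }

func (m *memStore) Save(s ChatState) error {
	m.saved = append(m.saved, s)
	return nil
}

// flakyCRM fails the first `failures` sends, then succeeds.
type flakyCRM struct {
	failures int
	sends    int
}

func (c *flakyCRM) Send() error {
	c.sends++
	if c.sends <= c.failures {
		return errors.New("crm unavailable")
	}
	return nil
}

func main() {
	st, store, crm := &ChatState{}, &memStore{}, &flakyCRM{failures: 1}
	fmt.Println(dispatchLead(st, store, crm), *st) // first attempt fails
	fmt.Println(dispatchLead(st, store, crm), *st) // retry succeeds
}
```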

Tool-turn persistence (chat_history.blocks)

The full assistant / tool-result block sequence for each turn is stored in chat_history.blocks (JSONB). On the next turn, rebuildMessages replays these blocks back to the model so it sees its own prior tool_use and tool_result exchanges. Falls back to plain question / reply for legacy rows where blocks is empty.

File upload handling

When a user sends a file without text, the bot immediately acknowledges it ("Accepted! Let me know when you're done.") without calling the LLM. Files are accumulated on ChatState.Files and attached to the lead when send_lead fires.

Media files in the lead are grouped by type for clean presentation:

  • Photos + Videos — sent as a media album (Telegram groups them visually)
  • Documents — sent as a document album
  • Voice / Video notes — sent individually (Telegram does not support albums for these)
Contact sharing button

On the first turn that mentions contact collection, the bot sends a Telegram request_contact keyboard button alongside its reply. This lets users share their phone number with one tap.

  • Sent only once per conversation
  • Shared contact lands in ChatState.SharedContact (not in history) for privacy
  • Included in the dispatched lead
Timer-driven dispatch

The timer scheduler runs the same tool-calling loop with a synthetic [TIMER PING] user turn. The agent decides whether to send a follow-up message or call send_lead itself. At the last interval (24h) the timer dispatches the lead unconditionally for any unresolved chat — regardless of client_status — so no lead is lost. Hot clients with contact in hand should be dispatched earlier by the agent itself (the prompt encourages send_lead after 1–2 pings when there's enough info).

When the safety-net dispatch fires and the agent never called send_lead itself, state.LeadSummary is empty. The timer then asks the agent to summarize the dialog via Agent.SummarizeForLead — a one-shot Complete call (no tools) on the summary tier (cheap mid-tier, e.g. gemma-4-26b-a4b-it). It renders the dialog as a Client/Bot transcript plus the agent’s Notes scratchpad and produces a 1–2 paragraph summary for the human reviewer, so the lead post never arrives blank.

Notes scratchpad (set_state)

set_state(notes="...") is the agent’s free-form text scratchpad. It replaced the earlier structured contact{method,value} + extras{...} schema. The model writes plain text, full rewrite each call — whatever it omits is lost. Conventions live in the tool description, not the schema:

name: Виктор
contact: phone +79130001234
service: оценка квартиры
city: Барнаул
date: на следующей неделе

The scratchpad is rendered into the system suffix on every turn so the model sees its own state, and it’s the input to Agent.SummarizeForLead at 24h dispatch. The flat-string shape is deliberate: weak toolcaller models (e.g. glm-5.1) drop closing braces on nested-object set_state args, which used to silently lose phone numbers and trip the tool-loop overflow guard.

Lead-as-living-document (post-send refresh)

After send_lead fires, the chat doesn’t go silent. The customer can keep typing for up to five more messages; each one is appended to chat_history and the existing CRM lead artifact is refreshed in place — for Telegram, editMessageText on the summary post + editMessageMedia on the chat-history .txt attachment. The summary text stays frozen, but a history updated: <ts> footer appears so the operator notices.

CRM artifact references live in a dedicated lead_dispatches table (chat_id, project_id, provider) — one row per CRM. Multi-CRM by construction: a chat dispatched to both Telegram and Bitrix gets two rows, and the post-send refresh fans UpdateLead across all of them.

Per-chat batching queue

Inbound messages don’t hit HandleMessage directly. Each gateway (Telegram webhook, CLI loop) calls agent.Enqueue, which persists the user message as an orphan row in chat_history (Question filled, Reply empty) and pushes onto a per-chat cheggaaa/mb queue. One worker goroutine per (chat_id, project_id) drains the queue serially.

What this gives:

  • No double-dispatch race: single worker per chat — the structural fix for the goroutine race that previously stomped state and produced two CRM posts for one customer.
  • Burst coalescing: messages that arrive while the worker is mid-LLM accumulate in the queue and get processed as the next batch. The model sees one combined user turn, not three sequential ones.
  • Crash recovery: orphan rows that survive a process kill get picked up by the next batch — no message lost.

Replies flow back through a per-project agent.Replier callback (cmd/bot wires it to tgClient.SendMessage; cmd/cli prints to stdout). Workers exit after five minutes idle and respawn on the next message.
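
The single-worker-per-chat idea can be sketched in Go as below. Note the substitutions: this sketch uses plain buffered channels instead of the cheggaaa/mb queue, skips the orphan-row persistence, and omits the five-minute idle exit. Dispatcher, chatKey and the handle callback are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

type chatKey struct {
	ChatID    int64
	ProjectID string
}

// Dispatcher runs one worker goroutine per (chat_id, project_id).
type Dispatcher struct {
	mu     sync.Mutex
	queues map[chatKey]chan string
	wg     sync.WaitGroup
	handle func(chatKey, []string) // receives one coalesced batch
}

func NewDispatcher(handle func(chatKey, []string)) *Dispatcher {
	return &Dispatcher{queues: make(map[chatKey]chan string), handle: handle}
}

// Enqueue routes msg to the chat's worker, starting it on first use.
// It must not be called after Close.
func (d *Dispatcher) Enqueue(k chatKey, msg string) {
	d.mu.Lock()
	q, ok := d.queues[k]
	if !ok {
		q = make(chan string, 64)
		d.queues[k] = q
		d.wg.Add(1)
		go d.worker(k, q)
	}
	d.mu.Unlock()
	q <- msg
}

// worker drains the queue serially; messages that piled up while the
// previous batch was being handled are coalesced into the next batch.
func (d *Dispatcher) worker(k chatKey, q chan string) {
	defer d.wg.Done()
	for first := range q {
		batch := []string{first}
	drain:
		for {
			select {
			case m, ok := <-q:
				if !ok {
					break drain
				}
				batch = append(batch, m)
			default:
				break drain
			}
		}
		d.handle(k, batch)
	}
}

// Close stops all workers after they finish the queued messages.
func (d *Dispatcher) Close() {
	d.mu.Lock()
	for _, q := range d.queues {
		close(q)
	}
	d.mu.Unlock()
	d.wg.Wait()
}

func main() {
	d := NewDispatcher(func(k chatKey, batch []string) {
		fmt.Println(k.ChatID, batch)
	})
	k := chatKey{ChatID: 42, ProjectID: "demo"}
	d.Enqueue(k, "hi")
	d.Enqueue(k, "do you do appraisals?")
	d.Close()
}
```

How many messages land in one batch depends on timing, but order within a chat is always preserved because only one goroutine ever reads that chat's queue.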

Search Pipeline

The agent calls hybrid_search as a tool. The tool returns Qdrant DBSF-fused dense + BM25 results raw — the agent model is the reranker.

1. Tool call: hybrid_search

The agent picks the query string itself (no separate query-rewrite LLM call). Optionally restricts results to a single URL via url_filter — typically passed once determined_url is set, or with the price_url when the customer asks about pricing.

2. Hybrid Search

Qdrant prefetch with both dense embeddings (cosine similarity, 1024 dims via Matryoshka truncation) and BM25 sparse vectors. Qdrant's built-in DBSF (Distribution-Based Score Fusion) merges results, preserving absolute relevance signal unlike rank-based RRF. Each prefetch retrieves limit * 2 candidates for the fusion algorithm.

3. Server-side cap

Top 3 results are returned to the model regardless of the requested top_k. The model judges which (if any) of those three are relevant for the current turn. No reranker call, no neighbor expansion — the loop is small and fast.

Visual Search Flow

Agent calls hybrid_search(query, url_filter?)
  → Dense embeddings (cosine, 1024 dims) + BM25 sparse (keyword matching)
  → DBSF fusion: Distribution-Based Score Fusion (preserves absolute relevance)
  → Top 3 results returned to the agent
  → Agent reasons over results, decides what to do next
DBSF vs RRF: DBSF normalizes each scorer's output based on its actual distribution, so a high cosine-similarity result retains its absolute relevance signal. RRF (Reciprocal Rank Fusion) is purely rank-based and loses this information.
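
For intuition, here is a small Go sketch of distribution-based fusion done client-side. Qdrant performs this inside the query engine; the sketch follows my reading of Qdrant's description (normalize each result list using mean ± 3 standard deviations as limits, then sum per point). The population standard deviation, the clamp to [0, 1], and the 0.5 fallback for a flat distribution are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// normalize rescales one scorer's results into [0, 1] using
// mean - 3*std and mean + 3*std as the lower and upper limits.
func normalize(scores map[string]float64) map[string]float64 {
	out := make(map[string]float64, len(scores))
	if len(scores) == 0 {
		return out
	}
	n := float64(len(scores))
	var mean float64
	for _, s := range scores {
		mean += s / n
	}
	var variance float64
	for _, s := range scores {
		variance += (s - mean) * (s - mean) / n
	}
	std := math.Sqrt(variance)
	lo, hi := mean-3*std, mean+3*std
	for id, s := range scores {
		if hi == lo {
			out[id] = 0.5 // flat distribution: assumption, pick the midpoint
			continue
		}
		out[id] = math.Min(1, math.Max(0, (s-lo)/(hi-lo)))
	}
	return out
}

// fuse sums the normalized scores of each point across all result lists.
func fuse(lists ...map[string]float64) map[string]float64 {
	fused := make(map[string]float64)
	for _, l := range lists {
		for id, s := range normalize(l) {
			fused[id] += s
		}
	}
	return fused
}

func main() {
	dense := map[string]float64{"a": 0.9, "b": 0.5, "c": 0.1}  // cosine scores
	sparse := map[string]float64{"a": 12, "b": 2, "d": 7}      // BM25 scores
	fmt.Println(fuse(dense, sparse))
}
```

Because the normalized value depends on how far a score sits from its list's mean, a result that is far ahead of the rest keeps that margin after fusion, which a rank-only scheme would flatten.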
Top-k cap: 3 | Embedding dims: 1024 | Fusion: DBSF | Min score: 0.5
Qdrant collections per project
Collection | Vectors | Purpose
{project}_content_hybrid | Dense + Sparse | Chunked content with topic-prepended embeddings. The agent's hybrid_search tool reads this.
{project}_source | Dense only | Full page content. Retained for crawler bookkeeping; not consulted by the runtime agent path.
BM25 implementation details
  • Standard BM25 formula: IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl/avgDL))
  • Parameters: k1=1.5, b=0.75
  • Tokenizer: Unicode-aware (Cyrillic + Latin), lowercased, split on non-letter/digit
  • Persistence: vocabulary, IDF, avgDL, numDocs stored as JSONB in PostgreSQL (bm25_indexes table)
  • In-memory cache: loaded from DB on first use per project, invalidated on crawler updates
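
A short Go sketch of the tokenizer rule and the per-term weight from the formula above. The function names are illustrative (the real code lives in bm25.go as BM25Encoder and Tokenize, whose signatures are not shown here), and the IDF value is taken as an input because this page does not specify which IDF variant is used.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

const (
	bm25K1 = 1.5  // term-frequency saturation
	bm25B  = 0.75 // document-length normalization
)

// tokenize lowercases the text and splits on anything that is not a
// Unicode letter or digit, so Cyrillic and Latin are handled alike.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// termWeight is IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl/avgDL)).
func termWeight(idf, tf, dl, avgDL float64) float64 {
	return idf * (tf * (bm25K1 + 1)) / (tf + bm25K1*(1-bm25B+bm25B*dl/avgDL))
}

func main() {
	fmt.Println(tokenize("Оценка квартиры, Barnaul-2024!"))
	// Same term, same frequency: the longer document gets the lower weight.
	fmt.Println(termWeight(2.0, 3, 80, 100), termWeight(2.0, 3, 200, 100))
}
```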

LLM Calls Map

The system has two LLM clients: the agent's ToolCaller (multi-turn tool-calling for the customer-facing loop) and the legacy Provider (single-shot completions for crawler topics, timer classification, embeddings).

Clients

Client | Config | Purpose | Backends
ToolCaller | models.toolcaller | The agent's tool-calling loop. Multi-turn conversation with tool calls + tool results. The model owns conversation flow. | Anthropic Messages API native; OpenAI tools spec (covers OpenRouter routing to GLM, DeepSeek, Kimi, etc.)
Provider (extract) | models.extract | Single-shot completion for crawler chunk topics, timer status classification, follow-up text. | OpenAI-compatible; Claude
Provider (embedding) | models.embedding | Vector embeddings for hybrid search. | OpenAI-compatible (OpenRouter, native)

Where each client fires

Where | Client | Purpose
internal/agent/toolagent.go · runToolLoop | ToolCaller | Main agent loop: one call per loop iteration; the model returns either a final reply or one or more tool_use blocks.
internal/agent/toolagent.go · GeneratePing | ToolCaller | Same loop, fired by the timer with a synthetic [TIMER PING] turn.
internal/timer/timer.go · classifyStatus | Provider (extract) | Classify silent client as COLD / HOT.
internal/timer/timer.go · generateFollowUp | Provider (extract) | Compose follow-up message text.
internal/crawler/chunker.go · chunkSemantic | Provider (extract) | Per-page boundary detection during indexing.
internal/llm/embedding_adapter.go | Provider (embedding) | Vector embeddings for chunks (indexing) and queries (search).
System prompt structure (cacheable)
  • Prefix (stable, cached): engine preamble (agent.toolcaller_preamble) + project prompt1 + tools schema. Anthropic backends mark this with cache_control: ephemeral for a 90% input discount on cache hits. OpenAI / DeepSeek / GLM auto-cache long stable prefixes (~50% discount when cached_tokens > 0).
  • Suffix (volatile, breaks cache): per-turn state snapshot — DeterminedURL, contact, extras, files, optional PriceURL hint, ClientStatus.
Provider routing & retry

Each ToolCaller / Provider chain has independent retry and fallback:

Attempt 1 → rate-limit/transient? → backoff (1s + jitter) → retry
Attempt 2 → rate-limit/transient? → backoff (2s + jitter) → retry
Attempt 3 → rate-limit/transient? → fall through
Non-transient error? → fall through immediately
All exhausted → AllProvidersExhaustedError

Backoff formula: 2^attempt + 10% jitter, capped at 60s (or whatever the provider's Retry-After header says, if shorter). Key pool rotates API keys round-robin.
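
The backoff rule can be written as a small pure function. This Go sketch follows the formula as stated (2^attempt seconds plus up to 10% jitter, capped at 60s or at Retry-After when that is shorter); the function name, the zero-based attempt index, and passing the random factor in as a parameter are assumptions made to keep it testable.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const maxBackoff = 60 * time.Second

// backoff returns the delay before the next retry.
// attempt is zero-based, retryAfter is the parsed Retry-After header
// (0 if absent), and rnd is a random factor in [0, 1) for the jitter.
func backoff(attempt int, retryAfter time.Duration, rnd float64) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	if attempt > 6 {
		attempt = 6 // 2^6 s already exceeds the cap; avoids overflow
	}
	base := time.Duration(1<<uint(attempt)) * time.Second
	d := base + time.Duration(float64(base)*0.10*rnd)

	limit := maxBackoff
	if retryAfter > 0 && retryAfter < limit {
		limit = retryAfter
	}
	if d > limit {
		d = limit
	}
	return d
}

func main() {
	for attempt := 0; attempt < 3; attempt++ {
		fmt.Println(attempt, backoff(attempt, 0, rand.Float64()))
	}
}
```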

Transient classification: rate limits (429) are transient. For OpenAI-tools backends we also treat embedded errors and empty choices arrays as transient (return RateLimitError) — observed when DeepSeek/GLM upstream returns a malformed success.

Embedding Adapters

AdaptedProvider wraps the embedding provider and formats text based on mode (document vs query). Adapter selected automatically by model name. E5-instruct / Qwen3-Embedding models get Instruct: ...\nQuery: ... prefix for queries. Token limit auto-split: when a chunk exceeds the model's token limit, text is split by sentence, each half embedded recursively, and vectors averaged + L2-normalized.
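
The final step of the auto-split, averaging the partial vectors and L2-normalizing the result, looks roughly like this in Go. The function name is hypothetical, the sentence splitting and the recursive embedding calls are left out, and all input vectors are assumed to have the same dimension.

```go
package main

import (
	"fmt"
	"math"
)

// averageNormalized returns the element-wise mean of the vectors,
// scaled to unit L2 length. Inputs must share one dimension.
func averageNormalized(vecs ...[]float64) []float64 {
	if len(vecs) == 0 {
		return nil
	}
	out := make([]float64, len(vecs[0]))
	for _, v := range vecs {
		for i := range out {
			out[i] += v[i] / float64(len(vecs))
		}
	}
	var norm float64
	for _, x := range out {
		norm += x * x
	}
	norm = math.Sqrt(norm)
	if norm == 0 {
		return out // all-zero mean: nothing to scale
	}
	for i := range out {
		out[i] /= norm
	}
	return out
}

func main() {
	firstHalf := []float64{3, 0}
	secondHalf := []float64{0, 4}
	fmt.Println(averageNormalized(firstHalf, secondHalf))
}
```

Re-normalizing matters because the mean of two unit vectors is generally shorter than unit length, and cosine-based collections expect comparable magnitudes.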

Crawler Pipeline

4-phase incremental pipeline that fetches, chunks, embeds, and uploads content to Qdrant. Intermediate results are saved per-URL for resume.

1. Fetch Pages

HTTP GET with parallel workers (default: 3). Supports HTML, PDF, DOCX, XLSX, TXT via auto-detection. Each page saved to 1_pages/{urlhash}.json. Skip logic: pages with existing files are skipped on re-run.

2. Semantic Chunk

LLM boundary detection: outputs TOPIC: X | STARTS: Y anchor phrases per page (extract tier, temp 0.1). Original text is sliced at detected boundaries. Each chunk gets a topic label. Saved to 2_chunks/{urlhash}.json. Falls back to paragraph-based splitting with --no-llm-chunk.

3. Batch Embed

32 chunks per API call. Topic prepended before embedding: "Topic: X\n\n{text}". BM25 sparse vectors rebuilt from all chunks (IDF requires full corpus). Saved to 3_embeds/{urlhash}.json + bm25.json. Token limit auto-split handles oversized chunks.

4. Upload to Qdrant

Qdrant upsert in batches of 50 points. Source pages uploaded with real embeddings to {name}_source. Chunked content to {name}_content_hybrid. BM25 index persisted to PostgreSQL. Pipeline data archived to project-archives/{name}.tar.gz.

Incremental behavior: Each step checks for existing per-URL files before processing. Re-running the pipeline only processes missing content. Delete a step's folder to force re-processing from that point. File key: sha256(url)[:16].
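
As an illustration of the file key, here is a Go sketch that reads sha256(url)[:16] as the first 16 characters of the hex-encoded digest. That reading is an assumption (it could equally mean 16 raw bytes), and fileKey and pagePath are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// fileKey derives the per-URL file name stem.
// Assumption: hex digest truncated to 16 characters.
func fileKey(url string) string {
	sum := sha256.Sum256([]byte(url))
	return hex.EncodeToString(sum[:])[:16]
}

// pagePath builds the phase-1 output path for a URL under a project root.
func pagePath(root, url string) string {
	return path.Join(root, "1_pages", fileKey(url)+".json")
}

func main() {
	u := "https://example.com/pricing"
	fmt.Println(fileKey(u))
	fmt.Println(pagePath("new-projects/myproject", u))
}
```

A resume check then reduces to testing whether that path already exists before fetching the URL again.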
Storage layout
new-projects/{project-name}/
  1_pages/{urlhash}.json    # single Page per URL
  2_chunks/{urlhash}.json   # []Chunk per URL
  3_embeds/{urlhash}.json   # []EmbeddedChunk per URL
  3_embeds/bm25.json        # BM25 encoder snapshot
Document type support
Format | Library | Notes
HTML | go-readability v2 | Firefox Reader View algorithm. Fallback: DOM walk (main → article → body, stripping nav/header/footer/script)
PDF | ledongthuc/pdf | Text extraction. Scanned PDFs with no text layer are skipped.
DOCX | fumiama/go-docx | Paragraph and table text extraction
XLSX | xuri/excelize | All sheets as tab-separated text
TXT | - | Plain-text body served verbatim (no HTML parsing)
Semantic chunking vs simple chunking

Semantic Chunking (default)

One LLM call per page (extract tier, temp 0.1) detects natural topic boundaries. The LLM outputs anchor phrases in the format TOPIC: X | STARTS: Y. The original text is sliced at the detected boundaries, preserving the exact source text without LLM paraphrasing. Each chunk receives a topic label for embedding.

Simple Chunking (--no-llm-chunk)

Paragraph and heading-aware splitting. Text is split on double newlines and markdown headings. Short paragraphs are merged up to maxChars (default 1500). Chunks get numbered topic labels ("Part 1", "Part 2", etc.).
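
A simplified Go sketch of the paragraph-merging part of simple chunking. It splits on double newlines only and leaves out the markdown-heading handling; the Chunk type and the function name are hypothetical, and length is counted in runes so Cyrillic text is measured in characters rather than bytes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Chunk pairs a topic label with a slice of the source text.
type Chunk struct {
	Topic string
	Text  string
}

// simpleChunks merges consecutive paragraphs until adding the next one
// would push the chunk past maxChars, then starts a new chunk.
func simpleChunks(text string, maxChars int) []Chunk {
	var chunks []Chunk
	var cur strings.Builder
	curLen := 0
	flush := func() {
		if curLen == 0 {
			return
		}
		topic := fmt.Sprintf("Part %d", len(chunks)+1)
		chunks = append(chunks, Chunk{Topic: topic, Text: cur.String()})
		cur.Reset()
		curLen = 0
	}
	for _, p := range strings.Split(text, "\n\n") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		n := utf8.RuneCountInString(p)
		if curLen > 0 && curLen+2+n > maxChars {
			flush()
		}
		if curLen > 0 {
			cur.WriteString("\n\n")
			curLen += 2
		}
		cur.WriteString(p)
		curLen += n
	}
	flush()
	return chunks
}

func main() {
	for _, c := range simpleChunks("Intro.\n\nPricing starts at $99.\n\nContact us.", 30) {
		fmt.Printf("%s: %q\n", c.Topic, c.Text)
	}
}
```

A single paragraph longer than maxChars is kept whole in this sketch; a real splitter would need a further rule for that case.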

Topic prepending strategy

At index time, each chunk's text is prepended with its topic label before embedding:

Before: "Our basic plan starts at $99/month with unlimited support."
After:  "Topic: Pricing plans\n\nOur basic plan starts at $99/month with unlimited support."

The topic acts as a semantic anchor — the embedding now captures the chunk's theme, not just its surface content. A query about "costs" will match closer to a chunk anchored with "Pricing plans" even if the chunk text never mentions "costs".

Search queries do not need the topic prefix — the embedding space naturally aligns.

CLI usage examples
# Create a new project and seed immediately
bin/aichat-crawler init-project \
  --name myproject \
  --tg-api-key "123456:ABC-DEF" \
  --tg-lead-group -1001234567890 \
  --language en \
  --prompt1 "You are a sales consultant..." \
  --start-reply "Welcome! How can I help?" \
  --sitemap-url "https://example.com/sitemap.xml"

# Seed a project later (separate from creation)
bin/aichat-crawler seed \
  --project-id "uuid-from-init-output" \
  --project-name myproject \
  --sitemap-url "https://example.com/sitemap.xml"

# Add URLs to an existing project
bin/aichat-crawler add-urls \
  --project-name myproject \
  --urls "https://example.com/new-page1,https://example.com/new-page2"

# Set prompts from files
bin/aichat-crawler set \
  --project-name myproject prompt1 @prompts/prompt1.txt

Timer System

Automated follow-up sequences that fire when a user goes silent. LLM-driven classification decides the action at each step.

Follow-Up Timeline

  1. 5 min: light ping
  2. 15 min: second check-in
  3. 40 min: third check-in
  4. 24 hours: last interval; auto-dispatches the lead for any unresolved chat

Per-Fire Logic

When the timer fires, it runs the agent's tool-calling loop with a synthetic [TIMER PING] turn. Three outcomes are possible:

  • Agent calls send_lead: the lead is dispatched and the conversation is locked. No closing text is sent; the conversation is over.
  • Agent replies with text: the follow-up message is sent to the user and the timer advances to the next interval.
  • Agent stays silent: no message is sent and the timer advances. At the last interval (24h) the lead is dispatched anyway, regardless of client_status.
Lifecycle details
  1. Reset — Every incoming message resets the timer for that chat, starting the sequence from step 0.
  2. Fire — Fetches fresh chat state and history. Skips if already Finished + LeadSent. Otherwise invokes the agent's tool-calling loop with a synthetic [TIMER PING] turn (Agent.GeneratePing). The agent decides what to do.
  3. Last-interval safety net — At the last interval (24h) the scheduler dispatches the lead unconditionally for every unresolved chat, regardless of client_status. This is the catch-all so no lead is lost — hot clients should usually be dispatched earlier by the agent itself (the prompt nudges send_lead after 1–2 pings when contact is in hand), but if the agent never makes that call we still ship the lead at 24h with whatever was gathered.
  4. Persist — Timer state is written to PostgreSQL with the next trigger time. On restart, Reload() recovers all active timers, calculating remaining delay and firing overdue ones immediately.
  5. Lead retry — If a timer finds Finished=true, LeadSent=false, it retries the CRM send instead of running the agent.
  6. Cancel — When a user sends a new message, the old timer is cancelled (goroutine killed + DB record updated).

Goroutine safety: Each chat/project pair gets one goroutine. A current == self identity check prevents a superseded goroutine from cleaning up a newer timer's state.
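
The identity check can be illustrated without any real timers. In this Go sketch (Scheduler, timerHandle and the method names are hypothetical), each Reset installs a fresh handle and cancels the previous one; a goroutine that finishes late may only remove the map entry if the entry is still its own handle.

```go
package main

import (
	"fmt"
	"sync"
)

type chatKey struct {
	ChatID    int64
	ProjectID string
}

// timerHandle identifies one timer goroutine; cancel is closed to stop it.
type timerHandle struct{ cancel chan struct{} }

type Scheduler struct {
	mu     sync.Mutex
	timers map[chatKey]*timerHandle
}

func NewScheduler() *Scheduler {
	return &Scheduler{timers: make(map[chatKey]*timerHandle)}
}

// Reset cancels any existing timer for the chat and installs a new handle.
func (s *Scheduler) Reset(k chatKey) *timerHandle {
	s.mu.Lock()
	defer s.mu.Unlock()
	if old, ok := s.timers[k]; ok {
		close(old.cancel)
	}
	h := &timerHandle{cancel: make(chan struct{})}
	s.timers[k] = h
	return h
}

// cleanup removes the entry only if it still belongs to the caller
// (the "current == self" check). It reports whether it removed anything.
func (s *Scheduler) cleanup(k chatKey, self *timerHandle) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.timers[k] != self {
		return false // superseded: leave the newer timer alone
	}
	delete(s.timers, k)
	return true
}

// Active reports whether a timer is registered for the chat.
func (s *Scheduler) Active(k chatKey) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.timers[k]
	return ok
}

func main() {
	s := NewScheduler()
	k := chatKey{ChatID: 7, ProjectID: "demo"}
	old := s.Reset(k)
	cur := s.Reset(k) // user wrote again: old timer is superseded
	fmt.Println(s.cleanup(k, old), s.Active(k))
	fmt.Println(s.cleanup(k, cur), s.Active(k))
}
```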

Multi-Tenancy

Every data path uses composite keys (chat_id, project_id) for tenant isolation. Each project gets its own resources while sharing infrastructure.

Per Project (Isolated)

Telegram bot token
Qdrant collections
System prompts (JSONB)
Lead group (CRM target)
BM25 sparse encoder
Agent instance
Language locale

Shared (Infrastructure)

PostgreSQL instance
Qdrant instance
LLM provider pool
Timer scheduler
HTTP servers
Metrics endpoint
Data isolation details
Data | Table/Collection | Key
Chat state | chat_states | PRIMARY KEY (chat_id, project_id)
Chat history | chat_history | WHERE chat_id AND project_id
Timers | timers | PRIMARY KEY (chat_id, project_id)
BM25 index | bm25_indexes | PRIMARY KEY (project_id)
Search vectors | Qdrant | {project_name}_content_hybrid, {project_name}_source
Webhook routing | URL path | POST /webhook/{tg_api_key}
Runtime registration

Project bundles (Telegram client, agent, CRM, KB) live in an in-memory map. New projects can be registered at runtime via two paths:

  • Admin group: the /project_init command creates the project, seeds the KB, and registers the bundle instantly. No restart needed.
  • Internal API: POST /api/register-project/{id}, for the CLI or external tools that seed the DB independently.
Project onboarding flow (Telegram)

Step 1: Create a bot in BotFather, copy the token.

Step 2: Send one command in the admin project's lead group:

/project_init myproject sitemap https://example.com/sitemap.xml 123456:AAH...token

The system verifies the token, creates the project, seeds the KB, and replies with progress:

> Project "myproject" created (bot: @myproject_bot). Seeding started...
> Scraping completed: 42 pages
> Chunking completed: 156 chunks
> Embeddings completed: 156 vectors
> Upload completed: 42 pages, 156 chunks indexed
> Project "myproject" init finished!
> Test the bot: https://t.me/myproject_bot
> To create the CRM group, click:
> https://t.me/myproject_bot?startgroup=connect_myproject

Step 3: Click the deep link. Telegram opens the "create group" UI with the bot pre-added. The bot auto-detects the group and connects it as the lead group. No group ID needed — discovered automatically via the /start connect_{name} command.

Admin commands reference
Command | Description
/project_init <name> sitemap <url> <token> | Create project, crawl sitemap, seed KB
/project_init <name> urls <url1,url2> <token> | Create project from individual URLs
/project_prompt1 <name> <text> | Update the project's system prompt
/project_price_url <name> <url> | Set price URL hint (used by the agent's hybrid_search)
/project_stats <name> [period] | Show chat statistics
/project_info <name> | Show project details
/project_add_urls <name> <urls> | Add URLs to KB
/project_delete_urls <name> <urls> | Delete URL chunks from KB
/project_delete <name> | Soft-delete project
/project_restore <name> | Restore soft-deleted project

Lead group commands (per project)

Command | Description
/help | Show all available commands
/prompt | Show current system prompt
/prompt1 [text] | Show or update the project's system prompt
/price_url [url] | Show or set the price URL hint
/stats [period] | Show chat statistics (e.g. 1 week, 3 days)
/info | Show project details

User chat commands

Command | Description
/start | Reset conversation state and return the project's greeting
/debug | Toggle debug mode (appends trace info to bot replies)
Database schema

The first three migrations in migrations/ define the core tables; later migrations (up to 013) extend them:

001_initial.sql

CREATE TABLE projects (
  id          UUID PRIMARY KEY,
  name        TEXT UNIQUE,
  tg_lead_group BIGINT,
  tg_api_key  TEXT UNIQUE,
  prompts     JSONB,
  language    TEXT DEFAULT 'ru'
);

CREATE TABLE chat_history (
  id          UUID PRIMARY KEY,
  chat_id     BIGINT,
  project_id  UUID REFERENCES projects(id),
  question    TEXT,
  reply       TEXT,
  dbinfo      JSONB,
  created_at  TIMESTAMPTZ
);

CREATE TABLE chat_states (
  chat_id     BIGINT,
  project_id  UUID REFERENCES projects(id),
  state       JSONB,
  PRIMARY KEY (chat_id, project_id)
);

002_bm25_index.sql

CREATE TABLE bm25_indexes (
  project_id  UUID PRIMARY KEY REFERENCES projects(id),
  vocab       JSONB,
  idf         JSONB,
  avg_dl      FLOAT,
  num_docs    INT,
  updated_at  TIMESTAMPTZ
);

003_timers.sql

CREATE TABLE timers (
  chat_id     BIGINT,
  project_id  UUID REFERENCES projects(id),
  step        INT,
  trigger_at  TIMESTAMPTZ,
  PRIMARY KEY (chat_id, project_id)
);

Key indexes

  • idx_chat_history_chat_id on (chat_id, created_at) — history lookup
  • idx_chat_states_project on (project_id) — project-level queries
  • idx_chat_states_state_gin GIN on (state) — JSONB queries on state

CRM Lead Output

When a lead is dispatched, the following is sent to the project's Telegram lead group.

1. Forward last message: the user's last message is forwarded so managers can click on the profile to start a direct conversation.
2. Summary message: LLM-generated summary containing lead ID, name, city, service, contact method, shared contact (if available), client status (finished/hot/cold), and file count.
3. Chat history document: full timestamped Q/A transcript uploaded as lead{chatID}.txt, sent as a reply to the summary message.
4. Media files: grouped by type (photos and videos as one album, documents as another album, voice and video notes individually), all sent as replies to the summary message.

Contact persistence: Contacts are extracted once and reused across turns. Shared contacts from the Telegram button are stored in chat state (not in history) and included in the lead summary when dispatched.

CRM interface design

type CRM interface {
    SendLead(ctx context.Context, lead domain.Lead) (string, error)
    Name() string
}

Current implementation: TelegramCRM. The interface is designed for future CRM backends (Bitrix, AmoCRM) via the MultiCRM dispatcher pattern.
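
The dispatcher pattern mentioned above could look roughly like the sketch below. It is illustrative only: the Lead struct stands in for domain.Lead (whose fields are not shown here), and the MultiCRM shape, its Backends field, and the stubCRM test double are assumptions, not the project's actual code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Lead stands in for domain.Lead, whose fields are not shown in this document.
type Lead struct {
	ChatID  int64
	Summary string
}

// CRM mirrors the interface above, with the local Lead type substituted.
type CRM interface {
	SendLead(ctx context.Context, lead Lead) (string, error)
	Name() string
}

// MultiCRM (hypothetical shape) forwards a lead to every backend in order.
// It keeps going after a failure so one broken CRM does not block the others.
type MultiCRM struct {
	Backends []CRM
}

func (m MultiCRM) Name() string { return "multi" }

func (m MultiCRM) SendLead(ctx context.Context, lead Lead) (string, error) {
	var ids []string
	var errs []error
	for _, b := range m.Backends {
		id, err := b.SendLead(ctx, lead)
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", b.Name(), err))
			continue
		}
		ids = append(ids, b.Name()+":"+id)
	}
	// errors.Join returns nil when errs is empty.
	return strings.Join(ids, ","), errors.Join(errs...)
}

// stubCRM is a test double used only for this demonstration.
type stubCRM struct {
	name string
	fail bool
}

func (s stubCRM) Name() string { return s.name }

func (s stubCRM) SendLead(_ context.Context, lead Lead) (string, error) {
	if s.fail {
		return "", errors.New("unavailable")
	}
	return fmt.Sprintf("lead%d", lead.ChatID), nil
}

// dispatchDemo sends one lead through a working and a failing backend.
func dispatchDemo() (string, error) {
	m := MultiCRM{Backends: []CRM{
		stubCRM{name: "telegram"},
		stubCRM{name: "bitrix", fail: true},
	}}
	return m.SendLead(context.Background(), Lead{ChatID: 42, Summary: "demo"})
}

func main() {
	ids, err := dispatchDemo()
	fmt.Println("ids:", ids)
	fmt.Println("err:", err)
}
```

Because MultiCRM itself satisfies CRM, callers would not need to know whether they hold one backend or several.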

Deployment

NixOS-native deployment with systemd service hardening, sops-nix secrets, and Cloudflare Tunnel.

Service Dependency Chain

postgresql → aichat-migrate → aichat-bot
qdrant → aichat-bot (aichat-bot also depends on qdrant)

Security Hardening

Setting | Value | Purpose
NoNewPrivileges | true | Prevent privilege escalation
ProtectSystem | strict | Read-only filesystem except allowed paths
ProtectHome | true | No access to /home
PrivateTmp | true | Isolated /tmp
MemoryMax | 512M | Memory limit (OOM protection)

NixOS module configuration

services.aichat = {
  enable = true;
  configFile = "/run/secrets/aichat-config.json";
  listenAddr = ":8080";

  # Database
  enablePostgres = true;
  dbName = "aichat";
  postgresPort = 5432;

  # Qdrant
  enableQdrant = true;
  qdrantPort = 6333;
  qdrantDataDir = "/var/lib/qdrant";

  # Backup
  enableBackup = true;
  backupDir = "/var/backup/aichat";
  backupRetentionDays = 14;
};

Backup & secrets

Backup

  • Daily systemd timer with RandomizedDelaySec=1h
  • pg_dump + Qdrant snapshots
  • 14-day retention with automatic cleanup

Secrets (sops-nix)

  • Encrypted in secrets/production.yaml using age keys
  • Decrypted at boot on the target machine using its SSH host key
  • Keys: aichat_config (full JSON config), cloudflared_creds (tunnel credentials)

Quick deploy commands

# Deploy code changes to production
make prod-deploy

# SSH tunnels for crawler access to prod DB + Qdrant
make prod-tunnel
# PostgreSQL on localhost:15432, Qdrant on localhost:16333

# Seed a project on prod (while tunnel is open)
./bin/aichat-crawler --config /tmp/config.prod-tunnel.json init-project \
  --name myproject --tg-api-key "TOKEN" --language ru \
  --no-llm-chunk --workers 1 --urls "URL1,URL2,..."

# Fetch trace files from prod
ssh root@server "cat /var/lib/aichat/traces/{chatID}/{dialogID}.log"

Webhook setup

After the bot is running and accessible via HTTPS, register the webhook with Telegram:

curl -X POST "https://api.telegram.org/bot{TG_API_KEY}/setWebhook" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-domain.com/webhook/{TG_API_KEY}"}'

The webhook URL must exactly match https://{host}/webhook/{tg_api_key} where tg_api_key is the bot token from the projects table. The API key in the URL routes incoming updates to the correct project bundle.
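
As a rough illustration of the routing rule above, the sketch below extracts the key from the path and looks up the project. The registry type is a hypothetical in-memory stand-in for a lookup against the projects table, and the token and project name are made-up values; the real handler is not shown in this document.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// registry maps a bot token to a project name. It is a hypothetical in-memory
// stand-in for a lookup against projects.tg_api_key.
type registry map[string]string

// ServeHTTP accepts only /webhook/{tg_api_key} and rejects everything else.
func (r registry) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	key, ok := strings.CutPrefix(req.URL.Path, "/webhook/")
	if !ok || key == "" || strings.Contains(key, "/") {
		http.NotFound(w, req)
		return
	}
	project, known := r[key]
	if !known {
		http.NotFound(w, req)
		return
	}
	fmt.Fprintf(w, "routed to %s", project)
}

// route issues a fake POST against the handler and returns status and body.
func route(r registry, path string) (int, string) {
	rec := httptest.NewRecorder()
	r.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	reg := registry{"123456:ABC-DEF": "myproject"}
	fmt.Println(route(reg, "/webhook/123456:ABC-DEF"))
	fmt.Println(route(reg, "/webhook/unknown-token"))
}
```

Returning 404 for unknown keys means a mistyped webhook URL fails visibly at registration time instead of silently delivering updates to the wrong tenant.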

Manual / Docker deployment

Prerequisites: Go 1.22+, PostgreSQL 16, Qdrant

# Build all binaries
make build
# produces bin/aichat-bot, bin/aichat-crawler, bin/aichat-cli

# Run migrations
make migrate DATABASE_URL="postgres://user:pass@localhost/aichat"

# Run the bot with JSON config
bin/aichat-bot --config config.prod.json

# Or with environment variables
DATABASE_URL="postgres://user:pass@localhost/aichat" \
OPENAI_API_KEYS="sk-xxx" \
bin/aichat-bot

Cloudflare Tunnel setup

Production uses Cloudflare Tunnel to route HTTPS traffic to the bot without exposing ports. The tunnel routes aichat.example.com to localhost:8080 on the server.

  • Tunnel credentials managed via sops-nix (encrypted at rest)
  • Configured in nix/production.nix
  • No need for TLS certificates or reverse proxy configuration
  • Telegram webhooks point to the tunnel URL

Localization

User-facing strings and LLM prompt templates are localized via JSON locale files embedded at compile time.

Aspect | Details
Supported languages | ru (default), en
Locale files | internal/i18n/locales/ru.json, internal/i18n/locales/en.json
API | i18n.T(lang, key) for strings, i18n.Tf(lang, key, args...) for formatted strings
Fallback chain | Requested lang → "ru" → key itself
Coverage | 36+ keys: agent replies, LLM prompts, history formatting, timer messages, CRM lead formatting, chunker prompts
Per-project | Set in projects.language column, propagates to agent, CRM, and timer
Adding a locale | Create internal/i18n/locales/{code}.json with the same keys; auto-loaded via //go:embed
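
The fallback chain in the table can be sketched as follows. This is a minimal illustration, not the i18n package itself: the locales map replaces the embedded JSON files, and the keys and strings in it are invented for the example.

```go
package main

import "fmt"

// locales is a hypothetical in-memory stand-in for the embedded JSON files.
var locales = map[string]map[string]string{
	"ru": {
		"greeting":    "Здравствуйте!",
		"timer.nudge": "Вы ещё здесь?",
		"lead.files":  "Файлов: %d",
	},
	"en": {
		"greeting":   "Hello!",
		"lead.files": "Files: %d",
	},
}

const defaultLang = "ru"

// T resolves key via: requested lang -> "ru" -> key itself.
func T(lang, key string) string {
	if s, ok := locales[lang][key]; ok {
		return s
	}
	if s, ok := locales[defaultLang][key]; ok {
		return s
	}
	return key
}

// Tf resolves the template with T and formats it with args.
func Tf(lang, key string, args ...any) string {
	return fmt.Sprintf(T(lang, key), args...)
}

func main() {
	fmt.Println(T("en", "greeting"))      // found in en
	fmt.Println(T("en", "timer.nudge"))   // missing in en, falls back to ru
	fmt.Println(T("de", "no.such.key"))   // unknown lang and key, returns the key
	fmt.Println(Tf("en", "lead.files", 3)) // formatted string
}
```

Returning the key itself as the last resort keeps a missing translation visible in the output without crashing the bot.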

Monitoring

Prometheus metrics exposed at GET /metrics on the internal server (localhost:9090 by default). Metrics are applied via decoration — core packages have zero Prometheus imports.
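
One way to read "applied via decoration" is a wrapper that implements the same interface as the core type and records each call before returning. The sketch below assumes hypothetical Completer and Recorder interfaces and uses a toy in-memory recorder in place of Prometheus collectors; none of these names come from the codebase.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Completer is a hypothetical core interface; it knows nothing about metrics.
type Completer interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// Recorder hides the metrics backend. A production implementation would
// update Prometheus collectors; core code only ever sees this interface.
type Recorder interface {
	Observe(provider, method, status string, elapsed time.Duration)
}

// instrumented decorates a Completer and reports every call to a Recorder.
type instrumented struct {
	next     Completer
	provider string
	rec      Recorder
}

func (i instrumented) Complete(ctx context.Context, prompt string) (string, error) {
	start := time.Now()
	out, err := i.next.Complete(ctx, prompt)
	status := "ok"
	if err != nil {
		status = "error"
	}
	i.rec.Observe(i.provider, "complete", status, time.Since(start))
	return out, err
}

// callCounts is a toy Recorder that counts calls per status label.
type callCounts map[string]int

func (c callCounts) Observe(_, _, status string, _ time.Duration) { c[status]++ }

// stubLLM fails when asked to, so both status values can be exercised.
type stubLLM struct{ fail bool }

func (s stubLLM) Complete(context.Context, string) (string, error) {
	if s.fail {
		return "", errors.New("upstream error")
	}
	return "hello", nil
}

// countDemo runs one successful and one failing call through the decorator.
func countDemo() callCounts {
	counts := callCounts{}
	ctx := context.Background()
	good := instrumented{next: stubLLM{}, provider: "openrouter", rec: counts}
	bad := instrumented{next: stubLLM{fail: true}, provider: "openrouter", rec: counts}
	_, _ = good.Complete(ctx, "hi")
	_, _ = bad.Complete(ctx, "hi")
	return counts
}

func main() {
	fmt.Println(countDemo())
}
```

The labels passed to Observe match the provider, method, and status labels listed for the LLM metrics below.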

LLM Metrics

Metric | Type | Labels | Description
aichat_llm_requests_total | counter | provider, method, status | Total LLM API calls
aichat_llm_request_duration_seconds | histogram | provider, method | LLM call latency (0.1s-51s buckets)

Telegram Metrics

Metric | Type | Labels | Description
aichat_telegram_requests_total | counter | method, status | Telegram API calls
aichat_telegram_request_duration_seconds | histogram | method | Telegram API latency

Webhook & Agent Metrics

Metric | Type | Labels | Description
aichat_webhook_requests_total | counter | project, status | Inbound webhook requests
aichat_webhook_request_duration_seconds | histogram | project | Webhook processing time
aichat_agent_messages_total | counter | project, stage | Messages processed by stage

Timer & CRM Metrics

Metric | Type | Labels | Description
aichat_timer_active_sequences | gauge | - | Currently active timer sequences
aichat_timer_fires_total | counter | action | Timer fire outcomes (follow_up, lead, skip)
aichat_crm_leads_total | counter | project, status | Leads sent to CRM

Alert recommendations

Condition | PromQL Query
LLM errors spiking | rate(aichat_llm_requests_total{status="error"}[5m]) > 0.1
LLM latency high | histogram_quantile(0.95, rate(aichat_llm_request_duration_seconds_bucket[5m])) > 30
Webhook errors | rate(aichat_webhook_requests_total{status="error"}[5m]) > 0.05
Timers stuck | sum(rate(aichat_timer_fires_total[1h])) == 0 and on() aichat_timer_active_sequences > 0
CRM failures | rate(aichat_crm_leads_total{status="error"}[5m]) > 0

Grafana dashboard tips

  • Group LLM metrics by provider to compare provider performance and error rates
  • Track aichat_agent_messages_total by stage to see funnel conversion rates (active → final)
  • Watch aichat_timer_active_sequences as a proxy for total active conversations
  • Use aichat_webhook_request_duration_seconds p95 to spot slow responses (typical: 2-5s including LLM + search)
  • Monitor aichat_crm_leads_total by project to track lead generation per tenant

Label reference

Label | Values | Used In
provider | openai, claude, together, openrouter, tiered | LLM metrics
method | complete, embed (LLM); sendMessage, sendMediaGroup, etc. (Telegram) | LLM, Telegram metrics
status | ok, error | All counter metrics
project | Project name string | Webhook, Agent, CRM metrics
stage | active (in-progress conversation) or final (lead dispatched) | Agent metrics
action | follow_up, lead, skip | Timer metrics

Configuration Reference

Configuration is read from a JSON file (--config config.json), with environment variables as a fallback. JSON config is recommended for production.

Annotated config.example.json

{
  // PostgreSQL connection string
  "database_url": "postgresql://user:pass@localhost/aichat",

  // Qdrant vector database
  "qdrant": {
    "url": "http://localhost:6333",
    "api_key": ""
  },

  // Public HTTP server (webhooks)
  "listen_addr": ":8080",
  // Internal HTTP server (health, metrics, register-project)
  "internal_addr": "localhost:9090",
  // Base URL for webhook registration (must be HTTPS in production)
  "webhook_url": "https://bot.example.com",

  // LLM providers: any OpenAI-compatible API
  "providers": {
    "openrouter": {
      "keys": ["sk-or-v1-your-key"],
      "base_url": "https://openrouter.ai/api/v1"
    },
    "together": {
      "keys": ["your-together-key"],
      "base_url": "https://api.together.xyz/v1"
    }
  },

  // Model slots: "provider/model" format (split on first /)
  "models": {
    "toolcaller": "openrouter/z-ai/glm-5.1",                                // agent's tool-calling loop
    "extract":    "openrouter/mistralai/mistral-small-3.1-24b-instruct",    // crawler topics, timer classification
    "embedding":  "openrouter/qwen/qwen3-embedding-8b"                       // vectors
  },

  // Crawler-specific settings
  "crawler": {
    "chunk_size": 1500,       // chars per chunk (simple mode)
    "workers": 4,             // parallel crawl workers
    "no_llm_chunk": false,     // true = paragraph-based, false = LLM semantic
    "embedding_dim": 1024     // Matryoshka truncation dimension
  },

  "debug": false
}

Environment variable fallback

Variable | Required | Default | Description
DATABASE_URL | Yes | - | PostgreSQL connection string
OPENAI_API_KEYS | At least one of the two key variables | - | Comma-separated OpenAI API keys
CLAUDE_API_KEYS | At least one of the two key variables | - | Comma-separated Claude API keys
QDRANT_URL | No | http://localhost:6333 | Qdrant REST API URL
QDRANT_API_KEY | No | - | Qdrant authentication key
LISTEN_ADDR | No | :8080 | Public HTTP server address
INTERNAL_ADDR | No | localhost:9090 | Internal HTTP server address
DEBUG | No | - | Enable debug logging (JSON)
AGENT_TRACE | No | - | Set to 1 for per-dialog trace files
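
A minimal sketch of how the environment fallback in the table might be read, assuming comma-separated key lists are trimmed and empty entries dropped. The envConfig struct, loadEnv, and splitKeys are hypothetical names for illustration; only the variable names and defaults come from the table.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// envConfig holds a subset of the variables from the table above.
type envConfig struct {
	DatabaseURL  string
	OpenAIKeys   []string
	ClaudeKeys   []string
	QdrantURL    string
	ListenAddr   string
	InternalAddr string
}

// splitKeys parses a comma-separated list, trimming spaces and dropping empties.
func splitKeys(raw string) []string {
	var keys []string
	for _, k := range strings.Split(raw, ",") {
		if k = strings.TrimSpace(k); k != "" {
			keys = append(keys, k)
		}
	}
	return keys
}

// loadEnv reads settings through getenv so tests can supply a fake environment.
func loadEnv(getenv func(string) string) (envConfig, error) {
	orDefault := func(name, def string) string {
		if v := getenv(name); v != "" {
			return v
		}
		return def
	}
	cfg := envConfig{
		DatabaseURL:  getenv("DATABASE_URL"),
		OpenAIKeys:   splitKeys(getenv("OPENAI_API_KEYS")),
		ClaudeKeys:   splitKeys(getenv("CLAUDE_API_KEYS")),
		QdrantURL:    orDefault("QDRANT_URL", "http://localhost:6333"),
		ListenAddr:   orDefault("LISTEN_ADDR", ":8080"),
		InternalAddr: orDefault("INTERNAL_ADDR", "localhost:9090"),
	}
	if cfg.DatabaseURL == "" {
		return envConfig{}, errors.New("DATABASE_URL is required")
	}
	if len(cfg.OpenAIKeys)+len(cfg.ClaudeKeys) == 0 {
		return envConfig{}, errors.New("set OPENAI_API_KEYS or CLAUDE_API_KEYS")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadEnv(os.Getenv)
	fmt.Println(cfg, err)
}
```

Passing getenv as a parameter keeps the validation rules testable without touching the process environment.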

Model slots explained

Model identifiers use "provider/model" format. The identifier is split on the first / character: the part before it is the provider name, which is matched against the providers map, and the remainder is the model name (which may itself contain slashes).

Slot | Config Key | Purpose | Notes
toolcaller | models.toolcaller | The agent's tool-calling loop | Must be a tool-capable model. Tested in production: anthropic/claude-sonnet-4-6, openai/gpt-5.4, openrouter/z-ai/glm-5.1 (current default), openrouter/z-ai/glm-4.6.
extract | models.extract | Crawler chunk topics, timer status classification, follow-up text | Smaller, faster model. Single-shot completions.
embedding | models.embedding | Vector embeddings for search | Batch API, Matryoshka truncation to configured dim

Extract model constraint: Must return content in choices[].message.content field. Thinking/reasoning models that use a reasoning field will not work for the extract slot.
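
The split-on-first-slash rule for model slots can be sketched with strings.Cut. The provider struct and resolveSlot function below are hypothetical names chosen for the example; only the "provider/model" convention and the sample identifiers come from this document.

```go
package main

import (
	"fmt"
	"strings"
)

// provider mirrors one entry of the "providers" config map (illustrative shape).
type provider struct {
	Keys    []string
	BaseURL string
}

// resolveSlot splits "provider/model" on the first "/" and looks the provider up.
// Everything after the first slash is kept as the model name, slashes included.
func resolveSlot(slot string, providers map[string]provider) (provider, string, error) {
	name, model, ok := strings.Cut(slot, "/")
	if !ok || name == "" || model == "" {
		return provider{}, "", fmt.Errorf("model slot %q is not in provider/model form", slot)
	}
	p, known := providers[name]
	if !known {
		return provider{}, "", fmt.Errorf("unknown provider %q in slot %q", name, slot)
	}
	return p, model, nil
}

func main() {
	providers := map[string]provider{
		"openrouter": {BaseURL: "https://openrouter.ai/api/v1"},
	}
	p, model, err := resolveSlot("openrouter/z-ai/glm-5.1", providers)
	fmt.Println(p.BaseURL, model, err)
}
```

Splitting only on the first slash is what lets OpenRouter-style model names such as z-ai/glm-5.1 pass through unchanged.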

Key Thresholds Reference

All configurable and hardcoded constants that control system behavior.

Agent Thresholds

Constant | Value | Purpose | Defined In
Tool-loop iteration cap | 8 | Max ToolCaller round-trips per user turn before falling back to a canned safety reply | internal/agent/toolagent.go
hybrid_search top-k | 3 | Hard server-side cap on results returned to the model regardless of requested top_k | internal/agent/toolagent.go
minSearchScore | 0.5 | Minimum DBSF fusion score to include a result | internal/search/hybrid.go
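
The two search constants above can be read as a filter followed by a cap. The sketch below assumes that order (score filter first, then truncation) and a simple hit struct; both are assumptions for illustration, since the actual code in internal/search/hybrid.go is not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	minSearchScore = 0.5 // minimum fusion score, from the table above
	maxTopK        = 3   // server-side cap, from the table above
)

// hit is an illustrative search result; the real result type is not shown.
type hit struct {
	URL   string
	Score float64
}

// capResults drops hits below minSearchScore, sorts the rest by score
// (highest first), and returns at most the smaller of requested and maxTopK.
func capResults(hits []hit, requested int) []hit {
	kept := make([]hit, 0, len(hits))
	for _, h := range hits {
		if h.Score >= minSearchScore {
			kept = append(kept, h)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	limit := maxTopK
	if requested > 0 && requested < maxTopK {
		limit = requested
	}
	if len(kept) > limit {
		kept = kept[:limit]
	}
	return kept
}

func main() {
	hits := []hit{
		{"/a", 0.9}, {"/b", 0.4}, {"/c", 0.7}, {"/d", 0.6}, {"/e", 0.55},
	}
	// The model asked for 10 results, but the cap still applies.
	fmt.Println(capResults(hits, 10))
}
```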

Timer Intervals

Step | Delay | Behavior
1 | 5 minutes | First check-in; the agent decides whether to ping or stay silent
2 | 15 minutes | Second check-in
3 | 40 minutes | Third check-in
4 | 24 hours | Last interval (safety net). The lead is dispatched unconditionally for any unresolved chat (regardless of client_status) so no lead is lost.
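
The interval table maps naturally onto a lookup by step number. The sketch below assumes each delay is measured from the previous event (the last user message or the previous timer fire); that reference point, and the function names, are assumptions rather than documented behavior.

```go
package main

import (
	"fmt"
	"time"
)

// followUpDelays lists the four intervals from the table, indexed by step-1.
var followUpDelays = []time.Duration{
	5 * time.Minute,
	15 * time.Minute,
	40 * time.Minute,
	24 * time.Hour,
}

// delayForStep returns the wait for a 1-based step; ok is false outside 1..4.
func delayForStep(step int) (time.Duration, bool) {
	if step < 1 || step > len(followUpDelays) {
		return 0, false
	}
	return followUpDelays[step-1], true
}

// nextTrigger computes a trigger_at value for the step, measured from `from`.
func nextTrigger(step int, from time.Time) (time.Time, bool) {
	d, ok := delayForStep(step)
	if !ok {
		return time.Time{}, false
	}
	return from.Add(d), true
}

// isFinalStep reports whether the step is the unconditional 24-hour safety net.
func isFinalStep(step int) bool { return step == len(followUpDelays) }

func main() {
	from := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	for step := 1; step <= 5; step++ {
		at, ok := nextTrigger(step, from)
		fmt.Println(step, ok, at.Format(time.RFC3339), isFinalStep(step))
	}
}
```

A row in the timers table would then hold the current step and the computed trigger_at for one (chat_id, project_id) pair.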

Crawler Constants

Constant | Value | Purpose | Defined In
defaultEmbeddingDim | 1536 | Default embedding dimension (text-embedding-3-small) | internal/crawler/indexer.go
upsertBatch | 50 | Points per Qdrant upsert batch | internal/crawler/indexer.go
Default chunk size | 1500 chars | Simple chunking mode character limit | internal/crawler/chunker.go
Embed batch size | 32 chunks | Chunks per embedding API call | internal/crawler/pipeline.go
Default workers | 3 | Parallel crawl workers | cmd/crawler/main.go

LLM Temperatures

Temperature | Used For
0.1 | Semantic chunking (crawler)
0.3 | ToolCaller default; timer status classification; follow-up generation

Some tool-capable models (GPT-5, o-series) ignore explicit temperature and use a fixed value — the OpenAI tools backend handles this automatically.

LLM Provider

Constant | Value | Purpose
Max backoff | 60 seconds | Cap on exponential backoff for 429 retries
Backoff formula | 2^attempt + 10% jitter | Exponential with jitter, respects Retry-After header
Retries per chain | 3 | Max 429 retries before falling through to next provider
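
A sketch of the backoff rule in the table, under stated assumptions: the base unit is seconds, attempts are counted from 0, jitter adds up to 10% of the base delay, and a Retry-After value takes precedence over the computed delay. The table does not spell these details out, so treat them as one plausible reading.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	maxBackoff = 60 * time.Second // cap from the table above
	maxRetries = 3                // 429 retries before trying the next provider
)

// backoff returns the wait before retrying after a 429.
// attempt is 0-based; jitter is a random fraction in [0, 1).
// A positive retryAfter (parsed from the Retry-After header) wins outright;
// that precedence is an assumption of this sketch.
func backoff(attempt int, retryAfter time.Duration, jitter float64) time.Duration {
	if retryAfter > 0 {
		return retryAfter
	}
	if attempt < 0 {
		attempt = 0
	}
	if attempt > 6 { // 2^6 s already exceeds the cap; also avoids shift overflow
		attempt = 6
	}
	base := time.Duration(1<<attempt) * time.Second
	wait := base + time.Duration(jitter*0.1*float64(base))
	if wait > maxBackoff {
		wait = maxBackoff
	}
	return wait
}

func main() {
	for attempt := 0; attempt < maxRetries; attempt++ {
		fmt.Println(attempt, backoff(attempt, 0, rand.Float64()))
	}
}
```

Taking jitter as a parameter instead of calling rand inside the function keeps the delay calculation deterministic under test.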

Telegram

Constant | Value | Purpose
Max message length | 4096 chars | Telegram API limit; messages auto-chunked at this boundary
Retry on 429 | Yes | Respects Retry-After header from Telegram API
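
Chunking at the 4096-character boundary might be implemented along these lines. The sketch counts runes as an approximation of Telegram's character count and splits at the hard boundary with no attempt to break on whitespace; both choices are assumptions, and chunkMessage is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

const maxMessageLen = 4096 // Telegram limit from the table above

// chunkMessage splits text into pieces of at most limit runes.
// Splitting on runes (not bytes) keeps multi-byte characters intact.
func chunkMessage(text string, limit int) []string {
	if limit <= 0 {
		return nil
	}
	runes := []rune(text)
	var chunks []string
	for len(runes) > limit {
		chunks = append(chunks, string(runes[:limit]))
		runes = runes[limit:]
	}
	if len(runes) > 0 {
		chunks = append(chunks, string(runes))
	}
	return chunks
}

// chunkDemo builds a 5000-character Cyrillic message and chunks it.
func chunkDemo() []string {
	return chunkMessage(strings.Repeat("я", 5000), maxMessageLen)
}

func main() {
	for i, c := range chunkDemo() {
		fmt.Println(i, len([]rune(c)))
	}
}
```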

Performance Reference

Webhook Latency: 2-5s
Search (Qdrant): ~500ms
Per Turn: ~1KB
Per Vector: ~8KB

Webhook latency includes one or more ToolCaller round-trips plus search. Search latency covers Qdrant DBSF fusion. Storage estimates: ~1KB per conversation turn, ~100KB per 10K chunks, ~8KB per dense vector (1024 dims).

Build and test commands
# Build all three binaries
make build        # produces bin/aichat-bot, bin/aichat-crawler, bin/aichat-cli

# Run tests
make test         # go test ./...

# Lint
make lint         # go vet ./...

# Local development services (PostgreSQL + Qdrant via Docker)
make dev-services

# Run the CLI interface for testing (no Telegram needed)
bin/aichat-cli --config config.dev.json --project myproject

Built with Go, Qdrant, PostgreSQL, and OpenRouter

  • 3 Entry Points: bot, crawler, cli
  • 4 Tools: hybrid_search, set_state, get_state, send_lead
  • 2 Locales (ru/en)

This documentation is generated from source code and docs/*.md files.