aichat-go
Multi-tenant AI Sales Bot Platform
Tool-calling agent
The model owns the conversation flow via four tools: hybrid_search, set_state, get_state, send_lead. No hand-tuned thresholds, no stage transitions.
Hybrid Search
Dense embeddings + BM25 sparse vectors with Qdrant DBSF fusion. The agent model is the reranker — raw fused results go straight into the tool-call response.
Multi-backend ToolCaller
Anthropic Messages API or OpenAI tools spec (covers OpenRouter routing to GLM, DeepSeek, Kimi, etc.). Multi-key pool with round-robin rotation and prompt caching where supported.
Smart Follow-Ups
4-step timer sequence (5m / 15m / 40m / 24h) with LLM-driven client classification. Automatically detects cold, hot, and finished conversations.
Multi-Tenant
Full project isolation with composite keys. Each tenant gets its own bot, collections, prompts, and lead group while sharing infrastructure.
Incremental Crawler
4-phase pipeline (fetch / chunk / embed / upload) with per-URL resume. Supports HTML, PDF, DOCX, XLSX, TXT. Semantic chunking with LLM topic labels.
Core Concepts
- Project
- A tenant: one business with its own bot, knowledge base, prompts, and CRM target. All data paths use (chat_id, project_id) composite keys.
- Stage
- Either StageActive (in-progress conversation) or StageFinal (lead dispatched, conversation locked). Derived from the Finished field, so there is no separate stage field to get out of sync.
- Knowledge Base
- Per-project Qdrant collections containing chunked website content with both dense (embedding) and sparse (BM25) vectors.
- Determined URL
- A service page URL the agent commits to via set_state(determined_url=...). Used as a filter for follow-up hybrid_search calls.
- Lead
- The output: contact info + conversation summary + chat history + files, sent to the CRM lead group.
- Timer
- Automated follow-up scheduler. When a user goes silent, the timer fires the agent's tool-calling loop with a synthetic [TIMER PING] turn; the agent decides whether to send a nudge or dispatch the lead.
- ToolCaller
- The LLM client used by the agent. Multi-turn tool-using conversations. Two backends: Anthropic Messages API native, and OpenAI tools spec (covers OpenRouter-routed GLM, DeepSeek, Kimi, etc.).
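The Stage entry above says the stage is derived rather than stored. A minimal sketch of that derivation, assuming a trimmed-down ChatState (the real struct has more fields, and the exact method signature is a guess):

```go
package main

// Stage is the derived conversation stage; names follow the Core Concepts list.
type Stage int

const (
	StageActive Stage = iota // in-progress conversation
	StageFinal               // lead dispatched, conversation locked
)

// ChatState is a hypothetical, reduced version of the persisted state.
// Only the field needed for stage derivation is shown.
type ChatState struct {
	Finished bool
}

// CurrentStage computes the stage from Finished instead of reading a stored
// stage field, so the two can never disagree.
func (s ChatState) CurrentStage() Stage {
	if s.Finished {
		return StageFinal
	}
	return StageActive
}
```

Because nothing is stored, a zero-value ChatState is always StageActive.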
Architecture Overview
End-to-end data flow from Telegram webhook to CRM lead dispatch. The agent runs a tool-calling loop — the model itself decides when to search, what to commit to, and when to dispatch the lead.
Stage is derived from the Finished field. The agent owns flow via four tools (hybrid_search, set_state, get_state, send_lead), with no hand-tuned thresholds. Metrics are applied via decoration (zero Prometheus imports in core packages). All data paths use composite keys (chat_id, project_id) for tenant isolation.
Component details
| Component | Location | Purpose |
|---|---|---|
| Gateway | internal/gateway/ | Normalizes TelegramUpdate into domain.Message (text, caption, files, sender) |
| Agent | internal/agent/ | Tool-calling loop: rebuilds the message thread from chat_history.blocks, dispatches tool calls, returns the final text reply |
| ToolCaller | internal/llm/toolcaller*.go | Multi-backend tool-calling client: Anthropic Messages API + OpenAI tools spec (covers OpenRouter routing). Multi-key pool with rate-limit fallback. |
| Provider (extract / embedding) | internal/llm/openai.go, claude.go | Single-shot completions for crawler chunk topics, timer status classification, and embeddings. Routed via the extract / embedding tiers. |
| Hybrid Search | internal/search/ | Dense embeddings + BM25 sparse vectors with Qdrant DBSF fusion. The agent model is the reranker — raw fused results go straight back as the hybrid_search tool response. |
| CRM | internal/crm/ | Dispatches leads to Telegram group (summary + history + media). Two-phase persist guarantees idempotency. |
| Timer | internal/timer/ | Automated follow-ups. Fires the agent's tool-calling loop with a synthetic [TIMER PING] turn; the agent decides what to do. |
| Crawler | internal/crawler/ | 4-phase pipeline: fetch, chunk, embed, upload to Qdrant |
| Metrics | internal/metrics/ | Prometheus wrappers (zero-coupled with core packages) |
| Admin | internal/admin/ | Admin group command handler for cross-project management |
| i18n | internal/i18n/ | Localization (ru/en) with compile-time embedded locale files |
Request lifecycle (webhook to response)
Every incoming Telegram message follows this path:
- Webhook reception: POST /webhook/{tg_api_key} is received on the public HTTP server (:8080). The API key in the URL routes to the correct project bundle.
- Gateway normalization: TelegramUpdate is converted to domain.Message with text, caption, files, and sender info extracted.
- State loading: Chat state is loaded from PostgreSQL using the composite key (chat_id, project_id).
- Command check: /start resets state and returns the greeting. /debug toggles debug mode.
- File collection: Any file attachments are accumulated into state. Files without text get an immediate acknowledgement without LLM calls.
- Stage dispatch: Based on derived state (CurrentStage()), the message is routed to the appropriate stage handler.
- State persist: Updated state is written back to PostgreSQL.
- Timer reset: The follow-up timer is reset to step 0 for this chat.
Project structure tree
cmd/
bot/ # Main bot binary
main.go # Wiring: DB, tiered LLM, per-project bundles, HTTP server
config.go # JSON config loading with env var fallback
crawler/ # Knowledge base indexer CLI
main.go # Subcommands: init-project, seed, add-urls, register
cli/ # Local readline chat interface (testing)
main.go # Same agent pipeline, file-based CRM output
internal/
domain/ types.go # Stage, Message, ChatState, Lead, Block
agent/ agent.go # HandleMessage entrypoint
toolagent.go # Tool-calling loop, tool dispatch, system prompt
commands.go # /start, /debug
lead_commands.go # Lead group: /prompt, /stats
trace.go # Per-turn trace records for debugging
llm/ toolcaller.go # ToolCaller interface, Block/Message types
toolcaller_anthropic.go # Anthropic Messages API backend
toolcaller_openai.go # OpenAI tools spec backend (covers OpenRouter)
toolcaller_chain.go # Multi-backend fallback chain
tiered.go # TieredClient (extract / embedding tiers)
openai.go # OpenAI-compatible provider (extract, embeddings)
claude.go # Claude provider (extract)
embedding_adapter.go # Model-specific text formatting
search/ hybrid.go # Qdrant DBSF fusion (dense + BM25)
bm25.go # BM25Encoder, Tokenize
db/ postgres.go # PostgreSQL implementation
gateway/ telegram.go # TelegramUpdate normalization
telegram/ client.go # HTTP client, retry on 429
crm/ telegram.go # TelegramCRM: summary + history + media
timer/ timer.go # Scheduler, sequences, LLM classification
metrics/ metrics.go # Metric definitions
llm.go # InstrumentedLLMClient wrapper
crawler/ pipeline.go # Resumable 4-phase pipeline
chunker.go # Semantic + simple chunking
indexer.go # Qdrant upsert
i18n/ i18n.go # T(), Tf(), embedded locales
locales/ ru.json, en.json
migrations/ 001-013 (latest: 012_chat_history_blocks.sql, 013_drop_3stage_remnants.sql)
nix/ module.nix, example-configuration.nix
flake.nix Makefile
Extension points
New LLM Provider
Implement the Provider interface
internal/llm/
New CRM Backend
Implement CRM interface (multi-CRM dispatcher)
internal/crm/
New Gateway
Normalize to domain.Message + webhook route
internal/gateway/
Custom Search
Implement KnowledgeBase interface
internal/search/
Conversation Flow
The agent runs a single tool-calling loop per user turn. The model decides when to search, what to commit to via set_state, and when to dispatch the lead via send_lead.
HandleMessage
├─ if state.Finished → canned acknowledgement (post-finish lock)
└─ else runToolLoop(state, history, userMsg)
msgs = rebuildMessages(history) // replays prior tool_use / tool_result blocks
msgs += user(userMsg)
loop (cap 8 iterations):
response = toolcaller.Call(systemPrefix, systemSuffix, msgs, tools)
msgs += assistant(response.blocks)
if no tool_use blocks:
return last text block as the user-facing reply
for each tool_use:
result = dispatch(tool_use) // hybrid_search / set_state / get_state / send_lead
msgs += user(tool_result(id, result))
// overflow → canned safety reply, log loudly
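The loop above can be sketched in Go roughly as follows. This is an illustration under assumptions, not the code in toolagent.go: Block, Message, CallerFunc and the Tool map are simplified stand-ins, the system prompt and tool schema arguments are omitted, and all tool results of one iteration are batched into a single user message.

```go
package main

import "errors"

// Block is a simplified content block (hypothetical shape).
type Block struct {
	Type   string // "text", "tool_use" or "tool_result"
	Text   string // text payload or tool result
	ToolID string // correlates a tool_use with its tool_result
	Name   string // tool name for tool_use blocks
	Input  string // raw JSON arguments for tool_use blocks
}

type Message struct {
	Role   string
	Blocks []Block
}

// ToolCaller is a reduced version of the interface described in this document.
type ToolCaller interface {
	Call(msgs []Message) ([]Block, error)
}

// CallerFunc adapts a plain function to ToolCaller (handy for fakes).
type CallerFunc func(msgs []Message) ([]Block, error)

func (f CallerFunc) Call(msgs []Message) ([]Block, error) { return f(msgs) }

// Tool executes one tool call and returns its result as text.
type Tool func(input string) string

const maxIterations = 8 // iteration cap from the flow above

var ErrLoopOverflow = errors.New("tool loop exceeded iteration cap")

// RunToolLoop calls the model until it answers without tool_use blocks.
func RunToolLoop(tc ToolCaller, tools map[string]Tool, msgs []Message) (string, error) {
	for i := 0; i < maxIterations; i++ {
		blocks, err := tc.Call(msgs)
		if err != nil {
			return "", err
		}
		msgs = append(msgs, Message{Role: "assistant", Blocks: blocks})

		reply := ""
		var results []Block
		for _, b := range blocks {
			switch b.Type {
			case "text":
				reply = b.Text // remember the last text block
			case "tool_use":
				out := "unknown tool: " + b.Name
				if tool, ok := tools[b.Name]; ok {
					out = tool(b.Input)
				}
				results = append(results, Block{Type: "tool_result", ToolID: b.ToolID, Text: out})
			}
		}
		if len(results) == 0 {
			return reply, nil // no tool calls: this is the user-facing reply
		}
		msgs = append(msgs, Message{Role: "user", Blocks: results})
	}
	return "", ErrLoopOverflow // caller sends the canned safety reply and logs
}
```

Returning a sentinel error on overflow leaves the choice of canned reply and logging to the caller.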
Tools
| Tool | Purpose | Args |
|---|---|---|
| hybrid_search | Search the project KB. Returns top-k DBSF-fused (dense + BM25) chunks. The model is the reranker. | query, top_k (capped at 3 server-side), optional url_filter |
| set_state | Persist anything the model wants to remember across turns. Three flat string args (no nested objects, so weak toolcaller models can’t malform the JSON). | notes (free-form scratchpad, full rewrite each call), determined_url, client_status (hot/cold) |
| get_state | Re-read current ChatState mid-turn. Belt-and-braces: the same data is in the system suffix. | none |
| send_lead | Dispatch the conversation as a lead to CRM. Idempotent: if LeadSent=true already, returns ok without re-sending. | summary |
System prompt structure
The system prompt is split into two cacheable parts:
- Prefix (stable, cached): the engine preamble (agent.toolcaller_preamble, which explains the tools and how to map natural-language project instructions to tool calls), then the project-editable prompt1, then the tools schema.
- Suffix (volatile, breaks cache): the per-turn state snapshot: DeterminedURL, contact, extras, files, optional PriceURL hint, ClientStatus.
Anthropic backends mark the prefix with cache_control: ephemeral (90% input discount on cache hits). OpenAI / DeepSeek / GLM (via OpenRouter) auto-cache long stable prefixes.
Two-phase lead persist details
Finished=true is written to the database before the CRM send, and LeadSent=true is written after. This prevents duplicate leads from concurrent webhook requests or timer fires reading stale state. If the bot crashes between the two persists, the timer system retries the CRM send on next fire or on restart (via chatStore.ListFinishedUnsent).
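A minimal sketch of the two-phase persist, assuming the database write and the CRM send are passed in as plain functions (the real code uses store and CRM types, and ErrSendFailed is a name invented here):

```go
package main

import "errors"

// ErrSendFailed marks a failed CRM send (hypothetical sentinel).
var ErrSendFailed = errors.New("crm send failed")

// ChatState is reduced to the fields relevant for lead dispatch.
type ChatState struct {
	Finished    bool
	LeadSent    bool
	LeadSummary string
}

// DispatchLead persists Finished=true before the send and LeadSent=true after it.
// A crash or send error between the two leaves Finished=true, LeadSent=false,
// which is the state the timer looks for when retrying.
func DispatchLead(st *ChatState, save func(ChatState) error, send func(summary string) error, summary string) error {
	if st.LeadSent {
		return nil // idempotent: already delivered
	}
	if !st.Finished {
		next := *st
		next.Finished = true
		next.LeadSummary = summary
		if err := save(next); err != nil {
			return err
		}
		*st = next // phase 1 committed
	}
	if err := send(st.LeadSummary); err != nil {
		return errors.Join(ErrSendFailed, err)
	}
	st.LeadSent = true
	return save(*st) // phase 2
}
```

Calling DispatchLead again after a failed send reuses the stored summary and skips phase 1, so a retry cannot produce a second lead.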
Tool-turn persistence (chat_history.blocks)
The full assistant / tool-result block sequence for each turn is stored in chat_history.blocks (JSONB). On the next turn, rebuildMessages replays these blocks back to the model so it sees its own prior tool_use and tool_result exchanges. Falls back to plain question / reply for legacy rows where blocks is empty.
File upload handling
When a user sends a file without text, the bot immediately acknowledges it ("Accepted! Let me know when you're done.") without calling the LLM. Files are accumulated on ChatState.Files and attached to the lead when send_lead fires.
Media files in the lead are grouped by type for clean presentation:
- Photos + Videos — sent as a media album (Telegram groups them visually)
- Documents — sent as a document album
- Voice / Video notes — sent individually (Telegram does not support albums for these)
Contact sharing button
On the first turn that mentions contact collection, the bot sends a Telegram request_contact keyboard button alongside its reply. This lets users share their phone number with one tap.
- Sent only once per conversation
- Shared contact lands in ChatState.SharedContact (not in history) for privacy
- Included in the dispatched lead
Timer-driven dispatch
The timer scheduler runs the same tool-calling loop with a synthetic [TIMER PING] user turn. The agent decides whether to send a follow-up message or call send_lead itself. At the last interval (24h) the timer dispatches the lead unconditionally for any unresolved chat — regardless of client_status — so no lead is lost. Hot clients with contact in hand should be dispatched earlier by the agent itself (the prompt encourages send_lead after 1–2 pings when there's enough info).
When the safety-net dispatch fires and the agent never called send_lead itself, state.LeadSummary is empty. The timer then asks the agent to summarize the dialog via Agent.SummarizeForLead — a one-shot Complete call (no tools) on the summary tier (cheap mid-tier, e.g. gemma-4-26b-a4b-it). It renders the dialog as a Client/Bot transcript plus the agent’s Notes scratchpad and produces a 1–2 paragraph summary for the human reviewer, so the lead post never arrives blank.
Notes scratchpad (set_state)
set_state(notes="...") is the agent’s free-form text scratchpad. It replaced the earlier structured contact{method,value} + extras{...} schema. The model writes plain text, full rewrite each call — whatever it omits is lost. Conventions live in the tool description, not the schema:
name: Viktor
contact: phone +79130001234
service: apartment appraisal
city: Barnaul
date: next week
The scratchpad is rendered into the system suffix on every turn so the model sees its own state, and it’s the input to Agent.SummarizeForLead at 24h dispatch. The flat-string shape is deliberate: weak toolcaller models (e.g. glm-5.1) drop closing braces on nested-object set_state args, which used to silently lose phone numbers and trip the tool-loop overflow guard.
Lead-as-living-document (post-send refresh)
After send_lead fires, the chat doesn’t go silent. The customer can keep typing for up to five more messages; each one is appended to chat_history and the existing CRM lead artifact is refreshed in place — for Telegram, editMessageText on the summary post + editMessageMedia on the chat-history .txt attachment. The summary text stays frozen, but a history updated: <ts> footer appears so the operator notices.
CRM artifact references live in a dedicated lead_dispatches table (chat_id, project_id, provider) — one row per CRM. Multi-CRM by construction: a chat dispatched to both Telegram and Bitrix gets two rows, and the post-send refresh fans UpdateLead across all of them.
Per-chat batching queue
Inbound messages don’t hit HandleMessage directly. Each gateway (Telegram webhook, CLI loop) calls agent.Enqueue, which persists the user message as an orphan row in chat_history (Question filled, Reply empty) and pushes onto a per-chat cheggaaa/mb queue. One worker goroutine per (chat_id, project_id) drains the queue serially.
What this gives:
- No double-dispatch race: single worker per chat — the structural fix for the goroutine race that previously stomped state and produced two CRM posts for one customer.
- Burst coalescing: messages that arrive while the worker is mid-LLM accumulate in the queue and get processed as the next batch. The model sees one combined user turn, not three sequential ones.
- Crash recovery: orphan rows that survive a process kill get picked up by the next batch — no message lost.
Replies flow back through a per-project agent.Replier callback (cmd/bot wires it to tgClient.SendMessage; cmd/cli prints to stdout). Workers exit after five minutes idle and respawn on the next message.
Search Pipeline
The agent calls hybrid_search as a tool. The tool returns Qdrant DBSF-fused dense + BM25 results raw — the agent model is the reranker.
Tool call: hybrid_search
The agent picks the query string itself (no separate query-rewrite LLM call). Optionally restricts results to a single URL via url_filter — typically passed once determined_url is set, or with the price_url when the customer asks about pricing.
Hybrid Search
Qdrant prefetch with both dense embeddings (cosine similarity, 1024 dims via Matryoshka truncation) and BM25 sparse vectors. Qdrant's built-in DBSF (Distribution-Based Score Fusion) merges results, preserving absolute relevance signal unlike rank-based RRF. Each prefetch retrieves limit * 2 candidates for the fusion algorithm.
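In production the fusion runs inside Qdrant. To make the idea concrete, here is a simplified local sketch of distribution-based fusion: each result list is rescaled so that mean - 3σ maps to 0 and mean + 3σ maps to 1, then scores for the same point are summed. The population standard deviation, the 0.5 value for constant lists, and the Hit type are assumptions made for illustration and may differ from Qdrant's internals.

```go
package main

import (
	"math"
	"sort"
)

// Hit is one scored search result (hypothetical type).
type Hit struct {
	ID    string
	Score float64
}

// normalize rescales one result list using mean ± 3 standard deviations.
func normalize(hits []Hit) map[string]float64 {
	out := make(map[string]float64, len(hits))
	if len(hits) == 0 {
		return out
	}
	var sum float64
	for _, h := range hits {
		sum += h.Score
	}
	mean := sum / float64(len(hits))
	var variance float64
	for _, h := range hits {
		d := h.Score - mean
		variance += d * d
	}
	std := math.Sqrt(variance / float64(len(hits)))
	for _, h := range hits {
		if std == 0 {
			out[h.ID] = 0.5 // all scores equal: place them mid-range
			continue
		}
		out[h.ID] = (h.Score - (mean - 3*std)) / (6 * std)
	}
	return out
}

// FuseDBSF sums normalized scores across lists and returns the top `limit` hits.
func FuseDBSF(limit int, lists ...[]Hit) []Hit {
	total := map[string]float64{}
	for _, l := range lists {
		for id, s := range normalize(l) {
			total[id] += s
		}
	}
	fused := make([]Hit, 0, len(total))
	for id, s := range total {
		fused = append(fused, Hit{ID: id, Score: s})
	}
	sort.Slice(fused, func(i, j int) bool {
		if fused[i].Score != fused[j].Score {
			return fused[i].Score > fused[j].Score
		}
		return fused[i].ID < fused[j].ID // stable order for ties
	})
	if limit >= 0 && limit < len(fused) {
		fused = fused[:limit]
	}
	return fused
}
```

Unlike rank-based fusion, a point that scores far above the rest of its list keeps that margin after normalization.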
Server-side cap
Top 3 results are returned to the model regardless of the requested top_k. The model judges which (if any) of those three are relevant for the current turn. No reranker call, no neighbor expansion — the loop is small and fast.
Visual Search Flow
hybrid_search(query, url_filter?) → dense prefetch (cosine, 1024 dims) + BM25 sparse prefetch (keyword matching) → DBSF: Distribution-Based Score Fusion (preserves absolute relevance) → top 3 chunks returned to the agent
Qdrant collections per project
| Collection | Vectors | Purpose |
|---|---|---|
| {project}_content_hybrid | Dense + Sparse | Chunked content with topic-prepended embeddings. The agent's hybrid_search tool reads this. |
| {project}_source | Dense only | Full page content. Retained for crawler bookkeeping; not consulted by the runtime agent path. |
BM25 implementation details
- Standard BM25 formula: IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl/avgDL))
- Parameters: k1=1.5, b=0.75
- Tokenizer: Unicode-aware (Cyrillic + Latin), lowercased, split on non-letter/digit
- Persistence: vocabulary, IDF, avgDL, numDocs stored as JSONB in PostgreSQL (bm25_indexes table)
- In-memory cache: loaded from DB on first use per project, invalidated on crawler updates
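The tokenizer and the per-term score from the list above can be sketched as follows. This is an illustration, not the code in bm25.go: the function names are borrowed or invented, and IDF is taken as an input because its exact formula is not stated here.

```go
package main

import (
	"strings"
	"unicode"
)

// BM25 parameters as listed above.
const (
	k1 = 1.5
	b  = 0.75
)

// Tokenize lowercases the text and splits on every rune that is neither a
// letter nor a digit, which covers Cyrillic and Latin alike.
func Tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// TermScore evaluates IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl/avgDL))
// for one term in one document.
func TermScore(idf float64, tf, docLen int, avgDL float64) float64 {
	if tf == 0 || avgDL == 0 {
		return 0
	}
	f := float64(tf)
	norm := 1 - b + b*float64(docLen)/avgDL // length normalization
	return idf * (f * (k1 + 1)) / (f + k1*norm)
}
```

For a document of exactly average length the normalization factor is 1, so a term that occurs once scores its IDF value.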
LLM Calls Map
The system has two LLM clients: the agent's ToolCaller (multi-turn tool-calling for the customer-facing loop) and the legacy Provider (single-shot completions for crawler topics, timer classification, embeddings).
Clients
| Client | Config | Purpose | Backends |
|---|---|---|---|
| ToolCaller | models.toolcaller | The agent's tool-calling loop. Multi-turn conversation with tool calls + tool results. The model owns conversation flow. | Anthropic Messages API native; OpenAI tools spec (covers OpenRouter routing to GLM, DeepSeek, Kimi, etc.) |
| Provider (extract) | models.extract | Single-shot completion for crawler chunk topics, timer status classification, follow-up text. | OpenAI-compatible; Claude |
| Provider (embedding) | models.embedding | Vector embeddings for hybrid search. | OpenAI-compatible (OpenRouter, native) |
Where each client fires
| Where | Client | Purpose |
|---|---|---|
| internal/agent/toolagent.go · runToolLoop | ToolCaller | Main agent loop: one call per loop iteration; the model returns either a final reply or one or more tool_use blocks. |
| internal/agent/toolagent.go · GeneratePing | ToolCaller | Same loop, fired by the timer with a synthetic [TIMER PING] turn. |
| internal/timer/timer.go · classifyStatus | Provider (extract) | Classify silent client as COLD / HOT. |
| internal/timer/timer.go · generateFollowUp | Provider (extract) | Compose follow-up message text. |
| internal/crawler/chunker.go · chunkSemantic | Provider (extract) | Per-page boundary detection during indexing. |
| internal/llm/embedding_adapter.go | Provider (embedding) | Vector embeddings for chunks (indexing) and queries (search). |
System prompt structure (cacheable)
- Prefix (stable, cached): engine preamble (agent.toolcaller_preamble) + project prompt1 + tools schema. Anthropic backends mark this with cache_control: ephemeral for a 90% input discount on cache hits. OpenAI / DeepSeek / GLM auto-cache long stable prefixes (~50% discount when cached_tokens > 0).
- Suffix (volatile, breaks cache): per-turn state snapshot: DeterminedURL, contact, extras, files, optional PriceURL hint, ClientStatus.
Provider routing & retry
Each ToolCaller / Provider chain has independent retry and fallback:
Attempt 1 → rate-limit/transient? → backoff (1s + jitter) → retry
Attempt 2 → rate-limit/transient? → backoff (2s + jitter) → retry
Attempt 3 → rate-limit/transient? → fall through
Non-transient error? → fall through immediately
All exhausted → AllProvidersExhaustedError
Backoff formula: 2^(attempt-1) seconds + 10% jitter, with attempts numbered from 1 as in the sequence above (1s, then 2s), capped at 60s (or the provider's Retry-After value, if shorter). The key pool rotates API keys round-robin.
Transient classification: rate limits (429) are transient. For OpenAI-tools backends we also treat embedded errors and empty choices arrays as transient (return RateLimitError) — observed when DeepSeek/GLM upstream returns a malformed success.
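The backoff rule can be sketched as below. The function names are hypothetical, the retry index is zero-based (so the first wait is about 1s, matching the sequence above), and jitter is read as "up to 10% of the base delay", which is one possible interpretation.

```go
package main

import (
	"math/rand"
	"time"
)

const maxBackoff = 60 * time.Second

// Backoff returns the wait before retry number `retry` (0 -> ~1s, 1 -> ~2s).
// jitter is a value in [0,1) that is scaled to at most 10% of the base delay.
// retryAfter is the provider's Retry-After hint, or 0 when absent.
func Backoff(retry int, jitter float64, retryAfter time.Duration) time.Duration {
	if retry < 0 {
		retry = 0
	}
	if retry > 6 {
		retry = 6 // 2^6 s already exceeds the cap; avoids shift overflow
	}
	base := time.Second << uint(retry)
	d := base + time.Duration(jitter*0.1*float64(base))
	if d > maxBackoff {
		d = maxBackoff
	}
	if retryAfter > 0 && retryAfter < d {
		d = retryAfter // honor a shorter provider hint
	}
	return d
}

// RandomBackoff draws the jitter from math/rand.
func RandomBackoff(retry int, retryAfter time.Duration) time.Duration {
	return Backoff(retry, rand.Float64(), retryAfter)
}
```

Keeping the jitter as a parameter makes the delay deterministic in tests, while RandomBackoff is what a retry loop would call.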
Embedding Adapters
AdaptedProvider wraps the embedding provider and formats text based on mode (document vs query). Adapter selected automatically by model name. E5-instruct / Qwen3-Embedding models get Instruct: ...\nQuery: ... prefix for queries. Token limit auto-split: when a chunk exceeds the model's token limit, text is split by sentence, each half embedded recursively, and vectors averaged + L2-normalized.
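The final step of the auto-split (averaging the partial vectors and L2-normalizing the result) might look like this. MeanNormalized is a name invented for the sketch; the sentence splitting and recursive embedding calls are left out.

```go
package main

import "math"

// MeanNormalized averages the given vectors component-wise and scales the
// result to unit L2 norm. The first vector defines the dimensionality;
// shorter vectors are treated as zero-padded.
func MeanNormalized(vecs ...[]float64) []float64 {
	if len(vecs) == 0 {
		return nil
	}
	out := make([]float64, len(vecs[0]))
	for _, v := range vecs {
		for i := range out {
			if i < len(v) {
				out[i] += v[i]
			}
		}
	}
	var norm float64
	for i := range out {
		out[i] /= float64(len(vecs))
		norm += out[i] * out[i]
	}
	norm = math.Sqrt(norm)
	if norm == 0 {
		return out // zero vector: nothing to normalize
	}
	for i := range out {
		out[i] /= norm
	}
	return out
}
```

Re-normalizing matters because the mean of two unit vectors is shorter than unit length unless they point the same way.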
Crawler Pipeline
4-phase incremental pipeline that fetches, chunks, embeds, and uploads content to Qdrant. Intermediate results are saved per-URL for resume.
Fetch Pages
HTTP GET with parallel workers (default: 3). Supports HTML, PDF, DOCX, XLSX, TXT via auto-detection. Each page saved to 1_pages/{urlhash}.json. Skip logic: pages with existing files are skipped on re-run.
Semantic Chunk
LLM boundary detection: outputs TOPIC: X | STARTS: Y anchor phrases per page (extract tier, temp 0.1). Original text is sliced at detected boundaries. Each chunk gets a topic label. Saved to 2_chunks/{urlhash}.json. Falls back to paragraph-based splitting with --no-llm-chunk.
Batch Embed
32 chunks per API call. Topic prepended before embedding: "Topic: X\n\n{text}". BM25 sparse vectors rebuilt from all chunks (IDF requires full corpus). Saved to 3_embeds/{urlhash}.json + bm25.json. Token limit auto-split handles oversized chunks.
Upload to Qdrant
Qdrant upsert in batches of 50 points. Source pages uploaded with real embeddings to {name}_source. Chunked content to {name}_content_hybrid. BM25 index persisted to PostgreSQL. Pipeline data archived to project-archives/{name}.tar.gz.
File names use {urlhash} = sha256(url)[:16].
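The {urlhash} used in the per-URL file names could be computed as in the sketch below. It assumes the hash is the hex-encoded SHA-256 digest cut to its first 16 characters; the source only gives sha256(url)[:16], so the encoding is a guess.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// URLHash returns the first 16 hex characters of the SHA-256 digest of url.
// Assumption: the pipeline truncates the hex string, not the raw bytes.
func URLHash(url string) string {
	sum := sha256.Sum256([]byte(url))
	return hex.EncodeToString(sum[:])[:16]
}
```

A fixed-length, filesystem-safe name per URL is what lets each phase skip URLs whose output file already exists.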
Storage layout
new-projects/{project-name}/
1_pages/{urlhash}.json # single Page per URL
2_chunks/{urlhash}.json # []Chunk per URL
3_embeds/{urlhash}.json # []EmbeddedChunk per URL
3_embeds/bm25.json # BM25 encoder snapshot
Document type support
| Format | Library | Notes |
|---|---|---|
| HTML | go-readability v2 | Firefox Reader View algorithm. Fallback: DOM walk (main → article → body, stripping nav/header/footer/script) |
| PDF | ledongthuc/pdf | Text extraction. Scanned PDFs with no text layer are skipped. |
| DOCX | fumiama/go-docx | Paragraph and table text extraction |
| XLSX | xuri/excelize | All sheets as tab-separated text |
| TXT | — | Plain-text body served verbatim (no HTML parsing) |
Semantic chunking vs simple chunking
Semantic Chunking (default)
One LLM call per page (extract tier, temp 0.1) detects natural topic boundaries. The LLM outputs anchor phrases in the format TOPIC: X | STARTS: Y. The original text is sliced at the detected boundaries, preserving the exact source text without LLM paraphrasing. Each chunk receives a topic label for embedding.
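A sketch of the slicing step, assuming the LLM returns one TOPIC: X | STARTS: Y line per boundary and that anchors appear in document order. SliceByAnchors and Chunk are illustrative names; lines that do not parse, or whose anchor is not found, are skipped.

```go
package main

import "strings"

// Chunk is a topic-labelled slice of the original page text.
type Chunk struct {
	Topic string
	Text  string
}

// SliceByAnchors cuts text at each anchor phrase found in llmOutput.
// The source text is never rewritten, only sliced.
func SliceByAnchors(text, llmOutput string) []Chunk {
	type boundary struct {
		topic string
		start int
	}
	var bounds []boundary
	cursor := 0
	for _, line := range strings.Split(llmOutput, "\n") {
		topicPart, anchorPart, ok := strings.Cut(line, "|")
		if !ok {
			continue
		}
		topic, okT := strings.CutPrefix(strings.TrimSpace(topicPart), "TOPIC:")
		anchor, okA := strings.CutPrefix(strings.TrimSpace(anchorPart), "STARTS:")
		anchor = strings.TrimSpace(anchor)
		if !okT || !okA || anchor == "" {
			continue
		}
		idx := strings.Index(text[cursor:], anchor)
		if idx < 0 {
			continue // anchor not present after the previous boundary
		}
		start := cursor + idx
		bounds = append(bounds, boundary{strings.TrimSpace(topic), start})
		cursor = start + len(anchor)
	}
	chunks := make([]Chunk, 0, len(bounds))
	for i, bd := range bounds {
		start, end := bd.start, len(text)
		if i == 0 {
			start = 0 // keep any text before the first anchor
		}
		if i+1 < len(bounds) {
			end = bounds[i+1].start
		}
		chunks = append(chunks, Chunk{Topic: bd.topic, Text: strings.TrimSpace(text[start:end])})
	}
	return chunks
}
```

Searching only after the previous boundary keeps chunks in order even when an anchor phrase repeats earlier in the page.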
Simple Chunking (--no-llm-chunk)
Paragraph and heading-aware splitting. Text is split on double newlines and markdown headings. Short paragraphs are merged up to maxChars (default 1500). Chunks get numbered topic labels ("Part 1", "Part 2", etc.).
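A reduced sketch of the simple chunker: it splits on blank lines and greedily merges paragraphs up to maxChars. Heading detection is omitted, length is measured in bytes rather than characters, and a single paragraph longer than maxChars is kept whole; all three are simplifications made here.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a numbered piece of page text.
type Chunk struct {
	Topic string
	Text  string
}

// SimpleChunks merges consecutive paragraphs until adding the next one would
// exceed maxChars, then starts a new chunk labelled "Part N".
func SimpleChunks(text string, maxChars int) []Chunk {
	if maxChars <= 0 {
		maxChars = 1500 // default from the description above
	}
	var chunks []Chunk
	var cur strings.Builder
	flush := func() {
		if cur.Len() == 0 {
			return
		}
		chunks = append(chunks, Chunk{
			Topic: fmt.Sprintf("Part %d", len(chunks)+1),
			Text:  cur.String(),
		})
		cur.Reset()
	}
	for _, p := range strings.Split(text, "\n\n") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		if cur.Len() > 0 && cur.Len()+2+len(p) > maxChars {
			flush()
		}
		if cur.Len() > 0 {
			cur.WriteString("\n\n")
		}
		cur.WriteString(p)
	}
	flush()
	return chunks
}
```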
Topic prepending strategy
At index time, each chunk's text is prepended with its topic label before embedding:
Before: "Our basic plan starts at $99/month with unlimited support."
After: "Topic: Pricing plans\n\nOur basic plan starts at $99/month with unlimited support."
The topic acts as a semantic anchor — the embedding now captures the chunk's theme, not just its surface content. A query about "costs" will match closer to a chunk anchored with "Pricing plans" even if the chunk text never mentions "costs".
Search queries do not need the topic prefix — the embedding space naturally aligns.
CLI usage examples
# Create a new project and seed immediately
bin/aichat-crawler init-project \
--name myproject \
--tg-api-key "123456:ABC-DEF" \
--tg-lead-group -1001234567890 \
--language en \
--prompt1 "You are a sales consultant..." \
--start-reply "Welcome! How can I help?" \
--sitemap-url "https://example.com/sitemap.xml"
# Seed a project later (separate from creation)
bin/aichat-crawler seed \
--project-id "uuid-from-init-output" \
--project-name myproject \
--sitemap-url "https://example.com/sitemap.xml"
# Add URLs to an existing project
bin/aichat-crawler add-urls \
--project-name myproject \
--urls "https://example.com/new-page1,https://example.com/new-page2"
# Set prompts from files
bin/aichat-crawler set \
--project-name myproject prompt1 @prompts/prompt1.txt
Timer System
Automated follow-up sequences that fire when a user goes silent. LLM-driven classification decides the action at each step.
Follow-Up Timeline
Per-Fire Logic
Lifecycle details
- Reset: Every incoming message resets the timer for that chat, starting the sequence from step 0.
- Fire: Fetches fresh chat state and history. Skips if already Finished + LeadSent. Otherwise invokes the agent's tool-calling loop with a synthetic [TIMER PING] turn (Agent.GeneratePing). The agent decides what to do.
- Last-interval safety net: At the last interval (24h) the scheduler dispatches the lead unconditionally for every unresolved chat, regardless of client_status. This is the catch-all so no lead is lost. Hot clients should usually be dispatched earlier by the agent itself (the prompt nudges send_lead after 1–2 pings when contact is in hand), but if the agent never makes that call the lead still ships at 24h with whatever was gathered.
- Persist: Timer state is written to PostgreSQL with the next trigger time. On restart, Reload() recovers all active timers, calculating remaining delay and firing overdue ones immediately.
- Lead retry: If a timer finds Finished=true, LeadSent=false, it retries the CRM send instead of running the agent.
- Cancel: When a user sends a new message, the old timer is cancelled (goroutine killed + DB record updated).
Goroutine safety: Each chat/project pair gets one goroutine. A current == self identity check prevents a superseded goroutine from cleaning up a newer timer's state.
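The identity check can be illustrated with a small registry of timer handles. Scheduler, timerHandle and chatKey are hypothetical names; the sketch only shows the bookkeeping, not the sleeping goroutine itself.

```go
package main

import "sync"

// chatKey is the composite (chat_id, project_id) key.
type chatKey struct {
	ChatID    int64
	ProjectID string
}

// timerHandle identifies one running timer goroutine.
type timerHandle struct {
	cancel chan struct{} // closed when the timer is superseded
}

// Scheduler tracks at most one active timer per chat/project pair.
type Scheduler struct {
	mu     sync.Mutex
	active map[chatKey]*timerHandle
}

func NewScheduler() *Scheduler {
	return &Scheduler{active: map[chatKey]*timerHandle{}}
}

// Reset cancels any running timer for key and registers a fresh handle.
func (s *Scheduler) Reset(key chatKey) *timerHandle {
	s.mu.Lock()
	defer s.mu.Unlock()
	if old, ok := s.active[key]; ok {
		close(old.cancel)
	}
	h := &timerHandle{cancel: make(chan struct{})}
	s.active[key] = h
	return h
}

// cleanup removes the entry only if it still belongs to self. A goroutine
// that was superseded must not delete the newer timer's state.
func (s *Scheduler) cleanup(key chatKey, self *timerHandle) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if current := s.active[key]; current != self {
		return false
	}
	delete(s.active, key)
	return true
}
```

Comparing pointers rather than keys is the point: both goroutines share the same key, so only the handle tells them apart.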
Multi-Tenancy
Every data path uses composite keys (chat_id, project_id) for tenant isolation. Each project gets its own resources while sharing infrastructure.
Per Project (Isolated)
Shared (Infrastructure)
Data isolation details
| Data | Table/Collection | Key |
|---|---|---|
| Chat state | chat_states | PRIMARY KEY (chat_id, project_id) |
| Chat history | chat_history | WHERE chat_id AND project_id |
| Timers | timers | PRIMARY KEY (chat_id, project_id) |
| BM25 index | bm25_indexes | PRIMARY KEY (project_id) |
| Search vectors | Qdrant | {project_name}_content_hybrid, {project_name}_source |
| Webhook routing | URL path | POST /webhook/{tg_api_key} |
Runtime registration
Project bundles (Telegram client, agent, CRM, KB) live in an in-memory map. New projects can be registered at runtime via two paths:
- Admin group: the /project_init command creates the project, seeds the KB, and registers the bundle instantly. No restart needed.
- Internal API: POST /api/register-project/{id} for CLI or external tools that seed the DB independently.
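A sketch of an in-memory bundle registry keyed by bot token, with webhook routing on top. Registry, Bundle and Routes are illustrative names, the bundle is reduced to one field, and the handler only checks that the token is known; the method-and-wildcard route pattern relies on the Go 1.22 ServeMux.

```go
package main

import (
	"net/http"
	"sync"
)

// Bundle stands in for the per-project Telegram client, agent, CRM and KB.
type Bundle struct {
	ProjectName string
}

// Registry maps tg_api_key to a project bundle and is safe for concurrent use.
type Registry struct {
	mu      sync.RWMutex
	byToken map[string]*Bundle
}

func NewRegistry() *Registry {
	return &Registry{byToken: map[string]*Bundle{}}
}

// Register adds or replaces a bundle at runtime; no restart is involved.
func (r *Registry) Register(tgAPIKey string, b *Bundle) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byToken[tgAPIKey] = b
}

// Lookup returns the bundle for a token, if one is registered.
func (r *Registry) Lookup(tgAPIKey string) (*Bundle, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	b, ok := r.byToken[tgAPIKey]
	return b, ok
}

// Routes wires POST /webhook/{tg_api_key} to the registry.
func (r *Registry) Routes() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("POST /webhook/{tg_api_key}", func(w http.ResponseWriter, req *http.Request) {
		if _, ok := r.Lookup(req.PathValue("tg_api_key")); !ok {
			http.NotFound(w, req) // unknown token: no project bundle
			return
		}
		w.WriteHeader(http.StatusOK) // a real handler would enqueue the update
	})
	return mux
}
```

Because lookups take only a read lock, registering a new project does not stall webhook traffic for existing ones beyond the brief write.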
Project onboarding flow (Telegram)
Step 1: Create a bot in BotFather, copy the token.
Step 2: Send one command in the admin project's lead group:
/project_init myproject sitemap https://example.com/sitemap.xml 123456:AAH...token
The system verifies the token, creates the project, seeds the KB, and replies with progress:
> Project "myproject" created (bot: @myproject_bot). Seeding started...
> Scraping completed: 42 pages
> Chunking completed: 156 chunks
> Embeddings completed: 156 vectors
> Upload completed: 42 pages, 156 chunks indexed
> Project "myproject" init finished!
> Test the bot: https://t.me/myproject_bot
> To create the CRM group, click:
> https://t.me/myproject_bot?startgroup=connect_myproject
Step 3: Click the deep link. Telegram opens the "create group" UI with the bot pre-added. The bot auto-detects the group and connects it as the lead group. No group ID needed — discovered automatically via the /start connect_{name} command.
Admin commands reference
| Command | Description |
|---|---|
| /project_init <name> sitemap <url> <token> | Create project, crawl sitemap, seed KB |
| /project_init <name> urls <url1,url2> <token> | Create project from individual URLs |
| /project_prompt1 <name> <text> | Update the project's system prompt |
| /project_price_url <name> <url> | Set price URL hint (used by the agent's hybrid_search) |
| /project_stats <name> [period] | Show chat statistics |
| /project_info <name> | Show project details |
| /project_add_urls <name> <urls> | Add URLs to KB |
| /project_delete_urls <name> <urls> | Delete URL chunks from KB |
| /project_delete <name> | Soft-delete project |
| /project_restore <name> | Restore soft-deleted project |
Lead group commands (per project)
| Command | Description |
|---|---|
| /help | Show all available commands |
| /prompt | Show current system prompt |
| /prompt1 [text] | Show or update the project's system prompt |
| /price_url [url] | Show or set the price URL hint |
| /stats [period] | Show chat statistics (e.g. 1 week, 3 days) |
| /info | Show project details |
User chat commands
| Command | Description |
|---|---|
| /start | Reset conversation state and return the project's greeting |
| /debug | Toggle debug mode (appends trace info to bot replies) |
Database schema
The first three migrations in migrations/ (the directory currently holds 001-013):
001_initial.sql
CREATE TABLE projects (
id UUID PRIMARY KEY,
name TEXT UNIQUE,
tg_lead_group BIGINT,
tg_api_key TEXT UNIQUE,
prompts JSONB,
language TEXT DEFAULT 'ru'
);
CREATE TABLE chat_history (
id UUID PRIMARY KEY,
chat_id BIGINT,
project_id UUID REFERENCES projects(id),
question TEXT,
reply TEXT,
dbinfo JSONB,
created_at TIMESTAMPTZ
);
CREATE TABLE chat_states (
chat_id BIGINT,
project_id UUID REFERENCES projects(id),
state JSONB,
PRIMARY KEY (chat_id, project_id)
);
002_bm25_index.sql
CREATE TABLE bm25_indexes (
project_id UUID PRIMARY KEY REFERENCES projects(id),
vocab JSONB,
idf JSONB,
avg_dl FLOAT,
num_docs INT,
updated_at TIMESTAMPTZ
);
003_timers.sql
CREATE TABLE timers (
chat_id BIGINT,
project_id UUID REFERENCES projects(id),
step INT,
trigger_at TIMESTAMPTZ,
PRIMARY KEY (chat_id, project_id)
);
Key indexes
- idx_chat_history_chat_id on (chat_id, created_at): history lookup
- idx_chat_states_project on (project_id): project-level queries
- idx_chat_states_state_gin GIN on (state): JSONB queries on state
CRM Lead Output
When a lead is dispatched, the following is sent to the project's Telegram lead group.
Chat history is attached as lead{chatID}.txt, replied to the summary message.
CRM interface design
type CRM interface {
SendLead(ctx context.Context, lead domain.Lead) (string, error)
Name() string
}
Current implementation: TelegramCRM. The interface is designed for future CRM backends (Bitrix, AmoCRM) via the MultiCRM dispatcher pattern.
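One way a MultiCRM dispatcher could satisfy the same interface is a simple fan-out, sketched below. Everything beyond the two interface methods is an assumption: Lead stands in for domain.Lead, FuncCRM is a test adapter, the "name=ref" join format and ErrNoBackends are invented here, and the meaning of the returned string is taken to be a backend-specific reference.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Lead is a stand-in for domain.Lead.
type Lead struct {
	Summary string
}

// CRM mirrors the interface shown above.
type CRM interface {
	SendLead(ctx context.Context, lead Lead) (string, error)
	Name() string
}

var ErrNoBackends = errors.New("no CRM backends configured")

// MultiCRM sends each lead to every configured backend.
type MultiCRM struct {
	Backends []CRM
}

func (m MultiCRM) Name() string {
	names := make([]string, len(m.Backends))
	for i, b := range m.Backends {
		names[i] = b.Name()
	}
	return "multi(" + strings.Join(names, ",") + ")"
}

// SendLead returns the successful references as "name=ref" pairs and joins
// the errors of failed backends, so one failure does not block the others.
func (m MultiCRM) SendLead(ctx context.Context, lead Lead) (string, error) {
	if len(m.Backends) == 0 {
		return "", ErrNoBackends
	}
	var refs []string
	var errs []error
	for _, b := range m.Backends {
		ref, err := b.SendLead(ctx, lead)
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", b.Name(), err))
			continue
		}
		refs = append(refs, b.Name()+"="+ref)
	}
	return strings.Join(refs, ","), errors.Join(errs...)
}

// FuncCRM adapts a function to the CRM interface (useful for fakes).
type FuncCRM struct {
	Label string
	Send  func(lead Lead) (string, error)
}

func (f FuncCRM) Name() string { return f.Label }

func (f FuncCRM) SendLead(_ context.Context, lead Lead) (string, error) {
	return f.Send(lead)
}
```

Since MultiCRM is itself a CRM, the agent can stay unaware of how many backends sit behind it.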
Deployment
NixOS-native deployment with systemd service hardening, sops-nix secrets, and Cloudflare Tunnel.
Service Dependency Chain
Security Hardening
| Setting | Value | Purpose |
|---|---|---|
| NoNewPrivileges | true | Prevent privilege escalation |
| ProtectSystem | strict | Read-only filesystem except allowed paths |
| ProtectHome | true | No access to /home |
| PrivateTmp | true | Isolated /tmp |
| MemoryMax | 512M | Memory limit (OOM protection) |
NixOS module configuration
services.aichat = {
enable = true;
configFile = "/run/secrets/aichat-config.json";
listenAddr = ":8080";
# Database
enablePostgres = true;
dbName = "aichat";
postgresPort = 5432;
# Qdrant
enableQdrant = true;
qdrantPort = 6333;
qdrantDataDir = "/var/lib/qdrant";
# Backup
enableBackup = true;
backupDir = "/var/backup/aichat";
backupRetentionDays = 14;
};
Backup & secrets
Backup
- Daily systemd timer with RandomizedDelaySec=1h
- pg_dump + Qdrant snapshots
- 14-day retention with automatic cleanup
Secrets (sops-nix)
- Encrypted in secrets/production.yaml using age keys
- Decrypted at boot on the target machine using its SSH host key
- Keys: aichat_config (full JSON config), cloudflared_creds (tunnel credentials)
Quick deploy commands
# Deploy code changes to production
make prod-deploy
# SSH tunnels for crawler access to prod DB + Qdrant
make prod-tunnel
# PostgreSQL on localhost:15432, Qdrant on localhost:16333
# Seed a project on prod (while tunnel is open)
./bin/aichat-crawler --config /tmp/config.prod-tunnel.json init-project \
--name myproject --tg-api-key "TOKEN" --language ru \
--no-llm-chunk --workers 1 --urls "URL1,URL2,..."
# Fetch trace files from prod
ssh root@server "cat /var/lib/aichat/traces/{chatID}/{dialogID}.log"
Webhook setup
After the bot is running and accessible via HTTPS, register the webhook with Telegram:
curl -X POST "https://api.telegram.org/bot{TG_API_KEY}/setWebhook" \
-H "Content-Type: application/json" \
-d '{"url": "https://your-domain.com/webhook/{TG_API_KEY}"}'
The webhook URL must exactly match https://{host}/webhook/{tg_api_key} where tg_api_key is the bot token from the projects table. The API key in the URL routes incoming updates to the correct project bundle.
Manual / Docker deployment
Prerequisites: Go 1.22+, PostgreSQL 16, Qdrant
# Build all binaries
make build
# produces bin/aichat-bot, bin/aichat-crawler, bin/aichat-cli
# Run migrations
make migrate DATABASE_URL="postgres://user:pass@localhost/aichat"
# Run the bot with JSON config
bin/aichat-bot --config config.prod.json
# Or with environment variables
DATABASE_URL="postgres://user:pass@localhost/aichat" \
OPENAI_API_KEYS="sk-xxx" \
bin/aichat-bot
Cloudflare Tunnel setup
Production uses Cloudflare Tunnel to route HTTPS traffic to the bot without exposing ports. The tunnel routes aichat.example.com to localhost:8080 on the server.
- Tunnel credentials managed via sops-nix (encrypted at rest)
- Configured in nix/production.nix
- No need for TLS certificates or reverse proxy configuration
- Telegram webhooks point to the tunnel URL
Localization
User-facing strings and LLM prompt templates are localized via JSON locale files embedded at compile time.
| Aspect | Details |
|---|---|
| Supported languages | ru (default), en |
| Locale files | internal/i18n/locales/ru.json, internal/i18n/locales/en.json |
| API | i18n.T(lang, key) for strings, i18n.Tf(lang, key, args...) for formatted strings |
| Fallback chain | Requested lang → "ru" → key itself |
| Coverage | 36+ keys: agent replies, LLM prompts, history formatting, timer messages, CRM lead formatting, chunker prompts |
| Per-project | Set in projects.language column, propagates to agent, CRM, and timer |
| Adding a locale | Create internal/i18n/locales/{code}.json with the same keys — auto-loaded via //go:embed |
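The fallback chain in the table can be sketched as follows. The function names T and Tf come from the table; the in-memory locales map, its keys, and its strings are hypothetical stand-ins for the embedded JSON files.

```go
package main

import "fmt"

// locales is a hypothetical stand-in for the embedded locale files.
var locales = map[string]map[string]string{
	"ru": {"greeting": "Здравствуйте", "lead.sent": "Заявка отправлена: %s"},
	"en": {"greeting": "Hello"},
}

// T resolves key for lang, then falls back to "ru", then to the key itself.
func T(lang, key string) string {
	for _, l := range []string{lang, "ru"} {
		if s, ok := locales[l][key]; ok {
			return s
		}
	}
	return key
}

// Tf resolves key like T and formats the result with args.
func Tf(lang, key string, args ...any) string {
	return fmt.Sprintf(T(lang, key), args...)
}

func main() {
	fmt.Println(T("en", "greeting"))         // found in the requested locale
	fmt.Println(Tf("en", "lead.sent", "42")) // missing in "en", falls back to "ru"
	fmt.Println(T("de", "missing.key"))      // unknown everywhere, returns the key
}
```

Returning the key itself as the last resort keeps a missing translation visible in the output instead of producing an empty message.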
Monitoring
Prometheus metrics exposed at GET /metrics on the internal server (localhost:9090 by default). Metrics are applied via decoration — core packages have zero Prometheus imports.
LLM Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| aichat_llm_requests_total | counter | provider, method, status | Total LLM API calls |
| aichat_llm_request_duration_seconds | histogram | provider, method | LLM call latency (0.1s-51s buckets) |
Telegram Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| aichat_telegram_requests_total | counter | method, status | Telegram API calls |
| aichat_telegram_request_duration_seconds | histogram | method | Telegram API latency |
Webhook & Agent Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| aichat_webhook_requests_total | counter | project, status | Inbound webhook requests |
| aichat_webhook_request_duration_seconds | histogram | project | Webhook processing time |
| aichat_agent_messages_total | counter | project, stage | Messages processed by stage |
Timer & CRM Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| aichat_timer_active_sequences | gauge | — | Currently active timer sequences |
| aichat_timer_fires_total | counter | action | Timer fire outcomes (follow_up, lead, skip) |
| aichat_crm_leads_total | counter | project, status | Leads sent to CRM |
Alert recommendations
| Condition | PromQL Query |
|---|---|
| LLM errors spiking | rate(aichat_llm_requests_total{status="error"}[5m]) > 0.1 |
| LLM latency high | histogram_quantile(0.95, rate(aichat_llm_request_duration_seconds_bucket[5m])) > 30 |
| Webhook errors | rate(aichat_webhook_requests_total{status="error"}[5m]) > 0.05 |
| Timers stuck | sum(rate(aichat_timer_fires_total[1h])) == 0 and on() aichat_timer_active_sequences > 0 |
| CRM failures | rate(aichat_crm_leads_total{status="error"}[5m]) > 0 |
Grafana dashboard tips
- Group LLM metrics by provider to compare provider performance and error rates
- Track aichat_agent_messages_total by stage to see funnel conversion rates (active → final)
- Watch aichat_timer_active_sequences as a proxy for total active conversations
- Use aichat_webhook_request_duration_seconds p95 to spot slow responses (typical: 2-5s including LLM + search)
- Monitor aichat_crm_leads_total by project to track lead generation per tenant
Label reference
| Label | Values | Used In |
|---|---|---|
| provider | openai, claude, together, openrouter, tiered | LLM metrics |
| method | complete, embed (LLM); sendMessage, sendMediaGroup, etc. (Telegram) | LLM, Telegram metrics |
| status | ok, error | LLM, Telegram, Webhook, CRM counters |
| project | Project name string | Webhook, Agent, CRM metrics |
| stage | active (in-progress conversation) or final (lead dispatched) | Agent metrics |
| action | follow_up, lead, skip | Timer metrics |
Configuration Reference
Configuration is read from a JSON file (--config config.json); environment variables are used as a fallback. JSON config is recommended for production.
Annotated config.example.json
{
// PostgreSQL connection string
"database_url": "postgresql://user:pass@localhost/aichat",
// Qdrant vector database
"qdrant": {
"url": "http://localhost:6333",
"api_key": ""
},
// Public HTTP server (webhooks)
"listen_addr": ":8080",
// Internal HTTP server (health, metrics, register-project)
"internal_addr": "localhost:9090",
// Base URL for webhook registration (must be HTTPS in production)
"webhook_url": "https://bot.example.com",
// LLM providers: any OpenAI-compatible API
"providers": {
"openrouter": {
"keys": ["sk-or-v1-your-key"],
"base_url": "https://openrouter.ai/api/v1"
},
"together": {
"keys": ["your-together-key"],
"base_url": "https://api.together.xyz/v1"
}
},
// Model slots: "provider/model" format (split on first /)
"models": {
"toolcaller": "openrouter/z-ai/glm-5.1", // agent's tool-calling loop
"extract": "openrouter/mistralai/mistral-small-3.1-24b-instruct", // crawler topics, timer classification
"embedding": "openrouter/qwen/qwen3-embedding-8b" // vectors
},
// Crawler-specific settings
"crawler": {
"chunk_size": 1500, // chars per chunk (simple mode)
"workers": 4, // parallel crawl workers
"no_llm_chunk": false, // true = paragraph-based, false = LLM semantic
"embedding_dim": 1024 // Matryoshka truncation dimension
},
"debug": false
}
Environment variable fallback
| Variable | Required | Default | Description |
|---|---|---|---|
| DATABASE_URL | Yes | — | PostgreSQL connection string |
| OPENAI_API_KEYS | At least one | — | Comma-separated OpenAI API keys |
| CLAUDE_API_KEYS | At least one | — | Comma-separated Claude API keys |
| QDRANT_URL | No | http://localhost:6333 | Qdrant REST API URL |
| QDRANT_API_KEY | No | — | Qdrant authentication key |
| LISTEN_ADDR | No | :8080 | Public HTTP server address |
| INTERNAL_ADDR | No | localhost:9090 | Internal HTTP server address |
| DEBUG | No | — | Enable debug logging (JSON) |
| AGENT_TRACE | No | — | Set to 1 for per-dialog trace files |
Model slots explained
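Model identifiers use "provider/model" format. The identifier is split on the first / character: the part before it is the provider name, which is matched against the providers map, and the remainder is the model name (which may itself contain /).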
| Slot | Config Key | Purpose | Notes |
|---|---|---|---|
| toolcaller | models.toolcaller | The agent's tool-calling loop | MUST be a tool-capable model. Tested in production: anthropic/claude-sonnet-4-6, openai/gpt-5.4, openrouter/z-ai/glm-5.1 (current default), openrouter/z-ai/glm-4.6. |
| extract | models.extract | Crawler chunk topics, timer status classification, follow-up text | Smaller, faster model. Single-shot completions. |
| embedding | models.embedding | Vector embeddings for search | Batch API, Matryoshka truncation to configured dim |
Extract model constraint: Must return content in choices[].message.content field. Thinking/reasoning models that use a reasoning field will not work for the extract slot.
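The "provider/model" split described above can be sketched as follows. The provider names and base URLs are copied from the example config; the providers map shape and the resolveModel function are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// providers is a simplified stand-in for the providers map (name to base URL).
var providers = map[string]string{
	"openrouter": "https://openrouter.ai/api/v1",
	"together":   "https://api.together.xyz/v1",
}

// resolveModel splits id on the first "/" only, so model names that contain
// slashes stay intact, then looks the provider up in the providers map.
func resolveModel(id string) (string, string, error) {
	provider, model, ok := strings.Cut(id, "/")
	if !ok || provider == "" || model == "" {
		return "", "", fmt.Errorf("model %q is not in provider/model format", id)
	}
	baseURL, known := providers[provider]
	if !known {
		return "", "", fmt.Errorf("unknown provider %q", provider)
	}
	return baseURL, model, nil
}

func main() {
	baseURL, model, err := resolveModel("openrouter/z-ai/glm-5.1")
	fmt.Println(baseURL, model, err)
}
```

Splitting on the first slash rather than the last is what lets identifiers such as openrouter/z-ai/glm-5.1 keep z-ai/glm-5.1 as the model name.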
Key Thresholds Reference
All configurable and hardcoded constants that control system behavior.
Agent Thresholds
| Constant | Value | Purpose | Defined In |
|---|---|---|---|
| Tool-loop iteration cap | 8 | Max ToolCaller round-trips per user turn before falling back to a canned safety reply | internal/agent/toolagent.go |
| hybrid_search top-k | 3 | Hard server-side cap on results returned to the model regardless of requested top_k | internal/agent/toolagent.go |
| minSearchScore | 0.5 | Minimum DBSF fusion score to include a result | internal/search/hybrid.go |
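How the top-k cap and the score floor combine can be sketched like this. The two constants use the values from the table; the Hit type, the capResults function, and the choice to treat a non-positive top_k as "use the cap" are assumptions for illustration.

```go
package main

import "fmt"

const (
	maxTopK        = 3   // hard server-side cap on returned results
	minSearchScore = 0.5 // minimum fused score for a result to be kept
)

// Hit is a hypothetical search result.
type Hit struct {
	URL   string
	Score float64
}

// capResults assumes hits are sorted by descending score. It drops hits below
// minSearchScore and returns at most min(requestedTopK, maxTopK) of the rest.
func capResults(hits []Hit, requestedTopK int) []Hit {
	k := requestedTopK
	if k <= 0 || k > maxTopK {
		k = maxTopK
	}
	out := make([]Hit, 0, k)
	for _, h := range hits {
		if len(out) == k {
			break
		}
		if h.Score >= minSearchScore {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	hits := []Hit{{"/a", 0.9}, {"/b", 0.8}, {"/c", 0.7}, {"/d", 0.6}, {"/e", 0.4}}
	fmt.Println(len(capResults(hits, 10))) // the model asked for 10, the cap applies
}
```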
Timer Intervals
| Step | Delay | Behavior |
|---|---|---|
| 1 | 5 minutes | First check-in — agent decides whether to ping or stay silent |
| 2 | 15 minutes | Second check-in |
| 3 | 40 minutes | Third check-in |
| 4 | 24 hours | Last interval — safety net. The lead is dispatched unconditionally for any unresolved chat (regardless of client_status) so no lead is lost. |
Crawler Constants
| Constant | Value | Purpose | Defined In |
|---|---|---|---|
| defaultEmbeddingDim | 1536 | Default embedding dimension (text-embedding-3-small) | internal/crawler/indexer.go |
| upsertBatch | 50 | Points per Qdrant upsert batch | internal/crawler/indexer.go |
| Default chunk size | 1500 chars | Simple chunking mode character limit | internal/crawler/chunker.go |
| Embed batch size | 32 chunks | Chunks per embedding API call | internal/crawler/pipeline.go |
| Default workers | 3 | Parallel crawl workers | cmd/crawler/main.go |
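The two batch sizes above (32 chunks per embedding call, 50 points per upsert) both come down to slicing a list into fixed-size groups. A generic helper for that might look like the sketch below; the batches function is a hypothetical name, not taken from the crawler's source.

```go
package main

import "fmt"

// batches splits items into consecutive groups of at most size elements.
// The last group may be shorter. A non-positive size yields no groups.
func batches[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	chunks := make([]string, 70)
	for _, b := range batches(chunks, 32) {
		fmt.Println(len(b)) // one embedding API call per batch
	}
}
```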
LLM Temperatures
| Temperature | Used For |
|---|---|
| 0.1 | Semantic chunking (crawler) |
| 0.3 | ToolCaller default; timer status classification; follow-up generation |
Some tool-capable models (GPT-5, o-series) ignore explicit temperature and use a fixed value — the OpenAI tools backend handles this automatically.
LLM Provider
| Constant | Value | Purpose |
|---|---|---|
| Max backoff | 60 seconds | Cap on exponential backoff for 429 retries |
| Backoff formula | 2^attempt + 10% jitter | Exponential with jitter, respects Retry-After header |
| Retries per chain | 3 | Max 429 retries before falling through to next provider |
Telegram
| Constant | Value | Purpose |
|---|---|---|
| Max message length | 4096 chars | Telegram API limit; messages auto-chunked at this boundary |
| Retry on 429 | Yes | Respects Retry-After header from Telegram API |
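Auto-chunking at the 4096-character boundary can be sketched as below. This is a simplified illustration: it counts Unicode code points (runes), and the chunkMessage name is hypothetical. How the real sender counts characters, and whether it prefers to break at line or word boundaries, is not covered here.

```go
package main

import "fmt"

const maxMessageLen = 4096 // Telegram message length limit from the table

// chunkMessage splits text into pieces of at most limit runes each.
// Splitting on runes rather than bytes keeps multi-byte characters intact.
func chunkMessage(text string, limit int) []string {
	if limit <= 0 {
		return nil
	}
	runes := []rune(text)
	var parts []string
	for len(runes) > limit {
		parts = append(parts, string(runes[:limit]))
		runes = runes[limit:]
	}
	if len(runes) > 0 {
		parts = append(parts, string(runes))
	}
	return parts
}

func main() {
	long := make([]rune, 5000)
	for i := range long {
		long[i] = 'я'
	}
	for _, p := range chunkMessage(string(long), maxMessageLen) {
		fmt.Println(len([]rune(p)))
	}
}
```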
Performance Reference
Webhook latency includes one or more ToolCaller round-trips plus search. Search latency covers Qdrant DBSF fusion. Storage estimates: ~1KB per conversation turn, ~100KB per 10K chunks, ~8KB per dense vector (1024 dims).
Build and test commands
# Build all three binaries
make build # produces bin/aichat-bot, bin/aichat-crawler, bin/aichat-cli
# Run tests
make test # go test ./...
# Lint
make lint # go vet ./...
# Local development services (PostgreSQL + Qdrant via Docker)
make dev-services
# Run the CLI interface for testing (no Telegram needed)
bin/aichat-cli --config config.dev.json --project myproject
Built with Go, Qdrant, PostgreSQL, and OpenRouter
This documentation is generated from source code and docs/*.md files.