aichat-go
Multi-tenant AI Sales Bot Platform
3-Stage Sales Funnel
General consultation → service-specific Q&A → lead capture. Automatic stage transitions driven by URL score accumulation and LLM classification.
Hybrid Search
Dense embeddings + BM25 sparse vectors with Qdrant DBSF fusion. Optional Cohere cross-encoder reranking and neighbor expansion for richer context.
Tiered LLM Routing
Chat tier for customer-facing replies, extract tier for lightweight classification. Any OpenAI-compatible API supported. Multi-key pool with round-robin rotation.
Smart Follow-Ups
4-step timer sequence (5m / 15m / 40m / 24h) with LLM-driven client classification. Automatically detects cold, hot, and finished conversations.
Multi-Tenant
Full project isolation with composite keys. Each tenant gets its own bot, collections, prompts, and lead group while sharing infrastructure.
Incremental Crawler
4-phase pipeline (fetch / chunk / embed / upload) with per-URL resume. Supports HTML, PDF, DOCX, XLSX. Semantic chunking with LLM topic labels.
Core Concepts
- Project
- A tenant — one business with its own bot, knowledge base, prompts, and CRM target. All data paths use (chat_id, project_id) composite keys.
- Stage
- Position in the 3-stage sales funnel. Derived from the Finished and DeterminedURL fields — no separate stage field to get out of sync.
- Knowledge Base
- Per-project Qdrant collections containing chunked website content with both dense (embedding) and sparse (BM25) vectors.
- URL
- A service page URL that identifies which specific service the user is asking about. The agent "locks on" to a URL to transition from general to service-specific consultation.
- Lead
- The output: contact info + conversation summary + chat history + files, sent to the CRM lead group.
- Timer
- Automated follow-up sequence that fires when a user goes silent. Classifies engagement via LLM and either sends a nudge or dispatches the lead.
- Tier
- LLM routing category: chat (customer-facing), extract (lightweight), embedding (vectors), reranker (cross-encoder).
Architecture Overview
End-to-end data flow from Telegram webhook to CRM lead dispatch, with hybrid search and tiered LLM routing.
The conversation stage is derived from the Finished and DeterminedURL fields. Metrics are applied via decoration (zero Prometheus imports in core packages). All data paths use composite keys (chat_id, project_id) for tenant isolation.
Component details
| Component | Location | Purpose |
|---|---|---|
| Gateway | internal/gateway/ | Normalizes TelegramUpdate into domain.Message (text, caption, files, sender) |
| Agent | internal/agent/ | 3-stage conversation state machine with command handling |
| Tiered LLM | internal/llm/ | Routes chat/extract/embedding calls to different provider+model combos |
| Hybrid Search | internal/search/ | Dense embeddings + BM25 sparse vectors with DBSF fusion and cross-encoder reranking |
| CRM | internal/crm/ | Dispatches leads to Telegram group (summary + history + media) |
| Timer | internal/timer/ | Automated follow-up sequences with LLM-driven classification |
| Crawler | internal/crawler/ | 4-phase pipeline: fetch, chunk, embed, upload to Qdrant |
| Metrics | internal/metrics/ | Prometheus wrappers (zero-coupled with core packages) |
| Admin | internal/admin/ | Admin group command handler for cross-project management |
| i18n | internal/i18n/ | Localization (ru/en) with compile-time embedded locale files |
Request lifecycle (webhook to response)
Every incoming Telegram message follows this path:
- Webhook reception — POST /webhook/{tg_api_key} received on the public HTTP server (:8080). The API key in the URL routes to the correct project bundle.
- Gateway normalization — TelegramUpdate is converted to domain.Message with text, caption, files, and sender info extracted.
- State loading — Chat state loaded from PostgreSQL using composite key (chat_id, project_id).
- Command check — /start resets state and returns the greeting. /debug toggles debug mode.
- File collection — Any file attachments are accumulated into state. Files without text get an immediate acknowledgement without LLM calls.
- Stage dispatch — Based on derived state (CurrentStage()), the message is routed to the appropriate stage handler.
- State persist — Updated state written back to PostgreSQL.
- Timer reset — Follow-up timer reset to step 0 for this chat.
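The derived-stage idea behind the dispatch step can be sketched as follows. This is an illustrative mirror of the design, not the project's actual code: the ChatState fields and stage names are simplified assumptions (the real handlers also cover an intermediate Stage 1.5).

```go
package main

import "fmt"

// ChatState is a simplified stand-in: the funnel position is computed from
// Finished and DeterminedURL rather than stored separately, so it can never
// drift out of sync with the underlying fields.
type ChatState struct {
	DeterminedURL string
	Finished      bool
}

type Stage int

const (
	StageGeneral Stage = iota + 1 // Stage 1: general consultation
	StageService                  // Stage 2: service-specific Q&A
	StageFinal                    // Stage 3: lead capture / closing
)

// CurrentStage derives the funnel position from state.
func (s ChatState) CurrentStage() Stage {
	switch {
	case s.Finished:
		return StageFinal
	case s.DeterminedURL != "":
		return StageService
	default:
		return StageGeneral
	}
}

func main() {
	st := ChatState{DeterminedURL: "https://example.com/roofing"}
	fmt.Println(st.CurrentStage() == StageService) // true
}
```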
Project structure tree
cmd/
bot/ # Main bot binary
main.go # Wiring: DB, tiered LLM, per-project bundles, HTTP server
config.go # JSON config loading with env var fallback
crawler/ # Knowledge base indexer CLI
main.go # Subcommands: init-project, seed, add-urls, register
cli/ # Local readline chat interface (testing)
main.go # Same agent pipeline, file-based CRM output
internal/
domain/ types.go # Stage, Message, ChatState, Lead, SearchResult
agent/ agent.go # HandleMessage dispatch
stages.go # Stage 1/1.5/2/3 handlers
commands.go # /start, /debug
lead_commands.go # Lead group: /prompt, /stats
llm/ tiered.go # TieredClient: routes by tier
chain.go # Provider chain with retry + fallback
openai.go # OpenAI-compatible provider
claude.go # Claude/Anthropic provider
embedding_adapter.go # Model-specific text formatting
search/ hybrid.go # Qdrant DBSF + reranking + neighbors
reranker.go # Cohere cross-encoder client
bm25.go # BM25Encoder, Tokenize
db/ postgres.go # PostgreSQL implementation
gateway/ telegram.go # TelegramUpdate normalization
telegram/ client.go # HTTP client, retry on 429
crm/ telegram.go # TelegramCRM: summary + history + media
timer/ timer.go # Scheduler, sequences, LLM classification
metrics/ metrics.go # Metric definitions
llm.go # InstrumentedLLMClient wrapper
crawler/ pipeline.go # Resumable 4-phase pipeline
chunker.go # Semantic + simple chunking
indexer.go # Qdrant upsert
i18n/ i18n.go # T(), Tf(), embedded locales
locales/ ru.json, en.json
migrations/ 001_initial.sql, 002_bm25_index.sql, 003_timers.sql
nix/ module.nix, example-configuration.nix
flake.nix Makefile
Extension points
| Extension | How | Location |
|---|---|---|
| New LLM Provider | Implement the Provider interface | internal/llm/ |
| New CRM Backend | Implement the CRM interface (multi-CRM dispatcher) | internal/crm/ |
| New Gateway | Normalize to domain.Message + webhook route | internal/gateway/ |
| Custom Search | Implement the KnowledgeBase interface | internal/search/ |
Conversation Flow
The agent implements a 3-stage sales funnel state machine. Every incoming message is dispatched to a stage handler based on derived state.
General Consultation
Identify which service the user is asking about through multi-turn conversation.
- Query rewrite — LLM generates 3 search variations (extract tier, temp 0.3)
- Hybrid search — per variation, limit=5 (with reranker) or 2 (without), minScore=0.5
- DBSF fusion — Qdrant merges dense + sparse results preserving absolute relevance
- Reranking — Cohere cross-encoder re-scores candidates (if configured)
- Neighbor expansion — adjacent chunks N-1/N+1 fetched for context
- LLM reply — prompt1 + history + top 5 search results (chat tier, temp 0.3)
- URL accumulation — each result's URL gets += score across turns
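The accumulation step above can be sketched as follows. The counter map and the auto-select check are simplified assumptions; the real state field is state.URLsCounter and the 3.0 threshold comes from the URL Determination rules.

```go
package main

import "fmt"

// SearchResult is a minimal stand-in for a hybrid-search hit.
type SearchResult struct {
	URL   string
	Score float64
}

const autoSelectThreshold = 3.0

// accumulate adds each result's score to the per-URL counter kept in chat
// state and reports the first URL whose accumulated score crosses the
// auto-select threshold, if any.
func accumulate(counter map[string]float64, results []SearchResult) (string, bool) {
	for _, r := range results {
		counter[r.URL] += r.Score
		if counter[r.URL] >= autoSelectThreshold {
			return r.URL, true
		}
	}
	return "", false
}

func main() {
	counter := map[string]float64{}
	// Turn 1: no URL crosses the threshold yet.
	accumulate(counter, []SearchResult{{"/roofing", 1.8}, {"/solar", 0.9}})
	// Turn 2: /roofing reaches 3.2 and locks.
	url, locked := accumulate(counter, []SearchResult{{"/roofing", 1.4}})
	fmt.Println(url, locked) // /roofing true
}
```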
URL Determination
When no URL has crossed the auto-select threshold, the LLM decides.
- Auto-select — if any URL score ≥ 3.0, lock immediately
- LLM determination — extract tier (temp 0.7), examines history + search results + URL scores
- Output URL: <url> → lock and transition to Stage 2
- Output QUESTION: <text> → send clarifying question, stay in Stage 1
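A parser for these two output formats might look like the sketch below. The function name and trimming behavior are assumptions; only the URL:/QUESTION: prefixes come from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDetermination splits the determination reply into either a locked
// service URL ("URL: <url>") or a clarifying question ("QUESTION: <text>").
// Exactly one of the two return values is non-empty for a well-formed reply.
func parseDetermination(reply string) (url, question string) {
	reply = strings.TrimSpace(reply)
	switch {
	case strings.HasPrefix(reply, "URL:"):
		url = strings.TrimSpace(strings.TrimPrefix(reply, "URL:"))
	case strings.HasPrefix(reply, "QUESTION:"):
		question = strings.TrimSpace(strings.TrimPrefix(reply, "QUESTION:"))
	}
	return url, question
}

func main() {
	u, _ := parseDetermination("URL: https://example.com/roofing")
	fmt.Println(u) // https://example.com/roofing
}
```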
Service-Specific Consultation
Deep Q&A about the locked service with parallel contact extraction.
- Full source page — fetched via GetByURL from the source collection
- Price extraction — one-time LLM call on Stage 2 entry, cached in PriceSummary (extract tier, temp 0.1)
- LLM reply — prompt2 + source page + cached prices + search results + history (chat tier, temp 0.3)
- Contact extraction — 3 parallel LLM calls (extract tier, temp 0.1):
  - Contact method + value (METHOD: x / VALUE: y or NONE)
  - City name (or NONE)
  - Finish decision (YES or NO)
- Contact sharing — Telegram request_contact button sent on first Stage 2 interaction
Lead Generation & Dispatch
Package conversation into a lead and dispatch to CRM.
- Set Finished=true — persisted to DB immediately (two-phase persist)
- Lead summary — LLM generates summary with name, city, service, contact (extract, temp 0.2)
- Closing reply — farewell message to user (chat tier, temp 0.3)
- CRM dispatch — forward message + summary + history document + media files
- Set LeadSent=true — persisted to DB (prevents duplicate sends)
Two-phase lead persist details
Finished=true is written to the database before the CRM send, and LeadSent=true is written after. This prevents duplicate leads from concurrent webhook requests or timer fires reading stale state. If the bot crashes between the two persists, the timer system retries the CRM send on next fire or on restart.
File upload handling
When a user sends a file without text, the bot immediately acknowledges it ("Accepted! Let me know when you're done.") without calling the LLM. Files are accumulated in the chat state and attached to the lead when it is generated.
Media files in the lead are grouped by type for clean presentation:
- Photos + Videos — sent as a media album (Telegram groups them visually)
- Documents — sent as a document album
- Voice / Video notes — sent individually (Telegram does not support albums for these)
Contact sharing button
On the first Stage 2 interaction, the bot sends a Telegram request_contact keyboard button alongside its reply. This lets users share their phone number with one tap.
- Sent only once per conversation (tracked via state.UserSettings.ContactRequested)
- Shared contact is stored in chat state (not in history) for privacy
- Contact data is included in the lead summary and extraction prompts
- When a user says "take my phone from Telegram", the LLM can use the actual shared contact data
Soft checklist approach
The bot gently reminds the user to share their city and phone number, but does not block lead generation. If the user provides enough information for a meaningful lead, finish=true is allowed even without all fields. This avoids frustrating users who prefer not to share certain information while still maximizing lead quality.
Search Pipeline
Hybrid search combining dense embeddings and BM25 sparse vectors with DBSF fusion, cross-encoder reranking, and neighbor expansion.
Query Rewrite
LLM generates 3 search variations from the user's question (extract tier, temp 0.3). Improves recall by capturing different phrasings that may match different KB chunks.
Hybrid Search
Each variation is searched using Qdrant's prefetch with both dense embeddings (cosine similarity, 1024 dims) and BM25 sparse vectors. Qdrant's built-in DBSF (Distribution-Based Score Fusion) merges results, preserving absolute relevance signal unlike rank-based RRF. Each prefetch retrieves limit * 2 candidates for the fusion algorithm.
Deduplication
Results from all 3 query variations are merged. Duplicates (by text content) are removed, keeping the highest-scoring instance.
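The merge-and-dedup step can be sketched as below. The Result type and first-seen ordering are assumptions for illustration; the rule from the text is the part that matters: key by text content, keep the highest score.

```go
package main

import "fmt"

// Result is a minimal search hit: chunk text plus relevance score.
type Result struct {
	Text  string
	Score float64
}

// dedup merges result batches from all query variations, removing duplicate
// texts and keeping the highest-scoring instance of each.
func dedup(batches ...[]Result) []Result {
	best := map[string]float64{}
	order := []string{} // preserve first-seen order for stable output
	for _, batch := range batches {
		for _, r := range batch {
			if s, seen := best[r.Text]; !seen {
				best[r.Text] = r.Score
				order = append(order, r.Text)
			} else if r.Score > s {
				best[r.Text] = r.Score
			}
		}
	}
	out := make([]Result, 0, len(order))
	for _, t := range order {
		out = append(out, Result{Text: t, Score: best[t]})
	}
	return out
}

func main() {
	merged := dedup(
		[]Result{{"chunk A", 0.71}},
		[]Result{{"chunk A", 0.84}, {"chunk B", 0.60}},
	)
	fmt.Println(len(merged), merged[0].Score) // 2 0.84
}
```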
Reranking
Cohere cross-encoder (configured via models.reranker) re-scores all candidates. Over-retrieves 5 results per query variation (3x the final limit) to give the reranker more material. All results contribute to URL scoring before the context cap.
Context Cap
Top 5 results are kept for the LLM context window. URL scores are accumulated from all results (before and after cap) into state.URLsCounter.
Neighbor Expansion
For each result, adjacent chunks (position N-1 and N+1 from the same URL) are fetched via Qdrant scroll and concatenated in position order. Cheap metadata-only query, no vector computation needed.
Visual Search Flow
Dense (cosine, 1024 dims) + BM25 (keyword matching) → DBSF fusion (preserves absolute relevance) → Cohere rerank (optional) → neighbor expansion (pos N-1 / N+1)
Qdrant collections per project
| Collection | Vectors | Purpose |
|---|---|---|
| {project}_content_hybrid | Dense + Sparse | Chunked content with topic-prepended embeddings. Searchable target for hybrid queries. |
| {project}_source | Dense only | Full page content with real embeddings. Used in Stage 2 for determined URL context and page-level matching. |
BM25 implementation details
- Standard BM25 formula: IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl/avgDL))
- Parameters: k1=1.5, b=0.75
- Tokenizer: Unicode-aware (Cyrillic + Latin), lowercased, split on non-letter/digit
- Persistence: vocabulary, IDF, avgDL, numDocs stored as JSONB in PostgreSQL (bm25_indexes table)
- In-memory cache: loaded from DB on first use per project, invalidated on crawler updates
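A minimal sketch of the term weight from the formula above, with the stated k1/b parameters. The smoothed IDF variant shown is an assumption (the persisted index stores IDF directly); the helper names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// BM25 parameters from the implementation notes above.
const (
	k1 = 1.5
	b  = 0.75
)

// bm25Weight computes IDF * (tf*(k1+1)) / (tf + k1*(1 - b + b*dl/avgDL)),
// the standard BM25 term weight used for the sparse vectors.
func bm25Weight(tf, idf, docLen, avgDocLen float64) float64 {
	return idf * (tf * (k1 + 1)) / (tf + k1*(1-b+b*docLen/avgDocLen))
}

// idf is a common smoothed inverse document frequency (an assumption here;
// the real encoder persists its IDF table in PostgreSQL).
func idf(numDocs, docFreq float64) float64 {
	return math.Log(1 + (numDocs-docFreq+0.5)/(docFreq+0.5))
}

func main() {
	// Weight of a term appearing 3 times in a 120-token doc (avg 150).
	w := bm25Weight(3, idf(1000, 40), 120, 150)
	fmt.Printf("%.3f\n", w)
}
```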
LLM Calls Map
All 14 LLM call sites in the system, organized by tier and purpose.
Tiers
| # | Purpose | Tier | Temp | Location |
|---|---|---|---|---|
| 1 | Stage 1 reply — general consultation with KB search context | chat | 0.3 | stages.go |
| 2 | Stage 2 reply — service-specific with full source page + prices | chat | 0.3 | stages.go |
| 3 | Stage 3 closing — farewell after lead capture | chat | 0.3 | stages.go |
| 4 | Query rewrite — 3 search variations | extract | 0.3 | stages.go |
| 5 | URL determination — pick service or ask clarification | extract | 0.7 | stages.go |
| 6 | Contact extraction — method + value | extract | 0.1 | stages.go |
| 7 | City extraction | extract | 0.1 | stages.go |
| 8 | Finish detection — YES/NO | extract | 0.1 | stages.go |
| 9 | Lead summary — for CRM dispatch | extract | 0.2 | stages.go |
| 10 | Timer classification — COLD/HOT/FINISHED | extract | 0.3 | timer.go |
| 11 | Timer follow-up — contextual reminder (conditional) | extract | 0.5 | timer.go |
| 12 | Timer lead summary — for timer-initiated CRM dispatch | extract | 0.2 | timer.go |
| 13 | Semantic chunking — boundary detection per page | extract | 0.1 | chunker.go |
| 14 | Price extraction — one-time on Stage 2 entry, cached | extract | 0.1 | stages.go |
Call flow diagram
// User message flow
User message
→ #4 rewriteQuery (extract) → search variations
→ hybrid search (DBSF fusion, dense + BM25)
→ reranker (Cohere cross-encoder, if configured)
→ top 5 results → neighbor expansion (position N-1/N+1)
→ #5 determineURL (extract) → pick service or clarify
→ if entering Stage 2: #14 price extraction (extract, one-time, cached)
→ #1 or #2 Stage reply (chat) → answer to user
→ #6 + #7 + #8 extractContact (3x extract, parallel)
→ if finished:
→ #9 generateSummary (extract) → CRM summary
→ #3 Stage 3 closing (chat) → goodbye
// Timer flow
Timer fires
→ #10 classifyStatus (extract) → cold/hot/finished
→ #11 generateFollowUp (extract, conditional)
→ if finished: #12 summary (extract) → CRM lead
// Crawler flow
Crawler
→ #13 semantic boundary detection per page (extract)
→ slice original text at detected boundaries
→ topic prepend at embed time ("Topic: X\n\n...")
→ batch embedding (32 chunks per API call)
Provider routing details
The TieredClient routes Complete calls based on req.Tier:
- TierChat → chat chain (customer-facing models)
- TierExtract → extract chain (lightweight models); falls back to the chat chain if not configured
- Embed → embedding chain (wrapped in AdaptedProvider for model-specific text formatting)
Each chain has independent retry/fallback. Backoff formula: 2^attempt + 10% jitter, capped at 60s. Key pool rotates API keys round-robin.
Embedding Adapters
AdaptedProvider wraps the embedding provider and formats text based on mode (document vs query). Adapter selected automatically by model name. E5-instruct models get Instruct: ...\nQuery: ... prefix for queries. Token limit auto-split: when a chunk exceeds the model's token limit, text is split by sentence, each half embedded recursively, and vectors averaged + L2-normalized.
Crawler Pipeline
4-phase incremental pipeline that fetches, chunks, embeds, and uploads content to Qdrant. Intermediate results are saved per-URL for resume.
Fetch Pages
HTTP GET with parallel workers (default: 3). Supports HTML, PDF, DOCX, XLSX via auto-detection. Each page saved to 1_pages/{urlhash}.json. Skip logic: pages with existing files are skipped on re-run.
Semantic Chunk
LLM boundary detection: outputs TOPIC: X | STARTS: Y anchor phrases per page (extract tier, temp 0.1). Original text is sliced at detected boundaries. Each chunk gets a topic label. Saved to 2_chunks/{urlhash}.json. Falls back to paragraph-based splitting with --no-llm-chunk.
Batch Embed
32 chunks per API call. Topic prepended before embedding: "Topic: X\n\n{text}". BM25 sparse vectors rebuilt from all chunks (IDF requires full corpus). Saved to 3_embeds/{urlhash}.json + bm25.json. Token limit auto-split handles oversized chunks.
Upload to Qdrant
Qdrant upsert in batches of 50 points. Source pages uploaded with real embeddings to {name}_source. Chunked content to {name}_content_hybrid. BM25 index persisted to PostgreSQL. Pipeline data archived to project-archives/{name}.tar.gz.
Per-URL file names use a hash of the URL: {urlhash} = sha256(url)[:16].
Storage layout
new-projects/{project-name}/
1_pages/{urlhash}.json # single Page per URL
2_chunks/{urlhash}.json # []Chunk per URL
3_embeds/{urlhash}.json # []EmbeddedChunk per URL
3_embeds/bm25.json # BM25 encoder snapshot
Document type support
| Format | Library | Notes |
|---|---|---|
| HTML | go-readability v2 | Firefox Reader View algorithm. Fallback: DOM walk (main → article → body, stripping nav/header/footer/script) |
| PDF | ledongthuc/pdf | Text extraction. Scanned PDFs with no text layer are skipped. |
| DOCX | fumiama/go-docx | Paragraph and table text extraction |
| XLSX | xuri/excelize | All sheets as tab-separated text |
Semantic chunking vs simple chunking
Semantic Chunking (default)
One LLM call per page (extract tier, temp 0.1) detects natural topic boundaries. The LLM outputs anchor phrases in the format TOPIC: X | STARTS: Y. The original text is sliced at the detected boundaries, preserving the exact source text without LLM paraphrasing. Each chunk receives a topic label for embedding.
Simple Chunking (--no-llm-chunk)
Paragraph and heading-aware splitting. Text is split on double newlines and markdown headings. Short paragraphs are merged up to maxChars (default 1500). Chunks get numbered topic labels ("Part 1", "Part 2", etc.).
Topic prepending strategy
At index time, each chunk's text is prepended with its topic label before embedding:
Before: "Our basic plan starts at $99/month with unlimited support."
After: "Topic: Pricing plans\n\nOur basic plan starts at $99/month with unlimited support."
The topic acts as a semantic anchor — the embedding now captures the chunk's theme, not just its surface content. A query about "costs" will match closer to a chunk anchored with "Pricing plans" even if the chunk text never mentions "costs".
Search queries do not need the topic prefix — the embedding space naturally aligns.
CLI usage examples
# Create a new project and seed immediately
bin/aichat-crawler init-project \
--name myproject \
--tg-api-key "123456:ABC-DEF" \
--tg-lead-group -1001234567890 \
--language en \
--prompt1 "You are a sales consultant..." \
--prompt2 "You are helping a customer..." \
--start-reply "Welcome! How can I help?" \
--sitemap-url "https://example.com/sitemap.xml"
# Seed a project later (separate from creation)
bin/aichat-crawler seed \
--project-id "uuid-from-init-output" \
--project-name myproject \
--sitemap-url "https://example.com/sitemap.xml"
# Add URLs to an existing project
bin/aichat-crawler add-urls \
--project-name myproject \
--urls "https://example.com/new-page1,https://example.com/new-page2"
# Set prompts from files
bin/aichat-crawler set \
--project-name myproject prompt1 @prompts/prompt1.txt
Timer System
Automated follow-up sequences that fire when a user goes silent. LLM-driven classification decides the action at each step.
Follow-Up Timeline
Per-Fire Logic
Lifecycle details
- Reset — Every incoming message resets the timer for that chat, starting the sequence from step 0.
- Fire — Fetches fresh chat state and history. Skips if already finished. Classifies the client via LLM, then acts based on classification.
- Persist — Timer state is written to PostgreSQL with the next trigger time. On restart, Reload() recovers all active timers, calculating remaining delay and firing overdue ones immediately.
- Lead retry — If a timer finds Finished=true, LeadSent=false, it retries the CRM send instead of classifying.
- Cancel — When a user sends a new message, the old timer is cancelled (goroutine killed + DB record updated).
Goroutine safety: Each chat/project pair gets one goroutine. A current == self identity check prevents a superseded goroutine from cleaning up a newer timer's state.
Token savings: Two-step approach (classify first, generate conditionally) skips follow-up generation for finished clients and hot clients who don't need a message yet.
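The identity check described above can be sketched as follows. The scheduler and key types are simplified assumptions (the real key pairs chat_id with a UUID project_id); the point is the current == self comparison before cleanup.

```go
package main

import (
	"fmt"
	"sync"
)

type timer struct{ step int }

// scheduler maps each chat/project pair to at most one active timer.
// The key is simplified here to a pair of int64s.
type scheduler struct {
	mu     sync.Mutex
	active map[[2]int64]*timer
}

// done removes t from the active map only if t is still the current timer
// for the key, so a superseded goroutine cannot clobber a newer timer's
// state when it finishes late.
func (s *scheduler) done(key [2]int64, t *timer) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if current := s.active[key]; current != t { // current == self check
		return false // a newer timer replaced us; leave its state alone
	}
	delete(s.active, key)
	return true
}

func main() {
	s := &scheduler{active: map[[2]int64]*timer{}}
	key := [2]int64{42, 1}
	old := &timer{}
	s.active[key] = old
	s.active[key] = &timer{} // user messaged again: a new timer takes over
	fmt.Println(s.done(key, old)) // false: superseded goroutine backs off
}
```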
Multi-Tenancy
Every data path uses composite keys (chat_id, project_id) for tenant isolation. Each project gets its own resources while sharing infrastructure.
Per Project (Isolated)
Shared (Infrastructure)
Data isolation details
| Data | Table/Collection | Key |
|---|---|---|
| Chat state | chat_states | PRIMARY KEY (chat_id, project_id) |
| Chat history | chat_history | WHERE chat_id AND project_id |
| Timers | timers | PRIMARY KEY (chat_id, project_id) |
| BM25 index | bm25_indexes | PRIMARY KEY (project_id) |
| Search vectors | Qdrant | {project_name}_content_hybrid, {project_name}_source |
| Webhook routing | URL path | POST /webhook/{tg_api_key} |
Runtime registration
Project bundles (Telegram client, agent, CRM, KB) live in an in-memory map. New projects can be registered at runtime via two paths:
- Admin group — the /project_init command creates the project, seeds the KB, and registers the bundle instantly. No restart needed.
- Internal API — POST /api/register-project/{id} for CLI or external tools that seed the DB independently.
Project onboarding flow (Telegram)
Step 1: Create a bot in BotFather, copy the token.
Step 2: Send one command in the admin project's lead group:
/project_init myproject sitemap https://example.com/sitemap.xml 123456:AAH...token
The system verifies the token, creates the project, seeds the KB, and replies with progress:
> Project "myproject" created (bot: @myproject_bot). Seeding started...
> Scraping completed: 42 pages
> Chunking completed: 156 chunks
> Embeddings completed: 156 vectors
> Upload completed: 42 pages, 156 chunks indexed
> Project "myproject" init finished!
> Test the bot: https://t.me/myproject_bot
> To create the CRM group, click:
> https://t.me/myproject_bot?startgroup=connect_myproject
Step 3: Click the deep link. Telegram opens the "create group" UI with the bot pre-added. The bot auto-detects the group and connects it as the lead group. No group ID needed — discovered automatically via the /start connect_{name} command.
Admin commands reference
| Command | Description |
|---|---|
| /project_init <name> sitemap <url> <token> | Create project, crawl sitemap, seed KB |
| /project_init <name> urls <url1,url2> <token> | Create project from individual URLs |
| /project_prompt1 <name> <text> | Update Stage 1 system prompt |
| /project_prompt2 <name> <text> | Update Stage 2 system prompt |
| /project_prompt3 <name> <text> | Update Stage 3 (finish) prompt |
| /project_price_url <name> <url> | Set price URL (KB chunks injected into Stage 1) |
| /project_stats <name> [period] | Show chat statistics |
| /project_info <name> | Show project details |
| /project_add_urls <name> <urls> | Add URLs to KB |
| /project_delete_urls <name> <urls> | Delete URL chunks from KB |
| /project_delete <name> | Soft-delete project |
| /project_restore <name> | Restore soft-deleted project |
Lead group commands (per project)
| Command | Description |
|---|---|
| /help | Show all available commands |
| /prompt | Show current prompt1, prompt2, and prompt3 |
| /prompt1 [text] | Show or update Stage 1 system prompt |
| /prompt2 [text] | Show or update Stage 2 system prompt |
| /stats [period] | Show chat statistics (e.g. 1 week, 3 days) |
| /info | Show project details |
User chat commands
| Command | Description |
|---|---|
| /start | Reset conversation state and return the project's greeting |
| /debug | Toggle debug mode (appends trace info to bot replies) |
Database schema
Three migrations in migrations/:
001_initial.sql
CREATE TABLE projects (
id UUID PRIMARY KEY,
name TEXT UNIQUE,
tg_lead_group BIGINT,
tg_api_key TEXT UNIQUE,
prompts JSONB,
language TEXT DEFAULT 'ru'
);
CREATE TABLE chat_history (
id UUID PRIMARY KEY,
chat_id BIGINT,
project_id UUID REFERENCES projects(id),
question TEXT,
reply TEXT,
dbinfo JSONB,
created_at TIMESTAMPTZ
);
CREATE TABLE chat_states (
chat_id BIGINT,
project_id UUID REFERENCES projects(id),
state JSONB,
PRIMARY KEY (chat_id, project_id)
);
002_bm25_index.sql
CREATE TABLE bm25_indexes (
project_id UUID PRIMARY KEY REFERENCES projects(id),
vocab JSONB,
idf JSONB,
avg_dl FLOAT,
num_docs INT,
updated_at TIMESTAMPTZ
);
003_timers.sql
CREATE TABLE timers (
chat_id BIGINT,
project_id UUID REFERENCES projects(id),
step INT,
trigger_at TIMESTAMPTZ,
PRIMARY KEY (chat_id, project_id)
);
Key indexes
- idx_chat_history_chat_id on (chat_id, created_at) — history lookup
- idx_chat_states_project on (project_id) — project-level queries
- idx_chat_states_state_gin GIN on (state) — JSONB queries on state
CRM Lead Output
When a lead is dispatched, the following is sent to the project's Telegram lead group.
The chat history is attached as a document, lead{chatID}.txt, replied to the summary message.
CRM interface design
type CRM interface {
SendLead(ctx context.Context, lead domain.Lead) (string, error)
Name() string
}
Current implementation: TelegramCRM. The interface is designed for future CRM backends (Bitrix, AmoCRM) via the MultiCRM dispatcher pattern.
Deployment
NixOS-native deployment with systemd service hardening, sops-nix secrets, and Cloudflare Tunnel.
Service Dependency Chain
Security Hardening
| Setting | Value | Purpose |
|---|---|---|
| NoNewPrivileges | true | Prevent privilege escalation |
| ProtectSystem | strict | Read-only filesystem except allowed paths |
| ProtectHome | true | No access to /home |
| PrivateTmp | true | Isolated /tmp |
| MemoryMax | 512M | Memory limit (OOM protection) |
NixOS module configuration
services.aichat = {
enable = true;
configFile = "/run/secrets/aichat-config.json";
listenAddr = ":8080";
# Database
enablePostgres = true;
dbName = "aichat";
postgresPort = 5432;
# Qdrant
enableQdrant = true;
qdrantPort = 6333;
qdrantDataDir = "/var/lib/qdrant";
# Backup
enableBackup = true;
backupDir = "/var/backup/aichat";
backupRetentionDays = 14;
};
Backup & secrets
Backup
- Daily systemd timer with RandomizedDelaySec=1h
- pg_dump + Qdrant snapshots
- 14-day retention with automatic cleanup
Secrets (sops-nix)
- Encrypted in secrets/production.yaml using age keys
- Decrypted at boot on the target machine using its SSH host key
- Keys: aichat_config (full JSON config), cloudflared_creds (tunnel credentials)
Quick deploy commands
# Deploy code changes to production
make prod-deploy
# SSH tunnels for crawler access to prod DB + Qdrant
make prod-tunnel
# PostgreSQL on localhost:15432, Qdrant on localhost:16333
# Seed a project on prod (while tunnel is open)
./bin/aichat-crawler --config /tmp/config.prod-tunnel.json init-project \
--name myproject --tg-api-key "TOKEN" --language ru \
--no-llm-chunk --workers 1 --urls "URL1,URL2,..."
# Fetch trace files from prod
ssh root@server "cat /var/lib/aichat/traces/{chatID}/{dialogID}.log"
Webhook setup
After the bot is running and accessible via HTTPS, register the webhook with Telegram:
curl -X POST "https://api.telegram.org/bot{TG_API_KEY}/setWebhook" \
-H "Content-Type: application/json" \
-d '{"url": "https://your-domain.com/webhook/{TG_API_KEY}"}'
The webhook URL must exactly match https://{host}/webhook/{tg_api_key} where tg_api_key is the bot token from the projects table. The API key in the URL routes incoming updates to the correct project bundle.
Manual / Docker deployment
Prerequisites: Go 1.22+, PostgreSQL 16, Qdrant
# Build all binaries
make build
# produces bin/aichat-bot, bin/aichat-crawler, bin/aichat-cli
# Run migrations
make migrate DATABASE_URL="postgres://user:pass@localhost/aichat"
# Run the bot with JSON config
bin/aichat-bot --config config.prod.json
# Or with environment variables
DATABASE_URL="postgres://user:pass@localhost/aichat" \
OPENAI_API_KEYS="sk-xxx" \
bin/aichat-bot
Cloudflare Tunnel setup
Production uses Cloudflare Tunnel to route HTTPS traffic to the bot without exposing ports. The tunnel routes aichat.example.com to localhost:8080 on the server.
- Tunnel credentials managed via sops-nix (encrypted at rest)
- Configured in
nix/production.nix - No need for TLS certificates or reverse proxy configuration
- Telegram webhooks point to the tunnel URL
Localization
User-facing strings and LLM prompt templates are localized via JSON locale files embedded at compile time.
| Aspect | Details |
|---|---|
| Supported languages | ru (default), en |
| Locale files | internal/i18n/locales/ru.json, internal/i18n/locales/en.json |
| API | i18n.T(lang, key) for strings, i18n.Tf(lang, key, args...) for formatted strings |
| Fallback chain | Requested lang → "ru" → key itself |
| Coverage | 36+ keys: agent replies, LLM prompts, history formatting, timer messages, CRM lead formatting, chunker prompts |
| Per-project | Set in projects.language column, propagates to agent, CRM, and timer |
| Adding a locale | Create internal/i18n/locales/{code}.json with the same keys — auto-loaded via //go:embed |
Monitoring
Prometheus metrics exposed at GET /metrics on the internal server (localhost:9090 by default). Metrics are applied via decoration — core packages have zero Prometheus imports.
LLM Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| aichat_llm_requests_total | counter | provider, method, status | Total LLM API calls |
| aichat_llm_request_duration_seconds | histogram | provider, method | LLM call latency (0.1s-51s buckets) |
Telegram Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| aichat_telegram_requests_total | counter | method, status | Telegram API calls |
| aichat_telegram_request_duration_seconds | histogram | method | Telegram API latency |
Webhook & Agent Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| aichat_webhook_requests_total | counter | project, status | Inbound webhook requests |
| aichat_webhook_request_duration_seconds | histogram | project | Webhook processing time |
| aichat_agent_messages_total | counter | project, stage | Messages processed by stage |
Timer & CRM Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| aichat_timer_active_sequences | gauge | — | Currently active timer sequences |
| aichat_timer_fires_total | counter | action | Timer fire outcomes (follow_up, lead, skip) |
| aichat_crm_leads_total | counter | project, status | Leads sent to CRM |
Alert recommendations
| Condition | PromQL Query |
|---|---|
| LLM errors spiking | rate(aichat_llm_requests_total{status="error"}[5m]) > 0.1 |
| LLM latency high | histogram_quantile(0.95, rate(aichat_llm_request_duration_seconds_bucket[5m])) > 30 |
| Webhook errors | rate(aichat_webhook_requests_total{status="error"}[5m]) > 0.05 |
| Timers stuck | rate(aichat_timer_fires_total[1h]) == 0 when aichat_timer_active_sequences > 0 |
| CRM failures | rate(aichat_crm_leads_total{status="error"}[5m]) > 0 |
Grafana dashboard tips
- Group LLM metrics by provider to compare provider performance and error rates
- Track aichat_agent_messages_total by stage to see funnel conversion rates (general → service_specific → final)
- Watch aichat_timer_active_sequences as a proxy for total active conversations
- Use aichat_webhook_request_duration_seconds p95 to spot slow responses (typical: 2-5s including LLM + search)
- Monitor aichat_crm_leads_total by project to track lead generation per tenant
Label reference
| Label | Values | Used In |
|---|---|---|
| provider | openai, claude, together, openrouter, tiered | LLM metrics |
| method | complete, embed (LLM); sendMessage, sendMediaGroup, etc. (Telegram) | LLM, Telegram metrics |
| status | ok, error | All counter metrics |
| project | Project name string | Webhook, Agent, CRM metrics |
| stage | general (Stage 1), service_specific (Stage 2), final (Stage 3) | Agent metrics |
| action | follow_up, lead, skip | Timer metrics |
Configuration Reference
Configuration is loaded from a JSON file (--config config.json), with environment variables as a fallback. The JSON config is recommended for production.
Annotated config.example.json (the // comments are annotations for this documentation — strict JSON parsers reject them, so remove them before use)
{
// PostgreSQL connection string
"database_url": "postgresql://user:pass@localhost/aichat",
// Qdrant vector database
"qdrant": {
"url": "http://localhost:6333",
"api_key": ""
},
// Public HTTP server (webhooks)
"listen_addr": ":8080",
// Internal HTTP server (health, metrics, register-project)
"internal_addr": "localhost:9090",
// Base URL for webhook registration (must be HTTPS in production)
"webhook_url": "https://bot.example.com",
// LLM providers: any OpenAI-compatible API
"providers": {
"openrouter": {
"keys": ["sk-or-v1-your-key"],
"base_url": "https://openrouter.ai/api/v1"
},
"together": {
"keys": ["your-together-key"],
"base_url": "https://api.together.xyz/v1"
}
},
// Model tiers: "provider/model" format (split on first /)
"models": {
"chat": "openrouter/qwen/qwen3-235b-a22b-2507", // customer-facing
"extract": "openrouter/mistralai/mistral-small-3.1-24b-instruct", // lightweight
"embedding": "openrouter/qwen/qwen3-embedding-8b", // vectors
"reranker": "openrouter/cohere/rerank-4-fast" // cross-encoder
},
// Crawler-specific settings
"crawler": {
"chunk_size": 1500, // chars per chunk (simple mode)
"workers": 4, // parallel crawl workers
"no_llm_chunk": false, // true = paragraph-based, false = LLM semantic
"embedding_dim": 1024 // Matryoshka truncation dimension
},
"debug": false
}
Environment variable fallback
| Variable | Required | Default | Description |
|---|---|---|---|
| DATABASE_URL | Yes | — | PostgreSQL connection string |
| OPENAI_API_KEYS | At least one | — | Comma-separated OpenAI API keys |
| CLAUDE_API_KEYS | At least one | — | Comma-separated Claude API keys |
| QDRANT_URL | No | http://localhost:6333 | Qdrant REST API URL |
| QDRANT_API_KEY | No | — | Qdrant authentication key |
| LISTEN_ADDR | No | :8080 | Public HTTP server address |
| INTERNAL_ADDR | No | localhost:9090 | Internal HTTP server address |
| DEBUG | No | — | Enable debug logging (JSON) |
| AGENT_TRACE | No | — | Set to 1 for per-dialog trace files |
Tier routing explained
Model identifiers use "provider/model" format. The provider name is split on the first / character and matched against the providers map.
| Tier | Config Key | Purpose | Notes |
|---|---|---|---|
| chat | models.chat | Customer-facing stage replies | Larger, higher-quality model |
| extract | models.extract | Extraction, classification, summaries | Smaller, faster model. Falls back to chat if not configured. |
| embedding | models.embedding | Vector embeddings for search | Batch API, Matryoshka truncation to configured dim |
| reranker | models.reranker | Cross-encoder result re-scoring | Cohere API, optional |
Extract model constraint: Must return content in choices[].message.content field. Thinking/reasoning models that use a reasoning field will not work for the extract tier.
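The split-on-first-slash rule matters because model names themselves often contain slashes. A minimal sketch of the documented behavior (splitModelID is a hypothetical helper, not the actual internal function):

```go
package main

import (
	"fmt"
	"strings"
)

// splitModelID splits a "provider/model" identifier on the FIRST slash,
// so model names that contain slashes survive intact.
func splitModelID(id string) (provider, model string) {
	parts := strings.SplitN(id, "/", 2)
	if len(parts) < 2 {
		return parts[0], ""
	}
	return parts[0], parts[1]
}

func main() {
	p, m := splitModelID("openrouter/qwen/qwen3-235b-a22b-2507")
	fmt.Println(p) // openrouter
	fmt.Println(m) // qwen/qwen3-235b-a22b-2507
}
```

Only the leading "openrouter" is matched against the providers map; everything after the first slash is passed through as the model name.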
Key Thresholds Reference
All configurable and hardcoded constants that control system behavior.
Agent Thresholds
| Constant | Value | Purpose | Defined In |
|---|---|---|---|
| urlScoreThreshold | 3.0 | Minimum accumulated URL score to auto-select service | internal/agent/stages.go |
| minSearchScore | 0.5 | Minimum DBSF fusion score to include a result | internal/agent/stages.go |
| maxContextResults | 5 | Max search results sent to LLM context | internal/agent/stages.go |
| Search limit (with reranker) | 5 | Results per query variation when reranker is available | internal/agent/stages.go |
| Search limit (without reranker) | 2 | Results per query variation without reranker | internal/agent/stages.go |
| First-turn guard | 2 exchanges | URL determination skipped until at least 2 exchanges | internal/agent/stages.go |
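To illustrate how the URL score threshold drives the Stage 1 → Stage 2 transition, here is a minimal sketch — lockURL and the score map are hypothetical, not the actual internal/agent/stages.go code:

```go
package main

import "fmt"

const urlScoreThreshold = 3.0

// lockURL scans per-URL scores accumulated across turns and reports
// whether the best-scoring service URL has crossed the threshold,
// i.e. whether the agent should "lock on" and enter Stage 2.
func lockURL(scores map[string]float64) (string, bool) {
	best, bestScore := "", 0.0
	for url, s := range scores {
		if s > bestScore {
			best, bestScore = url, s
		}
	}
	return best, bestScore >= urlScoreThreshold
}

func main() {
	scores := map[string]float64{
		"/services/seo":     1.5,
		"/services/web-dev": 3.5, // crossed the 3.0 threshold
	}
	url, locked := lockURL(scores)
	fmt.Println(url, locked) // /services/web-dev true
}
```

Because the stage is derived from the locked URL rather than stored separately, there is no stage field that can drift out of sync with the score state.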
Timer Intervals
| Step | Delay | Behavior |
|---|---|---|
| 1 | 5 minutes | Light ping, no pressure |
| 2 | 15 minutes | Suggest specific value, nudge toward next step |
| 3 | 40 minutes | Offer a live specialist |
| 4 | 24 hours | Gentle reminder that help is available |
Crawler Constants
| Constant | Value | Purpose | Defined In |
|---|---|---|---|
| defaultEmbeddingDim | 1536 | Default embedding dimension (text-embedding-3-small) | internal/crawler/indexer.go |
| upsertBatch | 50 | Points per Qdrant upsert batch | internal/crawler/indexer.go |
| Default chunk size | 1500 chars | Simple chunking mode character limit | internal/crawler/chunker.go |
| Embed batch size | 32 chunks | Chunks per embedding API call | internal/crawler/pipeline.go |
| Default workers | 3 | Parallel crawl workers | cmd/crawler/main.go |
LLM Temperatures
| Temperature | Used For |
|---|---|
| 0.1 | Contact extraction, city extraction, finish detection, semantic chunking, price extraction |
| 0.2 | Lead summary (agent + timer) |
| 0.3 | Stage 1/2/3 replies, query rewrite, timer classification |
| 0.5 | Timer follow-up generation |
| 0.7 | URL determination (higher creativity for ambiguous cases) |
LLM Provider
| Constant | Value | Purpose |
|---|---|---|
| Max backoff | 60 seconds | Cap on exponential backoff for 429 retries |
| Backoff formula | 2^attempt + 10% jitter | Exponential with jitter, respects Retry-After header |
| Retries per chain | 3 | Max 429 retries before falling through to next provider |
Telegram
| Constant | Value | Purpose |
|---|---|---|
| Max message length | 4096 chars | Telegram API limit; messages auto-chunked at this boundary |
| Retry on 429 | Yes | Respects Retry-After header from Telegram API |
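Auto-chunking at the 4096-character boundary can be sketched like this — a naive byte-wise illustration; a production version would split on word or message boundaries and count characters the way the Telegram API does:

```go
package main

import "fmt"

const maxTelegramLen = 4096

// chunkMessage splits text into pieces no longer than the Telegram limit.
func chunkMessage(text string) []string {
	var chunks []string
	for len(text) > maxTelegramLen {
		chunks = append(chunks, text[:maxTelegramLen])
		text = text[maxTelegramLen:]
	}
	if text != "" {
		chunks = append(chunks, text)
	}
	return chunks
}

func main() {
	long := make([]byte, 10000)
	for i := range long {
		long[i] = 'a'
	}
	chunks := chunkMessage(string(long))
	fmt.Println(len(chunks), len(chunks[0]), len(chunks[2])) // 3 4096 1808
}
```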
Performance Reference
Webhook latency includes LLM generation + search. Search latency covers Qdrant DBSF fusion. Storage estimates: ~1KB per conversation turn, ~100KB per 10K chunks, ~8KB per dense vector (1024 dims).
Build and test commands
# Build all three binaries
make build # produces bin/aichat-bot, bin/aichat-crawler, bin/aichat-cli
# Run tests
make test # go test ./...
# Lint
make lint # go vet ./...
# Local development services (PostgreSQL + Qdrant via Docker)
make dev-services
# Run the CLI interface for testing (no Telegram needed)
bin/aichat-cli --config config.dev.json --project myproject
Built with Go, Qdrant, PostgreSQL, and OpenRouter
This documentation is generated from source code and docs/*.md files.