aichat-go

Multi-tenant AI Sales Bot Platform

3-Stage Conversation Engine
Hybrid Vector + BM25 Search
Cross-Encoder Reranking
Multi-Tenant Isolation
Automated Follow-Ups
Go · Qdrant · PostgreSQL · OpenRouter · NixOS

3-Stage Sales Funnel

General consultation → service-specific Q&A → lead capture. Automatic stage transitions driven by URL score accumulation and LLM classification.


Hybrid Search

Dense embeddings + BM25 sparse vectors with Qdrant DBSF fusion. Optional Cohere cross-encoder reranking and neighbor expansion for richer context.

Tiered LLM Routing

Chat tier for customer-facing replies, extract tier for lightweight classification. Any OpenAI-compatible API supported. Multi-key pool with round-robin rotation.


Smart Follow-Ups

4-step timer sequence (5m / 15m / 40m / 24h) with LLM-driven client classification. Automatically detects cold, hot, and finished conversations.


Multi-Tenant

Full project isolation with composite keys. Each tenant gets its own bot, collections, prompts, and lead group while sharing infrastructure.


Incremental Crawler

4-phase pipeline (fetch / chunk / embed / upload) with per-URL resume. Supports HTML, PDF, DOCX, XLSX. Semantic chunking with LLM topic labels.

Core Concepts

Project
A tenant — one business with its own bot, knowledge base, prompts, and CRM target. All data paths use (chat_id, project_id) composite keys.
Stage
Position in the 3-stage sales funnel. Derived from Finished and DeterminedURL fields — no separate stage field to get out of sync.
Knowledge Base
Per-project Qdrant collections containing chunked website content with both dense (embedding) and sparse (BM25) vectors.
URL
A service page URL that identifies which specific service the user is asking about. The agent "locks on" to a URL to transition from general to service-specific consultation.
Lead
The output: contact info + conversation summary + chat history + files, sent to the CRM lead group.
Timer
Automated follow-up sequence that fires when a user goes silent. Classifies engagement via LLM and either sends a nudge or dispatches the lead.
Tier
LLM routing category: chat (customer-facing), extract (lightweight), embedding (vectors), reranker (cross-encoder).

Architecture Overview

End-to-end data flow from Telegram webhook to CRM lead dispatch, with hybrid search and tiered LLM routing.

Telegram (webhook POST)
  → normalize → Gateway (TelegramUpdate → Message)
  → Agent (3-stage state machine)
  → search → Hybrid Search: Qdrant (dense + sparse) + PostgreSQL (BM25 index)
  → generate → Tiered LLM (chat / extract) → OpenRouter (any provider)
  → dispatch → CRM output: lead group → Telegram (group messages)
Key design decisions: State is derived (not stored) from Finished and DeterminedURL fields. Metrics are applied via decoration (zero Prometheus imports in core packages). All data paths use composite keys (chat_id, project_id) for tenant isolation.
Component details
Component | Location | Purpose
Gateway | internal/gateway/ | Normalizes TelegramUpdate into domain.Message (text, caption, files, sender)
Agent | internal/agent/ | 3-stage conversation state machine with command handling
Tiered LLM | internal/llm/ | Routes chat/extract/embedding calls to different provider+model combos
Hybrid Search | internal/search/ | Dense embeddings + BM25 sparse vectors with DBSF fusion and cross-encoder reranking
CRM | internal/crm/ | Dispatches leads to Telegram group (summary + history + media)
Timer | internal/timer/ | Automated follow-up sequences with LLM-driven classification
Crawler | internal/crawler/ | 4-phase pipeline: fetch, chunk, embed, upload to Qdrant
Metrics | internal/metrics/ | Prometheus wrappers (zero-coupled with core packages)
Admin | internal/admin/ | Admin group command handler for cross-project management
i18n | internal/i18n/ | Localization (ru/en) with compile-time embedded locale files
Request lifecycle (webhook to response)

Every incoming Telegram message follows this path:

  1. Webhook reception — POST /webhook/{tg_api_key} is received on the public HTTP server (:8080). The API key in the URL routes to the correct project bundle.
  2. Gateway normalization — TelegramUpdate is converted to domain.Message with text, caption, files, and sender info extracted.
  3. State loading — Chat state loaded from PostgreSQL using composite key (chat_id, project_id).
  4. Command check — /start resets state and returns the greeting. /debug toggles debug mode.
  5. File collection — Any file attachments are accumulated into state. Files without text get an immediate acknowledgement without LLM calls.
  6. Stage dispatch — Based on derived state (CurrentStage()), the message is routed to the appropriate stage handler.
  7. State persist — Updated state written back to PostgreSQL.
  8. Timer reset — Follow-up timer reset to step 0 for this chat.
Project structure tree
cmd/
  bot/         # Main bot binary
    main.go     # Wiring: DB, tiered LLM, per-project bundles, HTTP server
    config.go   # JSON config loading with env var fallback
  crawler/     # Knowledge base indexer CLI
    main.go     # Subcommands: init-project, seed, add-urls, register
  cli/         # Local readline chat interface (testing)
    main.go     # Same agent pipeline, file-based CRM output

internal/
  domain/      types.go        # Stage, Message, ChatState, Lead, SearchResult
  agent/       agent.go        # HandleMessage dispatch
                stages.go       # Stage 1/1.5/2/3 handlers
                commands.go     # /start, /debug
                lead_commands.go # Lead group: /prompt, /stats
  llm/         tiered.go       # TieredClient: routes by tier
                chain.go        # Provider chain with retry + fallback
                openai.go       # OpenAI-compatible provider
                claude.go       # Claude/Anthropic provider
                embedding_adapter.go # Model-specific text formatting
  search/      hybrid.go       # Qdrant DBSF + reranking + neighbors
                reranker.go     # Cohere cross-encoder client
                bm25.go         # BM25Encoder, Tokenize
  db/          postgres.go     # PostgreSQL implementation
  gateway/     telegram.go     # TelegramUpdate normalization
  telegram/    client.go       # HTTP client, retry on 429
  crm/         telegram.go     # TelegramCRM: summary + history + media
  timer/       timer.go        # Scheduler, sequences, LLM classification
  metrics/     metrics.go      # Metric definitions
                llm.go          # InstrumentedLLMClient wrapper
  crawler/     pipeline.go     # Resumable 4-phase pipeline
                chunker.go      # Semantic + simple chunking
                indexer.go      # Qdrant upsert
  i18n/        i18n.go         # T(), Tf(), embedded locales
                locales/    ru.json, en.json

migrations/  001_initial.sql, 002_bm25_index.sql, 003_timers.sql
nix/         module.nix, example-configuration.nix
flake.nix    Makefile
Extension points
New LLM Provider — implement the Provider interface (internal/llm/)

New CRM Backend — implement the CRM interface via the multi-CRM dispatcher (internal/crm/)

New Gateway — normalize to domain.Message and add a webhook route (internal/gateway/)

Custom Search — implement the KnowledgeBase interface (internal/search/)

Conversation Flow

The agent implements a 3-stage sales funnel state machine. Every incoming message is dispatched to a stage handler based on derived state.

STAGE 1: General Consultation

Identify which service the user is asking about through multi-turn conversation.

  • Query rewrite — LLM generates 3 search variations (extract tier, temp 0.3)
  • Hybrid search — per variation, limit=5 (with reranker) or 2 (without), minScore=0.5
  • DBSF fusion — Qdrant merges dense + sparse results preserving absolute relevance
  • Reranking — Cohere cross-encoder re-scores candidates (if configured)
  • Neighbor expansion — adjacent chunks N-1/N+1 fetched for context
  • LLM reply — prompt1 + history + top 5 search results (chat tier, temp 0.3)
  • URL accumulation — each result's URL gets += score across turns
Transition: URL auto-selected when any URL score ≥ 3.0 (skipped on first turn)
STAGE 1.5: URL Determination

When no URL has crossed the auto-select threshold, the LLM decides.

  • Auto-select — if any URL score ≥ 3.0, lock immediately
  • LLM determination — extract tier (temp 0.7), examines history + search results + URL scores
  • Output: URL: <url> → lock and transition to Stage 2
  • Output: QUESTION: <text> → send clarifying question, stay in Stage 1
Guard: Determination is skipped until at least 2 exchanges have occurred
STAGE 2: Service-Specific Consultation

Deep Q&A about the locked service with parallel contact extraction.

  • Full source page — fetched via GetByURL from source collection
  • Price extraction — one-time LLM call on Stage 2 entry, cached in PriceSummary (extract, temp 0.1)
  • LLM reply — prompt2 + source page + cached prices + search results + history (chat tier, temp 0.3)
  • Contact extraction — 3 parallel LLM calls (extract tier, temp 0.1):
    • Contact method + value (METHOD: x / VALUE: y or NONE)
    • City name (or NONE)
    • Finish decision (YES or NO)
  • Contact sharing — Telegram request_contact button sent on first Stage 2 interaction
Transition: finish=YES from extraction LLM → Stage 3
STAGE 3: Lead Generation & Dispatch

Package conversation into a lead and dispatch to CRM.

  • Set Finished=true — persisted to DB immediately (two-phase persist)
  • Lead summary — LLM generates summary with name, city, service, contact (extract, temp 0.2)
  • Closing reply — farewell message to user (chat tier, temp 0.3)
  • CRM dispatch — forward message + summary + history document + media files
  • Set LeadSent=true — persisted to DB (prevents duplicate sends)
After: Subsequent messages get a "your request has been forwarded" acknowledgement
Two-phase lead persist details

Finished=true is written to the database before the CRM send, and LeadSent=true is written after. This prevents duplicate leads from concurrent webhook requests or timer fires reading stale state. If the bot crashes between the two persists, the timer system retries the CRM send on next fire or on restart.

Finished=true → persist to DB → CRM send → LeadSent=true → persist to DB

Crash between the two persists → the timer retries the CRM send on next fire or restart

File upload handling

When a user sends a file without text, the bot immediately acknowledges it ("Accepted! Let me know when you're done.") without calling the LLM. Files are accumulated in the chat state and attached to the lead when it is generated.

Media files in the lead are grouped by type for clean presentation:

  • Photos + Videos — sent as a media album (Telegram groups them visually)
  • Documents — sent as a document album
  • Voice / Video notes — sent individually (Telegram does not support albums for these)
Contact sharing button

On the first Stage 2 interaction, the bot sends a Telegram request_contact keyboard button alongside its reply. This lets users share their phone number with one tap.

  • Sent only once per conversation (tracked via state.UserSettings.ContactRequested)
  • Shared contact is stored in chat state (not in history) for privacy
  • Contact data is included in the lead summary and extraction prompts
  • When a user says "take my phone from Telegram", the LLM can use the actual shared contact data
Soft checklist approach

The bot gently reminds the user to share their city and phone number, but does not block lead generation. If the user provides enough information for a meaningful lead, finish=true is allowed even without all fields. This avoids frustrating users who prefer not to share certain information while still maximizing lead quality.

Search Pipeline

Hybrid search combining dense embeddings and BM25 sparse vectors with DBSF fusion, cross-encoder reranking, and neighbor expansion.

1. Query Rewrite — LLM generates 3 search variations from the user's question (extract tier, temp 0.3). Improves recall by capturing different phrasings that may match different KB chunks.

2. Hybrid Search — Each variation is searched using Qdrant's prefetch with both dense embeddings (cosine similarity, 1024 dims) and BM25 sparse vectors. Qdrant's built-in DBSF (Distribution-Based Score Fusion) merges results, preserving the absolute relevance signal, unlike rank-based RRF. Each prefetch retrieves limit * 2 candidates for the fusion algorithm.

3. Deduplication — Results from all 3 query variations are merged. Duplicates (by text content) are removed, keeping the highest-scoring instance.

4. Reranking — The Cohere cross-encoder (configured via models.reranker) re-scores all candidates. The search over-retrieves 5 results per query variation (3x the final limit) to give the reranker more material. All results contribute to URL scoring before the context cap.

5. Context Cap — The top 5 results are kept for the LLM context window. URL scores are accumulated from all results (before and after the cap) into state.URLsCounter.

6. Neighbor Expansion — For each result, adjacent chunks (positions N-1 and N+1 from the same URL) are fetched via Qdrant scroll and concatenated in position order. This is a cheap metadata-only query; no vector computation is needed.

Visual Search Flow

User question
  → Query rewrite (3 variations)
  → Dense embeddings (cosine, 1024 dims) + BM25 sparse (keyword matching)
  → DBSF fusion (Distribution-Based Score Fusion, preserves absolute relevance)
  → Deduplication (by text content)
  → Cross-encoder reranking (Cohere rerank, optional)
  → Top 5 results + neighbor expansion (pos N-1 / N+1)
  → LLM context window
DBSF vs RRF: DBSF normalizes each scorer's output based on its actual distribution, so a high cosine-similarity result retains its absolute relevance signal. RRF (Reciprocal Rank Fusion) is purely rank-based and loses this information.
3 query variations · 1024 embedding dims · 5 max context results · 0.5 min score
Qdrant collections per project
Collection | Vectors | Purpose
{project}_content_hybrid | dense + sparse | Chunked content with topic-prepended embeddings. Searchable target for hybrid queries.
{project}_source | dense only | Full page content with real embeddings. Used in Stage 2 for determined-URL context and page-level matching.
BM25 implementation details
  • Standard BM25 formula: IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl/avgDL))
  • Parameters: k1=1.5, b=0.75
  • Tokenizer: Unicode-aware (Cyrillic + Latin), lowercased, split on non-letter/digit
  • Persistence: vocabulary, IDF, avgDL, numDocs stored as JSONB in PostgreSQL (bm25_indexes table)
  • In-memory cache: loaded from DB on first use per project, invalidated on crawler updates

LLM Calls Map

All 14 LLM call sites in the system, organized by tier and purpose.

Tiers

  • Chat — customer-facing replies
  • Extract — lightweight extraction
  • Reranker — Cohere cross-encoder
  • Embedding — vector generation
# | Purpose | Tier | Temp | Location
1 | Stage 1 reply — general consultation with KB search context | chat | 0.3 | stages.go
2 | Stage 2 reply — service-specific with full source page + prices | chat | 0.3 | stages.go
3 | Stage 3 closing — farewell after lead capture | chat | 0.3 | stages.go
4 | Query rewrite — 3 search variations | extract | 0.3 | stages.go
5 | URL determination — pick service or ask clarification | extract | 0.7 | stages.go
6 | Contact extraction — method + value | extract | 0.1 | stages.go
7 | City extraction | extract | 0.1 | stages.go
8 | Finish detection — YES/NO | extract | 0.1 | stages.go
9 | Lead summary — for CRM dispatch | extract | 0.2 | stages.go
10 | Timer classification — COLD/HOT/FINISHED | extract | 0.3 | timer.go
11 | Timer follow-up — contextual reminder (conditional) | extract | 0.5 | timer.go
12 | Timer lead summary — for timer-initiated CRM dispatch | extract | 0.2 | timer.go
13 | Semantic chunking — boundary detection per page | extract | 0.1 | chunker.go
14 | Price extraction — one-time on Stage 2 entry, cached | extract | 0.1 | stages.go
Call flow diagram
// User message flow
User message
   → #4 rewriteQuery (extract) → search variations
   → hybrid search (DBSF fusion, dense + BM25)
   → reranker (Cohere cross-encoder, if configured)
   → top 5 results → neighbor expansion (position N-1/N+1)
   → #5 determineURL (extract) → pick service or clarify
   → if entering Stage 2: #14 price extraction (extract, one-time, cached)
   → #1 or #2 stage reply (chat) → answer to user
   → #6 + #7 + #8 extractContact (3x extract, parallel)
   → if finished:
        #9 generateSummary (extract) → CRM summary
        #3 Stage 3 closing (chat) → goodbye

// Timer flow
Timer fires
   → #10 classifyStatus (extract) → cold/hot/finished
   → #11 generateFollowUp (extract, conditional)
   → if finished: #12 summary (extract) → CRM lead

// Crawler flow
Crawler
   → #13 semantic boundary detection per page (extract)
   → slice original text at detected boundaries
   → topic prepend at embed time ("Topic: X\n\n...")
   → batch embedding (32 chunks per API call)
Provider routing details

The TieredClient routes Complete calls based on req.Tier:

  • TierChat → chat chain (customer-facing models)
  • TierExtract → extract chain (lightweight models). Falls back to chat chain if not configured.
  • Embed → embedding chain (wrapped in AdaptedProvider for model-specific text formatting)

Each chain has independent retry/fallback. Backoff formula: 2^attempt + 10% jitter, capped at 60s. Key pool rotates API keys round-robin.

Embedding Adapters

AdaptedProvider wraps the embedding provider and formats text based on mode (document vs query). Adapter selected automatically by model name. E5-instruct models get Instruct: ...\nQuery: ... prefix for queries. Token limit auto-split: when a chunk exceeds the model's token limit, text is split by sentence, each half embedded recursively, and vectors averaged + L2-normalized.

Crawler Pipeline

4-phase incremental pipeline that fetches, chunks, embeds, and uploads content to Qdrant. Intermediate results are saved per-URL for resume.

1. Fetch Pages — HTTP GET with parallel workers (default: 3). Supports HTML, PDF, DOCX, XLSX via auto-detection. Each page is saved to 1_pages/{urlhash}.json. Skip logic: pages with existing files are skipped on re-run.

2. Semantic Chunk — LLM boundary detection outputs TOPIC: X | STARTS: Y anchor phrases per page (extract tier, temp 0.1). The original text is sliced at the detected boundaries, and each chunk gets a topic label. Saved to 2_chunks/{urlhash}.json. Falls back to paragraph-based splitting with --no-llm-chunk.

3. Batch Embed — 32 chunks per API call. The topic is prepended before embedding: "Topic: X\n\n{text}". BM25 sparse vectors are rebuilt from all chunks (IDF requires the full corpus). Saved to 3_embeds/{urlhash}.json + bm25.json. Token-limit auto-split handles oversized chunks.

4. Upload to Qdrant — Qdrant upsert in batches of 50 points. Source pages are uploaded with real embeddings to {name}_source; chunked content goes to {name}_content_hybrid. The BM25 index is persisted to PostgreSQL. Pipeline data is archived to project-archives/{name}.tar.gz.

Incremental behavior: Each step checks for existing per-URL files before processing. Re-running the pipeline only processes missing content. Delete a step's folder to force re-processing from that point. File key: sha256(url)[:16].
Storage layout
new-projects/{project-name}/
  1_pages/{urlhash}.json    # single Page per URL
  2_chunks/{urlhash}.json   # []Chunk per URL
  3_embeds/{urlhash}.json   # []EmbeddedChunk per URL
  3_embeds/bm25.json        # BM25 encoder snapshot
Document type support
Format | Library | Notes
HTML | go-readability v2 | Firefox Reader View algorithm. Fallback: DOM walk (main → article → body, stripping nav/header/footer/script)
PDF | ledongthuc/pdf | Text extraction. Scanned PDFs with no text layer are skipped.
DOCX | fumiama/go-docx | Paragraph and table text extraction
XLSX | xuri/excelize | All sheets as tab-separated text
Semantic chunking vs simple chunking

Semantic Chunking (default)

One LLM call per page (extract tier, temp 0.1) detects natural topic boundaries. The LLM outputs anchor phrases in the format TOPIC: X | STARTS: Y. The original text is sliced at the detected boundaries, preserving the exact source text without LLM paraphrasing. Each chunk receives a topic label for embedding.

Simple Chunking (--no-llm-chunk)

Paragraph and heading-aware splitting. Text is split on double newlines and markdown headings. Short paragraphs are merged up to maxChars (default 1500). Chunks get numbered topic labels ("Part 1", "Part 2", etc.).

Topic prepending strategy

At index time, each chunk's text is prepended with its topic label before embedding:

Before: "Our basic plan starts at $99/month with unlimited support."
After:  "Topic: Pricing plans\n\nOur basic plan starts at $99/month with unlimited support."

The topic acts as a semantic anchor — the embedding now captures the chunk's theme, not just its surface content. A query about "costs" will match closer to a chunk anchored with "Pricing plans" even if the chunk text never mentions "costs".

Search queries do not need the topic prefix — the embedding space naturally aligns.

CLI usage examples
# Create a new project and seed immediately
bin/aichat-crawler init-project \
  --name myproject \
  --tg-api-key "123456:ABC-DEF" \
  --tg-lead-group -1001234567890 \
  --language en \
  --prompt1 "You are a sales consultant..." \
  --prompt2 "You are helping a customer..." \
  --start-reply "Welcome! How can I help?" \
  --sitemap-url "https://example.com/sitemap.xml"

# Seed a project later (separate from creation)
bin/aichat-crawler seed \
  --project-id "uuid-from-init-output" \
  --project-name myproject \
  --sitemap-url "https://example.com/sitemap.xml"

# Add URLs to an existing project
bin/aichat-crawler add-urls \
  --project-name myproject \
  --urls "https://example.com/new-page1,https://example.com/new-page2"

# Set prompts from files
bin/aichat-crawler set \
  --project-name myproject prompt1 @prompts/prompt1.txt

Timer System

Automated follow-up sequences that fire when a user goes silent. LLM-driven classification decides the action at each step.

Follow-Up Timeline

1. 5 min — light ping
2. 15 min — suggest value
3. 40 min — live specialist
4. 24 hours — final reminder

Per-Fire Logic

Timer fires → classify client (LLM: COLD / HOT / FINISHED)
  • FINISHED → send lead to CRM
  • COLD → generate follow-up
  • HOT → skip (or follow-up at the penultimate step)
Lifecycle details
  1. Reset — Every incoming message resets the timer for that chat, starting the sequence from step 0.
  2. Fire — Fetches fresh chat state and history. Skips if already finished. Classifies the client via LLM, then acts based on classification.
  3. Persist — Timer state is written to PostgreSQL with the next trigger time. On restart, Reload() recovers all active timers, calculating remaining delay and firing overdue ones immediately.
  4. Lead retry — If a timer fires and finds Finished=true but LeadSent=false, it retries the CRM send instead of classifying.
  5. Cancel — When a user sends a new message, the old timer is cancelled (goroutine killed + DB record updated).

Goroutine safety: Each chat/project pair gets one goroutine. A current == self identity check prevents a superseded goroutine from cleaning up a newer timer's state.

Token savings: Two-step approach (classify first, generate conditionally) skips follow-up generation for finished clients and hot clients who don't need a message yet.

Multi-Tenancy

Every data path uses composite keys (chat_id, project_id) for tenant isolation. Each project gets its own resources while sharing infrastructure.

Per Project (Isolated)

Telegram bot token
Qdrant collections
System prompts (JSONB)
Lead group (CRM target)
BM25 sparse encoder
Agent instance
Language locale

Shared (Infrastructure)

PostgreSQL instance
Qdrant instance
LLM provider pool
Timer scheduler
HTTP servers
Metrics endpoint
Data isolation details
Data | Table/Collection | Key
Chat state | chat_states | PRIMARY KEY (chat_id, project_id)
Chat history | chat_history | WHERE chat_id AND project_id
Timers | timers | PRIMARY KEY (chat_id, project_id)
BM25 index | bm25_indexes | PRIMARY KEY (project_id)
Search vectors | Qdrant | {project_name}_content_hybrid, {project_name}_source
Webhook routing | URL path | POST /webhook/{tg_api_key}
Runtime registration

Project bundles (Telegram client, agent, CRM, KB) live in an in-memory map. New projects can be registered at runtime via two paths:

  • Admin group — the /project_init command creates the project, seeds the KB, and registers the bundle instantly. No restart needed.
  • Internal API — POST /api/register-project/{id} for CLI or external tools that seed the DB independently.
Project onboarding flow (Telegram)

Step 1: Create a bot in BotFather, copy the token.

Step 2: Send one command in the admin project's lead group:

/project_init myproject sitemap https://example.com/sitemap.xml 123456:AAH...token

The system verifies the token, creates the project, seeds the KB, and replies with progress:

> Project "myproject" created (bot: @myproject_bot). Seeding started...
> Scraping completed: 42 pages
> Chunking completed: 156 chunks
> Embeddings completed: 156 vectors
> Upload completed: 42 pages, 156 chunks indexed
> Project "myproject" init finished!
> Test the bot: https://t.me/myproject_bot
> To create the CRM group, click:
> https://t.me/myproject_bot?startgroup=connect_myproject

Step 3: Click the deep link. Telegram opens the "create group" UI with the bot pre-added. The bot auto-detects the group and connects it as the lead group. No group ID needed — discovered automatically via the /start connect_{name} command.

Admin commands reference
Command | Description
/project_init <name> sitemap <url> <token> | Create project, crawl sitemap, seed KB
/project_init <name> urls <url1,url2> <token> | Create project from individual URLs
/project_prompt1 <name> <text> | Update Stage 1 system prompt
/project_prompt2 <name> <text> | Update Stage 2 system prompt
/project_prompt3 <name> <text> | Update Stage 3 (finish) prompt
/project_price_url <name> <url> | Set price URL (KB chunks injected into Stage 1)
/project_stats <name> [period] | Show chat statistics
/project_info <name> | Show project details
/project_add_urls <name> <urls> | Add URLs to KB
/project_delete_urls <name> <urls> | Delete URL chunks from KB
/project_delete <name> | Soft-delete project
/project_restore <name> | Restore soft-deleted project

Lead group commands (per project)

Command | Description
/help | Show all available commands
/prompt | Show current prompt1, prompt2, and prompt3
/prompt1 [text] | Show or update the Stage 1 system prompt
/prompt2 [text] | Show or update the Stage 2 system prompt
/stats [period] | Show chat statistics (e.g. 1 week, 3 days)
/info | Show project details

User chat commands

Command | Description
/start | Reset conversation state and return the project's greeting
/debug | Toggle debug mode (appends trace info to bot replies)
Database schema

Three migrations in migrations/:

001_initial.sql

CREATE TABLE projects (
  id          UUID PRIMARY KEY,
  name        TEXT UNIQUE,
  tg_lead_group BIGINT,
  tg_api_key  TEXT UNIQUE,
  prompts     JSONB,
  language    TEXT DEFAULT 'ru'
);

CREATE TABLE chat_history (
  id          UUID PRIMARY KEY,
  chat_id     BIGINT,
  project_id  UUID REFERENCES projects(id),
  question    TEXT,
  reply       TEXT,
  dbinfo      JSONB,
  created_at  TIMESTAMPTZ
);

CREATE TABLE chat_states (
  chat_id     BIGINT,
  project_id  UUID REFERENCES projects(id),
  state       JSONB,
  PRIMARY KEY (chat_id, project_id)
);

002_bm25_index.sql

CREATE TABLE bm25_indexes (
  project_id  UUID PRIMARY KEY REFERENCES projects(id),
  vocab       JSONB,
  idf         JSONB,
  avg_dl      FLOAT,
  num_docs    INT,
  updated_at  TIMESTAMPTZ
);

003_timers.sql

CREATE TABLE timers (
  chat_id     BIGINT,
  project_id  UUID REFERENCES projects(id),
  step        INT,
  trigger_at  TIMESTAMPTZ,
  PRIMARY KEY (chat_id, project_id)
);

Key indexes

  • idx_chat_history_chat_id on (chat_id, created_at) — history lookup
  • idx_chat_states_project on (project_id) — project-level queries
  • idx_chat_states_state_gin GIN on (state) — JSONB queries on state

CRM Lead Output

When a lead is dispatched, the following is sent to the project's Telegram lead group.

1. Forward last message — the user's last message is forwarded so managers can click the profile to start a direct conversation.
2. Summary message — LLM-generated summary containing: lead ID, name, city, service, contact method, shared contact (if available), client status (finished/hot/cold), file count.
3. Chat history document — full timestamped Q/A transcript uploaded as lead{chatID}.txt, sent as a reply to the summary message.
4. Media files — grouped by type: photos+videos as an album, documents as an album, voice/video notes individually. All sent as replies to the summary.
Contact persistence: Contacts are extracted once and reused across turns. Shared contacts from the Telegram button are stored in chat state (not in history) and included in the lead summary when dispatched.
CRM interface design
type CRM interface {
    SendLead(ctx context.Context, lead domain.Lead) (string, error)
    Name() string
}

Current implementation: TelegramCRM. The interface is designed for future CRM backends (Bitrix, AmoCRM) via the MultiCRM dispatcher pattern.

Deployment

NixOS-native deployment with systemd service hardening, sops-nix secrets, and Cloudflare Tunnel.

Service Dependency Chain

postgresql → aichat-migrate → aichat-bot
qdrant → aichat-bot (also a dependency)

Security Hardening

Setting | Value | Purpose
NoNewPrivileges | true | Prevent privilege escalation
ProtectSystem | strict | Read-only filesystem except allowed paths
ProtectHome | true | No access to /home
PrivateTmp | true | Isolated /tmp
MemoryMax | 512M | Memory limit (OOM protection)
NixOS module configuration
services.aichat = {
  enable = true;
  configFile = "/run/secrets/aichat-config.json";
  listenAddr = ":8080";

  # Database
  enablePostgres = true;
  dbName = "aichat";
  postgresPort = 5432;

  # Qdrant
  enableQdrant = true;
  qdrantPort = 6333;
  qdrantDataDir = "/var/lib/qdrant";

  # Backup
  enableBackup = true;
  backupDir = "/var/backup/aichat";
  backupRetentionDays = 14;
};
Backup & secrets

Backup

  • Daily systemd timer with RandomizedDelaySec=1h
  • pg_dump + Qdrant snapshots
  • 14-day retention with automatic cleanup

Secrets (sops-nix)

  • Encrypted in secrets/production.yaml using age keys
  • Decrypted at boot on the target machine using its SSH host key
  • Keys: aichat_config (full JSON config), cloudflared_creds (tunnel credentials)
Quick deploy commands
# Deploy code changes to production
make prod-deploy

# SSH tunnels for crawler access to prod DB + Qdrant
make prod-tunnel
# PostgreSQL on localhost:15432, Qdrant on localhost:16333

# Seed a project on prod (while tunnel is open)
./bin/aichat-crawler --config /tmp/config.prod-tunnel.json init-project \
  --name myproject --tg-api-key "TOKEN" --language ru \
  --no-llm-chunk --workers 1 --urls "URL1,URL2,..."

# Fetch trace files from prod
ssh root@server "cat /var/lib/aichat/traces/{chatID}/{dialogID}.log"
Webhook setup

After the bot is running and accessible via HTTPS, register the webhook with Telegram:

curl -X POST "https://api.telegram.org/bot{TG_API_KEY}/setWebhook" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-domain.com/webhook/{TG_API_KEY}"}'

The webhook URL must exactly match https://{host}/webhook/{tg_api_key} where tg_api_key is the bot token from the projects table. The API key in the URL routes incoming updates to the correct project bundle.

Manual / Docker deployment

Prerequisites: Go 1.22+, PostgreSQL 16, Qdrant

# Build all binaries
make build
# produces bin/aichat-bot, bin/aichat-crawler, bin/aichat-cli

# Run migrations
make migrate DATABASE_URL="postgres://user:pass@localhost/aichat"

# Run the bot with JSON config
bin/aichat-bot --config config.prod.json

# Or with environment variables
DATABASE_URL="postgres://user:pass@localhost/aichat" \
OPENAI_API_KEYS="sk-xxx" \
bin/aichat-bot
Cloudflare Tunnel setup

Production uses Cloudflare Tunnel to route HTTPS traffic to the bot without exposing ports. The tunnel routes aichat.example.com to localhost:8080 on the server.

  • Tunnel credentials managed via sops-nix (encrypted at rest)
  • Configured in nix/production.nix
  • No need for TLS certificates or reverse proxy configuration
  • Telegram webhooks point to the tunnel URL

Localization

User-facing strings and LLM prompt templates are localized via JSON locale files embedded at compile time.

Aspect | Details
Supported languages | ru (default), en
Locale files | internal/i18n/locales/ru.json, internal/i18n/locales/en.json
API | i18n.T(lang, key) for strings, i18n.Tf(lang, key, args...) for formatted strings
Fallback chain | Requested lang → "ru" → key itself
Coverage | 36+ keys: agent replies, LLM prompts, history formatting, timer messages, CRM lead formatting, chunker prompts
Per-project | Set in projects.language column, propagates to agent, CRM, and timer
Adding a locale | Create internal/i18n/locales/{code}.json with the same keys — auto-loaded via //go:embed
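The fallback chain can be sketched as follows. This is a minimal illustration, not the actual internal/i18n implementation: the map literal stands in for the embedded JSON locale files, and T mirrors the documented lookup order (requested lang, then "ru", then the key itself).

```go
package main

import "fmt"

// locales stands in for the JSON files embedded via //go:embed.
var locales = map[string]map[string]string{
	"ru": {"greeting": "Здравствуйте!"},
	"en": {"greeting": "Hello!"},
}

// T resolves a key using the documented fallback chain:
// requested language -> "ru" -> the key itself.
func T(lang, key string) string {
	if msgs, ok := locales[lang]; ok {
		if s, ok := msgs[key]; ok {
			return s
		}
	}
	if s, ok := locales["ru"][key]; ok {
		return s
	}
	return key
}

func main() {
	fmt.Println(T("en", "greeting")) // Hello!
	fmt.Println(T("de", "greeting")) // unknown lang, falls back to ru
	fmt.Println(T("en", "missing"))  // unknown key, falls back to the key itself
}
```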

Monitoring

Prometheus metrics exposed at GET /metrics on the internal server (localhost:9090 by default). Metrics are applied via decoration — core packages have zero Prometheus imports.
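The decoration approach can be sketched like this. Type and interface names are illustrative, and a plain counter field stands in for the Prometheus client; the point is that the core interface carries no Prometheus dependency, and instrumentation wraps it from the outside.

```go
package main

import "fmt"

// LLMClient is the interface core packages depend on.
// Note: no Prometheus imports anywhere near it.
type LLMClient interface {
	Complete(prompt string) (string, error)
}

// fakeClient is a stand-in implementation for the sketch.
type fakeClient struct{}

func (fakeClient) Complete(prompt string) (string, error) { return "ok", nil }

// instrumented decorates any LLMClient with metrics recording.
// A real decorator would increment aichat_llm_requests_total and
// observe the duration histogram; a counter stands in here.
type instrumented struct {
	inner LLMClient
	calls int
}

func (m *instrumented) Complete(prompt string) (string, error) {
	m.calls++ // e.g. requestsTotal.WithLabelValues(provider, "complete", status).Inc()
	return m.inner.Complete(prompt)
}

func main() {
	c := &instrumented{inner: fakeClient{}}
	c.Complete("hi")
	c.Complete("hi again")
	fmt.Println(c.calls) // 2
}
```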

LLM Metrics

Metric | Type | Labels | Description
aichat_llm_requests_total | counter | provider, method, status | Total LLM API calls
aichat_llm_request_duration_seconds | histogram | provider, method | LLM call latency (0.1s-51s buckets)

Telegram Metrics

Metric | Type | Labels | Description
aichat_telegram_requests_total | counter | method, status | Telegram API calls
aichat_telegram_request_duration_seconds | histogram | method | Telegram API latency

Webhook & Agent Metrics

Metric | Type | Labels | Description
aichat_webhook_requests_total | counter | project, status | Inbound webhook requests
aichat_webhook_request_duration_seconds | histogram | project | Webhook processing time
aichat_agent_messages_total | counter | project, stage | Messages processed by stage

Timer & CRM Metrics

Metric | Type | Labels | Description
aichat_timer_active_sequences | gauge | - | Currently active timer sequences
aichat_timer_fires_total | counter | action | Timer fire outcomes (follow_up, lead, skip)
aichat_crm_leads_total | counter | project, status | Leads sent to CRM
Alert recommendations
Alert recommendations
Condition | PromQL Query
LLM errors spiking | rate(aichat_llm_requests_total{status="error"}[5m]) > 0.1
LLM latency high | histogram_quantile(0.95, rate(aichat_llm_request_duration_seconds_bucket[5m])) > 30
Webhook errors | rate(aichat_webhook_requests_total{status="error"}[5m]) > 0.05
Timers stuck | rate(aichat_timer_fires_total[1h]) == 0 when aichat_timer_active_sequences > 0
CRM failures | rate(aichat_crm_leads_total{status="error"}[5m]) > 0
Grafana dashboard tips
  • Group LLM metrics by provider to compare provider performance and error rates
  • Track aichat_agent_messages_total by stage to see funnel conversion rates (general → service_specific → final)
  • Watch aichat_timer_active_sequences as a proxy for total active conversations
  • Use aichat_webhook_request_duration_seconds p95 to spot slow responses (typical: 2-5s including LLM + search)
  • Monitor aichat_crm_leads_total by project to track lead generation per tenant
Label reference
Label | Values | Used In
provider | openai, claude, together, openrouter, tiered | LLM metrics
method | complete, embed (LLM); sendMessage, sendMediaGroup, etc. (Telegram) | LLM, Telegram metrics
status | ok, error | All counter metrics
project | Project name string | Webhook, Agent, CRM metrics
stage | general (Stage 1), service_specific (Stage 2), final (Stage 3) | Agent metrics
action | follow_up, lead, skip | Timer metrics

Configuration Reference

Configuration is read from a JSON file (--config config.json), with environment variables as a fallback. The JSON config is recommended for production.

Annotated config.example.json

{
  // PostgreSQL connection string
  "database_url": "postgresql://user:pass@localhost/aichat",

  // Qdrant vector database
  "qdrant": {
    "url": "http://localhost:6333",
    "api_key": ""
  },

  // Public HTTP server (webhooks)
  "listen_addr": ":8080",
  // Internal HTTP server (health, metrics, register-project)
  "internal_addr": "localhost:9090",
  // Base URL for webhook registration (must be HTTPS in production)
  "webhook_url": "https://bot.example.com",

  // LLM providers: any OpenAI-compatible API
  "providers": {
    "openrouter": {
      "keys": ["sk-or-v1-your-key"],
      "base_url": "https://openrouter.ai/api/v1"
    },
    "together": {
      "keys": ["your-together-key"],
      "base_url": "https://api.together.xyz/v1"
    }
  },

  // Model tiers: "provider/model" format (split on first /)
  "models": {
    "chat": "openrouter/qwen/qwen3-235b-a22b-2507",    // customer-facing
    "extract": "openrouter/mistralai/mistral-small-3.1-24b-instruct",  // lightweight
    "embedding": "openrouter/qwen/qwen3-embedding-8b",   // vectors
    "reranker": "openrouter/cohere/rerank-4-fast"        // cross-encoder
  },

  // Crawler-specific settings
  "crawler": {
    "chunk_size": 1500,       // chars per chunk (simple mode)
    "workers": 4,             // parallel crawl workers
    "no_llm_chunk": false,     // true = paragraph-based, false = LLM semantic
    "embedding_dim": 1024     // Matryoshka truncation dimension
  },

  "debug": false
}
Environment variable fallback
Variable | Required | Default | Description
DATABASE_URL | Yes | - | PostgreSQL connection string
OPENAI_API_KEYS | At least one | - | Comma-separated OpenAI API keys
CLAUDE_API_KEYS | - | - | Comma-separated Claude API keys
QDRANT_URL | No | http://localhost:6333 | Qdrant REST API URL
QDRANT_API_KEY | No | - | Qdrant authentication key
LISTEN_ADDR | No | :8080 | Public HTTP server address
INTERNAL_ADDR | No | localhost:9090 | Internal HTTP server address
DEBUG | No | - | Enable debug logging (JSON)
AGENT_TRACE | No | - | Set to 1 for per-dialog trace files
Tier routing explained

Model identifiers use "provider/model" format. The provider name is split on the first / character and matched against the providers map.

Tier | Config Key | Purpose | Notes
chat | models.chat | Customer-facing stage replies | Larger, higher-quality model
extract | models.extract | Extraction, classification, summaries | Smaller, faster model. Falls back to chat if not configured.
embedding | models.embedding | Vector embeddings for search | Batch API, Matryoshka truncation to configured dim
reranker | models.reranker | Cross-encoder result re-scoring | Cohere API, optional

Extract model constraint: Must return content in choices[].message.content field. Thinking/reasoning models that use a reasoning field will not work for the extract tier.
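The "split on the first /" rule matters because OpenRouter model identifiers themselves contain slashes. A sketch of the parsing (helper name is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// splitModel splits "provider/model" on the FIRST slash only, so model
// identifiers that contain further slashes (e.g. OpenRouter paths)
// stay intact and only the leading segment is matched against the
// providers map.
func splitModel(id string) (provider, model string) {
	provider, model, _ = strings.Cut(id, "/")
	return provider, model
}

func main() {
	p, m := splitModel("openrouter/qwen/qwen3-235b-a22b-2507")
	fmt.Println(p) // openrouter
	fmt.Println(m) // qwen/qwen3-235b-a22b-2507
}
```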

Key Thresholds Reference

All configurable and hardcoded constants that control system behavior.

Agent Thresholds

Constant | Value | Purpose | Defined In
urlScoreThreshold | 3.0 | Minimum accumulated URL score to auto-select service | internal/agent/stages.go
minSearchScore | 0.5 | Minimum DBSF fusion score to include a result | internal/agent/stages.go
maxContextResults | 5 | Max search results sent to LLM context | internal/agent/stages.go
Search limit (with reranker) | 5 | Results per query variation when reranker is available | internal/agent/stages.go
Search limit (without reranker) | 2 | Results per query variation without reranker | internal/agent/stages.go
First-turn guard | 2 exchanges | URL determination skipped until at least 2 exchanges | internal/agent/stages.go
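How the 3.0 threshold might gate the stage transition can be sketched as follows. This is an illustrative sketch only; the real accumulation and selection logic lives in internal/agent/stages.go and the function name is hypothetical.

```go
package main

import "fmt"

const urlScoreThreshold = 3.0

// pickURL returns the service URL to lock onto once its accumulated
// score crosses the threshold, or "" if no URL qualifies yet.
func pickURL(scores map[string]float64) string {
	best, bestScore := "", 0.0
	for url, s := range scores {
		if s >= urlScoreThreshold && s > bestScore {
			best, bestScore = url, s
		}
	}
	return best
}

func main() {
	scores := map[string]float64{
		"/services/seo": 2.4, // mentioned, but not enough evidence yet
		"/services/crm": 3.6, // crossed the threshold
	}
	fmt.Println(pickURL(scores)) // /services/crm
}
```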

Timer Intervals

Step | Delay | Behavior
1 | 5 minutes | Light ping, no pressure
2 | 15 minutes | Suggest specific value, nudge toward next step
3 | 40 minutes | Offer a live specialist
4 | 24 hours | Gentle reminder that help is available

Crawler Constants

Constant | Value | Purpose | Defined In
defaultEmbeddingDim | 1536 | Default embedding dimension (text-embedding-3-small) | internal/crawler/indexer.go
upsertBatch | 50 | Points per Qdrant upsert batch | internal/crawler/indexer.go
Default chunk size | 1500 chars | Simple chunking mode character limit | internal/crawler/chunker.go
Embed batch size | 32 chunks | Chunks per embedding API call | internal/crawler/pipeline.go
Default workers | 3 | Parallel crawl workers | cmd/crawler/main.go

LLM Temperatures

Temperature | Used For
0.1 | Contact extraction, city extraction, finish detection, semantic chunking, price extraction
0.2 | Lead summary (agent + timer)
0.3 | Stage 1/2/3 replies, query rewrite, timer classification
0.5 | Timer follow-up generation
0.7 | URL determination (higher creativity for ambiguous cases)

LLM Provider

Constant | Value | Purpose
Max backoff | 60 seconds | Cap on exponential backoff for 429 retries
Backoff formula | 2^attempt + 10% jitter | Exponential with jitter, respects Retry-After header
Retries per chain | 3 | Max 429 retries before falling through to next provider

Telegram

Constant | Value | Purpose
Max message length | 4096 chars | Telegram API limit; messages auto-chunked at this boundary
Retry on 429 | Yes | Respects Retry-After header from Telegram API
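Auto-chunking at the 4096-character boundary can be sketched as below. This is an assumption-laden sketch: it counts runes so multi-byte (e.g. Cyrillic) text is never split mid-character, while the real sender may additionally prefer newline or sentence boundaries.

```go
package main

import "fmt"

const maxMessageLen = 4096 // Telegram hard limit (characters)

// chunkMessage splits text into pieces of at most maxMessageLen runes.
func chunkMessage(text string) []string {
	runes := []rune(text)
	var parts []string
	for len(runes) > 0 {
		n := maxMessageLen
		if len(runes) < n {
			n = len(runes)
		}
		parts = append(parts, string(runes[:n]))
		runes = runes[n:]
	}
	return parts
}

func main() {
	long := make([]rune, 5000)
	for i := range long {
		long[i] = 'a'
	}
	parts := chunkMessage(string(long))
	fmt.Println(len(parts))            // 2
	fmt.Println(len([]rune(parts[0]))) // 4096
	fmt.Println(len([]rune(parts[1]))) // 904
}
```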

Performance Reference

Webhook Latency: 2-5s
Search (Qdrant): ~500ms
Per Turn: ~1KB
Per Vector: ~8KB

Webhook latency includes LLM generation + search. Search latency covers Qdrant DBSF fusion. Storage estimates: ~1KB per conversation turn, ~100KB per 10K chunks, ~8KB per dense vector (1024 dims).

Build and test commands
# Build all three binaries
make build        # produces bin/aichat-bot, bin/aichat-crawler, bin/aichat-cli

# Run tests
make test         # go test ./...

# Lint
make lint         # go vet ./...

# Local development services (PostgreSQL + Qdrant via Docker)
make dev-services

# Run the CLI interface for testing (no Telegram needed)
bin/aichat-cli --config config.dev.json --project myproject

aichat-go

Multi-tenant AI Sales Bot Platform
Built with Go, Qdrant, PostgreSQL, and OpenRouter

  • 3 entry points: bot, crawler, cli
  • 14 LLM call sites
  • 3 SQL migrations
  • 2 locales (ru/en)

This documentation is generated from source code and docs/*.md files.