Building an n8n + LLM Automation Pipeline: Architecture, Prompt Engineering, and Error Handling
Deep technical breakdown of the automation stack powering Maalig AI’s lead workflows. n8n webhook triggers, LLM prompt templates, Redis deduplication, and how failures are absorbed without alerting the user.
Every lead Maalig AI captures goes through the same path: webhook → dedup → LLM → router → side-effects. n8n owns the routing. The API owns the LLM. Redis owns the truth about what has already been seen.
This is how that pipeline is wired.
The Trigger
WhatsApp messages and form submissions hit a single endpoint on the NestJS API: POST /api/v1/webhooks/whatsapp (or the channel equivalent). The handler does three things in this order:
- Verify the HMAC signature on the raw body (Meta-style X-Hub-Signature-256).
- Idempotency check against Redis.
- Persist the lead and emit a lead.created event.
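The ordering matters more than any single step, so here is a minimal sketch of it. It assumes in-memory stand-ins (a Set in place of Redis, an array in place of the lead store, a Node EventEmitter for lead.created), an injected verifier, and a hypothetical key format `webhook:seen:<messageId>`. It is not the production handler.

```typescript
import { EventEmitter } from "node:events";

// Shape of an incoming delivery after NestJS has read it. Field names are hypothetical.
type IncomingWebhook = {
  rawBody: Buffer;
  signature: string | undefined;
  messageId: string;
  payload: unknown;
};

type HandlerResult = "rejected" | "duplicate" | "accepted";

class WebhookHandler {
  // In-memory stand-ins: a Set for Redis idempotency keys, an array for the lead store.
  private readonly seen = new Set<string>();
  readonly leads: unknown[] = [];
  readonly events = new EventEmitter();
  private readonly verify: (raw: Buffer, signature: string | undefined) => boolean;

  constructor(verify: (raw: Buffer, signature: string | undefined) => boolean) {
    this.verify = verify;
  }

  handle(req: IncomingWebhook): HandlerResult {
    // 1. Signature first: unauthenticated requests never touch Redis or the database.
    if (!this.verify(req.rawBody, req.signature)) return "rejected";

    // 2. Idempotency: providers redeliver, so the same message id can arrive twice.
    //    Against real Redis this check-and-set should be one atomic SET key value NX EX ttl.
    const key = `webhook:seen:${req.messageId}`;
    if (this.seen.has(key)) return "duplicate";
    this.seen.add(key);

    // 3. Persist, then emit, so listeners only ever see stored leads.
    this.leads.push(req.payload);
    this.events.emit("lead.created", req.payload);
    return "accepted";
  }
}
```

One trade-off in this shape: the id is marked as seen before the lead is persisted, so a crash between steps 2 and 3 would turn the provider's redelivery into a "duplicate". A short TTL on the key, or deleting it when persistence fails, keeps that from losing a lead.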
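The signature verification has to run on req.rawBody, not the JSON-parsed body. NestJS gets this with { rawBody: true } passed to NestFactory.create. Forget that, and req.rawBody is undefined: the check then either rejects every webhook or, if the guard treats a missing body as nothing to verify, silently fails open.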
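A minimal sketch of the check itself, using only Node's built-in crypto module. It assumes the Meta-style header format `sha256=<hex digest>` and a shared secret passed in as `appSecret` (a hypothetical parameter name); in NestJS the first argument would be `req.rawBody`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies a "sha256=<hex>" signature header against the exact bytes that were received.
function verifySignature(rawBody: Buffer | undefined, header: string | undefined, appSecret: string): boolean {
  // Fail closed: a missing raw body (e.g. rawBody: true not set) or header is a rejection, never a skip.
  if (!rawBody || !header || !header.startsWith("sha256=")) return false;

  const expected = createHmac("sha256", appSecret).update(rawBody).digest();
  const received = Buffer.from(header.slice("sha256=".length), "hex");

  // timingSafeEqual throws on buffers of different length, so compare lengths first.
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many leading bytes matched.
  return timingSafeEqual(received, expected);
}
```

Re-serializing the parsed body with JSON.stringify would not work as a substitute: any difference in whitespace or key order from what the provider signed produces a different digest.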
n8n as the Router, Not the Brain
A common mistake is putting the LLM call inside n8n. Don't. n8n is a workflow router, and it should make zero LLM decisions on its own.
The workflow looks like this:
Webhook → Route by event type
├─ lead.created → Sheets + Slack + Welcome Email
├─ message.received → AI reply (calls back to /api/v1/ai/reply)
├─ subscription.paid → Upgrade Slack + receipt email
└─ default → Drop with 200
All n8n knows is the event type. All the AI logic lives in the API, where it can be unit-tested, version-controlled, and observed.
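To make "router, not brain" concrete, the whole decision n8n makes can be written as a pure lookup. This is an illustrative TypeScript mirror of the Switch branches in the diagram above, not an n8n export; the action names are hypothetical labels for the Sheets, Slack, and email nodes.

```typescript
// Hypothetical action names mirroring the branches of the n8n Switch node.
const ROUTES = new Map<string, readonly string[]>([
  ["lead.created", ["sheets", "slack", "welcome_email"]],
  ["message.received", ["ai_reply"]], // the AI reply calls back to /api/v1/ai/reply
  ["subscription.paid", ["slack_upgrade", "receipt_email"]],
]);

type RouteDecision = { status: 200; actions: readonly string[] };

function route(eventType: string): RouteDecision {
  // No LLM, no payload inspection: the event type alone picks the branch.
  // Unknown types fall through to "drop with 200" so the sender does not retry them.
  return { status: 200, actions: ROUTES.get(eventType) ?? [] };
}
```

A Map is used instead of a plain object so that an event type like "constructor" cannot accidentally resolve to an inherited property.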
Prompt Template Layout
The AI service builds prompts from four layers, in order:
System prompt ← stable, in code, never user-editable
+ Plan context ← e.g. "User is on Growth plan, 4,200/5,000 AI replies used"
+ Recent thread ← last 20 messages from Redis (`chat:<userId>:<channel>:<sessionId>`)
+ User message ← the actual incoming text
The thread cache is the part that makes responses feel like a real conversation. It has a 24-hour TTL — long enough for a customer to come back the next day, short enough that PII doesn't accumulate forever.
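A minimal sketch of the assembly step, assuming a chat-style message array and that plan context travels as a second system message (an assumption; it could equally be appended to the first). `buildPrompt`, `threadKey`, and the constants are hypothetical names. The TTL constant is the value that would be handed to Redis EXPIRE on the thread key.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const THREAD_TTL_SECONDS = 24 * 60 * 60; // 86,400 s: the 24-hour thread window
const MAX_THREAD_MESSAGES = 20;

// Redis key for one conversation thread, in the chat:<userId>:<channel>:<sessionId> format.
function threadKey(userId: string, channel: string, sessionId: string): string {
  return `chat:${userId}:${channel}:${sessionId}`;
}

// Layers are appended in a fixed order so the stable system prompt always comes first
// and the incoming message always comes last.
function buildPrompt(
  systemPrompt: string,
  planContext: string,
  thread: ChatMessage[],
  userMessage: string,
): ChatMessage[] {
  return [
    { role: "system", content: systemPrompt }, // stable, lives in code
    { role: "system", content: planContext }, // per-request plan and usage facts
    ...thread.slice(-MAX_THREAD_MESSAGES), // only the most recent turns from the cache
    { role: "user", content: userMessage }, // the actual incoming text
  ];
}
```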
Token Cache: Stateless Requests Only
For stateless prompts (no thread context — direct lookups, FAQ-style), the result is cached in Redis under ai:cache:<sha256(message::context)> with a 10-minute TTL. Hit rate sits around 18% in production, which is meaningful when each call costs both money and 800–1200 ms of latency.
The key has to include the context, not just the message. Two users asking the same question on different plans should not share an answer.
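A minimal sketch of the key derivation plus a TTL cache, assuming an in-memory Map as a stand-in for Redis (where the equivalent would be SET with an expiry and GET). `cacheKey`, `ResponseCache`, and `CACHE_TTL_MS` are hypothetical names. The test shows the point above: the same question under two different plan contexts yields two different keys.

```typescript
import { createHash } from "node:crypto";

const CACHE_TTL_MS = 10 * 60 * 1000; // 10 minutes

// ai:cache:<sha256(message::context)>, hashing message and context together.
function cacheKey(message: string, context: string): string {
  const digest = createHash("sha256").update(`${message}::${context}`).digest("hex");
  return `ai:cache:${digest}`;
}

// In-memory stand-in for the Redis cache; entries expire CACHE_TTL_MS after being set.
class ResponseCache {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();

  get(key: string, now = Date.now()): string | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: string, now = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + CACHE_TTL_MS });
  }
}
```

One caveat with a plain `::` separator: message "a::b" with context "c" hashes the same input as message "a" with context "b::c". Hashing an unambiguous encoding such as JSON.stringify([message, context]) closes that gap.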
Failures Are Absorbed, Not Alerted
A lead workflow that fails silently is worse than one that doesn't run — the user thinks they have a system. So failures hit a Dead Letter Queue (BullMQ DLQ) and an admin route under /api/v1/admin/dlq lets me inspect, retry, or drop them.
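The three admin operations map onto a small interface. This is a hypothetical in-memory model of what the /api/v1/admin/dlq route exposes (count, inspect, retry, drop), not BullMQ's API; a real version would read and move jobs in the queue instead of a Map.

```typescript
type FailedJob = { id: string; name: string; data: unknown; error: string; attempts: number };

// Hypothetical in-memory model of the dead letter queue behind the admin route.
class DeadLetterQueue {
  private readonly jobs = new Map<string, FailedJob>();

  add(job: FailedJob): void {
    this.jobs.set(job.id, job);
  }

  // The number worth watching: how many side-effects are currently stuck.
  count(): number {
    return this.jobs.size;
  }

  inspect(): FailedJob[] {
    return [...this.jobs.values()];
  }

  // Re-runs the job; it leaves the DLQ only if the handler succeeds,
  // otherwise it stays with an updated error and attempt count.
  async retry(id: string, run: (job: FailedJob) => Promise<void>): Promise<boolean> {
    const job = this.jobs.get(id);
    if (!job) return false;
    try {
      await run(job);
      this.jobs.delete(id);
      return true;
    } catch (err) {
      this.jobs.set(id, { ...job, attempts: job.attempts + 1, error: String(err) });
      return false;
    }
  }

  // Explicitly discard a job that should never be retried.
  drop(id: string): boolean {
    return this.jobs.delete(id);
  }
}
```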
n8n itself retries 3 times with exponential backoff (5s → 25s → 125s). If all three retries to a single endpoint fail, n8n logs it and the workflow continues: the customer-facing message is already in MongoDB, so only the side-effects (Sheets, Slack) are delayed.
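The schedule is a base of 5 s multiplied by 5 on each retry, so the three waits add up to 5 + 25 + 125 = 155 s. A minimal sketch of that schedule and a generic retry loop, assuming one initial attempt plus one retry per delay; `backoffDelaysMs` and `withRetry` are hypothetical helpers illustrating the arithmetic, not n8n's internal implementation.

```typescript
// Delay before retry i (0-based): baseMs * factor^i. Defaults give 5 s, 25 s, 125 s.
function backoffDelaysMs(retries: number, baseMs = 5_000, factor = 5): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * factor ** i);
}

// One initial attempt plus one retry per entry in `delays`; rethrows the last error.
// `sleep` is injectable so the schedule can be tested without real waiting.
async function withRetry<T>(
  call: () => Promise<T>,
  delays: number[],
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt is still allowed.
      if (attempt < delays.length) await sleep(delays[attempt]);
    }
  }
  throw lastError;
}
```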
The customer never sees a failure. I see the DLQ count.