Visitors often have questions: What services do we offer? How much does it cost? Can we help with their specific challenge? Rather than making them dig through pages or wait for a response, we built zxtra, an AI assistant that answers questions instantly, right on the page.
This is the story of how we built zxtra using Cloudflare Workers AI, the technical decisions we made, and why AI assistants are becoming essential for modern businesses.
TL;DR
- Problem: The answers to visitors' questions were scattered across multiple pages. FAQ pages help, but people don't always check them
- Solution: zxtra, a context-aware AI assistant that lives on every page
- Stack: Cloudflare Workers AI, Cloudflare Turnstile, React floating chat widget
- Result: Instant answers 24/7, zero infrastructure overhead, and layered security (bot protection, rate limiting, input validation, a strict system prompt)
The Problem: Information Overload
Our website has grown significantly. Service pages, case studies, blog posts, pricing information, team bios. While this depth establishes expertise, it creates a challenge: visitors don't always know where to find what they need.
Common questions we heard:
- "What services do you offer?"
- "How much does a DevOps engagement cost?"
- "Can you help with AWS cost optimization?"
- "Do you have experience with Kubernetes?"
These questions all have answers on our website. Finding them requires navigation, reading, and sometimes connecting dots across multiple pages.
Traditional Solutions Fall Short
| Solution | Problem |
|---|---|
| FAQ Page | People don't always check it. Can't anticipate everything |
| Live Chat | Requires staff availability. Not scalable |
| Contact Form | Creates friction for simple questions |
The AI Solution
What if visitors could ask questions naturally and get instant, accurate answers? That's what zxtra provides: a conversational interface to our entire website's knowledge base.
Building zxtra: Technical Decisions
Why Cloudflare Workers AI
We evaluated several options:
| Platform | Pros | Cons |
|---|---|---|
| OpenAI API | Best models, extensive docs | Higher cost, latency varies |
| AWS Bedrock | AWS integration, multiple models | Complex setup, region-locked |
| Cloudflare Workers AI | Edge deployment, simple pricing | Newer, fewer model options |
We chose Cloudflare Workers AI for four reasons:
- Edge deployment: Our website already runs on Cloudflare Workers. Adding AI inference meant zero additional infrastructure
- Simple pricing: Generous free tier, predictable costs. No surprise bills from token overages
- Low latency: Inference runs at the edge, close to users. Response times are consistently fast globally
- Llama 3.1: Fast, reliable responses with the standard chat completion format
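A minimal sketch of what a Workers AI call looks like from inside a Worker, assuming an AI binding named `AI` and the `@cf/meta/llama-3.1-8b-instruct` model (the post names Llama 3.1 but not the exact variant). The binding type is declared locally so the sketch is self-contained, and `askModel` is a hypothetical helper name.

```typescript
// Chat-completion message format accepted by Workers AI text-generation models.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Minimal local shape of the Workers AI binding (in a real project this type
// comes from @cloudflare/workers-types).
interface AiBinding {
  run(model: string, inputs: { messages: ChatMessage[] }): Promise<{ response?: string }>;
}

interface Env {
  AI: AiBinding;
}

// Assumed model ID: the 8B instruct variant is an illustrative choice.
const MODEL = "@cf/meta/llama-3.1-8b-instruct";

// Send the system prompt plus the visitor's question and return the answer text.
async function askModel(env: Env, systemPrompt: string, question: string): Promise<string> {
  const result = await env.AI.run(MODEL, {
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: question },
    ],
  });
  // Non-streaming text generation returns the completion in `response`.
  return result.response ?? "";
}
```

In a deployed Worker, `env` is the object passed to the `fetch` handler, with an `[ai]` section declaring `binding = "AI"` in the Wrangler config.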
The Knowledge Architecture
zxtra doesn't browse the internet. It is instructed to rely only on what we explicitly tell it.
The system prompt is built dynamically from our centralized data modules, the same data sources that power the website, so there is a single source of truth.
Benefits:
- zxtra's knowledge updates automatically when we update website content
- Grounded answers. The model is instructed to reference only information we've provided, not its general training data
- Context awareness. We include the current page so zxtra can summarize what the visitor is viewing
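The knowledge section of the prompt could be assembled along these lines. This is a sketch under assumptions: the post doesn't describe its data modules, so `SiteData`, `Service`, `PageContext`, `buildKnowledgePrompt`, and the 4,000-character page budget are all hypothetical.

```typescript
// Hypothetical shapes for the centralized data modules.
interface Service {
  name: string;
  summary: string;
}

interface SiteData {
  company: string;
  services: Service[];
  contactUrl: string;
}

// The page the visitor is currently viewing.
interface PageContext {
  path: string;
  title: string;
  text: string; // visible page text
}

const MAX_PAGE_CHARS = 4000; // assumed budget to keep the prompt small

// Build the knowledge part of the system prompt from the same data the site renders.
function buildKnowledgePrompt(data: SiteData, page?: PageContext): string {
  const services = data.services.map((s) => `- ${s.name}: ${s.summary}`).join("\n");

  const sections = [
    `You are zxtra, the website assistant for ${data.company}.`,
    `## Services\n${services}`,
    `## Contact\n${data.contactUrl}`,
  ];

  // Include the current page (truncated) so zxtra can summarize what the visitor sees.
  if (page) {
    sections.push(
      `## Current page (${page.title}, ${page.path})\n${page.text.slice(0, MAX_PAGE_CHARS)}`
    );
  }
  return sections.join("\n\n");
}
```

Because the prompt is derived from the same modules the site renders, updating a service description updates zxtra on the next request with no separate sync step.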
The Chat Widget
The chat interface appears on every page as a floating button. It stays out of the way until clicked, then expands, and it can be minimized without losing the conversation.
Key UX decisions:
- Markdown support for AI responses
- Copy button for easy response sharing
- Suggested questions for first-time users
- Rate limit indicators to set expectations
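One way to model the widget behaviour above is a plain reducer that React's `useReducer` can drive. This is a hypothetical sketch, not the production component: the action names, the `retryAfterMs` field behind the rate-limit indicator, and the sample suggested questions are assumptions.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

type WidgetView = "closed" | "open" | "minimized";

interface WidgetState {
  view: WidgetView;
  messages: Message[];
  retryAfterMs: number | null; // drives the rate-limit indicator
}

type WidgetAction =
  | { type: "open" }
  | { type: "minimize" }
  | { type: "close" }
  | { type: "addMessage"; message: Message }
  | { type: "rateLimited"; retryAfterMs: number }
  | { type: "rateLimitCleared" };

const SUGGESTED_QUESTIONS = [
  "What services do you offer?",
  "Can you help with AWS cost optimization?",
];

function widgetReducer(state: WidgetState, action: WidgetAction): WidgetState {
  switch (action.type) {
    case "open":
      return { ...state, view: "open" };
    case "minimize":
      // Minimizing only hides the panel; messages are kept.
      return { ...state, view: "minimized" };
    case "close":
      return { ...state, view: "closed" };
    case "addMessage":
      return { ...state, messages: [...state.messages, action.message] };
    case "rateLimited":
      return { ...state, retryAfterMs: action.retryAfterMs };
    case "rateLimitCleared":
      return { ...state, retryAfterMs: null };
  }
}

// Suggested questions are offered only before the first message.
function suggestionsFor(state: WidgetState): string[] {
  return state.messages.length === 0 ? SUGGESTED_QUESTIONS : [];
}
```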
Security: Multiple Layers of Protection
An AI endpoint is an attractive target for abuse. Without protection, bots could drain API quotas, use our AI for unrelated queries, or attempt prompt injection attacks.
We implemented multiple layers:
Layer 1: Cloudflare Turnstile
Turnstile is Cloudflare's invisible CAPTCHA replacement. Unlike traditional CAPTCHAs, it usually doesn't require user interaction. It runs in the background and verifies that the visitor is human.
Key learning: Turnstile tokens are single-use. After verifying a token on the server, you must reset the widget to get a new token for the next request. We spent time debugging "verification failed" errors before realizing this.
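A minimal sketch of the server-side check against Turnstile's `siteverify` endpoint, assuming a form-encoded POST with the `secret`, `response`, and optional `remoteip` fields. `verifyTurnstile` and the injected `FetchLike` type are hypothetical names used so the sketch can run without network access; in a Worker you would pass the global `fetch`.

```typescript
// Minimal fetch-like signature; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; body: URLSearchParams }
) => Promise<{ json(): Promise<unknown> }>;

interface SiteverifyResult {
  success: boolean;
  "error-codes"?: string[];
}

const SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify";

// Each token can be redeemed only once. Verifying the same token again fails
// (Cloudflare reports "timeout-or-duplicate"), so every request needs a fresh token.
async function verifyTurnstile(
  token: string,
  secret: string,
  fetchImpl: FetchLike,
  remoteIp?: string
): Promise<SiteverifyResult> {
  const body = new URLSearchParams({ secret, response: token });
  if (remoteIp) body.set("remoteip", remoteIp);
  const res = await fetchImpl(SITEVERIFY_URL, { method: "POST", body });
  return (await res.json()) as SiteverifyResult;
}
```

On the client side, the matching step is calling `turnstile.reset(widgetId)` after each request so the widget issues a new token before the next message is sent.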
Layer 2: Rate Limiting
Even with bot protection, we limit request frequency:
- 20 requests per minute per user
- 100 requests per hour per user
- 2-second minimum interval between requests
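The three limits can be checked together over a list of the user's recent request times. The post gives the limits but not the algorithm, so the sliding-window approach and the `checkRateLimit` name are assumptions. Keeping up to 100 millisecond timestamps is roughly 1.4 KB, under the usual 4 KB per-cookie limit.

```typescript
// Limits from the post.
const LIMITS = { perMinute: 20, perHour: 100, minIntervalMs: 2_000 };

const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;

type RateDecision =
  | { allowed: true; timestamps: number[] }
  | { allowed: false; retryAfterMs: number };

// `timestamps` holds this user's recent request times in ms, oldest first.
function checkRateLimit(timestamps: number[], now: number): RateDecision {
  // Nothing older than an hour affects any of the three limits.
  const recent = timestamps.filter((t) => now - t < HOUR_MS);
  const last = recent[recent.length - 1];

  if (last !== undefined && now - last < LIMITS.minIntervalMs) {
    return { allowed: false, retryAfterMs: LIMITS.minIntervalMs - (now - last) };
  }

  const lastMinute = recent.filter((t) => now - t < MINUTE_MS);
  if (lastMinute.length >= LIMITS.perMinute) {
    // Retry once the oldest request in the window ages out of it.
    return { allowed: false, retryAfterMs: MINUTE_MS - (now - lastMinute[0]) };
  }

  if (recent.length >= LIMITS.perHour) {
    return { allowed: false, retryAfterMs: HOUR_MS - (now - recent[0]) };
  }

  // Allowed: return the pruned list plus this request, ready to store again.
  return { allowed: true, timestamps: [...recent, now] };
}
```

Returning `retryAfterMs` on rejection also gives the widget what it needs for its rate-limit indicator.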
Rate limits are tracked using signed cookies, so a client can't alter its own counters without invalidating the signature.
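A sketch of signing and verifying such a cookie with HMAC-SHA-256 through the Web Crypto API (`crypto.subtle`, available in Workers). The `payload.hexsignature` format and the helper names are assumptions; the payload should be cookie-safe, for example timestamps joined with `_`.

```typescript
const encoder = new TextEncoder();

// Import the shared secret as an HMAC-SHA-256 key.
async function hmacKey(secret: string) {
  return crypto.subtle.importKey(
    "raw",
    encoder.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign", "verify"]
  );
}

function toHex(buf: ArrayBuffer): string {
  return Array.from(new Uint8Array(buf), (b) => b.toString(16).padStart(2, "0")).join("");
}

function fromHex(hex: string) {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Append a hex HMAC so the client can read, but not forge, the payload.
async function signCookie(payload: string, secret: string): Promise<string> {
  const key = await hmacKey(secret);
  const sig = await crypto.subtle.sign("HMAC", key, encoder.encode(payload));
  return `${payload}.${toHex(sig)}`;
}

// Return the payload if the signature matches, otherwise null.
async function verifyCookie(cookie: string, secret: string): Promise<string | null> {
  const dot = cookie.lastIndexOf(".");
  if (dot < 0) return null;
  const payload = cookie.slice(0, dot);
  const sigHex = cookie.slice(dot + 1);
  if (!/^[0-9a-f]{64}$/.test(sigHex)) return null; // HMAC-SHA-256 is 32 bytes
  const key = await hmacKey(secret);
  // Let Web Crypto check the signature rather than comparing strings by hand.
  const valid = await crypto.subtle.verify("HMAC", key, fromHex(sigHex), encoder.encode(payload));
  return valid ? payload : null;
}
```

Signing stops a client from editing its counters; it does not stop one from discarding the cookie entirely, which is one reason the other layers still matter.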
Layer 3: Input Validation
Basic but essential:
- Maximum 2000 characters per message
- Required fields must be present
- JSON body must parse correctly
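These checks might look like the following. The post doesn't list its request fields, so `message`, `turnstileToken`, and `page` are hypothetical names, as are `validateChatRequest` and its error strings.

```typescript
const MAX_MESSAGE_CHARS = 2000;

interface ChatRequest {
  message: string;
  turnstileToken: string;
  page?: string;
}

type Validation = { ok: true; value: ChatRequest } | { ok: false; error: string };

function validateChatRequest(rawBody: string): Validation {
  // 1. The body must parse as JSON.
  let body: unknown;
  try {
    body = JSON.parse(rawBody);
  } catch {
    return { ok: false, error: "Body must be valid JSON" };
  }
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Body must be a JSON object" };
  }

  // 2. Required fields must be present and non-empty strings.
  const { message, turnstileToken, page } = body as Record<string, unknown>;
  if (typeof message !== "string" || message.trim() === "") {
    return { ok: false, error: "message is required" };
  }
  if (typeof turnstileToken !== "string" || turnstileToken === "") {
    return { ok: false, error: "turnstileToken is required" };
  }

  // 3. Cap the message size (length counts UTF-16 code units, fine for a size cap).
  if (message.length > MAX_MESSAGE_CHARS) {
    return { ok: false, error: `message exceeds ${MAX_MESSAGE_CHARS} characters` };
  }
  if (page !== undefined && typeof page !== "string") {
    return { ok: false, error: "page must be a string" };
  }
  return { ok: true, value: { message, turnstileToken, page } };
}
```

Running this before the Turnstile and AI calls means malformed requests are rejected without spending an inference call.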
Layer 4: Strict System Prompt
The system prompt explicitly constrains the AI's behavior:
- Only answer questions using information provided
- If a question is not about ZSoftly, decline politely
- If information is missing, direct to contact us
- Never answer general knowledge questions unrelated to ZSoftly
This makes prompt injection attacks, where users try to make the AI behave outside its intended purpose, harder to pull off.
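A sketch of how the rules could be combined with the knowledge section when building the message list. The rule wording paraphrases the list above, and `GUARDRAILS` and `buildMessages` are hypothetical names. Keeping visitor text in its own `user` turn, rather than splicing it into the system prompt, is a common mitigation that reduces, but does not eliminate, injection risk.

```typescript
// Rules paraphrased from the list above; exact wording is illustrative.
const GUARDRAILS = [
  "Only answer questions using the information provided below.",
  "If a question is not about ZSoftly, politely decline.",
  "If the information needed is missing, direct the visitor to the contact page.",
  "Never answer general knowledge questions unrelated to ZSoftly.",
]
  .map((rule, i) => `${i + 1}. ${rule}`)
  .join("\n");

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

function buildMessages(knowledge: string, userMessage: string): ChatMessage[] {
  return [
    // Rules lead the system prompt so they frame the knowledge that follows.
    { role: "system", content: `${GUARDRAILS}\n\n${knowledge}` },
    // Visitor text stays in its own turn and never enters the system prompt.
    { role: "user", content: userMessage },
  ];
}
```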
Lessons Learned
What Worked Well
Edge deployment was the right choice. Response times are consistently under 2 seconds globally. No cold starts, no regional latency issues.
Dynamic system prompts pay off. By pulling from our centralized data modules, zxtra's knowledge stays current automatically.
Turnstile provides invisible protection. Users don't notice it, but it effectively blocks automated abuse.
Challenges We Faced
Single-use Turnstile tokens. This wasn't obvious to us from the documentation. After a successful verification, the token is consumed, so you need a new one for each request.
Balancing strictness with helpfulness. Too strict, and the AI refuses legitimate questions. Too loose, and it goes off-topic. Finding the right system prompt took iteration.
What We'd Do Differently
Add conversation history. Currently, each message is independent. Adding context from previous messages would make multi-turn conversations more natural.
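Adding history mostly means deciding how much of it to send. A minimal sketch, assuming the widget keeps the transcript and the Worker trims it by character count as a rough proxy for tokens; `withHistory` and the 6,000-character budget are hypothetical.

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

const MAX_HISTORY_CHARS = 6000; // assumed budget; tune to the model's context window

// Keep the most recent turns that fit the budget, then append the new question.
function withHistory(history: Turn[], question: string, budget = MAX_HISTORY_CHARS): Turn[] {
  const kept: Turn[] = [];
  let used = question.length;
  // Walk backwards so the most recent turns are kept first.
  for (let i = history.length - 1; i >= 0; i--) {
    const turn = history[i];
    if (used + turn.content.length > budget) break;
    kept.unshift(turn);
    used += turn.content.length;
  }
  // Start on a user turn so the transcript alternates cleanly.
  while (kept.length > 0 && kept[0].role === "assistant") kept.shift();
  return [...kept, { role: "user", content: question }];
}
```

The result would sit after the system message in the array sent to the model.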
Implement streaming responses. The full response is generated before being sent. Streaming would improve perceived latency for longer responses.
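Workers AI can stream text generation when `stream: true` is passed to `run`, returning a stream of server-sent events (`data: {"response":"..."}` lines, ending with `data: [DONE]`). A sketch under that assumption: the Worker passes the stream straight through, and a small client-side parser extracts tokens. `StreamingAi`, `streamAnswer`, and `parseSse` are hypothetical names.

```typescript
// Minimal local shape of the AI binding when streaming is requested.
interface StreamingAi {
  run(
    model: string,
    inputs: { messages: { role: string; content: string }[]; stream: true }
  ): Promise<ReadableStream<Uint8Array>>;
}

// Worker side: forward the event stream to the browser unchanged.
async function streamAnswer(
  ai: StreamingAi,
  model: string,
  messages: { role: string; content: string }[]
): Promise<Response> {
  const stream = await ai.run(model, { messages, stream: true });
  return new Response(stream, {
    headers: { "content-type": "text/event-stream", "cache-control": "no-cache" },
  });
}

// Client side: pull text tokens out of buffered SSE lines. Returns any trailing
// partial line in `rest` so it can be prepended to the next chunk.
function parseSse(buffer: string): { tokens: string[]; rest: string; done: boolean } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const tokens: string[] = [];
  let done = false;
  for (const line of lines) {
    if (!line.startsWith("data:")) continue;
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") {
      done = true;
      continue;
    }
    const parsed = JSON.parse(data) as { response?: string };
    if (parsed.response) tokens.push(parsed.response);
  }
  return { tokens, rest, done };
}
```

In the widget, the chunks from `response.body.getReader()` would be decoded with a `TextDecoder`, fed to `parseSse`, and the tokens appended to the visible answer as they arrive.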
Build analytics. We don't yet track what questions visitors ask. This data would help us improve both the AI and our content.
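Since the site already runs on Workers, one option is Workers Analytics Engine, whose dataset bindings expose `writeDataPoint({ indexes, blobs, doubles })`. This is a sketch under assumptions: the binding shape is declared locally, and `logQuestion`, the chosen fields, and the 500-character cap are hypothetical.

```typescript
// Minimal local shape of an Analytics Engine dataset binding.
interface AnalyticsDataset {
  writeDataPoint(point: { indexes?: string[]; blobs?: string[]; doubles?: number[] }): void;
}

const MAX_LOGGED_CHARS = 500; // assumed cap to keep each data point small

// Record one question: which page it came from, a truncated copy of the text,
// plus its length and how long the answer took.
function logQuestion(
  dataset: AnalyticsDataset,
  page: string,
  question: string,
  latencyMs: number
): void {
  dataset.writeDataPoint({
    indexes: [page], // group and sample by page
    blobs: [page, question.slice(0, MAX_LOGGED_CHARS)],
    doubles: [question.length, latencyMs],
  });
}
```

Grouping by page would show which parts of the site generate the most questions, a hint about where content could be clearer.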
The Business Case for AI Assistants
Building zxtra took about a week of focused development time. Ongoing cost is minimal. Cloudflare Workers AI has a generous free tier, and our usage is well within it.
What we get in return:
| Benefit | Impact |
|---|---|
| 24/7 availability | Visitors get answers any time |
| Instant responses | No waiting for a human to be available |
| Consistent quality | Every response uses the same knowledge base |
| Reduced friction | Simple questions don't require forms |
| Scalability | 10 or 10,000 visitors, the AI handles it |
Should You Build an AI Assistant?
Consider it if:
- Visitors frequently ask similar questions
- Your content spans multiple pages or topics
- You want to provide support outside business hours
- You're already on Cloudflare (makes deployment trivial)
Hold off if:
- Your queries require real-time data (inventory, frequently changing pricing)
- You need transactional capabilities (placing orders, making changes)
- Compliance requirements restrict AI usage
Try zxtra Yourself
We built zxtra not just for our visitors, but as a demonstration of what we can build for your business. Whether you need a customer service chatbot, an internal knowledge assistant, or a specialized AI tool, the architecture patterns are similar.
Try the full zxtra experience →
Or click the chat bubble in the bottom-right corner of any page to see it in action.
What's Next
We're continuing to evolve zxtra:
- Conversation memory across multiple messages
- Streaming responses as they're generated
- Usage analytics to understand what visitors are asking
- Expanded knowledge and capabilities
If you're interested in building something similar for your business, we'd love to chat. Our AI Agents & Chatbots service covers custom development from design through deployment.
Book a call to discuss your AI project →
Building an AI-powered product or service? We help businesses design and implement intelligent solutions. Contact us →
