Case Studies
The work speaks for itself.
Real results from real SaaS companies. Architecture decisions, implementation details, and measurable outcomes.
AI Copilot
Series B developer tools company cut support tickets by 38% with an in-product AI copilot.
Company
Series B, 3,200 users
Investment
$25,000
Payback
7.3 weeks
The problem
Their product had 60+ features and a learning curve that frustrated new users. Onboarding completion was at 34%. Support was handling 1,800 tickets/month — 40% were basic "how do I do X?" questions with answers already in the documentation.
The CTO estimated a 4-month timeline to build something internally. The board wanted results before the next quarterly review in 7 weeks.
What we built
Ingested 340 documentation articles. Built hybrid retrieval pipeline using pgvector. Defined 25 most common user actions as function-calling schemas.
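Hybrid retrieval means merging the vector-similarity results with keyword (BM25) results before re-ranking. One common way to fuse the two ranked lists is reciprocal rank fusion; the sketch below is illustrative (the document IDs and fusion constant are assumptions, not the client's actual pipeline):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked result lists (e.g. pgvector similarity + BM25)
    into a single ranking using reciprocal rank fusion (RRF)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            # Documents near the top of any list accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hits from each retriever for the same user query.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])[:5]
```

Documents that appear high in both lists (like doc_b here) win the fused ranking, which is what makes hybrid retrieval robust to queries that are purely semantic or purely keyword-shaped.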
Built tool-use agent with Claude 3.5 Sonnet. RAG-powered answers plus action execution via their REST API. Confirmation dialogs for destructive actions.
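A function-calling schema pairs a JSON description of an action with a flag the agent runtime checks before executing it. The shape below is a minimal sketch; the tool name, fields, and `destructive` flag are hypothetical, not the client's actual API:

```python
# Illustrative tool schema for a function-calling agent. The
# "destructive" key is an assumed convention: the frontend shows a
# confirmation dialog before the agent may execute such a tool.
ARCHIVE_PROJECT_TOOL = {
    "name": "archive_project",
    "description": "Archive a project via the product's REST API.",
    "input_schema": {
        "type": "object",
        "properties": {"project_id": {"type": "string"}},
        "required": ["project_id"],
    },
    "destructive": True,
}

def requires_confirmation(tool: dict) -> bool:
    """Gate destructive actions behind an explicit user confirmation."""
    return bool(tool.get("destructive"))
```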
Added user context injection — copilot knows current page, plan tier, role, and recent activity. Built streaming responses and embeddable React component.
Production hardening. Fallback chain: Claude → GPT-4o → cached response. Cost controls per user tier. Datadog dashboards. 120-case evaluation suite.
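The fallback chain above (primary model → secondary model → cached response) can be sketched as a simple ordered-provider loop. The provider functions here are stand-ins, not real API clients:

```python
def call_with_fallback(prompt, providers, timeout_s=5.0, cache=None):
    """Try each provider in order (e.g. Claude -> GPT-4o); on any
    failure or timeout fall through to the next, then to a cache."""
    for provider in providers:
        try:
            return provider(prompt, timeout=timeout_s)
        except Exception:
            continue  # swallow and degrade to the next tier
    if cache is not None and prompt in cache:
        return cache[prompt]
    raise RuntimeError("all providers failed and no cached response")

# Hypothetical providers standing in for real model clients.
def flaky_primary(prompt, timeout):
    raise TimeoutError("primary timed out")

def backup(prompt, timeout):
    return "answer from backup"

reply = call_with_fallback("How do I export data?", [flaky_primary, backup])
```

In production each tier would also record latency, tokens, and cost per interaction so the dashboards can show which tier actually served each request.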
The architecture
User query → Context enrichment (page, role, history)
→ Hybrid retrieval (pgvector + BM25)
→ Re-ranking (top 5 chunks)
→ Claude 3.5 Sonnet (tool-use agent)
→ Action execution or answer generation
→ Streaming response to frontend
Fallback: Claude timeout (5s) → GPT-4o → cached response
Cost: 4,096 token budget per conversation
Monitor: Every interaction logged with latency, tokens, cost
Results (90 days post-launch)
Support tickets/mo
-38%
Onboarding completion
+50% relative
Features used/user
+75%
Support cost/mo
-$10,260
Copilot resolution rate
Automated
Avg response time
Instant
“The copilot is now our most-used feature. Enterprise prospects specifically ask about it in demos. It went from a support cost play to a competitive advantage.”
— CTO
Reliability Engineering
Fintech platform reduced AI API costs by 63% and eliminated production incidents.
Company
Series A, 1,100 users
Investment
$18,000
Payback
12 days
The problem
Their AI-powered transaction categorization feature was bleeding money: API bill grew from $2,400/month to $8,700/month in 4 months. 3 production incidents in 6 weeks. Zero observability. Users reporting miscategorized transactions with no way to measure accuracy at scale.
What we did
Instrumented every AI call path. Found: 47% of API calls were redundant, p95 latency was 12s (timeout was 10s), and 23% of responses had parsing failures being silently swallowed.
Implemented idempotent processing, switched to structured JSON mode, added circuit breakers with graceful degradation.
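A circuit breaker stops hammering a failing AI API and serves a degraded answer instead; the breaker resets once calls succeed again. This is a minimal sketch with an assumed failure threshold, not the production implementation:

```python
class CircuitBreaker:
    """Open the circuit after N consecutive failures; while open,
    serve a degraded fallback instead of calling the AI API."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.failures = 0

    def call(self, fn, fallback):
        if self.failures >= self.threshold:
            return fallback()  # circuit open: degrade gracefully
        try:
            result = fn()
            self.failures = 0  # success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            return fallback()

# Hypothetical failing API call and its degraded substitute.
breaker = CircuitBreaker(threshold=2)
def failing_api():
    raise RuntimeError("api down")
def degraded():
    return "cached/simplified answer"

for _ in range(3):
    out = breaker.call(failing_api, degraded)
```

Once the circuit is open, requests skip the broken dependency entirely, which is what turns a cascading outage into a graceful degradation.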
Built semantic cache (34% hit rate day one). Optimized prompts — 41% token reduction. Implemented tiered model strategy: simple queries to GPT-4o-mini, complex to Claude.
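A semantic cache returns a stored answer when a new query's embedding is close enough to one already answered, so near-duplicate questions never hit the API. The similarity threshold below is an assumption for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class SemanticCache:
    """Serve a cached answer when a query embedding is within a
    similarity threshold of a previously answered query."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, embedding):
        best = max(self.entries,
                   key=lambda e: cosine(embedding, e[0]),
                   default=None)
        if best and cosine(embedding, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: fall through to the model

    def put(self, embedding, answer):
        self.entries.append((embedding, answer))

# Toy 2-d embeddings standing in for real embedding-model output.
cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "Exports live under Settings > Data.")
```

In production the embeddings would come from an embedding model and the entries would live in a vector store rather than a Python list, but the hit/miss logic is the same.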
Built 200-case evaluation suite across 15 categories. Continuous accuracy monitoring sampling 5% daily. Alerting for cost, latency, accuracy drift, and error spikes.
Results (60 days post-engagement)
Monthly API cost
-63%
Production incidents
-100%
p95 latency
-81%
Parsing failures
-99%
Accuracy
Now measurable
Observability
12 alerts
“We thought we had an AI feature. What we actually had was a ticking time bomb. Now we have a production system — with the monitoring to prove it.”
— CTO
Document AI
B2B marketplace automated 70% of vendor document processing.
Company
Series B, 800 vendors
Investment
$28,000
Payback
8 weeks
The problem
Every new vendor submitted compliance documents — licenses, insurance certificates, tax forms. Two FTEs processed 600 documents/month at 15-25 minutes each. Vendor count growing 8% month-over-month. Previous OCR solutions failed on scanned documents and still required human interpretation.
What we built
Ingestion pipeline for PDFs, images, and scanned docs. AWS Textract for OCR. Multimodal Claude for document classification — 97% accuracy across 12 document types.
Structured extraction schemas per document type. Confidence scoring per field. Auto-accept above 0.9, flag 0.7-0.9 for quick review, manual below 0.7.
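The three-way routing above reduces to a threshold check on the lowest-confidence extracted field, since one bad field is enough to need a human. A sketch, with the field names invented for illustration:

```python
def route_by_confidence(field_confidences, accept=0.9, review=0.7):
    """Route an extracted document by its weakest field:
    auto-accept, quick human review, or full manual processing."""
    worst = min(field_confidences.values())
    if worst >= accept:
        return "auto_accept"
    if worst >= review:
        return "quick_review"
    return "manual"

# Hypothetical per-field confidence scores from extraction.
decision = route_by_confidence({"tax_id": 0.96, "expiry_date": 0.93})
```

Routing on the weakest field rather than the average is a deliberately conservative choice: a document only skips human review when every extracted value clears the bar.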
Human-in-the-loop review UI: document side-by-side with extracted data. One-click approval. Corrections feed back into prompt tuning. Integration with vendor management database.
Batch processing pipeline (SQS-based). PII detection and redaction. Monitoring dashboard. Ran 500 historical documents for validation and prompt tuning.
Results (90 days post-launch)
Auto-processed
Automated
Processing time/doc
-92%
Ops hours/month
-72%
Cost per document
-86%
Extraction accuracy
After review
Vendor onboarding
Instant
“We were about to hire a third person for document review. Instead, we automated 70% of the work and moved one of our best people into a revenue-generating role.”
— VP Operations
Your project could be next.
Every engagement starts with a 30-minute technical assessment. We'll look at your architecture, understand the problem, and tell you exactly what we'd build, how long it would take, and what it would cost.
No NDAs required for the initial conversation. We've worked under strict confidentiality and will sign yours if we move forward.