← All work · Case 01 of 04
Quote automation, built for the humans who still sign off.
An LLM + RAG pipeline cut data entry per quote by 88% and scaled analyst capacity 1.65x, without taking the human out of compliance.
3.2 → 1.5 days
8 min → 1 min / quote
190 → 314 quotes/mo
enterprise hospital POC
post-launch
needed
Analysts were the bottleneck. So was compliance review.
Enterprise-tier quotes averaged 3.2 days to turn around, not because analysts were slow, but because every quote demanded reconciling unstructured vendor PDFs against a structured pricing matrix, then routing for compliance sign-off. Doubling throughput meant doubling headcount. Neither the unit economics nor the hiring pipeline supported that.
Leadership wanted "AI to fix this." The instinct most PMs would follow: ship a model that generates the quote end-to-end. In a GRC product, that instinct is wrong.
Shadowed 12 analysts. Read the SOPs. Asked what breaks.
I spent two weeks embedded with the quote team across three offices. The pattern was consistent: analysts weren't slow on decisions, they were slow on extraction, manually pulling SKUs, pricing tiers, and contract terms out of 40-page PDFs so a human could apply judgment.
That reframe changed the scope. We weren't automating pricing — we were automating data preparation, with the analyst still making the call and signing the quote.
LLM + RAG for extraction. Humans for judgment.
The architecture: a document-ingest pipeline parsed incoming PDFs, RAG retrieved the relevant pricing-matrix rows from our product catalog, and an LLM structured the extracted data into a validated table. The analyst saw the table pre-filled, with confidence scores, field-level provenance, and low-confidence rows flagged for manual review.
Engineering wanted to start with a fine-tuned model. I pushed back — we had zero labeled data and a 6-week pilot window. RAG first, fine-tune later if the baseline needed it. It didn't.
Before → after, in the only unit that matters.
What I owned on this build.
Three things I'd tell the next PM on this project.
The obvious AI build is usually wrong. End-to-end automation demos well. In a compliance-bound domain, it fails adoption. Find the unglamorous middle of the workflow and automate that.
Confidence scores are a UX decision, not a model one. What threshold flags a row for review? Who owns overrides? How do analysts report a model mistake? Those were PRD questions, not eng questions.
The best metric was analyst-reported. "I can breathe now" from one of the leads mattered more than the throughput number. Culture-of-adoption is the leading indicator; throughput is the lag.