Case Study - Sidney Voice AI: the storage industry's most extroverted intern
Voice AI assistant for self-storage operations. Took 150+ qualified leads at ISS Vegas, boosted demo success by 25%, and cut fallback prompts by 60%.
- Client
- Stuf Storage (Sidney AI)
- Year
- Service
- Voice AI, NestJS/Prisma backend, LLM dialogue management
The problem
Self-storage is a phone-heavy business that doesn't want to be. Calls come in all day. Most of them are the same five questions. Hours, unit sizes, climate control, price ranges, whether you take Bitcoin. The good callers convert. The rest are repetitive, and the staff time spent on them is time not spent with customers walking through the door.
Stuf wanted a voice agent that could take the boring calls, qualify leads, book tours, and only hand off to a human when the call actually needed one. The bar was high. A bad voice agent reflects on the brand on every call. A good one is invisible.
The other dimension was internal. Stuf had thousands of historical sales calls sitting in storage, and nobody had time to listen to them. There were patterns in there: coaching gold, missed objections, repeated wins. The ask was two products in one: Sidney, the customer-facing voice agent, and Sidney GPT, the internal analyst that mined the call archive.
What we shipped
Sidney Voice AI, the customer-facing agent. The full pipeline from inbound call to lead record to operator handoff, with a NestJS and Prisma backend doing the CRUD, integrations, and post-call automations. The dialogue management uses an async-graph state machine on top of LLM turns, which lets us reason about "where are we in the conversation" the same way you would in a traditional IVR, but with all the natural-language flexibility of a model. That's the design choice that produced most of the wins.
Sidney GPT, the internal analyst. It runs SQL and NLP over 3,000+ archived sales calls to surface patterns the operations team could act on. Things like which objections come up most often per location, which callers convert, what time of day produces the longest tail of dropped leads. The output is a dashboard the ops team actually reads, not a quarterly slide deck.
The booth at ISS Vegas, which was less of a "ship" and more of a "this had to work in front of an entire industry." We had Sidney taking live demo calls from attendees walking the floor. The booth caught the kind of traffic you only get at the biggest show in your industry, and Sidney handled it.
How we built it
Boring infrastructure first. NestJS for the API, Prisma against Postgres for persistence, queue-backed automations for the things that didn't need to run inline with the call. None of this is the interesting part, and that's the point. The interesting part is voice, and voice is unforgiving. You don't want to be debugging your own ORM when the model is talking to a customer.
The async-graph dialogue manager was the architectural bet. Pure-LLM dialogue control is brittle. You get a model that wanders, repeats itself, or forgets the user's last constraint. Pure IVR dialogue is rigid and feels like 2010. The graph approach gave us nodes with explicit goals (greet, qualify, book, escalate) and the LLM handled the language within each node. Transitions were rule-augmented LLM decisions, not raw LLM decisions, so the system was easy to debug and easy to extend. Demo success went up 25% after we landed this pattern, and fallback prompts (the model giving up and asking the user to repeat) dropped 60%.
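To make the "rule-augmented LLM decisions" idea concrete, here is a minimal sketch of the pattern, not Sidney's actual code. Only the node names (greet, qualify, book, escalate) come from the write-up. The `done` node, the `CallState` fields, the edge table, the fallback limit, and the stubbed `Proposer` are all assumptions made for illustration. The model proposes a transition, and a small set of deterministic rules gets the final say.

```typescript
// Hypothetical sketch: node names are from the case study, everything else is assumed.
type NodeId = "greet" | "qualify" | "book" | "escalate" | "done";

interface CallState {
  node: NodeId;
  unitSize?: string;     // collected while qualifying
  tourSlot?: string;     // collected while booking
  fallbackCount: number; // times the agent had to ask the caller to repeat
}

// What the LLM suggests after a turn. A real system would call a model here.
interface LlmProposal {
  next: NodeId;
  slots?: Partial<Pick<CallState, "unitSize" | "tourSlot">>;
  fallback?: boolean;
}

type Proposer = (state: CallState, utterance: string) => Promise<LlmProposal>;

// Allowed edges. The LLM can only choose among these (assumed topology).
const EDGES: Record<NodeId, NodeId[]> = {
  greet: ["qualify", "escalate"],
  qualify: ["qualify", "book", "escalate"],
  book: ["book", "done", "escalate"],
  escalate: [],
  done: [],
};

const MAX_FALLBACKS = 2; // assumed threshold

// Deterministic rules that can override the LLM's proposed transition.
function applyRules(state: CallState, proposal: LlmProposal): NodeId {
  if (state.fallbackCount >= MAX_FALLBACKS) return "escalate";          // hand off to a human
  if (!EDGES[state.node].includes(proposal.next)) return state.node;    // illegal jump: stay put
  if (proposal.next === "book" && !state.unitSize) return "qualify";    // cannot book unqualified
  if (proposal.next === "done" && !state.tourSlot) return "book";       // cannot finish without a slot
  return proposal.next;
}

// One turn: ask the model, merge what it extracted, then let the rules decide.
async function step(state: CallState, utterance: string, propose: Proposer): Promise<CallState> {
  const proposal = await propose(state, utterance);
  const merged: CallState = {
    ...state,
    ...proposal.slots,
    fallbackCount: state.fallbackCount + (proposal.fallback ? 1 : 0),
  };
  return { ...merged, node: applyRules(merged, proposal) };
}

// Keyword stub standing in for the model so the sketch runs offline.
const stubProposer: Proposer = async (state, utterance) => {
  if (utterance.includes("10x10")) return { next: "book", slots: { unitSize: "10x10" } };
  if (utterance.includes("Tuesday")) return { next: "done", slots: { tourSlot: "Tuesday 3pm" } };
  if (utterance === "") return { next: state.node, fallback: true };
  return { next: "qualify" };
};

async function runCall(utterances: string[], propose: Proposer = stubProposer): Promise<CallState> {
  let state: CallState = { node: "greet", fallbackCount: 0 };
  for (const u of utterances) {
    if (state.node === "done" || state.node === "escalate") break;
    state = await step(state, u, propose);
  }
  return state;
}

runCall(["hi", "I need a 10x10", "Tuesday works"]).then((s) => console.log(s.node)); // logs "done"
```

Because `applyRules` is a pure function of the state and the proposal, a bad transition can be reproduced and fixed without replaying the model, which is roughly what "easy to debug" means in practice.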
Sidney GPT was a quieter project but high-leverage. The call archive had been sitting in S3 doing nothing. We transcribed what wasn't transcribed yet, ran a lightweight NLP pipeline for objection extraction and topic clustering, and put the structured output behind a SQL schema. From there, an LLM front-end let the ops team ask questions in plain English. The first month, the team found half a dozen things they immediately changed about how the human reps were trained.
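As an illustration of what "structured output behind a SQL schema" buys you, here is a small sketch of one of the questions mentioned earlier: which objections come up most often per location. The row shape, the `call_objections` table name in the comment, and the sample data are all hypothetical. The aggregation runs in memory so the example is self-contained; in a real setup this would more likely be a `GROUP BY` query against Postgres.

```typescript
// Hypothetical row shape for the structured output of the NLP pass.
interface ObjectionRow {
  callId: string;
  location: string;
  objection: string; // normalized label, e.g. "price"
}

type Ranked = Array<{ objection: string; count: number }>;

// In-memory equivalent of roughly this SQL (table name is an assumption):
//   SELECT location, objection, COUNT(*) AS n
//   FROM call_objections
//   GROUP BY location, objection
//   ORDER BY location, n DESC;
function topObjectionsByLocation(rows: ObjectionRow[], limit = 3): Record<string, Ranked> {
  // Count objections per location.
  const counts: Record<string, Record<string, number>> = {};
  for (const row of rows) {
    const perLocation = counts[row.location] ?? (counts[row.location] = {});
    perLocation[row.objection] = (perLocation[row.objection] ?? 0) + 1;
  }

  // Rank within each location: highest count first, ties broken alphabetically.
  const result: Record<string, Ranked> = {};
  for (const location of Object.keys(counts)) {
    const perLocation = counts[location];
    result[location] = Object.keys(perLocation)
      .map((objection) => ({ objection, count: perLocation[objection] }))
      .sort((a, b) => b.count - a.count || a.objection.localeCompare(b.objection))
      .slice(0, limit);
  }
  return result;
}

// Made-up sample data, not taken from the real call archive.
const sampleRows: ObjectionRow[] = [
  { callId: "c1", location: "downtown", objection: "price" },
  { callId: "c2", location: "downtown", objection: "price" },
  { callId: "c3", location: "downtown", objection: "access hours" },
  { callId: "c4", location: "airport", objection: "unit size" },
];

console.log(topObjectionsByLocation(sampleRows));
```

The LLM front-end then only has to translate a plain-English question into a query over a table like this, which is a far narrower job than reasoning over raw transcripts.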
Outcome
The numbers, which Stuf is comfortable having on a marketing page:
- 150+ qualified leads from ISS Vegas, generated by Sidney handling demo calls at the booth.
- 25% improvement in demo success on real customer calls after the dialogue-graph refactor.
- 60% reduction in fallback prompts on the same surface.
- 3,000+ sales calls analyzed through Sidney GPT, with concrete coaching insights surfaced to the operations team.
The other outcome that doesn't show up cleanly in numbers: the team got their phones back. The volume of staff hours spent on repetitive inbound calls dropped to the point where the storage operators could focus on the work that actually moves the business.
What I'd do differently
Start the eval harness sooner. We relied on spot-check workflows for a long time before we had anything like a proper eval suite, and it cost us iteration speed once the agent got non-trivial. The pattern I now bring into every voice project (and applied to MyMethod after this) is to build the eval loop before you build the third dialogue node.
Otherwise the team and the product hit the milestones we set, the trade show was a win, and Sidney is still picking up calls.
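For a sense of how small an early eval loop can be, here is a hedged sketch: scripted caller turns, an expected outcome per script, and a pass/fail report. The outcome labels, the `Agent` signature, and the stub agent are assumptions for illustration, not the harness used on Sidney. A real agent would wrap the full dialogue manager and a model call.

```typescript
// Hypothetical types: outcome labels and the Agent signature are assumptions for this sketch.
type Outcome = "booked" | "escalated" | "abandoned";

interface EvalCase {
  name: string;
  callerTurns: string[]; // scripted caller utterances
  expected: Outcome;
}

interface EvalReport {
  total: number;
  passed: number;
  failures: Array<{ name: string; expected: Outcome; actual: Outcome }>;
}

type Agent = (callerTurns: string[]) => Promise<Outcome>;

// Pure scoring step, kept separate so it can be unit-tested without a model.
function scoreEvals(cases: EvalCase[], actuals: Outcome[]): EvalReport {
  if (cases.length !== actuals.length) throw new Error("cases and actuals must align");
  const report: EvalReport = { total: cases.length, passed: 0, failures: [] };
  cases.forEach((c, i) => {
    const actual = actuals[i];
    if (actual === c.expected) report.passed += 1;
    else report.failures.push({ name: c.name, expected: c.expected, actual });
  });
  return report;
}

// Run every scripted case through the agent, one at a time, then score.
async function runEvals(agent: Agent, cases: EvalCase[]): Promise<EvalReport> {
  const actuals: Outcome[] = [];
  for (const c of cases) {
    actuals.push(await agent(c.callerTurns));
  }
  return scoreEvals(cases, actuals);
}

// Stub agent: books whenever the caller mentions a tour, otherwise escalates.
const stubAgent: Agent = async (turns) =>
  turns.some((t) => t.toLowerCase().includes("tour")) ? "booked" : "escalated";

// Made-up eval cases; the third one fails on purpose to show the failure report.
const evalCases: EvalCase[] = [
  { name: "happy path booking", callerTurns: ["Hi", "Can I book a tour?"], expected: "booked" },
  { name: "billing dispute goes to human", callerTurns: ["You double charged me"], expected: "escalated" },
  { name: "caller hangs up", callerTurns: ["Hello?"], expected: "abandoned" },
];

runEvals(stubAgent, evalCases).then((report) => {
  console.log(`${report.passed}/${report.total} passed`, report.failures);
});
```

Even a loop this small turns "did that change break booking?" into a number you can check on every commit, which is the point of having it in place before the dialogue graph grows.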
Want this kind of work for your team?
See the engagement shapes ESARC offers, or start a conversation.