Knowing What Your Customers Want, All the Time: Listen Labs' Alfred Wahlforss

ELI5/TLDR

Listen Labs built an AI that interviews real people on video, hundreds at a time, to find out what customers actually want. Instead of a human researcher running one focus group a week, the AI runs thousands of conversations in parallel, watches faces for genuine reactions, and lets you click any conclusion to see the exact clip it came from. The next step is “simulation” — after enough real interviews, the AI tries to predict how a given type of customer would answer a brand-new question without anyone being interviewed at all. A year in, they say 20% of the Fortune 500 use them.

The Full Story

The pitch: an AI that talks to your customers

Listen Labs does what the industry calls customer research: figuring out what people want before you build it. Their twist is that the asking is done by an AI agent over what is essentially a Zoom call.

“We have this AI agent that can understand your customers better than you can. And the way we do that is by talking to them.”

You type a question, like “how can I improve onboarding.” The system writes an interview guide (the script the AI follows), pulls from a pool of 30 million potential participants, finds the right people, and runs hundreds of video interviews. Then it analyzes everything and hands back recommendations. The founder, Alfred Wahlforss, frames the bigger bet this way: as AI makes building things cheap and fast, the scarce skill becomes knowing what to build. That’s the gap Listen wants to own.

The origin was accidental. Wahlforss and his co-founder had built a viral consumer app (AI avatars of yourself, an early take on the image tools that later went mainstream). It got 20,000 users overnight and then bled them through churn. To understand why people were leaving, they built an AI interviewer for their own use. The tool turned out to be more interesting than the app.

Why an AI interviewer beats the old way, and beats a survey

One of the VCs pushes back early and hard, voicing the standard skepticism: surveys are garbage. People get paid to take them (selection bias), and what people say they’ll do rarely matches what they actually do. Real-world behavior data beats stated intentions every time.

Wahlforss half-agrees. He notes that A/B testing — measuring real behavior — is the gold standard, but in practice it needs enormous user volume and is hard to get right. His counter is that surveys are even worse than the skeptic thinks. Listen ran an experiment: ask the same person the same multiple-choice question twice, and the answers are wildly inconsistent. But force a person to talk through their reasoning out loud, and they become far more consistent. The act of reasoning aloud is the point.
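The consistency comparison described above can be expressed as a simple test-retest agreement rate. The sketch below is illustrative only: the function name `agreement_rate` and the sample answers are made up for this example and are not Listen Labs' data or method.

```python
def agreement_rate(first_round, second_round):
    """Fraction of respondents who gave the same answer both times."""
    if len(first_round) != len(second_round):
        raise ValueError("both rounds must cover the same respondents")
    if not first_round:
        raise ValueError("need at least one respondent")
    matches = sum(a == b for a, b in zip(first_round, second_round))
    return matches / len(first_round)


# Hypothetical answers from five respondents asked the same
# multiple-choice question twice (invented for illustration).
round_one = ["A", "C", "B", "A", "D"]
round_two = ["A", "B", "B", "C", "D"]

rate = agreement_rate(round_one, round_two)
print(f"{rate:.0%} of respondents repeated their answer")
```

Running the same measure on answers coded from spoken interviews would give the second number in the comparison; how open-ended answers get coded into categories is a separate problem this sketch does not address.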

The video format adds a second signal. Because it’s a Zoom-style call, the AI can read the face — eye movement, enthusiasm — not just the words. For advertising this matters a lot:

“You might have very high scores on a survey question… but when someone also reacts very enthusiastically, it’s going to perform much higher.”

They claim ads chosen this way perform better in real Meta and LinkedIn campaigns. And every claim is traceable — click any data point and watch the exact video clip or read the exact quote it came from, so you can check the AI isn’t making things up.

There’s a counterintuitive finding too: people are often more honest with the AI than with a human interviewer. It’s non-judgmental, lower-pressure, and asynchronous (you can drop in and out). That also makes it cheaper — you pay participants less to talk to a bot than to sit through a scheduled human call. It even unlocks hard-to-reach groups like children (with parental consent) who could never make a focus-group slot.

The real moat: finding the right people

Here’s the part that surprised me. Wahlforss says 80% of their engineering goes not into the AI interviewer but into the audience — finding exactly the right people to talk to.

The logic is that every business runs on a power law. Even a brand you’d assume is for everyone, like the salad chain Sweetgreen, actually lives off a narrow segment:

“The right audience is typically urban, high household income, mostly female. And by the way, they need to know what seed oils are — which only like 1% of the population does. And then you find that some people go to Sweetgreen every single day, and that’s 80% of their revenue.”

Find that 1%, interview them first, and the insights are worth far more. To do this, Listen builds a profile of every person it ever interviews. Someone mentions in passing that they’re a sneaker obsessive in one interview; months later, when Nike launches a product, that person can be served up automatically. The old way was spamming an email list and “screening people out”: at an incidence rate of 10% (the share of respondents who qualify for a study), nine of every ten people are rejected by the screener, which is annoying enough to make them quit the panel entirely.
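The screening arithmetic above is easy to work through. This is a minimal sketch of the expected numbers, assuming every person qualifies independently at the stated rate; the function name and the target of 100 interviews are hypothetical.

```python
def screener_starts_needed(target_qualified, qualifying, out_of):
    """Expected screener starts needed when `qualifying` of every
    `out_of` people qualify (e.g. 1 of 10 for a 10% incidence rate)."""
    if not 0 < qualifying <= out_of:
        raise ValueError("need 0 < qualifying <= out_of")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_qualified * out_of // qualifying)


target = 100  # hypothetical number of qualified interviews wanted
starts = screener_starts_needed(target, qualifying=1, out_of=10)
rejected = starts - target
print(f"{starts} people start the screener, {rejected} are screened out")
```

At a 10% incidence rate, 100 qualified interviews means roughly 1,000 people starting the screener and 900 of them being turned away, which is the panel fatigue the profile-based approach tries to avoid.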

This is also why brands that already know their customers still need Listen. A company can email its own CRM list, but it can’t easily reach prospective customers it doesn’t have yet — and big firms like Google legally can’t just email their own users. A neutral third party sidesteps the spam-filter and regulatory problems.

Market research 3.0: simulation

The conversation’s most speculative thread is “generative agent simulation.” The idea: after interviewing one person for, say, an hour, the AI can predict that person’s future answers, with a claimed accuracy of 95% in some cases. Scale that to a thousand modeled people and you have a synthetic panel you can query instantly, with no live humans involved.

The interviewer, Constantine, voices the right skepticism: isn’t synthetic data just remixing what the base model already knows? Why can’t you just tell ChatGPT “pretend you’re a grumpy 35-year-old engineer”? Wahlforss’s answer is that they tried exactly that, plus credit-card spend and purchasing data, and the single best input turned out to be interviews — because a good interview lets you chase tangents and ask behavioral questions, producing clean data on how a specific persona actually thinks. The base models, he argues, are trained on the average person and don’t have that texture.

His honest caveat is worth keeping:

“Chaos theory tells us it’s really hard to predict the future. Otherwise we would be on Wall Street making a ton of money.”

So the trick becomes knowing which questions simulation can answer. They back-test by hiding a real answer and seeing if the model predicts it, and they deliberately ask un-answerable questions (what’s the person’s dog’s name?) to check the model knows its own limits. The practical sweet spot today is “message testing” — picking a tagline or headline. Wahlforss used it on himself to choose a conference talk title from 100 options.
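The back-test described above has two parts: score predictions against real answers that were hidden from the model, and check that the model declines to answer questions it has no basis for. The sketch below shows one way that could look. Everything in it is an assumption for illustration: `backtest`, `toy_simulator`, and the sample questions are hypothetical, and the lookup table merely stands in for a real persona model.

```python
def backtest(simulate, held_out, unanswerable):
    """Score a persona simulator on hidden answers and abstention probes.

    simulate: callable taking a question, returning an answer or None to abstain.
    held_out: dict of question -> real answer that was hidden from the model.
    unanswerable: questions the interview data cannot support an answer to.
    """
    if not held_out or not unanswerable:
        raise ValueError("need at least one held-out answer and one probe")
    correct = sum(simulate(q) == real for q, real in held_out.items())
    abstained = sum(simulate(q) is None for q in unanswerable)
    return {
        "accuracy": correct / len(held_out),
        "abstention_rate": abstained / len(unanswerable),
    }


# Toy stand-in for a simulator: answers from a lookup table and
# abstains (returns None) on anything it has not seen.
known = {"Preferred tagline?": "B", "Would you renew?": "yes"}


def toy_simulator(question):
    return known.get(question)


held_out = {"Preferred tagline?": "B", "Would you renew?": "no"}
probes = ["What is your dog's name?"]

report = backtest(toy_simulator, held_out, probes)
print(report)
```

Tracking the two numbers separately matters: a simulator that guesses confidently on the dog's-name probe would score a low abstention rate even if its accuracy on answerable questions looked good.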

The vertical-AI thesis and the moat

Underneath the product, Wahlforss is making a bet about vertical AI companies — those built for one industry. Their edge isn’t a better base model; it’s a proprietary benchmark (an “eval”) that only they can improve. Early on, GPT-4 would repeat the same interview question 100 times; their eval for “is this a good interview” started at 20% and climbed to 85% as they trained the agent on research best practices. Then they raise the bar — a new, harder eval (can the AI understand what’s on your screen? skip irrelevant questions?) drops them back to 20%, and the climb starts again. Better data, harder problems, repeat.

Asked the classic “what’s your moat” question, he reaches for the Seven Powers framework: network effects (more participants and more buyers reinforce each other), a data advantage (more interviews make simulation better), and switching costs (your interview history lives in the platform). He also passes along a quieter product truth from a Sequoia partner: founders want to build something complex, but customers want something stupid-simple that just works.

Key Takeaways

Listen Labs runs AI-led video interviews at scale; the video lets it read facial reactions, not just words — a second signal on top of stated answers.
Stated preferences are unreliable, but forcing someone to reason out loud makes their answers far more consistent than multiple-choice surveys.
Every data point is traceable back to the exact clip or quote, a deliberate guard against AI hallucination.
People are often more honest with an AI interviewer (non-judgmental, low-pressure, async) and cost less to recruit than for human interviews.
The hard, defensible part is the audience — 80% of engineering — because revenue follows a power law and you need to find the narrow high-value segment.
They build persistent profiles across interviews, so an offhand fact in one session (“I’m a sneakerhead”) makes that person findable for a future, unrelated study.
Brands still need a third party even when they already know their customers: CRM lists are messy and cannot reach prospective customers, and emailing your own users can trigger spam filters or regulatory limits.
“Simulation” predicts how a modeled persona would answer without a live interview; interviews beat purchase data or raw ChatGPT prompting as the input, because base models only know the average person.
Simulation’s reliable use today is message/tagline testing; predicting genuinely novel or chaotic human behavior remains unsolved (their own “chaos theory” caveat).
The vertical-AI moat is a proprietary eval you keep raising, plus network effects, data accumulation, and switching costs (Seven Powers framing).

Claude’s Take

This is a founder interview on a VC’s own podcast, so the genre is partly an advertisement — “20% of the Fortune 500 in a year” and “95% accuracy” are the kind of numbers you nod at, not bank on. To the show’s credit, the VCs actually push back twice (the surveys-are-junk objection and the synthetic-data-is-just-remixing objection), and Wahlforss answers both without hand-waving. His willingness to say “I don’t know if it’s correct, but it felt correct” and to quote chaos theory against his own simulation product is the most trustworthy thing in here.

The genuinely interesting insight isn’t the AI interviewer — it’s that the moat lives in the audience layer, not the model. That reframing (the LLM is a commodity; finding the right 1% of humans is the asset) is a clean way to think about where value sits in any vertical-AI business, and it generalizes well beyond market research.

Where I’d stay skeptical: the simulation pitch quietly assumes you can predict a person from one hour of conversation, and the only proof offered is a tagline-picking anecdote and a self-run experiment. That’s a long way from the “human API” vision they’re selling. Score is a 7 — substantive, well-argued, and a useful mental model on vertical-AI moats, but it’s a promotional format with several load-bearing claims you’d want independently verified.