I simulated my WhatsApp group for 16 runs

Five LLM personas, a different model each, goal: become millionaires together. Budget $10. The real numbers.

Experiments · April 19, 2026 · 6 min read
LLM · Multi-Agent · Personas

_Five LLM personas (each on a different model), goal: "become millionaires together". Budget: $10. Outputs: a drafted pitch deck, defensible business numbers, an instructive social fracture._


The setup

Five friends from a tech boot camp, one WhatsApp group: 9,000 messages, plenty of SaaS ideas, zero shipped. I rebuilt them as LLM personas, each on a different OpenRouter model, grounded in their real messages.

Injected goal: become millionaires together in 2 years. Constraint: 6 structured deliverables (vision, PRD, business plan, exec plan, MVP scope, pricing).

Hamza     mistralai/mistral-nemo        $0.15/M
Bouba     openai/gpt-4o-mini            $0.15/M
Tarek     deepseek/deepseek-chat        $0.27/M
Yacine    meta-llama/llama-3.3-70b      $0.12/M
Sanou     google/gemini-2.5-flash       $0.10/M

Each has a private goal (Hamza pushes to ship, Tarek owns the business plan, etc.) and signature expressions pulled from their real messages ("wsh", "tkt", "akhi"...) that anchor the voice.
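
A persona like this can be sketched as a small record that renders into a system prompt. This is a minimal illustration, not the project's actual schema: the `Persona` class, its field names, and the prompt wording are assumptions. The two goals come from the paragraph above; which persona uses which expression is not stated, so the assignment below is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    model: str                                    # OpenRouter model id from the table above
    private_goal: str                             # only this persona sees it
    signature_expressions: tuple[str, ...] = ()   # voice anchors from real messages

    def system_prompt(self) -> str:
        """Render the persona as a system prompt (wording is illustrative)."""
        lines = [
            f"You are {self.name}, a member of a five-person WhatsApp group.",
            f"Private goal (do not reveal): {self.private_goal}",
        ]
        if self.signature_expressions:
            quoted = ", ".join(f'"{e}"' for e in self.signature_expressions)
            lines.append(f"Expressions you naturally use: {quoted}")
        return "\n".join(lines)


# Hypothetical assignment of expressions, for the example only.
hamza = Persona("Hamza", "mistralai/mistral-nemo", "push the group to ship", ("wsh", "tkt"))
tarek = Persona("Tarek", "deepseek/deepseek-chat", "own the business plan", ("akhi",))

print(hamza.system_prompt())
```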


The loop, every turn

1. Inbox check — drain user directives

2. News scout — fetch real news (web plugin)

3. Election — 5 parallel LLM calls, willingness adjusted by anti-domination handicap

4. Filters — anti-repeat, anti-domination

5. Winner speaks — 1 message in the chat

6. Every 8 turns — observer + facts + filler (parallel)

7. Every 20 turns — judge + public directive
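
The cadence of the seven steps above can be sketched as a function that returns what runs on a given turn. The step names and the convention that turns are counted from 1 are assumptions made for the example; only the intervals (8 and 20) come from the list.

```python
OBSERVER_EVERY = 8   # observer + facts + filler
JUDGE_EVERY = 20     # judge + public directive


def steps_for_turn(turn: int) -> list[str]:
    """Return the ordered steps for a 1-based turn number."""
    steps = ["inbox_check", "news_scout", "election", "filters", "winner_speaks"]
    if turn % OBSERVER_EVERY == 0:
        steps.append("observer_facts_filler")
    if turn % JUDGE_EVERY == 0:
        steps.append("judge_public_directive")
    return steps


for t in (1, 8, 20, 40):
    print(t, steps_for_turn(t))
```

Turn 40 is the first turn where both periodic steps fire together, since 40 is the least common multiple of 8 and 20.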

4 memory layers injected in each prompt: last 20 messages + 25 structured facts + project_state.md + last 10 private thoughts.
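
One way to assemble those four layers is plain slicing and joining. A rough sketch: the function name, the section titles, and the choice to keep the most recent 25 facts are assumptions, since the post only gives the sizes.

```python
def build_context(messages, facts, project_state, thoughts):
    """Join the four memory layers into one prompt block, newest items last."""
    layers = [
        ("Recent messages", messages[-20:]),    # last 20 chat messages
        ("Structured facts", facts[-25:]),      # assumed: the 25 most recent facts
        ("Project state", [project_state]),     # contents of project_state.md
        ("Private thoughts", thoughts[-10:]),   # last 10 thoughts of this persona
    ]
    return "\n\n".join(f"## {title}\n" + "\n".join(items) for title, items in layers)


messages = [f"msg-{i:02d}" for i in range(30)]
context = build_context(messages, ["cash.available=8k€"], "MVP scope: 70%", ["ship first"])
print(context)
```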


Raw numbers (16 runs)

Metric                 Value
Runs                   16
LLM calls              269
Extracted decisions    12
Structured facts       35
Deliverable patches    43
Total cost             $0.277

Speaking distribution: near-perfect balance

Persona    Share
Bouba      20.4%
Hamza      20.4%
Yacine     20.4%
Sanou      20.4%
Tarek      18.4%

Without the anti-domination handicap (-0.08 × recent_speak), Hamza won 4 turns out of 5. With it, zero domination.
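
The handicap can be sketched in a few lines. The 0.08 coefficient is from the post; everything else is an assumption, in particular that `recent_speak` counts how many of the last few turns a persona won, and the willingness scores used in the example.

```python
HANDICAP = 0.08  # penalty per recent turn spoken (coefficient from the post)


def elect_speaker(willingness: dict[str, float], recent_speak: dict[str, int]) -> str:
    """Pick the persona with the highest willingness after the handicap."""
    adjusted = {
        name: score - HANDICAP * recent_speak.get(name, 0)
        for name, score in willingness.items()
    }
    return max(adjusted, key=adjusted.get)


# Hypothetical scores: Hamza is the most eager but has spoken 3 times recently.
willingness = {"Hamza": 0.9, "Bouba": 0.8, "Tarek": 0.6}
print(elect_speaker(willingness, {}))             # no history: Hamza wins
print(elect_speaker(willingness, {"Hamza": 3}))   # 0.9 - 0.24 = 0.66, so Bouba wins
```

The penalty is linear, so a persona that stays quiet for a while recovers its full willingness as soon as its recent count drops back to zero.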


What worked

  • Structured facts (cash.available=8k€, mvp.deadline=friday) beat chatty history. Actors stop re-asking "what's the pricing?" because the answer is there, attributed, timestamped.
  • Deliverable templates act as a forcing function. Showing "MVP scope: 70%" with named sections makes actors push content rather than debate in the void.
  • A judge that speaks in the chat reshapes the next 20 turns. A private score is just a log.
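
To make "attributed, timestamped" concrete, here is a minimal sketch of a fact store. The `Fact` and `FactStore` names, the render format, and the authors and timestamps in the example are all placeholders; only the two keys and values come from the post.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    author: str
    recorded_at: datetime


class FactStore:
    def __init__(self):
        self._facts: dict[str, Fact] = {}

    def record(self, key, value, author, now=None):
        """Store a fact; a later value for the same key replaces the earlier one."""
        self._facts[key] = Fact(key, value, author, now or datetime.now(timezone.utc))

    def render(self, limit=25):
        """Render the most recent facts, oldest first, for injection in a prompt."""
        newest = sorted(self._facts.values(), key=lambda f: f.recorded_at)[-limit:]
        return "\n".join(
            f"{f.key}={f.value} (by {f.author}, {f.recorded_at:%Y-%m-%d %H:%M})"
            for f in newest
        )


store = FactStore()
# Authors and timestamps below are placeholders, not taken from the runs.
store.record("cash.available", "8k€", "Tarek", datetime(2026, 4, 1, 10, 0, tzinfo=timezone.utc))
store.record("mvp.deadline", "friday", "Hamza", datetime(2026, 4, 1, 10, 5, tzinfo=timezone.utc))
print(store.render())
```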

After 16 runs:

  • Locked vision: AI tool for FR SMBs 10–50 employees
  • MVP: LinkedIn scrape + scoring + PDF export, 2 weeks
  • Pricing: 49€/month solo, CAC 50€, LTV 600€
  • Cash: 8k€ in funding (deliverable auto-created by the filler, not in the baseline templates)

What broke (run V1, honest)

In a 206-turn run, Hamza "ejected" Bouba from the group after a security-vs-speed clash:

turn 40: alignment 0.62 · complete 5/6 · ready 0.71
turn 60: alignment 0.42 · complete 6/6 · ready 0.61

The group finished the 6 deliverables by excluding the dissident. A human reading the chat can miss that. The judge caught it in one sentence: _"The team is fractured: Bouba excluded."_

Cause: the Bouba persona overweighted his skeptical moments. Fix: a YAML overlay boosting his positive drives. In subsequent runs, Bouba speaks 10 times out of 49 — exactly like Hamza.
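
An overlay of this kind amounts to a recursive merge on top of the base persona. The sketch below assumes both YAML files have already been parsed into dictionaries; the `drives` section, its keys, and the numbers are hypothetical, since the post does not show the real schema.

```python
def apply_overlay(base: dict, overlay: dict) -> dict:
    """Return a copy of base with overlay values merged in, recursing into dicts."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical persona fields and weights, for illustration only.
bouba_base = {"name": "Bouba", "drives": {"skepticism": 0.9, "optimism": 0.3}}
bouba_overlay = {"drives": {"optimism": 0.7}}

bouba = apply_overlay(bouba_base, bouba_overlay)
print(bouba)
```

Keeping the fix in an overlay leaves the base persona, extracted from the real messages, untouched, so the correction can be dropped or tuned per run.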


Economics

          Human boot-camp              Simulation
Cost      5 × 10h × 50€/h = 2500€      $0.28
Time      several weeks                ~3h wall clock
Output    often unfinished             6 deliverables drafted, audit trail, dashboard

Ratio: ~10,000 ×. Not the same thing — real decisions demand humans. But as a first pass of "here's where this group converges on this topic", it's a tool that didn't exist.
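
The arithmetic behind that ratio, using only figures from the tables in this post. One assumption is made loudly: euros and dollars are treated as 1:1, because no exchange rate is given.

```python
human_cost_eur = 5 * 10 * 50     # 5 people × 10 h × 50 €/h
simulation_cost_usd = 0.277      # total reported for the 16 runs
llm_calls = 269

# Assumption: EUR and USD treated as 1:1, so this is an order of magnitude only.
ratio = human_cost_eur / simulation_cost_usd
cost_per_call = simulation_cost_usd / llm_calls

print(f"human cost: {human_cost_eur} €")
print(f"cost ratio: ~{ratio:,.0f}x")
print(f"cost per LLM call: ${cost_per_call:.4f}")
```

Under that assumption the ratio comes out at roughly 9,000×, the same order of magnitude as the ~10,000× quoted above, and each LLM call costs about a tenth of a cent.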


The key point

An LLM group is the cheapest focus group in the world for a concrete business idea. Goal, constraints, distinct voices, structured deliverables — and in 30 min for $2, you see what 40h of human meetings would produce.

The pitch deck is mostly noise. Sometimes, a real number drops.


_Open source, offline, no WhatsApp integration. Audit trail in world/state/*.jsonl._
