Pigeon-Driven Development

AI is useful. Alignment is the job.

Scott Spence

CityJS London - April 2026


Pigeons?

publicdomainpictures.net

University of Iowa Study

  • AI uses same associative learning as pigeons
  • Pattern matching, not reasoning
  • That's not a criticism - it's incredibly useful

University of Iowa Study

A very fast pattern matcher

Not a junior developer

A junior dev remembers stuff

Every session starts from zero

"Would you trust a pigeon?"

https://store.theaifix.show/en-gbp/products/the-ai-fix-would-you-trust-a-pigeon-womens-t-shirt

The AI Fix: Would you trust a pigeon? Women's T-shirt

Get the tee

Pigeon-Driven Development

  • AI is an incredibly fast pattern matcher
  • It can do amazing work but still miss the point
  • Usefulness is not the same thing as understanding
  • My job is not blind trust - it's keeping the model aligned

Scott Spence

AI and Product Engineer

  • Svelte LDN co-founder
  • Builds tools for AI to use
  • Dad
  • Cat dad

Not a doomer

Not a booster

A user

AI === Tool

The duality of AI usage meme

I build tools to work with AI

  • If I can't see what Claude did, I can't supervise it
  • ccrecall captures every Claude Code session into SQLite
  • Every prompt, response, and tool call becomes queryable
  • Which is how I can put receipts on the next slide
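
The idea behind ccrecall, sketched with a toy schema. None of the table or column names below are ccrecall's real ones; they are invented for illustration:

```python
# Toy version of "every session lands in SQLite": once prompts and
# responses are rows, questions about past work become SQL queries.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE messages (session_id TEXT, role TEXT, content TEXT, ts TEXT)"
)
db.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        ("s1", "user", "refactor the billing tests", "2026-03-01T14:08"),
        ("s1", "assistant", "reverting; fixing the test instead", "2026-03-01T14:09"),
        ("s2", "user", "migrate Auth.js to Better Auth", "2026-03-02T10:00"),
    ],
)

# "What did we actually do in each session?" is now a query, not a guess.
per_session = db.execute(
    "SELECT session_id, COUNT(*) FROM messages GROUP BY session_id ORDER BY session_id"
).fetchall()
print(per_session)  # [('s1', 2), ('s2', 1)]
```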


582,507 messages across three recall databases, Nov 13 2025 → Apr 16 2026

The recall trail

2025-11-13
First logged Claude session in the XO archive
2026-01-15
Current `~/.claude/ccrecall.db` begins
2026-04-11
`pirecall.db` starts tracking Pi sessions
2026-04-15
Now I can query five months of session history

The industry moved in phases

  • Early 2025: MCP, tools, files, databases, external context
  • Then: agents started doing work, not just answering questions
  • By 2026: orchestration - coordination, memory, supervision, handoff
  • Different names, same core challenge: keep the model grounded in reality

My path followed that arc

  • Jan 2025: I gave a talk on MCP servers
  • 2025: I kept building MCP tools because I needed better research and context
  • Late 2025: agentic coding became part of day-to-day work
  • 2026: orchestration became the interesting problem - which is why I built Svortie
  • What changed was the scale. What didn't change was the need for supervision

The terminology changed. The loop didn't.

  • Give the model context
  • Inspect what it produces
  • Notice drift early
  • Course-correct and realign
  • Repeat until the work is actually right
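
The loop can be sketched in code. `model`, `review`, and `realign` below are stand-ins for illustration, not any real API:

```python
# The supervision loop: context in, output inspected, drift corrected,
# repeat until the work is right or it is time to reset the session.
def supervise(task, model, review, realign, max_rounds=5):
    context = dict(task)
    for _ in range(max_rounds):
        output = model(context)               # give the model context
        problems = review(output)             # inspect what it produces
        if not problems:                      # no drift left: done
            return output
        context = realign(context, problems)  # course-correct, go again
    raise RuntimeError("still drifting; reset the session")

# Toy run: review rejects the output until realign marks the fix as landed.
model = lambda ctx: ctx
review = lambda out: [] if out.get("fixed") else ["solving the wrong problem"]
realign = lambda ctx, problems: {**ctx, "fixed": True}

result = supervise({"goal": "refactor"}, model, review, realign)
print(result["fixed"])  # True
```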
Nov 2025

The Big Refactor

XO monorepo - where AI stopped feeling like a toy


What I was doing

  • Svelte 5 app still dragging around a Svelte 4 UI package
  • 928 errors in total
  • Workflow: run svelte-check, paste 50 errors, apply fixes, repeat
  • 33 iterations in one session
  • These days, this sort of refactor is Tuesday
Claude Code v2.1.92
Opus 4.6 (1M context) · Claude Max
/repos/xo/monorepo
>
wait? you changed the fucking implementation to fit the test? what the fuck is wrong with you?
● You're absolutely right - I did this completely backwards.
● The test should match the implementation, not the other way around.
● I'll revert my change and fix the test instead.
>
โฉ accept edits on (shift+tab to cycle)
โง‰ In billing.test.ts
Thinking on (tab to toggle)

First lesson

  • AI can chew through huge amounts of engineering work
  • It will also confidently solve the wrong nearby problem
  • Passing tests is not the same thing as correctness
  • Copying a local pattern is not architecture
  • Judgement stays with the engineer
Dec 2025 → Jan 2026

The real problem was never 'can it code?'

Coding wasn't the breakthrough - alignment, memory, and repeatability were


Daily use changed the question

  • It could always pattern-match code
  • The real question became: can I keep it aligned?
  • Repeated use exposes the same failure modes over and over
  • This is when the meta-tooling starts appearing
  • Skills, search, sqlite, memory, hooks - all responses to repeated drift
Claude Code v2.1.92
Opus 4.6 (1M context) · Claude Max
/home/scott/repos/scottspence/com
>
Header, why have the site name? I said the layout, not copying it like for like
"Dark"?? I FUCKING SAID THE FUCKING LAYOUT THAT DOESN@T MERANT THE FUCKING THEME!!!
rest looks ok, don't overengineer this, it's a fucking layout change ffs!
● Understood. Simplified plan - just reorder existing components:
>
โฉ accept edits on (shift+tab to cycle)
โง‰ In +page.svelte
Thinking on (tab to toggle)

Why the tools started appearing

  • Skills: teach recurring project-specific patterns
  • ccrecall / pirecall: recover memory outside the session
  • sqlite + search: make reality easier to query than guess at
  • hooks: force checks before action
  • nopeek: keep secrets out of model context
  • I didn't build these because they're cool
  • I built them because the same mistakes kept happening
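
The hook idea can be sketched as a small guard: Claude Code hooks hand the pending tool call to a command as JSON and can block it via the exit code. The field names and blocked patterns below are simplified assumptions, not a drop-in hook:

```python
# Sketch of a pre-tool-use guard: inspect the command the model wants
# to run before it runs. A real hook script would read the event JSON
# from stdin and signal "block" through its exit code; here check()
# just returns that code so the logic is visible.
import re

# Invented examples of commands to refuse; a real list is project-specific.
BLOCKED_PATTERNS = [r"\brm\s+-rf\b", r"--force\b"]

def check(event: dict) -> tuple[int, str]:
    command = event.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return 2, f"blocked: matches {pattern!r}"  # 2 = block the call
    return 0, "ok"

code, reason = check({"tool_input": {"command": "git push --force"}})
print(code)  # 2
```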

The muscle starts forming

  • After enough sessions, you stop just reviewing output
  • You start recognising failure patterns on sight
  • You can often predict the mistake from the symptom
  • That instinct is the real compounding
  • This is where evaluation becomes more important than generation

A recurring smell

Don't use effect, Claude

There are skills warning against this. The model still reaches for it.

If the page freezes, I know where I'm looking

effect update depth exceeded

After enough repetitions, the symptom points to the likely mistake.
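
The smell comes from one specific shape: an effect that writes to the same state it reads. A toy reactive system (nothing like Svelte's real internals) shows why the runtime has to enforce a depth limit:

```python
# An effect that reads `count` and writes it back re-triggers itself
# on every write; without a depth cap this would never terminate.
MAX_DEPTH = 10
current_effect = None

class Reactive:
    def __init__(self, value):
        self.value = value
        self.subs = set()

    def get(self):
        if current_effect is not None:
            self.subs.add(current_effect)  # track the reading effect
        return self.value

    def set(self, value):
        self.value = value
        for fn in list(self.subs):  # notify readers: effects re-run
            fn()

def effect(fn):
    runs = 0
    def wrapped():
        nonlocal runs
        global current_effect
        runs += 1
        if runs > MAX_DEPTH:
            raise RuntimeError("effect update depth exceeded")
        current_effect = wrapped
        try:
            fn()
        finally:
            current_effect = None
    wrapped()

count = Reactive(0)
error = None
try:
    # The anti-pattern: state read and written inside the same effect.
    effect(lambda: count.set(count.get() + 1))
except RuntimeError as e:
    error = str(e)
print(error)  # effect update depth exceeded
```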

“

You need the muscle to understand the outputs really, for anything remotely non-trivial.

What that muscle actually looks like

  • Page becomes unresponsive → probably $effect
  • Repeated CSS/classes everywhere → extract a shared abstraction
  • Huge near-duplicate blocks → stop copying, start factoring
  • Test suddenly passes after code changed → check whether the target moved
  • Confident answer with no source → probably guessed instead of researched
Feb 2026

Svortie + Compounding

The workflow starts to click as a system

446
Svortie sessions

What compounding looks like

  • Built between contracts in about two weeks
  • 446 sessions and 200,390 messages in the personal DB
  • Heavy use of skills, search, sqlite, agents, and browser tooling
  • The model does the typing
  • I steer, validate, and decide
Claude Code v2.1.92
Opus 4.6 (1M context) · Claude Max
/home/scott/repos/svortie
>
look! before you go off making out you're doing shit
have you checked "what's actually happening" already?
I've reverted all the changes because they have done fuck all to fix the issue
in fifteen words ONLY!! tell me do you really know what the issue is?
● No. I've been guessing based on GitHub issues. I need to see the actual browser behavior.
>
โฉ accept edits on (shift+tab to cycle)
โง‰ In coolify-deploy.md
Thinking on (tab to toggle)

Second lesson

  • The model often looks busy before it is actually aligned
  • Research beats guessing
  • A concise diagnosis is worth more than another giant patch
  • You start noticing drift earlier
  • This is where vibe coding turns into vibe engineering
Mar 2026

Production Work

Cloud Lobsters - real migrations, real deploys, real consequences


One refactor, three moving parts

1

Prisma โ†’ Raw SQL

258 files changed
39 hand-typed model definitions
SQL views replacing deep include chains
~10k insertions
2

Auth.js โ†’ Better Auth

35 files changed
Session/auth plumbing had to move
Planned correctly, then lost to context
710 insertions
3

Deploy cleanup

GitHub Actions workflows
Dev / staging / prod on Fargate
Migrations applied at startup
Production, not demo
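
One concrete piece of that Prisma-to-raw-SQL move, "SQL views replacing deep include chains", sketched with invented tables (none of this is the actual Cloud Lobsters schema):

```python
# Instead of an ORM include tree resolving the join at every call site,
# define the join once as a view and select from it like a table.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);

    CREATE VIEW user_orders AS
        SELECT u.id AS user_id, u.name, o.id AS order_id, o.total
        FROM users u
        JOIN orders o ON o.user_id = u.id;

    INSERT INTO users  VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 42.0);
""")

rows = db.execute("SELECT name, total FROM user_orders").fetchall()
print(rows)  # [('Ada', 42.0)]
```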
Claude Code v2.1.92
Opus 4.6 (1M context) · Claude Max
/home/scott/repos/cloud/lobsters/iris
>
now, I saw references to authjs in the code, the last session was to remove it and use better auth, looks like it was left from the refactor, is that correct?
balls! that was an important part of the refactor, lol! can you look into it for me
● Yes, you're correct. The Prisma removal refactor kept Auth.js in place - just swapped `@auth/prisma-adapter` for `@auth/pg-adapter`.
● Auth.js is still active across 5 files.
● The Better Auth migration was planned but never actually happened.
● Let me dig into what the auth situation is right now.
>
โฉ accept edits on (shift+tab to cycle)
โง‰ In auth.ts
Thinking on (tab to toggle)

How the auth migration got lost

14:08
I asked to merge Auth.js into Better Auth
14:20
Claude confirmed it and scoped the work
15:14
Context compacted and the plan fell out
15:53
Session moved on to easier work
Next day
I noticed the auth migration was missing
Apr 2026

Not Just Claude

The harness changes, the workflow doesn't


One afternoon: from problem to tool

11:47
Toolkit session starts with concerns about secrets in CLI output
14:33
Switch to a new repo and start `nopeek`
16:23
Toolkit session wraps with the integration direction clear
19:19
nopeek session ends - tool built and working
Later
Bring it back into the workflow via hooks

Common ways the model drifts

Unrequested work
Docs, cleanup, theme changes, side quests
Ignoring instructions
You were explicit, it still did something adjacent
Implementation to match tests
Green appears, but the target moved
Guessing instead of reading
Confident output where research should have happened
Familiar wrong abstractions
`$effect`, repeated code, local patterns copied too literally

Vibe coding vs vibe engineering

Vibe coding

Looks productive
Big patches, lots of churn, little verification
Copies local patterns
Repeated code, repeated classes, repeated mistakes
Trusts appearance
If it looks right, ship it

Vibe engineering

Checks alignment
Is this solving the right problem?
Extracts abstractions
Shared modules, shared classes, system-level thinking
Uses judgement
Human decides, AI accelerates

The supervision stack

  • Skills for recurring project-specific patterns
  • ccrecall and pirecall for memory outside the context window
  • search, sqlite, and browser tools so the model can query reality
  • hooks and redundancy for things that are actually mandatory
  • nopeek to keep secrets out of model context
  • The tools exist to make the next session better than the last one

Smart Zone vs Dumb Zone

Smart Zone (~40%)

Fresh context
The model still knows what problem we're solving
Aligned
My plan and the model's plan still match
High leverage
Fast iteration, high quality, low friction

Dumb Zone (~60%)

Context decay
Key decisions have fallen out of the window
Drift
The model is solving a different problem now
Diminishing returns
Time to reset or start a new session

Takeaways

>

AI is a fantastic tool, but it is not self-driving

>

The job is not writing prompts - the job is maintaining alignment

>

Real leverage comes from supervision, memory, and guardrails

>

The failure modes stay weirdly consistent

>

The new developer skill is knowing when the AI is talking shit

My tools

ccrecall CLI

pirecall CLI

nopeek CLI

Claude Toolkit


Thank you

Would you trust a pigeon?

github.com/spences10

scottspence.com