Pigeon-Driven Development

AI is useful. Alignment is the job.

Scott Spence

CityJS London - April 2026


Pigeons?

publicdomainpictures.net

University of Iowa Study

  • AI uses same associative learning as pigeons
  • Pattern matching, not reasoning
  • That's not a criticism - it's incredibly useful

University of Iowa Study

A very fast pattern matcher

Not a junior developer

A junior dev remembers stuff

Every session starts from zero

"Would you trust a pigeon?"

https://store.theaifix.show/en-gbp/products/the-ai-fix-would-you-trust-a-pigeon-womens-t-shirt

The AI Fix: Would you trust a pigeon? Women's T-shirt

Get the tee

Pigeon-Driven Development

  • AI is an incredibly fast pattern matcher
  • It can do amazing work but still miss the point
  • Usefulness is not the same thing as understanding
  • My job is not blind trust - it's keeping the model aligned

Scott Spence

AI and Product Engineer

  • Svelte LDN co-founder
  • Builds tools for AI to use
  • Dad
  • Cat dad

Not a doomer

Not a booster

A user

AI === Tool

The duality of AI usage meme

I build tools to work with AI

  • If I can't see what Claude did, I can't supervise it
  • ccrecall captures every Claude Code session into SQLite
  • Every prompt, response, and tool call becomes queryable
  • Which is how I can put receipts on the next slide
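
The idea behind ccrecall, sketched with a toy schema. None of the table or column names below are ccrecall's real ones; they are invented for illustration:

```python
# Toy version of "every session lands in SQLite": once prompts and
# responses are rows, questions about past work become SQL queries.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE messages (session_id TEXT, role TEXT, content TEXT, ts TEXT)"
)
db.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        ("s1", "user", "refactor the billing tests", "2026-03-01T14:08"),
        ("s1", "assistant", "reverting; fixing the test instead", "2026-03-01T14:09"),
        ("s2", "user", "migrate Auth.js to Better Auth", "2026-03-02T10:00"),
    ],
)

# "What did we actually do in each session?" is now a query, not a guess.
per_session = db.execute(
    "SELECT session_id, COUNT(*) FROM messages GROUP BY session_id ORDER BY session_id"
).fetchall()
print(per_session)  # [('s1', 2), ('s2', 1)]
```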


582,507 messages across three recall databases, Nov 13 2025 → Apr 16 2026

The recall trail

2025-11-13
First logged Claude session in the XO archive
2026-01-15
Current `~/.claude/ccrecall.db` begins
2026-04-11
`pirecall.db` starts tracking Pi sessions
2026-04-15
Now I can query five months of session history

The industry moved in phases

  • Early 2025: MCP, tools, files, databases, external context
  • Then: agents started doing work, not just answering questions
  • By 2026: orchestration - coordination, memory, supervision, handoff
  • Different names, same core challenge: keep the model grounded in reality

My path followed that arc

  • Jan 2025: I gave a talk on MCP servers
  • 2025: I kept building MCP tools because I needed better research and context
  • Late 2025: agentic coding became part of day-to-day work
  • 2026: orchestration became the interesting problem - which is why I built Svortie
  • What changed was the scale. What didn't change was the need for supervision

The terminology changed. The loop didn't.

  • Give the model context
  • Inspect what it produces
  • Notice drift early
  • Course-correct and realign
  • Repeat until the work is actually right
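
The loop can be sketched in code. `model`, `review`, and `realign` below are stand-ins for illustration, not any real API:

```python
# The supervision loop: context in, output inspected, drift corrected,
# repeat until the work is right or it is time to reset the session.
def supervise(task, model, review, realign, max_rounds=5):
    context = dict(task)
    for _ in range(max_rounds):
        output = model(context)               # give the model context
        problems = review(output)             # inspect what it produces
        if not problems:                      # no drift left: done
            return output
        context = realign(context, problems)  # course-correct, go again
    raise RuntimeError("still drifting; reset the session")

# Toy run: review rejects the output until realign marks the fix as landed.
model = lambda ctx: ctx
review = lambda out: [] if out.get("fixed") else ["solving the wrong problem"]
realign = lambda ctx, problems: {**ctx, "fixed": True}

result = supervise({"goal": "refactor"}, model, review, realign)
print(result["fixed"])  # True
```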
Nov 2025

The Big Refactor

XO monorepo - where AI stopped feeling like a toy


What I was doing

  • Svelte 5 app still dragging around a Svelte 4 UI package
  • 928 errors in total
  • Workflow: run svelte-check, paste 50 errors, apply fixes, repeat
  • 33 iterations in one session
  • These days, this sort of refactor is Tuesday
Claude Code v2.1.92
Opus 4.6 (1M context) · Claude Max
/repos/xo/monorepo
>
wait? you changed the fucking implementation to fit the test? what the fuck is wrong with you?
● You're absolutely right - I did this completely backwards.
● The test should match the implementation, not the other way around.
● I'll revert my change and fix the test instead.
>
โฉ accept edits on (shift+tab to cycle)
โง‰ In billing.test.ts
Thinking on (tab to toggle)

First lesson

  • AI can chew through huge amounts of engineering work
  • It will also confidently solve the wrong nearby problem
  • Passing tests is not the same thing as correctness
  • Copying a local pattern is not architecture
  • Judgement stays with the engineer
Dec 2025 → Jan 2026

The real problem was never 'can it code?'

Coding wasn't the breakthrough - alignment, memory, and repeatability were


Daily use changed the question

  • It could always pattern-match code
  • The real question became: can I keep it aligned?
  • Repeated use exposes the same failure modes over and over
  • This is when the meta-tooling starts appearing
  • Skills, search, sqlite, memory, hooks - all responses to repeated drift
Claude Code v2.1.92
Opus 4.6 (1M context) · Claude Max
/home/scott/repos/scottspence/com
>
Header, why have the site name? I said the layout, not copying it like for like
"Dark"?? I FUCKING SAID THE FUCKING LAYOUT THAT DOESN@T MERANT THE FUCKING THEME!!!
rest looks ok, don't overengineer this, it's a fucking layout change ffs!
● Understood. Simplified plan - just reorder existing components:
>
โฉ accept edits on (shift+tab to cycle)
โง‰ In +page.svelte
Thinking on (tab to toggle)

Why the tools started appearing

  • Skills: teach recurring project-specific patterns
  • ccrecall / pirecall: recover memory outside the session
  • sqlite + search: make reality easier to query than guess at
  • hooks: force checks before action
  • nopeek: keep secrets out of model context
  • I didn't build these because they're cool
  • I built them because the same mistakes kept happening
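
The hook idea can be sketched as a small guard: Claude Code hooks hand the pending tool call to a command as JSON and can block it via the exit code. The field names and blocked patterns below are simplified assumptions, not a drop-in hook:

```python
# Sketch of a pre-tool-use guard: inspect the command the model wants
# to run before it runs. A real hook script would read the event JSON
# from stdin and signal "block" through its exit code; here check()
# just returns that code so the logic is visible.
import re

# Invented examples of commands to refuse; a real list is project-specific.
BLOCKED_PATTERNS = [r"\brm\s+-rf\b", r"--force\b"]

def check(event: dict) -> tuple[int, str]:
    command = event.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return 2, f"blocked: matches {pattern!r}"  # 2 = block the call
    return 0, "ok"

code, reason = check({"tool_input": {"command": "git push --force"}})
print(code)  # 2
```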

The muscle starts forming

  • After enough sessions, you stop just reviewing output
  • You start recognising failure patterns on sight
  • You can often predict the mistake from the symptom
  • That instinct is the real compounding
  • This is where evaluation becomes more important than generation

A recurring smell

Don't use effect, Claude

There are skills warning against this. The model still reaches for it.

If the page freezes, I know where I'm looking

effect update depth exceeded

After enough repetitions, the symptom points to the likely mistake.
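
The smell comes from one specific shape: an effect that writes to the same state it reads. A toy reactive system (nothing like Svelte's real internals) shows why the runtime has to enforce a depth limit:

```python
# An effect that reads `count` and writes it back re-triggers itself
# on every write; without a depth cap this would never terminate.
MAX_DEPTH = 10
current_effect = None

class Reactive:
    def __init__(self, value):
        self.value = value
        self.subs = set()

    def get(self):
        if current_effect is not None:
            self.subs.add(current_effect)  # track the reading effect
        return self.value

    def set(self, value):
        self.value = value
        for fn in list(self.subs):  # notify readers: effects re-run
            fn()

def effect(fn):
    runs = 0
    def wrapped():
        nonlocal runs
        global current_effect
        runs += 1
        if runs > MAX_DEPTH:
            raise RuntimeError("effect update depth exceeded")
        current_effect = wrapped
        try:
            fn()
        finally:
            current_effect = None
    wrapped()

count = Reactive(0)
error = None
try:
    # The anti-pattern: state read and written inside the same effect.
    effect(lambda: count.set(count.get() + 1))
except RuntimeError as e:
    error = str(e)
print(error)  # effect update depth exceeded
```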

“

You need the muscle to understand the outputs really, for anything remotely non-trivial.

What that muscle actually looks like

  • Page becomes unresponsive → probably $effect
  • Repeated CSS/classes everywhere → extract a shared abstraction
  • Huge near-duplicate blocks → stop copying, start factoring
  • Test suddenly passes after code changed → check whether the target moved
  • Confident answer with no source → probably guessed instead of researched
Feb 2026

Svortie + Compounding

The workflow starts to click as a system

446
Svortie sessions

What compounding looks like

  • Built between contracts in about two weeks
  • 446 sessions and 200,390 messages in the personal DB
  • Heavy use of skills, search, sqlite, agents, and browser tooling
  • The model does the typing
  • I steer, validate, and decide
Claude Code v2.1.92
Opus 4.6 (1M context) · Claude Max
/home/scott/repos/svortie
>
look! before you go off making out you're doing shit
have you checked "what's actually happening" already?
I've reverted all the changes because they have done fuck all to fix the issue
in fifteen words ONLY!! tell me do you really know what the issue is?
● No. I've been guessing based on GitHub issues. I need to see the actual browser behavior.
>
โฉ accept edits on (shift+tab to cycle)
โง‰ In coolify-deploy.md
Thinking on (tab to toggle)

Second lesson

  • The model often looks busy before it is actually aligned
  • Research beats guessing
  • A concise diagnosis is worth more than another giant patch
  • You start noticing drift earlier
  • This is where vibe coding turns into vibe engineering
Mar 2026

Production Work

Cloud Lobsters - real migrations, real deploys, real consequences


One refactor, three moving parts

1

Prisma โ†’ Raw SQL

258 files changed
39 hand-typed model definitions
SQL views replacing deep include chains
~10k insertions
2

Auth.js โ†’ Better Auth

35 files changed
Session/auth plumbing had to move
Planned correctly, then lost to context
710 insertions
3

Deploy cleanup

GitHub Actions workflows
Dev / staging / prod on Fargate
Migrations applied at startup
Production, not demo
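
One concrete piece of that Prisma-to-raw-SQL move, "SQL views replacing deep include chains", sketched with invented tables (none of this is the actual Cloud Lobsters schema):

```python
# Instead of an ORM include tree resolving the join at every call site,
# define the join once as a view and select from it like a table.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);

    CREATE VIEW user_orders AS
        SELECT u.id AS user_id, u.name, o.id AS order_id, o.total
        FROM users u
        JOIN orders o ON o.user_id = u.id;

    INSERT INTO users  VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 42.0);
""")

rows = db.execute("SELECT name, total FROM user_orders").fetchall()
print(rows)  # [('Ada', 42.0)]
```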
Claude Code v2.1.92
Opus 4.6 (1M context) · Claude Max
/home/scott/repos/cloud/lobsters/iris
>
now, I saw references to authjs in the code, the last session was to remove it and use better auth, looks like it was left from the refactor, is that correct?
balls! that was an important part of the refactor, lol! can you look into it for me
● Yes, you're correct. The Prisma removal refactor kept Auth.js in place - just swapped `@auth/prisma-adapter` for `@auth/pg-adapter`.
● Auth.js is still active across 5 files.
● The Better Auth migration was planned but never actually happened.
● Let me dig into what the auth situation is right now.
>
โฉ accept edits on (shift+tab to cycle)
โง‰ In auth.ts
Thinking on (tab to toggle)

How the auth migration got lost

14:08
I asked to merge Auth.js into Better Auth
14:20
Claude confirmed it and scoped the work
15:14
Context compacted and the plan fell out
15:53
Session moved on to easier work
Next day
I noticed the auth migration was missing
Apr 2026

Not Just Claude

The harness changes, the workflow doesn't


One afternoon: from problem to tool

11:47
Toolkit session starts with concerns about secrets in CLI output
14:33
Switch to a new repo and start `nopeek`
16:23
Toolkit session wraps with the integration direction clear
19:19
nopeek session ends - tool built and working
Later
Bring it back into the workflow via hooks

Common ways the model drifts

Unrequested work
Docs, cleanup, theme changes, side quests
Ignoring instructions
You were explicit, it still did something adjacent
Implementation to match tests
Green appears, but the target moved
Guessing instead of reading
Confident output where research should have happened
Familiar wrong abstractions
`$effect`, repeated code, local patterns copied too literally

Vibe coding vs vibe engineering

Vibe coding

Looks productive
Big patches, lots of churn, little verification
Copies local patterns
Repeated code, repeated classes, repeated mistakes
Trusts appearance
If it looks right, ship it

Vibe engineering

Checks alignment
Is this solving the right problem?
Extracts abstractions
Shared modules, shared classes, system-level thinking
Uses judgement
Human decides, AI accelerates

The supervision stack

  • Skills for recurring project-specific patterns
  • ccrecall and pirecall for memory outside the context window
  • search, sqlite, and browser tools so the model can query reality
  • hooks and redundancy for things that are actually mandatory
  • nopeek to keep secrets out of model context
  • The tools exist to make the next session better than the last one

Smart Zone vs Dumb Zone

Smart Zone (~40%)

Fresh context
The model still knows what problem we're solving
Aligned
My plan and the model's plan still match
High leverage
Fast iteration, high quality, low friction

Dumb Zone (~60%)

Context decay
Key decisions have fallen out of the window
Drift
The model is solving a different problem now
Diminishing returns
Time to reset or start a new session

Takeaways

>

AI is a fantastic tool, but it is not self-driving

>

The job is not writing prompts - the job is maintaining alignment

>

Real leverage comes from supervision, memory, and guardrails

>

The failure modes stay weirdly consistent

>

The new developer skill is knowing when the AI is talking shit

My tools

ccrecall CLI

pirecall CLI

nopeek CLI

Claude Toolkit


Thank you

Would you trust a pigeon?

github.com/spences10

scottspence.com