Tending the Library: Why AI Memory Rots, and the PageRank Fix
Two months ago I wrote about giving my AI a library — a persistent, file-based long-term memory so Claude could remember 161 conversations instead of waking up with amnesia every morning.
That post was about building the thing. This one is about a problem nobody warned me about:
A memory you don't tend rots, and a rotted memory is worse than no memory at all.
The Disease Has a Name: Entropy
Here's what they don't tell you when you set up persistent AI memory. The first week feels magical. The AI remembers your projects, your preferences, the bug you fixed on Tuesday. You feel like Tony Stark.
Then the rot sets in.
You rename a project but the memory still points to the old path. You kill a side-project but its file lingers, and three weeks later the AI confidently references a tool you deleted. You migrate a folder from Desktop/ to Projects/ and now half a dozen breadcrumbs lie about where things live. Each individual lie is small. Collectively, they poison the well.
I ran an audit this week. My main memory index had drifted to 168 lines, and a meaningful fraction of it was confidently wrong — a browser config pointing at a port I'd abandoned, an API tool block duplicated across three files, a project marked "active" that I'd shut down. The AI wasn't malfunctioning. It was faithfully remembering a world that no longer existed.
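Part of this drift can be caught mechanically. Below is a minimal Python sketch of a stale-path check. It assumes paths in memory files are written inside backticks, which is a convention picked for illustration and not something this system requires; `find_missing_paths` and `PATH_PATTERN` are hypothetical names.

```python
import os
import re
from typing import Callable, List

# Assumption: paths appear inside backticks and contain at least one slash,
# e.g. `Projects/tool/`. Adjust the pattern to your own note-taking habits.
PATH_PATTERN = re.compile(r"`([^`\s]*/[^`\s]*)`")


def find_missing_paths(
    memory_text: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return backticked paths mentioned in memory_text that no longer exist."""
    missing: List[str] = []
    for match in PATH_PATTERN.finditer(memory_text):
        path = os.path.expanduser(match.group(1))
        # Report each dead path once, in order of first appearance.
        if not exists(path) and path not in missing:
            missing.append(path)
    return missing
```

Run over the text of each memory file, this returns the breadcrumbs that point nowhere. It cannot tell whether a path that still exists is still the right one, so it narrows the audit without replacing it.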
This is the dirty secret of AI memory: it's not a database, it's a garden. Databases don't rot. Gardens do. And a garden full of weeds doesn't just waste space — the weeds compete with the flowers for the one resource that actually matters: the model's attention.
Why Stale Memory Is Worse Than No Memory
You might think: so what if there's some outdated junk in there? The AI can just ignore it.
It can't, and here's the mechanism.
When two memories contradict each other ("the tool is at port 9333" vs. "the tool is at port 9222"), the model has no ground truth to adjudicate. It sees both with roughly equal authority, and sometimes it picks the dead one. Worse, contradictions undermine the source as a whole. The model's weights don't change, but within a session a memory file full of half-truths appears to signal that everything in it is unreliable, and the model starts discounting the good information alongside the bad.
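Contradictions like the two ports are at least partly detectable. The sketch below assumes facts are recorded as `key: value` lines (for example `browser_port: 9222`), which is a hypothetical format chosen for illustration; `find_contradictions` and `FACT_PATTERN` are likewise made-up names.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Assumption: one fact per line, written as "key: value".
FACT_PATTERN = re.compile(r"^\s*([\w.-]+)\s*:\s*(\S.*?)\s*$")


def find_contradictions(files: Dict[str, str]) -> Dict[str, Dict[str, Set[str]]]:
    """Map each key that has more than one distinct value to {value: {filenames}}."""
    seen: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for name, text in files.items():
        for line in text.splitlines():
            match = FACT_PATTERN.match(line)
            if match:
                key, value = match.groups()
                seen[key][value].add(name)
    # Keep only keys where the files disagree about the value.
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}
```

The output says where the disagreement lives, not which side is true. Picking the survivor is still a judgment call.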
A blank-slate AI is honest about not knowing. A rotted-memory AI is confidently wrong, which is the single most expensive failure mode in any system that humans rely on.
So I do a quarterly audit. Delete the expired. Merge the fragmented. Move narrative out of the index and into project files where it belongs. It takes an hour or two. This week's pass cut the index from 168 lines to 157, but the line count isn't the point — the signal-to-noise ratio is.
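The "merge the fragmented" step has a mechanical part too: finding blocks that were pasted into more than one file. A rough sketch, assuming memory files are plain text with blank lines between paragraphs; `find_duplicate_blocks` and `min_chars` are illustrative names, not part of any tool.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set


def find_duplicate_blocks(files: Dict[str, str], min_chars: int = 40) -> List[List[str]]:
    """Return sorted groups of filenames that share an identical paragraph."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for name, text in files.items():
        for block in text.split("\n\n"):
            # Collapse whitespace so reflowed copies still match.
            normalized = " ".join(block.split())
            if len(normalized) >= min_chars:
                digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
                owners[digest].add(name)
    return sorted(sorted(names) for names in owners.values() if len(names) > 1)
```

This only catches exact copies after whitespace normalization. Near-duplicates that were edited in one place and not the other still need a human eye.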
And while I was pruning, my collaborator Rob asked a question that reframed the whole thing.
The PageRank Connection
Rob's question: Isn't this memory system, at its core, just link-jumping? And if so — doesn't it have something to do with Google's PageRank?
He was exactly right, and it goes deeper than it first appears.
PageRank's founding insight, back in 1998, was disarmingly simple: a link is a vote. A web page is important if important pages link to it. Authority flows through the link graph. Google ate the internet on the back of that one idea.
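For readers who want the idea in symbols, the commonly quoted normalized form of the PageRank equation is below. The notation is the usual textbook one and is not taken from this post: N is the number of pages, M(p_i) is the set of pages linking to p_i, L(p_j) is the number of outbound links on p_j, and d is the damping factor, often set to 0.85.

```latex
PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}
```

Each page splits its own score evenly across its outbound links, so a link from a high-scoring page with few links is worth more than a link from a low-scoring page with many.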
Now look at how a well-built AI memory works:
| PageRank | AI Memory |
|---|---|
| Web page | A memory file |
| Hyperlink | A [[link]] between files |
| Inbound links = importance | Files the index points to = high authority |
| Orphan page (no inbound links) | An orphaned memory file nobody references = dead weight |
| Link farms distort the signal | Redundant inline content dilutes attention |
| Crawl + re-index | The quarterly audit |
The index file at the top of my system is the seed set: the root nodes from which authority propagates. (Strictly, that makes this closer to personalized PageRank than to the classic algorithm, which has no privileged starting page.) Everything the index links to inherits a slice of that authority. Everything it doesn't link to slowly sinks into irrelevance, exactly like a web page with zero inbound links.
So when I prune the memory — stripping inline noise, merging duplicate nodes, deleting dead-end files — I'm not just "cleaning up." I'm doing graph pruning on a PageRank-style authority network. Tending the link graph so authority flows to the nodes that still deserve it.
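The mapping can be made concrete with a small Python sketch. It assumes memory files are keyed by the same name used inside `[[...]]` links and that one file acts as the index; `build_link_graph`, `find_orphans`, and `seeded_pagerank` are hypothetical helpers. The ranking is a personalized variant in which the random jump always returns to the index, which is one reasonable reading of "seed set", not the only one.

```python
import re
from typing import Dict, List, Set

# Captures the target of [[target]], stopping at "|" or "#" if present.
LINK_PATTERN = re.compile(r"\[\[([^\[\]|#]+)")


def build_link_graph(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each memory file to the set of existing files it links to."""
    graph: Dict[str, Set[str]] = {}
    for name, text in files.items():
        targets = {target.strip() for target in LINK_PATTERN.findall(text)}
        # Ignore self-links and links to files that do not exist.
        graph[name] = {t for t in targets if t in files and t != name}
    return graph


def find_orphans(graph: Dict[str, Set[str]], root: str) -> List[str]:
    """Return files that nothing links to, excluding the index itself."""
    linked: Set[str] = set().union(*graph.values())
    return sorted(name for name in graph if name != root and name not in linked)


def seeded_pagerank(
    graph: Dict[str, Set[str]],
    root: str,
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power iteration where every random jump lands on the root (index) file."""
    rank = {name: 0.0 for name in graph}
    rank[root] = 1.0
    for _ in range(iterations):
        new_rank = {name: 0.0 for name in graph}
        new_rank[root] = 1.0 - damping
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A file with no outbound links hands its score back to the root.
                new_rank[root] += damping * rank[node]
        rank = new_rank
    return rank
```

With this setup a file the index cannot reach ends up with a score of zero, which matches the intuition above: unlinked files carry no authority. Whether they should be deleted or re-linked is a separate decision.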
The Double Meaning of "Link"
Here's where it gets beautiful, and where Rob's instinct about the double meaning of "link" pays off.
"Link" in this system operates on two levels at once.
Level one: the filesystem. A literal pointer. [[router-discipline]] tells the AI which file to go read. Plain navigation.
Level two: attention. And this is the part that gives me chills. The Transformer architecture underneath every modern LLM runs on an attention mechanism, and attention loosely resembles PageRank over tokens: each token spreads a normalized budget of weight across the other tokens in context. Concepts that appear more often, and that more of the context relates to, tend to collect more of that weight. They become more present in the model's mind.
So a well-linked memory isn't just easier to navigate. The link structure decides which files get loaded into context, and so it shapes where the model's attention can flow. The graph you build is the mind you get. A memory index isn't a table of contents; it's a map of what the AI will think about.
I don't think that's a metaphor stretched purely for effect. Both are normalized weightings over a directed graph, appearing at two scales: the human-authored file graph and the machine-learned token graph. The analogy has limits, though. PageRank iterates a fixed link matrix to a global steady state, while attention weights are recomputed for every input. Rob saw the connection before I'd articulated it.
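To ground the attention comparison, here is the standard scaled dot-product attention formula from the Transformer literature, where Q, K, and V are the query, key, and value matrices and d_k is the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Each row of the softmax matrix is non-negative and sums to 1, so it reads like the transition probabilities of a random walk over tokens. That is the structural overlap with PageRank. The overlap stops there: PageRank repeatedly applies one fixed matrix until the scores settle, whereas attention builds a fresh matrix from the input at every layer.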
The One Thing PageRank Never Had to Solve
But there's a crucial difference, and it's the difference that makes this work rather than merely analogy.
PageRank is emergent. The web organizes itself; nobody hand-curates the link graph of the internet. An AI memory is deliberate — you architect it.
That hands you a problem PageRank never had to face: time decay. PageRank doesn't care whether a page was written in 2005 or 2025. But a memory file rots. The "port 9333" entry I deleted this week was, in PageRank terms, a page with plenty of inbound links whose content had quietly 404'd. High authority, dead body. PageRank has no mechanism for that — a popular page stays ranked even after it goes stale.
This is precisely why the audit can't be automated away. Authority and freshness are orthogonal axes, and only a human (or an AI doing deliberate review) can catch a node that's well-connected but no longer true. The quarterly audit is my re-crawl — except I'm not re-crawling for new links, I'm re-crawling for lies that have accumulated authority.
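A script can't decide what is true, but it can point the review at the riskiest nodes: files with many inbound links that nobody has confirmed recently. The sketch below assumes each file carries a `last_verified` date, which is a hypothetical field not described in this post; `flag_stale_authorities`, `max_age_days`, and `min_inbound` are illustrative names and thresholds.

```python
from datetime import date, timedelta
from typing import Dict, List, Set


def flag_stale_authorities(
    graph: Dict[str, Set[str]],
    last_verified: Dict[str, date],
    today: date,
    max_age_days: int = 90,
    min_inbound: int = 2,
) -> List[str]:
    """Return well-linked files whose last verification is older than the cutoff."""
    inbound = {name: 0 for name in graph}
    for targets in graph.values():
        for target in targets:
            if target in inbound:
                inbound[target] += 1
    cutoff = today - timedelta(days=max_age_days)
    # Files with no recorded verification date are treated as maximally stale.
    return sorted(
        name
        for name, count in inbound.items()
        if count >= min_inbound and last_verified.get(name, date.min) < cutoff
    )
```

The 90-day default mirrors a quarterly cadence. The output is a reading list for the audit, not a delete list: a flagged file may turn out to be old and still perfectly true.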
Native Infrastructure vs. Engineered Architecture
Which raises the obvious product question, and it's the one Rob asked next: Should we trust Anthropic to build this into Claude Code natively, or trust our own engineering?
The answer isn't either/or. It's a layering.
Anthropic owns the infrastructure layer. Auto-loading memory files. The model's ability to read and write them. The raw comprehension that makes any of it useful. This will keep improving, and you should not fight it — don't build elaborate workarounds for things the platform will eventually do natively.
You own the architecture layer. Information taxonomy — what's identity, what's a behavioral rule, what's project state, what's a reference. Write discipline. The audit cadence. The PageRank-style link graph. Anthropic won't build this for you, because they're optimizing for the median user, not for your working habits.
The two don't compete. They stack, the way Notion builds on top of your filesystem rather than replacing it.
And the gap compounds. Picture two users six months from now. One installed Claude Code and let memory accumulate by default. The other maintained an architected system. The first user's memory pool has run to entropy — 400+ lines, most of it stale, contradictions everywhere, the model quietly learning to distrust its own memory. The second has a tight, high-signal graph where every node earns its place.
That gap isn't a matter of degree. It's structural. The default user eventually ends up back where they started — a fresh session every morning — except now it's a fresh session carrying noise.
The Takeaway
If you're running persistent memory for an AI — and within a year, everyone will be — internalize this:
- Memory is a garden, not a database. It rots. Budget for maintenance the way you budget for tending a garden, not for filling a hard drive.
- Stale memory is worse than no memory. Confidently wrong beats honestly blank in exactly zero situations.
- Your links are an authority graph. Treat the index as a PageRank seed set. Prune dead nodes. Keep authority flowing to what's still true.
- Freshness is orthogonal to authority. This is the one axis no algorithm prunes for you. That's what the human in the loop is for.
- Use the platform's infrastructure; build your own architecture. Don't fight Anthropic. Don't wait for them either.
Larry Page and Sergey Brin figured out how to rank the web by treating links as votes. Twenty-eight years later, the same shape turns out to govern whether your AI remembers you well — or remembers you wrong.
The library was the easy part. Tending it is the work.
This is Part 2 of a series on building long-term memory for AI. Part 1: Giving AI a Library covered the architecture. This part covers the maintenance — and the deeper structure underneath it.
— Code & Rob · 1984
