June 11, 2026 · 3 min read · AI coding agents, tokens, benchmark, MCP, RAG

We Measured It — How Many Tokens a Memory Layer Actually Saves Your Coding Agent

By the LLMtoMD team

Everyone selling an "AI memory layer" throws around token-savings numbers. Most are hand-waved. So we measured ours on a real document and we're showing our work — including the caveats that make the number honest.

The setup

We took an actual 50-page functional requirements & system design document (an FRD for an aviation data platform) and converted it to clean Markdown with LLMtoMD. The converted document measured:

143,134 characters
22,070 words
35,783 tokens
split into 203 retrievable chunks
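The post doesn't say which tokenizer or chunker produced these figures, so the following is only an illustrative sketch built on two assumptions. First, tokens are estimated with the common rule of thumb of about four characters per token. 143,134 ÷ 4 ≈ 35,783, so the reported count is consistent with that heuristic, though it may also come from a real tokenizer. Second, chunks are cut at Markdown headings and then packed up to a character budget. The function names and the 700-character default (close to the document's average of 143,134 ÷ 203 ≈ 705 characters per chunk) are hypothetical and not LLMtoMD's actual implementation.

```python
import re


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token for English text.
    A heuristic only; the target model's tokenizer gives the real count."""
    return len(text) // 4


def chunk_markdown(md: str, max_chars: int = 700) -> list[str]:
    """Split Markdown before each ATX heading (#, ##, ...), then pack the
    section's paragraphs into chunks of at most max_chars characters.
    A single paragraph longer than max_chars becomes its own chunk."""
    sections = re.split(r"(?m)^(?=#{1,6} )", md)
    chunks: list[str] = []
    for section in sections:
        current = ""
        for para in section.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) <= max_chars or not current:
                current = candidate          # paragraph fits (or chunk is empty)
            else:
                chunks.append(current)       # flush the full chunk
                current = para
        if current:
            chunks.append(current)           # never let a chunk cross a heading
    return chunks


def doc_stats(md: str) -> dict[str, int]:
    """The same four measurements reported above, computed on a Markdown string."""
    return {
        "characters": len(md),
        "words": len(md.split()),
        "tokens_est": estimate_tokens(md),
        "chunks": len(chunk_markdown(md)),
    }


sample = "# Auth\n\nUsers sign in with OIDC.\n\nAdmins need MFA.\n\n# Data\n\nFlights are stored daily.\n"
print(doc_stats(sample))
```

Splitting at headings first keeps each requirement section together, which is one reason structured Markdown tends to chunk better than raw PDF text.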

Then we asked one ordinary question a coding agent building this system would ask: "What are the authentication and authorization requirements for the platform?" — two different ways.

Way 1 — no memory layer (load the whole doc)

The default move: put the FRD in the agent's context so it can "read" it. To answer that one question, the model carries the entire 35,783-token document.

Way 2 — memory layer (retrieve only what's needed)

With the document stored in LLMtoMD, the agent runs a semantic search and pulls back only the relevant passages. For that question it retrieved the top 5 chunks, about 1,180 tokens in total, and those chunks contained the exact answer: the IAM requirements (FR-IAM-001 through 010), RBAC + ABAC authorization, multi-factor auth for privileged actions, and OIDC/SAML federation. The agent then synthesized a cited answer from that slice.
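Stripped down, "semantic search, top 5 chunks" means ranking stored chunks by similarity to the question's embedding and keeping the best k. The following is a minimal sketch using cosine similarity, which is one common choice and not necessarily what LLMtoMD uses. It assumes embeddings were already computed by some embedding model. The three-dimensional vectors and chunk texts are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 5) -> list[str]:
    """Return the texts of the k chunks most similar to the query, best first."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs standing in for the stored chunks.
chunks = [
    ("FR-IAM-001: users authenticate via OIDC/SAML federation.", [0.9, 0.1, 0.0]),
    ("FR-IAM-004: privileged actions require multi-factor auth.", [0.8, 0.2, 0.1]),
    ("Flight schedules are ingested nightly.", [0.0, 0.9, 0.3]),
    ("Dashboards refresh every 5 minutes.", [0.1, 0.2, 0.9]),
]
query = [1.0, 0.1, 0.0]  # hypothetical embedding of the auth question

print(top_k(query, chunks, k=2))
```

Only the returned texts go into the agent's context, so the token cost is that of the k chunks rather than the whole document.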

The result

| Approach | Tokens for the question |
|---|---|
| Load the whole FRD | 35,783 |
| Retrieve only what's relevant | ≈ 1,180 |
| Reduction | ≈ 96.7% |

For that question, the memory layer used ~30× fewer tokens — and lost nothing, because the retrieved slice held the complete answer.
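Both headline numbers come from two divisions on the measured counts. Here is a small helper to reproduce them, or to plug in counts from your own documents. The function name is just for illustration.

```python
def savings(full_tokens: int, retrieved_tokens: int) -> tuple[float, float]:
    """Return (percent reduction, how many times fewer tokens)."""
    reduction = 1 - retrieved_tokens / full_tokens
    return reduction * 100, full_tokens / retrieved_tokens


pct, ratio = savings(35_783, 1_180)
print(f"{pct:.1f}% reduction, {ratio:.1f}x fewer tokens")  # 96.7% reduction, 30.3x fewer tokens
```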

Now scale it to a real build session

One question is the conservative case. Coding agents run long — and without a memory layer, that 35,783-token document tends to sit in context across many turns:

Without: keep the FRD in context across a 20-turn build → ~35.8k × 20 ≈ 716,000 tokens of document context.
With: retrieve per question, say ~10 lookups × ~1.2k ≈ 12,000 tokens, counting each retrieved slice once.

That's a ~98% reduction in document-context tokens over the session.
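This session estimate rests on two assumptions: the whole FRD is re-sent on every turn, and each retrieved slice is counted once. If retrieved passages instead stay in the conversation history, later turns re-send them as well. The sketch below models both cases. Spacing the 10 lookups evenly over turns 1, 3, ..., 19 is an assumption for illustration, not a measurement.

```python
def session_tokens_full(doc_tokens: int, turns: int) -> int:
    """Whole document re-sent on every turn."""
    return doc_tokens * turns


def session_tokens_retrieval(slice_tokens: int, lookup_turns: list[int],
                             turns: int, persist: bool = False) -> int:
    """lookup_turns: 1-based turns at which a retrieval happens.
    If persist is True, each retrieved slice stays in history and is
    re-sent on every later turn of the session too."""
    total = 0
    for t in lookup_turns:
        times_sent = (turns - t + 1) if persist else 1
        total += slice_tokens * times_sent
    return total


turns = 20
lookups = list(range(1, turns + 1, 2))  # 10 lookups on turns 1, 3, ..., 19
full = session_tokens_full(35_783, turns)
once = session_tokens_retrieval(1_200, lookups, turns)
kept = session_tokens_retrieval(1_200, lookups, turns, persist=True)
for label, used in [("counted once", once), ("kept in history", kept)]:
    print(f"{label}: {used:,} tokens, {1 - used / full:.1%} reduction")
```

With these inputs the counted-once case gives 12,000 tokens (a 98.3% reduction, matching the figure above). The kept-in-history case gives 132,000 tokens, an 81.6% reduction: smaller, but still large.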

The honest caveat

Two things keep this number from being a lie:

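Prompt caching narrows the cost gap. Modern model APIs can cache a stable block of context that's re-sent each turn and bill re-reads of it at a fraction of the normal input price. That mostly helps the load-the-whole-doc approach, since a document sitting in context all session is exactly that kind of stable block. So the dollar savings are smaller than the raw token reduction. The token-count reduction is real; the cost reduction is real but smaller.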
The bigger wins aren't the tokens at all. On most projects, keeping your context window free for a large codebase, and keeping the agent consistent with the spec instead of drifting, matter more than the line-item savings.
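Here is a rough cost model in "uncached input-token equivalents" that shows why caching shrinks the dollar gap more than the token gap. The multipliers are assumptions: a cache write priced at 1.25× the normal input rate and a cache read at 0.1×, roughly the multipliers Anthropic has published for prompt caching. Other providers discount differently and prices change, so substitute your own. The model also assumes the FRD stays cached for the whole 20-turn session and that retrieved slices, which differ per question, are never cached.

```python
def full_doc_cost(doc_tokens: int, turns: int,
                  write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Input cost when the document is a cached prefix: one cache write on
    the first turn, then a cache read on each remaining turn."""
    return doc_tokens * write_mult + doc_tokens * read_mult * (turns - 1)


def retrieval_cost(slice_tokens: int, lookups: int) -> float:
    """Retrieved slices differ per question, so bill them at the full rate."""
    return float(slice_tokens * lookups)


turns, lookups = 20, 10
raw_full, raw_retrieved = 35_783 * turns, 1_200 * lookups
cached_full = full_doc_cost(35_783, turns)
retrieved = retrieval_cost(1_200, lookups)
print(f"token reduction: {1 - raw_retrieved / raw_full:.1%}")
print(f"cost reduction with caching: {1 - retrieved / cached_full:.1%}")
```

Under these assumptions the token reduction stays at about 98%, while the cost reduction drops to about 89%. That is the "real but smaller" gap described above.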

Why it works

Two reasons the retrieved slice beats the whole document:

Clean Markdown. We measured the document after converting it to structured Markdown, with headings, tables, and lists intact. Feed an agent a raw PDF instead and it spends tokens (and accuracy) fighting layout noise.
Retrieval, not stuffing. The agent asks for what it needs over MCP and gets a focused, cited answer — instead of re-reading 35,000 tokens to find one table.
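Over MCP, a lookup like this travels as a JSON-RPC 2.0 `tools/call` request from the agent's client to the server. The following sketch builds one. The tool name `search_documents` and its `query`/`top_k` arguments are hypothetical, since the actual tool names exposed by LLMtoMD's MCP server aren't given here.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical tool name and arguments, for illustration only.
msg = make_tool_call(1, "search_documents", {
    "query": "What are the authentication and authorization requirements for the platform?",
    "top_k": 5,
})
print(msg)
```

The server answers with the matching passages as tool-result content, and the agent reads those in place of the full document.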

The takeaway

On a real 50-page spec, giving the coding agent a memory layer cut the tokens for a question by ~97%, and the estimated document context across a 20-turn session by ~98%, without losing the answer: retrieval returned the exact relevant requirements, with citations.

That's the whole point of Give Your AI Coding Agent a Memory: stop re-sending the spec, retrieve it.

Measure it on your own docs: convert your first document free, connect your coding tool, and watch the token counter.
