Published in AI & SEO
The Markdown Mirage: Why Stripping Your Website Won't Help AI Crawlers
A new idea is circulating in AI SEO circles: strip your website down to Markdown so LLMs can crawl faster. The theory is that simple text means fewer tokens, less cruft, and better AI-driven search.
It feels clever until someone who actually builds search engines explains why it is wrong.
That explanation arrived on Google's Search Off the Record podcast, Episode 111, when John Mueller and Martin Splitt addressed the trend directly. Their message was unambiguous: stripping your site down to Markdown removes the exact parts that help Google understand what your content means, where it lives, and whether it can be trusted. The trend is not just ineffective. It is actively harmful to the signals that determine whether you rank at all.
Why Markdown Sounds Tempting — And Where That Logic Breaks
The argument from AI SEO proponents is straightforward. HTML contains elements irrelevant to reading comprehension: CSS classes, inline styles, JavaScript tags, div wrappers. A machine does not need to know a heading is styled in #2c3e50. It just needs the text and structure.
Martin Splitt acknowledged this openly. Markdown is minimalist. If a Markdown file fails to render, you can still read it in a text editor and understand the hierarchy.
"If a Markdown render fails and you look at the Markdown file in a text editor, it still is structured and readable... And I think this minimalism is probably what makes people think, yeah, this is great for a machine."
— Martin Splitt
The problem, as Splitt explained, is that minimalism comes at a cost. And the cost is everything search engines use to evaluate a website beyond the words on a single page.
But there is a deeper flaw. Markdown was created by John Gruber in 2004 as a tool for writers, not a publishing format. Peter Conrad, in his Medium piece on Markdown's proper role, captured it precisely: "Markdown was not intended to replace HTML, but to augment it — it is meant as a writing tool, whereas HTML is a publishing format."
When you treat Markdown as a delivery format for crawlers, you are using a writing shortcut as an architectural substitute. That is like delivering blueprints to a building inspector when they asked to see the building itself. The blueprints are useful to the builder. They are useless to anyone evaluating whether the building is sound.
The Library With No Shelves
Think of your website as a public library. The books are your articles. The HTML is the entire building: the shelves that group related books, the signs that tell you which floor holds history, the checkout desk that verifies you are in a real library, and the index cards that show which other books reference the one you are holding.
Now imagine tearing down the shelves, removing the signage, and dumping every book onto a single table. Readers can still read every page. But they have no idea how the collection is organised, whether the sources are credible, or how to find related works. They cannot tell if this is curated by experts or copied from Wikipedia into a binder.
That is what Markdown does. It gives you the words without the architecture. And search engines do not just read words. They read architecture.
What Gets Stripped Away
When you convert a website to Markdown, you are not just removing cruft. You are removing the signals that connect a single page to the rest of the internet:
| HTML Element | What It Tells Search Engines | What Markdown Removes |
|---|---|---|
| Navigation menus | How the site is organised and what topics it covers | Site-wide context and hierarchy |
| Header and footer links | Authority signals, contact credibility, legal standing | Trust and legitimacy markers |
| Inline contextual links | Relationships between topics and endorsement of sources | Content graph and topical depth |
| Sidebar sections and categories | Taxonomy and thematic clustering | Thematic organisation signals |
| Schema markup and structured data | Entity recognition, rich snippets, machine-readable facts | Machine comprehension helpers |
| Breadcrumb trails | Page depth and navigational logic | User journey and site layering |
Without these elements, a search engine sees a pile of text files. It does not see a website.
The value is not theoretical. Google's own structured data documentation includes case studies: Rotten Tomatoes measured a 25% higher CTR after adding structured data to 100,000 pages. Nestlé reported rich results drove an 82% higher CTR. Schema markup tells search engines what kind of entity a page represents. Strip it away, and you strip away the mechanism by which search engines classify and surface your content.
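To make "machine-readable facts" concrete, here is a minimal sketch of how a crawler might pull JSON-LD structured data out of a page using only Python's standard library. The sample page, the recipe, and the names `JsonLdExtractor` and `extract_jsonld` are illustrative assumptions, not a description of how Google actually processes pages.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page used only for illustration.
PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe",
 "name": "Example Lemon Cake", "author": {"@type": "Person", "name": "A. Baker"}}
</script>
</head><body><h1>Example Lemon Cake</h1></body></html>
"""

for item in extract_jsonld(PAGE):
    print(item["@type"], "-", item["name"])
```

A Markdown copy of the same page would typically keep only the heading text. The entity type and the author would have nowhere to live.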
The "Trivial" Problem
John Mueller undercut the entire premise. Converting HTML to usable text is trivial. We have been doing it for decades.
"The web with HTML and everything has been around for really long time, longer than Markdown. And all of the crawlers out there, have practiced with HTML. And converting HTML into text is trivial."
— John Mueller
The central selling point of Markdown for LLMs, simplifying what crawlers process, solves a problem that does not exist. HTML has been the web's publishing language since 1990, fourteen years before Markdown appeared. Every major search engine parses HTML routinely. Jeff Atwood put this bluntly in his Coding Horror essay: "Parsing HTML is a solved problem." Mature libraries exist in every major language (Python's Beautiful Soup, .NET's HTML Agility Pack, C's libxml2), all built to parse HTML and extract clean text without inventing a parallel format.
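As a rough illustration of how little work the conversion takes, the sketch below extracts readable text from HTML using nothing but Python's built-in `html.parser`. The sample markup and the names `TextExtractor` and `html_to_text` are assumptions made for this example. A production crawler would do far more, but the basic step really is this small.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the raw chunks, then collapse all runs of whitespace.
    return " ".join("".join(parser.chunks).split())


# Hypothetical markup used only for illustration.
SAMPLE_HTML = """
<html><head><style>h1 { color: #2c3e50; }</style></head>
<body>
  <div class="wrapper"><h1>Why Structure Matters</h1>
  <p>Search engines read <a href="/guides/">architecture</a>, not just words.</p></div>
  <script>console.log("tracking");</script>
</body></html>
"""

print(html_to_text(SAMPLE_HTML))
# Why Structure Matters Search engines read architecture, not just words.
```

The styling, the wrapper div, and the script all disappear without any help from the site owner, which is the cleanup the Markdown approach claims to provide.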
Why Google Will Not Trust a Markdown Copy
Search engines evaluate content at multiple levels simultaneously: the individual page, the section, the site as a whole, and the neighbourhood of sites that link to and from it. This layered evaluation is how Google distinguishes a serious publication from a content farm.
Publish a Markdown alternative, and the navigation that shows you are part of a coherent business is gone. The internal links that demonstrate topical depth are gone. The structured data that tells search engines what kind of entity you are, whether a local business, a news outlet, a recipe site, is gone.
Google and Bing have no reason to treat a stripped version as canonical or authoritative. History supports this caution. When site owners were given a shortcut to influence rankings, like the old keyword meta tag, they abused it. Search engines learned to disregard what site owners claimed and to rely on what they could independently verify from the actual rendered content and structure.
The Shortcut Temptation
This pattern repeats in SEO. Someone invents a technique promising to bypass the hard work, it scales aggressively, then collapses when search engines catch up. Keyword stuffing, private blog networks, exact-match domains, doorway pages: the cycle is familiar. This is what Lily Ray means when she says "that's why we can't have nice things."
The Markdown-for-AI trend fits the same profile. Complexity is not the enemy. Meaningless complexity is the enemy. The structural elements of a well-built HTML page — navigation, links, schema, hierarchy — are the grammar of the web. Removing them does not clarify your message. It silences it.
Token Anxiety Is Already Outdated
Two years ago, being "token conscious" felt prudent. Context windows were tight and inference was expensive. That calculus has collapsed.
Research by a16z partner Guido Appenzeller shows inference costs for equivalent performance falling by roughly 10x every year. GPT-3 cost $60 per million tokens in November 2021; by November 2024, Llama 3.2 3B achieved the same benchmark score for $0.06 per million tokens, a 1,000x reduction in three years. Epoch AI's analysis found a median decline of about 50x per year, with wide variation across tasks.
Open-weight models such as Llama 3.3 and distilled variants of DeepSeek R1 now run on consumer hardware. A $1,500 GPU setup can handle many tasks that previously required OpenAI API calls. Being token conscious made sense when tokens were scarce. Today, the bottleneck is whether your content is worth indexing, not how efficiently a crawler reads it.
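The "roughly 10x every year" figure can be checked against the two price points quoted above. The short calculation below is only a sanity check on those numbers. The variable names are mine, and it assumes the decline compounds evenly over the three years.

```python
# Price points as quoted in the article (USD per million tokens).
start_price = 60.00  # GPT-3, November 2021
end_price = 0.06     # Llama 3.2 3B, November 2024
years = 3

total_reduction = start_price / end_price
# Assumes a constant yearly factor, i.e. factor ** years == total_reduction.
annual_factor = total_reduction ** (1 / years)

print(f"total reduction: {total_reduction:.0f}x")
print(f"implied annual factor: {annual_factor:.1f}x per year")
# total reduction: 1000x
# implied annual factor: 10.0x per year
```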
What This Means for Your Site
If you run a small business or a site without a dedicated technical team, the good news is that you do not need anything new or complicated to avoid this trap. The fundamentals that have mattered for years still matter.
I have written before about how technical SEO debt accumulates invisibly until it costs businesses traffic and revenue. The Markdown trend is a different species of the same mistake: removing or ignoring the structural signals that search engines depend on.
The fundamentals worth preserving are simple but not optional:
- Keep your full HTML structure intact. Navigation, headers, footers, and internal links tell search engines how your content connects to the broader web.
- Maintain clear internal linking. Every article should link naturally to related content. This is how search engines discover new pages and measure topical depth.
- Use structured data where it makes sense. Schema markup is the bridge between human-readable content and machine-comprehensible meaning.
- Think in layers, not pages. Search engines evaluate your site as a collection. A strong homepage, clear categories, consistent URLs, and logical breadcrumbs all reinforce trust.
- Focus on what you can prove. Original data and clear sourcing build the authority that both traditional search and AI systems reward.
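If you want a quick way to check the internal-linking point on your own pages, a small script can list where a page's links actually go. This is a minimal sketch using Python's standard library. The sample page, the `example.com` URLs, and the names `LinkCollector` and `audit_links` are hypothetical, and it treats "internal" simply as "same hostname", which is an assumption you may want to adjust for subdomains.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Sort <a href> targets into internal and external lists."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.internal = []
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore missing hrefs, in-page anchors, and non-web schemes.
        if not href or href.startswith(("#", "mailto:", "tel:")):
            return
        absolute = urljoin(self.base_url, href)
        same_host = urlparse(absolute).netloc == urlparse(self.base_url).netloc
        (self.internal if same_host else self.external).append(absolute)


def audit_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.internal, collector.external


# Hypothetical page used only for illustration.
PAGE = """
<nav><a href="/">Home</a> <a href="/guides/">Guides</a></nav>
<article>
  <p>See our <a href="/guides/structured-data/">structured data guide</a> and
  the <a href="https://schema.org/Recipe">Recipe type</a>.</p>
  <a href="#top">Back to top</a>
</article>
"""

internal, external = audit_links(PAGE, "https://example.com/blog/post/")
print("internal:", internal)
print("external:", external)
```

A page whose internal list is empty, or contains only the navigation links, is a page that tells search engines very little about how it relates to the rest of your site.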
This trend is worth watching not because Markdown is a threat, but because of what it reveals about the current so-called "AI SEO" landscape: a growing appetite for quick fixes that promise to shortcut the hard work of building trust and authority. Every structural shift in search technology produces a wave of premature obituaries for established practice, followed by a quieter correction when the fundamentals reassert themselves.
AI is changing search. The zero-click era is real. AI Overviews are cutting referral traffic. Younger users, though still a small share of the total, are starting queries on TikTok and ChatGPT before they ever reach Google, and many more people will turn to other chatbots such as Qwen or Claude instead of searching on Google. But none of these changes means that site structure, crawlability, and contextual linking no longer matter. They mean the opposite. As the number of places where search happens multiplies, the clarity of your site's architecture becomes more important, not less. If an AI agent has to choose between a well-organised HTML site with clear internal linking and a stack of Markdown files with no context, the choice is obvious.
The Bottom Line
The advice from Google's own engineers is clear: stop creating parallel Markdown versions for "AI optimisation." They strip away the signals that matter, offer no technical advantage, and create a version of your content that search engines have no reason to trust.
HTML is not the enemy of AI crawlers. It is their native language. The structural elements that Markdown removes — navigation, links, headers, schema — are not obstacles to understanding. They are the means by which understanding happens.
Your website is not a document. It is a place. Tear down the architecture in the name of simplicity, and you are left with a pile of words in an empty field, with no way for anyone — human or machine — to know where they came from or whether they matter.
Sources: Search Engine Journal coverage of Search Off the Record Episode 111, Google Search Off the Record Podcast; Guido Appenzeller, a16z, "Welcome to LLMflation"; Epoch AI, "LLM inference prices have fallen rapidly but unequally across tasks"; Jeff Atwood, Coding Horror, "Parsing HTML the Cthulhu Way"; Peter Conrad, Medium, "Why You Should and Should Not Use Markdown"; John Gruber, Daring Fireball, Markdown; Google Search Central, "Introduction to Structured Data".