AI-Generated Content Google Deindex Case Study


Real-World Incident: How Google Indexed 9,500 Pages — Then Quietly Removed Them All

A case study on AI-generated content, programmatic SEO, and the March 2026 Core Update

A LinkedIn connection recently asked me to analyse their website — makdatainsights.com — a market research report platform with around 10,000 pages. What I found was one of the most textbook examples of a Google indexing trap I've seen in real time.

Let me walk you through exactly what happened.

The Index Graph That Tells the Whole Story

Look at the Page Indexing report in Google Search Console. The pattern is striking:

  • 7 Feb 2026 — 78 pages indexed
  • 10 Mar 2026 — 9,528 pages indexed (peak — nearly the full 10K sitemap)
  • 11 Mar 2026 — Drop begins. 9,292 pages.
  • 27 Mar 2026 — March 2026 Core Update released
  • 24 Apr 2026 — Only 2,641 pages indexed. 7,047 pages sit in "Crawled – currently not indexed"

Google didn't ignore these pages. It crawled them, indexed them, reconsidered — and then systematically removed them.
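You don't have to eyeball the GSC chart to catch this happening. The URL Inspection API exposes the same coverage state per URL, so a periodic sample check can surface a removal wave early. Here's a minimal sketch, assuming an authorised google-api-python-client credential and a URL list exported from the sitemap; the file names are placeholders:

```python
# Sketch: pull the coverage state for a sample of report URLs via the
# URL Inspection API. Assumes an OAuth token with Search Console access
# already sits in token.json and a plain-text URL list in urls.txt;
# daily quota is limited, so inspect a sample, not all 10,000 pages.
from collections import Counter

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

SITE = "https://makdatainsights.com/"  # the verified GSC property

creds = Credentials.from_authorized_user_file("token.json")
gsc = build("searchconsole", "v1", credentials=creds)

def coverage_state(url: str) -> str:
    """Return Google's coverage state for one URL,
    e.g. 'Crawled - currently not indexed'."""
    body = {"inspectionUrl": url, "siteUrl": SITE}
    result = gsc.urlInspection().index().inspect(body=body).execute()
    return result["inspectionResult"]["indexStatusResult"]["coverageState"]

sample_urls = open("urls.txt").read().split()
print(Counter(coverage_state(u) for u in sample_urls).most_common())
```

Run against a few hundred URLs a week, a rising share of "Crawled - currently not indexed" is the early warning this site never saw.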

What Caused It?

After a deep dive into the site, the root cause became clear:

All ~10,000 pages were generated using Gemini AI.

Every single report page followed the same template, almost byte for byte (a quick way to verify this yourself is sketched after the list):

  • Same section headings: Key Market Insights → Trends → Key Drivers → Restraints → Opportunities
  • Same placeholder book image
  • Same sentence pattern: "Global [X] Market is projected to grow from $Y billion to $Z billion at N% CAGR"
  • The only variables? The market name and the numbers.
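The duplication is easy to demonstrate without any Google tooling. A rough sketch using requests and the standard library; the 90% threshold and the number-masking regex are illustrative choices, not anything Google has published:

```python
# Sketch: measure how template-heavy a sample of report pages is by
# comparing their visible text pairwise after masking the numbers.
# Crude tag stripping on purpose; this is a smoke test, not a crawler.
import itertools
import re
from difflib import SequenceMatcher

import requests

def visible_text(url: str) -> str:
    html = requests.get(url, timeout=10).text
    text = re.sub(r"<[^>]+>", " ", html)     # strip tags
    text = re.sub(r"\d[\d,.%]*", "#", text)  # mask figures like $4.2 billion or 7.5% CAGR
    return re.sub(r"\s+", " ", text).lower()

urls = open("sample_report_urls.txt").read().split()  # a few dozen pages is enough
texts = {u: visible_text(u) for u in urls}

pairs = list(itertools.combinations(urls, 2))
near_dupes = sum(
    SequenceMatcher(None, texts[a], texts[b]).ratio() > 0.9 for a, b in pairs
)
print(f"{near_dupes}/{len(pairs)} page pairs are over 90% identical once the numbers are masked")
```

On this site, masking the market name and the figures leaves essentially the same page, thousands of times over.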

This is the definition of Scaled Content Abuse — and Google's Helpful Content System is built specifically to detect and demote it.

The painful irony: Google's own tool (Gemini) was used to create content that Google's own search system then penalised.

"Crawled – Currently Not Indexed" Is a Quality Signal, Not a Technical Error

This is something many people misunderstand. When GSC shows "Crawled – currently not indexed," most people assume it's a rendering issue or a sitemap problem.

It usually isn't.

At Google Search Central Live Toronto (April 2026), Google confirmed:

"Crawled – Currently not indexed is rarely a technical rendering issue — it's usually a quality signal. Google may have tried the content and decided it 'wasn't good.'"

And on scaled content:

"Scaled Content Abuse is an important algorithm that could explain AI-generated content traffic drops — not the use of AI itself. Google wasn't against AI per se, but there are safeguards against scaling content and safeguards against what they decide to index."

Programmatic SEO is not the problem. Scaled Content Abuse is.

Why Recovery Is Hard From Here

Once Google applies a sitewide quality signal, it's not a page-by-page problem anymore. You can't fix it with:

  • Reconsideration requests (this is algorithmic, not a manual penalty)
  • Adding schema markup to thin AI pages
  • Rewriting AI content with more AI content
  • Waiting and hoping

The 2,641 pages still indexed are also at risk if nothing changes before the next core update.

The Right Path Forward

There are only two realistic options:

Option A — Rebuild with human expertise (slow but sustainable)

Identify the 200–500 reports with genuine search demand (a rough way to pull that shortlist from Search Console is sketched below). Have real analysts write or substantially edit the free preview content. Noindex the rest. A site with 300 excellent pages will outperform a site with 10,000 AI-generated ones.
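To make Option A concrete: the Search Analytics API can rank the report URLs by the demand they actually capture. A sketch, assuming the same Search Console credentials as earlier; the date range, the 100-impression floor, and the 500-page cap are illustrative, not rules:

```python
# Sketch: rank report pages by real search demand over the last year
# and keep the top candidates for human rewriting; everything else
# becomes noindex material. Thresholds below are arbitrary examples.
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

SITE = "https://makdatainsights.com/"
creds = Credentials.from_authorized_user_file("token.json")
gsc = build("searchconsole", "v1", credentials=creds)

body = {
    "startDate": "2025-04-01",
    "endDate": "2026-04-24",
    "dimensions": ["page"],
    "rowLimit": 25000,  # API maximum per request
}
rows = gsc.searchanalytics().query(siteUrl=SITE, body=body).execute().get("rows", [])

candidates = [
    r["keys"][0]
    for r in sorted(rows, key=lambda r: r["clicks"], reverse=True)
    if r["impressions"] > 100
][:500]

print(f"{len(candidates)} report pages show genuine demand and are worth rebuilding")
```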

Option B — Reduce footprint intentionally

Noindex the bulk of AI pages now, concentrate authority on the strongest pages, and build a smaller but indexable catalogue going forward.
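Either option stands or falls on doing the noindex part correctly: it has to be a robots meta tag or an X-Robots-Tag header, not a robots.txt block, because Google must still be able to crawl the page to see the noindex before it drops it. A minimal sketch of the header approach in a Flask-style route; the route, file names, and keep-list are placeholders, and any stack can set the same header:

```python
# Sketch: keep the curated report pages indexable and send
# "noindex, follow" on everything else. Flask here only illustrates
# "decide per URL at response time", not a recommended stack.
from flask import Flask, make_response, render_template

app = Flask(__name__)

# Output of the demand analysis above: one URL path per line.
KEEP_INDEXED = set(open("pages_to_keep.txt").read().split())

@app.route("/reports/<slug>")
def report(slug):
    resp = make_response(render_template("report.html", slug=slug))
    if f"/reports/{slug}" not in KEEP_INDEXED:
        # HTML equivalent: <meta name="robots" content="noindex, follow">
        resp.headers["X-Robots-Tag"] = "noindex, follow"
    return resp

# Do NOT also block these URLs in robots.txt: Google has to recrawl
# them to see the noindex before it removes them from the index.
```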

At 10,000 pages, this is a system-level fix — improving templates, content quality, and indexing signals at scale. It takes months of consistent improvement, not a one-time patch.

The #1 Immediate Action

Stop publishing new AI-generated pages immediately.

Every new thin page published now signals to Google that scaled content generation is ongoing — and deepens the sitewide quality hole.

Key Takeaways for Anyone Using AI for SEO

  1. AI content isn't banned — scaled thin AI content is. Google draws the line at "produced primarily to rank, not to help."
  2. Google indexes first, evaluates later. A fast initial index burst is not validation — it's a provisional crawl.
  3. "Crawled – not indexed" at scale = a content quality problem, not a technical one.
  4. Core updates don't create the penalty — they accelerate a signal that was already accumulating.
  5. Human expertise, original data, and genuine differentiation are what make AI-assisted content safe to publish at scale.

This is a real, ongoing incident — shared here because the pattern is exactly what many sites building programmatic + AI content pipelines are walking into right now. Hope it helps someone avoid the same trap.
