The first time I saw “Indexed, though blocked by robots.txt” in Search Console, I honestly thought Google was just messing with me. Like when your bank sends a message saying “payment failed successfully.” Same energy. If a page is blocked, how is it indexed? If it’s indexed, why is it blocked? This thing usually pops up when you’re already stressed about traffic, so yeah, great timing.
I remember refreshing Search Console like it would magically fix itself. Spoiler: it didn’t. Turns out this issue is more common than people admit, especially among small sites, local business pages, and even some big blogs that should know better.
How Google Ends Up Doing Something You Told It Not To Do
Here’s the part that feels illegal but isn’t. Google doesn’t actually need to crawl a page to index it. If other pages link to that URL, Google can add the bare URL to its index, like a name written in a guest register without anyone seeing the person, using the anchor text of those links as its only context. That’s also why these results often show up in search with no description. Robots.txt blocks crawling, not indexing. That distinction is usually skipped in tutorials, which is probably why so many people panic.
Think of robots.txt like a “do not enter” sign on a shop door. People outside can still talk about the shop, share its address, even post photos from earlier visits. Google hears that chatter and says, okay, this place exists. Boom, indexed.
I’ve seen this happen a lot when developers block folders like /wp-admin/ or staging URLs and forget that internal links or old backlinks still point there.
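For reference, the kind of robots.txt rules that trigger this look something like the sketch below (the paths are placeholders, swap in your own). Read each line for what it actually promises: no crawling. It says nothing about indexing.

```
# Example robots.txt rules (placeholder paths)
User-agent: *
Disallow: /wp-admin/   # stops crawling, not indexing
Disallow: /staging/    # Google can still index these URLs from links alone
```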
Where People Accidentally Mess This Up
Honestly, half the time this issue is self-inflicted. Someone blocks a page in robots.txt thinking it’s the cleanest way to hide it from Google. That’s like hiding your car by removing the steering wheel. The car is still visible, just unusable.
Another situation I ran into was during a site redesign. Old URLs were blocked in robots.txt but still had backlinks from guest posts written years ago. Google didn’t forget them. Google never forgets. It’s like that one embarrassing tweet from 2014.
Also, CMS plugins can make things worse. One wrong toggle in an SEO plugin and suddenly your site is sending mixed signals. Google hates mixed signals more than slow-loading pages, which says a lot.
Why Search Console Loves Scaring People
Search Console messages are technically accurate but emotionally violent. “Indexed, though blocked” sounds like a critical failure, but sometimes it’s just informational. Still, ignoring it isn’t always smart. I’ve seen pages stuck in this state for months, quietly muddying internal link signals and wasting crawl budget.
A lesser-known thing is that Google sometimes keeps these URLs indexed longer than expected, especially if they get even tiny traffic. I once saw a blocked page still showing impressions after six months. Not clicks, just impressions, like Google saying “I remember you.”
SEO Twitter talks about this occasionally, usually followed by someone saying “robots.txt doesn’t deindex pages” in all caps. Reddit threads on this topic are chaotic but helpful if you like learning through arguments.
What Actually Works When You Want It Gone
If you really don’t want a page indexed, robots.txt is not your friend. A noindex meta tag is way more honest. Let Google crawl the page, read the noindex, and drop it. If the page stays blocked, Google can never see the noindex, so the two directives just fight each other. Blocking the crawl while expecting deindexing is like ghosting someone and expecting closure.
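The tag itself is one line in the page’s head. This is the standard robots meta tag, nothing exotic; the only requirement is that Google can actually crawl the page to read it:

```html
<!-- In the <head> of the page you want removed from the index.
     Only works if the page is NOT blocked in robots.txt,
     because Google has to crawl the page to see it. -->
<meta name="robots" content="noindex">
```

For non-HTML files like PDFs, the X-Robots-Tag: noindex HTTP response header does the same job.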
In one client project, we removed the robots.txt block, added noindex, waited for Google to recrawl and drop the page, and only then blocked crawling again. It felt backwards, but it worked. SEO is full of stuff like that.
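If you want to sanity-check that the block is actually lifted before waiting on a recrawl, Python’s standard library can test a live robots.txt. A minimal sketch, with a placeholder domain and path:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (example.com is a placeholder).
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# True: Googlebot may crawl the URL, so it can see a noindex tag there.
# False: still blocked, and any noindex on the page is invisible to Google.
print(rp.can_fetch("Googlebot", "https://example.com/old-page/"))
```

One caveat: the stdlib parser doesn’t support every extension Google does (wildcards, for example), so treat it as a quick check rather than the final word.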
Also, internal links matter more than people think. If you keep linking to a blocked page from menus or footers, Google will keep noticing it. You’re basically inviting Google to a party and then locking the door.
When You Should Probably Care and When You Can Chill
If the affected URLs are useless things like filter pages, tag archives, or old test pages, it’s usually fine. Google seeing them doesn’t always mean ranking issues. But if it’s important pages, like your service pages or key blog posts, then yeah, that’s a problem.
I’ve noticed newer sites get hit harder by confusion like this. Older domains sometimes get a free pass because Google already understands their structure. Not fair, but that’s SEO.
There’s also the crawl budget angle. For small sites, it barely matters. For large sites, this issue can quietly waste crawl resources. Big ecommerce stores deal with this a lot and complain about it on LinkedIn like it’s therapy.
The Quiet Fix Most People Miss
One underrated fix is checking your sitemap. If a URL is blocked by robots.txt but still in your sitemap, that’s just sending Google mixed signals again. I’ve seen this exact mistake more times than I can count. It’s boring, but boring fixes work.
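Checking that by hand gets old fast. Here’s a rough sketch of automating the comparison with Python’s standard library, using placeholder URLs and assuming a single flat sitemap rather than a sitemap index:

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"  # placeholder domain

# Load the live robots.txt.
rp = RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()

# Pull the sitemap and collect every <loc> URL.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
with urllib.request.urlopen(f"{SITE}/sitemap.xml") as resp:
    tree = ET.parse(resp)
urls = [loc.text.strip() for loc in tree.findall(".//sm:loc", ns) if loc.text]

# Any URL you list in the sitemap but disallow in robots.txt
# is exactly the mixed signal described above.
for url in urls:
    if not rp.can_fetch("Googlebot", url):
        print("Mixed signal:", url)
```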
Also, canonical tags can influence this situation indirectly, with one catch: Google can’t read a canonical tag on a page it isn’t allowed to crawl. Once the block is lifted and the page canonicalizes to a clean URL, Google eventually figures it out. “Eventually” being the keyword here.
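For completeness, the canonical tag is also a single line in the head, with the same catch as noindex: Google only sees it on pages it can crawl. Placeholder URL below:

```html
<!-- On the duplicate or parameter page, pointing at the preferred URL.
     Invisible to Google while the page is blocked in robots.txt. -->
<link rel="canonical" href="https://example.com/clean-page/">
```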
Wrapping This Up Without Wrapping It Up
So yeah, “Indexed, though blocked by robots.txt” isn’t always a disaster, but it’s never random. It usually means your site is saying one thing while doing another. Google just reports the awkward truth.
If you’re seeing “blocked by robots.txt” warnings pile up at the bottom of your reports and ignoring them, you’re not alone. Most people do. But if those URLs matter, or if they keep popping up, it’s worth fixing the root cause instead of hiding it.
I’ve learned the hard way that robots.txt is more like a suggestion box than a command. Google listens, but it also listens to everyone else talking about your site. That’s usually where the problem starts, and funny enough, that’s also where the fix is hiding.

