Google Webmaster Tools gets a bit of flak because you are “giving Google too much information”. I say take off your tinfoil hats: there is method behind the madness, and GWT can be an invaluable tool for crawl diagnostics – as this post proves.
A few weeks ago a publication of ours underwent an overhaul. Everything you can think of changed, including content, design and backend. We heavily tested all relevant redirects, optimised the site for its new angle and pushed it live. Our traffic from search engines dropped considerably, which we did expect, but not to this extent – our website was removed from the search results in its entirety!
As a result, our search traffic tanked.
A little concerned, I started digging around and noticed that the cache dates on the indexed pages all pre-dated the day we pushed the website live. Now I’m really concerned: have we been penalised? Ack!
I logged into Google Webmaster Tools, checked our crawl stats and saw that Google had stopped crawling our website completely – to me this was further evidence Google had penalised us.
Frustrated and baffled, I checked the obvious things: I didn’t have a robots.txt disallow, a meta robots noindex or anything like that. I did notice our robots.txt file was blank, but I often upload a blank robots.txt file to stop 404 errors in our server logs, so I didn’t think this would be a problem.
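For anyone wondering why a blank robots.txt didn’t worry me, here is a minimal sketch using Python’s standard urllib.robotparser. The helper name allows and the example.com URL are my own placeholders, and this is the standard library’s parser, not Google’s, so treat it as an illustration only.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return parser.can_fetch(agent, url)


page = "http://example.com/some-page"  # hypothetical URL

# A blank file contains no rules, so nothing is blocked.
blank_file = []

# An explicit site-wide block, for comparison.
block_everything = ["User-agent: *", "Disallow: /"]

print(allows(blank_file, page))
print(allows(block_everything, page))
```

The blank file allows the page and the “Disallow: /” file blocks it, which is why a genuinely empty robots.txt is harmless.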
Going back to Webmaster Tools, further frustrated, I checked the crawl errors and found a ton of errors under the “unreachable” header. At this point it was clear that something was wrong on our end and that it was related to our robots.txt file.
Going back to the robots.txt file, it appeared blank, but not because I had uploaded a blank file – my CMS was taking hold of the URL request and throwing a 500 server error!
Google treats a 500 internal server error as if the destination is unreachable (which makes perfect sense). If Google cannot access your robots.txt file and the HTTP status code is something other than 404 (e.g. 500), Google will assume the website is under a “Disallow: /” for as long as the robots.txt file returns that error.
From Google:
URL unreachable /robots.txt unreachable
Before we crawled the pages of your site, we tried to check your robots.txt file to ensure we didn’t crawl any pages that you had roboted out. However, your robots.txt file was unreachable. To make sure we didn’t crawl any pages listed in that file, we postponed our crawl. When this happens, we return to your site later and crawl it once we can reach your robots.txt file. Note that this is different from a 404 response when looking for a robots.txt file. If we receive a 404, we assume that a robots.txt file does not exist and we continue the crawl.
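If you want to check your own site for the same problem, the rule above can be sketched in a few lines of Python. This is only my reading of the behaviour described in this post, not Google’s actual implementation; the function names crawl_decision and robots_status are made up for the example.

```python
import urllib.error
import urllib.request


def crawl_decision(status_code):
    """Map a robots.txt HTTP status to the behaviour described in this post."""
    if status_code == 200:
        return "obey"      # file is readable: follow whatever rules it contains
    if status_code == 404:
        return "crawl"     # no robots.txt exists: the crawl continues
    return "postpone"      # anything else (e.g. 500, or no response): unreachable


def robots_status(site):
    """Return the HTTP status of <site>/robots.txt, or None if nothing answered."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code    # 4xx and 5xx responses are raised as exceptions
    except OSError:
        return None        # DNS failure, refused connection, timeout


if __name__ == "__main__":
    status = robots_status("http://example.com")  # hypothetical site
    print(status, crawl_decision(status))
```

The important part is checking the status code and not just the body: our robots.txt looked blank in the browser, but the response behind it was a 500.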
This is one of the more peculiar errors we have seen affect crawl rates. Without Google Webmaster Tools it would have taken far longer to resolve, and I’d probably be here next month with the same problem. In my opinion this type of error is high priority: Google Webmaster Tools should show a warning, and perhaps send an email, if it is discovered during a crawl.





Thanks for sharing this. I appreciate it. If more people would write about real problem situations like this, everybody’s lives would be simpler.
Thanks again.
Thus we are ALL reminded of how critical it is for us to VALIDATE our websites – especially when making major page updates. G’s Web Tools are extremely helpful…
There is a difference between G.A. and WMT. Using Google Analytics is letting the fox guard the chickens, but not using WMT is not a good idea. They already have the data, and WMT is a feedback system to help the webmaster, site owner and SEO person. There is very little Google can gain from us in WMT.
I think even Analytics isn’t a problem myself. Check out this video from Matt: http://video.google.com/videoplay?docid=-9028425054136856586 I know we should take what he says with a pinch of salt, but it isn’t very often that he is that direct with his answers. I doubt they’d use it, to be honest; if something like that leaked out it would be a PR nightmare…