Don’t use 403s or 404s for rate limiting  |  Google Search Central Blog  |  Google for Developers


Friday, February 17, 2023

Over the last few months we noticed an uptick in website owners and some content delivery networks
(CDNs) using 404 and other 4xx client errors (but not
429) to try to reduce Googlebot’s crawl rate.

The short version of this blog post is: please don’t do that; we have documentation about
how to reduce Googlebot’s crawl rate.
Read that instead and learn how to effectively manage Googlebot’s crawl rate.

Back to basics: 4xx errors are for client errors

The 4xx errors servers return to clients are a signal from the server that the
client’s request was wrong in some sense. Most of the errors in this category are pretty benign:
“not found” errors, “forbidden”, “I’m a teapot” (yes, that’s a thing). They don’t suggest that
anything is wrong with the server itself.

The one exception is 429, which stands for “too many requests”. This error is a clear
signal to any well-behaved robot, including our beloved Googlebot, that it needs to slow down
because it’s overloading the server.
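To make the 429 signal concrete, here is a minimal sketch of what such a response could look like. The function name `too_many_requests_response` is hypothetical, and the `Retry-After` header is our own addition for illustration: it is a standard HTTP header that may accompany a 429, but this post doesn’t say how any particular crawler treats it.

```python
def too_many_requests_response(retry_after_seconds):
    """Build a (status, headers, body) triple for an overloaded server.

    Hypothetical helper: the triple format is an assumption made for this
    sketch, not the API of any specific web framework.
    """
    if retry_after_seconds < 0:
        raise ValueError("retry_after_seconds must be non-negative")
    headers = {
        # Standard HTTP header: how many seconds the client should wait.
        "Retry-After": str(int(retry_after_seconds)),
        "Content-Type": "text/plain; charset=utf-8",
    }
    body = b"Too Many Requests\n"
    # 429 is the one 4xx code that tells a client to slow down.
    return 429, headers, body
```

A server under load would send this instead of a 403 or 404, so the client learns that the problem is the request rate and not the URL.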

Why 4xx errors are bad for rate limiting Googlebot (except 429)

Client errors are just that: client errors. They generally don’t suggest an error with the server:
not that it’s overloaded, not that it’s encountered a critical error and is unable to respond
to the request. They simply mean that the client’s request was bad in some way. There’s no
sensible way to equate, for example, a 404 error with the server being overloaded.
Imagine if that were the case: you get an influx of 404 errors from your friend accidentally
linking to the wrong pages on your site, and in turn Googlebot slows down its crawling. That
would be pretty bad. The same goes for 403, 410, and 418.

And again, the big exception is the 429 status code, which translates to “too many
requests”.
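The distinction above can be sketched in a few lines. This is only an illustration of the reasoning in this post, not Googlebot’s actual logic; the helper names `is_client_error`, `signals_slow_down`, and `summarize` are made up for the example.

```python
from collections import Counter

SLOW_DOWN_STATUS = 429  # the one 4xx code that means "too many requests"


def is_client_error(status):
    """True for any status in the 4xx range."""
    return 400 <= status <= 499


def signals_slow_down(status):
    """True only for 429; other 4xx codes say nothing about server load."""
    return status == SLOW_DOWN_STATUS


def summarize(statuses):
    """Count how many responses ask the client to slow down."""
    counts = Counter()
    for status in statuses:
        if signals_slow_down(status):
            counts["slow_down"] += 1
        elif is_client_error(status):
            counts["client_error"] += 1
        else:
            counts["other"] += 1
    return dict(counts)
```

With a burst of broken links such as `summarize([404, 404, 403, 429, 200])`, only the single 429 lands in the `slow_down` bucket; the 404s and the 403 are counted as plain client errors.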

What rate limiting with 4xx does to Googlebot

All 4xx HTTP status codes (again, except 429) will cause your content
to be removed from Google Search. What’s worse, if you also serve your robots.txt file with a
4xx HTTP status code, it will be treated as if it didn’t exist. If you had a rule
there that disallowed crawling your dirty laundry, that rule no longer applies and Googlebot
now also knows about it; not great for either party involved.
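The robots.txt consequence is easy to miss, so here is a rough sketch of it. It models only what this post describes (a 4xx other than 429 means “no robots.txt”); the function name `effective_disallow_rules` and the very simplified parsing are assumptions for illustration, and real robots.txt handling is more involved.

```python
def effective_disallow_rules(status, body):
    """Return the Disallow paths a crawler would see, per this post's description.

    Hypothetical helper: ignores user-agent groups and Allow rules entirely.
    """
    if 400 <= status <= 499 and status != 429:
        # Treated as if the file didn't exist: no rules apply at all.
        return []
    if status == 200:
        rules = []
        for line in body.splitlines():
            line = line.strip()
            if line.lower().startswith("disallow:"):
                path = line.split(":", 1)[1].strip()
                if path:
                    rules.append(path)
        return rules
    # Other statuses are not covered by this sketch.
    return None
```

Served with a 200, a file containing `Disallow: /laundry/` yields `["/laundry/"]`; the same file served with a 403 yields an empty list, so nothing is off limits anymore.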

How to reduce Googlebot’s crawl rate, the right way

We have extensive documentation about
how to reduce Googlebot’s crawl rate
and also about
how Googlebot (and Search indexing) handles the different HTTP status codes;
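be sure to check them out. In short, you want to do either of these things: use Search Console
to temporarily reduce the crawl rate, or return a 500, 503, or
429 HTTP status code to Googlebot when it’s crawling too fast.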
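One way to follow the advice in this post is to answer with 429 once a client goes over a request budget, rather than reaching for a 403 or 404. The sketch below is a minimal sliding-window limiter; the class name `SlidingWindowLimiter`, its parameters, and the choice of a sliding window are all assumptions for illustration, not something our documentation prescribes.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for a single client."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of recently accepted requests

    def status_for(self, now):
        """Return the HTTP status to send for a request arriving at `now`."""
        # Forget accepted requests that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            # Over budget: say "too many requests", never 403 or 404.
            return 429
        self._timestamps.append(now)
        return 200
```

Passing the current time in as `now` (for example from `time.monotonic()`) keeps the class easy to test. In a real deployment you would keep one limiter per client and tune the budget to what your server can actually handle.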

If you need more tips or clarifications, catch us on
Twitter or post in
our help forums.




