A very interesting issue was recently documented by Beanstalkim.com (full article here) regarding Googlebot and a default server-side setting on WP Engine called Redirect Bots:

Essentially, [the Redirect Bots tool] redirects well known crawler user agents (bots) crawling the site to the parent page when they request a page ending in a number or in a query argument…

Redirecting bots – How this benefits you. | WP Engine

In short, with this option enabled (as it is by default), bots can’t crawl past page /9/ of pagination (instead they are redirected to the parent page or homepage).

For example, website.com/blog/page/1, website.com/blog/page/2, website.com/blog/page/3, etc. are all crawlable up to website.com/blog/page/9. Once the crawler hits double digits – website.com/blog/page/10 – a 301 redirect kicks in that pushes it to website.com/blog/page (or simply website.com, depending on the site configuration). Again, this only occurs for bots, like Googlebot. It does not affect actual users.
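
If you want to verify whether this behaviour is active on a particular site, a minimal check like the sketch below can help: it requests a deep pagination URL twice, once with a Googlebot-style user agent and once with a browser-style one, without following redirects, and prints the status code and any Location header. The URL and user-agent strings are placeholders.

# Minimal check: does a deep pagination URL 301 for bots but not for browsers?
import requests

PAGE_URL = "https://example.com/blog/page/10/"  # placeholder URL

USER_AGENTS = {
    "bot-style": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "browser-style": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
}

for label, ua in USER_AGENTS.items():
    # Don't follow redirects automatically, so the 301 itself is visible
    resp = requests.get(PAGE_URL, headers={"User-Agent": ua}, allow_redirects=False, timeout=10)
    print(f"{label}: {resp.status_code} -> {resp.headers.get('Location', '(no redirect)')}")

If the bot-style request comes back as a 301 to the parent page while the browser-style one returns a 200, a bot-redirect rule like this one is in play.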

It’s worth noting that this option can be disabled on WP Engine’s side, but this is not necessarily ideal. In WP Engine’s own words:

When a bot has to crawl every single page of your site, e.g. site.com/page/123, site.com/page/456, these are all unique uncached hits to your server which can contribute to a high server load, especially if your site is frequently crawled by a lot of bots. With the redirect bots setting turned on, this redirects pages like site.com/page/456 and site.com/page/123 to avoid excess hits to the server for extra pages of content, and sends them instead to site.com/page/.

Redirecting bots – How this benefits you. | WP Engine

So the positive here is that there’s a server performance upside – a limit on the depth of uncached requests generated by bots. This can be a useful measure for performance, especially during traffic spikes.

With that in mind though, let’s look at the issue that came to light in the Beanstalkim.com post. The primary concern was that search engine results would be hamstrung for older posts/pages (if Redirect Bots is enabled), potentially dropping the ranking of the entire site.

This is a very valid concern:

Not only does it impact content discovery but it will impact the PageRank flow through your blog and specifically to older posts. Some may not even be discoverable by bots.

Major SEO Issue With WPengine | Beanstalkim.com

Now bear with me, because what follows comes with a caveat…

When a sitemap.xml is in place that already lists the posts and pages, those URLs are being crawled directly anyway. Paginated pages can actually cause several issues for indexing, the most notable being getting flagged as duplicate or lower-quality content (paginated pages often only include excerpts, or simply titles). There’s a super in-depth article by Irwin W, published by Moz, explaining the pitfalls of pagination and its SEO implications:

Duplication
Paginated pages are vulnerable to duplication filtering by the search engines. Coding paginated pages correctly will let the search engines know that they are pagination pages and will not be flagged as duplication.


Thin Content
A lot of paginated pages do not have a significant amount of quality content on them. The Panda algorithm can penalize an entire site if it finds too much low quality content.

SEO Guide to Google Webmaster Recommendations for Pagination | Moz.com

This article is an excellent breakdown (with recommendations) of previously published guidance from Google on pagination for SEO.

Google has not given any indication that they will penalize pages if Googlebot can’t naturally follow pagination links. But there are some caveats. They have indicated that adding a noindex meta tag to paginated pages can lead to the links contained on those pages no longer being followed (noindex was a commonly accepted fix previously).

As Yoast notes in an excellent breakdown of best practices for making pagination play nice with Google:

For a while, SEOs thought it might be a good idea to add a noindex robots meta tag to page 2 and further of a paginated archive. This would prevent people from finding page 2 and further in the search results. The idea was that the search engine would still follow all these links, so all the linked pages would still be properly indexed.

The problem is that in late 2017, Google said something that caught our attention: long-term noindex on a page will lead to them not following links on that page. More recent statements imply that if a page isn’t in their index, the links on/from it can’t be evaluated at all – their indexing of pages is tied to their processing of pages.

This makes adding noindex to page 2 and further of paginated archives a bad idea, as it might lead to your articles no longer getting the internal links they need.

Pagination & SEO: Best Practices | Yoast.com
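
If a site adopted that older noindex approach and you want to confirm whether it is still in place, a rough audit along these lines can surface a robots meta tag on deeper archive pages. The archive URL pattern is a placeholder, and the regex is only a quick spot check, not a full HTML parse.

# Rough audit: do deeper archive pages still carry a robots "noindex" meta tag?
import re
import requests

ARCHIVE_BASE = "https://example.com/blog/page/{}/"  # placeholder archive pattern

for page in range(2, 6):  # spot-check a few deeper pages
    html = requests.get(ARCHIVE_BASE.format(page), timeout=10).text
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', html, re.IGNORECASE)
    flagged = bool(meta and "noindex" in meta.group(0).lower())
    print(f"page {page}: {'noindex present' if flagged else 'no noindex meta found'}")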

We can see that noindex and nofollow meta tags on pagination seem like a bad idea at this point when it comes to Google. But what about making the links effectively invisible to crawlers? The Disallow directive can be used to block bots from crawling paginated pages without negatively impacting the links contained on those pages, or flagging duplicate or “thin” content. This can be added to the site’s robots.txt (a quick way to sanity-check these rules follows below):

User-agent: *
Disallow: /page/1
Disallow: /page/2
Disallow: /page/3
Disallow: /page/4
Disallow: /page/5
Disallow: /page/6
Disallow: /page/7
Disallow: /page/8
Disallow: /page/9
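
One way to sanity-check rules like these before deploying them is Python’s built-in urllib.robotparser. The sketch below is a minimal example: it feeds the parser the directives above and asks which URLs would be blocked for a crawler. The example.com domain and the sample paths are placeholders, not anything tied to WP Engine.

# Sanity-check the Disallow rules above with Python's built-in robots.txt parser
from urllib.robotparser import RobotFileParser

# The same rules as above, generated programmatically
rules = ["User-agent: *"] + [f"Disallow: /page/{n}" for n in range(1, 10)]

rp = RobotFileParser()
rp.parse(rules)

# Placeholder URLs: deep pagination vs. a regular post
for path in ("/page/2/", "/page/10/", "/some-post/", "/blog/"):
    url = "https://example.com" + path
    print(f"{path}: {'allowed' if rp.can_fetch('Googlebot', url) else 'blocked'}")

Because robots.txt rules are prefix matches, Disallow: /page/1 also covers /page/10, /page/11 and so on, so deeper pages stay blocked as well.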

Ultimately, Google does not need to follow pagination in order to crawl and index pages – and in many cases following it can negatively impact the indexing of the site. On top of this, it can consume server resources unnecessarily. This is especially true if your posts and pages are already listed in sitemap.xml (and ideally that sitemap should be submitted to Google via Webmaster Tools/Search Console).
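
As a quick way to confirm that posts really are discoverable without pagination, you can pull the sitemap and list the URLs it exposes. The sketch below is a minimal example; the sitemap location is an assumption (Yoast, for instance, serves an index at /sitemap_index.xml that links out to child sitemaps, so adjust the URL accordingly).

# Quick check: which URLs does the sitemap expose directly to crawlers?
import xml.etree.ElementTree as ET
import requests

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder location

# The standard sitemap XML namespace
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
locs = [loc.text for loc in root.findall(".//sm:loc", NS)]

print(f"{len(locs)} URLs listed")
for url in locs[:10]:  # print a small sample
    print(url)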

In case you don’t have a sitemap.xml, or are having issues configuring it, I’d recommend installing the free Yoast plugin found here. You can check out the guide for configuring your sitemap.xml here. You may also want to check out our own guide on forcing your sitemap over HTTPS here.

Of course, there will be times when disabling the Redirect Bots tool is in the best interest of the site. The example given in WP Engine’s documentation is:

If a service you’re using to scan your site is having issues or receiving a 301 redirect, there’s a chance this is due to the redirect bots setting. For example, using Facebook’s URL debugger tool attempts to scrape a specific page of the site that ends in a number, using one of the well known user agents that is redirected by default. This causes the tool to show an error. With this setting turned off, it allows Facebook to properly scrape and analyze the data given.

Redirecting bots – How this benefits you. | WP Engine
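
The same kind of user-agent check sketched earlier works here too: swap the Googlebot string for Facebook’s crawler user agent (facebookexternalhit) and request the page the debugger is struggling with; a 301 to the parent page points at the bot-redirect rule rather than at the page itself.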

Please do bear in mind, though, that your pagination code should contain rel="next" and rel="prev" elements, as documented by Google here:

Hint to Google the relationship between the component URLs of your series with rel=”next” and rel=”prev”. This helps us more accurately index your content and serve to users the most relevant page (commonly the first page). Implementation details below.

Pagination with rel=“next” and rel=“prev” | Google Webmaster Central

Not applying this method may lead to a site being flagged for duplicate/thin content, as mentioned above.
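
A quick way to confirm your theme is actually emitting these elements is to fetch a middle page of the archive and look for the link tags in the returned markup. The URL below is a placeholder, and the pattern match is intentionally rough; a page in the middle of the series should expose both rel="next" and rel="prev".

# Spot-check: does a paginated archive page expose rel="next" / rel="prev" link tags?
import re
import requests

PAGE_URL = "https://example.com/blog/page/3/"  # placeholder; pick a middle page of the series

html = requests.get(PAGE_URL, timeout=10).text

for rel in ("next", "prev"):
    # Naive match for a <link rel="next"> / <link rel="prev"> tag in the markup
    match = re.search(rf'<link[^>]+rel=["\']{rel}["\'][^>]*>', html, re.IGNORECASE)
    print(f'rel="{rel}": {match.group(0) if match else "not found"}')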

On top of this, server resources can start to drain if the crawling becomes extensive – especially if there are multiple bots crawling at once and/or there is heavy traffic.

It’s best to keep this in mind for times when you’re expecting the most traffic to your site and want to squeeze as much headroom as possible for real visitors.
