Duplicate Content and Canonical URLs: WordPress SEO Tutorial (updated February 2014)
WordPress has the potential to mess up your website’s SEO by generating archive pages (home page archives, monthly archives, daily calendar archives, categories, tags and search results) that contain duplicate content.
There are also other ways WordPress can generate duplicate content, but recent versions of WordPress fix those with canonical URLs (as long as your theme/plugins don’t break the WordPress core fix).
For Stallion Theme users, Stallion Responsive deals with everything below except overusing categories/tags (adding the same content to many categories and tags), which is a user issue (see later).
WordPress SEO Canonical URLs on Paged Comments
Before dealing with duplicate content on archive pages, let’s check that your WordPress theme and plugins don’t break the current WordPress core canonical URL fix for duplicate content on posts and pages with lots of comments and paged comments enabled.
WordPress core adds canonical URLs to posts and pages with paged comments. Paged comments occur when, under
Settings >> Discussion
the setting
Break comments into pages with XX top level comments per page and the Last/First page displayed by default
is ticked and a post/page has more than XX top-level comments.
This setting splits posts with many comments across multiple pages (paginated comments). For example, the Stallion WordPress SEO Plugin post currently (February 2014) has just under 60 comments split across 3 paged comments pages, which results in 4 URLs for that post and its comments:
URL 1: https://stallion-theme.co.uk/stallion-wordpress-seo-plugin/
URL 2: https://stallion-theme.co.uk/stallion-wordpress-seo-plugin/comment-page-1/#comments
URL 3: https://stallion-theme.co.uk/stallion-wordpress-seo-plugin/comment-page-2/#comments
URL 4: https://stallion-theme.co.uk/stallion-wordpress-seo-plugin/comment-page-3/#comments
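The URL count is simple arithmetic: one permalink for the post plus one comment-page-N URL per page of comments. Below is a minimal Python sketch of that calculation. It assumes pretty permalinks that end in a slash, and that only top-level comments count toward the per-page limit (the XX in the Discussion setting). `paged_comment_urls` is a hypothetical helper, not a WordPress function, and `example.com` stands in for your own domain:

```python
import math


def paged_comment_urls(permalink, top_level_comments, per_page):
    """Return the post permalink plus one comment-page-N URL per comments page.

    Hypothetical helper: assumes pretty permalinks ending in "/" and that
    pagination counts top-level comments only.
    """
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    pages = math.ceil(top_level_comments / per_page)
    urls = [permalink]
    if pages > 1:  # a single page of comments isn't paginated at all
        urls += [f"{permalink}comment-page-{n}/" for n in range(1, pages + 1)]
    return urls


# Hypothetical figures: 59 top-level comments at 20 per page -> 3 pages, 4 URLs.
for url in paged_comment_urls("https://example.com/my-post/", 59, 20):
    print(url)
```

With the same hypothetical 20-per-page setting, a post with 15 top-level comments keeps just its one permalink.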
Before the Stallion Responsive v8.0 release (early 2014), if you loaded any of those 4 URLs and viewed the source, you’d find an identical canonical URL pointing to the main post in the head (near the top of the code), which is default WordPress behavior. It used to look like this:
<link rel='canonical' href='https://stallion-theme.co.uk/stallion-wordpress-seo-plugin/' />
If you are using another WordPress theme, you almost certainly want the same canonical URL on all paginated comment pages. Basically, view the source of your main article and check its canonical URL, then load comment page 1, 2, 3 etc…; if you were running Twenty Fourteen, for example, you’d find the exact same canonical URL. If you see this, your site is running the latest WordPress code and Google won’t consider the paginated comment pages duplicate content.
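Viewing source page by page works fine. If you want to compare several pages at once, here is a minimal Python sketch using only the standard library's `html.parser`. It assumes you've already saved each page's HTML; fetching is left out so the sketch runs offline. The function names are illustrative, not part of WordPress or any plugin:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonicals.append(attributes.get("href"))


def canonical_urls(html):
    """Return all canonical hrefs found in an HTML document (normally 0 or 1)."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


def all_share_canonical(pages):
    """True if every page declares exactly one canonical and they all match."""
    found = [canonical_urls(html) for html in pages]
    if any(len(urls) != 1 for urls in found):
        return False
    return len({urls[0] for urls in found}) == 1


main_post = "<head><link rel='canonical' href='https://example.com/my-post/' /></head>"
comment_page_2 = "<head><link rel='canonical' href='https://example.com/my-post/' /></head>"
print(all_share_canonical([main_post, comment_page_2]))  # True
```

A page with no canonical tag, or with a different one, makes the check return False. That is the signal that a theme or plugin may be interfering.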
Stallion Responsive can modify these pages so they are mostly unique content rather than mostly duplicate content (via various theme options, turned on for this site). This means Stallion Responsive users don’t want the default WordPress fix that adds an identical canonical URL to the paginated comment pages (there’s a theme option to change the default canonical URLs).
View the 4 URLs above: URL 1 shows the full original post and unique comments
URL 2 loads a snippet of the original post and unique comments
URL 3 loads a snippet of the original post and unique comments
URL 4 loads a snippet of the original post and unique comments
Also note that the title tag and H1 heading for each page are unique (another Stallion Responsive SEO feature). Other than the short snippet (an excerpt of the post’s content) on URLs 2+, the pages above are unique. Compare the amount of unique content (the comments) to the size of the snippet (the duplicate content) and I think you’ll agree the 4 pages above are unique content, with unique title tags and H1 headings.
You can see the SEO benefit of this setup with a Google site: search, which shows what’s indexed on a site:
site:https://stallion-theme.co.uk/
Paste the above into a Google search and it will show every page from this site that’s indexed. A more specific search:
site:https://stallion-theme.co.uk/stallion-wordpress-seo-plugin/
Will show everything indexed under that post. You’ll see the 4 URLs mentioned above. Stallion also includes a feature (Stallion SEO Super Comments) that turns large comments into post-like pages, so you’ll also see lots of comments indexed in their own right.
How WordPress Themes Handle Duplicate Content
OK, back to other WordPress themes :-) If you ran another theme, you wouldn’t have the above SEO features and you’d see the original post content (all of it) copied on all 4 pages above. That would be too much duplicate content, and you’d want the WordPress canonical URL fix, which adds an identical canonical URL to all 4 pages. It looks like this in the source:
<link rel='canonical' href='https://stallion-theme.co.uk/stallion-wordpress-seo-plugin/' />
You won’t find this code on URLs 2 to 4 above (only on the first URL), since Stallion Responsive changes their content so it’s unique (as described earlier). If you aren’t a Stallion Theme user, check your paged comments and make sure they have a canonical URL pointing to the main article only. The canonical URL tells search engines like Google that they can spider all the pages, BUT all link benefit and SERPs should be consolidated to the main post: the preferred canonical URL.
Adding canonical URL support this way was a good SEO move by the WordPress development team. Multiple pages with almost identical content aren’t going to generate extra search engine traffic, but they would waste link benefit and might trip search engine duplicate content filters. Tripping the filters is unlikely (Google is very good at combining similar pages into one indexed URL), but better SEO safe than SEO sorry :-)
If you have paged comments on your site, view the source of page 2 (for example) and check for a canonical URL tag pointing to the preferred canonical URL (the main post). If it’s missing, either you are using an old version of WordPress (I believe canonical URL support was added in WordPress 2.9) or the WordPress theme or a plugin you are using is removing the WordPress core canonical URLs.
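For non-Stallion themes, the canonical on a paged comments page should be the post URL with the comment-page-N/ segment and the #comments fragment removed. Below is a minimal Python sketch of that check. It assumes pretty permalinks like the URLs earlier in this tutorial; non-pretty permalinks aren't handled. `expected_canonical` and `canonical_is_correct` are hypothetical helper names:

```python
import re
from urllib.parse import urlsplit, urlunsplit

# A trailing "/comment-page-N/" path segment, as produced with pretty permalinks.
COMMENT_PAGE_SEGMENT = re.compile(r"/comment-page-\d+/?$")


def expected_canonical(page_url):
    """Main post URL that a paged comments page's canonical should point to."""
    parts = urlsplit(page_url)
    path = COMMENT_PAGE_SEGMENT.sub("/", parts.path)
    # Fragments such as #comments never reach the server, so drop them too.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def canonical_is_correct(page_url, canonical_href):
    """True if the canonical found in the page's source points at the main post."""
    return canonical_href == expected_canonical(page_url)


print(expected_canonical(
    "https://stallion-theme.co.uk/stallion-wordpress-seo-plugin/comment-page-2/#comments"
))
# https://stallion-theme.co.uk/stallion-wordpress-seo-plugin/
```

The comparison is an exact string match, so an http/https or trailing-slash mismatch between the two URLs will also show up as a failure.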
WordPress Duplicate Content on Archives
Now to the main WordPress duplicate content issue: WordPress archives.
Because WordPress reuses content on archive parts (categories, tags etc…) of a site there’s the potential for duplicate content issues.
There are many WordPress themes that reuse the FULL content of a post on archive parts of a site, including the default WordPress themes Twenty Ten, Twenty Eleven, Twenty Twelve, Twenty Thirteen and Twenty Fourteen. Basically, if you view an archive page like your categories and tags and see multiple full posts, you have the potential for a duplicate content issue.
Using the full content means every post is duplicated in full on one or more parts of a site. Say your site has monthly archives, categories, tags and the default home page archives (ten posts on the home page). All your posts will be reused in full 4 times, and that assumes you don’t add your posts to multiple categories and tags (worse if you do) and don’t use the calendar widget!!!
There’s also the issue that ten full posts on an archive page make a massive page to load, especially if you add rich content (images, for example) to many of your posts!
Fortunately this duplicate content SEO problem can be easily reduced (practically removed) with a WordPress theme like the Stallion Responsive Theme, which shows a short excerpt of each post on archive parts of the site instead of the full content. For example, take a look at the Stallion Responsive Tutorials category archive. It lists multiple archived posts, but each is a short excerpt, significantly reducing the possibility of duplicate content issues. Search Google for “Stallion Responsive Tutorials” and you’ll find that category page is number one in Google for that SERP (it’s not a money SERP, but it shows Google indexes and ranks these pages fine).
If your posts tend to be short (not a lot of content), you still run the risk of duplicate content issues. Imagine your excerpts are set to 155 characters and every post is 155 characters or fewer: your posts will be repeated in full on all archives. There’s not much you can do about this beyond writing longer posts. I’d suggest minimizing the number of archive types: add each post to only one category OR tag, and don’t use any other type of archive (no dated archives and no calendar widget).
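To spot the posts this applies to, you can compare each post’s length with the excerpt length. Below is a minimal Python sketch. It assumes you’ve exported each post’s plain text (HTML stripped) and that excerpts are character-based, like the 155-character example above. WordPress’s own automatic excerpt counts words (55 by default), so adjust the measure to match your theme. `fully_repeated_posts` and the sample posts are hypothetical:

```python
def fully_repeated_posts(posts, excerpt_chars=155):
    """Titles of posts short enough to appear in full wherever an excerpt is shown.

    posts maps a post title to its plain-text content (HTML already stripped).
    """
    return sorted(
        title
        for title, content in posts.items()
        if len(content.strip()) <= excerpt_chars
    )


posts = {
    "Quick tip": "Use one category per post.",
    "Full tutorial": "Step one... " * 50,
}
print(fully_repeated_posts(posts))  # ['Quick tip']
```

Any post this returns is shown in full on every archive it appears on. Those posts gain the most from trimming archive types and categories/tags.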
Code Fix to WordPress Duplicate Content
There’s a very easy (easy when you know how :-)) fix for this duplicate content issue for non-Stallion Responsive theme users. Each theme is built differently, so the easiest way to fix a theme that uses full content on archive pages is the Post Teaser plugin, which I’ve made an SEO version of (it needs updating). The Post Teaser WordPress SEO Plugin generates an excerpt instead of the full content on archive posts. You can set the excerpt to any size, and with my SEO version the anchor text of the continue reading link is SEO’d.
If you’d rather fix the issue at theme level:
Search through the theme’s PHP files for this code:
the_content();
and replace it with
the_excerpt();
You’ll need to do this ONLY for code related to archive posts, NOT in the template files for Posts and Pages. Those are usually generated by single.php and page.php (don’t change those two files). For most themes you’ll be looking to change the files index.php, archive.php, category.php, tag.php and search.php. In some newer themes the code can be located in files like content.php, content-image.php and other content-*.php files.
It really is that simple :-)
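Finding every call by hand in a big theme can be tedious. Below is a minimal Python sketch that lists which template files call the_content() and on which lines, skipping single.php and page.php as described above. `review_templates` and `scan_theme` are hypothetical helpers, and the actual replacement stays a manual edit:

```python
import re
from pathlib import Path

# Post/page templates the tutorial says to leave alone.
SKIP_FILES = {"single.php", "page.php"}
# Matches the_content( but not get_the_content(: \b needs a non-word char before "the".
THE_CONTENT_CALL = re.compile(r"\bthe_content\s*\(")


def review_templates(templates):
    """Map template name -> line numbers calling the_content(), skipping SKIP_FILES.

    templates maps a file name (relative path) to that file's PHP source.
    """
    hits = {}
    for name, source in sorted(templates.items()):
        if Path(name).name in SKIP_FILES:
            continue
        lines = [
            number
            for number, line in enumerate(source.splitlines(), start=1)
            if THE_CONTENT_CALL.search(line)
        ]
        if lines:
            hits[name] = lines
    return hits


def scan_theme(theme_dir):
    """Read every .php file under theme_dir and review it."""
    root = Path(theme_dir)
    templates = {
        str(path.relative_to(root)): path.read_text(encoding="utf-8", errors="replace")
        for path in root.rglob("*.php")
    }
    return review_templates(templates)


example = {
    "index.php": "<?php while ( have_posts() ) : the_post(); the_content(); endwhile; ?>",
    "single.php": "<?php the_content(); ?>",
    "content.php": "<?php\n// shared template\nthe_content( 'Continue reading' );\n?>",
}
print(review_templates(example))  # {'content.php': [3], 'index.php': [1]}
```

Check newer themes for one thing before editing. If a shared file like content.php is also loaded by single.php, swapping the call outright would turn single posts into excerpts too. In that case, a conditional such as `if ( is_singular() ) { the_content(); } else { the_excerpt(); }` keeps full content on posts and pages.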
Reducing WordPress Duplicate Content Further
It’s quite easy to generate duplicate content with WordPress even with the above fixes. Here are a few tips for avoiding the obvious pitfalls.
Monthly Archives Widget: If you use the default home page archives (ten archived posts on the home page) and monthly archives, the home page and the current month’s archive are pretty much identical. I NEVER use monthly archives. Not only do you run the risk of duplicate content (copying the home page), they also add ZERO SEO benefit, since monthly archives never rank for anything. Don’t use monthly archives. If you do, set the widget to show only on the home page and other dated archive pages (this is built into the Stallion theme, for example), so you don’t waste the PR/link benefit that loading it sitewide would.
Calendar Archives Widget: The Calendar archives are even worse. For starters, the Calendar widget is broken (IMO, when the title attribute/hover tooltip of a link includes the entire post, it’s broken!). The Calendar archive breaks your posts into days. On most sites you aren’t going to publish multiple posts every day, so if your theme uses the full post content on archives, each daily archive is basically a duplicate of the post. Like the monthly archive, there’s no SEO value in having daily archives, so don’t use them. The Calendar widget is so bad, both user-wise and SEO-wise, that I’ve removed it from the Stallion theme.
Too Many Categories/Tags: I see a tendency for those in the make money online niche to overuse categories and tags. You’ll find sites where posts are added to multiple categories and loads of barely relevant tags. An example might be a post added to
Categories > Make Money Online, Earn Money, etc... Tags > Money, Wealth, Earnings, Online, Earn etc...
You might think this is a good SEO idea: the post is linked from more pages (easier to find), and you feel like you have a page targeting each of those one-keyword SERPs. But getting all those tags and categories indexed wastes link benefit for no traffic gain. Do you honestly believe your site is going to rank for one-keyword SERPs like Money, Wealth, Earnings, Online or Earn just by creating a tag or category archive page? You might be able to gain long-tail SERPs like “Make Money Online Easily”, but the one-keyword SERPs above are hard. If you want a SERP like “Money” or even “Make Money Online”, it’s almost certainly going to need to be targeted on the site’s home page, which is where most links point. Basically, you target the hardest SERPs on the page with the most backlinks/link benefit (usually the home page).
On top of that, if you add every post to a handful of categories and 20+ tags, your tag archive pages in particular are going to be practically identical. Think about it: if you have two tags, “Earn” and “Earnings”, you’re going to add the exact same posts to both. The two archives will be identical, AKA you’re generating duplicate content by over-tagging.
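If you suspect you’ve over-tagged, tags whose archives list exactly the same posts are easy to find once you have each tag’s post IDs. Below is a minimal Python sketch. It assumes you can export a mapping of tag/category name to post IDs (how you export it is up to you). `identical_archives` is a hypothetical helper, and the names and IDs are made up:

```python
from collections import defaultdict


def identical_archives(taxonomy):
    """Group tag/category names whose archives would list exactly the same posts.

    taxonomy maps a tag or category name to the post IDs assigned to it.
    """
    by_posts = defaultdict(list)
    for name, post_ids in taxonomy.items():
        by_posts[frozenset(post_ids)].append(name)
    # Only groups with two or more names are duplicate archives.
    return sorted(sorted(names) for names in by_posts.values() if len(names) > 1)


tags = {
    "Earn": [101, 102, 103],
    "Earnings": [103, 101, 102],
    "Wealth": [102],
    "Money": [101, 102, 103],
}
print(identical_archives(tags))  # [['Earn', 'Earnings', 'Money']]
```

Each group it returns is a set of archive pages showing the same posts, so all but one of them adds duplicate content.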
When thinking about creating a category or a tag, I ask one SEO/user question (SEO-wise there’s no difference between the structure of a tag page and a category page):
Will this new category/tag be capable of generating search engine traffic in its own right, and/or does it serve a role for my visitors?
If you can’t answer yes to this question, don’t create the category/tag.
Example: should I create a tag or category on this site with the one keyword “WordPress”? Well, that’s a very easy one: it’s a big NO. A tag or category is highly unlikely to rank high for the one-word SERP WordPress. It also adds nothing to my visitors’ experience, because pretty much every page of this site is about WordPress. My only chance of ranking well for the WordPress SERP is the home page, and I know it’s such a hard SERP that it’s not worth my time optimizing solely for it.
Another example: should I create a tag or category on this site with the two keywords “WordPress SEO”? This is a harder one, but it’s currently a no (that might change in the future if I add a lot more content). A tag or category is highly unlikely to rank high for the two-word SERP “WordPress SEO”. It’s a hard SERP that needs backlinks, and not many webmasters are going to naturally link to a category/tag. There are also already pages on this site, like the WordPress SEO Tutorial, that to some degree target the WordPress SEO SERP. I’d be better off spending my PR/link benefit on the WordPress SEO Tutorial page and creating categories that stand a chance of gaining SERPs or are useful to my visitors.
Stallion WordPress SEO Plugin
The Stallion WordPress SEO Plugin can also help with duplicate content issues. If you have made the mistake of creating too many tags and categories (especially tags), consider using the Stallion SEO plugin to redirect their SERPs and link benefit back to the home page. The Stallion plugin can also redirect link benefit and SERPs from dated archives to the home page. So if you’ve been using the monthly archives widget and/or the calendar widget, you can fix the mistake. For Stallion Responsive theme users, the Stallion WordPress SEO Plugin features are already part of the theme under Advanced SEO.
David Law
Stallion Responsive Theme Paged Comments Advanced SEO
I’ve been working on new features for Stallion for well over a year, but as explained in other comments, I had to put work on Stallion on hold for health reasons. I’ve been able to spend about a week working on Stallion and have added some awesome new features. One I’ve almost finished is really cool.
For those with posts/pages with a lot of comments (like this Page you are reading now, over 200 comments) you have to spread them over Paged Comments (a core WordPress setting). This Page currently has 10 Paged Comments.
There are SEO negatives and benefits to this setup. Since I use the Stallion SEO Super Comments feature, many of the comments generate a post-like page with the comment’s content. Those pages need at least one link to be indexed, and the Paged Comments (which Google etc… index) supply those links. Having Paged Comments means the comment content you and your visitors generate can be ‘archived’ over multiple Paged Comments pages (like I said, this one is broken over 10 pages), and these can potentially generate their own traffic.
The bad news is that the top of every Paged Comments page shows the entire content of the main post, which is a potential duplicate content issue. For this Page it’s not such a problem, since I only added a couple of lines of text, so the 10 Paged Comments are 95% unique content (almost all of it is my and my visitors’ comment text). In most scenarios, though, that can be a lot of content duplicated over and over again. I have a jokes site with jokes that have thousands of comments split over 100+ paged comments pages, each with the joke’s entire content copied over and over again (far from ideal!). Also bad SEO news: the title elements (title tags) are all the same. On my jokes site I have hundreds of indexed paged comments pages with the same title tag, again far from SEO ideal (ideally all pages have unique title tags).
You’ll find many SEOs and WordPress SEO Plugin developers advise noindexing the paged comments (or not using them if you don’t have too many comments). The All In One SEO Plugin, for example, adds a canonical URL to the main post on all Paged Comments pages so only the main post is indexed. I don’t like this option, and the next release of Stallion (Stallion 8) solves this issue:
1st: The paged comments pages (pages 2, 3, 4 etc…) can be set to show an excerpt of the main post, so only the main post loads the post’s entire content. The paged comments become a bit like comment archives, with a snippet of the original post and the majority of the content being unique comments.
2nd: If you set the Stallion All In One SEO Title Tag and the four Stallion Keyphrases (part of the Post/Page edit screen), they, along with the original post title, will be used as the title tags for the paged comments pages:
Main Post/Page uses the Stallion All In One Title Tag.
Paged Comments 1, 7, 13 and 19 use the Stallion Related Keyphrase 1.
Paged Comments 2, 8, 14 and 20 use the Stallion Related Keyphrase 2.
Paged Comments 3, 9, 15 and 21 use the Stallion Related Keyphrase 3.
Paged Comments 4, 10, 16 and 22 use the Stallion Related Keyphrase 4.
Paged Comments 5, 11, 17 and 23 use the Original Post Title.
Paged Comments 6, 12, 18 and 24+ use the Stallion All In One Title Tag.
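This rotation repeats every six paged comments pages, with page 24 and beyond all using the All In One Title Tag. Here it is restated as a lookup. This is a sketch of the listed rule only, not Stallion’s actual code, and using page 0 for the main post/page is a hypothetical convention made up for this example:

```python
# Title sources in the order the paged comments rotation cycles through them.
ROTATION = (
    "Related Keyphrase 1",
    "Related Keyphrase 2",
    "Related Keyphrase 3",
    "Related Keyphrase 4",
    "Original Post Title",
    "All In One Title Tag",
)


def title_source(comment_page):
    """Return which title a page uses; 0 means the main post/page (hypothetical)."""
    if comment_page < 0:
        raise ValueError("comment_page must be 0 or a positive page number")
    if comment_page == 0 or comment_page >= 24:
        return "All In One Title Tag"
    return ROTATION[(comment_page - 1) % len(ROTATION)]


for page in (1, 2, 6, 7, 23, 30):
    print(page, title_source(page))
```

Because the cycle length is six, page 8 gets the same title source as page 2.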
This means that as long as the Stallion Keyphrases etc… are set, the paged comments will have unique title tags and the potential to gain those SERPs. I’ll probably use the same format for the anchor text of the link back to the main post as well. Stallion 8 also makes much more use of the keyphrases in widgets etc…, including Google Panda busting options that vary which related keyphrase a widget uses by page type (far, far less identical anchor text usage in Stallion 8).
I’ve also added the features from the standalone Stallion WordPress SEO Plugin, plus the ability to add a canonical URL to the main post/page on paged comments (if you don’t want the above paged comment features).
Lots of other stuff has been added: easy Google authorship, automated 301 redirects of 404 pages to the home page, 301 redirects for attachment URLs, blocking bad queries (stopping hackers checking for malicious code), more spam bot blocking…
It was surprisingly difficult to find a JavaScript version of the Pinterest Pin button code. The default code uses a text-link-like format, which could waste link benefit. I did manage to find a Pinterest plugin that has two hidden text links to the plugin author’s site!
David
will comments paginated pages lose keywords?
I ended up manually coding my blog’s comment pages to show the correct canonical (not the faulty WP default rel_canonical), as explained here: http://nabtron.com/manually-canonical-wordpress-permalinks-without-plugins/8177/
But my question is: will the keyword/SEO value of the internal/paginated comment pages be lost, or will their value be added to the main canonical page too?
thanks
SEO Value of WordPress Paginated Comments
Stallion Responsive users: don’t add the code discussed in the comment I’m responding to. Stallion has a better SEO solution built in.
The problem with your code fix is that it removes the value of having a lot of comments. With your fix there’s no SEO value in having a popular post with dozens of comments; in fact, paginated comment pages that add no SEO value are an SEO negative. The alternatives are to have all your comments on one page, which is just as bad (worse, since it has SEO performance ramifications!), or to limit the number of comments per post.
It’s interesting that the website of the Yoast WordPress SEO Plugin developer doesn’t allow paginated comments pages. He has to disable comments after a post ages to limit the number of comments, and he has his SEO plugin add a canonical URL similar to what you’ve coded.
Look at the performance SEO damage that almost 100 comments on one post does: http://developers.google.com/speed/pagespeed/insights/?url=https%3A%2F%2Fyoast.com%2Fgoogle-analytics-5%2F A mobile speed score of 38/100 is piss poor for a website run by SEO experts!
You and the Yoast developer see a lot of comments as an SEO problem; I see them as an SEO opportunity. My comments are indexed in their own right and have unique title tags. When this comment is indexed by Google, I might gain a SERP like “SEO Value of WordPress Paginated Comments”.
Rather than benefit from so much user generated content this canonical URL technique throws the SEO opportunity away!
Let’s say you have 100 comments on a post broken over 5 paginated comment pages (I have a lot of content like this).
This gives you the potential for 5 uniquely SEO’d webpages with unique title tags and mostly unique content (comments are unique content).
The average WordPress user won’t have the skills to code unique title tags for all 5 paginated pages, or even to set the paginated pages to show an excerpt of the main post. These features are built into the WordPress SEO Package I’ve developed.
Note the clickable link above doesn’t have a nofollow tag (nofollow deletes link benefit). That’s an SEO feature, and nofollow can be turned on/off.
Stallion Responsive can set the first 6 paginated comment pages to use unique title tags (beyond 6 the titles are reused: paginated page 8 uses the same title as page 2). It can also set paginated pages to use an excerpt of the main post. The result is unique title tags (plus a unique H1 heading and a unique link back to the main article), with most of the content coming from the comments rather than duplicating the main post.
See this in action on any of the popular posts (see popular posts widget on left sidebar).
For the record I’ve also added the option to add a single canonical URL to the main post from all paginated comment pages (similar to your code and the Yoast SEO plugin).
To answer your question: in theory, the SEO rankings of content on the paginated webpages should be transferred to the main article’s SERPs. When I used the technique you’re using, it did pass SEO benefit, but I haven’t used it in some time, so I don’t know for sure now.
The easiest way to check is to go to one of your paginated comment pages, pick some unique text from one of the comments, and search Google for it surrounded by quotation marks: “unique comment text here”. If Google passes SEO benefit to the main article via the canonical URL, the main article should be found for that unique search phrase. Like I said, it used to, but I no longer have any sites using the technique, so I’d have to find another website using it to test.
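If you run this check often, a tiny Python helper can build the quoted-phrase search URL for you. This is a sketch: `exact_phrase_search_url` is a hypothetical name, the google.com/search?q= format is the ordinary public search URL, and the optional site: filter (the operator used earlier in this tutorial) is an extra not mentioned in the comment above:

```python
from urllib.parse import urlencode


def exact_phrase_search_url(phrase, site=None):
    """Google search URL for an exact (quoted) phrase, optionally limited to one site."""
    query = f'"{phrase}"'
    if site:
        query += f" site:{site}"
    # urlencode percent-encodes the quotes and colon and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


print(exact_phrase_search_url("unique comment text here"))
# https://www.google.com/search?q=%22unique+comment+text+here%22
```

Then check whether the main article, rather than the paginated comment page, shows up for the phrase.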
I was concerned that all the content would be treated as if it were one webpage: 100 comments spread over 5 paginated comment pages treated as one webpage with 100 comments’ worth of text, links, etc…
There’s the risk of damaging the optimization of the main content, because there will be less main content than comment text. Unless you rewrite all user comments (I don’t), that text won’t be as well optimized as what we can write. My comments are filled with off-topic text.
With your average WordPress setup (not Stallion Responsive users), there’s the SEO argument that you should have no comments on important webpages, as this focuses your optimization (you have 100% control over the optimization of text etc…). Consider what happens when a user comments on your site (I mean your site specifically; it’s not the same on my site). If they add an author URL, it will probably be nofollow, which deletes link benefit. If they add a link like this in the body of the comment https://stallion-theme.co.uk/stallion-responsive-theme/ it will be converted to a clickable link with a nofollow tag (deleting link benefit). And the comment permalinks will have awful anchor text.
Unless the theme has been modified to remove these issues, all this damages a webpage’s SEO (multiply it by 100 comments). I’ve fixed all these SEO issues in Stallion Responsive AND turned some of them into SEO benefits. My comment permalinks use optimized anchor text (the alternative is to remove them, but then visitors lack a way to link directly to comments). Author URLs can be disabled or served as a post form (it looks like a link, but Google doesn’t treat forms as links).
David