Duplicate Content and Canonical URLs WordPress SEO Tutorial updated February 2014
WordPress has the potential to mess up your website’s SEO by generating archive pages (home archives, monthly archives, daily calendar archives, categories, tags and search results) with duplicate content.
There are also other ways WordPress can generate duplicate content, but in recent versions of WordPress those have been fixed via canonical URLs (as long as your theme/plugins don’t break the WordPress core fix).
For Stallion Theme users everything below is dealt with by Stallion Responsive, other than overusing content on multiple categories/tags, which is a user issue (see later).
WordPress SEO Canonical URLs on Paged Comments
Before dealing with duplicate content on archive pages let’s check your WordPress theme and plugins don’t break the current WordPress core canonical URL fix to duplicate content on posts and pages with lots of comments and paged comments activated.
WordPress core adds canonical URLs to posts and pages with paged comments. Paged comments occur when, under
Settings >> Discussion
the setting
Break comments into pages with XX top level comments per page and the Last/First page displayed by default
is ticked and there are more than XX top level comments on a post/page.
This setting breaks posts with multiple comments into multiple pages (paginated comments). An example can be seen at:
Currently (February 2014) there are just shy of 60 comments broken over 3 paged comments pages, which results in 4 URLs to content generated by that post and its comments.
Before the Stallion Responsive v8.0 release (early 2014), loading any of those 4 URLs and viewing source you’d find an identical canonical URL to the main post in the head (near the top of the code), which is default WordPress behavior. It used to look like this:
<link rel='canonical' href='http://stallion-theme.co.uk/stallion-wordpress-seo-plugin/' />
If you are using another WordPress theme you almost certainly want the same canonical URL on all paginated comments. Basically view source of your main article, check its canonical URL, load comment page 1, 2, 3 etc… and if you were running Twenty Fourteen, for example, you’ll find the exact same canonical URL. If you see this, your site is running the latest WordPress code and Google won’t consider the paginated comment pages duplicate content.
Stallion Responsive can modify these pages so they are mostly unique content rather than mostly duplicate content (various theme options: turned on for this site). This means Stallion Responsive users don’t want the default WordPress code fix that adds an identical canonical URL to the paginated comment pages (there’s a theme option to change the default canonical URLs).
Go view those 4 URLs above, first URL shows the original post and unique comments
URL 2 loads a snippet of the original post and unique comments
URL 3 loads a snippet of the original post and unique comments
URL 4 loads a snippet of the original post and unique comments
Also note the title tag and H1 heading for each page is unique (another Stallion Responsive SEO feature). As you can see, other than the short snippet (excerpt of the post’s content) on URLs 2+ the pages above are unique: compare the amount of unique content (the comments content) to the size of the snippet (the duplicate content) and I think you’ll agree the 4 pages above are unique content. It’s unique content with unique title tags and H1 headers.
The SEO benefit of this setup can be seen via a Google site: search, you can see what’s indexed on a site with a search like this:
Paste the above in a Google search and it will show every page that’s indexed from this site. A more specific search:
Will show everything indexed that’s under that post. You’ll see the 4 URLs mentioned above and, because Stallion also includes a feature (Stallion SEO Super Comments) that turns large comments into post-like pages, lots of comments indexed in their own right.
How WordPress Themes Handle Duplicate Content
OK, back to other WordPress themes :-) If you ran another theme you wouldn’t have the above SEO features and you’d see the original post content (all of it) copied on all 4 pages above. This would be too much duplicate content, so you would want the WordPress canonical URL fix, which adds an identical canonical URL to all 4 pages and looks like this in source:
<link rel='canonical' href='http://stallion-theme.co.uk/stallion-wordpress-seo-plugin/' />
You won’t find this code on the 4 paged comments above (only on the first URL) since Stallion Responsive changes the content in a way so it’s unique (as described earlier). If you aren’t a Stallion Theme user check your paged comments and make sure they have a canonical URL to the main article only. The canonical URL tells search engines like Google that they should spider all pages, BUT all link benefit and SERPs should be redirected to the main post: the preferred canonical URL.
Adding canonical URL support this way was a good SEO move by the WordPress development team, because having multiple pages with almost identical content isn’t going to generate extra search engine traffic, but would waste link benefit and might trip search engine duplicate content filters. It’s unlikely to trip the duplicate content filters, since Google is very good at combining similar pages into one indexed URL, but better SEO safe than SEO sorry :-)
If you have paged comments on your site, view source of page 2, for example, and check for a canonical URL pointing to the preferred canonical URL (the main post). If it’s missing, either you are using an old version of WordPress (I think canonical URL support was added in WordPress 2.9) or the WordPress theme or a plugin you are using is removing the WordPress core canonical URLs.
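The manual "view source and compare" check can be sketched as a small script. This is only an illustration in Python using the standard library: the function names are hypothetical, and it works on HTML you have already saved or fetched rather than loading pages itself.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also end up here.
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_urls(html):
    """Return the canonical URLs declared in an HTML document."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


def pages_share_canonical(pages, expected):
    """True if every page declares exactly one canonical URL equal to `expected`."""
    return all(canonical_urls(html) == [expected] for html in pages)


if __name__ == "__main__":
    post = "http://stallion-theme.co.uk/stallion-wordpress-seo-plugin/"
    # Stand-ins for the saved source of the main post and one paged comments page.
    main_page = "<html><head><link rel='canonical' href='%s' /></head></html>" % post
    comment_page = "<html><head><link rel='canonical' href='%s' /></head></html>" % post
    print(pages_share_canonical([main_page, comment_page], post))
```

If the function returns False for a theme that does not rewrite paged comments the way Stallion Responsive does, that suggests the theme or a plugin is removing or changing the core canonical URL.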
WordPress Duplicate Content on Archives
Now to the main WordPress duplicate content issue: WordPress archives.
Because WordPress reuses content on archive parts (categories, tags etc…) of a site there’s the potential for duplicate content issues.
There are many WordPress themes, including the default WordPress themes Twenty Ten, Twenty Eleven, Twenty Twelve, Twenty Thirteen and Twenty Fourteen, that reuse the FULL content of a post on archive parts of a site. Basically if you view an archive page like your categories and tags and you see multiple full posts, you have the potential for a duplicate content issue.
Using the full content means every post is duplicated in full on one or more parts of a site. If you have a site with monthly archives, categories, tags and the default home page archives (ten posts on the home page), all your posts will be reused in full 4 times, assuming you don’t add your posts to multiple categories and tags (worse if you do) and don’t use the calendar widget!!!
There’s also the issue that ten full posts on an archive page make for a massive page to load, especially if you add rich content (images for example) on many of your posts!
Fortunately this duplicate content SEO problem can be easily reduced (practically removed) by using a WordPress theme (like the Stallion Responsive Theme) that rather than using the full content of a post uses a short excerpt on archived parts of the site. For an example take a look at this category archive Stallion Responsive Tutorials, you can see multiple archived posts, but each post is a short excerpt of the post significantly reducing the possibility of duplicate content issues. Search Google for “Stallion Responsive Tutorials” and you’ll find that category page is number one in Google for that SERP (it’s not a money SERP, but it shows Google indexes and ranks these fine).
If your posts tend to be small (not a lot of content) you still run the risk of duplicate content issues. Imagine your excerpts are set to 155 characters and every post is 155 or fewer characters: your posts will be repeated in full on all archives. There’s not much you can do about this beyond creating bigger posts. I would suggest minimizing the number of archive types: add each post to only one category OR tag and don’t use any other type of archives (no dated archives and no calendar widget).
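The short post problem is easy to see with a toy excerpt function. The sketch below is an assumption-laden illustration in Python, not how WordPress or Stallion actually builds excerpts: it simply cuts text at a character limit (155, the figure used above) on a word boundary, and the function names are made up for the example.

```python
def make_excerpt(text, limit=155):
    """Cut text to at most `limit` characters, breaking at a word boundary."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text  # short posts come back unchanged
    cut = text[:limit].rsplit(" ", 1)[0]
    return cut + "..."


def is_full_duplicate(text, limit=155):
    """True when the excerpt is the entire post, i.e. the archive repeats it in full."""
    return make_excerpt(text, limit) == " ".join(text.split())


if __name__ == "__main__":
    short_post = "A quick note about a plugin update."
    long_post = "word " * 100
    print(is_full_duplicate(short_post))  # the whole post fits in the excerpt
    print(is_full_duplicate(long_post))   # only a snippet is reused
```

Any post for which is_full_duplicate is true will appear in full on every archive that lists it, which is why fewer archive types matter more on sites with short posts.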
Code Fix to WordPress Duplicate Content
There’s a very easy (easy when you know how :-)) code fix for this duplicate content issue at theme level (for non Stallion Responsive theme users). Each theme is built differently, so the easiest way to fix a theme that uses full content on archive pages is to use the Post Teaser plugin, which I’ve made an SEO version of (needs updating). The Post Teaser WordPress SEO Plugin generates an excerpt instead of the full content on archive posts. You can set the excerpt to any size, and with my SEO version the anchor text of the continue reading link is SEO’d.
So you want to fix this issue at theme level.
Search through the php files of the theme for this code:
and replace it with
You’ll need to do this for all code related to archive posts ONLY, NOT on the template files for Posts and Pages, which are usually generated by the files single.php and page.php (don’t change those two files). For most themes you’ll be looking to change the files index.php, archive.php, category.php, tag.php, search.php, but for some newer themes the code can be located in files like content.php, content-image.php, content-*.php.
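The snippet to search for is not reproduced above. A common form of this kind of fix, and only an assumption about what is meant here, is replacing calls to the WordPress template tag the_content() with the_excerpt() in archive templates. Under that assumption, the Python sketch below lists where a theme calls the_content() in archive style templates while skipping single.php and page.php. The function name and the default file list are illustrative, and the script only reports; it changes nothing.

```python
import re
from pathlib import Path

# Assumed archive template names; adjust the list for your theme.
ARCHIVE_TEMPLATES = ("index.php", "archive.php", "category.php", "tag.php", "search.php")
# Templates for single Posts and Pages, which should keep the full content.
SKIP = {"single.php", "page.php"}
# Matches the_content( but not get_the_content(.
CALL = re.compile(r"\bthe_content\s*\(")


def find_full_content_calls(theme_dir, templates=ARCHIVE_TEMPLATES):
    """Return (file name, line number, line) for each the_content() call found."""
    hits = []
    for path in sorted(Path(theme_dir).rglob("*.php")):
        name = path.name
        is_archive = name in templates or name.startswith("content")
        if name in SKIP or not is_archive:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if CALL.search(line):
                hits.append((name, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_calls.py /path/to/wp-content/themes/your-theme
    for name, lineno, line in find_full_content_calls(sys.argv[1]):
        print("%s:%d: %s" % (name, lineno, line))
```

Each hit still needs a manual look before editing: some content-*.php files are used for single posts or pages in newer themes, and those should be left alone.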
It really is that simple :-)
Reducing WordPress Duplicate Content Further
It’s quite easy with WordPress to generate duplicate content even with the above fixes. Here are a few tips for avoiding the obvious pitfalls.
Monthly Archives Widget : If you use the default home page archives (ten archived posts on the home page) and use monthly archives, they are pretty much identical. I NEVER use a monthly archive: not only do you run the risk of duplicate content (copying the home page), they add ZERO SEO benefit, monthly archives never rank for anything. Don’t use monthly archives, but if you do, edit the widget so it only shows on the home page and other dated archive pages (this is built into the Stallion theme for example) so you aren’t wasting as much PR/link benefit as you would by loading them sitewide.
Calendar Archives Widget : The Calendar archives are even worse. For starters the Calendar widget is broken (IMO when the title attribute/hoverover tooltip of a link includes the entire post it’s broken!). The Calendar archive breaks your posts into days. On most sites you aren’t going to publish multiple posts every day, so the content of the daily archives is basically a duplicate of the post if the theme you use uses the full post content on archives. Like the monthly archive there’s no SEO value in having daily archives, so don’t use them. The Calendar widget is so bad both user and SEO wise I’ve removed it from the Stallion theme.
Too Many Categories/Tags : I see a tendency for those in the make money online niche to overuse Categories and Tags. You will find sites where posts are added to multiple categories and loads of tags for barely relevant categories/tags. An example might be a post added to
Categories > Make Money Online, Earn Money, etc...
Tags > Money, Wealth, Earnings, Online, Earn etc...
You might think this is a good SEO idea because the post is linked from more pages (easier to find) and you feel like you have a page targeting those single keyword SERPs, but it’s a waste of link benefit getting all those tags and categories indexed for no traffic gain. Do you honestly believe your site is going to gain one keyword SERPs like Money, Wealth, Earnings, Online, Earn just by creating a tag or category archive page? You might be able to gain long-tail SERPs like “Make Money Online Easily”, but those one keyword SERPs above are hard, and if you want a SERP like “Money” or even “Make Money Online” that’s almost certainly going to need to be targeted on the site’s home page, where most links are generated to. Basically you target the hardest SERPs on the page with the most backlinks/link benefit (usually the home page).
Add to that if you add every post to a handful of categories and 20+ tags your tags archive pages in particular are going to be practically identical. Think about it, if you have two tags “Earn” and “Earnings” you are going to add the exact same posts to both tags, they will be identical AKA you are generating duplicate content by over tagging.
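One way to spot over-tagging is to compare which posts each tag contains. The sketch below is a hypothetical illustration in Python: the post IDs, the function names and the 0.8 threshold are all assumptions, and the overlap measure used is the Jaccard index (shared posts divided by all posts in either tag).

```python
from itertools import combinations


def jaccard(a, b):
    """Share of posts two tags have in common: 1.0 means identical archives."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicate_tags(tag_posts, threshold=0.8):
    """Return (tag, tag, overlap) for every pair of tags at or above the threshold."""
    pairs = []
    for (t1, p1), (t2, p2) in combinations(sorted(tag_posts.items()), 2):
        score = jaccard(p1, p2)
        if score >= threshold:
            pairs.append((t1, t2, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical post IDs per tag.
    tags = {
        "Earn": {1, 2, 3},
        "Earnings": {1, 2, 3},
        "Wealth": {3, 4},
    }
    for t1, t2, score in near_duplicate_tags(tags):
        print("%s and %s overlap: %.2f" % (t1, t2, score))
```

Pairs with an overlap of 1.0, like “Earn” and “Earnings” here, produce archive pages listing exactly the same posts, so one of the two tags can usually go.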
I have an SEO/User question when thinking about creating a category or a tag. SEO wise there’s no difference between the structure of a tag or a category page.
Will this new category/tag be capable of generating search engine traffic in its own right and/or does it serve a role to my visitors?
If you can’t answer yes to this question, don’t create the category/tag.
Example: should I create a tag or category on this site with the one keyword “WordPress”? Well, this is a very easy one, it’s a big NO. A tag or category is highly unlikely to rank high for the one word SERP WordPress and it adds nothing to my visitors’ experience, because pretty much every page of this site is about WordPress. My only chance of ranking well for the WordPress SERP is the home page, and I know it’s such a hard SERP it’s not worth my time only optimizing for it.
Another example: should I create a tag or category on this site with the two keywords “WordPress SEO”? This is a harder one, but it’s a no currently (might change in the future if I add a lot more content). A tag or category is highly unlikely to rank high for the two word SERP “WordPress SEO” (it’s a hard SERP and needs backlinks, and not many webmasters are going to naturally link to a category/tag) and there are already pages on this site like WordPress SEO Tutorial that to some degree target the WordPress SEO SERP. I’d be better spending my PR/link benefit on the WordPress SEO Tutorial page above and creating categories that might stand a chance of gaining SERPs or are useful to my visitors.
Stallion WordPress SEO Plugin
The Stallion WordPress SEO Plugin can also help with duplicate content issues. If you have made the mistake of creating too many tags and categories (especially tags), consider using the Stallion SEO plugin to redirect their SERPs and link benefit back to the home page. The Stallion plugin can also redirect link benefit and SERPs from dated archives to the home page, so if you’ve been using the monthly archives widget and/or the calendar widget you can fix the mistake. For Stallion Responsive theme users the Stallion WordPress SEO Plugin features are already part of the theme under Advanced SEO.