Duplicate Content and Canonical URLs
WordPress SEO Tutorial updated February 2014
WordPress has the potential to mess up your website’s SEO by generating archive pages (home archives, monthly archives, daily calendar archives, categories, tags and search results) with duplicate content. There are also other ways WordPress can generate duplicate content, but in recent versions of WordPress those have been fixed via canonical URLs (as long as your theme/plugins don’t break the WordPress core fix). For Stallion Theme users everything below is dealt with by Stallion Responsive, other than overusing content on multiple categories/tags, which is a user issue (see later).
WordPress SEO Canonical URLs on Paged Comments
Before dealing with duplicate content on archive pages let’s check your WordPress theme […]
WordPress SEO Duplicate Content?
Hello David.
Thanks for the “fixes”. They all worked fine. I agree with you on the Links. Since I will have a lot of link resources over time, I will make a page called “Hiking Links” and have the blogroll point to the appropriate link page.
I do not want more than 50 links on a page, so for each link resource category I will make a page break so the new set of links will be on a new page. (I haven’t tried that yet, but in the WordPress help files I found info on how to break a page into more than one page.)
***Here is what I think is a good question.***
In the past I have added a single post into several categories and never used tags. By doing that, am I making duplicate content?
I think it may be better for me to add multiple appropriate tags to a post and then just put the post under one category. I am creating Parent/Child categories such as “Canada” as parent and “British Columbia” as child and then a city such as “Kelowna” as a child of British Columbia. With this example, if the post I write is about a hike in Kelowna, I would just select Kelowna as the category and not Canada and British Columbia. Correct?
This is a detailed question about my site but I think it could apply to many Travel WordPress Blogs etc.
WordPress SEO : How to Avoid Duplicate Content
LOL, I’ve updated all the themes for sale on this site and am currently updating the theme files of all my WordPress sites (I updated all my WordPress installations to WordPress 2.7.1 yesterday) as a way to test the theme updates I just made before sending out the free updates. I’m currently on the Gs domain-wise (I update alphabetically), so just updated the Talian theme on (no issues found) and have about 60 more domains to update and check. Then I’ll be sending out theme updates to all customers, as by then I’ll be sure there are no major issues with the themes.
So good to take a break and answer a question or three.
It is possible with WordPress to inadvertently create duplicate pages. For example if the last 10 posts you made all went into one Category, your Home page and the 1st page of that Category are going to be almost identical. Same is true if the last 10 posts have been added to the same two Categories.
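The home page versus category overlap is easy to check for yourself. Below is a minimal Python sketch, assuming made-up post data (the IDs, dates and category names are hypothetical, and `first_page` is just an illustrative helper, not a WordPress function). It builds the first archive page for the home page and for one category and compares them.

```python
def first_page(posts, per_page=10, category=None):
    """Return the IDs of the newest posts, optionally limited to one category."""
    if category is not None:
        posts = [p for p in posts if category in p["categories"]]
    newest = sorted(posts, key=lambda p: p["date"], reverse=True)
    return [p["id"] for p in newest[:per_page]]

# Hypothetical site: 12 posts, the newest 10 all filed under "Hiking".
posts = [
    {
        "id": i,
        "date": f"2009-02-{i:02d}",
        "categories": ["Hiking"] if i > 2 else ["News"],
    }
    for i in range(1, 13)
]

home = first_page(posts)
hiking = first_page(posts, category="Hiking")
same_listing = home == hiking
print("Home page and category page 1 list the same posts:", same_listing)
```

If the two lists match, the two archive pages are showing the same excerpts in the same order, which is the near-identical case described above.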
The themes on this site are using the excerpt feature on archive pages (home, category, tags, archive pages) to reduce duplicating content. On archive pages you get a small excerpt of each post, so the single post pages are mostly unique in most cases. So with these themes you at least don’t get full posts copied within the archives.
I’ve learnt over the years it’s best not to create a page unless it serves a function, for example the monthly archive pages found by default on pretty much all WordPress themes add little if anything to the vast majority of sites. Go to a monthly archive page and look at it and ask yourself what does it add to your site not already found in the Category archives?
For this reason I never use the monthly archive menu links and if it wasn’t for the fact a lot of people use them, I’d not even include them in my themes (same with the Calendars which are worse), to reduce the SEO damage they are only shown on the home page in these themes.
Keeping in mind “only create a page if it serves a purpose”, is what you plan to do going to add anything to your site?
For example, will adding a post to 10 Categories or 10 Tags or 5 Categories and 5 Tags result in more traffic, or make it easier for visitors to find a post they are probably interested in, or are you doing it to ‘create’ more pages on your site?
If it’s just for more pages, don’t do it; it’s a waste of link benefit.
On this site I could add tags like:
WordPress 2.0
WordPress 2.1
WordPress 2.2
WordPress 2.3
WordPress 2.4
WordPress 2.5
WordPress 2.6
WordPress 2.7
And on the face of things that could result in more traffic, but I’m going to have to put all the theme sales pages (like the one you are reading now) in all these tag sections since all the themes work in all versions of WordPress 2+. That means I’ll have at least 8 identical pages with the only difference being the title 2.0, 2.1 etc…. and I know from experience Google will only index one set of this type of duplicate content.
Instead of doing this I’ve added all the above SERPs to multiple pages. Someone looking for a version of Talian that works in WordPress 2.7 will find my site in Google, and in a few days’ time the same will be true for WordPress 2.7.1 searches (just updated this page). Then I’ve added that page to the one Category “AdSense WordPress Themes” because overall most SERPs are now covered.
For your site, if you’ve got lots of different content you can get away with using many more categories and/or tags. Try to avoid adding everything from one Category to another one as described above; it’s such a waste of resources making pages that add little to a site.
If I ever get to say 100 themes converted on this site then I could add all the categories or tags you find on big theme sites, AdSense Ready (all of them :)), two column, two sidebars etc… but for now there’s not enough content to do anything like that.
What you are thinking of doing is what I’ve kind of done at I used a plugin called Simple Tags to automatically create most of the tags. Not as good as manually adding the tags, but with 1,000 posts!
BTW the break page function is a strange one. I’ve got some classic literature sites, like William Shakespeare plays, and wanted to put them into WordPress without having to break each book page by page manually (how I did it years ago in almost static pages!!!). I came up with the perfect solution (got my son to write a PHP script that did all the work) that relied on using the page break function, BUT it turns out that function pulls the entire contents of a post, not just the page number you want. If you have a couple of pages made this way, no problem, but if you had 50 pages’ worth of content (500 kb of text say) in one post broken this way, every time a visitor or search engine loads one of the 50 pages it accesses all 50 pages (big load on the server). I almost crashed my server by adding one book paginated over I think 100 pages as a test! Damn shame, as WordPress would be ideal.
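To show why the page break approach gets expensive, here is a simplified Python model of the behaviour described above. It is a sketch, not WordPress’s actual code: `get_page` is a hypothetical helper, and the 50 page post is made-up data. The only WordPress detail relied on is the `<!--nextpage-->` marker that separates pages inside a single post.

```python
NEXTPAGE = "<!--nextpage-->"  # WordPress page-break marker inside post content

def get_page(full_content, page_number):
    """Return one page; the whole post has to be loaded and split first."""
    pages = full_content.split(NEXTPAGE)
    return pages[page_number - 1]

# Hypothetical post: 50 pages of roughly 10 KB each, stored as one post.
post = NEXTPAGE.join(f"Page {n} " + "x" * 10_000 for n in range(1, 51))

page_7 = get_page(post, 7)
loaded, served = len(post), len(page_7)
print(f"loaded {loaded} characters to serve {served}")
```

Every request for a single page pays for the full post, so the cost grows with the length of the book rather than the length of the page being viewed.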
Back to loading themes.
David Law
WordPress SEO - Avoid Duplicate Content robots.txt File
Does this theme come with a robots.txt file to avoid duplicate content issues, or should purchasers look into this?
WordPress SEO Duplicate Content and Canonical URLs
With all WordPress themes on my site there are no major duplicate content issues, they are dealt with at theme level as long as you are sensible with how you create your site.
On archive pages an excerpt is used rather than the full content of a post. This means that although you are using content from your single blog post pages to generate archive pages (home page archives, categories, tags), they are not a full copy (just an excerpt), so they are not treated as duplicate content by Google.
The only possible issues you might have with archive pages and duplicate content are in two scenarios.
You have only one category: the content of the category archive pages is going to be an exact copy of the home page archives and possibly of the monthly archive pages.
You use tags/categories and the content of some tags/categories match categories or other tags. I had this tag issue on this site, if I tagged all the theme pages with WordPress 2.8, WordPress 2.9 etc… for example, those tag pages would be identical to one another and to the “AdSense WordPress Themes” category. All you can do to avoid this is think through what you are tagging and which single blog posts you put in a particular category. If you go over the top with your categories and tags, all my themes for example could be tagged under WordPress, SEO, AdSense, Make Money Online… but the archive type pages created would be practically identical, so I don’t create that many tags (this site doesn’t really have enough posts to be tagged extensively)!
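One way to spot the second scenario before Google does is to group categories and tags by the exact set of posts they contain. The sketch below is Python with hypothetical data: the term names and post IDs are invented for illustration, and `find_duplicate_archives` is not part of any theme or plugin.

```python
from collections import defaultdict

def find_duplicate_archives(term_posts):
    """Group archive terms (categories/tags) that list exactly the same posts."""
    groups = defaultdict(list)
    for term, post_ids in term_posts.items():
        # frozenset ignores ordering, so [1, 2, 3] and [3, 2, 1] count as equal.
        groups[frozenset(post_ids)].append(term)
    return [sorted(terms) for terms in groups.values() if len(terms) > 1]

# Hypothetical mapping of category/tag name to the IDs of the posts it contains.
term_posts = {
    "AdSense WordPress Themes": [1, 2, 3],
    "WordPress 2.8": [1, 2, 3],
    "WordPress 2.9": [3, 2, 1],
    "SEO Tutorials": [4, 5],
}

duplicates = find_duplicate_archives(term_posts)
print(duplicates)
```

Any group it reports is a set of archive pages that would show the same excerpts, so all but one of those terms is a candidate for removal.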
Although not really a duplicate content issue I never use the monthly archive pages because they add nothing SEO wise to a site. Your categories hold all archived content and it is dated within a category format, so monthly archives are not really needed.
In a future version of Talian I’m dealing with potential canonical issues associated with multiple comment pages. This page for example has just generated its 4th page of comments, and the main content of each of the 4 pages is the full content of this post (duplicated).
I’ve not noticed duplicate content issues per se, but I’m not finding comment pages 2, 3, 4… ranking particularly well for potential SERPs based on the comment content. It’s quite wasteful from an SEO resources perspective having all these partially duplicate pages if they don’t generate traffic in their own right, so I’m testing having pages 2, 3, 4… declare the main blog post page as their canonical version. This will result in all the comment pages being spidered, but treated as one page in Google (this will save link benefit).
If you view source of this page and the other archive comment pages for this page you’ll find within the head:
<link rel='canonical' href='http://www.google-adsense-templates.co.uk/wordpress-theme-talian-with-adsense-and-seo-optimisation.html' />
I’m testing this now and so far not hit any issues, Google appears to be combining the comment pages into one page as it should.
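The idea of pointing every comment page at the main post can be sketched in a few lines. This is Python rather than the theme’s PHP, and it is an illustration only, not the code Talian uses. It assumes paged comment URLs end in `/comment-page-N/` (the usual WordPress pattern with permalinks enabled), the example.com URLs are hypothetical, and `canonical_url` and `canonical_tag` are made-up helper names.

```python
import re
from html import escape

# Assumption: paged comment URLs end with /comment-page-N/ (with or without the slash).
COMMENT_PAGE = re.compile(r"/comment-page-\d+/?$")

def canonical_url(url):
    """Drop any #fragment and a trailing comment-page segment from a URL."""
    url = url.split("#", 1)[0]
    return COMMENT_PAGE.sub("", url)

def canonical_tag(url):
    """Build the <link rel="canonical"> element for the head of the page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'

# Hypothetical URLs for illustration.
page_4 = "http://example.com/my-post.html/comment-page-4/#comments"
main = "http://example.com/my-post.html"
print(canonical_tag(page_4))
```

Note that the substitution also removes the trailing slash, which suits `.html` style permalinks like the one above; a site whose post URLs end in a slash would need to add it back.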
I’ve been testing a plugin called SEO Super Comments (significantly modified version) with the Talian theme. This plugin creates individual pages for comments (like this comment) that are linked from these comment pages. The original plugin turns all comments into pages (so a one word comment gets a link!). I’ve modified the plugin to only link to comment pages with a certain number of characters, so a one line comment won’t get its own page.
I’m working on this plugin as it’s a real shame to have a site with lots of really good comments and not have them increase traffic to a site. Still at the testing phase, but I’m 99% sure I’ll include the modified plugin with the Talian theme soon. Note: the original SEO Super Comments plugin does not work out of the box with Talian, so probably not a good idea trying the original (I couldn’t get it working). I’ve also made other improvements to this plugin.
David
Duplicate Content WordPress Plugin
Hi David.
OK this may not be a Stallion problem but I need to eliminate that from my list.
I have a plugin called Splash Plugin by Kevin Lamb.
The plugin creates a little tab at the edge of your screens which then pops out when you hover over it. You can view it at the above page.
I installed it and everything was working fine, but for some reason it stopped working. I contacted the developer and I also went to my hosting provider, Justhost, to make sure there weren’t any server issues (which I think there might have been because I was getting WordPress errors). Justhost came back and said “they’d fixed the problem” but no joy.
I also deleted the plugin and reinstalled as advised by the developer, but still no joy.
When I view the source code, the “Show Content” tab I created seems to have duplicate code in the page; see lines 20 through 68 of the page. Each “Show Content” tab is created similar to a post, but it then relates to left and right hand splash bars.
So when I delete a plugin or delete the “Show Content” splash bar should Stallion delete the code somehow?
Apologies that this isn’t directly related to Stallion David, but of course it might be hence the need to eliminate it.
Any suggestions gratefully received.
regards
Nigel
WordPress Theme Duplicate Content?
Unlikely to be a theme issue, have no experience with the plugin, so no idea what the problem is.
David
Google Panda SEO : WordPress Template Footprint
The next issue is Google Panda is looking for a ‘heavy template footprint’. That of course does not mean WordPress per se. However, many websites like eHow got hit by Panda, perhaps because of the template model of applying content.
Set up a template, add content and repeat. This was SEO circa 2009. Now the rules have changed a bit and Google is not just looking for unique content between websites, but also within your own site.
This is good news actually as it gives SEOs control over another on site factor.
The bad news is most people have set up websites not too far from the eHow model. That is, write content and drop it in a template structure.
You can check duplicate content between your website pages with something like duplicatecontent.net. Chances are most people will come up with 90% plus, even 95%.
I think one issue here is having sidebar widgets and code that is more autogenerated than hand coded. That is, on almost every page of your website the same sidebar widgets appear. Now if you write super long articles this is not as bad, as the ratio is decreased. However, most people do not write 8 hours a day.
Therefore, my question is, as SEO is a moving target and needs to be aware of the changes taking place, is there any way to make WordPress seem more human?
That is, less template structure? I was thinking maybe having rotating sidebar widgets. That is, on one page view maybe ‘popular posts’ appears, while on another page the widget shows ‘recent posts’. That would give the ability to make pages more unique.
This is the new SEO challenge. What can be done for these new 2011 rules and how can Stallion continue to improve perhaps in this regard. Any suggestions or plugin recommendations are highly welcomed. I think this would be one more thing to get an edge over the next website out there who uses a 2009 Modus operandi. So would love the help in this regard as I am trying to get a number of websites recovered from Panda.
Unique pages in a non-template format might help a lot of people avoid a site wide Panda ranking demotion. I have read a lot on Panda and I think template websites were hit, that is, set up a template and drop content in.
Along the same lines, to get you thinking about the future of Stallion, what features can be added to take this to the next step? That is, make content more human, even if it is not. I do not know if search engines can tell whether related posts are auto-generated or someone hand-made the links in the website.
I think the old alinks model is less powerful than the add links by hand.
As Always a big thank you and sincere appreciation for your efforts helping others.
P.S. Tags have been eliminated on my websites and well constructed categories remain. I am trying to make my websites less cluttered and reduce the chance of duplicate content or low quality tag pages with one or two posts sitting in them.
Google Panda Update and Duplicate Content
You inspired an article at Google Panda Update and Duplicate Content.
From a Stallion theme perspective, quite a bit of the template content that isn’t unique in most WordPress themes is unique.
Comment headings like “Leave a reply to” include the title of the post, the heading for the related posts plugins I’ve added support for include the title of the posts. On archives the read more link is the title of the article.
If you use the Stallion 2011 Header Image area (added to Stallion 6.1) every post can have a unique header image.
It would be difficult and not user friendly to change the header beyond the above on a page by page basis, similar for the footer. If for example I could code a unique navigation menu for every article (which is probably not possible with WordPress) it would make navigation for visitors confusing, so I wouldn’t touch the top navigation menus.
For the sidebars I have been thinking about different widgets for different page types. This wouldn’t help with potentially too much duplicate template content, because posts would still use the same widgets. It doesn’t sound practical to be able to choose a custom sidebar setup on a post by post basis, and it could lead to confusion for visitors navigating a site.
I would tend to avoid random widgets; they are going to make navigation confusing.
eHow content isn’t very good. I looked at their article “How to Cancel a Credit Card Payment” and the first thing they suggest is:
Well duh! I thought the way to cancel a credit card payment was to randomly phone banks until you happen to hit the right one :-)
The whole site is filled with low quality articles like that, failing on these Google high quality sites factors:
The big question is exactly how the Google Panda update made it possible for Google to penalize low quality content rather than duplicate content (the eHow content isn’t duplicate).
You are also reading too much into the duplicate content checker site you found. Of course pages on this site and all Stallion sites using the same layout are going to share the same HTML markup. There’s nothing wrong with that, most sites are built that way, headers the same, sidebars the same, footers the same and the basic HTML of the main content is the same. If Google did add duplicate HTML as part of the Panda Update it would have banned most sites!
What’s important is that the non-markup content is unique, and that’s text and images basically. Some HTML markup like H1 headers adds extra value to the content, but it’s the content, not the HTML per se, that’s ranked. If I were to use that duplicate checking tool (which I wouldn’t) I’d only look at the “Smart text similarity” figure. Comparing this page to the home page for example gave a 33% similarity while “HTML fingerprint” was at 90%, as you’d expect. I’m making a few assumptions about what those figures refer to, since there’s no key.
Comparing the Stallion home page to WordPress SEO Themes – AdSense Templates home page gives 94% “HTML fingerprint”, 37% “Smart text similarity”, yet the pages are unique. Pinch of salt comes to mind.
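The difference between the two figures is easier to see with a toy version of the comparison. The Python sketch below is only a guess at what such a tool measures (the site gives no key), using the standard library’s `html.parser` and `difflib`. It scores the sequence of HTML tags separately from the visible words. The two sample pages are invented, and a real checker would at least need to skip script and style content.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class PageSplitter(HTMLParser):
    """Collect the opening tags and the visible words of a page separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.words.extend(data.split())

def similarity(html_a, html_b):
    """Return (markup similarity, text similarity), each between 0 and 1."""
    a, b = PageSplitter(), PageSplitter()
    a.feed(html_a)
    b.feed(html_b)
    markup = SequenceMatcher(None, a.tags, b.tags).ratio()
    text = SequenceMatcher(None, a.words, b.words).ratio()
    return markup, text

# Two hypothetical pages sharing one template but carrying different content.
page_a = "<html><body><h1>Talian Theme</h1><p>AdSense ready theme for blogs</p><ul><li>Home</li></ul></body></html>"
page_b = "<html><body><h1>Panda Update</h1><p>Google now rewards long unique articles</p><ul><li>Home</li></ul></body></html>"

markup_score, text_score = similarity(page_a, page_b)
print(f"markup {markup_score:.0%}, text {text_score:.0%}")
```

Pages built from the same template score very high on markup however different the articles are, which is why only the text figure says anything about duplicate content.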
I would be more concerned with not adding enough content to posts as I mentioned at Google Panda Update and Duplicate Content because the template could ‘drown out’ the main content suggesting low quality.
Based on what Google is asking for they are looking for high quality reasonable sized content. If you can’t create reasonable sized articles why would Google want to index them? Do you like finding short articles with no substance when doing online research?
Try to think like Google, what do they want: high quality content.
David
Semantic Related Links and Google Panda SEO
I have printed and read in detail your SEO ideas on duplicate content. I appreciate these pearls of wisdom.
As stated, last week after reading your article I decided to change my navigation structure from hundreds of tags and categories to about ten well planned categories. The result was the elimination of over a thousand tags, yes more than 1000 tag pages on my websites in aggregate.
I did this all of my own accord because I was thinking you are right. Why would a user need that many tag pages? They were created a few years ago when I was operating under the SEO idea that “semantic related links” produced by tags in WordPress would help on-site factors.
I think it did; however, I have to weigh this all now against the ‘is it useful to the users’ SEO idea. I now think having one category and five tags per post is not as useful as having one well thought out category match that the user might explore. Rather, as you correctly stated, it is better to focus on building content of greater quality and length than on on-site factors, as your WordPress theme seems to take care of a lot of on-site in proper measure.
Another thing you wrote somewhere on your website was that what used to work before Google Panda still works. So it does not hurt to have semantic text in your posts and keywords, just make sure that this does not violate the SEO principle of usefulness to the reader.
As a side note, I do tend to think super comments also helps with duplicate content, as the structure of the pages created is usually a bit different from the rest of the website, and if they are well written by users they have little SEO in mind and hence rank on long tail keywords you might not expect.
The other thing about Panda which I keep getting back to is you are right. The Web is filled with misinformation about SEO. SEO misinformation replicates and spreads until many are building their websites off of it. Reading forums on SEO is good, but staying objective and separating the wisdom from the hype is hard.
I will be very curious to see if my tag zapping experiment will help with Google Panda. For sure I have to strive to improve the quality of my pages as always.
I also want to diversify my traffic sources as I believe one of the ironies of Panda is Google ranks websites that have high direct traffic or traffic not just from SERPs. I need to think of how I can diversify traffic sources. Social media is good everyone says, but you know how this is.
I am looking into more videos and ways to increase direct traffic. I am thinking for direct traffic it is helped by having a quality product to offer. This will also push one’s site up in organic search, post-panda.
Google Panda SEO - Remove WordPress Categories Widget?
Do you think it is a bad idea to remove categories from the sidebar? The reason I ask is that Google Panda, I think, gives negative points for a template set up. If I have my categories in all my sidebars or footers, this means more identical links on every page of my website: same anchor text, same thing.
In contrast, by removing as many redundant links as possible each page looks more unique, which is a good thing.
Pagerank could be distributed with either a site map (two clicks) or inter post links that have the category link in them.
PR Flow and Non Relevant Links Anchor Text
There’s a dilemma between making a site easy to navigate by adding lots of links to as much content as possible (more links is better) and the relevance of the anchor text of the links from a specific page (ALL links from a page should contain anchor text relevant to that page’s SERPs).
1,000 page WordPress blog, 25 categories, no tags, no monthly archives: that’s a setup I’ll have on some of my sites.
My usual setup is to have a sitewide category widget, recent posts widget, popular posts widget and recent comments widget.
Categories widget is mainly for indexing/passing PR reasons, without them deeper content is too many clicks away from the home page (where most PR flows through).
Recent Posts links change regularly, even on my sites where no new content is added I use a plugin (one of my plugins I’ve not got around to releasing) that randomly redates posts to today’s date (recent posts links are always changing). Having sitewide recent posts widget on a site with regular new content means that new content is going to be found quickly by search engines (you can’t assume your home page where the recent posts are linked will be visited daily), with any time sensitive content this is a must use widget.
Popular Posts widget (which is usually based on number of comments) is both for SEO reasons (you want your popular content to have more links, keep it popular) and so your visitors can easily find your interesting content, keep them on the site longer.
Recent comments mainly for visitors, keep them interested.
All the above has value and costs, the main value is more links to content, categories for example with a sitewide widget are passing a decent chunk of link benefit through the categories which gets to the posts (the important content). Without that stable PR flowing from every page of the site to the categories and into the posts where does the bulk of the PR to those posts come from?
Without categories or another sitewide source of link benefit to spread the benefit, some posts are going to miss out on links. Even with categories, if your categories run to multiple pages, older content misses out: if the 1,000 post example is spread over 30 categories, it’s 33 posts per category; with the standard 10 posts per archive page each category is three to four pages deep, and the posts on pages 3 and 4 aren’t going to get much link benefit.
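The arithmetic behind that example is worth having as a quick calculation you can rerun for your own site. This is a small Python sketch; `archive_depth` is a made-up helper name and it assumes posts are spread evenly across categories, which real sites rarely manage.

```python
import math

def archive_depth(total_posts, categories, per_page=10):
    """Return (average posts per category, archive pages needed per category)."""
    posts_per_category = total_posts / categories
    pages = math.ceil(posts_per_category / per_page)
    return posts_per_category, pages

posts_per_category, pages = archive_depth(total_posts=1000, categories=30)
print(f"about {posts_per_category:.0f} posts per category over {pages} archive pages")
```

With 1,000 posts, 30 categories and 10 posts per page, each category runs to a fourth archive page, so the oldest posts sit several clicks below the sitewide category links.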
Remember every link from a page gains a fair share of the link benefit from that page (not the site). If all you have is a sitemap with 1,000 links, that’s practically no link benefit per link. Sitemaps are pretty much worthless SEO wise, because on large sites the link benefit passed is practically none and on small sites a sitemap isn’t needed anyway. When you understand how Googlebot works you can see that having a page with 1,000 links as the only guaranteed way of finding the 1,000 pages is not a good idea: Googlebot hits a page and randomly follows links, it doesn’t find a page with 1,000 links and spider them one by one. It’s like a rat running randomly through a maze (the entire Internet), and each time it hits a page with links (junctions) it randomly follows one. On a page with two links there’s a 50% chance a particular link will be followed; with 1,000 links there’s a 0.1% chance. It’s why I don’t use sitemaps; my sitemap is the categories.
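The percentages above follow directly from the random-follow picture of a spider. Here is a minimal Python sketch under that model only (it is the simplification described in this comment, not a description of how Googlebot is actually implemented). The `found_within` helper is my own extension of the same model, showing the chance a link is followed at least once over repeated visits.

```python
def follow_chance(links_on_page):
    """Chance one particular link is picked when a single link is followed at random."""
    return 1 / links_on_page

def found_within(links_on_page, visits):
    """Chance a particular link is followed at least once over several visits."""
    return 1 - (1 - follow_chance(links_on_page)) ** visits

print(f"2 links: {follow_chance(2):.1%} per visit")
print(f"1,000 links: {follow_chance(1000):.1%} per visit")
print(f"1,000 links, 100 visits: {found_within(1000, 100):.1%} chance of being followed")
```

Under this model even a hundred visits to a 1,000 link sitemap leave any single link unlikely to have been followed, which is the argument for relying on category links instead.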
The above is dealing with PR flow, making sure an entire site can be easily spidered. The extreme conclusion of the above is: let’s forget about categories and sitemaps and link to every post of a site on every page, let’s have 1,000 links on every post!
Which then brings us to the SEO cost of links from a page. Anchor text from a page is more important than body text; if you want to go for maximum SEO, a page will ONLY have links using relevant anchor text. A page about Chicken Recipes should only have links with anchor text using Chicken Recipes and derivative SERPs (chicken, recipes, recipe, cooking, poultry….) so when Google indexes the page it screams this page is definitely about chicken recipes :-)
We now have two competing and important SEO factors, we want all content on a site well linked and we want all links from a page to have relevant anchor text.
My compromise is to try to limit the instances of non relevant anchor text to a minimum while making sure everything is linked.
If you have lots of spare time (I don’t), manually edit your posts to link to relevant posts on the site: if you have a page about chicken recipes, edit the content and link from within the content to other pages on your site, and other sites you own, that are about chicken and recipe derivative SERPs. If you don’t have much spare time, automate it with plugins like the Stallion Related Posts WordPress SEO Plugin (needs an update) and it will try to link to relevant content automatically for you.
With a large site you could break your categories widget into smaller, more relevant sets for specific types of content. Manually create a text widget with say 5 categories that are highly relevant to one another and only use that text widget on those 5 categories (Stallion feature to limit widgets to specific categories/pages). With a large site and a lot of categories you could significantly reduce the number of category links sitewide; with my 30 category example broken into 5 category sets, that’s 25 links removed from each page. This would still be enough for each post to be indexed etc… but would significantly reduce the number of category links on each page.
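The saving from splitting the widget can be worked out like this. The Python sketch assumes the category list is already ordered so that neighbouring categories are the related ones (in practice you would pick each set by hand, as described above). The category names are placeholders and `category_sets` is a hypothetical helper.

```python
def category_sets(categories, set_size=5):
    """Split an ordered category list into consecutive sets of related categories."""
    return [categories[i:i + set_size] for i in range(0, len(categories), set_size)]

# Placeholder names standing in for the 30 category example.
categories = [f"Category {n}" for n in range(1, 31)]

sets = category_sets(categories)
links_removed = len(categories) - len(sets[0])
print(f"{len(sets)} widgets of {len(sets[0])} links; {links_removed} fewer category links per page")
```

Each page then carries one 5 link widget instead of all 30 category links, while every category is still linked from the pages in its own set.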
I’ve done something similar at which has over 132,000 posts over hundreds of categories, couldn’t have hundreds of categories on every post so limited it to the top level categories (27 of them) only: site search shows 230,000 pages indexed, so it’s working on the indexing front.
David