Comment on Duplicate Content and Canonical URLs by SEO Dave.
You inspired an article at Google Panda Update and Duplicate Content.
From a Stallion theme perspective, quite a bit of the template content is unique where in most WordPress themes it isn’t.
Comment headings like “Leave a reply to” include the title of the post, and the headings for the related posts plugins I’ve added support for include the title of the post as well. On archives the read more link is the title of the article.
If you use the Stallion 2011 Header Image area (added in Stallion 6.1), every post can have a unique header image.
It would be difficult, and not user friendly, to change the header beyond the above on a page by page basis, and the same goes for the footer. If, for example, I could code a unique navigation menu for every article (which is probably not possible with WordPress), it would make navigation confusing for visitors, so I wouldn’t touch the top navigation menus.
For the sidebars I have been thinking about different widgets for different page types. This wouldn’t help with potentially having too much duplicate template content, because posts would still share the same widgets. Choosing a custom sidebar setup on a post by post basis doesn’t sound practical and could confuse visitors navigating a site.
I would tend to avoid random widgets; they are going to make navigation confusing.
eHow content isn’t very good. I looked at their article “How to Cancel a Credit Card Payment” and the first thing they suggest is:
Identify the company or provider for the card that you’re interested in canceling a payment on. This information will be easily found in the bill it has sent you or perhaps on the card itself.
Well duh! I thought the way to cancel a credit card payment was to randomly phone banks until you happen to hit the right one :-)
The whole site is filled with low quality articles like that, failing on these Google high quality site factors:
Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
Does the article provide original content or information, original reporting, original research, or original analysis?
Does the page provide substantial value when compared to other pages in search results?
Is the site a recognized authority on its topic?
Does this article provide a complete or comprehensive description of the topic?
Does this article contain insightful analysis or interesting information that is beyond obvious?
Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
The big question is exactly how the Google Panda update made it possible for Google to penalize low quality content rather than duplicate content (the eHow content isn’t duplicate).
You are also reading too much into the duplicate content checker site you found. Of course pages on this site, and on all Stallion sites using the same layout, are going to share the same HTML markup. There’s nothing wrong with that. Most sites are built that way: the headers are the same, the sidebars are the same, the footers are the same and the basic HTML of the main content is the same. If Google had added duplicate HTML as part of the Panda Update it would have banned most sites!
What’s important is that the non-markup content is unique, and that’s basically text and images. Some HTML markup, like H1 headers, adds extra value to the content, but it’s the content, not the HTML per se, that’s ranked. If I were to use that duplicate checking tool (which I wouldn’t) I’d only look at the “Smart text similarity” figure. Comparing this page to the home page, for example, gave a 33% similarity, while “HTML fingerprint” was at 90%, as you’d expect. I’m making a few assumptions about what those figures refer to, since there’s no key.
Comparing the Stallion home page to WordPress SEO Themes – AdSense Templates home page gives 94% “HTML fingerprint”, 37% “Smart text similarity”, yet the pages are unique. Pinch of salt comes to mind.
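To show why markup similarity can be very high while text similarity stays low, here is a minimal Python sketch. It is an assumption about the kind of measurement such a tool might make, not the tool’s actual method (there’s no key, so that is unknown). The names `PageSplitter`, `split_page`, `similarity` and `compare_pages` are hypothetical; only the Python standard library is used.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Collects a page's tag sequence and its visible words separately."""

    def __init__(self):
        super().__init__()
        self.tags = []   # opening tags in document order (the "markup")
        self.words = []  # lower-cased words found between tags (the "text")

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def split_page(html):
    """Return (tags, words) for one HTML document."""
    parser = PageSplitter()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.words


def similarity(a, b):
    """Ratio between 0.0 and 1.0 of how alike two sequences are."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def compare_pages(html_a, html_b):
    """Compare two pages on markup and on text, as two separate scores."""
    tags_a, words_a = split_page(html_a)
    tags_b, words_b = split_page(html_b)
    return {
        "markup": similarity(tags_a, tags_b),
        "text": similarity(words_a, words_b),
    }
```

Two pages built on the same template with different articles would score high on `markup` and low on `text`, which is the pattern described above and nothing to worry about.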
I would be more concerned with not adding enough content to posts, as I mentioned at Google Panda Update and Duplicate Content, because the template could ‘drown out’ the main content, suggesting low quality.
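One rough way to picture the ‘drown out’ idea is to measure what share of a page’s words sit in the main content rather than the template. The Python sketch below is only an illustration under stated assumptions: it is not how Google measures anything, the main content is assumed to live in a single element with a hypothetical `id="content"`, the markup is assumed to be well formed (every non-void tag closed), and text inside script or style tags would be counted as words too.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContentShare(HTMLParser):
    """Counts words inside the main content element and on the whole page."""

    def __init__(self, main_id):
        super().__init__()
        self.main_id = main_id  # hypothetical id of the main content element
        self.depth = 0          # greater than 0 while inside that element
        self.main_words = 0
        self.total_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.main_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        count = len(data.split())
        self.total_words += count
        if self.depth > 0:
            self.main_words += count


def main_content_share(html, main_id="content"):
    """Fraction of the page's words found inside the main content element."""
    parser = ContentShare(main_id)
    parser.feed(html)
    parser.close()
    if parser.total_words == 0:
        return 0.0
    return parser.main_words / parser.total_words
```

A short post on a template with a busy header, sidebar and footer would give a low share, which is the situation to avoid by writing longer, more substantial articles.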
Based on what Google is asking for, they are looking for high quality, reasonably sized content. If you can’t create reasonably sized articles, why would Google want to index them? Do you like finding short articles with no substance when doing online research?
Try to think like Google. What do they want? High quality content.