Duplicate Content and Canonical URLs

There's usually no need to include the same content in multiple places on your website. After all, the whole point of HTML is to link from one place to another, so it's always possible to keep only one instance of the content in question. The only time it's necessary to have two distinct URLs for the same content is when the surrounding navigation, headers, etc. need to change depending on the path the user took to get there.

For example, when users browse an online store by category, it's a better experience when the product page links back to the category they were just browsing, letting them quickly return to look at other products. In cases like this, having only one URL for the product and tracking what the user was viewing in their session quickly gets out of hand: the user can, and often will, have more than one tab open on the same site, and working out which link was clicked from where gets complicated. From a programming perspective, it's far easier to have multiple URLs that show the same product, with each URL triggering the appropriate navigation display.
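As a minimal sketch of this approach, the category can simply be part of the URL, so rendering the right navigation requires no session state at all. The URL scheme and function names below are hypothetical, not from any particular framework:

```python
def route(path):
    """Hypothetical router: paths like '/category/<cat>/product/<id>'
    all show the same product; the category segment only picks the
    navigation to display alongside it."""
    parts = path.strip("/").split("/")
    return {"product": parts[3], "nav_category": parts[1]}

# Two distinct URLs, one product, different navigation context:
route("/category/shoes/product/42")  # nav links back to "shoes"
route("/category/sale/product/42")   # same product, nav shows "sale"
```

Because the browsing context lives in the URL rather than the session, each open tab carries its own navigation state for free.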

But what effect will taking the easy route have on SEO? In the past, not much. For Google, however, this has changed somewhat since the Panda update in February of 2011. That update targeted "low quality" sites, specifically content farms that often have copy-pasted content matching a wide variety of search results and link farms set up to boost another site's SEO by linking back to it with specific keywords. Both of these types of farm site would frequently have largely identical content available at multiple URLs and multiple domains.

Prior to the Panda update, the main "negative" effect of having more than one URL for a given piece of content was that Google would list only one of them in its search results. After Panda, duplicate content pages lost a small amount of PageRank. Google hasn't said exactly how much or in what areas, but from what I've read the effect is largest when the duplicate content occurs across domains. It's also likely that keywords in the alternate URLs will not be used for ranking. The actual effect on non-malicious duplicate content, however, should be small: sites with multiple domains should be redirecting the aliases to their main domain, and they should be using essentially the same keywords in all of the URLs that reach a given page.

However, if your site will have a significant amount of content that can be accessed from more than one URL, consider using rel=canonical. This is a method of indicating which URL you want search engines to use for the duplicated content. The main requirement is that one of the alternative URLs be selected as the canonical one; all of the other URLs then include a link tag in the page head pointing to the canonical URL.

	<link rel="canonical" href="http://example.com/path/to/item" />

There is no need to include a rel=canonical link on the canonical page itself. A self-referencing canonical is a common occurrence, though, and likely won't have any negative effects since the search engines know to account for it. Specifically, Google has indicated that its crawler handles this case without issue, while Bing has indicated it prefers not to see rel=canonical on the canonical page, although it doesn't really hurt your ranking if it happens.
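Putting the two pieces together, the template code can emit the canonical tag only when the page is being rendered at a non-canonical URL. This is a hedged sketch under assumed names (the base URL and helper are illustrative, not from any library):

```python
# Hypothetical site root; in practice this would come from configuration.
CANONICAL_BASE = "http://example.com"

def canonical_link(canonical_path, current_path):
    """Return the <link rel="canonical"> tag for the page head, or an
    empty string when the page is already at its canonical URL (where
    the tag is unnecessary, per the discussion above)."""
    if current_path == canonical_path:
        return ""
    return f'<link rel="canonical" href="{CANONICAL_BASE}{canonical_path}" />'

canonical_link("/path/to/item", "/category/sale/path/to/item")
# -> '<link rel="canonical" href="http://example.com/path/to/item" />'
canonical_link("/path/to/item", "/path/to/item")
# -> '' (canonical page itself: no tag emitted)
```

Skipping the tag on the canonical page itself also sidesteps the Google/Bing disagreement noted above, since neither crawler ever sees a self-referencing canonical.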
