Canonical URLs: Normalization of URLs by Greg

In 2005, the paper “On URL Normalization” by Sang Ho Lee, Sung Jin Kim, and Seok Hoo Hong was published at the International Conference on Computational Science and Its Applications. The paper discussed how a search engine spider can miss a lot of information on a site because dynamic sites often use the same page script to display many different pages. The authors therefore devised steps for turning such URLs into canonical URLs. Canonical URLs give webmasters a way to keep search engines from finding duplicate content when the same information is displayed on two or more pages of a website. They also let a website mark a page as unique when extra variables in the URL set it apart. Search engines can take the canonical URL tag as a suggestion either to bypass a page because it duplicates another page, or to highlight a page and make sure it gets crawled.

On some websites, such as a catalog, there can be multiple ways to get to the same page. One good example is a link with a tracker on it. Depending on how many places the link is advertised, it could carry several different trackers, but ultimately the reader is brought to one page. This means that one page may have multiple URLs that display the same information. When a search engine crawls the site, it sees these URLs as separate pages. Your site could then be flagged as having duplicate content and have its score lowered as a result.
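To make the tracker example concrete, here is a minimal Python sketch that collapses several tracked links into the one page they all point to. The domain, the product URL, and the list of tracking parameter names are assumptions chosen for illustration; your own campaigns may append different parameters.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracker names; replace with whatever your links actually carry.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}


def strip_trackers(url: str) -> str:
    """Return the URL with tracking parameters removed, other parameters kept in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Three advertised links (example.com is a placeholder domain).
urls = [
    "https://example.com/product.php?item=a&utm_source=newsletter",
    "https://example.com/product.php?item=a&ref=banner",
    "https://example.com/product.php?item=a",
]

unique_pages = {strip_trackers(u) for u in urls}
print(unique_pages)  # a single URL remains once the trackers are removed
```

A crawler that does not know which parameters are trackers sees three pages here, which is why the site has to say which URL is the real one.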

Imagine having an original Mickey Mantle baseball card. You go to a copy machine and make an exact copy of it. The copy is not worth as much as the original. Now imagine being able to state on the original that it is the original, and on the copy that it is only a copy. That is what a canonical URL allows you to accomplish: you can declare which page is the original version of the information and which pages are simply copies.

Let us look at a catalog system. Inside a catalog system there are usually multiple URLs that bring us to the same location. Say we are looking at product A. People who browse through a category might arrive via a link like /product.php?item=a&category=sprockets. People who search the site might instead be brought to /product.php?item=a. Notice how the first URL includes the category and the second one doesn’t. Using canonical URLs we can specify which of these we would like to use as the main link for this item. By setting a canonical URL of /product.php?item=a on this page, we hint to the search engines that this is the main page and that they do not have to index any other page that carries the same canonical URL.
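As a rough sketch of the catalog example, the Python function below builds the canonical link tag that would go in the head of the product page, whichever way the visitor arrived. It assumes the standard rel="canonical" link element; the function name, the example.com domain, and the rule that only the item parameter identifies a product are illustrative assumptions.

```python
from html import escape
from urllib.parse import urlsplit, parse_qs, urlencode


def canonical_link_tag(request_url: str, base: str = "https://example.com") -> str:
    """Build a canonical link tag that keeps only the `item` parameter."""
    parts = urlsplit(request_url)
    # Assumption: `item` alone identifies the product; `category` only records
    # how the visitor got there, so it is left out of the canonical URL.
    item = parse_qs(parts.query).get("item", [""])[0]
    canonical = f"{base}{parts.path}?{urlencode({'item': item})}"
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


# Both ways of reaching product A produce the same tag.
print(canonical_link_tag("/product.php?item=a&category=sprockets"))
print(canonical_link_tag("/product.php?item=a"))
```

Because both URLs emit an identical tag, a search engine that honors the hint can treat the category link as a copy of /product.php?item=a.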

In summary, on a positive note, canonical URLs can help to eliminate issues (such as your site being blacklisted) that arise when multiple URLs on your website all lead to the same page. To the search engines this gives the appearance of duplicate pages of content, which can be a bad thing, when in actuality you just have various ways of getting to one page. The canonical URL lets you specify which URL is the master, or primary, URL.

There are times, depending on how your website is structured, when canonical URLs should be turned off so that all pages of content are cataloged properly by the search engines. It is highly recommended that you consult an SEO web content expert before making this decision.