Pop quiz: what’s the difference between the following URLs:
- http://website.com
- http://www.website.com
- http://website.com/default.php
- http://www.website.com/default.php
Give up? If you’re a user, then chances are you expect all of those URLs to lead you to the same page. Robots, however, are not as good at determining whether pages are the same, so they often store each one separately. A big part of how search engines rank pages is based on how many external links those pages have. If other sites on the web link to the different versions of your home page, then search engines may calculate the value of each URL separately, based on the number of links to each version. This can effectively diminish the potential rank your page would have if it were found (and linked to) by only one URL.
The practice of consolidating all versions of a page under one URL is referred to as “canonicalization” (because you collapse all versions under the “canonical” or true version). The four examples listed above are the most common, but there are potentially many, many URLs that lead you to the same page. By adhering to several best practices, you should be able to address 90% of common site-wide canonicalization issues on your site and consequently improve how your site ranks.
Recommendation
The solution is to be explicit about the canonical form of your URLs. Following are four best practices to achieve this, with specific code and configuration examples.
- Select WWW or Non-WWW, then redirect the other option to your preferred version. The hard part is choosing whether you want your site to be “www.website.com” or simply “website.com”. There is no right answer for every company, so you’ll have to figure this out on your own (but removing the “www.” saves your customers 4 keystrokes, which really add up on a mobile device, and it makes your brand the first thing your customers see). Once you’ve selected, you then need to find a way to trap all requests to your application, check which form is being used, and if it is not the correct form, initiate a 301 Redirect to the correct form. For example, if the user types in wikipedia.org, they will automatically get redirected to www.wikipedia.org.
- Remove the default filename from the end of your URLs. All web servers allow you to select one or more default filenames to serve when the browser requests a directory. For example, this website is run on IIS, so when the user requests “http://janeandrobot.com” we really serve “http://janeandrobot.com/default.aspx”. In the same code you use to enforce www vs. non-www, you should also check and see if the default filename is at the end of the URL and then trim it off. So, “http://janeandrobot.com/default.aspx” would be converted to “http://janeandrobot.com/”.
- Link internally to the canonical form of your URL. Make sure you always link to the proper canonical form of your URLs from within your site. This practice helps encourage external sites to link to the site using the correct version as well (since those linking to you often cut and paste from your pages or RSS feed). Note there is a degree of diminishing returns here, so you don’t need to spend the whole weekend hunting down every last URL. Just make sure to review your site’s primary navigation, top landing pages and blog.
- Use Google Webmaster Tools to tell Google the correct form. Implementing these best practices on your site is ideal, since they address the problem for all search engines and give your customers a consistent, properly branded navigation experience. But what can you do if you reviewed steps 1-3 and found that it would take six months to implement them on your production site? There is something that you can do today: using Google’s Webmaster Tools, you can navigate to the “Tools” section and select “Set preferred domain.” Here you can specify whether you’d like Google to use “www.website.com” or “website.com” in its index and search results, as well as consolidate links to both versions. Note that while this will give you a short-term benefit in Google, it does not help you in Yahoo! or Live Search.
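The first two practices can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code this site runs: the preferred host (`website.com`), the list of default filenames, and the function names `canonical_url` and `canonicalizing` are all hypothetical, and the redirect is expressed as a generic WSGI wrapper rather than anything IIS-specific. It also ignores details such as percent-encoded paths.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings for this sketch: substitute your own preferred host
# and whatever default filenames your web server is configured to serve.
CANONICAL_HOST = "website.com"
DEFAULT_FILENAMES = {"default.php", "default.aspx", "index.html"}


def canonical_url(url: str) -> str:
    """Return url with the preferred host and without a default filename."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Collapse the www and non-www hosts onto the preferred one.
    if host in (CANONICAL_HOST, "www." + CANONICAL_HOST):
        host = CANONICAL_HOST
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Trim a trailing default filename, leaving the directory path.
    directory, _, filename = parts.path.rpartition("/")
    path = directory + "/" if filename.lower() in DEFAULT_FILENAMES else parts.path
    # An empty path and "/" are the same request, so normalize to "/".
    path = path or "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


def canonicalizing(app):
    """Wrap a WSGI app so non-canonical requests get a 301 to the canonical URL."""
    def wrapper(environ, start_response):
        scheme = environ.get("wsgi.url_scheme", "http")
        host = environ.get("HTTP_HOST", CANONICAL_HOST)
        path = environ.get("PATH_INFO", "") or "/"
        query = environ.get("QUERY_STRING", "")
        requested = urlunsplit((scheme, host, path, query, ""))
        target = canonical_url(requested)
        if target != requested:
            # Permanent redirect, so search engines consolidate on one URL.
            start_response("301 Moved Permanently", [("Location", target)])
            return [b""]
        return app(environ, start_response)
    return wrapper


if __name__ == "__main__":
    for url in (
        "http://website.com",
        "http://www.website.com",
        "http://website.com/default.php",
        "http://www.website.com/default.php",
    ):
        # Every variation maps to http://website.com/
        print(url, "->", canonical_url(url))
```

Keeping the URL logic in one pure function means the same rule can be reused when generating internal links, which supports the third practice as well.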
Checking Your Website
To check your website to see if you’re handling domain canonicalization correctly, you can use the Live HTTP Headers add-on for Firefox.

Open the Live HTTP Headers tool, then try all the variations of the URL at several different levels to ensure they all redirect back to the appropriate canonical form. As you’re checking each variation, look at the HTTP headers using the Firefox plug-in to ensure they are all 301 redirects (and not, for instance, 302 redirects).
Here’s an example test case:
| Canonical URL Form | Test Case | Test Result |
| --- | --- | --- |
| http://janeandrobot.com | janeandrobot.com | Success |
| http://janeandrobot.com | janeandrobot.com/default.aspx | Success |
| http://janeandrobot.com | www.janeandrobot.com | Success |
| http://janeandrobot.com | www.janeandrobot.com/default.aspx | Success |
| http://janeandrobot.com/about.aspx | janeandrobot.com/about.aspx | Success |
| http://janeandrobot.com/about.aspx | www.janeandrobot.com/about.aspx | Success |
| http://janeandrobot.com/folder | janeandrobot.com/folder | Success |
| http://janeandrobot.com/folder | janeandrobot.com/folder/default.aspx | Success |
| http://janeandrobot.com/folder | www.janeandrobot.com/folder | Success |
| http://janeandrobot.com/folder | www.janeandrobot.com/folder/default.aspx | Success |
| http://janeandrobot.com/folder/test.aspx | janeandrobot.com/folder/test.aspx | Success |
| http://janeandrobot.com/folder/test.aspx | www.janeandrobot.com/folder/test.aspx | Success |
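If you would rather script these checks than click through them, something along the following lines may help. It is a rough sketch using only Python's standard library. The helper names `fetch_status` and `check_variation` are made up for this example, it handles plain http URLs only, and it sends HEAD requests, which some servers answer differently from GET.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url: str):
    """Request url without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def check_variation(variation: str, canonical: str, fetch=fetch_status) -> str:
    """Classify one test case as 'Success' or a short failure reason."""
    status, location = fetch(variation)
    # The canonical form itself should be served directly.
    if variation.rstrip("/") == canonical.rstrip("/"):
        if status == 200:
            return "Success"
        return f"Failure: canonical URL returned {status}"
    # Every other variation should answer with a permanent redirect...
    if status != 301:
        return f"Failure: expected 301, got {status}"
    # ...that points at the canonical form (ignoring a trailing slash).
    if (location or "").rstrip("/") != canonical.rstrip("/"):
        return f"Failure: redirected to {location}"
    return "Success"


if __name__ == "__main__":
    canonical = "http://janeandrobot.com"
    for case in (
        "http://janeandrobot.com",
        "http://janeandrobot.com/default.aspx",
        "http://www.janeandrobot.com",
        "http://www.janeandrobot.com/default.aspx",
    ):
        print(case, "|", check_variation(case, canonical))
```

The `fetch` parameter exists so the classification logic can be exercised with canned responses, without touching the network.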
Examples
Canonicalization issues are very common, and being a Microsoft employee, I don’t have to go far to find an example. Check out the website for Microsoft’s annual Mix conference for web developers.

I was able to generate the table below by plugging the common URL variations into Yahoo’s Site Explorer to find a list of links to each variation.
| URL Variation | Number of Links from within website | Number of Links from outside websites |
| --- | --- | --- |
| http://visitmix.com | 17,663 | 59,498 |
| http://www.visitmix.com | 9,074 | 22,179 |
| http://visitmix.com/default.aspx | 0 | 22 |
| http://www.visitmix.com/default.aspx | 0 | 12 |
Looking through these numbers yields some interesting insights:
- Not redirecting “www” vs “non-www” is definitely hurting their ranking. You can tell because each version has a substantial number of inlinks. Ranking is done on a logarithmic scale, so every additional link is more valuable than the one before. If they redirected all versions to one canonical form, search engines would see their home page as having 81,711 external links, which would be a substantial boost.
- They are not good about using the same version of the URL within their site. If you’re not cognizant of this on your site, others won’t be either. Going by the internal link counts above, they use visitmix.com about two-thirds of the time internally, and www.visitmix.com the other third.
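For anyone who wants to run the same arithmetic on their own numbers, here is a small Python sketch. The link counts are copied from the table above. The variable names are arbitrary, and the internal split is computed from the table's internal-link counts alone.

```python
# (links from within the website, links from outside websites) per URL variation,
# copied from the Site Explorer table above.
links = {
    "http://visitmix.com": (17663, 59498),
    "http://www.visitmix.com": (9074, 22179),
    "http://visitmix.com/default.aspx": (0, 22),
    "http://www.visitmix.com/default.aspx": (0, 12),
}

total_internal = sum(internal for internal, _ in links.values())
total_external = sum(external for _, external in links.values())

# External links the home page would be credited with under one canonical URL.
print(total_external)  # 81711

# Share of internal links that already use the non-www home page URL.
non_www_share = links["http://visitmix.com"][0] / total_internal
print(round(100 * non_www_share))  # 66
```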
Additional Resources
- Matt Cutts’s Article on Canonicalization
- Additional Canonicalization Scenarios from Ian Ring – a few other great scenarios like capitalization and other default values.
- Yahoo Site Explorer – see how many inlinks you have for each URL variation
- Live HTTP Headers – check your redirects to make sure you’re implementing 301 redirects, not 302s.
Comments
This is one of the most overlooked items in my opinion. Great post.
I’ve heard that on-page redirection may be treated as a 302 redirect by crawlers, because the crawler first goes to the page and reads the code, and only then gets the instruction to move to the target page.
As far as I know, the safest way is to move to a Linux server running Apache, which keeps a file named .htaccess where you can put redirect instructions. Whenever a request comes in, the server reads .htaccess first, which tells it which page to serve for the requested one, so the crawler gets a true 301 redirect. :)
Hi Kattu – You’re absolutely right that a 301 is the way to go. There are multiple ways of implementing a 301 (including using .htaccess if your server is Apache, as you’ve described). We’ll be posting follow up articles about implementation techniques.
As for what you mention in your first paragraph, when you use an on page meta refresh, crawlers may interpret that differently than you expect. We’ll be diving into those details in our implementation article as well.
Hm – interestingly enough, you have a link or two pointing to janeandrobot.com/default.aspx, which does not 301 to janeandrobot.com.
I just thought I would give you a heads up about that. It looks like you covered it in the test case, so it might be a server/load balancing issue. I checked your header on that page and it’s still showing as 200.
/beep.
Not only is this particular article/tutorial brilliant, but so far the entire Jane + Robot site says it all exactly as it always should have been said – and all in one place. Things I try to tell my clients every day, with varying degrees of success.
Even better, the site isn’t only just articles, it’s an authoritative resource that cites other documentation. THANK YOU.
@Ashley – good catch. As many of you know, implementing proper canonicalization can be a lot more difficult than just writing down the best practices :)
We’re still working on fixing the canonicalization of this site. We’re currently tracking down a bug in our content management system, and hopefully it will be fixed soon!
“removing the ‘www.’ saves your customers 4 keystrokes”
If you have the site-wide 301 redirect from non-www to www in place, then the visitor can omit typing the www, and your redirect will deliver them to the correct URL and to the correct content anyway.
There are good reasons to use the www version as the canonical form, not least the ability to do:
site:domain.com -inurl:www
to make sure that no other forms, other than www that is, have been indexed.
You can’t do that if your redirect runs the other way.
The reason we advise clients to always prefer www to non-www is that it makes people notice the URL in print advertising, signage, and other media. "www." is a very powerful visual cue to the presence of a URL. Having the brand "stand out" through the lack of www is only important in those cases where the URL is shown without other material, which is rare and inadvisable.
@Josh and @G1SMD – good points, you’ve come real close to selling me on "www"
I’ll go with Josh on the www
I’m curious now though about the use of subdomains. I’ve heard from both camps (1. builds pagerank on the primary domain) and (2. considered totally separate)
I’m personally a fan of the non-www addresses. For most clients I’ll use the www because they often assume it and print it on their marketing material. For me … it’s unnecessary. I think people cue off the .com more than the www, and having the www in print can make it more difficult visually for a client to remember the domain name (especially on a vehicle or billboard). The most important thing for them to remember is the domain name (because you DO have your .com registered, right?) when you have your redirects in place.
Even if your site does use the www as the "real" address, you can still advertise the site without the www in both print media and broadcasting channels, and let the redirect fix up the URL after the user types it in.
For example, when I want to do a search at Google, I type google.com in to the browser, nothing more. I don’t bother with the www, as Google’s own redirect automatically adds it on for me and then lets me search.
I think the www vs. non-www decision should also consider your audience. If you’re trying to reach web-savvy techies, then by all means omit the www, but if you’re trying to reach less tech-savvy "civilians" and/or doing a lot of offline promotion, then keep the www (for the reasons RKF mentions above). In either case, it is important to be consistent in your usage across all media (you might call this "canonical branding") since you never know where someone will be when they jot down your URL and link to you.
This information regarding canonicalization of the URL will definitely help. But how do I use the Live HTTP Headers tool to find out the exact status of my site regarding canonical issues?
Good idea posting this article. I asked myself this question one day, but out of laziness I never researched the answer. Now it caught my attention when I read this article. Wow, now I know. Thanks.
Hi Nathan,
Great post!
Would you mind sharing the commands you used with Yahoo Site Explorer (or what did you select on the drop-downs) to generate the "Number of Links from within website" as per your table above. Maybe just one example?
Thanks,
Gustavo
Great post.. I thought all 4 url’s are same. But you have created doubt in my mind with major differences:D:D hahah..
appreciated mate..
Great, great post, Nathan! Thank you! I follow most of it and agree with it, but I have a question.
I hear you when you say (item 2) "remove your default filename from the end of your URLs."
So you would definitely not 301 http://www.mydomain.com to http://www.mydomain.com/default.aspx. You would instead let http://www.mydomain.com return a "200 OK" header and display the default page via IIS.
But what, if anything, do you do to the http://www.mydomain.com/default.aspx page? Is it appropriate to 301 it to the canonical http://www.mydomain.com?
I worry that even if I myself don’t include the default page in my URLs, external sites might somehow directly link to the default page, it could get spidered, etc. and come to have a page rank of its own.
Oops, sorry, those links rendered weirdly, I hope you can make sense of what i was babbling :)
I think it is a great article on domain canonicalization. I have a site on .asp, and index.asp comes up as the home page. Can I redirect it with a 301 as well? One of my friends told me not to redirect it because it is the home page. Please advise.
Thanks
http://mobilephonesandtechnologies.blogspot.com/
@Gustavo – for the site explorer tool, these are the options I select: (a) Inlinks (b) Show Inlinks: Except from this domain (c) to: Only this URL.
What this does is to remove all the links pointing to this page from your website so you can see the effect of external links (which is far more important in ranking to search engines). Here’s a link to the site explorer tool with the aforementioned options enabled:
http://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2Fvisitmix.com&bwm=i&bwmo=d&bwmf=u
@Tom Funk – hey Tom, it is completely okay if folks still link to mydomain.com/default.aspx. If you’re using the redirect I recommend above, when a search engine encounters this URL, your website will 301 redirect it back to the canonical version, mydomain.com, and it will never store the version of the URL with the default filename.
@Nisha Singh – the redirecting should still work in your case as well. I recommend trying it out and then going through all of the test cases I listed above. If they all work, you should be good.
I like to have my site without the WWW. I feel it makes it stand out more in the search engine results! I also write all of my heads in CAPS. Not sure if all this helps, but I feel that it does.
Great post Nathan, I’ve only just discovered your site and have bookmarked it now. This article is very simple to understand and I for one will be checking my sites tonight. I also have not specified with Google webmaster tools a preferred domain, but I will now.
Thanks again