Pop quiz: what's the difference between the following
URLs:
- http://website.com
- http://www.website.com
- http://website.com/default.php
- http://www.website.com/default.php
Give up? If you're a user, chances are you expect all of
those URLs to lead you to the same page. Robots,
however, are not as good at determining whether pages are the
same, so they often store each URL separately.
A big part of how search engines rank pages is based on how many external links those pages have. If other sites on the web link to different versions of your home page, search engines may calculate the value of each URL separately, based on the number of links to each version.
This can effectively diminish the rank your page would have if it were found (and linked to) by only one URL.
The practice of consolidating all versions of a page under one URL is
referred to as "canonicalization" (because you collapse all versions under the "canonical", or true, version). The four examples listed
above are the most common, but there are
potentially many more URLs that lead to the
same page. By adhering to a few best practices, you should
be able to address 90% of common site-wide canonicalization issues
on your site and consequently improve how your site ranks.
Recommendation
The solution is to be
explicit about the canonical form of your URLs. Following are four best practices to achieve this, with
specific code and configuration examples.
-
Select WWW or Non-WWW, then redirect the other
option to your preferred version.
The hard part is choosing whether you want your site to
be "www.website.com" or simply "website.com". There is no right answer for every
company, so you'll have to figure this out on your own
(but removing the "www." saves your customers four
keystrokes, which really add up on a mobile device, and
it makes your brand the first thing your customers see).
Once you've chosen, you need to find a way to
trap all requests to your application, check which form
is being used, and, if it is not the correct form, issue a 301 redirect to the correct form.
For example, if the user types in wikipedia.org,
they will automatically get redirected to
www.wikipedia.org.
-
Remove the default filename from the end of your
URLs.
All web servers allow you to select one or more default
filenames to serve when the browser requests a
directory. For example, this website is run on IIS, so
when the user requests "http://janeandrobot.com"
we really serve
"http://janeandrobot.com/default.aspx".
In the same code you use to enforce www vs. non-www, you
should also check whether the default filename is at
the end of the URL and, if so, trim it off. So,
"http://janeandrobot.com/default.aspx" would be
redirected to "http://janeandrobot.com".
-
Link internally to the
canonical form of your URL.
Make sure
you always link to the proper canonical form of your
URLs from within your site. This practice also encourages external sites to link to you using the correct version, since those linking to you often cut and paste from your pages or RSS feed. Note
that there are diminishing returns here, so you don't
need to spend the whole weekend hunting down every last
URL. Just make sure to review your site's primary
navigation, top landing pages and blog.
-
Use Google Webmaster Tools to tell Google the
correct form.
Implementing these best practices on your site is ideal, since they address the problem for
all search engines and give your customers a
consistent, properly branded navigation experience. But what can you do if you reviewed
steps 1-3 and found that they would take six months to
implement on your production site? There is
something you can do today: using
Google's Webmaster
Tools, you can navigate to the "Tools"
section and select "Set preferred domain." Here you can specify whether you'd like Google to use
"www.website.com" or "website.com" in
its index and search results, and have it consolidate links to both versions. Note that while this
will give you a short-term benefit in Google, it does
not help you in Yahoo! or Live Search.
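To make the first two practices more concrete, here is a minimal sketch in Python of the check you would run on every incoming request. It is not tied to any particular server or framework: the preferred host (`website.com`), the list of default filenames, the hard-coded `http` scheme and the function name `redirect_target` are all assumptions chosen for illustration, and you would wire the result into whatever redirect mechanism your platform provides.

```python
# Hypothetical settings: adjust to your own site and server configuration.
CANONICAL_HOST = "website.com"
DEFAULT_FILENAMES = ("default.php", "default.aspx", "index.html")


def redirect_target(host, path, query=""):
    """Return the URL to 301-redirect to, or None if the request is canonical.

    host is the request's Host header, path is the request path (it always
    starts with "/"), and query is the raw query string without the "?".
    """
    host = host.lower()

    # Practice 1: send the www host to the preferred non-www host.
    new_host = CANONICAL_HOST if host == "www." + CANONICAL_HOST else host

    # Practice 2: trim a default filename from the end of the path.
    directory, _, filename = path.rpartition("/")
    new_path = directory + "/" if filename.lower() in DEFAULT_FILENAMES else path

    if (new_host, new_path) == (host, path):
        return None  # already canonical, serve the page normally

    # The scheme is assumed to be plain http, as in the examples above.
    target = "http://" + new_host + new_path
    return target + "?" + query if query else target
```

In a request handler, a non-None result would be sent back as the `Location` header of a 301 response. If you prefer the "www" form, the same structure applies with the host comparison reversed.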
Checking Your Website
To check your website to see if you're handling domain
canonicalization correctly, you can use the
Live HTTP Headers add-on for Firefox.
Open the Live HTTP Headers tool, then
try all the variations of the URL at several different
levels to ensure they all redirect back to the appropriate
canonical form. As you're checking each variation, look at the HTTP headers using the Firefox
plug-in to ensure they are all 301 redirects (and not, for instance, 302
redirects).
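If you have many variations to check, the same test can be scripted. The sketch below uses Python's standard `http.client` module to request a URL without following redirects, and a small helper to judge the response. The function names, the result strings and the choice of a HEAD request are assumptions made for this example, it handles plain `http` URLs only, and it compares the `Location` header to the canonical URL exactly, so a relative `Location` value would be reported as unexpected.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url):
    """Request url without following redirects.

    Returns (status code, Location header or None). Plain http only.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def verdict(url, status, location, canonical):
    """Judge one response against the expected canonical URL."""
    if url == canonical:
        # The canonical URL itself should be served, not redirected.
        return "ok" if status == 200 else "canonical URL returned %d" % status
    if status == 301:
        return "ok" if location == canonical else "301 to unexpected target"
    if status in (302, 303, 307):
        return "temporary redirect, should be 301"
    return "not redirected (status %d)" % status
```

To use it, loop over your list of variations, call `fetch_status` on each one, and pass the result to `verdict` along with the canonical URL you expect for that page.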
Here's an example test case:
Examples
Canonicalization issues are very common and, as a
Microsoft employee, I don't have to go far to find an
example. Check out the website for Microsoft's annual
Mix conference for web
developers.
I was able to generate the table below by plugging the
common URL variations into Yahoo's Site Explorer to find a list of links to each variation.
Looking through these numbers yields some interesting insights:
-
Not redirecting between "www" and "non-www" is
definitely hurting their ranking - you can tell because
they have a similar number of inlinks for each version.
Ranking is done on a logarithmic scale, so every
additional link is more valuable than the one before. If they redirected all versions to one canonical form, search engines would see their home page as having 81,711 external links, which would be a substantial boost.
-
They are not good about using the same version of the
URL within their site. If you're not cognizant of this
on your site, others won't be either. It looks like they use
visitmix.com about 75% of the time internally, and
www.visitmix.com the other 25%.
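To get a rough idea of how consistent your own internal links are, you can count which host each link on a page points to. The following is a minimal sketch using only Python's standard library. The class and function names are made up for this example, and it assumes you already have the page's HTML as a string and know which hosts belong to your site.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_host_counts(html, page_url, site_hosts):
    """Count links on one page by host, ignoring links to other sites.

    Relative links are resolved against page_url, so they count toward
    whichever host the page itself was served from.
    """
    collector = LinkCollector()
    collector.feed(html)
    counts = Counter()
    for href in collector.hrefs:
        host = urlsplit(urljoin(page_url, href)).netloc.lower()
        if host in site_hosts:
            counts[host] += 1
    return counts
```

Running this over your primary navigation, top landing pages and blog should show quickly whether one form of the domain dominates or whether the two are mixed.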
Additional Resources