Continuing my blog theme of covering some basic SEO as it pertains to the DNN Platform, this post is a primer on Canonical Urls: what they are, why they matter, and how they affect your site.
Canonical? What does that even mean?
If you look it up in a general dictionary, the word ‘canonical’ has its origins in canon law – the meaning is ‘required by canon law’. What is that? Generally, it can be broken down to conformance with well-established rules or procedures.
Still confused? Here’s a simple one: Canonical means the established authority on something.
When applied to Urls, a canonical url is the established, correct, standard, original, required, single, only, most important Url.
Let’s go back further to the early days of the internet, when web servers were a simple way to serve up content and to interlink content between different network types. The idea that you could have the same document served up from the same website under two different names would mean someone made a mistake and uploaded the same thing twice with two different filenames. A visitor would be confused which was the correct one. They would want the canonical document so they didn’t read the wrong one.
Taking a step sideways, we come across the term CNAME when we are dealing with setting up domain names. CNAME is an abbreviation of Canonical Name. If you create a CNAME record to associate a domain with your website, what you are doing is saying ‘new.example.com’ is the Canonical Name for ‘old.example.com’.
Combine those two concepts and you get to the Canonical Url. It’s the accepted, standard Url for a particular piece of content on the internet. Unlike CNAME, however, it’s not an official ‘internet’ term like URL or DNS; it’s something that has appeared in the vernacular.
Why is the Canonical URL so important?
It really was Google who first made the desire for canonical URLs known to the webmasters of the world. That’s because it is very difficult for a search engine to determine which URL is correct if it comes across the same (or extremely similar) piece of content under two different URLs. Going back to our simple duplicated-file example – most humans would be able to work out which one is more current – perhaps one filename contains ‘FINAL’ and the other ‘DRAFT’. But search engines can’t make judgement calls like that and still be correct all the time.
In a world of dynamic pages where websites can generate tens of thousands of different URLs, the task becomes that much harder.
When the same content is available from two (or more) different URLs, it is called duplicate content. Search engines like Google have decided they don’t like duplicate content, and pages with duplicate content don’t rank well. I don’t know the exact reasons: I can only speculate. If you asked me to speculate, then I would tell you that duplicate content:
1. Splits the link value of any incoming links – if 5 people link to http://example.com/mypage and 5 people link to http://example.com/my-page – and it’s the same content – all other things being equal, each URL gets only half of the link value that a single page would have received.
2. Can be an indicator of generated and/or low quality content – lots of websites generate endless links, like blog archive pages, calendars, photo albums – things where you can have many pages and each individual page is not necessarily something that would be relevant to a search. These can be the pages that are essentially duplicates of each other. A search engine doesn’t want the same thing clogging up a page of search results. Each link should be unique.
3. Could, if allowed to proliferate, be a vector for people trying to game the search results. Who wouldn’t want to fill up the front page of search results with 10 different versions of their content?
Remember, that’s my speculation – from experience – but speculation nonetheless.
The fact is that duplicate content on your site can hurt your rankings, and you need to pay attention to it.
How to ensure Duplicate Content is not affecting your site
There are two primary methods in the fight against duplicate content :
- 301 Permanent Redirects : This means that a status code of 301 is sent back with a new location when a request is made for the Url.
- Canonical Link element : This is a small Html tag included in the <head> section of a web page, which lists the desired canonical URL for the page.
The choice of which to use depends on the circumstance. Sometimes you want to show largely the same content for two different Urls (for example, two different views of the same content, such as a sorted list), and sometimes you want to force visitors to only see the canonical Url.
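To make the two methods concrete, here is a minimal sketch in Python of what each one sends back to the browser or crawler. The function names are made up for illustration and are not part of DNN or any other framework; only the 301 status, the Location header and the link tag itself are standard.

```python
from html import escape


def permanent_redirect(canonical_url):
    """Method 1: return the pieces of a 301 response (status, headers, body).

    The visitor never sees the duplicate Url; the browser follows Location.
    """
    return "301 Moved Permanently", [("Location", canonical_url)], b""


def canonical_link_element(canonical_url):
    """Method 2: return the tag to place inside the page's <head> section.

    The page is still served at the requested Url; the tag only tells
    search engines which Url is the preferred one.
    """
    return '<link rel="canonical" href="{}" />'.format(
        escape(canonical_url, quote=True)
    )


if __name__ == "__main__":
    print(permanent_redirect("http://example.com/my-page"))
    print(canonical_link_element("http://example.com/products"))
```

A sorted view such as a hypothetical http://example.com/products?sort=price would keep working for visitors and simply carry the link element pointing at http://example.com/products, whereas a mistyped or retired Url is better served by the redirect.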
Canonical Urls in DNN
As DNN is a CMS with dynamically generated content, there is a higher risk of generating duplicate content where you serve up the same page with a variety of Urls. Take these cases:
These can all point to the same piece of content – the home page of a DNN site. Compound these duplicates with variable URLs generated from a third-party content module, and you could have hundreds, or thousands of duplicate URLs in a DNN site.
Of course, this is nothing new and DNN has had many features in it for a while to combat the duplicate content problem. The most obvious issue is having your site listed with example.com and www.example.com. This is easily caused by having the two different versions of the same domain pointed at your site – something that most people do.
In DNN, you can set the site alias (domain) to either be a primary domain, or a canonical domain. Primary means that a 301 redirect will be issued. Canonical means that the site will be shown with a valid requested alias, but a Canonical Link element will be generated for the page. In most cases I recommend using ‘Primary’, but some people have a particular strategy for having multiple aliases to show the same content – for these the ‘Canonical’ option works better.
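The difference between the two alias settings can be sketched as a small decision function. This is Python pseudologic written for illustration, not DNN's actual code: the function name, the 'primary' and 'canonical' mode strings and the returned dictionary are all assumptions, and the sketch assumes plain http for simplicity.

```python
def respond_for_alias(requested_host, path, main_host, mode):
    """Decide how to answer a request that arrived on one of the site's aliases.

    mode is 'primary' or 'canonical' (hypothetical names mirroring the
    two settings described above).
    """
    preferred_url = "http://{}{}".format(main_host, path)

    # The request already used the preferred alias: just serve the page.
    if requested_host.lower() == main_host.lower():
        return {"status": 200, "location": None, "canonical": None}

    if mode == "primary":
        # Primary: force visitors onto the preferred alias with a 301.
        return {"status": 301, "location": preferred_url, "canonical": None}

    if mode == "canonical":
        # Canonical: serve the page on the requested alias, but include
        # a Canonical Link element that points at the preferred alias.
        return {"status": 200, "location": None, "canonical": preferred_url}

    raise ValueError("mode must be 'primary' or 'canonical'")


if __name__ == "__main__":
    print(respond_for_alias("example.com", "/about-us", "www.example.com", "primary"))
    print(respond_for_alias("example.com", "/about-us", "www.example.com", "canonical"))
```

In both modes the search engine ends up with a single preferred Url for the page; the only difference is whether the visitor is moved to it or left on the alias they asked for.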
But there are other issues that DNN needs to handle as a CMS. There have been various versions of DNN URLs over the years. There are the different versions of the Home page URL that are all accessible from within a DNN site. It’s possible for a DNN site to have been upgraded through several versions over the years, and to have used a different URL format in each. Essentially, the older a DNN site is, the more chance that it has had, or still has, some duplicate content problems.
New URL Canonicalization Changes and Improvements in DNN 7.1
Now that the DNN 7.1 CTP is out, people can see the new changes within the ‘Advanced’ Friendly Url Provider which specifically target and solve problems with duplicate content by using 301 redirects to the canonical URL for a DNN page.
Some of these are:
Home Page Url : the home page for 7.1 sites with Advanced URLs switched on shows as the ‘site root’ – http://example.com/ – there is no longer any /home.aspx variation of the home page. All created Urls reference the site root, and any request to /Home.aspx or /Home will redirect back to the site root.
Automatic Redirect of Old URLs : Old URLs in DNN look like /default.aspx?TabId=xx. Any request for the old style of querystring-based DNN page URLs will be redirected back to the canonical URL for the page – which DNN knows is the best, most friendly URL for the page. You can try it on a new CTP install – request http://example.com/default.aspx?tabId=56 or http://example.com/tabid/56/default.aspx – you’ll be redirected back to the Canonical URL for that page : http://example.com/Getting-Started
Automatic Redirect to hyphenated Urls : A new feature for URLs in DNN 7.1 Advanced URLs is the fact that words within the URL (derived from the page name) are separated with ‘-‘. This gives clean separation and makes the URL more readable (and ultimately more searchable) – and if you request the old version of the page, without the hyphen, you’ll be redirected back to the canonical URL, which does have the hyphen. So http://example.com/aboutus will redirect to http://example.com/about-us automatically.
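The three redirect behaviours above can be pictured with a small Python sketch. This is not how the DNN Url Provider is implemented; it is only an illustration of the rules, and the lookup tables (tab 56 mapping to /Getting-Started, plus the list of friendly paths) are assumed sample data taken from the examples in this post.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed sample data: in a real site these would come from the page list.
TAB_PATHS = {56: "/Getting-Started"}
FRIENDLY_PATHS = ["/Getting-Started", "/about-us"]


def _key(path):
    """Compare paths while ignoring case, slashes and hyphens."""
    return path.strip("/").replace("-", "").lower()


def canonical_path(url):
    """Return the canonical path for a requested Url."""
    parts = urlsplit(url)
    path = parts.path or "/"
    lowered = path.lower()
    query = {k.lower(): v for k, v in parse_qs(parts.query).items()}

    # Old style Urls: /default.aspx?tabid=56 or /tabid/56/default.aspx
    tab_id = None
    if lowered == "/default.aspx" and "tabid" in query:
        tab_id = query["tabid"][0]
    else:
        match = re.fullmatch(r"/tabid/(\d+)/default\.aspx", lowered)
        if match:
            tab_id = match.group(1)
    if tab_id is not None and tab_id.isdigit() and int(tab_id) in TAB_PATHS:
        return TAB_PATHS[int(tab_id)]

    # Home page variations all collapse to the site root.
    if lowered in ("/", "/home", "/home.aspx"):
        return "/"

    # Hyphen-less (or differently cased) page names map to the friendly path.
    for friendly in FRIENDLY_PATHS:
        if _key(path) == _key(friendly):
            return friendly
    return path


def redirect_for(url):
    """Return the path to 301 to, or None if the Url is already canonical."""
    target = canonical_path(url)
    current = urlsplit(url).path or "/"
    return None if target == current else target


if __name__ == "__main__":
    for url in ("http://example.com/Home.aspx",
                "http://example.com/default.aspx?tabId=56",
                "http://example.com/aboutus",
                "http://example.com/about-us"):
        print(url, "->", redirect_for(url))
```

The point of the sketch is that every variation funnels into exactly one path per page, so a request for any of the duplicates ends with a single 301 to the canonical URL.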
Of course, these are but a few of the URL improvements which are becoming available in DNN 7.1. I will cover those more in the coming weeks – you should jump on, download the 7.1 CTP and see what other discoveries you can make.
Questions or comments? – feel free to quiz away in the comments below.