Canonical tags are a useful signal to indicate to search engines the 'preferred version' of two similar or duplicate URLs, and as such they are one of the primary technical solutions used to solve duplicate content issues.
On sites that have lots of pages with similar content, utilising canonical tags can be an important method of helping ensure the 'correct' URLs are indexed.
Issues with canonicals can have a profound impact upon the indexing of a site, so it is important that they are analysed as part of the 'indexability' portion of a website audit.
This guide explains how to use Sitebulb website audit software to audit canonical tags, and how to unearth issues with canonicals that may require attention.
You can instruct Sitebulb to audit canonical tags by ensuring that the 'Search Engine Optimisation' option is checked during the audit setup (it is always checked by default on new projects):
Once you have made your other audit data selections and set the crawler running, wait for it to complete the audit. Then, head to the 'Indexability' report using the left hand menu:
In this report you will find all the data about robots and indexing signals, such as noindex and canonicals. In particular, you will see two pie charts which show the Indexability Status and Canonicals respectively.
The right hand chart has only 4 possible options:
Clicking on any of these segments will bring you through to a URL List of the corresponding URLs. For example, in the right hand chart if I click the 'To Internal URL' segment, this brings me through to a URL List that shows all the URLs which have a canonical pointing at another internal URL:
In general, there is not much benefit to checking URLs that have a canonical to self, but all other statuses are worth checking. Where a canonical has been set, the thing you need to ensure is that the canonical has been set to the correct URL.
For example, you may find a site that allows URLs with and without trailing slashes to return a 200 status (i.e. there is no redirect between them). If a canonical is used to indicate one of these options to be used for indexing, you want to ensure that the canonical is correctly identifying the right URL (e.g. /pages/page-a has a canonical to /pages/page-a/).
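To spot-check a handful of URLs by hand, the trailing slash scenario above can be sketched in a few lines of Python. This is only an illustration, not how Sitebulb works internally: the function name classify_canonical, the labels it returns, and the example.com URLs are all assumptions made for the example, and the HTML is assumed to have been fetched already.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def classify_canonical(page_url, html):
    """Return a (label, target) pair for the first canonical found.

    The labels are hypothetical names chosen for this sketch.
    """
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "missing", None
    # Resolve relative hrefs against the URL of the page itself.
    target = urljoin(page_url, parser.canonicals[0])
    if target == page_url:
        return "self", target
    if urlsplit(target).netloc == urlsplit(page_url).netloc:
        return "internal", target
    return "external", target


# The non-slash URL declares the trailing slash version as canonical.
html = '<html><head><link rel="canonical" href="/pages/page-a/"></head></html>'
print(classify_canonical("https://example.com/pages/page-a", html))
```

Comparing the returned target against the URL you expect to be indexed (here, the trailing slash version) is the manual check described above.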
If you are unsure, certain things can help make it clear which option is preferred:
Auditing canonical tags in this way can require a level of experience or a deeper understanding of the specific website and how it is set up.
There can also be technical errors or inconsistencies in the way canonicals are set up, which may mean that search engines simply ignore them when making indexing decisions.
Sitebulb will automatically check every internal URL for a wide range of potential canonical issues, and if any issues are found, these will be presented via the 'Hints' tab:
While these Indexability Hints can also contain other indexability issues outside of canonicals, this is the place to go if you want to find out if the website has any canonical tag issues.
As you can see, each Hint is given an 'importance' rating (Critical/High/Medium/Low) and a percentage coverage, so you can quickly see at a glance how serious or widespread an issue is. As with all other Hints in Sitebulb, you can explore further by clicking 'View URLs' to see the list of affected URLs, or by clicking the 'Learn more about this hint' button (bottom left), which will take you to an explainer page on our website about the specific Hint (you can also learn more about Hints here).
Google's ability to render web pages has improved considerably over the last few years, and they now claim the following:
There are some more subtleties surrounding this, so if you want to learn more we recommend you check out our guide How JavaScript Rendering Affects Google Indexing. However, the two statements above are enough to conclude that it is important to consider the effects of rendering when auditing indexability signals, such as canonicals.
In particular, you need to know how to crawl the website to ensure you are working with the correct data. If you are using the HTML Crawler to audit your website, yet JavaScript is changing canonicals during rendering, you will not be looking at the same data as Google and may make incorrect assumptions.
To figure out the impact of rendering, make use of Sitebulb's response vs render comparison report. One of the elements this report will show you is the impact upon canonicals - in the right hand pie chart below:
The pie chart segments correspond to:
This report is intended as a diagnostic device: use it to explore the effects of JavaScript, and then dig in further if you see something that warrants attention.
The most straightforward outcome is of course that everything is listed as 'No Change.' This means you don't need to dig any further, and in fact means that the HTML Crawler is sufficient for future analyses, as the canonicals are not dependent on JavaScript, which effectively means that Response HTML = Rendered HTML (at least in terms of the canonicals).
However, if there are differences in the canonical between the response and rendered HTML, you should always audit this website using the 'Chrome Crawler', as you will be using inaccurate data otherwise.
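The idea behind the response vs render comparison can be illustrated with a minimal Python sketch. It assumes you already have both versions of the page as strings (the raw response HTML, and the rendered HTML saved from a headless browser). The function names and all labels other than 'no change' are hypothetical, chosen for this example rather than taken from Sitebulb.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.href is None:
            self.href = attr.get("href")


def first_canonical(html):
    """Return the first canonical href in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.href


def compare_canonicals(response_html, rendered_html):
    """Describe how rendering changed the canonical (hypothetical labels)."""
    before = first_canonical(response_html)
    after = first_canonical(rendered_html)
    if before == after:
        return "no change"
    if before is None:
        return "added by rendering"
    if after is None:
        return "removed by rendering"
    return "modified by rendering"


response_html = '<head><link rel="canonical" href="https://example.com/a"></head>'
rendered_html = '<head><link rel="canonical" href="https://example.com/b"></head>'
print(compare_canonicals(response_html, rendered_html))
```

Any result other than 'no change' suggests that JavaScript is altering the canonical, which is the situation where the HTML Crawler would give you different data from what Google sees.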