In most website audits, including recommendations to fix broken links is a basic requirement: if they are present, you pretty much always want to report on them.
This is because broken links tick a bunch of boxes:
So it's reasonably straightforward to get buy-in for fixing them. For background information on what broken links are and why they are important, check out our guide, Ultimate Guide to Broken Links.
This article is focused on how to find broken links using Sitebulb website audit software, and breaks down into a few sections:
Broken links are NOT pages that return a 404 response - broken links are actually the links that point to these 404 pages (or 410, we'll come on to that).
The difference is subtle but important. If Page-A returns a 404 'not found' status code, and Page-B contains a link that points at Page A, we can describe this individual link as a 'broken link.'
Sitebulb classes a broken link as any internal link that points at a page with an HTTP status code of 404 or 410. Lots of other crawling tools group all 4XX errors together, but we don't believe this promotes the right mindset for tackling broken links.
In essence, both 404 and 410 mean that the resource is not available - the server is responding to say 'the content is not here.' This means that if a user lands on the page, they will not see any meaningful content.
By comparison, all other 4XX status codes do not mean 'the content isn't here', and in fact a lot of them mean 'you aren't allowed to view this content' (check here for a complete list).
Consider an example, a 403 'Forbidden' status:
With a 403, a normal user accessing the page in a browser is often able to view the content, because 403 does not mean 'the content is not here'. Instead, a 403 relates to an authorization issue, which can occur simply because you are accessing the page with a crawler rather than a browser.
This is why Sitebulb restricts the definition of 'broken links' to only be links to URLs which return 404 or 410.
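To make the definition concrete, here is a minimal sketch of the same rule in Python. It is only an illustration of the 404/410 definition, not Sitebulb's own code; the function name and the example URLs are hypothetical.

```python
# Status codes that mean 'the content is not here', per the definition above.
BROKEN_STATUS_CODES = {404, 410}


def is_broken_link(target_status: int) -> bool:
    """Return True when a link's target URL responds with 404 or 410."""
    return target_status in BROKEN_STATUS_CODES


# Hypothetical (referring URL, target URL, target status) records.
links = [
    ("https://example.com/", "https://example.com/old-page", 404),
    ("https://example.com/", "https://example.com/members", 403),
    ("https://example.com/blog", "https://example.com/retired", 410),
    ("https://example.com/blog", "https://example.com/about", 200),
]

# Keep only the links whose target is 404 or 410; the 403 is left out.
broken = [link for link in links if is_broken_link(link[2])]
for referring_url, target_url, status in broken:
    print(f"{referring_url} -> {target_url} ({status})")
```

Note that the link to the 403 page is not reported, which mirrors the reasoning above: a 403 does not tell you the content is missing.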
Firstly, you'll need to crawl the website, so start a new website audit or project. Sitebulb will collect links by default, so any broken links will automatically be picked up by Sitebulb in your audit.
Links data is accessible in many different areas of the Sitebulb interface, allowing you to analyse widespread issues and then dig down into specific details. In this section we'll cover how to explore broken links within the Sitebulb interface, and further below we'll cover how to export the links for spreadsheet analysis.
If you wish to investigate a single 404/410 URL, you can view the incoming links that point to it by navigating to the 'URL Details' view.
From a URL List (e.g. list of 'Broken internal URLs') you can get there by clicking on the burger menu alongside the URL, which opens up the URL Details view, and from there you can navigate to Incoming Links.
The page you end up on allows you to see in more detail the incoming links pointing to the broken URL:
Note: You can also navigate straight to the URL Details page for a single URL by pasting it into the 'Search URLs' box in the top right:
For this, you want to head to the 'Links' report, using the left-hand navigation.
Scroll down to see the 'Internal Link Status' table, as shown below, where you can find the 'Broken (404 or 410)' row:
You may have noticed above that we have these two columns 'All' and 'Unique'. The 'All' column represents every single link found, whereas 'Unique' represents links that have unique anchor text, target URL and link location (i.e. a templated header link from 500 pages only counts as 1 unique link). We'll cover the significance of this further down.
Click on either of these values to see the link data within the Sitebulb user-interface, using the Link Explorer:
This list will show you every referring-target URL pair:
In the case of broken links, the referring URL will be status 200, and the target URL will be status 404/410.
Due to the nature of internal links, you will normally see some URLs repeated in both the referring and target URL columns (you can see this in the image above). This is because the same URL can link out to a single page more than once, and any page can (and usually does) have incoming links from multiple other URLs on the same website.
At this point, your understanding of 'where broken links live' within the website is pretty limited. However, the Link Explorer allows you to... explore the data further, for instance applying URL or path filters on the Target URL.
In the example above, I've restricted the list to only show target URLs in the /Blog/post/2020/ subfolder - and can see there are 4 different links across 3 different URLs.
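If you prefer to do this kind of filtering outside the interface, the same idea can be sketched in a few lines of Python. This is a rough illustration only: the referring/target pairs are made up, and `target_in_folder` is a hypothetical helper rather than anything Sitebulb provides.

```python
from urllib.parse import urlsplit


def target_in_folder(target_url: str, folder: str) -> bool:
    """Return True when the target URL's path starts with the given folder."""
    return urlsplit(target_url).path.startswith(folder)


# Hypothetical (referring URL, target URL) pairs for broken links.
pairs = [
    ("https://example.com/a", "https://example.com/Blog/post/2020/one"),
    ("https://example.com/a", "https://example.com/Blog/post/2020/two"),
    ("https://example.com/b", "https://example.com/Blog/post/2020/one"),
    ("https://example.com/c", "https://example.com/Blog/post/2019/old"),
]

# Restrict the list to targets in one subfolder, then count distinct referring URLs.
filtered = [pair for pair in pairs if target_in_folder(pair[1], "/Blog/post/2020/")]
referring_urls = {referring for referring, _ in filtered}
print(len(filtered), "links across", len(referring_urls), "referring URLs")
```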
Most often, SEOs wish to get broken links into spreadsheet format, typically because this is the most straightforward thing to give to a developer as a 'list of things to fix.'
From the Link Explorer, hit the green Export button, and then from the dropdown select either CSV or Google Sheets.
As a minimum, sharing a links spreadsheet like this should be enough for a developer to get going with the fixes:
Whilst a list like the one above is 'ok', it is not particularly helpful for the client/developer/content editor who has to go through all these pages and fix the broken links. This is because there has been no differentiation applied for the three types of broken link:
Applying some analysis to the data and presenting recommendations in a clear, prioritised format will go a long way to:
Which in combination make you more likely to demonstrate your own value as an SEO or consultant.
As such, splitting out broken links into the 3 categories above will improve the quality of your audit report. Sitebulb has a column which aids in this analysis, the 'Location' column:
This allows you to separate out your broken links into 4 groups: Header, Footer, Navigation and Content. Header and Footer links will normally be template based (and typically lead to lots of broken links), Content links will normally be one-off instances, and Navigation links can vary.
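Once the links are exported, the split by location can also be done with a short script. The sketch below assumes an exported CSV with columns named 'Referring URL', 'Target URL', 'Anchor Text' and 'Location'; these column names are assumptions for illustration, so check them against the headers in your own export.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an exported CSV file; the column names are assumptions.
export = io.StringIO(
    "Referring URL,Target URL,Anchor Text,Location\n"
    "https://example.com/,https://example.com/gone,Gone,Footer\n"
    "https://example.com/a,https://example.com/gone,Gone,Footer\n"
    "https://example.com/a,https://example.com/old,read more,Content\n"
)

# Group the rows by the value in the 'Location' column.
by_location = defaultdict(list)
for row in csv.DictReader(export):
    by_location[row["Location"]].append(row)

for location, rows in by_location.items():
    print(location, len(rows))
```

With a real export you would replace the `io.StringIO` stand-in with `open(...)` on the downloaded file, and each group could then be written to its own worksheet.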
As we mentioned earlier, Sitebulb's Links report includes a table which differentiates 'All' vs 'Unique' links. Analysing this table can be helpful for understanding whether most of your broken links are templated (e.g. Header/Footer) or one-offs (e.g. Content). Consider the table below: the site has almost 15,000 broken links, but only 82 unique links. This suggests that at least some of those broken links will be template based.
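The 'All' vs 'Unique' distinction is easy to reproduce on exported data. The following is a minimal sketch that applies the definition given earlier (unique anchor text, target URL and link location); the records are hypothetical and this is not how Sitebulb computes the figures internally.

```python
# Hypothetical link records: (referring URL, anchor text, target URL, location).
links = [
    ("https://example.com/", "Careers", "https://example.com/jobs", "Footer"),
    ("https://example.com/a", "Careers", "https://example.com/jobs", "Footer"),
    ("https://example.com/b", "Careers", "https://example.com/jobs", "Footer"),
    ("https://example.com/a", "old post", "https://example.com/old", "Content"),
]

# 'All' counts every link found.
all_count = len(links)

# 'Unique' ignores the referring URL and keeps one entry per
# (anchor text, target URL, location) combination.
unique_links = {(anchor, target, location) for _, anchor, target, location in links}
unique_count = len(unique_links)

print(f"All: {all_count}, Unique: {unique_count}")
print(f"Average instances per unique link: {all_count / unique_count:.1f}")
```

A high average number of instances per unique link is a hint that templated links are involved, since a single templated link is repeated on every page that uses the template.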
With the analysis above to hand, you can present your recommendations in a clear and logical manner, prioritising based on the most efficient work/value ratio. For instance, if you can fix 1 broken link in the footer template which is used on every single page, you can literally fix thousands of broken links with a very small amount of work.
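One rough way to express this work/value ratio is 'broken links resolved per unique fix'. The sketch below ranks groups on that basis; the figures are illustrative only and the scoring function is an assumption, not a Sitebulb metric.

```python
# Illustrative per-location totals: (location, all broken links, unique fixes needed).
groups = [
    ("Content", 27, 27),
    ("Footer", 15000, 1),
    ("Sidebar navigation", 153, 4),
]


def links_fixed_per_change(group):
    """Broken links resolved for each unique link or page that gets fixed."""
    _, all_links, unique_fixes = group
    return all_links / unique_fixes


# Highest payoff per fix first.
ranked = sorted(groups, key=links_fixed_per_change, reverse=True)
for location, all_links, unique_fixes in ranked:
    print(f"{location}: {all_links} broken links from {unique_fixes} unique fixes")
```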
An example report might look like:
Broken Link in footer - HIGH PRIORITY
You have a broken link in the footer, which is contributing to 15,000 broken links across the site. Fixing this link in the footer template should resolve all of these broken links.
You can see a screenshot of the link in question below, and the full list of links in the worksheet 'Broken footer links.'
Broken Links in sidebar navigation - MEDIUM PRIORITY
You have a number of broken links in the sidebar navigation, which are contributing to 153 broken links across the site. There are only 4 unique 404 pages which have incoming links, so if you can remove or fix the links to these 4 pages, it should resolve all these broken links.
You can see a screenshot of an example link below, and the full list of links in the worksheet 'Broken sidebar links.'
Broken Links in content - LOW PRIORITY
You have 27 broken links within the text content of individual pages. Since these links are not template/dynamically driven, you will need to fix these links individually.
You can see a screenshot of an example link below, and the full list of links in the worksheet 'Broken content links.' I have pre-sorted the pages based on URL Rank of the referring page, so this list is already in a rough priority order, with the more important pages first.
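If you are preparing a worksheet like this yourself, the pre-sorting step can be sketched as follows. The rows and the `url_rank` key are hypothetical stand-ins for the URL Rank of each referring page in your export.

```python
# Hypothetical rows from a 'Broken content links' worksheet; 'url_rank'
# stands in for the URL Rank of the referring page.
rows = [
    {"referring_url": "https://example.com/blog/a", "url_rank": 35},
    {"referring_url": "https://example.com/", "url_rank": 100},
    {"referring_url": "https://example.com/blog/b", "url_rank": 62},
]

# Sort so that the more important referring pages come first.
prioritised = sorted(rows, key=lambda row: row["url_rank"], reverse=True)
for row in prioritised:
    print(row["url_rank"], row["referring_url"])
```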