Transparent Release Notes for every Sitebulb update. Critically acclaimed by some people on Twitter. May contain NUTS.
Released on 11th March 2024
Released on 12th February 2024
This update is mostly just bug fixes, so nothing too exciting, but ignore it at your peril: as is their wont, Google have shafted us recently, changing the goal posts of what are acceptable scopes for our Google Sheets API integration.
As always, we have jumped precisely as high as requested, and adjusted our API integration. This means that if you continue using any of Sitebulb's older versions (i.e. 7.3 or earlier) you will not be able to export to Google Sheets.
I look forward to reading all the support emails entitled 'Can't export to Sheets' over the next few weeks...
We have a new way of contacting support! Just press the aptly named 'Help & Support' button:
And then a little Intercom box will pop up in the bottom right of the screen, allowing you to access our docs from here or send us a lovely message (e.g. 'the tool is great' or 'I like your new haircut'). In the rare event that you think you've found a bug, you can also use this method to let us know about your supposed bug.
I'm in the process of migrating some of our documentation across to Intercom - which includes a pretty nifty search - so expect this interface to evolve over the coming weeks and months. Eventually we'll pass everything off to an AI bot so I can retire...
There are quite a few bugs we have fixed in this release, but lots of them relate to the same thing.
In essence, we had a whole load of shit and bollocks to deal with due to our new (v7.1) updates to the Chrome Crawler. Mostly they were to do with the default 'on' state for the options 'Flatten iframes' and 'Flatten shadow DOM.' Apparently some websites are simply too shit and were causing Sitebulb's Chrome Crawler a bit of a ball-ache when crawling. We have mitigated these issues to avoid most problems, and updated our messaging to help users self-diagnose issues and potential fixes (there's also an updated self-help doc, which no one will bother reading).
The rest of the bugs are less consequential, but still all worth your attention with equal priority:
Released on 5th December 2023
Released on 6th November 2023
The inevitable 'bug fix update to fix all the bugs we accidentally introduced in the last big update.' Sigh.
Raise your hand if you spotted any of these beauties. Well done, you win a Mars bar.
Released on 27th October 2023
There are many important and wildly consequential updates in this version that we hope you'll like, some of the key highlights include:
In this update we've included a whole bundle of new Hints. This is basically just Sitebulb moving with the times, we now exclusively wear <INSERT NAME OF FASHIONABLE CLOTHING> as we are nothing if not down with the kids.
The biggest change is a brand new Hints section for our Response vs Render report, which gives further emphasis to rendering issues, following customer-feedback calls with Aleyda 'she's so famous she only needs one name' Solis and renowned mayonnaise pioneer Arnout Hellemans.
Their feedback was basically:
"We don't want core SEO elements to be rendered by JavaScript - only rely on it for less consequential stuff - as this can cause pages to take a lot longer to get indexed, especially if the page content is changing all the time."
So now, if there's rendering issues you probably should pay attention to, we're highlighting them more proactively with 12 new hints, plus a Response vs Render score:
The new hints, in order of severity, are as follows:
And here's all the other new Hints, which are nothing to do with Render vs Response:
Transferred image size is over 100KB
We added this Hint as a Potential Issue:
Contains possible soft 404 phrases
In addition, we added a whole bunch of Insight Hints, which are also picked out via a new table in the Indexability report. These are additional 'indexing and serving rules' that go beyond the basic noindex/nofollow stuff, and allow you more granular control over what gets displayed in the SERPs.
For instance, folks might be interested in the nocache or noarchive directives, which allow you to exclude website content from being used for training Microsoft's generative AI models. You can find out more about these directives via our documentation.
In the same vein as the new Response vs Render Hints, we've also made it easier to see how on-page elements change after rendering - Sitebulb can now be configured to save both the response HTML and the rendered HTML so they can be compared in the UI.
For (hopefully?) obvious reasons, this feature only works when using the Chrome Crawler, and will appear as a new option on the audit setup screen:
NOTE IN CAPITALS FOR PEOPLE WHO DON'T READ - THIS WILL TAKE UP A LOT OF DISK SPACE. DON'T JUST SWITCH IT ON FOR EVERY AUDIT!!
Imagine the site you're auditing has about 200KB of HTML per page, you'll be saving this twice, which makes 400KB (that Maths degree is coming in handy again!). We're compressing the HTML, so it's ~15% of that, about 60KB. Do this on every page of a 10,000 page site and it'll take up 600MB disk space. On 100k site it's 6GB...
So we suggest only switching this on for certain specific audits, and for sampled audits on bigger websites.
Please note that the Hints will still be checked even if you don't save the HTML, so just use it when it makes sense.
You have been warned.
Once the audit is complete, Sitebulb will offer a new 'View Differences' button on any URL List in the Response vs Render report, including on any of the Hints.
Then this will show you everything that has changed from the response to the rendered version;
Look how easy it is to spot issues:
See all the different tabs? These allow you to dig into all the different sections. The DOM option shows you all the major DOM elements that have changed (i.e. all the really important on page SEO shit). Then all of Links/Images/Text/HTML show you side by side comparisons of what has been added/removed/modified by JavaScript.
Now, let's say you forgot to switch on to save the 'Response vs Render' HTML (or you were chastened by my warnings above and you're now too frightened to ever turn it on) but the Hints show you that LOTS is changing during rendering.
Fret not! You don't need to run the audit again. Just shimmy on over to the Single Page Analysis tool and pop the URL in there, you'll get a live-action version of the same thing.
Check out all the renders you can render:
We have for a number of years had a 'crawl diagnosis' feature which allows you to save the HTML and/or take screenshots of pages while auditing the website.
It was kinda hidden, because it was mostly designed to help you figure out why pages aren't rendering like you thought they should, which is a bit of niche venture.
But occasionally folks would find it and turn it on, and then they'd be like 'Yo, I turned on screenshots. Where's the folder with all my screenshots saved?'
And we'd be all, 'Erm it's just made for diagnosis, there's no save folder.'
Then they'd get really angry, and in most cases throw multiple punches*;
'I have to look at them one by one?! That is whack dude.'
Tired of suffering such relentless abuse, we have finally relented, and added a method to save crawl data in bulk, via a new audit option:
*this is entirely fictional
The options are threefold:
So let's say you chose the screenshots option (which is only available with the Chrome Crawler), then you'll be able to select the folder to which it will save on your local computer:
As it runs, Sitebulb will render off all the pages and save the screenshots into a hierarchical folder structure, like this:
You'll get new subfolders for each different website, and each different audit, in case (for some reason) you wish to keep historical records of all this data.
Two things to note:
Sometime last year, a UK agency called Merj did some original research on 'session isolation': Validating Session Isolation For Web Crawling To Provide Data Integrity - it's a fascinating read, if data integrity is your jam (otherwise perhaps not so much...).
TL;DR it is to do with how the rendering of one web page can affect the functionality or the content of another - so think how you see 'items you recently viewed' in the browser when viewing an ecommerce store. They made a solid argument for why this is something that should be taken seriously and why it is very important when crawling certain websites.
Merj carried out a number of tests, and it was sad to see that Sitebulb failed a lot of them. We weren't the only tool - in fact most of the crawlers they tested failed at least one of the tests. But we knew it was something we wanted to put right, so we have added a new 'Incognito' mode, which will pass all the Merj validation tests for session isolation.
This has been added to a new setting panel which appears when you select the Chrome Crawler, our Advanced Chrome Crawler Settings:
It's a simple tickbox to turn on incognito, and it is unticked by default for a very good reason - you do not always want to crawl with this on. In fact, on some sites, it will cause you massive headaches (e.g. on some sites that use session IDs, every Chrome instance would be given a fresh session ID and potentially use up an enormous amount of resources).
In fact, all of these new Chrome settings are not designed for typical use-cases. By default, Google will flatten the shadow DOM and flatten iframes, so by default Sitebulb does likewise - but these are now options you can switch off if you so wish.
The two at the top - 'Considered Loaded Event' and 'Navigation Timeout' are designed for folks who are exploring the rendering process on their website, so if you are unfamiliar with these concepts then just leave them with the recommended settings.
Now, for anyone that is familiar with these concepts will most likely realise that Sitebulb now offers Puppeteer-esque controls out of the box, so please feel free to have a play!
As always, we like to keep our exceptional Structured Data Validation up-to-date, so there's a bunch of new things added:
And by the way, don't forget we have our Structured Data Alerts system to keep you in-the-know when it comes to any new changes in the world of structured data.
Improvements to James Bond mode
Unobservant people tend to miss this feature, but Sitebulb has a dark mode which you can toggle in the top right hand corner:
Most people tend to call it 'James Bond mode', actually.
Well, in reality it is only called that by people who work for Sitebulb.
Oh FINE.
I'm the only one that says it.
Anyway, I'm not even bothered. The point is there were some colour choices that were so difficult to see they made you want to stab your own eyeballs repeatedly with one of those fountain pens that were popular back in the 90s (and/or toggle 'light mode' back on).
So James Bond mode is a bit nicer to use now.
Just fuck off if you don't want to say it.
I don't even care.
Small (but not notable) improvements
It appears that Charlie Whitworth has made a career based almost entirely on his nuisance value. He is responsible for finding these two bugs, so we thank you Charlie for your contribution:
There's also a load of bugs that Charlie didn't notice - and arguably he probably should have if we're being honest about it.
These are ones that were actually stopping users from crawling websites at all:
We fixed a couple of issues we found with duplicate content detection:
Some issues with content extraction:
Some 'compare audits' bugs squashed:
And then just some random singular bugs that can't be neatly grouped:
Released on 31st July 2023
Note that this update contains big infrastructure changes: BEFORE you upgrade to v7 please make sure to let any currently running audits finish (DO NOT pause -> upgrade -> resume)
We are really excited to announce the launch of our new offering, 'Sitebulb Cloud.'
It answers the question that has annoyed many of you for years: 'what would Sitebulb be like if it was in the cloud?'
We now have the answer! Sitebulb Cloud has all the awesome stuff you already love about the desktop version, now accessible via a web browser.
Check it out:
The video covers some important points that I will reiterate here:
Sitebulb Cloud is an evolution of the 'Server' product we have been shilling since the beginning of the year, but now has the added benefit of the browser login, making it much more accessible.
It's going to be super useful to a whole bunch of customers - it is especially good for teams working together. We already have a number of customers using it and seeing the benefits:
"The most impactful change that Sitebulb Cloud provided was the ability for everyone to work from the same crawl data"
Case Study from Moving Traffic Media
Prices start from £195/month, and we can go all the way up to massive custom plans for loads of users and millions of URLs.
BUT it won't be for everyone. Sitebulb Pro is still going to be the weapon of choice for most consultants and smaller agencies, and we are still committed to relentlessly improving the desktop product (see below!).
If your interest is piqued by Sitebulb Cloud though, please check it out or get in touch.
Every other week there's a new SEO industry study out about how frustrating it is when we make audit recommendations and they are then just ignored by clients or developers.
And we all know that we shouldn't write audit recommendations like this:
"The website has this issue <INSERT ISSUE NAME> on 4762 pages. Pls fix now - Dave"
Very reasonably, you might expect a developer to read this recommendation and think:
"Fuck. Right. Off."
Sitebulb's new feature is designed to help make that developer response a bit closer to "Ok I'll do it now!" (although this level of enthusiasm might be pushing it!).
It is designed to help make your recommendations clearer in the first place:
"The website has this issue <INSERT ISSUE NAME> on 4762 pages. The issue is only present on two of the page templates - Blog Posts (4102 URLs) and Subcategory Pages (760 URLs). If you can resolve the underlying issue on those two templates, it will eradicate the problem across the site. I would be forever in your debt if you could prioritize this fix for me. I have the honour to be your obedient servant, David."
Ok ok. Sitebulb won't turn you into a fawning sycophant, but it will help you deliver more meaningful recommendations by automatically identifying the HTML template the is being used for each URL.
What do you mean by HTML template?
I'm glad you asked. In this modern world of hipster coffee and sourdough bakeries, most websites are built using a content management system (CMS), which allows users to create content without necessarily having lots of technical expertise about how the underlying pages are built.
Typically, developers will build out a range of page templates for the website, which can be selected by users of the CMS. So, for example, a content writer might select the 'Blog Post' template when writing a new post. Or the ecommerce team might select the 'Product Page' template when adding a new product to the store-front.
Essentially, all pages using the same template will inherit the same underlying HTML - which will typically mean that if there is a problem on one page of a template, it will also be present on all the other pages using the same template.
As such, it is very valuable when auditing a website to be able to;
And this is what Sitebulb's HTML Template feature allows you to do.
Here's our second video of the day:
You don't need to do anything in order to 'switch on' HTML Templates, as Sitebulb will automatically do it.
And then when the audit is finished, HTML Template information will be available once the audit has completed, via the 'HTML Templates' report option on the left hand menu:
This will show you all the URLs on the website, split into the different page template groupings.
You can then go ahead and name all the templates, like in the video. For more comprehensive instructions on how to use this feature, please check out our guide here.
This is the sort of thing you can expect to see, once you have finished naming all the templates:
See how it works? Instead of an unorganized list of URLs, you can work with URL data in a way that reflects how the website has actually been built.
Without any further analysis, this list of templates is already useful data, as it helps you understand the underlying structure of the website you are dealing with, and the potential scope of actually getting issues fixed.
To dig into each template further, you can navigate onto the template report by selecting the template from the dropdown navigation, clicking the name or clicking the 'View' button;
Each of these will take you to the same place, the report page for this specific template:
From here you can explore the data as you would in the rest of Sitebulb tool, with the added context that everything you see on this page relates only to the page template you have selected. You can click through into the URL Lists or tables, or view the triggered hints for this template.
For example, clicking through the pie chart on the left to view indexable URLs will show you the indexable URLs that use this page template. You will notice that the HTML Template is shown as a column in the URL List:
The Hints data alongside each template will also help you easily spot when a particular issue is only present on certain templates. In the example below, we can see that there is a critical issue which affects only the 'Blog Post' template.
If we click through to view this template, we can see the Hints values for the template, and view the triggered Hints for this template by clicking into the Hints tab;
This shows us all the triggered Hints for this specific template, ordered by importance, and therefore showing the Critical one at the top. To dig in further, you can click through to View URLs:
Once you have named all your templates, this data will enrich your reports in other areas of Sitebulb as well.
When you view URLs, you can identify which page templates an issue was present on, which aids your understanding of the issue, and can make your communication with your client/developers a lot clearer.
You will also find the HTML Template in areas such as the URL Explorer, and configurable as a column you can add for any URL List it is not already present on.
You'll also notice a 'HTML Templates' tab that becomes available in the different reports. This will show you a top-down matrix of common issues, and on which page templates they occur.
The HTML Templates tab only appears on certain reports - those that make sense in terms of the issues covered - and in some places you will find them more valuable than others.
Within Performance, for example, it is hugely beneficial to see the breakdown of different Web Vitals metrics on each template, particularly since the work involved in improving performance is so heavily template-based.
You can dig in further by exporting to CSV or Google Sheets, or by clicking through to View:
Accessing and understanding your audit data through the lens of HTML templates should make it easier for you as the SEO to diagnose issues, and importantly should make your client/developer communication a lot clearer. If this topic is dear to your heart, check out the recent webinar that I was a guest on recently, along with Areej: O/SEO/O E16: Opinions About Leveling Up Tech Audits.
Final aside: you will occasionally come across some websites where the HTML Template data is NOT all that useful. These sites are typically doing something that injects dynamic elements into each page, which makes them look different when Sitebulb parses them - on these sort of sites you'll have tons of templates with like 1 URL in each one. Not helpful at all!
If you've been a Sitebulb user for a while, you may have noticed that crawl data can be pretty chunky. Like hundreds of GB worth of chunky.
Over time, this would cause your computer to swell uncontrollably, like your body's response to your lack of self-restraint every Christmas Day.
Sitebulb audits that you run on v7 onwards will now be impressively svelte in comparison - we're talking before-and-after pictures that are 20% of their former size!
Please consider this upgrade like the 'New Year New You' programme you follow every January - where for two weeks you treat your body like a temple - this does not turn back time and erase the months of sheer gluttony that proceeded it. Similarly, if you have old Sitebulb database files, they will retain their unwieldy stature.
Your new Sitebulb audits on the other hand have just opened an Instagram account, and in 3-6 months you can expect to see Facebook ads for their new fitness program.
The launch of Sitebulb Cloud has forced us to reassess some of the data decisions we have made in the past, and make provisions for crawling bigger websites.
The area where this was most apparent was with internal links, as it has the biggest potential to blow up.
Take a site like sitebulb.com – we have about 500 HTML pages, and about 40,000 internal links – so on average, each page has about 80 links point at it. At most, maybe a thousand.
When looking at a site like this, it is very reasonable to want to look at every single link, and the scale of the data means this is feasible to do – either within Sitebulb's user interface (the 'Link Explorer') or as an export in Excel/Sheets.
Considering the average website is about 10,000 pages, with a similar link ratio that would be 800,000 – 1 million links. You can still, just about, wrangle this in spreadsheet format (although you are really knocking on the ceiling of Excel's limits), and Sitebulb's Link Explorer can handle it no problem.
But with Sitebulb Cloud, we're dealing with sites that have millions of URLs, and oftentimes, hundreds of millions of internal links. How do you look at THAT in a spreadsheet?
The answer is, you don't. Once you start to reach a certain scale, pairwise analysis of links just starts to break down. Spreadsheets aren't built to deal with 220 million rows of link data, and neither is your brain.
The granularity simply becomes redundant. Say you crawl a 5 million URL site, and every page links to the homepage in the header. This then corresponds to 5 million rows of data in your link table. I mean it is literally the same thing, 5 million times in a row. That is not useful data.
The solution is to stop looking at every link as a pairwise relation, and instead group like links together.
Those 5 million links above all link to the same page, from the same location in the HTML, using the same anchor text. It is 1 unique link, that occurs 5 million times.
Unique links are actually not a new notion in Sitebulb, we have had them for years, but we've never given them the focus they now have or set any rules in place for when they should become the default.
So this is what's changed. If we find over 2.5 million links on the website, Sitebulb will stop showing 'All links' as options you can click on, and instead default to showing unique links.
Clicking through to any of those Unique Links values will drop you on the list of unique links in the Link Explorer;
As you can see, scrolling across shows you the important data points you need to make decisions about these links:
If you want/need to explore all links pointing at a particular page, you can use the 'All Links to Target' button.
In the same vein as the above, we have added some other options to make it quicker and easier to do very large audits. By default, Sitebulb's processing stage will tackle some jobs that may not really be necessary for enormous sites, and can add hours to the processing time;
During the report building phase, Sitebulb will pre-generate lots of the CSV export files, so they are instantly available when the audit is complete, including all the hint exports. On sites with millions of URL, this can take a few hours. Considering you might not actually use a lot of this data, this is time you could trim off your audit by clicking this button in the 'Data Exports' advanced settings:
URL Rank is our internal link popularity metric, based on the number of incoming internal links, relative to other pages on the same site. It is a useful metric for determining how powerful or important internal pages are. However, to calculate it we run an iterative formula, not dissimilar to PageRank, which can take a long time when you are dealing with millions of URLs.
It will always be on by default (as long as 'Link Analysis' is checked), but you can disable it in the Advanced Settings for Search Engine Optimization:
You may wish to do this if you are crawling a very big site but you are not interested in looking at page importance data.
We are no longer publicly selling the 'DIY' server license, however existing customers can grab the latest installers from the Server Release Notes page here.
Bugs
Access the archives of Sitebulb's Release Notes, to explore the development of this precocious young upstart:
Find, fix and communicate technical issues with easy visuals, in-depth insights, & prioritized recommendations across 300+ SEO issues.
Get all the capability of Sitebulb Desktop, accessible via your web browser. Crawl at scale without project, crawl credit, or machine limits.