Staged releases are a great way to ship large updates to customers with confidence. When we release new updates for our test management tool TestRail, we use staged releases to ensure that (rare) potential issues affect only a small number of customers before we notice. It’s important to note that staged releases aren’t a replacement for a software quality culture, rigorous testing, a robust test library and careful planning.
But even with all this in place, it’s impossible to test for everything that can go wrong. Of course, I don’t advocate relying solely on exception-driven development for code that’s critical to your customers. However, if you have a large number of customers using your product in many different ways and in different environments, chances are good that, from time to time, you will not be able to identify bugs before customers get their hands on the new code. And when this happens, you want to know about it as quickly as possible.
What Are Staged Releases?
With staged releases, you don’t ship new code to all customers at once. Instead, you release new versions to different groups of customers in stages so you can detect problems as early as possible, before they affect all your users. Even if you ship an update to just a small number of customers, you will often quickly catch new issues that you didn’t find during testing. Once you are happy with the new code, you can gradually make the new version available to more customers until all of them use the latest release.
But it’s not just about limiting the impact of updates to a small number of customers. You can also distribute an update to different groups of customers based on risk. For example, you might want to deploy new updates to free trial sign-ups first. So if there’s a problem with an update, it wouldn’t affect your paying customers who depend on your product to work flawlessly at all times.
Problems and bugs can also be much more expensive to fix for certain groups of customers. For example, if you are offering your product as both SaaS and on-premise editions, it’s much easier and significantly less expensive to fix issues for SaaS instances. As you have full access to databases and code for SaaS customers, you can in many cases fix issues on your servers even before users notice.
With on-premise installations, the process is much more costly and time-consuming, as you need to release new code to customers, help them apply the fix and handle the related support requests. So releasing new updates to on-premise customers last is usually in both your and your customers’ interest.
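The idea of ordering customer groups by risk can be sketched in a few lines of Python. The group names follow this article, but the two scores per group are made-up illustrative numbers, not figures we actually use; treat the whole thing as a sketch under those assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerGroup:
    name: str
    impact: int    # hypothetical score: how much a bug hurts this group (1 = little, 5 = a lot)
    fix_cost: int  # hypothetical score: how costly it is to deliver a fix (1 = cheap, 5 = costly)

    @property
    def risk(self) -> int:
        # Simple illustrative risk measure: impact multiplied by cost of fixing.
        return self.impact * self.fix_cost


def rollout_order(groups):
    """Return the groups sorted so the lowest-risk group receives the update first."""
    return sorted(groups, key=lambda g: g.risk)


# Assumed scores: trials are low impact, on-premise fixes are the most expensive.
GROUPS = [
    CustomerGroup("on-premise", impact=5, fix_cost=5),
    CustomerGroup("saas-trial", impact=1, fix_cost=1),
    CustomerGroup("saas-paid", impact=5, fix_cost=1),
]

if __name__ == "__main__":
    for group in rollout_order(GROUPS):
        print(group.name, group.risk)
```

With these assumed scores, trial instances come first and on-premise installations come last, which matches the reasoning above.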
How We Ship Large Updates
When we are about to release a new version of TestRail to customers, we review the potential impact the new code would have if something unexpected goes wrong. Are we just shipping new UI enhancements? Is this update mainly introducing new features? Or does the update include complicated database changes that could affect existing data? Each release and deployment is different. Depending on the changes, our approach ranges from shipping the new code to all customers at once to carefully releasing it over multiple weeks: to trial sign-ups first, followed by paid cloud instances, and to on-premise customers last.
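To make this kind of review concrete, here is a small Python sketch that maps the questions above to a rollout plan. The `ReleaseProfile` fields, the group names and the exact mapping are assumptions chosen for illustration; in practice this is a judgment call for each release rather than a fixed rule.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the customer groups discussed in this article.
FULL_PLAN = ["saas-trials", "paid-saas", "on-premise"]


@dataclass(frozen=True)
class ReleaseProfile:
    ui_changes: bool = False
    new_features: bool = False
    database_changes: bool = False


def plan_rollout(profile):
    """Pick the customer groups to release to, in order, for a given release."""
    if profile.database_changes:
        # Riskiest case: existing data could be affected, so use every stage.
        return list(FULL_PLAN)
    if profile.new_features:
        # Medium risk: SaaS instances first, on-premise afterwards.
        return ["all-saas", "on-premise"]
    # Low risk (for example UI enhancements only): ship to everyone at once.
    return ["all-customers"]


if __name__ == "__main__":
    print(plan_rollout(ReleaseProfile(ui_changes=True)))
    print(plan_rollout(ReleaseProfile(new_features=True, database_changes=True)))
```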
One big advantage with TestRail’s architecture is that each customer has their own database. This makes it very easy for us to upgrade individual TestRail instances without affecting other customers. If you don’t have the need to share data between customers, using separate databases for each tenant can have many benefits such as easier sharding for scalability, improved security, more reliable updates and faster backups & crash recovery.
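The following Python sketch illustrates why a database per customer makes individual upgrades easy. It uses in-memory SQLite databases and SQLite's `user_version` pragma purely to stay self-contained; the table, the column and the version numbers are hypothetical and do not describe TestRail's real schema or database engine.

```python
import sqlite3

TARGET_VERSION = 2  # hypothetical schema version introduced by the new release


def new_tenant_db():
    """Create a stand-in tenant database at schema version 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("PRAGMA user_version = 1")
    return conn


def schema_version(conn):
    return conn.execute("PRAGMA user_version").fetchone()[0]


def upgrade_tenant(conn):
    """Upgrade a single tenant database; return True if a migration was applied.

    A real migration would also need backups and error handling, omitted here.
    """
    if schema_version(conn) >= TARGET_VERSION:
        return False
    conn.execute("ALTER TABLE cases ADD COLUMN priority INTEGER DEFAULT 0")
    conn.execute(f"PRAGMA user_version = {TARGET_VERSION}")
    conn.commit()
    return True


if __name__ == "__main__":
    tenants = {"tenant-a": new_tenant_db(), "tenant-b": new_tenant_db()}
    upgrade_tenant(tenants["tenant-a"])  # only this tenant moves to the new schema
    for name, conn in tenants.items():
        print(name, schema_version(conn))
```

Because each tenant has its own connection and schema version, upgrading one instance leaves every other instance on the old schema until its turn comes.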
For new releases we don’t necessarily use all of the following stages and we might decide to just release updates to two or three customer groups separately. For large and complicated releases we might ship code to customers in all of the following stages:
New SaaS Trial Sign-Ups
We start by switching new free SaaS trial sign-ups to the new version. Every day, new teams and users automatically start using the new code as they sign up for new trial accounts. While they try TestRail’s features, we carefully monitor logs for any issues.
Existing SaaS Trial Instances
At some point we also switch all remaining active trials to the new version. This increases the number of users on the new version significantly and also ensures that any (rare but possible) data migration issues don’t affect paying customers before we find them.
Early SaaS Adopters
We have a smaller group of long-term customers who are interested in getting the latest TestRail version as soon as new features are available. They are confident in our ability to release issue-free updates and are happy to provide feedback about new releases.
Paid SaaS Instances
We then upgrade all our remaining paid SaaS accounts to the latest code. It’s very unlikely that new issues surface at this point so most customers don’t notice the new version until they discover new features or see our in-app announcement.
Soft Launch to On-Premise
After all the careful testing we are now ready for the full announcement, right? Not quite yet! At this point we upload the new on-premise installation files and publish the release notes. We still wait a day or two so some early adopters can try the upgrade first.
Announce to All Customers
We then finally announce the new version to all customers by sending a newsletter to our list. At this point all SaaS users already use the latest code and all on-premise customers have access to the new downloads.
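The stages above can be modeled as an ordered sequence in which each customer group is unlocked once the release reaches its stage. The Python sketch below is one possible way to express that; the enum, the group names and the helper function are hypothetical and are not taken from our actual deployment tooling.

```python
from enum import IntEnum


class Stage(IntEnum):
    NOT_STARTED = 0
    NEW_SAAS_TRIALS = 1
    EXISTING_SAAS_TRIALS = 2
    EARLY_SAAS_ADOPTERS = 3
    PAID_SAAS = 4
    ON_PREMISE_SOFT_LAUNCH = 5
    ANNOUNCED = 6


# Hypothetical mapping: the stage at which each group gets access to the new version.
# For on-premise customers, "access" means the new downloads are available.
UNLOCKED_AT = {
    "new-trial": Stage.NEW_SAAS_TRIALS,
    "existing-trial": Stage.EXISTING_SAAS_TRIALS,
    "early-adopter": Stage.EARLY_SAAS_ADOPTERS,
    "paid": Stage.PAID_SAAS,
    "on-premise": Stage.ON_PREMISE_SOFT_LAUNCH,
}


def gets_new_version(group, current_stage):
    """Return True if the release has progressed far enough to include this group."""
    return current_stage >= UNLOCKED_AT[group]


if __name__ == "__main__":
    stage = Stage.EARLY_SAAS_ADOPTERS
    for group in UNLOCKED_AT:
        print(group, gets_new_version(group, stage))
```

Skipping stages for a smaller release then simply means advancing the current stage by more than one step.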
What About Agile? Continuous Deployments?
With all the above talk about large, multi-week feature releases, you might wonder how continuous deployment or agile methodologies fit in. Excellent question! Even if you prefer to ship new versions more frequently, you can still greatly benefit from staged releases. In many cases we release smaller updates in a much faster and simpler way. It all comes down to the complexity of a specific update, the risk of the changes and how fast you want to get the new release to customers.
Especially if you control the entire environment of your application (i.e. you have no on-premise edition), you can fully automate the staged release cycle and integrate it with your error monitoring system. Even with a fast release cycle, deploying your code in different stages can greatly reduce the risk and potential impact of bugs.
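As a rough illustration of such automation, the Python sketch below deploys stage by stage and stops as soon as the error rate reported for a stage exceeds a threshold. The `deploy` and `error_rate` callables are placeholders for your own deployment and error monitoring systems, and the 1% threshold is an arbitrary assumption.

```python
def run_staged_release(stages, deploy, error_rate, max_error_rate=0.01):
    """Deploy to each stage in order and halt at the first unhealthy stage.

    deploy(stage) performs the deployment; error_rate(stage) returns the
    observed error rate as a fraction. Both are supplied by the caller.
    Returns (completed_stages, halted_stage), where halted_stage is None
    if every stage stayed within the threshold.
    """
    completed = []
    for stage in stages:
        deploy(stage)
        # A real system would wait and collect data for a while before checking.
        if error_rate(stage) > max_error_rate:
            return completed, stage
        completed.append(stage)
    return completed, None


if __name__ == "__main__":
    # Fake monitoring data for the demo: the second stage shows a problem.
    observed = {"trials": 0.001, "early-adopters": 0.05, "paid": 0.0}
    deployed = []
    completed, halted = run_staged_release(
        ["trials", "early-adopters", "paid"],
        deploy=deployed.append,
        error_rate=observed.get,
    )
    print("completed:", completed)
    print("halted at:", halted)
```

In this demo the release stops at the second stage, so the last group never receives the faulty code.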
We don’t currently deploy new TestRail versions daily, but we can deploy fixes in minutes if we have to. We found that having a mix of small, faster updates combined with larger feature updates every few months works best for us and our customers. Even with our rigorous testing process (we build products to help teams improve their software quality after all), our staged release system has helped us identify various smaller and larger issues over the years.
If you are currently deploying new code and database migrations to all your customers at once, investing in a staged release system can be a great way to improve your release process. Your customers and users will likely appreciate it!
PS: Have you found this article useful? We will publish more testing, QA and ops articles on topics such as building a great testing team, maintaining rock-solid dev and staging environments, and leveling up your testing and ops skills. Make sure to subscribe below via email and follow us on Twitter!