This is a guest post by Cameron Laird.
Continuous testing (CT) is often described in terms of unit tests. But performance tests deserve to be in CT, too. While performance testing has its challenges, the right understanding makes it possible not just to fit performance into CT, but to make the most of that position.
CT’s challenge in performance testing, and its resolution, demand a bit of explanation. Unit tests generally should be quick, independent of external resources, deterministic, objective, and well-styled, among other desirable qualities. A common challenge in the composition of unit tests is dependence on an off-host resource, such as a database.
Expertise in unit testing generally involves mocking, dependency inversion, or other techniques that separate tests from those external dependencies. Once that separation is in place, individual unit tests run equally informatively on a developer’s desktop, in a CT pipeline, or anywhere else they are needed.
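As a minimal sketch of that separation, the example below passes a database object into the function under test and substitutes a mock for it. The names `count_active_users` and `fetch_all`, and the SQL text, are hypothetical and exist only for illustration; the mocking calls come from Python's standard `unittest.mock` module.

```python
from unittest.mock import Mock


def count_active_users(db):
    """Return how many rows the injected database object reports as active."""
    # `fetch_all` is a hypothetical method on whatever database wrapper is in use.
    rows = db.fetch_all("SELECT id FROM users WHERE active = 1")
    return len(rows)


def test_count_active_users():
    # Stand-in for a real connection: no off-host resource is contacted.
    fake_db = Mock()
    fake_db.fetch_all.return_value = [(1,), (2,), (3,)]

    assert count_active_users(fake_db) == 3
    fake_db.fetch_all.assert_called_once()


test_count_active_users()
```

Because the database is injected rather than created inside the function, the same test gives the same answer on a desktop or in a CT pipeline.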
Performance tests often show a much different profile. Performance tests are most useful when they strain resource constraints, and when they resemble production circumstances. The same code that diagnoses a serious situation in a production environment might measure nothing at all in a developer’s integrated development environment (IDE).
Performance tests can be sensitive to details of memory layout or speed of retrieval from mass storage, for instance. Note the contrasts with unit tests, which are designed for the most part to be small, transparent and predictable.
Rhythms for testing
Despite these differences, CT can be adjusted to include performance testing as well as the tests that more conventionally appear in CT. A first adjustment is to introduce a different rhythm for performance testing. CT generally launches unit tests on each commit, and grades that commit as a success or failure based on results from the unit tests. Because performance tests take many times as long to run, this arrangement isn’t practical for them.
However, CT can still run performance tests on a predictable schedule, perhaps hourly or daily. When a commit happens to introduce a failure in a performance test, that failure will at least be detected within the following hour or day, rather than during final quality assurance reviews. Early detection of programming failure, or “left-shifted” testing, is widely believed to be a great aid in the quick remedy of such a failure.
Effective performance tests might require other adjustments to fit into CT. Perhaps they should only run off peak, when network loading, disk activity or service utilization is light enough to allow for useful measurements. Maybe they require a resource with restricted authorization.
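One way to express an off-peak restriction is a small guard that a scheduled job consults before it starts measuring. The sketch below is only an illustration: the function name `in_off_peak_window` and the default window of 01:00 to 05:00 are assumptions, not values from any particular CT tool.

```python
import datetime


def in_off_peak_window(now, start=datetime.time(1, 0), end=datetime.time(5, 0)):
    """Return True if `now` (a datetime.time) falls in the window [start, end).

    Also handles windows that wrap past midnight, such as 22:00 to 04:00.
    The default window is an assumed example, not a recommendation.
    """
    if start <= end:
        return start <= now < end
    # The window wraps past midnight: it covers late evening and early morning.
    return now >= start or now < end
```

A performance job might call `in_off_peak_window(datetime.datetime.now().time())` and skip its measurements, rather than report a failure, when the answer is false.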
It’s likely that a “passing grade” for performance will resemble “Measure $M is within 7% of baseline,” where unit tests typically assert, “The value of $V is exactly $R.” Some observers see these differences and think they’re reasons to keep performance testing away from CT. In fact, they’re all reasons to expand CT to include performance tests.
Only automation and the associated CT tooling can make performance testing as systematic and consequential as it deserves to be. Yes, existing tools are a little more practiced with unit tests, which typically conclude:
assert operation.result == expected_result
With a little practice, though, it becomes almost as natural to write:
assert within_range(elapsed_time, expected_time, agreed_tolerance)
Recognize that the alternative to the latter is some kind of manual, subjective test, probably done near a quality assurance deadline. Even an imperfectly automated performance test scheduled consistently through CT is far, far superior to this.
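The `within_range` helper named above is not defined in this post, so here is one possible sketch of it. It assumes the tolerance is a fraction of the expected value, so that 0.07 means “within 7% of baseline”. The `timed` helper is likewise a hypothetical addition, built on the standard library's `time.perf_counter`.

```python
import time


def within_range(measured, expected, tolerance):
    """Return True if `measured` is within a fractional `tolerance` of `expected`.

    Assumption: a tolerance of 0.07 means "within 7% of expected".
    """
    return abs(measured - expected) <= tolerance * abs(expected)


def timed(operation):
    """Run `operation` once and return its elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start
```

A test could then read `assert within_range(timed(operation), expected_time, agreed_tolerance)`. This check is symmetric, so it also flags runs that are much faster than the baseline. A team that only cares about slowdowns might prefer a one-sided comparison, and a single timing is noisy enough that repeating the measurement is often worthwhile.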
A shift in attitude
The biggest conclusion to reach in bringing together performance testing and CT is that the right attitude makes apparent what a “win” it is. You might judge that performance testing is harder to integrate than unit testing, and decide to leave it out. When you shift your focus to the gain in quality from integration of performance testing into CT, though, it becomes much more appealing to follow through and complete that integration.
Try it for yourself, and let me know how the expansion of CT into performance testing works for you.
Cameron Laird is an award-winning software developer and author. Cameron participates in several industry support and standards organizations, including voting membership in the Python Software Foundation. A long-time resident of the Texas Gulf Coast, Cameron counts farm automation among his favorite applications.