How to fix flaky tests: A practical approach for QA teams

Flaky tests are among the most problematic issues testers face in their automated test suites.

The problem is not that they fail; it is that they sometimes pass and sometimes fail. Fixing flaky tests is one of the most tedious tasks in automated test suite maintenance because the root cause of flakiness is usually difficult and time-consuming to diagnose.

Learn about flaky tests and their usual causes, and explore practical approaches to detection, remediation and prevention.

What is a flaky test?

Flaky tests, also referred to as nondeterministic tests, are automated tests that pass or fail intermittently and seemingly without cause. This happens even when they run against the same configuration without changes to the codebase, the test data or the environment. These faulty assessments, which typically appear in integration, API and GUI-level tests, can diminish confidence in the entire automation suite.

While flaky tests are not necessarily indicative of a defect, they are problematic and difficult to diagnose. Because flaky test detection and remediation can be tedious, teams might ignore them altogether rather than sink time into investigating and determining whether the bug is real. Additionally, not all nondeterministic tests flake equally; certain flaky tests should take precedence over others.

What are the characteristics of flaky tests?

The main characteristic of a flaky test is the intermittent nature of its failures. In addition, the failures often occur at inconsistent frequencies. For example, a test might fail once and then pass consistently, and it might take a significant number of retries to generate another failure. Other characteristics might not reveal themselves until the failures are diagnosed and causes such as race conditions, concurrency issues, infrastructure problems, asynchronous calls or caching issues are found.

What causes flaky tests?

Testers can struggle to find the root cause of a flaky test in a timely manner. Although flakiness can appear random, these nondeterministic failures usually stem from a handful of common causes. To begin an assessment, examine the following:

  • Test framework design. Any of the following test framework design attributes could cause a flaky test and are worth examining:
    • Asynchronous calls. These calls usually load dynamic data but can cause issues when fixed sleep functions are used to wait for them. Sleeps increase the running time of the test yet still cause timeouts when the system is slower than expected. Use callbacks or polling to mitigate these issues.
    • Race conditions and leaked states. Race conditions happen when different steps in the code try to execute at the same time and processes overlap. Leaked states are similar; they happen when actions in the code disrupt preconditions for a different step. Because these issues manifest themselves intermittently, they can lead to flaky tests. Detecting these issues can be difficult, so it is best to prevent them in the design and development process, if possible.
    • Stale data from caching, setup and cleanup. This happens when the test environment does not return to its original state after a run. Because refactoring test automation code after finding these issues can be complicated, implement best practices upfront.
    • Time-based scenarios. Tests that require a current time or gather events throughout the day can become flaky when they run in different time zones, so consider time scenarios when designing them.
  • Infrastructure issues. These include node failures, unreliable network connections, database failures and bugs in the automation framework.
  • Testing with third-party systems. Although testing with third-party systems is often an important part of the end-to-end testing strategy, it can cause flaky tests. This is because third-party environments and other dependencies are not completely under the user’s control. If possible, stub the integrating systems to ensure the initial tests run deterministically before running them against a third-party environment.
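The polling approach suggested under asynchronous calls can be sketched in a few lines of Python. This is a minimal, illustrative helper, not a specific library's API; the `wait_until` name and the timeout values are assumptions:

```python
import time

def wait_until(condition, timeout=10.0, interval=0.2):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Unlike a fixed sleep, this returns as soon as the condition holds,
    keeping the test fast on good runs and failing loudly on bad ones.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")

# Example: wait for a flag that an asynchronous operation would set.
state = {"ready": False}

def becomes_ready():
    state["ready"] = True  # stand-in for an async completion
    return state["ready"]

assert wait_until(becomes_ready, timeout=1.0) is True
```

Most UI automation frameworks ship an equivalent explicit-wait mechanism; prefer it over hand-rolled sleeps wherever one exists.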
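The time-based pitfall above is commonly avoided by injecting the clock rather than reading it inside the code under test, so a test can pin a timezone-aware timestamp instead of depending on when and where the suite runs. A hedged sketch, where `is_business_hours` is a hypothetical function:

```python
from datetime import datetime, timezone

def is_business_hours(now=None):
    """Return True between 09:00 and 17:00 UTC.

    Accepting `now` as a parameter (defaulting to the real clock) lets
    tests supply a fixed, timezone-aware timestamp instead of relying
    on the wall clock and the machine's local timezone.
    """
    now = now or datetime.now(timezone.utc)
    return 9 <= now.hour < 17

# Deterministic in tests: pin the timestamp rather than the wall clock.
fixed = datetime(2024, 6, 3, 10, 30, tzinfo=timezone.utc)
assert is_business_hours(fixed) is True
assert is_business_hours(datetime(2024, 6, 3, 20, 0, tzinfo=timezone.utc)) is False
```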

Which flaky tests should teams prioritize?

All flaky tests take time to investigate and fix. Because some flaky tests wreak more havoc than others, QA teams should prioritize which ones they work on. Here are several criteria to help prioritize flaky tests.

  • Level of business risk. Determine how much business risk a flaky test poses. Focus on tests that validate critical business workflows. If a problematic test validates a feature that customers seldom use, then fixing it should be a low priority. During test maintenance, consider simply removing that flaky test. Teams might instead replace it with a new one.
  • Test timing. Test quality is most important for business-critical features. Prioritize the high-value flaky tests that take place at key points in the release cycle. For example, prioritize nondeterministic tests in a continuous testing suite that disrupt the CD pipeline, because they can affect release velocity.
  • Amount of effort required. Flaky tests are notoriously difficult to diagnose and fix, and many factors can contribute to the root cause of a test's flakiness. If remediation would take more effort than the test is worth, remove the test and build a new one.

How to fix flaky tests

Once a team has determined that a flaky test is worth pursuing based on business value, timing and workload, it should aim to fix it. Below is a set of practical steps for flaky test remediation.

  1. Identify flaky tests. Before attempting to fix flaky tests, it is important to verify that the tests are truly flaky. Run tests multiple times under varying conditions. Static analysis, continuous integration and monitoring tools can facilitate this process.
  2. Isolate and quarantine. Once identified, isolate flaky tests from the deterministic ones and quarantine them in a separate test suite. This is critical for several reasons. A single flaky test can contaminate the entire suite, especially if the tests are not independent, and flaky tests create bottlenecks in the continuous integration pipeline that quarantining removes. Most importantly, it helps ensure that when a test in the main suite fails, the team investigates the result as a potential defect; when tests are known to be flaky, their results tend to be discounted or ignored outright. Once quarantined, do not simply abandon flaky tests, because eliminating them leaves a gap in regression coverage.
  3. Prioritize root cause analysis and remediation. Because investigating and fixing flaky tests is time-consuming, it is important to prioritize both analysis and remediation efforts. Consider the business value of the test. Determine if the test is important to validating critical business workflows. Examine how not finding a bug in these workflows might affect the customer experience or application performance. Understanding the business value of each flaky test helps to focus analysis efforts.
  4. Find the root cause of flakiness and remediate. Finding the root cause of the flakiness can be extremely challenging. Start by eliminating the most obvious external causes; begin by rerunning the test with a clean environment and system state. Then try stubbing third-party applications. Once external issues are eliminated, examine the automation script for issues in concurrency, time handling and asynchrony, then update as needed. To accelerate the analysis process, consider using a tool such as DeFlaker or a pytest plugin such as pytest-rerunfailures.
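Step 1 above, rerunning a test repeatedly under identical conditions to confirm it is truly flaky, can be sketched as a small harness. Here `run_test` stands in for invoking a real test case, and the run count is illustrative:

```python
import itertools

def detect_flakiness(run_test, runs=20):
    """Run a test callable many times and report pass/fail counts.

    A test that both passes and fails across identical runs is flaky;
    one that always passes or always fails is deterministic.
    """
    passes = fails = 0
    for _ in range(runs):
        try:
            run_test()
            passes += 1
        except AssertionError:
            fails += 1
    return {"passes": passes, "fails": fails,
            "flaky": passes > 0 and fails > 0}

# Illustration with a deliberately nondeterministic check.
outcomes = itertools.cycle([True, True, False])  # fails every third run

def sometimes_fails():
    assert next(outcomes)

report = detect_flakiness(sometimes_fails, runs=9)
# report -> {"passes": 6, "fails": 3, "flaky": True}
```

CI systems and rerun plugins automate this pattern at scale, but the principle is the same: identical inputs, divergent outcomes.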
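Step 2's quarantine idea can be illustrated framework-agnostically with a simple tagging decorator. All names here are hypothetical; pytest users would typically register a custom `flaky` marker instead and deselect it with `-m "not flaky"`:

```python
FLAKY = set()

def quarantine(test_func):
    """Mark a test as quarantined so the main suite can skip it.

    Quarantined tests still run in their own suite, so the regression
    coverage gap stays visible until the root cause is fixed.
    """
    FLAKY.add(test_func.__name__)
    return test_func

@quarantine
def test_checkout_total():
    assert True  # placeholder for the flaky assertion

def test_login():
    assert True

def select(tests, quarantined=False):
    """Split a suite into the main run and the quarantine run."""
    return [t for t in tests if (t.__name__ in FLAKY) == quarantined]

suite = [test_checkout_total, test_login]
main_run = select(suite)               # deterministic tests only
quarantine_run = select(suite, True)   # flaky tests, run separately
```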

How to prevent flaky tests

Because flaky tests can be so difficult to remediate, the best strategy is to avoid them wherever possible. Adhering to software testing best practices, beginning with test planning and test design and continuing throughout the entire test process, is the most reliable way to prevent them. Apply these practices in the following contexts.

Use the software testing pyramid to reduce redundant or extraneous test cases and limit flaky tests.

  • Test case design. To design effective tests, pay close attention to the components that make up test cases. Ensure manual tests pass and fail correctly. Improving test case quality makes nondeterministic tests less of a problem.
  • Test automation. Design an effective test automation suite by following the test automation pyramid. Additionally, learn when it’s appropriate to automate test cases. Implement test optimization for test cases and test suites. Aim to have the greatest test coverage with the fewest number of test cases. The fewer the number of test cases, the less likelihood of flaky tests. Additionally, a more efficient test suite makes flaky test remediation easier.
  • Test approach. Be aware of how testable a piece of software is and always look for easier ways to test it. In an end-to-end test strategy, ensure that the individual applications are thoroughly tested before attempting the system tests.
  • Test data and test environment. Test data should be designed so that it is in the correct states at the correct times for each test run. Ensure that all data returns to its initial state after the test run. Ideally, the test team should have dedicated test environments. If this is not possible, teams should develop a test schedule that includes time to restore the environment after each usage.
  • Application monitoring and production testing. Especially in organizations that use continuous deployment, it is critical to ensure that any tests for monitoring applications are free of flaky tests. Teams should immediately isolate and remediate any flaky tests that appear in production.
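The test data and environment guidance above, with data in the correct state before a run and restored afterward, maps naturally to a setup/teardown fixture. A minimal sketch using Python's context managers, with an in-memory dict standing in for a real database:

```python
import contextlib

@contextlib.contextmanager
def seeded_test_data(db):
    """Insert known records for a test, then restore the original state.

    Guaranteeing cleanup even when the test body raises prevents leaked
    state from one run (or one test) from flaking the next.
    """
    db["user:42"] = {"name": "Test User", "active": True}
    try:
        yield db
    finally:
        db.pop("user:42", None)  # always runs, pass or fail

store = {}  # stand-in for a real database connection
with seeded_test_data(store) as db:
    assert db["user:42"]["active"]
# Environment is back to its initial state after the run.
```

In pytest, the same guarantee is usually expressed as a fixture whose teardown code runs after the `yield`.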

Flaky tests are incredibly frustrating for software testers. Although we might be tempted to ignore them as our own mistakes, it is critical to determine the root cause and remediate so that the automation suite isn’t compromised and remains effective. Prevention begins in the first phases of the SDLC and continues into production. Using software development and testing best practices while developing automated scripts will help to reduce the number of flaky tests.

Gerie Owen is a QA engineering manager at Roobrik. She is a conference presenter and author on technology and testing topics, and a certified Scrum master.
