
How to avoid and handle test flakiness?

28 Oct, 2025


Automated testing is a cornerstone of modern software development, enabling teams to catch bugs early and ensure code reliability. However, one of the most frustrating challenges in test automation is dealing with "flaky tests"—those unpredictable tests that pass sometimes and fail at other times without any apparent changes to the code, environment, or test itself. Flaky tests erode confidence in your test suite, waste time on debugging false positives, and can delay releases.

In this guide, we'll dive deeper into the common causes of test flakiness, drawing on industry best practices and insights from sources like Datadog, Autify, and BrowserStack. We'll cover not only the basics but also advanced tips, additional causes (such as concurrency and network issues), detection strategies, and preventive measures. By the end, you'll have actionable steps to build a more robust, deterministic test suite.

In this article, we'll walk through the most common causes of test flakiness and the most practical solutions for avoiding and handling it.

What Makes a Test Flaky?

A flaky test is one that produces inconsistent results under identical conditions. According to Datadog, flaky tests often stem from non-deterministic behavior, where external factors like timing, order, or environment introduce variability. Common symptoms include:

  • Intermittent failures (e.g., passes locally but fails in CI/CD).
  • Failures that resolve on retry without code changes.
  • Errors like timeouts, element not found, or assertion failures that don't reproduce reliably.

Flakiness can affect any type of test—unit, integration, or end-to-end (E2E)—but it's especially prevalent in UI automation tools like Selenium, Playwright, or Cypress due to their reliance on dynamic web elements.

Now, let's explore the most common causes and how to address them.

1. Poorly Written Locators

Bad locators are a top culprit in UI test flakiness. If your test relies on fragile selectors that change with minor UI updates, it can fail unexpectedly. For instance, copying a locator directly from browser dev tools might include dynamic classes or indices that aren't stable.

Solutions:

  • Prioritize stable strategies: Use IDs or data attributes (e.g., data-test-id) over CSS classes or text content. Ask developers to add custom attributes for testing during feature development—this makes locators unique and resilient to changes.
  • Leverage XPath or CSS wisely: Build custom XPath expressions using functions like contains(), starts-with(), or relative paths (e.g., //button[contains(@class, 'submit') and text()='Login']). Avoid absolute paths that break with DOM restructuring. Tools like Chrome DevTools or SelectorGadget can help validate them.
  • Validate locators: Run tests in different browsers/environments to ensure cross-compatibility. Use libraries like Selenium's By class effectively, preferring By.id or By.name when possible.
  • Additional tip: Implement locator strategies with fallback options or use AI-powered tools like Applitools or Testim that auto-heal selectors.
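The fallback idea in the last tip can be sketched without any particular framework. This is a minimal, framework-agnostic sketch: the Supplier stands in for a real call such as driver.findElements(By.id(...)), and the class name is an illustrative assumption, not an API from any of the tools mentioned above.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Try a list of locator strategies in priority order (e.g., data-test-id
// first, then a CSS class, then text) and return the first hit. Each
// Supplier models one lookup attempt against the page.
public class FallbackLocator {
    public static <T> Optional<T> findWithFallback(List<Supplier<Optional<T>>> strategies) {
        for (Supplier<Optional<T>> strategy : strategies) {
            Optional<T> result = strategy.get();
            if (result.isPresent()) {
                return result;   // first strategy that finds something wins
            }
        }
        return Optional.empty(); // no strategy matched
    }
}
```

In a real suite, each supplier would wrap a driver lookup in a try/catch so a NoSuchElementException simply falls through to the next strategy.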

From BrowserStack insights, poor locators account for up to 30% of flakiness in web tests—investing here pays off quickly.

2. Environment Instability

An unstable test environment—whether due to shared resources, inconsistent setups, or external dependencies—can make even well-written tests flaky. For example, tests might fail on a CI server due to slower network speeds or resource contention.

Solutions:

  • Dedicated test environments: Advocate for a staging or QA environment isolated from production. Present data to management showing how flakiness increases cycle times (e.g., "Flaky tests add 20-30% to debug time per sprint").
  • Local setup for development: Run the application and tests locally using Docker containers to mimic production closely. Tools like Docker Compose ensure consistency across machines.
  • CI/CD best practices: Use parallel execution in pipelines (e.g., GitHub Actions or Jenkins) but configure resource limits to avoid overload. Monitor environment health with tools like Datadog or New Relic.
  • Handle infrastructure flakiness: For cloud-based tests, use services like Sauce Labs or BrowserStack for consistent browser VMs. For mobile testing, ensure emulators/simulators are reset between runs to avoid state persistence.
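As a sketch of the Docker Compose approach above — the file layout, image, and ports here are hypothetical placeholders, not the article's setup:

```yaml
# docker-compose.yml — hypothetical sketch: run the app and its database
# locally so every machine tests against the same environment.
services:
  app:
    build: .
    ports:
      - "8080:8080"
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: test   # throwaway credential for local runs only
```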

Autify notes that environment issues cause about 25% of flakiness; stabilizing this foundation is key.

3. Improper Timeouts and Waits

Timeouts are perhaps the most notorious cause of flakiness. Using implicit waits (global timeouts for all elements) can lead to failures if page loads vary slightly, while missing waits for async operations (e.g., API calls or animations) causes "element not interactable" errors.

Solutions:

  • Switch to explicit waits: Use conditions like elementToBeClickable or visibilityOfElementLocated in Selenium/WebDriver. For example (Selenium 4 syntax):

        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("submit-button")));

    This waits only for specific elements, making tests more efficient.
  • Avoid hard-coded sleeps: Replace Thread.sleep(5000) with dynamic waits to handle variability without unnecessary delays.
  • Tune timeouts based on data: Analyze test run logs to set realistic defaults (e.g., 10-30 seconds for E2E tests). In CI, increase timeouts slightly for slower environments.
  • Additional tip: Handle async flakiness: For JavaScript-heavy apps, use frameworks like Cypress that auto-wait for DOM stability. Tools like Playwright have built-in auto-waiting, reducing manual intervention.
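To see what an explicit wait actually does under the hood, here is a minimal, framework-free sketch of the polling loop that WebDriverWait-style helpers implement; the class name and parameters are illustrative assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Poll a condition until it holds or the timeout expires — the same
// pattern Selenium's WebDriverWait and Playwright's auto-waiting use.
public class PollingWait {
    public static boolean waitUntil(Supplier<Boolean> condition,
                                    Duration timeout, Duration pollInterval) {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            if (Boolean.TRUE.equals(condition.get())) {
                return true; // condition met before the deadline
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // treat interruption as a failed wait
            }
        }
        return false; // timed out without the condition holding
    }
}
```

Unlike a hard-coded Thread.sleep(5000), this returns as soon as the condition holds, so fast runs stay fast while slow runs still get the full timeout.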

CircleCI recommends balancing waits to avoid both flakiness and excessive run times—aim for tests under 5 minutes each.

4. Missing or Inconsistent Test Data

Tests that rely on shared or environment-specific data often flake because data might be modified, deleted, or unavailable. For instance, a test assuming a user exists might fail if another test deletes it.

Solutions:

  • API-driven data setup: Use REST APIs to create fresh data before each test (e.g., via Postman or libraries like RestAssured). This is fast and reliable.
  • Database seeding: If APIs aren't available, directly insert data into the DB using tools like Flyway or custom scripts. Ensure transactions are rolled back post-test.
  • UI fallback with isolation: As a last resort, set up data via UI, but use unique identifiers (e.g., timestamps in usernames) to avoid collisions.
  • Teardown properly: Always clean up after tests to prevent data pollution. Use annotations like @AfterEach in JUnit.
  • Mock external dependencies: Simulate consistent responses with tools like WireMock or Mockoon, reducing reliance on real data.
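The unique-identifier tip above can be sketched in a few lines; the "qa_" prefix and helper name are arbitrary example conventions, not something the article prescribes.

```java
import java.util.UUID;

// Collision-free test data: suffix a UUID fragment so tests running in
// parallel never fight over the same record.
public class TestData {
    public static String uniqueUsername(String prefix) {
        return prefix + "_" + UUID.randomUUID().toString().substring(0, 8);
    }
}
```

Timestamps work too, but a UUID fragment stays unique even when two tests start in the same millisecond.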

Reddit discussions highlight that data issues are common in parallel testing—always design for isolation.

5. Test Dependencies and Order Issues

Tests that depend on each other (e.g., one creates data for another) or assume a specific run order create flakiness, especially in parallel execution.

Solutions:

  • Isolate tests: Ensure each test has its own setup/teardown. Use fixtures in pytest or @BeforeEach in JUnit to reset state.
  • Randomize order: Configure your test runner (e.g., Maven Surefire) to shuffle test order during runs—this exposes hidden dependencies early.
  • Parallel-safe design: Avoid global variables or shared sessions. In UI tests, use incognito modes or new browser instances per test.
  • Additional tip: Declare unavoidable dependencies explicitly with TestNG's dependency annotations, but use them sparingly—aim for 100% independence.
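The order-randomization tip can be sketched as a seeded shuffle: log the seed, and a failing order can be replayed exactly. Runners like Maven Surefire (runOrder=random) do this for you; this standalone version is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Shuffle test order with a recorded seed so hidden inter-test
// dependencies surface early — and reproducibly.
public class ShuffledOrder {
    public static List<String> shuffled(List<String> tests, long seed) {
        List<String> copy = new ArrayList<>(tests);
        Collections.shuffle(copy, new Random(seed)); // same seed => same order
        return copy;
    }
}
```

When a shuffled run fails, rerunning with the same seed reproduces the exact order that exposed the dependency.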

Netdata points out that order assumptions cause 15-20% of flakiness; breaking them builds resilience.

Additional Causes and Advanced Prevention

Beyond the basics, here are more causes from industry surveys (e.g., a ScienceDirect study of 200 flaky tests identifying 11 root causes):

  • Concurrency and parallelism: Multi-threaded tests can race for resources. Fix: Use synchronization (e.g., locks) or run sequentially if needed. Tools like Trunk.io suggest atomic operations.
  • Network dependencies: Flaky APIs or slow connections. Fix: Mock networks or add retries (e.g., Polly library for .NET). Monitor with tools like Charles Proxy.
  • Asynchronous code: Unhandled promises or events. Fix: Use async/await properly and add event listeners.
  • Resource leaks: Open files/connections not closed. Fix: Implement try-finally blocks or use resource managers.
  • UI animations/transitions: Elements appear/disappear unpredictably. Fix: Disable animations in test mode via CSS injections.
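The retry fix for network flakiness can be sketched as a small wrapper; this is a minimal illustration with linear backoff, not the API of Polly or any library named above.

```java
import java.util.function.Supplier;

// Retry a flaky action up to maxAttempts, backing off between tries and
// rethrowing the last failure if every attempt fails.
public class Retry {
    public static <T> T withRetry(Supplier<T> action, int maxAttempts, long backoffMillis) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis * attempt); // linear backoff
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        break; // stop retrying if interrupted
                    }
                }
            }
        }
        throw last;
    }
}
```

Reserve retries for genuinely transient failures (network blips, rate limits); retrying around a real bug only hides it.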

Detection Strategies:

  • Rerun failed tests: Configure CI to retry flakes (e.g., 3 times) and flag persistent failures.
  • Flakiness dashboards: Use tools like Flaky Test Handler in Jenkins or Google's Test Infrastructure to track failure rates over runs.
  • Run multiple times: Execute suites 10-50 times locally to identify intermittent failures—aim for <1% flakiness.
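The "run it many times" check above boils down to one number. A minimal sketch, assuming you have collected pass/fail outcomes from repeated runs of a single test:

```java
// Given the pass/fail outcomes of N repeated runs of one test, compute
// its flake (failure) rate — compare against the <1% target.
public class FlakeRate {
    public static double failureRate(boolean[] outcomes) {
        int failures = 0;
        for (boolean passed : outcomes) {
            if (!passed) failures++;
        }
        return outcomes.length == 0 ? 0.0 : (double) failures / outcomes.length;
    }
}
```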

Preventive Best Practices:

  • Quarantine flakes: Isolate and fix them immediately to prevent spread.
  • Code reviews for tests: Treat test code with the same rigor as production code.
  • Shift-left testing: Involve QA early in development to design testable features.
  • Framework choices: Opt for modern tools like Playwright (auto-waits) or Appium for mobile, which reduce built-in flakiness.
  • Metrics to track: Monitor pass rates, run times, and flake percentages. Aim for 99% reliability.

Conclusion

Test flakiness isn't inevitable—it's often a symptom of underlying issues in design, environment, or practices. By addressing the causes outlined here and incorporating advanced techniques, you can transform your test suite into a reliable safety net. Start small: audit your locators and waits today, then tackle data and dependencies. Remember, consistent tests build team trust and accelerate delivery.
