Step 1: Look at the video
Open the failing run in the dashboard and watch the video. 80% of flakes have an obvious visual cause: a modal that wasn’t dismissed, a toast that covered a button, an animation that hadn’t settled.
Step 2: Compare to a passing run
Open a passing run of the same test and diff the step timings. Big spikes usually point at:
- Network timing: a request is faster/slower than usual
- Animation timing: a transition is now blocking interaction
- Ordering: a race condition between two state updates
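The timing comparison in Step 2 can be sketched in a few lines. This is a minimal illustration, assuming you have copied per-step durations (in milliseconds) out of the dashboard into two dictionaries; the function name `timing_spikes`, the thresholds, and the sample step names are hypothetical and not part of any product API.

```python
from typing import Dict, List, Tuple


def timing_spikes(
    passing: Dict[str, float],
    failing: Dict[str, float],
    ratio: float = 2.0,
    min_delta_ms: float = 200.0,
) -> List[Tuple[str, float, float]]:
    """Return (step, passing_ms, failing_ms) for steps whose duration changed sharply."""
    spikes = []
    for step, pass_ms in passing.items():
        fail_ms = failing.get(step)
        if fail_ms is None:
            continue  # the step never ran in the failing run
        delta = abs(fail_ms - pass_ms)
        slower = fail_ms >= pass_ms * ratio
        faster = fail_ms * ratio <= pass_ms
        # Require both a large absolute and a large relative change,
        # so ordinary jitter on short steps is ignored.
        if delta >= min_delta_ms and (slower or faster):
            spikes.append((step, pass_ms, fail_ms))
    # Largest absolute change first
    spikes.sort(key=lambda s: abs(s[2] - s[1]), reverse=True)
    return spikes


# Hypothetical sample data
passing_run = {"open cart": 310.0, "click checkout": 420.0, "confirm order": 650.0}
failing_run = {"open cart": 305.0, "click checkout": 2900.0, "confirm order": 640.0}

for step, pass_ms, fail_ms in timing_spikes(passing_run, failing_run):
    print(f"{step}: {pass_ms:.0f} ms -> {fail_ms:.0f} ms")
```

Steps that became much faster are flagged as well as steps that became slower, since a request that returns early can expose an ordering bug just as a slow one can.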
Step 3: Check auto-heal history
If auto-heal fired in recent runs, something in the UI is drifting. Review the before/after locators to see what changed and whether the product change was intentional.
Step 4: Watch the trace
Each failing run has a full trace: DOM, network, console. Common signals:
- Console errors from the product around the failed step
- Pending network requests when the agent acted
- Unhandled promise rejections that broke the page after a step
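If you export trace data for offline inspection, two of the signals above are easy to check mechanically. The sketch below assumes a hypothetical record shape (`Request`, `ConsoleEntry`) with millisecond timestamps; the real trace format may differ, so treat the field names as placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    url: str
    started_ms: float
    finished_ms: Optional[float]  # None means the request never completed


@dataclass
class ConsoleEntry:
    level: str
    text: str
    at_ms: float


def pending_at(requests: List[Request], action_ms: float) -> List[str]:
    """URLs of requests that had started but not finished when the action fired."""
    return [
        r.url
        for r in requests
        if r.started_ms <= action_ms
        and (r.finished_ms is None or r.finished_ms > action_ms)
    ]


def errors_near(
    entries: List[ConsoleEntry], step_ms: float, window_ms: float = 1000.0
) -> List[str]:
    """Console error messages logged within window_ms of the failed step."""
    return [
        e.text
        for e in entries
        if e.level == "error" and abs(e.at_ms - step_ms) <= window_ms
    ]


# Hypothetical sample trace
requests = [
    Request("/api/cart", 100.0, 180.0),
    Request("/api/prices", 150.0, 900.0),
    Request("/api/recs", 700.0, None),
]
console = [
    ConsoleEntry("warn", "deprecated prop", 390.0),
    ConsoleEntry("error", "Unhandled promise rejection", 450.0),
]

print(pending_at(requests, action_ms=400.0))
print(errors_near(console, step_ms=400.0))
```

A non-empty result from `pending_at` suggests the agent acted before the page had the data it needed, which usually calls for an explicit wait.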
Step 5: Isolate the step
Pull the problematic step into a minimal test and run it 20 times locally.
Common fixes
- Add an explicit wait for a specific element or URL before acting
- Use an AI check to confirm intermediate state
- Reset state between tests via a beforeAll module
- Mock flaky upstream services: see Request mocking
- Split the test if it’s doing too much
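The repeat-run loop from Step 5 and the explicit-wait fix can be illustrated together. This is a generic sketch, not the product's own runner: `wait_until`, `failure_count`, and `FakePage` are hypothetical names, and `FakePage` stands in for a real page whose button only becomes clickable 30 ms after load.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool], timeout_s: float = 5.0, interval_s: float = 0.05
) -> bool:
    """Poll condition until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def failure_count(step: Callable[[], None], runs: int = 20) -> int:
    """Run step `runs` times and count how many runs raised an exception."""
    failures = 0
    for _ in range(runs):
        try:
            step()
        except Exception:
            failures += 1
    return failures


class FakePage:
    """Stand-in for a real page: the button becomes clickable 30 ms after load."""

    def __init__(self) -> None:
        self.loaded_at = time.monotonic()

    def button_ready(self) -> bool:
        return time.monotonic() - self.loaded_at >= 0.03

    def click(self) -> None:
        if not self.button_ready():
            raise RuntimeError("button not interactable yet")


def step_without_wait() -> None:
    # Acts immediately after load, before the button is ready.
    FakePage().click()


def step_with_wait() -> None:
    page = FakePage()
    # Explicit wait for a specific condition before acting.
    if not wait_until(page.button_ready, timeout_s=1.0, interval_s=0.005):
        raise TimeoutError("button never became ready")
    page.click()


print("without wait:", failure_count(step_without_wait), "failures out of 20")
print("with wait:", failure_count(step_with_wait), "failures out of 20")
```

Waiting on a specific condition rather than sleeping for a fixed time keeps the test fast when the page is quick and still tolerant when it is slow. In this simulation the unguarded step always acts too early; a real flake would fail only some of the 20 runs, which is why the count matters.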
When to quarantine
Quarantine if:
- You know the fix but can’t ship it today
- The test is blocking a critical merge
- You’re investigating and don’t want to distract the team