Tracking issue: intermittent test failures #200

Open
@njsmith

It's the nature of I/O libraries like Trio that their test suites are prone to weird intermittent failures. But they're often hard to track down, and usually the way you encounter them is that you're trying to land some other unrelated feature and the CI randomly fails, so the temptation is to click "re-run build" and worry about it later.

This temptation must be resisted. If left unchecked, you eventually end up with tests that fail all the time for unknown reasons, no one trusts them, and it's a constant drag on development. Flaky tests must be eradicated.

But to make things extra fun, there's another problem: CI services genuinely are a bit flaky, so when you see a weird failure or lock-up in the tests then it's often unclear whether this is a bug in our code, or just some cloud provider having indigestion. And you don't want to waste hours trying to reproduce indigestion. Which means we need to compare notes across multiple failures. Which is tricky when I see one failure, and you see another, and neither of us realizes that we're seeing the same thing. Hence: this issue.

What to do if you see a weird test failure that makes no sense:

  • Visit this bug; it's #200, so it's hopefully easy to remember.

  • Check to see if anyone else has reported the same failure.

  • Either way, add a note recording what you saw. Make sure to link to the failed test log.

  • Special notes for specific CI services:

    • If it's a failed Azure Pipelines run, then go to the build page, click the "..." menu icon in the upper right, and select "Retain build". (Otherwise the logs will be deleted after 30 days or something like that.)
    • If it's a failed Travis CI run, DO NOT CLICK THE "RESTART BUILD" OR "RESTART JOB" BUTTON! That will wipe out the log and replace it with your new run, so we lose the information about what failed. Instead, close and then re-open the PR; this will tickle Travis into re-testing your commit, but in a way that gives the new build a new URL, so that the old log remains accessible.
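For anyone who would rather script the close-and-re-open step than click through the web UI, here is a minimal sketch using the GitHub REST API's "update a pull request" endpoint (`PATCH /repos/{owner}/{repo}/pulls/{pull_number}` with a `state` field). The function names, the `send` parameter, and the owner/repo/PR number/token values are all hypothetical placeholders, and it is an assumption that a close and re-open done through the API triggers a fresh CI build the same way the buttons do.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_state_request(owner, repo, pr_number, state, token):
    """Build (but do not send) a request that sets a pull request's state."""
    if state not in ("open", "closed"):
        raise ValueError(f"unexpected state: {state!r}")
    url = f"{API_ROOT}/repos/{owner}/{repo}/pulls/{pr_number}"
    return urllib.request.Request(
        url,
        data=json.dumps({"state": state}).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # token is a placeholder
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def close_and_reopen(owner, repo, pr_number, token, send=urllib.request.urlopen):
    """Close the PR, then re-open it, leaving the old build log untouched."""
    for state in ("closed", "open"):
        request = build_state_request(owner, repo, pr_number, state, token)
        with send(request) as response:
            response.read()  # drain the response body before the next call
```

Calling `close_and_reopen("some-owner", "some-repo", 123, token)` would send the two requests in order. Either way, add the note linking the failed log here first.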

Issues we're monitoring currently
