Tracking issue: intermittent test failures

It's the nature of I/O libraries like Trio that their test suites are prone to weird intermittent failures. These failures are often hard to track down, and you usually encounter them while trying to land some unrelated feature: the CI randomly fails, so the temptation is to click "re-run build" and worry about it later.

This temptation must be resisted. If left unchecked, flaky tests accumulate until the suite fails all the time for unknown reasons, no one trusts it, and it becomes a constant drag on development. **Flaky tests must be eradicated.**

But to make things extra fun, there's another problem: CI services genuinely are a bit flaky, so when you see a weird failure or lock-up in the tests then it's often unclear whether this is a bug in our code, or just some cloud provider having indigestion. And you don't want to waste hours trying to reproduce indigestion. Which means we need to compare notes across multiple failures. Which is tricky when I see one failure, and you see another, and neither of us realizes that we're seeing the same thing. Hence: this issue.

**What to do if you see a weird test failure that makes no sense:**

* Visit this bug; it's #200 so it's hopefully easy to remember.

* Check to see if anyone else has reported the same failure.

* Either way, add a note recording what you saw. Make sure to link to the failed test log.

* Special notes for specific CI services:

  * If it's a failed **Azure Pipelines** run, then go to the build page, click the "..." menu icon in the upper right, and select **"Retain build"**. (Otherwise the logs will be deleted after 30 days or something like that.)
  * If it's a failed **Travis CI** run, DO NOT CLICK THE "RESTART BUILD" OR "RESTART JOB" BUTTON! That will wipe out the log and replace it with your new run, so we lose the information about what failed. Instead, **close and then re-open the PR**; this will tickle Travis into re-testing your commit, but in a way that gives the new build a new URL, so that the old log remains accessible.

# Issues we're monitoring currently

* flakiness in `test_open_tcp_listeners_backlog`: https://github.com/python-trio/trio/issues/200#issuecomment-451885495 (last seen: all the time, should be fixed by #1601)
* #1277 (last seen: October 2019)
* segfault in PyPy 3.6 nightly after the faulthandler timeout fired: https://github.com/python-trio/trio/issues/200#issuecomment-424674288 (last seen: late 2018)
* #851 (last seen: January 2019)
