|
| 1 | ++++ |
| 2 | +title = "Alive: Provably Correct Peephole Optimizations" |
| 3 | +[extra] |
| 4 | +[[extra.authors]] |
| 5 | +name = "Annabel Baniak" |
| 6 | +[[extra.authors]] |
| 7 | +name = "Katherine Wu" |
| 8 | +[[extra.authors]] |
| 9 | +name = "Max Fan" |
| 10 | +[[extra.authors]] |
| 11 | +name = "Stephanie Ma" |
| 12 | ++++ |
| 13 | + |
| 14 | +## Alive |
| 15 | +[“Provably Correct Peephole Optimizations with Alive”](https://dl.acm.org/doi/pdf/10.1145/2737924.2737965), by Lopes et al, introduces Alive, a verification framework and [Domain Specific Language](https://www.geeksforgeeks.org/domain-specific-languages-in-scala/) for expressing [peephole optimizations](https://www.geeksforgeeks.org/peephole-optimization-in-compiler-design/). As input, Alive takes in a DSL that expresses a certain class of peephole optimizations. As output, Alive either verifies peephole optimizations and provides LLVM code to perform the optimization, or provides a counterexample that witnesses a miscompile. Alive was intended for an audience of LLVM developers, so the syntax of the DSL is designed to resemble LLVM as much as possible to optimize familiarity and thus usability. |
| 16 | + |
| 17 | +One substantial challenge to model and verify optimizations that LLVM programmers have to consider is the presence of undefined behavior; the essential aim of Alive is to codify and encode this undefined behavior to check that optimizations don’t violate correctness even working within the framework of undefined behavior. To do this, the paper runs through the typing rules of Alive, and the undefined behavioral features of the tool that non-LLVM programmers may be unfamiliar with. LLVM contains support for both undefined values themselves, which represent undefined behavior and thus let the compiler pick a convenient value for each appearance of *undef*, to allow efficient optimizations, as well as poison values, which is similar to an undefined value, but different because the poison value is only triggers when it is used by an instruction with side effects, and taint all subsequent operations. |
| 18 | + |
| 19 | +These values representing undefined behavior allow the developers of Alive to codify correctness; when correctness is verified, it means that, for some given source expression, and the target expression that exists post-optimization, that the target expression is defined whenever the source is defined, the target is poison free when the source is poison free, and, if the source is defined and poison free, then the source and the target agree on the result. |
| 20 | + |
| 21 | +In order to do this verification procedure, [SMT solvers](https://ocamlpro.github.io/verification_for_dummies/smt/) are employed to see if each given (source, target) pair meets the correctness conditions outlined using the undefined and poison values. If an optimization fails the correctness check, then a counterexample is generated automatically by Alive as well, typically a four or eight bitwidth counterexample as those are the easiest to understand for human developers. Once any optimization transformation has been proven correct, Alive can also automatically generate C++ code which can be used as an LLVM optimization pass as well. |
| 22 | + |
| 23 | +The authors display the real-world usefulness of their tool by testing it on the LLVM InstCombine transformations, in which they found multiple bugs. The buggiest file in their testing suite on its own made up *six of fourty-four*, or 14%, of the total errors found by Alive. They also tested their tool on proposed LLVM patches, successfully proving correctness of LLVM patches before the PR was accepted, or locating bugs in the code which could be fixed before the patch was accepted. |
| 24 | + |
| 25 | +## Follow-up work regarding Alive: |
| 26 | +There have been several variations of Alive following this paper, including floating-point support by [Alive-FP](https://people.cs.rutgers.edu/~santosh.nagarakatte/papers/alive-fp-sas16.pdf), and precondition inference for peephole optimizations expressed in Alive by [Alive-Infer](https://people.cs.rutgers.edu/~santosh.nagarakatte/papers/pldi2017-alive-infer.pdf). |
| 27 | + |
| 28 | +Six years after the release of this paper, in 2021, Alive was updated to the tool [Alive2](https://dl.acm.org/doi/pdf/10.1145/3453483.3454030), which expands Alive to include loop handling (by partially unrolling loops) and goes beyond simple local optimizations to handle function calls as well. It also uses translation validation which takes two IR files and checks whether the translation from one to the other is correct, so that the optimization can be tried on a test case before it is implemented. Alive2 also covers floats and aggregate types, which were missing from the implementation of the original Alive. |
| 29 | + |
| 30 | +This new version of Alive, Alive2, has essentially replaced Alive’s usefulness in the LLVM development sphere, but it is still in use today and the contributions of Alive were essential in facilitating its creation. |
| 31 | + |
| 32 | +## Criticisms/Concerns: |
| 33 | +The authors do a lot to justify the usefulness of their tool; the anecdote about finding bugs and then eventually verifying correctness in a real developer’s actual LLVM patch is particularly effective. Reading the article, one can’t help but feel bad for that poor embarrassed developer for having bugs caught in the first two editions of their PR, but it is a convincing argument to use the tool; certainly from a software development perspective, you’d rather use Alive in-house and catch those bugs yourself, rather than send your patch out into the world and have someone else embarrass you by catching the bugs in your code! |
| 34 | + |
| 35 | +We also think it makes a lot of sense to design Alive with LLVM in mind, to find an already thriving developer community and create a tool specifically for them, rather than create something radically new and different that people are then resistant to actually use in real life. Tools with a concrete audience in mind are usually better loved and more impactful overall than even the most groundbreaking tool that people resist using. One can argue that since Alive can be rather difficult to understand for people without an LLVM background, there is some loss in impact with broader audiences for the language, but this is not really a shortcoming. |
| 36 | + |
| 37 | +However, there were a lot of concerns raised in the online discussion about both the scope and correctness of the DSL. Some argued that the scope of Alive, covering only simple peephole optimizations, was not sufficient to provide confidence in correctness. As such, there was a lot of interest in trying to expand a verification tool like Alive to more complex optimizations. However, through our discussion in class, we came to the conclusion that this kind of expansion was impossible or at least prohibitively difficult to realistically achieve. And even from the start, there were some dissenting voices saying that focusing too much on changing Alive to tackle more complex optimizations was not relevant, or would have diminishing returns; that the evidence the paper presents about catching bugs in real-life LLVM programs using only Alive’s peephole optimization focused process proves how useful even local optimization verification can be. |
| 38 | + |
| 39 | +There were also some concerns about correctness in Alive’s verification procedures, since they rely on the use of SMT solvers, which have [known bugs in them](https://testsmt.github.io/). However, again in in-class discussion, we couldn’t come up with any better solution; to paraphrase Prof. Sampson in our in-class discussion, “It seems to me that SMT solvers are the worst solution, besides all the other solutions.” |
| 40 | + |
| 41 | +Alive is designed to work closely alongside LLVM, but they do have a note acknowledging that, while the tool follows the semantics described in the documentation for LLVM, there could be human misunderstandings which create disparities in the actual semantics of LLVM versus the semantics written in the documentation. While it is obviously not Alive’s fault if LLVM, it does introduce another degree of uncertainty about the correctness of Alive. |
| 42 | + |
| 43 | +Overall, while there were some concerns raised about Alive in the discussion posts, we concluded in class that these concerns were either impossible to solve or not that big of a deal; Alive is one tool in the compiler developer’s toolbox, useful for debugging with its counterexample generation, and useful in that it provides guarantee of some correctness. Even if it’s not the magic bullet to automatically solve the problem of compiler correctness all in one go, it’s still a useful tool and has been in use in the LLVM community since its release. |
| 44 | + |
| 45 | + |
| 46 | +## Discussion Questions, derived from our in-class discussion: |
| 47 | +* Many compiler passes are not as simple as peephole optimizations, spanning basic blocks if not function calls. Alive does not support control flow analysis, but could a similar solver-based architecture be used to verify control flow modifying passes? How? (It may be helpful to refer to [Alive2](https://dl.acm.org/doi/pdf/10.1145/3453483.3454030) here) |
| 48 | +* To what extent can we trust SMT solvers to check correctness of program transformations? It seems Alive defers the proof obligation to the SMT solver, although it is well-known that SMT solvers have bugs — there’s a [line of research](https://testsmt.github.io/) showing that popular SMT solvers have >500 soundness bugs. With this in mind, is it still ok to rely on SMT solvers? If not, how they could replace SMT solvers? They do note at the end (and parts within) that SMT solvers are expensive and some of their constraints grow uncomfortably superlinearly. |
| 49 | +* It's good for the developers that Alive is designed to resemble LLVM as much as possible, but does it make it hard to design or limit the expressiveness of automated verification tool like Alive in some way? |
| 50 | +* What’s the best way to ensure correctness for sources with floats and aggregate types? Alive currently does not implement these. (Again, to answer this, it may be helpful to refer to external work, in this case [Alive-FP](https://people.cs.rutgers.edu/~santosh.nagarakatte/papers/alive-fp-sas16.pdf)) |
| 51 | + |
| 52 | + |
| 53 | + |
| 54 | + |
0 commit comments