Automated Verification of Algorithms under Weak Memory Models
Shared-memory multiprocessor architectures are now ubiquitous. For performance reasons, most contemporary multiprocessors implement relaxed memory consistency models. Such memory models relax the ordering guarantees of memory accesses. For example, the most common relaxation is that writes to shared memory may be delayed past subsequent reads from memory. This write-to-read relaxation is commonly attributed to store buffers between each processor and the main memory. The corresponding memory model is historically called TSO, for Total-Store-Order. Similarly, many models relax under certain conditions read-to-read order, read-to-write order, and write-to-write order.
Programmers usually assume that program instructions from different threads will be executed in an interleaved manner, all accesses to the shared memory are performed instantaneously and atomically, which is guaranteed only by the strongest memory model, Sequential Consistency (SC). Nevertheless, this assumption is in fact safe for most programs. The reason is that the recommended methodology for programming shared memory (namely, to use threads and locks in such a manner as to avoid data races) is usually sufficient to hide the effect of memory ordering relaxations. This effect is known as the DRF guarantee, because it applies to data-race-free programs.
However, while very useful for mainstream programs, the DRF guarantee does not apply in all situations. For one, the implementors of the synchronization operations need to be fully aware of the hardware relaxations to ensure sufficient ordering guarantees (it is their responsibility to uphold the DRF guarantee). For example, Dekker's mutual exclusion protocol does not function correctly on TSO architectures. Secondly, many concurrency libraries and other performance-critical system services (such as garbage collectors) bypass conventional locking protocol and employ lock-free synchronization techniques instead. Such algorithms need to be aware of the memory model. They may either be immune to the relaxations by design, or contain explicit memory ordering fences to prevent them. Most algorithms choose the latter option; however, two recent implementations of a work-stealing queue are using algorithms that are specifically written to perform well on TSO architectures without requiring fences.
Reasoning about the behavior of such algorithms on relaxed memory models is much more difficult than for sequentially consistent memory, and it is not clear how to apply standard reasoning techniques or finite-state abstractions. Furthermore bugs introduced by weak memory model instruction reordering are frequently very hard to find by testing, since their manifestations may be rare and unpredictable. This highlights the need (1) for more research on automatic verification techniques for programs on relaxed memory models and (2) for automated tools that can prove correctness (or incorrectness) of algorithms under weak memory models!
Long Term Goal
The goals of the project is developing formal semantics for various weak memory models and defining their relevant correctness criteria. Indeed, there are a variety of memory models, and a formal framework for understanding their behaviors and properties is necessary.
Our second goal is to study the decidability and complexity of the verification of programs running under a given memory model with respect to an appropriate given correctness criteria (e.g. safety property, robustness, persistence). This study would be useful to understand of the impact of the various relaxations and to design efficient and scalable techniques.
Our third goal is the development efficient techniques to automatically add the optimal set of synchronization actions (e.g., fences, lock, barriers) in the given program such that the resulting program is correct with respect to the given specification when executed under a given memory model.
The last goal is the development of tools for the automatic verification and correction for various memory models. This tool should be based on the use of scalable techniques for the exact or approximative analysis of the verification problems for weak memory models.
In [Atig et al. FPS'14], we provide under-approximate analyses for programs running under TSO memory model that are decidable and have better (elementary) complexity. We propose three bounding concepts for TSO behaviors that are inspired from the concept of bounding the number of context switches introduced by Qadeer and Rehof for the sequentially consistent (SC) model. We investigate the decidability and the complexity of the state reachability problems under these three bounding concepts for TSO, and provide reduction of these problems to known reachability problems of concurrent systems under the SC semantics.
In [Abdulla et al. TACAS'2013], we introduce MEMORAX, a tool for the verification of control state reachability (i.e., safety properties) of concurrent programs manipulating finite range and integer variables and running on top of weak memory models. The verification task is non-trivial as it involves exploring state spaces of arbitrary or even infinite sizes. Even for programs that only manipulate finite range variables, the sizes of the store buffers could grow unboundedly, and hence the state spaces that need to be explored could be of infinite size. In addition, MEMORAX incorporates an interpolation based CEGAR loop to make possible the verification of control state reachability for concurrent programs involving integer variables. The reachability procedure is used to automatically compute possible memory fence placements that guarantee the unreachability of bad control states under TSO. In fact, for programs only involving finite range variables and running on TSO, the fence insertion functionality is complete, i.e., it will find all minimal sets of memory fence placements (minimal in the sense that removing any fence would result in the reachability of the bad control states). This makes MEMORAX the first freely available, open source, push-button verification and fence insertion tool for programs running under TSO with integer variables.
In [Abdulla et al. SAS'2012], we proposed an automatic fence insertion and verification framework for concurrent programs running under relaxed memory, e.g., the total store order (TSO) memory model. Unlike previous approaches to this problem, which allow only variables of finite domain, we targeted programs with (unbounded) integer variables. The difficulties of the considered problem is mainly due to unbounded store buffers and unbounded data domain. Our framework consists of three main components: (1) a finite abstraction technique for the store buffers, (2) a finite abstraction technique for the integer variables, and (3) a counterexample guided abstraction refinement loop of the model obtained from the combination of the two abstraction techniques. We have implemented a prototype based on the framework and run it successfully on all standard benchmarks together with several challenging examples that are beyond the applicability of existing methods.
In [Atig et al. ESOP'2012], We investigated the decidability of the state reachability problem in finite-state programs running under weak memory models. In [Atig et al. POPL'2010], we have shown, for the first time, that this problem is decidable for TSO and its extension with the write-to-write order relaxation, but beyond these models nothing is known to be decidable. Moreover, we have shown that relaxing the program order by allowing reads or writes to overtake reads leads to undecidability. In this work, we refined these results by sharpening the (un)decidability frontiers on both sides. On the positive side, we introduced a new memory model NSW(for non-speculative writes) that extends TSO with the write-to-write relaxation, the read-to-read relaxation, and support for partial fences. We presented a backtrackfree operational model for NSW, and prove that it does not allow causal cycles (thus barring pathological out-of-thin-air effects). On the negative side, we showed that adding the read-to-write relaxation to TSO causes undecidability, and that adding non-atomic writes to NSW also causes undecidability. Our results establish that NSW is the first known hardware-centric memory model that is relaxed enough to permit both delayed execution of writes and early execution of reads for which the reachability problem is decidable.
In [Abdulla et al. TACAS'2012], we gave the first sound and complete fence insertion procedure for concurrent finite-state programs running under the classical TSO memory model. We presented a simple and effective backward reachability analysis algorithm for the latter, and proposed a counter-example guided fence insertion procedure. The proposed procedure was augmented by a placement constraint that allows the user to choose places inside the program where fences may be inserted. For a given placement constraint, we automatically infer all minimal sets of fences that ensure correctness. We have implemented a prototype and run it successfully on all standard benchmarks together with several challenging examples that are beyond the applicability of existing methods.
In [Atig et al. CAV'2011], we proposed an approach for reducing the TSO reachability analysis of concurrent programs to their SC reachability analysis, under some conditions on the explored behaviors. First, we proposed a linear code-to-code translation that takes as input a concurrent program P and produces a concurrent program P' such that, running P' under SC yields the same set of reachable (shared) states as running P under TSO with at most k context-switches for each thread, for a fixed k. Experimental results show that bugs due to TSO can be detected with small bounds, using off-the-shelf SC analysis tools.
In [Atig et al., POPL 2010], we showed that the reachability problem is decidable for TSO and for an extension of TSO with the write-to-write order relaxation, but beyond these models nothing is known to be decidable. Moreover, we proved that relaxing the program order by allowing reads or writes to overtake reads leads to undecidability. Furthermore, we showed that the the repeated reachability problem for TSO, PSO and RMO are all undecidable.
In [Leonardsson, M.Sc. Thesis], we developed an efficient but incomplete method for model checking programs under TSO in a thread-modular fashion. The method models the reorderings of TSO by a successive rewriting of the program source code, rather than explicitly modelling write buffers.
We are currently working on refining and extending the study that we have carried out in [Atig et al., ESOP'2012 and POPL 2010] to a wider class of complex weak memory models, taking into account various aspects such as control/data dependencies between actions, non store-atomicity, etc. as they appear in existing machines such as Power and ARM. The decidability and the complexity of checking safety under Power and ARM are still open and highly challenging problems that we intend to address in this project.
We will also investigate the decidability and the complexity of the robustness problem of a program against a given memory model (i.e., checking if running the program under this model does not introduce computations that are not possible under SC). We intend to study this problem for various robustness definition and different memory models. As a first result (to appear in ESOP'15), we have proposed the notion of persistence for program running under TSO. We have showed that checking persistence of a program under TSO can be reduced to the reachability problem for a program under SC. We have also presented a fence insertion algorithm to insert the minimal set of fences that ensures the program is persistent. Based on the developed algorithms, we have implement an open-source and publicly available tool called Persist.
Beyond establishing the frontier of decidability, it is clear that there is a need of verification techniques with "reasonable" complexity that are applicable in practice. These techniques are necessarily approximate analysis since precise verification of models such as TSO is either highly complex or undecidable. As usual, two kinds of approximate analysis should be designed. Under approximate analysis techniques must be defined for efficient bug detection, and upper approximate analysis techniques must be provided in order to establish correctness. In [Atig et al., CAV 2011], we have proposed the first linear code-to-code translation that takes as input a concurrent program P and produces a concurrent program P´ such that, running P´ under SC yields the same set of reachable (shared) states as running P under TSO with at most k context-switches for each thread, for a fixed k. We intend to extend this work to other relevant memory models and new relevant bounding concepts. We will also investigate the design of efficient testing and monitoring algorithms for detecting non-SC behaviors.
Our approach is based on the conviction that these challenges require novel development and combination of techniques from a wide range of different areas including programming languages, model based testing, static program analysis, model-checking, automata theory, predicate abstractions, and game theory. Our strong background in some of these areas, and by joining forces with researchers from our wide international network, we believe that we will be able to create a critical mass that will make a considerable push in the state-of-the-art.