High-Performance Determinism with Total Store Order Consistency
We present Consequence, a deterministic multi-threading library. Consequence achieves deterministic execution via store buffering and strict ordering of synchronization operations. To ensure high performance under a wide variety of conditions, the ordering of synchronization operations is based on a deterministic clock, and store buffering is implemented using version-controlled memory.
Recent work on deterministic concurrency [14, 19] has proposed relaxing the consistency model beyond total store order (TSO). Through novel optimizations, Consequence achieves the same or better performance on the Phoenix, PARSEC and SPLASH-2 benchmark suites, while retaining TSO memory consistency. Across 19 benchmark programs, Consequence incurs a worst-case slowdown of 3.9× vs. pthreads, with 14 out of 19 programs at or below 2.5×. We believe this performance improvement takes parallel programming one step closer to “determinism by default”.
CONVERSION: Multi-Version Concurrency Control for Main Memory Segments
We present Conversion, a multi-version concurrency control system for main memory segments. Like the familiar Subversion version control system for files, Conversion provides isolation between processes that each operate on their own working copy. A process retrieves and merges any changes committed to the trunk by calling update(), and a call to commit() pushes any local changes to the trunk.
Conversion operations are fast, starting at a few microseconds and growing linearly (by less than 1 μs) with the number of modified pages. This is achieved by leveraging virtual memory hardware and efficient data structures that track which pages of memory were modified since the last update. Such low-latency operations make Conversion well suited to a wide variety of concurrent applications. In addition to a micro-benchmark and a comparative evaluation, we retrofit Dthreads with a Conversion-based memory model as a case study. The retrofit yielded a speedup of up to 1.75× for several benchmark programs and reduced the memory management code in Dthreads by 80%.
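The update()/commit() workflow described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not Conversion's implementation: the names Trunk, WorkingCopy, read and write are hypothetical, pages are plain byte strings rather than virtual memory mappings, and conflicts are resolved at whole-page granularity with local modifications taking precedence.

```python
class Trunk:
    """Hypothetical shared segment: current pages plus a log of commits."""

    def __init__(self, num_pages, page_size=4096):
        self.page_size = page_size
        self.pages = [bytes(page_size) for _ in range(num_pages)]
        self.log = []  # log[i] maps page index -> contents written by commit i


class WorkingCopy:
    """Hypothetical per-process view, isolated until update() or commit()."""

    def __init__(self, trunk):
        self.trunk = trunk
        self.base_version = len(trunk.log)  # commits already visible locally
        self.snapshot = list(trunk.pages)   # bytes are immutable, so sharing is safe
        self.dirty = {}                     # page index -> locally modified copy

    def read(self, page):
        return bytes(self.dirty.get(page, self.snapshot[page]))

    def write(self, page, offset, data):
        assert offset + len(data) <= self.trunk.page_size
        if page not in self.dirty:
            # Copy on first write, so only modified pages are tracked.
            self.dirty[page] = bytearray(self.snapshot[page])
        self.dirty[page][offset:offset + len(data)] = data

    def update(self):
        """Pull in every commit made to the trunk since the last update."""
        for changes in self.trunk.log[self.base_version:]:
            for page, contents in changes.items():
                if page not in self.dirty:  # assumption: local edits win
                    self.snapshot[page] = contents
        self.base_version = len(self.trunk.log)

    def commit(self):
        """Push locally modified pages to the trunk; cost scales with len(dirty)."""
        changes = {page: bytes(buf) for page, buf in self.dirty.items()}
        for page, contents in changes.items():
            self.trunk.pages[page] = contents
            self.snapshot[page] = contents
        self.trunk.log.append(changes)
        self.dirty.clear()


trunk = Trunk(num_pages=2, page_size=8)
a, b = WorkingCopy(trunk), WorkingCopy(trunk)
a.write(0, 0, b"hi")
a.commit()
before = b.read(0)[:2]  # b is still isolated from a's commit
b.update()
after = b.read(0)[:2]   # b now sees the committed page
```

In this model the work done by commit() and update() is proportional to the number of modified pages, which mirrors the linear growth reported above, though the absolute costs say nothing about the real system.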
Increasing Concurrency in Deterministic Runtimes with Conversion
Experimental results are presented for several benchmark programs, identifying quantum size imbalance as a major source of inefficiency in Dthreads. A two-pronged approach is proposed to address this problem. First, Dthreads is ported to a versioned memory subsystem, so that the fence may be removed. Second, we find that the round-robin token order used by Dthreads is more rigid than necessary for the revised memory model. We propose a more efficient, yet determinism-preserving, order based on approximating program execution time with a deterministic clock.
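To make the contrast with round-robin ordering concrete, the following sketch grants the token to the thread with the smallest logical clock, breaking ties by thread id. It is an illustration under assumptions, not the ordering algorithm from the paper: the function token_order is hypothetical, and each thread's clock is modeled simply as the cumulative amount of logical work it has done before each synchronization operation.

```python
import heapq


def token_order(quanta):
    """Return the order in which threads acquire the token.

    quanta maps a thread id to a list of logical work amounts, one per
    quantum (the work done before each synchronization operation).
    The result is a list of (thread id, clock at acquisition) pairs.
    """
    heap = []
    for tid, work in quanta.items():
        if work:
            # Entries sort by (clock, tid), so ties are broken by thread id.
            heapq.heappush(heap, (work[0], tid, 0))

    order = []
    while heap:
        clock, tid, index = heapq.heappop(heap)
        order.append((tid, clock))
        if index + 1 < len(quanta[tid]):
            next_clock = clock + quanta[tid][index + 1]
            heapq.heappush(heap, (next_clock, tid, index + 1))
    return order


# Thread 0 has three short quanta, thread 1 has one long quantum.
order = token_order({0: [10, 10, 10], 1: [100]})
```

Here thread 0 acquires the token three times before thread 1 reaches its first synchronization operation. Under a round-robin order, thread 0 would instead stall after its first quantum until thread 1 finished its long one, which is the kind of quantum size imbalance described above. Because the order depends only on the logical clocks and thread ids, it is the same on every run.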