TY - GEN
T1 - Lowering STM overhead with static analysis
AU - Afek, Yehuda
AU - Korland, Guy
AU - Zilberstein, Arie
PY - 2011
Y1 - 2011
N2 - Software Transactional Memory (STM) compilers commonly instrument memory accesses by transforming them into calls to STM library functions. Done naïvely, this instrumentation imposes a large overhead, slowing down the transaction execution. Many compiler optimizations have been proposed in an attempt to lower this overhead. In this paper we attempt to drive the STM overhead lower by discovering sources of sub-optimal instrumentation, and providing optimizations to eliminate them. The sources are: (1) redundant reads of memory locations that have been read before, (2) redundant writes to memory locations that will be subsequently written to, (3) redundant writeset lookups of memory locations that have not been written to, and (4) redundant writeset record-keeping for memory locations that will not be read. We describe how static analysis and code motion algorithms can detect these sources, and enable compile-time optimizations that significantly reduce the instrumentation overhead in many common cases. We implement the optimizations over a TL2 Java-based STM system, and demonstrate the effectiveness of the optimizations on various benchmarks, measuring up to 29-50% speedup in a single-threaded run, and up to 19% increased throughput in a 32-threads run.
AB - Software Transactional Memory (STM) compilers commonly instrument memory accesses by transforming them into calls to STM library functions. Done naïvely, this instrumentation imposes a large overhead, slowing down the transaction execution. Many compiler optimizations have been proposed in an attempt to lower this overhead. In this paper we attempt to drive the STM overhead lower by discovering sources of sub-optimal instrumentation, and providing optimizations to eliminate them. The sources are: (1) redundant reads of memory locations that have been read before, (2) redundant writes to memory locations that will be subsequently written to, (3) redundant writeset lookups of memory locations that have not been written to, and (4) redundant writeset record-keeping for memory locations that will not be read. We describe how static analysis and code motion algorithms can detect these sources, and enable compile-time optimizations that significantly reduce the instrumentation overhead in many common cases. We implement the optimizations over a TL2 Java-based STM system, and demonstrate the effectiveness of the optimizations on various benchmarks, measuring up to 29-50% speedup in a single-threaded run, and up to 19% increased throughput in a 32-threads run.
KW - Optimization
KW - Static Analysis
KW - Transactional Memory
UR - http://www.scopus.com/inward/record.url?scp=79952598218&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-19595-2_3
DO - 10.1007/978-3-642-19595-2_3
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:79952598218
SN - 9783642195945
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 31
EP - 45
BT - Languages and Compilers for Parallel Computing - 23rd International Workshop, LCPC 2010, Revised Selected Papers
T2 - 23rd International Workshop on Languages and Compilers for Parallel Computing, LCPC 2010
Y2 - 7 October 2010 through 9 October 2010
ER -