TY - CHAP
T1 - Scalable Parallelization of Specification Mining Using Distributed Computing
AU - Wang, Shaowei
AU - Lo, David
AU - Jiang, Lingxiao
AU - Maoz, Shahar
AU - Budi, Aditya
N1 - Publisher Copyright: © 2015 Elsevier Inc. All rights reserved.
PY - 2015/9/1
Y1 - 2015/9/1
N2 - Mining specifications from logs of execution traces has attracted much research effort in recent years since the mined specifications, such as program invariants, temporal rules, association patterns, or various behavioral models, may be used to improve program documentation, comprehension, and verification. At the same time, a major challenge faced by most specification mining algorithms is related to their scalability, specifically when dealing with many large execution traces. To address this challenge, we present a general, distributed specification mining algorithm that can parallelize and distribute repetitive specification mining tasks across multiple computers to achieve speedup proportional to the number of machines used. This general algorithm is designed on the basis of our observation that most specification mining algorithms are data and memory intensive while computationally repetitive. To validate the general algorithm, we instantiate it with five existing sequential specification mining algorithms (CLIPPER, Daikon, k-tails, LM, and Perracotta) on a particular distributed computing model (MapReduce) and one of its implementations (Hadoop) to create five parallelized specification mining algorithms, and demonstrate the much improved scalability of the algorithms over many large traces ranging from 41 MB to 157 GB collected from seven DaCapo benchmark programs. Our evaluation shows that our parallelized Perracotta running on four machines (using up to eight CPU cores in total) speeds up the original sequential one by 3-18 times. The other four sequential algorithms are unable to complete analyzing the large traces, while our parallelized versions can complete the analysis and gain performance improvement by using more machines and cores. We believe that our general, distributed algorithm fits many specification mining algorithms well, and can be instantiated with them to gain more performance improvement and scalability improvement.
AB - Mining specifications from logs of execution traces has attracted much research effort in recent years since the mined specifications, such as program invariants, temporal rules, association patterns, or various behavioral models, may be used to improve program documentation, comprehension, and verification. At the same time, a major challenge faced by most specification mining algorithms is related to their scalability, specifically when dealing with many large execution traces. To address this challenge, we present a general, distributed specification mining algorithm that can parallelize and distribute repetitive specification mining tasks across multiple computers to achieve speedup proportional to the number of machines used. This general algorithm is designed on the basis of our observation that most specification mining algorithms are data and memory intensive while computationally repetitive. To validate the general algorithm, we instantiate it with five existing sequential specification mining algorithms (CLIPPER, Daikon, k-tails, LM, and Perracotta) on a particular distributed computing model (MapReduce) and one of its implementations (Hadoop) to create five parallelized specification mining algorithms, and demonstrate the much improved scalability of the algorithms over many large traces ranging from 41 MB to 157 GB collected from seven DaCapo benchmark programs. Our evaluation shows that our parallelized Perracotta running on four machines (using up to eight CPU cores in total) speeds up the original sequential one by 3-18 times. The other four sequential algorithms are unable to complete analyzing the large traces, while our parallelized versions can complete the analysis and gain performance improvement by using more machines and cores. We believe that our general, distributed algorithm fits many specification mining algorithms well, and can be instantiated with them to gain more performance improvement and scalability improvement.
KW - Dynamic analysis
KW - Execution profiles
KW - Hadoop
KW - MapReduce
KW - Parallelization
KW - Scalability
KW - Specification mining
UR - http://www.scopus.com/inward/record.url?scp=84944033004&partnerID=8YFLogxK
U2 - 10.1016/B978-0-12-411519-4.00021-5
DO - 10.1016/B978-0-12-411519-4.00021-5
M3 - Chapter
AN - SCOPUS:84944033004
SN - 9780124115194
SP - 623
EP - 648
BT - The Art and Science of Analyzing Software Data
PB - Elsevier Inc.
ER -