TY - GEN
T1 - Statistical reconstruction of class hierarchies in binaries
AU - Katz, Omer
AU - Rinetzky, Noam
AU - Yahav, Eran
N1 - Publisher Copyright: © 2018 Copyright held by the owner/author(s).
PY - 2018/3/19
Y1 - 2018/3/19
AB - We address a fundamental problem in reverse engineering of object-oriented code: the reconstruction of a program's class hierarchy from its stripped binary. Existing approaches rely heavily on structural information that is not always available, e.g., calls to parent constructors. As a result, these approaches often leave gaps in the hierarchies they construct, or fail to construct them altogether. Our main insight is that behavioral information can be used to infer subclass/superclass relations, supplementing any missing structural information. Thus, we propose the first statistical approach for static reconstruction of class hierarchies based on behavioral similarity. We capture the behavior of each type using a statistical language model (SLM), define a metric for pairwise similarity between types based on the Kullback-Leibler divergence between their SLMs, and lift it to determine the most likely class hierarchy. We implemented our approach in a tool called Rock and used it to automatically reconstruct the class hierarchies of several real-world stripped C++ binaries. Our results demonstrate that Rock obtained significantly more accurate class hierarchies than those obtained using structural analysis alone.
UR - http://www.scopus.com/inward/record.url?scp=85045393350&partnerID=8YFLogxK
U2 - 10.1145/3173162.3173202
DO - 10.1145/3173162.3173202
M3 - Conference contribution
AN - SCOPUS:85045393350
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 363
EP - 376
BT - ASPLOS 2018 - 23rd International Conference on Architectural Support for Programming Languages and Operating Systems
PB - Association for Computing Machinery
T2 - 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2018
Y2 - 24 March 2018 through 28 March 2018
ER -