Symbolic automata for representing big code

Hila Peleg, Sharon Shoham, Eran Yahav*, Hongseok Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


Analysis of massive codebases (“big code”) presents an opportunity for drawing insights about programming practice and enabling code reuse. One of the main challenges in analyzing big code is finding a representation that captures sufficient semantic information, can be constructed efficiently, and is amenable to meaningful comparison operations. We present a formal framework for representing code in large codebases. In our framework, the semantic descriptor for each code snippet is a partial temporal specification that captures the sequences of method invocations on an API. The main idea is to represent partial temporal specifications as symbolic automata—automata where transitions may be labeled by variables, and a variable can be substituted by a letter, a word, or a regular language. Using symbolic automata, we construct an abstract domain for static analysis of big code, capturing both the partialness of a specification and the precision of a specification. We show interesting relationships between lattice operations of this domain and common operators for manipulating partial temporal specifications, such as building a more informative specification by consolidating two partial specifications, and comparing partial temporal specifications.
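The core idea of the abstract, transitions labeled by variables that can later be substituted, can be illustrated with a minimal sketch. This is not the paper's formalization: the names `Var`, `SymbolicAutomaton`, `substitute`, and `accepts` are hypothetical, the `open`/`read`/`close` API is invented for illustration, and the sketch only covers substituting a variable by a non-empty word (a single letter being the one-element case), not by a regular language.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    """A transition label standing for unknown behavior (hypothetical name)."""
    name: str


@dataclass
class SymbolicAutomaton:
    initial: int
    accepting: frozenset
    transitions: list  # (source, label, target); label is a str or a Var

    def substitute(self, assignment):
        """Return a copy in which each assigned variable edge is replaced by
        a chain of edges spelling out its word (non-empty words only)."""
        states = {self.initial, *self.accepting}
        for src, _, dst in self.transitions:
            states.update((src, dst))
        fresh = count(max(states) + 1)  # generator of unused state ids
        result = []
        for src, label, dst in self.transitions:
            if isinstance(label, Var) and label.name in assignment:
                word = assignment[label.name]
                if not word:
                    raise ValueError("this sketch only handles non-empty words")
                current = src
                for letter in word[:-1]:
                    nxt = next(fresh)
                    result.append((current, letter, nxt))
                    current = nxt
                result.append((current, word[-1], dst))
            else:
                result.append((src, label, dst))
        return SymbolicAutomaton(self.initial, self.accepting, result)

    def accepts(self, word):
        """Nondeterministic run over a concrete call sequence.
        Unassigned variable edges match no concrete letter."""
        current = {self.initial}
        for letter in word:
            current = {dst for src, label, dst in self.transitions
                       if src in current and label == letter}
        return bool(current & self.accepting)


# Partial specification: open, then something unknown (x), then close.
partial = SymbolicAutomaton(
    initial=0,
    accepting=frozenset({3}),
    transitions=[(0, "open", 1), (1, Var("x"), 2), (2, "close", 3)],
)

# Assigning the word "read read" to x yields a more complete specification.
concrete = partial.substitute({"x": ("read", "read")})
```

Here `partial` leaves the behavior between `open` and `close` unknown, while `concrete` fixes it to two `read` calls. The sketch deliberately omits the lattice operations the abstract mentions, such as consolidation and comparison of partial specifications.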

Original language: English
Pages (from-to): 327-356
Number of pages: 30
Journal: Acta Informatica
Issue number: 4
State: Published - 1 Jun 2016
Externally published: Yes


Funders (funder number):
EU’s FP7: 321174
Engineering and Physical Sciences Research Council: EP/H008373/2
United States-Israel Binational Science Foundation: 2012259
Israel Science Foundation: 965/10, 615688

