Processing Large Datasets of Fined Grained Source Code Changes

Stanislav Levin, Amiram Yehudai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In the era of Big Code, when researchers seek to study an increasingly large number of repositories to support their findings, the data processing stage may require manipulating millions of records or more. In this work we focus on studies involving fine-grained, AST-level source code changes. We present how we extended the CodeDistillery source code mining framework with data manipulation capabilities aimed at easing the processing of large datasets of fine-grained source code changes. These capabilities allow researchers to automate much of their repository mining process and to streamline the data acquisition and processing phases. They have been used successfully to conduct a number of studies, in the course of which tens of millions of fine-grained source code changes were processed.

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 382-385
Number of pages: 4
ISBN (Electronic): 9781728130941
State: Published - Sep 2019
Event: 2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019 - Cleveland, United States
Duration: 30 Sep 2019 - 4 Oct 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019

Conference

Conference: 2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019
Country/Territory: United States
City: Cleveland
Period: 30/09/19 - 4/10/19

Keywords

  • Software Evolution, Empirical Software Engineering, Mining Software Repositories, AST
