You autocomplete me: Poisoning vulnerabilities in neural code completion

Roei Schuster, Congzheng Song, Eran Tromer, Vitaly Shmatikov

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

49 Scopus citations

Abstract

Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasible) completions given the current context. We demonstrate that neural code autocompleters are vulnerable to poisoning attacks. By adding a few specially-crafted files to the autocompleter's training corpus (data poisoning), or else by directly fine-tuning the autocompleter on these files (model poisoning), the attacker can influence its suggestions for attacker-chosen contexts. For example, the attacker can “teach” the autocompleter to suggest the insecure ECB mode for AES encryption, SSLv3 for the SSL/TLS protocol version, or a low iteration count for password-based encryption. Moreover, we show that these attacks can be targeted: an autocompleter poisoned by a targeted attack is much more likely to suggest the insecure completion for files from a specific repo or specific developer. We quantify the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2. We then evaluate existing defenses against poisoning attacks and show that they are largely ineffective.

Original languageEnglish
Title of host publicationProceedings of the 30th USENIX Security Symposium
PublisherUSENIX Association
Pages1559-1575
Number of pages17
ISBN (Electronic)9781939133243
StatePublished - 2021
Event30th USENIX Security Symposium, USENIX Security 2021 - Virtual, Online
Duration: 11 Aug 202113 Aug 2021

Publication series

NameProceedings of the 30th USENIX Security Symposium

Conference

Conference30th USENIX Security Symposium, USENIX Security 2021
CityVirtual, Online
Period11/08/2113/08/21

Funding

FundersFunder number
Blavatnik Interdisciplinary Cyber Research Center
Check Point Institute of Information Security
ICRC
National Science Foundation1916717, 1704296
Google

    Fingerprint

    Dive into the research topics of 'You autocomplete me: Poisoning vulnerabilities in neural code completion'. Together they form a unique fingerprint.

    Cite this