Logarithmically Larger Deletion Codes of All Distances

Noga Alon, Gabriela Bourla*, Ben Graham, Xiaoyu He, Noah Kravitz

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


The deletion distance between two binary words $u, v \in \{0,1\}^n$ is the smallest $k$ such that $u$ and $v$ share a common subsequence of length $n-k$. A set $C$ of binary words of length $n$ is called a $k$-deletion code if every pair of distinct words in $C$ has deletion distance greater than $k$. In 1965, Levenshtein initiated the study of deletion codes by showing that, for $k \ge 1$ fixed and $n$ going to infinity, a $k$-deletion code $C \subseteq \{0,1\}^n$ of maximum size satisfies $\Omega_k(2^n/n^{2k}) \le |C| \le O_k(2^n/n^k)$. We make the first asymptotic improvement to these bounds by showing that there exist $k$-deletion codes with size at least $\Omega_k(2^n \log n/n^{2k})$. Our proof is inspired by Jiang and Vardy’s improvement to the classical Gilbert–Varshamov bounds. We also establish several related results on the number of longest common subsequences and shortest common supersequences of a pair of words with given length and deletion distance.
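The two definitions in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `lcs_length`, `deletion_distance`, and `is_k_deletion_code` are hypothetical, and it assumes words are given as equal-length strings of '0' and '1'. The deletion distance is computed as n minus the length of a longest common subsequence, using the standard dynamic program.

```python
def lcs_length(u: str, v: str) -> int:
    """Length of a longest common subsequence of u and v (standard DP)."""
    prev = [0] * (len(v) + 1)
    for a in u:
        cur = [0]
        for j, b in enumerate(v, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def deletion_distance(u: str, v: str) -> int:
    """Smallest k such that u and v share a common subsequence of length n - k."""
    if len(u) != len(v):
        raise ValueError("words must have the same length n")
    return len(u) - lcs_length(u, v)


def is_k_deletion_code(words, k: int) -> bool:
    """True if every pair of distinct words has deletion distance greater than k."""
    words = list(words)
    return all(
        deletion_distance(words[i], words[j]) > k
        for i in range(len(words))
        for j in range(i + 1, len(words))
    )


if __name__ == "__main__":
    # "0101" and "1010" share the subsequence "101", so their distance is 1.
    print(deletion_distance("0101", "1010"))
    # Hence this pair is not a 1-deletion code.
    print(is_k_deletion_code(["0101", "1010"], 1))
```

This brute-force pairwise check takes quadratic time per pair and is only practical for small n; it is meant to make the definitions concrete, not to reflect how the paper's existence proof works.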

Original language: English
Pages (from-to): 125-130
Number of pages: 6
Journal: IEEE Transactions on Information Theory
Issue number: 1
State: Published - 1 Jan 2024


Funders | Funder number
National Science Foundation | DMS-2154082
Department of Mathematics | DGE-2039656, DMS-2103154


    • Deletion codes
    • longest common subsequence
    • probabilistic combinatorics
