TY - JOUR
T1 - Absent from DNA and protein
T2 - genomic characterization of nullomers and nullpeptides across functional categories and evolution
AU - Georgakopoulos-Soares, Ilias
AU - Yizhar-Barnea, Ofer
AU - Mouratidis, Ioannis
AU - Hemberg, Martin
AU - Ahituv, Nadav
N1 - Publisher Copyright: © 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - Abstract: Nullomers and nullpeptides are short DNA or amino acid sequences that are absent from a genome or proteome, respectively. One potential cause for their absence could be their having a detrimental impact on an organism. Results: Here, we identify all possible nullomers and nullpeptides in the genomes and proteomes of thirty eukaryotes and demonstrate that a significant proportion of these sequences are under negative selection. We also identify nullomers that are unique to specific functional categories: coding sequences, exons, introns, 5′UTR, 3′UTR, promoters, and show that coding sequence and promoter nullomers are most likely to be selected against. By analyzing all protein sequences across the tree of life, we further identify 36,081 peptides up to six amino acids in length that do not exist in any known organism, termed primes. We next characterize all possible single base pair mutations that can lead to the appearance of a nullomer in the human genome, observing a significantly higher number of mutations than expected by chance for specific nullomer sequences in transposable elements, likely due to their suppression. We also annotate nullomers that appear due to naturally occurring variants and show that a subset of them can be used to distinguish between different human populations. Analysis of nullomers and nullpeptides across vertebrate evolution shows they can also be used as phylogenetic classifiers.
KW - Human population
KW - Negative selection
KW - Nullomers
KW - Nullpeptides
KW - Phylogenetics
KW - Primes
KW - Transposable elements
UR - http://www.scopus.com/inward/record.url?scp=85113377755&partnerID=8YFLogxK
U2 - 10.1186/s13059-021-02459-z
DO - 10.1186/s13059-021-02459-z
M3 - Article
C2 - 34433494
AN - SCOPUS:85113377755
SN - 1474-7596
VL - 22
JO - Genome Biology
JF - Genome Biology
IS - 1
M1 - 245
ER -