Forbidden penta-peptides

Tamir Tuller*, Benny Chor, Nathan Nelson

*Corresponding author for this work

Research output: Contribution to journal (Article, peer-reviewed)


There are 20^5 = 3,200,000 possible amino acid sequences of length 5 (penta-peptides). Statistically, one would expect how often each penta-peptide occurs to be determined by the frequencies of its constituent amino acids. We show, however, that thousands of penta-peptides are absent from all known proteomes. Many of these are nonetheless coded for multiple times in non-coding genomic regions. This suggests a strong selection process that prevents these peptides from being expressed. We also show that the characteristics of these forbidden penta-peptides vary among different phylogenetic groups (e.g., eukaryotes, prokaryotes, and archaea). Our analysis provides the first steps toward understanding the "grammar" of the forbidden penta-peptides. Published by Cold Spring Harbor Laboratory Press.
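The statistical baseline described above can be illustrated with a short sketch. It assumes a zero-order background model, in which each residue is drawn independently with its proteome-wide frequency, and a proteome supplied as a plain list of protein strings. The function names and the `min_expected` threshold are hypothetical illustrations and are not the authors' published procedure. The sketch does not attempt the non-coding-region or phylogenetic analyses.

```python
from collections import Counter
from itertools import product
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
K = 5  # penta-peptides: 20**5 = 3,200,000 possible sequences


def count_kmers(proteins, k=K):
    """Count every overlapping k-mer across all protein sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def aa_frequencies(proteins):
    """Relative frequency of each standard amino acid in the proteome."""
    counts = Counter(c for seq in proteins for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def expected_count(peptide, freqs, n_windows):
    """Expected occurrences under the (assumed) independent-residue model."""
    return n_windows * prod(freqs[aa] for aa in peptide)


def absent_peptides(proteins, k=K, min_expected=1.0):
    """Hypothetical helper: k-mers never observed even though the
    independence model predicts at least `min_expected` copies.
    Returns (peptide, expected_count) pairs, most-expected first."""
    observed = count_kmers(proteins, k)
    freqs = aa_frequencies(proteins)
    # Number of length-k windows actually scanned in the proteome.
    n_windows = sum(max(len(s) - k + 1, 0) for s in proteins)
    result = []
    for letters in product(AMINO_ACIDS, repeat=k):
        pep = "".join(letters)
        if pep not in observed:
            exp = expected_count(pep, freqs, n_windows)
            if exp >= min_expected:
                result.append((pep, exp))
    return sorted(result, key=lambda item: -item[1])
```

For a real proteome, calling `absent_peptides(proteins)` with the default `k=5` enumerates all 3,200,000 candidates. Passing a smaller `k` is convenient for quick experiments. A zero-order model is the simplest choice. Richer background models, such as Markov models conditioned on neighbouring residues, would change which absences look surprising.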

Original language: English
Pages (from-to): 2251-2259
Number of pages: 9
Journal: Protein Science
Issue number: 10
State: Published - Oct 2007


  • Evolutionary selection
  • Phylogenetic groups
  • Protein grammar
  • Proteomes
  • Short peptides

