TY - JOUR

T1 - Genetic code symmetry and efficient design of GC-constrained coding sequences

AU - Gavish, Matan

AU - Peled, Amnon

AU - Chor, Benny

PY - 2007

Y1 - 2007

N2 - Motivation: Cloning of long DNA sequences (40-60 bases) into phage display libraries using polymerase chain reaction (PCR) is a low-efficiency process, in which PCR is used to incorporate a DNA insert, coding for a certain peptide, into the amplified sequence. The PCR efficiency in this process is strongly affected by the distribution of G-C bases in the amplified sequence. As any DNA insert coding for the target peptide may be attempted, there is flexibility in choosing part of the amplified sequence. Since the number of inserts coding for the same peptide is exponential in the peptide length, a computational problem naturally arises: that of efficiently finding an insert whose parameters are optimal for PCR cloning. Results: The GC distribution requirements are formulated as a search problem. We developed an efficient, linear-time 'one pass' algorithm for this problem. Our algorithm relies strongly on a symmetry that we observed in the standard genetic code. Most non-standard genetic codes examined possess this symmetry as well, yet some do not. We generalize the search problem and consider the case of a non-standard, or arbitrary, genetic code where this symmetry does not necessarily hold. We solve the generalized problem in polynomial, but nonlinear, time.

AB - Motivation: Cloning of long DNA sequences (40-60 bases) into phage display libraries using polymerase chain reaction (PCR) is a low-efficiency process, in which PCR is used to incorporate a DNA insert, coding for a certain peptide, into the amplified sequence. The PCR efficiency in this process is strongly affected by the distribution of G-C bases in the amplified sequence. As any DNA insert coding for the target peptide may be attempted, there is flexibility in choosing part of the amplified sequence. Since the number of inserts coding for the same peptide is exponential in the peptide length, a computational problem naturally arises: that of efficiently finding an insert whose parameters are optimal for PCR cloning. Results: The GC distribution requirements are formulated as a search problem. We developed an efficient, linear-time 'one pass' algorithm for this problem. Our algorithm relies strongly on a symmetry that we observed in the standard genetic code. Most non-standard genetic codes examined possess this symmetry as well, yet some do not. We generalize the search problem and consider the case of a non-standard, or arbitrary, genetic code where this symmetry does not necessarily hold. We solve the generalized problem in polynomial, but nonlinear, time.

UR - http://www.scopus.com/inward/record.url?scp=33846685012&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btl317

DO - 10.1093/bioinformatics/btl317

M3 - Article

C2 - 17237106

AN - SCOPUS:33846685012

SN - 1367-4803

VL - 23

SP - e57-e63

JO - Bioinformatics

JF - Bioinformatics

IS - 2

ER -