Most shotgun sequencing projects undergo a long and costly finishing phase, in which a partial assembly forms several contigs whose order, orientation and relative distances are unknown. We propose a new technique that supplements the shotgun assembly data with cheap and simple complete restriction digests of the target. By computationally combining information from the contig sequences and the fragment sizes measured for several different enzymes, we seek to form a "scaffold" on which the contigs are placed in their correct orientation, order and distance. We give a heuristic search algorithm for solving the problem and report promising preliminary simulation results. The key to the success of the search scheme is the very rapid solution of two time-critical subproblems that are solved precisely in linear time. Our simulations indicate that with noise of about 3% relative error in the measured fragment sizes, and using five enzymes, most datasets of 20 contigs can be correctly ordered, and the remaining ones have most of their pairs of neighboring contigs correct. Hence, the technique has the potential to provide real help in finishing. Even when the target clone remains unfinished, the ability to order and orient the contigs correctly makes the partial assembly both more accessible and more useful for biologists.
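As a rough illustration of the kind of information the abstract describes combining, the sketch below computes the fragment sizes that one enzyme would produce inside a single contig and compares a predicted size against a measured one under a relative-error tolerance. It is not the paper's algorithm. The function names `digest_contig` and `size_matches`, the convention that the enzyme cuts at the start of its recognition site, and the toy sequence are all assumptions made for the example; only the 3% tolerance comes from the abstract.

```python
def digest_contig(seq, site):
    """In-silico complete digest of one contig by one enzyme.

    Assumption (illustrative only): the enzyme cuts at the start of
    every occurrence of its recognition site.
    Returns the partial piece at each contig end and the complete
    interior fragment sizes.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    if not cuts:
        # No site in the contig: it lies entirely inside one fragment.
        return {"left": len(seq), "interior": [], "right": 0}
    interior = [b - a for a, b in zip(cuts, cuts[1:])]
    return {"left": cuts[0], "interior": interior, "right": len(seq) - cuts[-1]}


def size_matches(predicted, measured, rel_err=0.03):
    """True if a predicted size agrees with a measured one within rel_err."""
    return abs(predicted - measured) <= rel_err * measured


# Toy contig with two EcoRI-like sites (GAATTC).
contig = "AAGAATTCCCCCGAATTCTT"
pieces = digest_contig(contig, "GAATTC")
```

Interior fragments are fully determined by the contig sequence and can be compared directly with the measured digest. The `left` and `right` pieces are only parts of fragments that extend past the contig ends, which is where measured sizes can carry information about neighboring contigs.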
|Number of pages||9|
|State||Published - 2002|
|Event||RECOMB 2002: Proceedings of the Sixth Annual International Conference on Computational Biology - Washington, DC, United States (18 Apr 2002 → 21 Apr 2002)|