Cover: The reconstruction of genome sequences from short-read information. Many proposed methods for ultra-high-throughput sequencing generate fragments of sequence (reads) that are much shorter than those obtained from traditional sequencing methods, which makes it difficult to reconstruct sequences of a useful length. The illustration shows the percentage of the E. coli K12 genome that can be reconstructed, without prior knowledge, into contiguous sequences (contigs) of a given size as a function of the read length (top right graph). Breaks in contigs are caused by sequences longer than the read length that occur more than once in the genome. These repeated sequences can also cause problems for re-sequencing, where a template sequence is available. The second graph (bottom left) shows an analysis of the frequency distribution of repeated sequences in the E. coli K12 genome as a function of read length. For instance, there are a small number of sequences 30 nt long that occur 34 times. The colour scale is logarithmic and runs from blue (zero) through white and yellow to orange and black (10 000 000 sequences). The short sequence fragments (represented by the yellow lines) are shown being reconstructed into a full circular genome sequence (yellow circle). For further details, see the paper by Whiteford et al. in this issue [Nucleic Acids Res. (2005) 33, e171].
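The repeat-frequency analysis described above can be illustrated with a minimal Python sketch. It is not the method used by Whiteford et al.; it is a simple exact-match count that assumes a single strand (reverse complements are ignored), a genome small enough to hold in memory, and a circular sequence by default. The function names `kmer_counts` and `repeat_histogram` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def kmer_counts(genome: str, k: int, circular: bool = True) -> Counter:
    """Count every length-k substring of `genome` (one strand only)."""
    if k <= 0 or k > len(genome):
        raise ValueError("k must be between 1 and the genome length")
    # For a circular genome, append the first k-1 bases so that
    # substrings spanning the origin are also counted.
    seq = genome + genome[: k - 1] if circular else genome
    return Counter(seq[i : i + k] for i in range(len(seq) - k + 1))


def repeat_histogram(genome: str, k: int, circular: bool = True) -> Counter:
    """Map an occurrence count (>= 2) to the number of distinct
    length-k sequences that occur that many times."""
    counts = kmer_counts(genome, k, circular)
    return Counter(c for c in counts.values() if c >= 2)
```

Calling `repeat_histogram(genome, k)` for a range of `k` values gives one column of the frequency distribution per read length. Any sequence that appears in the histogram for a given `k` is a repeat that a read of length `k` cannot span unambiguously, which is the condition the caption gives for a break in a contig.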