Improved microbial gene identification with GLIMMER
Nucleic Acids Research Pages 4636-4641


Arthur L. Delcher1, 2, *, Douglas Harmon1, Simon Kasif3, Owen White4, Steven L. Salzberg4

1Department of Computer Science, Loyola College in Maryland, Baltimore, MD 21210, USA, 2Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA, 3Department of Electrical Engineering and Computer Science, The University of Illinois at Chicago, Chicago, IL 60607-7053, USA and 4The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA

Received July 7, 1999; Revised and Accepted October 17, 1999

ABSTRACT

The GLIMMER system for microbial gene identification finds ~97-98% of all genes in a genome when compared with published annotation. This paper reports on two new results: (i) significant technical improvements to GLIMMER that improve its accuracy still further, and (ii) a comprehensive evaluation that demonstrates that the accuracy of the system is likely to be higher than previously recognized. A significant proportion of the genes missed by the system appear to be hypothetical proteins whose existence is only supported by the predictions of other programs. When the analysis is restricted to genes that have significant homology to genes in other organisms, GLIMMER misses <1% of known genes.

INTRODUCTION

Accurate microbial gene identification is becoming ever more important with the increasing rate of whole genome sequencing projects. In the past year alone, eight new bacterial and archaeal genomes have appeared, and the pace continues to accelerate. Each new genome contains thousands of new genes, all of which are deposited into public databases. These genes then become the basis for much further research into the biology of these organisms. For work such as microarray analysis, in which specific sequences are arrayed onto a substrate and used as probes to measure expression levels, the accuracy of gene predictions is critical. The same point can be made about knockout experiments, which are an important tool for determining the function of the large numbers of genes whose function is unknown at the time of publication. Such hypothetical proteins typically comprise 30-40% of the genes in a newly sequenced genome.

GLIMMER 1.0 is a computational gene finder that finds 97-98% of all genes in a prokaryotic genome without any human intervention (1). The system can be quickly and easily trained using only the genome sequence of interest. The technical underpinning of the system is an interpolated Markov model (IMM), a generalization of Markov chain methods. GLIMMER 1.0 has been used as the gene finder for Borrelia burgdorferi (2), Treponema pallidum (3), Chlamydia trachomatis (4) and Thermotoga maritima (5), and the software is in use at over 100 laboratories and institutes. Below we describe the algorithm and performance results of GLIMMER 2.0, a gene finder that incorporates several technical improvements to the GLIMMER 1.0 algorithm. As a result of these improvements, GLIMMER 2.0 has slightly higher sensitivity than GLIMMER 1.0 and is much better at resolving overlapping gene calls. The latter property is especially useful for genomes such as Deinococcus radiodurans, which, owing to their high GC-content, have numerous long open reading frames (ORFs) that can easily lead to predictions of genes whose boundaries overlap incorrectly.

METHODS AND ALGORITHMS

We begin by briefly reviewing Markov models in the context of DNA sequence analysis. We then describe the probabilistic model used in GLIMMER 2.0 to identify regions that are likely to be genes. We then describe how GLIMMER 2.0 resolves conflicts when overlapping genes are predicted. The complete GLIMMER 2.0 system is available from The Institute for Genomic Research at http://www.tigr.org/softlab

Markov Models

A Markov chain is a sequence of random variables X_i, where the probability distribution for each X_i depends only on the preceding k variables X_{i-1}, ..., X_{i-k}, for some constant k. For DNA sequence analysis, a Markov chain models the probability of a given base b as depending only on the k bases immediately prior to b in the sequence. We refer to these preceding k bases as the context of base b in the sequence. The most common type of Markov chain is a fixed-order chain, in which the entire k-base context is used at every position. For example, a fixed 5th-order Markov chain model of DNA sequences comprises 4^5 = 1024 probability distributions, one for each possible 5mer context. Such fixed 5th-order models have proven effective at gene prediction in bacterial genomes (6,7).
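As a rough illustration of a fixed-order chain, the sketch below estimates one distribution per k-base context and scores a sequence by summing log probabilities. The function names and the add-one pseudocount are assumptions made for this example; they are not taken from GLIMMER.

```python
import math
from itertools import product

BASES = "ACGT"

def train_fixed_order(seqs, k, pseudocount=1.0):
    """Estimate P(base | preceding k bases) from training sequences (ACGT only)."""
    # One count table per possible context: 4**k tables in total.
    counts = {"".join(c): dict.fromkeys(BASES, 0) for c in product(BASES, repeat=k)}
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1
    model = {}
    for ctx, row in counts.items():
        # Pseudocount smoothing is an assumption of this sketch.
        total = sum(row.values()) + 4 * pseudocount
        model[ctx] = {b: (row[b] + pseudocount) / total for b in BASES}
    return model

def log_score(model, seq, k):
    """Sum of log P(base | context) over every position that has a full context."""
    return sum(math.log(model[seq[i - k:i]][seq[i]]) for i in range(k, len(seq)))

model = train_fixed_order(["ACGTACGTACGTAACCGGTT"], k=5)
print(len(model))  # 1024 contexts for a 5th-order chain
```

With so little training data almost every context is unseen, which is the limitation on k discussed next.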

Ideally, larger values for k are always preferable. Unfortunately, because the training data available for building models is limited, we must limit k. In most collections of DNA coding sequences, however, there is substantial variability in the frequency of occurrence of different kmers.

IMMs are a generalization of fixed-order Markov chains that combine contexts of different lengths to compute the probability of base b. Our formulation allows each context to have a weight based in part on its frequency; this allows the IMM to be sensitive to how common a particular oligomer is in a given genome. In particular, rare kmers should not be used for prediction; the IMM will ignore these in favor of shorter Markov chains. On the other hand, some long kmers may occur very frequently, and for those the IMM can give the longer context more weight and make a better prediction. These weights define an interpolated probability distribution that incorporates information from multiple Markov chains. An IMM can emulate a fixed kth-order chain simply by setting all weights to zero except for those associated with k.
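The idea of blending contexts of different lengths can be sketched as follows. This is a simplified illustration only: the weighting rule (a context gets full weight once it has been seen `full_weight_at` times, and proportionally less otherwise) and all names are assumptions of this example, whereas the actual GLIMMER weights are described in (1).

```python
from collections import Counter

BASES = "ACGT"

def count_oligomers(seqs, max_k):
    """Count every oligomer of length 1..max_k+1 in the training data."""
    counts = Counter()
    for s in seqs:
        for n in range(1, max_k + 2):
            for i in range(len(s) - n + 1):
                counts[s[i:i + n]] += 1
    return counts

def interpolated_prob(counts, context, base, full_weight_at=400):
    """P(base | context), blending Markov chains of order 0..len(context)."""
    # Order 0: smoothed base frequencies (add-one smoothing is an assumption).
    total = sum(counts[b] for b in BASES)
    prob = (counts[base] + 1) / (total + 4)
    for k in range(1, len(context) + 1):
        ctx = context[-k:]
        n_ctx = sum(counts[ctx + b] for b in BASES)
        if n_ctx == 0:
            break  # an unseen context implies all longer contexts are unseen too
        weight = min(1.0, n_ctx / full_weight_at)  # hypothetical weighting rule
        prob = weight * counts[ctx + base] / n_ctx + (1 - weight) * prob
    return prob

counts = count_oligomers(["ACGTACGTAC" * 50], max_k=3)
print(interpolated_prob(counts, "ACG", "T"))
```

Because the weight depends only on the context and not on the predicted base, the four probabilities for any context still sum to 1.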

Details of how to construct an IMM for sequence data have been described previously (1). For coding regions, GLIMMER 1.0 builds three separate IMMs, one for each codon position. [This is known as a 3-periodic Markov model (6).] These IMMs include 0-8th order Markov chains, as well as weights computed for every oligomer of eight bases or less that appears in the training data. These weights and Markov models are interpolated to produce a score for each base in any potential coding sequence. The logs of these scores are summed to score each coding region.
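A minimal sketch of 3-periodic scoring: each base is scored by the model for its codon position, and the logs are summed over the candidate region. The `position_models` callables are hypothetical stand-ins for the three trained IMMs.

```python
import math

def score_orf(orf, position_models, max_context=8):
    """Sum of log probabilities, using a separate model per codon position."""
    total = 0.0
    for i, base in enumerate(orf):
        model = position_models[i % 3]             # codon position 0, 1 or 2
        context = orf[max(0, i - max_context):i]   # up to 8 preceding bases
        total += math.log(model(context, base))
    return total

# Toy stand-in models (hypothetical): uniform, except the third position favours G/C.
def uniform(context, base):
    return 0.25

def gc_third(context, base):
    return 0.4 if base in "GC" else 0.1

print(score_orf("ATGGCC", [uniform, uniform, gc_third]))
```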

The interpolated context model

Interpolated context models (ICMs) are a further extension of IMMs. For a given context C = b_1 b_2 ... b_k of length k, the IMM in GLIMMER 1.0 computes a probability distribution for b_{k+1} using as many of the bases immediately preceding b_{k+1} as the training data set allows. The ICM is more flexible and can select any of the bases in C (not just those adjacent to b_{k+1}) to determine the probability of b_{k+1}. In general, from a given context, the ICM will choose approximately the same number of bases as the IMM. Our motivation for choosing bases other than those at the end of the context is the fact that in coding regions the significance of a given base depends strongly on its position in a codon; e.g. the nucleotide in the third codon position is sometimes irrelevant to the amino acid translation.

The criterion employed by the ICM to select which bases of a context C to use is mutual information. The mutual information between a given pair of discrete random variables X and Y is defined to be:

$$I(X; Y) = \sum_i \sum_j P(x_i, y_j) \log \frac{P(x_i, y_j)}{P(x_i)\,P(y_j)}$$

where x_i and y_j are the values taken by random variables X and Y respectively, P(x_i, y_j) is the joint probability of x_i and y_j together, and P(x_i) and P(y_j) are the corresponding marginal probabilities.
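Mutual information can be estimated directly from observed pairs of values by replacing each probability with a relative frequency. The sketch below does this; the function name and the choice of base-2 logarithms (bits) are assumptions of this example.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X; Y) in bits from a list of observed (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Fully dependent bases carry 2 bits; independent bases carry none.
dependent = [(b, b) for b in "ACGT"]
independent = [(a, b) for a in "ACGT" for b in "ACGT"]
print(mutual_information(dependent), mutual_information(independent))  # 2.0 0.0
```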

To construct an ICM with context length k from a training set T of DNA sequences, we begin by considering all windows (i.e. oligomers) of length k+1 that occur in T. We let random variable X_1 be the distribution of bases in the first position of those windows; X_2 be the distribution of bases in the second position; and so on through X_{k+1}. We then calculate the mutual information values I(X_1; X_{k+1}), I(X_2; X_{k+1}), ..., I(X_k; X_{k+1}), and choose the maximum. Suppose that maximum is I(X_j; X_{k+1}). We then partition our set of windows into four subsets based on the nucleotide that occurs in position j in the window.

The same procedure can now be performed again for each of the four sets of windows. Within each set, the position that has the highest mutual information with the base at position k+1 is chosen. The four nucleotide values at that position induce a further partitioning of the current set of windows into four subsets.

This process can be viewed as constructing a tree of positions within context strings. A sample portion of such a tree is shown in Figure 1. The construction is terminated when the tree depth reaches a predetermined limit, or when the size of a set of windows becomes too small to be useful to estimate the probability of the last base position.


Figure 1. Sample ICM decomposition tree. The root position 12 has maximum mutual information with the final base position 13. Each child of the root represents the subset of windows with the indicated nucleotide value at position 12, and indicates the maximum mutual information position for that subset. Each node is similarly decomposed into children. Note that children of a single node may represent different base positions.

Each node in the ICM decomposition tree represents a set of windows that provide a probability distribution for the final base position. The root node, which includes all possible windows, represents a 0th-order Markov model. All other nodes give a probability distribution for the final base position, conditional on a specific set of bases occurring at the positions indicated on the path to the root from that node.
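The decomposition can be sketched as a short recursive procedure. This is an illustrative sketch rather than the GLIMMER implementation: the nested-dict node layout, the parameter names, and the decision to exclude positions already used on the path to the root are all assumptions of this example.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"

def mutual_information(windows, j):
    """Mutual information (bits) between position j and the final position."""
    n = len(windows)
    joint = Counter((w[j], w[-1]) for w in windows)
    pj = Counter(w[j] for w in windows)
    pl = Counter(w[-1] for w in windows)
    return sum((c / n) * math.log2(c * n / (pj[x] * pl[y]))
               for (x, y), c in joint.items())

def build_icm(windows, max_depth, min_windows, used=frozenset()):
    """Build a decomposition tree from a non-empty list of equal-length windows."""
    node = {"counts": Counter(w[-1] for w in windows),  # final-base distribution
            "position": None, "children": {}}
    k = len(windows[0]) - 1
    candidates = [j for j in range(k) if j not in used]
    # Stop at the depth limit or when too few windows remain to estimate from.
    if max_depth == 0 or len(windows) < min_windows or not candidates:
        return node
    best = max(candidates, key=lambda j: mutual_information(windows, j))
    node["position"] = best
    for b in BASES:
        subset = [w for w in windows if w[best] == b]
        if subset:
            node["children"][b] = build_icm(subset, max_depth - 1,
                                            min_windows, used | {best})
    return node

# Toy data: the final base always copies the first base of the window.
windows = ["".join(p) + p[0] for p in product(BASES, repeat=3)]
tree = build_icm(windows, max_depth=2, min_windows=8)
print(tree["position"])  # 0: the first position is the most informative
```

In the toy data the most informative position is the one farthest from the predicted base, which a model restricted to adjacent bases could only reach by using the whole context.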

Note that the IMM used in GLIMMER 1.0 is a special case of this ICM, namely the case where the base chosen at each level of the tree is the last available base in the context window. Thus, when the nearest positions to base b_{k+1} provide the strongest evidence for its value, the ICM automatically chooses them and the result is identical to the IMM. But when other bases provide stronger evidence, as is often the case, the ICM will choose them instead.

The interpolation mechanism used in the ICM is identical to that used in GLIMMER 1.0. It takes a weighted sum of two probability distributions, where the weights are determined by the number of training instances used to construct the distribution and its statistical significance as measured by a χ² test. The only difference is that the ICM interpolation is naturally viewed as interpolating between the distributions at a parent and child node in the tree, while the IMM interpolation is always between distributions obtained using different numbers of bases at the end of the context window.

The interpolated context model presented here is essentially a probabilistic decision tree, i.e. a sparse probability distribution expressed as a decision tree. The tree construction is identical to constructing classification trees using information gain as the splitting criteria (8). Classification trees associate a class label with each leaf node of the tree. The labels in our case are the four nucleotide values, and our interpolated context model determines a probability distribution for the base to be predicted given the context in which it occurs. Probabilistic decision trees have been designed for other applications (9-11). In computational biology probabilistic decision trees have been used for modeling splice site junctions (12) and exon modeling (13).

Resolving overlapping genes

In developing GLIMMER 2.0, a conscious effort was made to reduce the number of false negative gene predictions at the expense of a slight increase in the number of false positive predictions. Upon close examination of GLIMMER 1.0's output, we learned that occasionally a gene was discarded because its start codon was positioned too far in the 5′ direction, resulting in substantial overlap with another gene. GLIMMER 2.0 solves this problem by incorporating additional rules to resolve such overlaps.

In GLIMMER 1.0, when two potential genes A and B overlap, the overlap region is scored. If A is longer than B, and if A scores higher on the overlap region, and if moving B's start site will not resolve the overlap, then B is rejected.

In GLIMMER 2.0, when potential genes A and B overlap, the overlap region is scored just as in GLIMMER 1.0, but the system attempts to move the locations of the start codons much more aggressively. Suppose gene A scores higher. Four different orientations are then considered:

(i) Moving the start site of either A or B does not remove the overlap. If A is significantly longer than B (as determined by a program parameter), then B is rejected. Otherwise, both A and B are called genes, with an annotation that there was a doubtful overlap.

(ii) Only moving the start of B can resolve the overlap. If it can be moved, then it is. If not, and if B is significantly shorter than A, then B is rejected. Otherwise, both are listed as genes, with a note indicating the overlap. Moving a start codon works as follows: the system shortens the predicted gene by shifting the start location to the next available start codon. If this does not resolve the overlap, it moves the start codon again. This process continues as long as the resulting gene is longer than the minimum gene length (an easily adjustable parameter).

(iii) Only moving the start of A can resolve the overlap. Since A scores higher, we only try to move it if the overlap is a relatively small fraction of A's length. If adjusting A is not successful, B is rejected.

(iv) Both starts can move. We first move the start of B until the overlap region scores higher for B. Then we move the start of A until it scores higher. Then B again, and so on, until either the overlap is eliminated or no further moves can be made.
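The start-moving step described above can be sketched for the simplest situation, a forward-strand gene whose 5′ end overlaps the end of another gene. The set of start codons (ATG, GTG, TTG), the 0-based coordinates, and the function names are assumptions of this example rather than details taken from the GLIMMER source.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of start codons

def next_start(seq, start, stop):
    """Index of the next in-frame start codon downstream of `start`, or None."""
    for i in range(start + 3, stop - 2, 3):
        if seq[i:i + 3] in START_CODONS:
            return i
    return None

def shorten_until_clear(seq, start, stop, other_end, min_gene_len):
    """Shift `start` downstream until the gene no longer overlaps its neighbour.

    The gene occupies seq[start:stop]; the neighbouring gene ends at index
    `other_end` (exclusive). Returns the new start, or None if the overlap
    cannot be removed without making the gene shorter than `min_gene_len`.
    """
    while start < other_end:
        candidate = next_start(seq, start, stop)
        if candidate is None or stop - candidate < min_gene_len:
            return None
        start = candidate
    return start

seq = "ATGAAAATGCCCGTGAAATTTTAA"
print(shorten_until_clear(seq, 0, 24, other_end=8, min_gene_len=9))   # 12
print(shorten_until_clear(seq, 0, 24, other_end=8, min_gene_len=15))  # None
```

In the first call the start moves twice (to index 6, then 12) before the overlap disappears; in the second, the gene would drop below the minimum length, so the overlap is reported as unresolved.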

An additional step is taken by GLIMMER 2.0 to help find genes that previously were missed because the score from the independent probability model was too high. The independent probability model is used by both versions of the system to compete against the IMMs used to score all six reading frames; its purpose is to serve as a model of non-coding DNA. In order to be called a gene, an ORF must score higher than the independent model as well as the other five reading frames. Genes that were missed due to high scores from this independent model will fall in between the genes predicted by GLIMMER 1.0. For a target ORF in such regions, GLIMMER 2.0 considers the scores on subsequences of that ORF as compared to other overlapping ORFs. If these subsequences receive sufficiently high scores, and if the ORF scores relatively high in relation to the independent model (even though it did not exceed the normal score threshold to be called a gene), then it is added to the list of prospective genes.

The process of evaluating overlaps in GLIMMER 2.0 is performed in an iterative fashion in order to avoid rejecting genes unnecessarily. For example, in the case where ORF A causes ORF B to be rejected, and B in turn causes C to be rejected, we wish to reject only B and not both B and C. Thus, we perform the rejection phase in multiple stages, first discarding B and then checking again for overlaps.
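One way to stage the rejection phase is sketched below. In each round, only genes rejected by a gene that is itself under no threat are discarded, and overlaps are then re-checked. The `rejects` rule (higher score plus an overlap above a threshold) and the dict layout are hypothetical simplifications of GLIMMER's actual overlap rules.

```python
def overlap_len(a, b):
    """Number of bases shared by two genes (end coordinates are exclusive)."""
    return max(0, min(a["end"], b["end"]) - max(a["start"], b["start"]))

def rejects(a, b, max_overlap):
    """Hypothetical rule: a rejects b if a scores higher and they overlap too much."""
    return a is not b and a["score"] > b["score"] and overlap_len(a, b) > max_overlap

def iterative_rejection(genes, max_overlap=30):
    """Discard genes in stages so that a rejected gene cannot reject others."""
    active = list(genes)
    while True:
        threatened = {g["name"] for g in active
                      if any(rejects(h, g, max_overlap) for h in active)}
        if not threatened:
            return [g["name"] for g in active]
        # Only genes that are not themselves threatened may reject in this round.
        secure = [h for h in active if h["name"] not in threatened]
        dropped = {g["name"] for g in active
                   if any(rejects(h, g, max_overlap) for h in secure)}
        active = [g for g in active if g["name"] not in dropped]

genes = [
    {"name": "A", "start": 0, "end": 1000, "score": 50},
    {"name": "B", "start": 900, "end": 1900, "score": 40},
    {"name": "C", "start": 1800, "end": 2500, "score": 30},
]
print(iterative_rejection(genes))  # ['A', 'C']
```

A single pass over all pairwise overlaps would discard both B and C here; the staged version removes B first, after which C no longer overlaps any surviving gene.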

COMPUTATIONAL METHODS

We analyzed 10 completed microbial genomes: Haemophilus influenzae (14), Mycoplasma genitalium (15), Methanococcus jannaschii (16), Helicobacter pylori (17), Escherichia coli (18), Bacillus subtilis (19), Archaeoglobus fulgidus (20), B.burgdorferi (2), T.pallidum (3) and T.maritima (5). On each of the genomes, we ran both GLIMMER 1.0 and GLIMMER 2.0. All parameters were the defaults, although adjusting these default settings will improve performance on selected genomes. The training data was identical in every case in order to ensure a fair comparison.

The method of training was as follows: using only the genome itself as input, we extracted all ORFs longer than 500 bp from each genome. From these long ORFs, only those that did not overlap other long ORFs were retained; this produces a set of ORFs that are highly likely to be coding. (The programs to perform this extraction are included in the GLIMMER package; total runtime is <1 min on a standard desktop PC.) For all genomes in this study, this set contains more than enough data to train the system accurately.
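The training-set extraction can be approximated in a few lines. This sketch is not the program shipped with GLIMMER: it treats an ORF as a maximal stop-free in-frame stretch ending at a stop codon, scans both strands, and the function names are chosen for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq):
    """Yield (start, end) of in-frame stretches ending at a stop (end exclusive)."""
    for frame in range(3):
        begin = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                yield begin, i + 3
                begin = i + 3

def long_orfs(seq, min_len=500):
    """ORFs longer than min_len on either strand, in forward-strand coordinates."""
    n = len(seq)
    found = [(s, e) for s, e in orfs_in_strand(seq) if e - s > min_len]
    found += [(n - e, n - s)
              for s, e in orfs_in_strand(reverse_complement(seq)) if e - s > min_len]
    return sorted(found)

def non_overlapping(orfs):
    """Keep only ORFs that overlap no other ORF in the list."""
    return [a for i, a in enumerate(orfs)
            if not any(i != j and a[0] < b[1] and b[0] < a[1]
                       for j, b in enumerate(orfs))]

toy = "ATG" + "GCT" * 200 + "TAA"
print(long_orfs(toy))  # [(0, 606)]
print(non_overlapping([(0, 600), (500, 1200), (2000, 2600)]))  # [(2000, 2600)]
```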

Next, the IMM training was conducted using the original GLIMMER 1.0 program and the new, tree-structured ICMs for GLIMMER 2.0. These models were then used to identify genes in the complete genome. For all genomes, ranging in size from 0.5 to 4.7 Mb, training GLIMMER 1.0 or GLIMMER 2.0 takes <1 min on a Pentium 400 PC running the Linux operating system. The gene finding step takes an additional 1 min or less.

The results of the comparison are summarized in Tables 1-4. In all 10 genomes, there are only 12 confirmed annotated genes that GLIMMER 1.0 found that GLIMMER 2.0 did not. In all these results, we have not discounted gene predictions that fall into known ribosomal RNA or tRNA regions. Since such regions are easy to identify independently of GLIMMER, this step should be a routine part of any annotation process.


Table 1. A comparison of the number of genes correctly found by GLIMMER 1.0 and GLIMMER 2.0 for 10 complete genomes

A second set of experiments was designed to find the true accuracy of GLIMMER. In the original study (1), GLIMMER 1.0's gene calls were compared to the published annotation for several completed genomes. The results of this study showed that GLIMMER 1.0 was able to find 97-98% of annotated genes fully automatically, using neither database searches nor human intervention; however, published annotation is not 100% accurate. Therefore the question remains open as to how accurate these predictions really are. This second experiment is an attempt to answer that question more precisely.

In order to measure accuracy more precisely, we extracted a subset of genes from the published annotation for each genome. These subsets include only those genes that have significant homology to known proteins, as indicated in the published annotation. Many of these genes have a functional assignment, but some are homologous to other genes of unknown function (these are sometimes annotated as `conserved hypothetical' proteins). We included the latter in the experiment because the existence of homology itself is very strong evidence that the sequence encodes a protein. Except for the use of only a subset of annotated genes, all other details of the experiments were the same as for Table 1. The results of this second comparison are summarized in Table 2.


Table 2. The number of genes with database matches found by GLIMMER 1.0 and GLIMMER 2.0 for 10 complete genomes
Database matches include genes that match genes with unknown function, known as `conserved hypotheticals', as well as genes whose function is known. (Thanks to Alain Viari for testing GLIMMER on B.subtilis. The 1249 genes listed in the third column for B.subtilis were selected according to an even stricter criterion than having a database match; these are the genes that already had been documented in the literature prior to the completion of the B.subtilis genome project.)

The results make it clear that GLIMMER is more accurate on genes confirmed by sequence homology than it is on the remaining genes. For GLIMMER 1.0, sensitivity ranges from 98.4 to 99.7%, with an average of 99.1%. For GLIMMER 2.0, the range is 98.6-99.8%, with an average of 99.3%. In contrast, GLIMMER 1.0's average accuracy on the complete set of annotated genes for all 10 genomes is 98.1%, and GLIMMER 2.0's average on those genes is 98.6%.

Table 3 contains a summary of how the `confirmed' (or conserved) genes differ from the hypothetical genes in the 10 genomes used in this study. On average, the hypothetical genes are considerably shorter and have ~2% lower GC-content. These data are consistent with the hypothesis that these hypothetical genes contain a significant number of non-coding regions that were mistakenly annotated as coding. (For example, the presence of stop codons alone lowers the average GC-content of non-coding regions.) Most hypothetical gene annotations are based primarily on the predictions of computational systems. The fact that GLIMMER is more accurate on conserved genes suggests that the hypothetical predicted genes missed by GLIMMER are the result of simple disagreement between two computational gene finders.


Table 3. Differences between the length and GC-content of genes that are conserved in other organisms versus `hypothetical' genes
The disproportionately small number of conserved genes for B.subtilis reflects the fact that this set includes only those genes that were identified experimentally prior to the completion of the genome sequence.

In each of the 10 genomes, GLIMMER 2.0 found more conserved genes than GLIMMER 1.0. Usually the number was very small, only 1-5 genes for eight of the genomes. However, the set of conserved genes found by GLIMMER 2.0 was not a strict superset of those found by GLIMMER 1.0. We intersected the two sets and compared them in order to identify which genes were found by both systems and which were found exclusively by one or the other. These results are shown in Table 4. As the table shows, for each genome there are 0-4 genes found by GLIMMER 1.0 and missed by GLIMMER 2.0. There are three genomes, M.genitalium, M.jannaschii and A.fulgidus, in which all conserved genes found by GLIMMER 1.0 are found also by GLIMMER 2.0. Typically, genes found by GLIMMER 1.0 but not found by GLIMMER 2.0 are relatively short and score just below the minimum scoring threshold. For example, in B.burgdorferi the gene found by GLIMMER 1.0 and not by GLIMMER 2.0 is a 74-amino-acid ribosomal protein S14 (BB0491). The GLIMMER 2.0 score for this gene was 88, just below the default threshold value of 90. Such genes could be included in GLIMMER 2.0's predictions with suitable parameter adjustments, although at a cost of additional false-positive predictions.


Table 4. Numbers of genes confirmed by database matches found exclusively by GLIMMER 1.0, by GLIMMER 2.0, and by both systems
The columns labeled `Additional' show how many additional genes are uniquely predicted by each of the two systems respectively. Thus for H.influenzae, GLIMMER 1.0 predicts 49 genes that GLIMMER 2.0 does not, one of which has database homology. Likewise, GLIMMER 2.0 predicts 62 genes that GLIMMER 1.0 does not, two of which have database matches. They agree on 1494 (out of 1501) gene predictions with database homology.
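As a worked check of the H.influenzae figures quoted in this caption, the sensitivity of each version on conserved genes can be recomputed from the shared and exclusive counts. The variable names are illustrative; only the four numbers come from the text.

```python
def sensitivity(found, total):
    """Percentage of reference genes that were found."""
    return 100.0 * found / total

total_conserved = 1501    # H.influenzae genes with database homology
shared = 1494             # found by both versions
only_v1, only_v2 = 1, 2   # found exclusively by GLIMMER 1.0 / GLIMMER 2.0

print(f"GLIMMER 1.0: {sensitivity(shared + only_v1, total_conserved):.2f}%")  # 99.60%
print(f"GLIMMER 2.0: {sensitivity(shared + only_v2, total_conserved):.2f}%")  # 99.67%
```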

In order to demonstrate that GLIMMER 2.0 has a higher sensitivity than alternative gene-finding methods, we analyzed a recently sequenced genome, Mycobacterium tuberculosis strain H37Rv (21), for which GLIMMER 2.0 was not among the computational methods used for annotation. Table 5 summarizes the genes that were found by GLIMMER 2.0 but missed in the original annotation, and that have detectable homology to a coding region from another organism. For each of the 13 genes identified, the table lists the function and identifier of the best hit found by a BLAST search. Eleven of the genes occur in intergenic regions in the published annotation of the complete genome, and the remaining two (those whose closest homologs are P17996 and Q02541) have relatively small overlaps with coding sequences annotated as hypothetical. GLIMMER 1.0 finds 11 of these 13 genes, missing those homologous to P17996 and Q02541.


Table 5. Genes in M.tuberculosis found automatically by GLIMMER 2.0 with homology to protein sequences from other organisms
All but two (homologous to P15026 and Q02541) of the listed genes are intergenic with respect to the currently published annotation for M.tuberculosis. The first three columns list the location of the predicted start and stop codons and the length in base pairs; if Start > Stop then the coding sequence is on the reverse strand. The last three columns give the GenBank accession number, the function of the top hit found by BLAST (23), and the E-value given by BLAST for that hit. (The E-value is the number of homologous sequences expected by chance.)

It is worth noting too that the false-positive rate appears to be higher for GLIMMER 2.0, as reflected in the fact that the number of additional genes (not confirmed by database matches) predicted by GLIMMER 2.0 is higher in nine of the 10 genomes. Because of its revised rules to resolve overlapping ORFs, GLIMMER 2.0 generally makes more gene predictions than GLIMMER 1.0 when all parameters are set identically, as in the results described above. To verify that the additional annotated matches found by GLIMMER 2.0 are not attributable merely to the greater number of predictions, we compared the two systems with GLIMMER 1.0's parameters set so that its total number of additional gene predictions for all 10 genomes matched that of GLIMMER 2.0. Specifically, we raised the overlap-length parameter, which is the maximum number of DNA bases by which two ORFs can overlap and both still be predicted as genes. The results are shown in Table 6. With this adjustment GLIMMER 2.0 still finds 99 more annotated genes than GLIMMER 1.0, indicating that its predictions are in fact more accurate than those of GLIMMER 1.0. The parameters of either system can be adjusted to reduce the number of additional genes, at the cost of missing some true genes.


Table 6. GLIMMER 1.0 accuracy versus GLIMMER 2.0 accuracy with overlap-length parameter of GLIMMER 1.0 raised to 51
The value 51 was chosen to make the total number of additional genes found by GLIMMER 1.0 as close as possible to the corresponding number for GLIMMER 2.0. GLIMMER 2.0 still finds significantly more annotated genes than GLIMMER 1.0.

CONCLUSION

In this paper we have described several technical improvements made in the GLIMMER 2.0 gene-finding system and argued that the system is more accurate than previously recognized. GLIMMER 2.0 also can be an effective gene finder for eukaryotic genomes, especially those with a high gene density as is found in some parasites. For example, it is being used as the main gene finder for the parasite Trypanosoma brucei, the agent that causes African sleeping sickness, which currently is being sequenced at The Institute for Genomic Research. This parasite has few or no introns and a gene density estimated at 50%. The IMM scoring method in GLIMMER 1.0 has also been used to create a eukaryotic gene finder, GLIMMERM, that has been quite successful in finding genes in the genome of Plasmodium falciparum, the malaria parasite (22).

ACKNOWLEDGEMENTS

A.L.D. was supported by NSF Grant IIS-9820497. S.K. was supported by NSF Grant KDI-9980088. O.W. was supported by the Department of Energy Grant No. DE-FC02-95ER61962.A003. S.L.S. was supported by NSF Grant IIS-9902923 and NIH Grants R01 LM06845-01 and K01-HG00022-1. S.L.S. and A.L.D. were supported by NSF Grant IRI-9530462.

REFERENCES

1. Salzberg, S., Delcher, A., Kasif, S. and White, O. (1998) Nucleic Acids Res., 26, 544-548.

2. Fraser, C.M., Casjens, S., Huang, W., Sutton, G., Clayton, R., Lathigra, R., White, O., Ketchum, K., Dodson, R., Hickey, E. et al. (1997) Nature, 390, 580-586.

3. Fraser, C.M., Norris, S.J., Weinstock, G.M., White, O., Sutton, G., Clayton, R., Dodson, R., Gwinn, M., Hickey, E., Ketchum, K.A. et al. (1998) Science, 281, 375-388.

4. Stephens, R., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q. et al. (1998) Science, 282, 754-759.

5. Nelson, K.E., Clayton, R.A., Gill, S.R., Gwinn, M.L., Dodson, R.J., Haft, D.H., Hickey, E.K., Peterson, J.D., Nelson, W.C., Ketchum, K.A. et al. (1999) Nature, 399, 323-329.

6. Borodovsky, M. and McIninch, J.D. (1993) Comput. Chem., 17, 123-133.

7. Borodovsky, M., McIninch, J., Koonin, E., Rudd, K., Medigue, C. and Danchin, A. (1995) Nucleic Acids Res., 23, 3554-3562.

8. Quinlan, J.R. (1993) Programs for Machine Learning. Kaufmann Publishers, San Mateo, CA.

9. Buntine, W. (1992) Stat. Comput., 2, 63-73.

10. Helmbold, D.P. and Schapire, R.E. (1997) Machine Learning, 27, 51-68.

11. Willems, F.M.J., Shtarskov, Y.M. and Tjalkens, T.J. (1995) IEEE Trans. Inf. Theory, 4, 663-664.

12. Burge, C. (1998) In Salzberg, S., Searls, D. and Kasif, S. (eds), Computational Methods in Molecular Biology, New Comprehensive Biochemistry. Elsevier Science B.V., Amsterdam, pp. 129-164.

13. Salzberg, S., Delcher, A., Fasman, K. and Henderson, J. (1998) J. Comput. Biol., 5, 667-680.

14. Fleischmann, R.D., Adams, M., White, O., Clayton, R., Kirkness, E., Kerlavage, A., Bult, C., Tomb, J.-F., Dougherty, B., Merrick, J. et al. (1995) Science, 269, 496-512.

15. Fraser, C.M., Gocayne, J., White, O., Adams, M., Clayton, R., Fleischmann, R., Bult, C., Kerlavage, A., Sutton, G., Kelley, J. et al. (1995) Science, 270, 397-403.

16. Bult, C.J., White, O., Olsen, G.J., Zhou, L., Fleischmann, R.D., Sutton, G.G., Blake, J.A., FitzGerald, L.M., Clayton, R.A., Gocayne, J.D. et al. (1996) Science, 273, 1058-1073.

17. Tomb, J.-F., White, O., Kerlavage, A.R., Clayton, R., Sutton, G., Fleischmann, R., Ketchum, K., Klenk, H., Gill, S., Dougherty, B. et al. (1997) Nature, 388, 539-547.

18. Blattner, F.R., Plunkett, G., Bloch, C.A., Perna, N.T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., Mayhew, G.F. et al. (1997) Science, 277, 1453-1462.

19. Kunst, F., Ogasawara, N., Moszer, I., Albertini, A.M., Alloni, G., Azevedo, V., Bertero, M.G., Bessieres, P., Bolotin, A., Borchert, S. et al. (1997) Nature, 390, 249-256.

20. Klenk, H.P., Clayton, R.A., Tomb, J.-F., White, O., Nelson, K.E., Ketchum, K.A., Dodson, R.J., Gwinn, M., Hickey, E.K., Peterson, J.D. et al. (1997) Nature, 390, 364-370.

21. Cole, S.T., Brosch, R., Parkhill, J., Garnier, T., Churcher, C., Harris, D., Gordon, S.V., Eiglmeier, K., Gas, S., Barry, C.E. et al. (1998) Nature, 393, 537-544.

22. Salzberg, S.L., Pertea, M., Delcher, A.L., Gardner, M.J. and Tettelin, H. (1999) Genomics, 59, 24-31.

23. Altschul, S., Gish, W., Miller, W., Myers, E. and Lipman, D. (1990) J. Mol. Biol., 215, 403-410.


*To whom correspondence should be addressed at: Department of Computer Science, Loyola College in Maryland, Baltimore, MD 21210, USA. Tel: +1 410 617 2740; Fax: +1 410 617 2157; Email: delcher{at}cs.loyola.edu


Copyright© Oxford University Press, 1999.



