ABSTRACT
This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H.pylori and H.influenzae is that the system finds >97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.
The number of new microbial genomes has dramatically increased since the first genome, Haemophilus influenzae, was sequenced in 1995 (1). Ten whole genomes have been completed, and at least 30 others are expected to be completed in the next two years. This abundance of data demands new and highly accurate computational analysis tools in order to explore these genomes and maximize the scientific knowledge gained from them. One of the first steps in the analysis of a microbial genome is the identification of all its genes. Because these genomes tend to be gene-rich, typically containing 90% coding sequence, the gene discovery problem takes on a different character than it does in eukaryotic genomes, especially higher eukaryotes whose genomes may have <10% coding sequence. In particular, the most difficult problem is determining which of two or more overlapping open reading frames (orfs) represent true genes. Other difficult problems include identifying the start of translation and finding regulatory signals such as promoters and terminators.
The most reliable way to identify a gene in a new genome is to find a close homolog from another organism. This can be done today very effectively using programs such as BLAST (3) and FASTA (4) to search all the entries in GenBank. However, many of the genes in new genomes still have no significant homology to known genes (1). For these genes, we must rely on computational methods of scoring the coding region to identify the genes. The best-known program for this task is GeneMark (5), which uses a Markov chain model to score coding regions. GeneMark has been highly effective and was used in the H.influenzae and more recent genome projects. We have developed a new system, GLIMMER, that uses a technique called interpolated Markov models (IMMs) to find coding regions in microbial sequences. IMMs are in principle more powerful than Markov chains, and the computational experiments described below demonstrate that they produce more accurate results when used to find genes in bacterial DNA.
Markov models are a well-known tool for analyzing biological sequence data, and the predominant model for microbial sequence analysis is a fixed-order Markov chain (5,6). A fixed-order Markov model predicts each base of a DNA sequence using a fixed number of preceding bases in the sequence. For example, a 5th-order model, which is the basis of GeneMark, uses the five previous bases to predict the next base. However, learning such models accurately can be difficult when there is insufficient training data to estimate the probability of each base occurring after every possible combination of five preceding bases. In general, a kth-order Markov model for DNA sequences requires $4^{k+1}$ probabilities to be estimated from the training data (e.g., $4^{6} = 4096$ probabilities for a 5th-order model). In order to estimate these probabilities, many occurrences of all possible kmers must be present in the data.
An IMM overcomes this problem by combining probabilities from contexts of varying lengths to make predictions, and by only using those contexts (oligomers) for which sufficient data are available. In a typical microbial genome some 5mers will occur too infrequently to give reliable estimates of the probability of the next base, while some 8mers may occur frequently enough to give very reliable estimates. In principle, using longer oligomers is always preferable to using shorter ones, but only if sufficient data is available to produce good probability estimates. An IMM uses a linear combination of probabilities obtained from several lengths of oligomers to make predictions, giving high weights to oligomers that occur frequently and low weights to those that do not. Thus an IMM uses a longer context to make a prediction whenever possible, taking advantage of the greater accuracy produced by higher-order Markov models. Where the statistics on longer oligomers are insufficient to produce good estimates, an IMM can fall back on shorter oligomers to make its predictions.
Using IMMs we have developed a new system, called GLIMMER, to identify coding regions in microbial DNA. GLIMMER uses a novel approach, based on frequency of occurrence and predictive value, to determine the relative weights of oligomers that vary in length from 1 to 8. After first creating IMMs for each of the six possible reading frames, GLIMMER then uses them to score entire orfs. When two high-scoring orfs overlap, the overlap region is scored separately to determine which orf is more likely to be a gene. We have tested GLIMMER using the H.influenzae, Helicobacter pylori and Escherichia coli genomes and found that it is very accurate in identifying genes, as we explain in Methods and Results. The system has recently been used to find the genes in two newly completed genomes: Borrelia burgdorferi, the bacterium that causes Lyme disease (14), and Treponema pallidum, the bacterium that causes syphilis (Fraser et al., manuscript in preparation). Annotation for these and other completed genomes will be available on the GLIMMER web site.
Our probabilistic model of DNA sequences represents a sequence as a process that may be described as a sequence of random variables $X_1, X_2, \ldots$, where $X_i$ corresponds to position $i$ in the sequence. Each random variable $X_i$ takes a value from the set of bases (a, c, g, t). The probability that $X_i$ takes a given value will depend on the local context; that is, the bases immediately adjacent to the base at position $i$. We sometimes refer to (a, c, g, t) as the set of possible states that a variable can take. In other words, variable $X_i$ is in state a if $X_i = a$. As an illustration, consider the simple example of a Markov model in Figure 1. This 1-state model can be used to model any length DNA sequence. In each position, the probability of a is 0.2. Thus the sequence aaaaa would have a probability of $(0.2)^5 = 0.00032$. In this way we can score any sequence by computing the probability that it was generated by the model.
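The scoring of aaaaa above can be reproduced with a few lines of Python. This is only a sketch of a model in which every position is drawn independently: the value P(a) = 0.2 comes from the text, while the other three base probabilities and the function names are hypothetical placeholders chosen so that the distribution sums to 1.

```python
import math

# Only P(a) = 0.2 is taken from the text; the other values are
# hypothetical placeholders chosen so the probabilities sum to 1.
BASE_PROBS = {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2}


def sequence_probability(seq, probs=BASE_PROBS):
    """Probability of seq when each base is drawn independently."""
    return math.prod(probs[base] for base in seq)


def sequence_log_probability(seq, probs=BASE_PROBS):
    """Same score in log space, which avoids underflow on long sequences."""
    return sum(math.log(probs[base]) for base in seq)


if __name__ == "__main__":
    print(sequence_probability("aaaaa"))
    print(sequence_log_probability("aaaaa"))
```

For sequences the length of a gene the raw product underflows quickly, which is why the log-space variant is usually the practical choice.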
In general, we would always like to use the highest-order Markov model possible. The higher-order model should always do at least as well as, and frequently better than, lower-order models. This can be explained by a simple example.
Suppose that the base in the third codon position depends only on the second codon position. Then we might observe in a given genome that $P(a_3 \mid g_2) = 0.22$; i.e., the probability of observing adenine in the third codon position given that guanine occurs in the second is 0.22. This is a first-order dependency. Suppose that the prior probability of adenine $P(a_3)$ is 0.30. Clearly we will perform better by using the first-order statistic, since adenine occurs less frequently in the third position following guanine than it does otherwise. Now consider using both the first and second codon positions to predict $a_3$. Given our assumption that only the second position matters, we should find that $P(a_3 \mid g_2) = P(a_3 \mid g_2, x_1)$, where $x_1$ indicates any base in the first codon position. Thus the 2nd-order model will perform exactly the same as the 1st-order model. If it turns out that the third codon position depends on both the first and second positions, then the 2nd-order model will perform better.
The problem that arises in practice is that, as we move to higher order models, the number of probabilities that we must estimate from the data increases exponentially. For DNA sequence data, we need to learn $4^{k+1}$ probabilities in a kth-order Markov model. Our six submodels actually need $6 \times 4^{k+1}$ probabilities. So a 5th-order model needs 24 576 probabilities. In a microbial genome such as H.influenzae with 1.8 million bases, we will observe each of the 4096 possible 6mers often enough to get accurate estimates for a 5th-order model, although for rare hexamers we may not have enough data. For a 6th-order model, which requires probabilities for all 7mers, there are a substantial number of 7mers that do not occur sufficiently often, and for 7th and 8th-order models the problem is worse. However, even for 8th-order models, there are some oligomers that occur often enough to be extremely useful predictors. We would like a Markov model that uses these higher-order statistics whenever sufficient data is available. This is one of the key advantages of using an IMM. [Note that there exist other techniques to incorporate variable length predictive models (7,8). We experimented with these alternatives before converging on the approach described here.]
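The exponential growth described above is easy to tabulate. The following sketch simply evaluates the alphabet size raised to the power k + 1, optionally multiplied by the number of submodels; the function name and its parameters are illustrative and are not part of GLIMMER.

```python
def markov_parameter_count(order, alphabet_size=4, submodels=1):
    """Number of probabilities in a fixed-order Markov model.

    A kth-order model needs one probability per (context, next base)
    pair, i.e. alphabet_size ** (order + 1), for each submodel.
    """
    return submodels * alphabet_size ** (order + 1)


if __name__ == "__main__":
    # Compare a single model with six reading-frame submodels.
    for k in range(9):
        single = markov_parameter_count(k)
        six = markov_parameter_count(k, submodels=6)
        print(f"order {k}: {single} probabilities, {six} for six submodels")
```

At order 8 the six submodels already require more parameters than there are bases in a genome of 1.8 million bases, which illustrates why many long contexts cannot be estimated reliably.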
To be more precise, an IMM uses a combination of all the probabilities based on 0, 1, 2, ..., k previous bases, where k is a parameter given to the algorithm. In GLIMMER, we use k = 8. Thus for oligomers that occur frequently, the IMM can use an 8th-order model, while it might use a 5th or even lower-order model for rare oligomers. In order to `smooth' its predictions, an IMM uses predictions from the lower-order models, where much more data is available, to adjust the predictions made from higher-order models.
During training, GLIMMER computes the probability of each base a, c, g, t, following all kmers for $0 \le k \le 8$. Then, for each kmer it computes a weight to use in combining the predictions of different order models. Details of the algorithm for computing these weights are given in the Algorithm and system design section. Once the weights are computed, GLIMMER evaluates new sequences by computing the probability that the model M generated the sequence S, $P(S \mid M)$. This probability is computed as
From this definition, it is clear that an IMM is in principle always preferable to a fixed-order Markov model. For example, by giving zero weights to all oligomers except 5mers, an IMM will perform identically to a 5th-order Markov model. However, if there are any 6mers that occur frequently enough in the training data to be useful, and if these 6mers predict a different distribution of bases than the corresponding 5mers, then the IMM will outperform the 5th-order model. Not only longer but also shorter oligomers will help improve performance: even if a 5th-order model is better than a 4th-order model, there may be some rare 5mers for which insufficient data are available. A 5th-order model has no choice but to use the unreliable predictions from these rare 5mers, but an IMM can fall back on the much more reliable predictions made by the 4mers in such cases. The experiments described below indicate that both of these phenomena occur and both serve to give IMMs an advantage over fixed-order Markov models.
It is worth remarking that GLIMMER builds a non-homogeneous Markov model; i.e., different models are created for each of the three codon positions. This type of `3-periodic' Markov chain was introduced in GeneMark (5) to account for patterns that depend on the reading frame.
In this section we describe how GLIMMER computes the values of the $\lambda$ parameters for the kth-order IMM described in the preceding section. In addition, we explain the solution to the learning problem mentioned in the introduction. First, a set of known coding sequences must be assembled into a training set. To be certain these are truly coding is somewhat problematic for a new genome. The solution we have adopted is to use only very long orfs and sequences with homology to known genes from other organisms. These can easily be identified a priori without knowing anything else about the genome being analyzed.
From the training set of genes, the frequencies of occurrence of all possible substring patterns of length 1 to k + 1 are tabulated in each of the six reading frames. (The last base in the substring defines the reading frame.) For simplicity, let us consider just a single reading frame and use $f(S)$ to denote the number of occurrences of string (sequence) $S = s_1 s_2 \ldots s_n$. (This same procedure is repeated for each of the six reading frames.) From these frequencies we get initial estimates of the probability of base $s_x$ occurring given the context string $s_{x-i}, s_{x-i+1}, \ldots, s_{x-1}$, denoted by $S_{x,i}$ (i.e., the $i$ bases just previous to position $x$). We compute the probability of base $s_x$ given the $i$ previous bases as

$$P(s_x \mid S_{x,i}) = \frac{f(S_{x,i}, s_x)}{\sum_{b \in \{a,c,g,t\}} f(S_{x,i}, b)}$$
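A minimal sketch of this tabulation for a single reading frame follows. It assumes the initial estimate is the relative frequency with which each base follows a given context, and it ignores the six-frame bookkeeping described above; the function names and the toy training sequence are hypothetical.

```python
from collections import Counter


def count_substrings(sequences, max_len):
    """Count every substring of length 1..max_len in the training sequences."""
    counts = Counter()
    for seq in sequences:
        for length in range(1, max_len + 1):
            for start in range(len(seq) - length + 1):
                counts[seq[start:start + length]] += 1
    return counts


def next_base_probability(counts, context, base, alphabet="acgt"):
    """Relative frequency of `base` among all bases seen after `context`.

    Returns None when the context was never observed with a following
    base, since no estimate is possible in that case.
    """
    total = sum(counts[context + b] for b in alphabet)
    if total == 0:
        return None
    return counts[context + base] / total


if __name__ == "__main__":
    # Toy training data; a 2nd-order context needs substrings up to length 3.
    counts = count_substrings(["acgacgact"], max_len=3)
    print(next_base_probability(counts, "ac", "g"))
```

For a kth-order model, `max_len` would be k + 1, matching the substring lengths tabulated in the text.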
When there are insufficiently many sample occurrences of a context string to estimate the probability of the next base with confidence, we employ an additional criterion to assign a $\lambda$ value. For a given context string $S_{x,i}$ of length $i$, we compare the observed frequencies of the following base, $f(S_{x,i}, a)$, $f(S_{x,i}, c)$, $f(S_{x,i}, g)$ and $f(S_{x,i}, t)$, with the previously calculated IMM probabilities using the next shorter context, $\mathrm{IMM}_{i-1}(S_{x,i-1}, a)$, $\mathrm{IMM}_{i-1}(S_{x,i-1}, c)$, $\mathrm{IMM}_{i-1}(S_{x,i-1}, g)$ and $\mathrm{IMM}_{i-1}(S_{x,i-1}, t)$. Using a $\chi^2$ test, we determine how likely it is that the four observed frequencies are consistent with the IMM values from the next shorter context. When the frequencies differ significantly from the IMM values, we prefer to use them as better predictors of the next base, i.e., give them a higher $\lambda$ value. Conversely, when the frequencies are consistent with the IMM values, they offer little predictive value and hence we give them a lower $\lambda$ value. Specifically, we calculate the $\chi^2$ confidence $c$ that the frequencies are not consistent with the IMM probabilities and set
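The confidence $c$ used in this comparison can be sketched as follows. The code assumes a standard Pearson $\chi^2$ goodness-of-fit test with 3 degrees of freedom (four base categories with a fixed total), for which the cumulative distribution has a closed form. The function names are hypothetical, and the mapping from $c$ to a $\lambda$ value is deliberately not shown.

```python
import math


def chi_square_statistic(observed, expected_probs):
    """Pearson chi-square statistic for observed counts against
    expected probabilities (all expected probabilities must be > 0)."""
    total = sum(observed)
    stat = 0.0
    for obs, p in zip(observed, expected_probs):
        expected = total * p
        stat += (obs - expected) ** 2 / expected
    return stat


def chi_square_cdf_3df(x):
    """CDF of the chi-square distribution with 3 degrees of freedom."""
    if x <= 0:
        return 0.0
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def inconsistency_confidence(observed, expected_probs):
    """Confidence that the observed counts do NOT follow expected_probs."""
    return chi_square_cdf_3df(chi_square_statistic(observed, expected_probs))


if __name__ == "__main__":
    # Hypothetical counts of a, c, g, t after one context, compared with
    # hypothetical probabilities from the next shorter context.
    print(inconsistency_confidence([40, 10, 30, 20], [0.25, 0.25, 0.25, 0.25]))
```

Counts that match the shorter-context probabilities exactly give a confidence of 0, so such a context would add no predictive value over the shorter one.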
The GLIMMER system consists of two programs. The first of these, called build-imm, takes an input set of sequences and builds and outputs the IMM for them as described above. These sequences can be complete genes or just partial orfs. The second program, called glimmer, then uses this IMM to identify putative genes in an entire genome. Glimmer does not use sliding windows to score regions. Instead, it first identifies all orfs longer than some specified threshold value, and scores each one in all six reading frames. Those that score higher than a designated threshold in the correct reading frame are then selected for further processing. These selected orfs are then examined for overlaps. If two orfs in different reading frames overlap (by more than some designated minimum length), the overlapping region alone is scored separately. The overlap region's six reading frame scores are then compared with those of the two overlapping orfs to see which frame scores highest. In general, when a longer orf overlaps a shorter orf and the overlap region scores highest in the reading frame of the longer orf, then the shorter orf is eliminated as a gene candidate. The final output of the program is a list of putative gene coordinates in the genome, together with notations for each one that may have had a suspicious overlap with another gene candidate. These `suspect' gene candidates (usually a very small percentage of the total) can then be examined manually to determine if they are in fact genes. Samples of GLIMMER outputs for the H.pylori genome are available on the GLIMMER web site at http://www.cs.jhu.edu/labs/compbio/glimmer.html, which also contains results for E.coli and H.influenzae. The GLIMMER system, including all source code, is freely available from this site.
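The first step described above, collecting every orf longer than a threshold, can be sketched as follows. This is an illustration only: the function name `find_orfs`, the stop-to-stop definition of an orf and the restriction to the three forward-strand frames are simplifying assumptions, not a description of the glimmer program itself.

```python
STOP_CODONS = {"taa", "tag", "tga"}


def find_orfs(seq, min_length):
    """Return (start, end, frame) for stop-free stretches of at least
    min_length bases in the three forward reading frames.

    `end` is the index of the first base of the terminating stop codon.
    Stretches that run off the end of the sequence without reaching a
    stop codon are ignored in this sketch.
    """
    seq = seq.lower()
    orfs = []
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - start >= min_length:
                    orfs.append((start, pos, frame))
                start = pos + 3  # the next orf begins after the stop codon
    return orfs


if __name__ == "__main__":
    print(find_orfs("atgaaaaaaaaataaccc", min_length=9))
```

A fuller version would also scan the reverse complement to cover all six reading frames and would then pass each candidate to the scoring step.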
To evaluate the effectiveness of our IMM, we compared it to a conventional fixed-order model on data from the H.influenzae genome. As a second confirming test, we ran it on the recently sequenced H.pylori genome and did a careful comparison of the genes found by GLIMMER to those annotated in the public databases and to the genes found by the GeneMark system.
Haemophilus influenzae has many putative genes whose existence has not been confirmed biologically. For this experiment, we wanted to train GLIMMER using only genes that had a very high likelihood of being real; therefore, we chose for training a set of orfs that satisfy both of these criteria: (i) the orf is >500 bases long, which provides the basis for a statistical argument that the gene is highly likely to be a coding region, since orfs of this length almost never occur in non-coding DNA; (ii) the orf does not overlap any other orf longer than 500 bp. Using these criteria, we were able to collect 1168 orfs from the current version of H.influenzae (GenBank accession L42023), which contains 1717 annotated genes. Thirty-two of these did not match CDS entries, but we included them anyway. This gives us a completely automatic training procedure for GLIMMER, requiring no human intervention.
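The two selection criteria can be expressed as a simple filter over orf coordinates. The sketch below assumes each orf is a half-open (start, end) interval on the genome; the function name and the example coordinates are hypothetical.

```python
def select_training_orfs(orfs, min_length=500):
    """Keep orfs longer than min_length that do not overlap any other
    orf longer than min_length.

    Each orf is a (start, end) pair with end exclusive. The pairwise
    comparison is quadratic, which is acceptable for a sketch.
    """
    long_orfs = [(s, e) for s, e in orfs if e - s > min_length]
    selected = []
    for i, (start, end) in enumerate(long_orfs):
        overlaps_other = any(
            j != i and start < other_end and other_start < end
            for j, (other_start, other_end) in enumerate(long_orfs)
        )
        if not overlaps_other:
            selected.append((start, end))
    return selected


if __name__ == "__main__":
    candidates = [(0, 600), (550, 1200), (2000, 2700), (3000, 3200)]
    print(select_training_orfs(candidates))
```

Note that two long orfs that overlap each other are both discarded, which keeps the training set free of ambiguous cases.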
This experiment compared GLIMMER's IMM to a conventional fixed-length Markov model on the H.influenzae genome data. We followed identical training protocols for both the IMM and a fixed-length 5th-order Markov model. [This 5th-order Markov model is the same model as that used by GeneMark (6). Because we did not have access to the GeneMark source code, we could not retrain that system on our data, so we implemented our own model based on published descriptions of GeneMark.] All post-processing to resolve overlaps was also identical for both methods. Thus the only difference was the model itself: in one case an interpolated Markov model, and in the other case a 5th-order Markov model. Note that we also implemented 4th and 6th-order Markov models, but the 5th-order model performed better than these. The results are shown in Table 1.
Table 1.
Of the 37 genes missed by GLIMMER's IMM, only one was found by the 5th-order model. In contrast, the IMM found 107 genes that the 5th order model missed. For this run, a pre-set threshold prevented both systems from finding genes shorter than 100 bp, and six of the 37 genes missed by GLIMMER were below this threshold. Of the remaining 31 genes, only one was longer than 500 bp. Finally, note that this was a completely `self-trained' experiment in which database matches were not used for training; augmenting the training set with these additional genes will almost certainly improve performance further. Of the 209 additional genes called by the system, some can be eliminated from consideration by comparison with functional RNA sequences. The remainder may or may not be expressed genes, and further biological evidence is required to resolve these genes.
Finally, in a test designed to run the system as it will be used on new, complete genomes, we ran GLIMMER on the complete, recently sequenced genome of H.pylori (13), the bacterium that causes stomach ulcers. A training set of brute force orfs that were >500 nt was collected from the complete genome of H.pylori. (This training set was collected from the genome without reference to any annotation, exactly as it would be for a brand new sequence.) The resulting IMM model was then compared to the annotated set of genes identified for this organism. The 1590 genes annotated for Helicobacter were identified by integrating the following sets of information: (i) evaluating brute force orfs for protein-level sequence similarity matches to the public archives, (ii) predicting coding regions using the GeneMark system and (iii) collecting `intergenic' orfs that were found between the genes with database matches and the genes called by GeneMark. We consider the H.pylori sequence annotation to have been intensively evaluated by the research community, and as yet, no unidentified genes have been reported since the H.pylori publication.
The annotated genes were compared to the results of the GLIMMER algorithm, and 1548 of the 1590 genes were found to have been correctly identified. An additional 314 potential orfs were found by the system in the H.pylori genome. Some of these additional genes can be eliminated by discarding those that conflict with ribosomal and transfer RNAs, but the remainder cannot be ruled out as authentic genes without further biological evidence. The set of 42 unidentified genes, representing a potential false negative rate of 2.6%, was examined further. Nineteen of these genes from the H.pylori annotation were under 100 nt in length, and possibly below the length for meaningful detection by compositional methods. Orfs that have matches to proteins in the current public archives serve as the most reliable and independent verification that an orf is an authentic gene; of these orfs, only seven were present in the 42 genes that GLIMMER did not identify. This suggests a minimal false negative rate of 0.44% for GLIMMER.
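The rates quoted above follow directly from the reported counts. The short check below uses only numbers given in the text; the variable names are illustrative.

```python
# Counts reported for the H.pylori comparison.
annotated_genes = 1590
found_genes = 1548
missed_with_database_match = 7

missed_genes = annotated_genes - found_genes

# Fractions expressed as percentages of all annotated genes.
sensitivity_pct = 100.0 * found_genes / annotated_genes
false_negative_pct = 100.0 * missed_genes / annotated_genes
minimal_false_negative_pct = 100.0 * missed_with_database_match / annotated_genes

if __name__ == "__main__":
    print(f"missed genes: {missed_genes}")
    print(f"sensitivity: {sensitivity_pct:.1f}%")
    print(f"potential false negative rate: {false_negative_pct:.1f}%")
    print(f"minimal false negative rate: {minimal_false_negative_pct:.2f}%")
```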
Note that for this experiment, GLIMMER used a minimum gene length of 90 bp; this length can be changed with a simple command line parameter. With a minimum gene length of 180 bp (60 amino acids), for example, GLIMMER calls 286 fewer genes in H.pylori.
Finally, we conducted a limited comparison to the GeneMark system (6). To keep the comparison simple, we only considered the 974 genes from H.pylori that had database matches to other organisms; these can safely be considered to be `true' genes. GLIMMER was trained exclusively on orfs longer than 500 bp, with overlapping orfs simply discarded. Thus GLIMMER was completely self-trained for this test, with no human intervention. (This fully automatic training requires only a few minutes of computation time.) For the first comparison, we used the output of GeneMark as generated by the H.pylori project (13); the GeneMark version used in that study was from early 1997.
From the set of 974 genes, GLIMMER found 21 genes that GeneMark missed, while GeneMark found one gene that GLIMMER missed. Overall GLIMMER missed eight genes while GeneMark missed 28. The two systems agreed on 945 of the 974 genes. We then ran a second comparison, this time using GeneMarkHMM, the newest release of GeneMark. (For this experiment, GeneMarkHMM was trained using all orfs longer than 700 bp, and the genes were divided during training into `typical' and `atypical' classes.) GeneMarkHMM missed 23 of the genes from the list of database hits. GLIMMER found 15 of the genes that GeneMarkHMM missed, while GeneMarkHMM did not find any genes that GLIMMER missed. The two systems agreed on 951/974 (97.6%) of the genes.
Note that the experiments described here all used a fully automatic training protocol, in which long orfs were identified by a program and then fed directly into GLIMMER. The system will perform even better if additional genes are included in the training set, and we expect that genome projects will include database matches to other organisms as part of training. Another simple method for improving performance is also available: the first set of genes identified by the system can be used as a new (larger) training set, and the system can be re-run repeatedly until it converges. This iterative algorithm will also be available as an option in the GLIMMER system.
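The iterative retraining idea can be sketched as a generic loop. Everything here is an assumption made for illustration: `train` and `predict` are caller-supplied placeholders rather than GLIMMER functions, and convergence is taken to mean that the predicted gene set equals the set used for training.

```python
def iterate_until_stable(initial_training_set, train, predict, max_rounds=10):
    """Retrain on the predicted genes until the prediction stops changing.

    train(genes) must return a model and predict(model) must return an
    iterable of hashable gene identifiers. Returns the final gene set
    and the number of rounds that were run.
    """
    training = frozenset(initial_training_set)
    for round_number in range(1, max_rounds + 1):
        model = train(training)
        predicted = frozenset(predict(model))
        if predicted == training:
            return predicted, round_number
        training = predicted
    return training, max_rounds


if __name__ == "__main__":
    # Toy stand-ins: the "model" is just the training set size, and each
    # round "finds" one more gene, up to five genes in total.
    toy_train = len
    toy_predict = lambda model: set(range(min(model + 1, 5)))
    print(iterate_until_stable({0}, toy_train, toy_predict))
```

The `max_rounds` cap guards against a sequence of predictions that oscillates instead of settling.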
Evaluating the accuracy of a microbial gene finder is difficult, because the genes annotated in GenBank do not always have biological evidence to back up their existence. As the annotation becomes more stable, more accurate estimates of accuracy will be possible. At the same time, better gene finders should result because the available training data will improve. Although GLIMMER's sensitivity is nearing 100% already, there are several important areas of future improvements. One is to improve its specificity by reducing the number of false positives (after first confirming that the unannotated genes found by the system are in fact false). The number of false positives can already be reduced substantially, at the cost of slightly reducing sensitivity, by increasing the minimum orf length that GLIMMER will consider as a gene. Another is to incorporate separate pattern analysis algorithms that will allow the system to find promoters, enhancers, terminators and other signals that occur in intergenic regions. Accurate location of these signals is an important problem in its own right, and a system that integrates the content scoring approach of GLIMMER with a good signal identification algorithm should produce better results than either approach could independently.
Thanks to Mark Borodovsky and Alexander Lukashin for kindly sharing the results of GeneMarkHMM on the H.pylori genome. S.L.S. is supported by the National Human Genome Research Institute at NIH under Grant No. K01-HG00022-1. S.L.S. and A.L.D. are supported by the National Science Foundation under Grant No. IRI-9530462. S.K. is supported by NSF IRI-9529227. O.W. is supported by the Department of Energy Grant No. DE-FC02-95ER61962.A003.
Nucleic Acids Research
Introduction
Interpolated Markov Models
Markov chains
Interpolated models
Algorithm And System Design
Setting IMM parameters
The GLIMMER system
Methods And Results
Comparison on H.influenzae
Gene finding accuracy on H.pylori
Conclusion
Acknowledgements
References
where $S_x$ is the oligomer ending at position $x$, and $n$ is the length of the sequence. $\mathrm{IMM}_8(S_x)$, the 8th-order interpolated Markov model score, is computed as

$$\mathrm{IMM}_k(S_x) = \lambda_k(S_{x-1}) \cdot P_k(S_x) + [1 - \lambda_k(S_{x-1})] \cdot \mathrm{IMM}_{k-1}(S_x)$$

where $\lambda_k(S_{x-1})$ is the numeric weight associated with the kmer ending at position $x - 1$ in the sequence S and $P_k(S_x)$ is the estimate obtained from the training data of the probability of the base located at $x$ in the kth-order model. Thus, the 8th-order IMM score of an oligomer is a linear combination of the predictions made by the 8th, 7th and lesser-order models all the way down to the 0th-order model, which is just the simple prior probabilities of a, c, g, t. The above equation is the solution to the evaluation problem mentioned in the introduction.
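The interpolation recursion can be written as a short recursive function. This is a sketch under stated assumptions rather than GLIMMER's implementation: the dictionary layout, the function name and the example weights are hypothetical, a context missing from the weight table is treated as having weight 0, and only the values 0.22 and 0.30 are borrowed from the codon example in the text.

```python
def imm_probability(context, base, probs, weights, order):
    """Interpolated probability of `base` following `context`.

    probs maps (context, base) to the fixed-order estimate, with the
    empty context "" holding the 0th-order priors. weights maps a
    context to its lambda value; unseen contexts default to 0.0, so the
    model falls back to the next shorter context.
    """
    order = min(order, len(context))
    if order == 0:
        return probs[("", base)]
    ctx = context[-order:]
    lam = weights.get(ctx, 0.0)
    lower = imm_probability(context, base, probs, weights, order - 1)
    return lam * probs.get((ctx, base), 0.0) + (1.0 - lam) * lower


if __name__ == "__main__":
    probs = {("", "a"): 0.30, ("g", "a"): 0.22, ("cg", "a"): 0.10}
    weights = {"g": 1.0, "cg": 0.5}
    # Blend of the 2nd-order estimate and the fully trusted 1st-order one.
    print(imm_probability("cg", "a", probs, weights, order=2))
```

Setting every weight to 0 except for contexts of one length reproduces a fixed-order model of that length, which is the sense in which an IMM generalizes a Markov chain.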
The value of $\lambda_i(S_x)$ that we associate with $P_i(S_x)$ can be regarded as a measure of our confidence in the accuracy of this value as an estimate of the true probability. GLIMMER uses two criteria to determine $\lambda_i(S_x)$. The first of these is simply frequency of occurrence. If the number of occurrences of context string $S_{x,i}$ in the training data exceeds a specific threshold value, then $\lambda_i(S_x)$ is set to 1.0. Thus, when there are sufficiently many sample occurrences of a context string in the training data, then those sample probabilities are used. The current default value for this threshold in GLIMMER is 400, which gives ~95% confidence that the sample probabilities are within ±0.05 of the true probabilities from which the sample was taken. (Other thresholds were tested experimentally, but none provided any noticeable improvement.)
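The ~95% figure is consistent with the usual normal approximation to a binomial proportion. The text does not say how the threshold was derived, so the following is only a plausibility check, assuming the worst case $p = 0.5$, $n = 400$ occurrences of the context and the two-sided 95% quantile $z = 1.96$:

```latex
\[
  z \sqrt{\frac{p(1-p)}{n}}
  \;\le\; 1.96 \sqrt{\frac{0.5 \times 0.5}{400}}
  \;=\; 1.96 \times 0.025
  \;=\; 0.049
  \;<\; 0.05 .
\]
```

For base probabilities further from 0.5 the interval is narrower, so 400 occurrences is a conservative bound under this approximation.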

Thus, we assign higher $\lambda$ values based on a combination of predictive value, determined by $\chi^2$ significance, and accuracy, determined by frequency of occurrence. This $\lambda$ value now defines the probabilities $\mathrm{IMM}_i(S_{x,i}, b)$ for $b \in \{a, c, g, t\}$ according to equation 1. [Other methods of assigning $\lambda$ values for IMMs have been developed (9,10). We experimented with these methods in addition to the one described above, and comparative results will be given in a follow up paper. Roberts (11), cited in (12) also describes a method for building nonuniform Markov models.]

Model               Genes found     Genes missed    Additional genes
GLIMMER IMM         1680 (97.8%)    37              209
5th-Order Markov    1574 (91.7%)    143             104
REFERENCES
Copyright© Oxford University Press, 1998.
||||
![]() |
Y. Saeys, I. Inza, and P. Larranaga A review of feature selection techniques in bioinformatics Bioinformatics, October 1, 2007; 23(19): 2507 - 2517. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Nakagawa, Y. Takaki, S. Shimamura, A.-L. Reysenbach, K. Takai, and K. Horikoshi Deep-sea vent {varepsilon}-proteobacterial genomes provide insights into emergence of pathogens PNAS, July 17, 2007; 104(29): 12146 - 12150. [Abstract] [Full Text] [PDF] |
||||
![]() |
N.-H. Cho, H.-R. Kim, J.-H. Lee, S.-Y. Kim, J. Kim, S. Cha, S.-Y. Kim, A. C. Darby, H.-H. Fuxelius, J. Yin, et al. The Orientia tsutsugamushi genome reveals massive proliferation of conjugative type IV secretion system and host cell interaction genes PNAS, May 8, 2007; 104(19): 7981 - 7986. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Klint, H.-H. Fuxelius, R. R. Goldkuhl, H. Skarin, C. Rutemark, S. G. E. Andersson, K. Persson, and B. Herrmann High-Resolution Genotyping of Chlamydia trachomatis Strains by Multilocus Sequence Analysis J. Clin. Microbiol., May 1, 2007; 45(5): 1410 - 1414. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. J. O'Neill, A. R. Larsen, R. Skov, A. S. Henriksen, and I. Chopra Characterization of the Epidemic European Fusidic Acid-Resistant Impetigo Clone of Staphylococcus aureus J. Clin. Microbiol., May 1, 2007; 45(5): 1505 - 1510. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. L. Lloyd, D. A. Rasko, and H. L. T. Mobley Defining Genomic Islands and Uropathogen-Specific Genes in Uropathogenic Escherichia coli J. Bacteriol., May 1, 2007; 189(9): 3532 - 3546. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. de Groot, T. Mailund, and J. Hein Comparative annotation of viral genomes with non-conserved gene structure Bioinformatics, May 1, 2007; 23(9): 1080 - 1089. [Abstract] [Full Text] [PDF] |
||||
![]() |
U. Wegmann, M. O'Connell-Motherway, A. Zomer, G. Buist, C. Shearman, C. Canchaya, M. Ventura, A. Goesmann, M. J. Gasson, O. P. Kuipers, et al. Complete Genome Sequence of the Prototype Lactic Acid Bacterium Lactococcus lactis subsp. cremoris MG1363 J. Bacteriol., April 15, 2007; 189(8): 3256 - 3270. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Yukawa, C. A. Omumasaba, H. Nonaka, P. Kos, N. Okai, N. Suzuki, M. Suda, Y. Tsuge, J. Watanabe, Y. Ikeda, et al. Comparative analysis of the Corynebacterium glutamicum group and complete genome sequence of strain R Microbiology, April 1, 2007; 153(4): 1042 - 1058. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Zienkiewicz, I. Kern-Zdanowicz, M. Golebiewski, J. Zylinska, P. Mieczkowski, M. Gniadkowski, J. Bardowski, and P. Ceglowski Mosaic Structure of p1658/97, a 125-Kilobase Plasmid Harboring an Active Amplicon with the Extended-Spectrum {beta}-Lactamase Gene blaSHV-5 Antimicrob. Agents Chemother., April 1, 2007; 51(4): 1164 - 1171. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. M. Faruque, V. C. Tam, N. Chowdhury, P. Diraphat, M. Dziejman, J. F. Heidelberg, J. D. Clemens, J. J. Mekalanos, and G. B. Nair Genomic analysis of the Mozambique strain of Vibrio cholerae O1 reveals the origin of El Tor strains carrying classical CTX prophage PNAS, March 20, 2007; 104(12): 5151 - 5156. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. L. Delcher, K. A. Bratke, E. C. Powers, and S. L. Salzberg Identifying bacterial genes and endosymbiont DNA with Glimmer Bioinformatics, March 15, 2007; 23(6): 673 - 679. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. F. Challacombe, A. J. Duncan, T. S. Brettin, D. Bruce, O. Chertkov, J. C. Detter, C. S. Han, M. Misra, P. Richardson, R. Tapia, et al. Complete Genome Sequence of Haemophilus somnus (Histophilus somni) Strain 129Pt and Comparison to Haemophilus ducreyi 35000HP and Haemophilus influenzae Rd J. Bacteriol., March 1, 2007; 189(5): 1890 - 1898. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Saeys, P. Rouze, and Y. Van de Peer In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi and protists Bioinformatics, February 15, 2007; 23(4): 414 - 420. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. H. Bergman, K. D. Passalacqua, P. C. Hanna, and Z. S. Qin Operon Prediction for Sequenced Bacterial Genomes without Experimental Information Appl. Envir. Microbiol., February 1, 2007; 73(3): 846 - 854. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Krause, A. C. McHardy, T. W. Nattkemper, A. Puhler, J. Stoye, and F. Meyer GISMO--gene identification using a support vector machine for ORF classification Nucleic Acids Res., January 28, 2007; 35(2): 540 - 549. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Sugawara, T. Abe, T. Gojobori, and Y. Tateno DDBJ working on evaluation and classification of bacterial genes in INSDC Nucleic Acids Res., January 12, 2007; 35(suppl_1): D13 - D15. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. A. Lanie, W.-L. Ng, K. M. Kazmierczak, T. M. Andrzejewski, T. M. Davidsen, K. J. Wayne, H. Tettelin, J. I. Glass, and M. E. Winkler Genome Sequence of Avery's Virulent Serotype 2 Strain D39 of Streptococcus pneumoniae and Comparison with That of Unencapsulated Laboratory Strain R6 J. Bacteriol., January 1, 2007; 189(1): 38 - 51. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Brotcke, D. S. Weiss, C. C. Kim, P. Chain, S. Malfatti, E. Garcia, and D. M. Monack Identification of MglA-Regulated Genes Reveals Novel Virulence Factors in Francisella tularensis Infect. Immun., December 1, 2006; 74(12): 6642 - 6655. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Noguchi, J. Park, and T. Takagi MetaGene: prokaryotic gene finding from environmental genome shotgun sequences Nucleic Acids Res., November 14, 2006; 34(19): 5623 - 5630. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y.-T. Chen, H.-Y. Shu, L.-H. Li, T.-L. Liao, K.-M. Wu, Y.-R. Shiau, J.-J. Yan, I.-J. Su, S.-F. Tsai, and T.-L. Lauderdale Complete Nucleotide Sequence of pK245, a 98-Kilobase Plasmid Conferring Quinolone Resistance and Extended-Spectrum-{beta}-Lactamase Activity in a Clinical Klebsiella pneumoniae Isolate Antimicrob. Agents Chemother., November 1, 2006; 50(11): 3861 - 3866. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. S. Vernikos and J. Parkhill Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands Bioinformatics, September 15, 2006; 22(18): 2196 - 2203. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. Palenik, Q. Ren, C. L. Dupont, G. S. Myers, J. F. Heidelberg, J. H. Badger, R. Madupu, W. C. Nelson, L. M. Brinkac, R. J. Dodson, et al. Genome sequence of Synechococcus CC9311: Insights into adaptation to a coastal environment PNAS, September 5, 2006; 103(36): 13555 - 13559. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. Delhon, E. R. Tulman, C. L. Afonso, Z. Lu, J. J. Becnel, B. A. Moser, G. F. Kutish, and D. L. Rock Genome of invertebrate iridescent virus type 3 (mosquito iridescent virus). J. Virol., September 1, 2006; 80(17): 8439 - 8449. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. R. Tulman, G. Delhon, C. L. Afonso, Z. Lu, L. Zsak, N. T. Sandybaev, U. Z. Kerembekova, V. L. Zaitsev, G. F. Kutish, and D. L. Rock Genome of horsepox virus. J. Virol., September 1, 2006; 80(18): 9244 - 9258. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. R. Kulasekara, H. D. Kulasekara, M. C. Wolfgang, L. Stevens, D. W. Frank, and S. Lory Acquisition and Evolution of the exoU Locus in Pseudomonas aeruginosa J. Bacteriol., June 1, 2006; 188(11): 4037 - 4050. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Campoy, J. Aranda, G. Alvarez, J. Barbe, and M. Llagostera Isolation and Sequencing of a Temperate Transducing Phage for Pasteurella multocida. Appl. Envir. Microbiol., May 1, 2006; 72(5): 3154 - 3160. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. S. Han, G. Xie, J. F. Challacombe, M. R. Altherr, S. S. Bhotika, D. Bruce, C. S. Campbell, M. L. Campbell, J. Chen, O. Chertkov, et al. Pathogenomic Sequence Analysis of Bacillus cereus and Bacillus thuringiensis Isolates Closely Related to Bacillus anthracis J. Bacteriol., May 1, 2006; 188(9): 3382 - 3390. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Wang, X. Tang, Z. Cheng, L. Mueller, J. Giovannoni, and S. D. Tanksley Euchromatin and Pericentromeric Heterochromatin: Comparative Composition in the Tomato Genome Genetics, April 1, 2006; 172(4): 2529 - 2540. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Nonaka, G. Keresztes, Y. Shinoda, Y. Ikenaga, M. Abe, K. Naito, K. Inatomi, K. Furukawa, M. Inui, and H. Yukawa Complete Genome Sequence of the Dehalorespiring Bacterium Desulfitobacterium hafniense Y51 and Comparison with Dehalococcoides ethenogenes 195 J. Bacteriol., March 15, 2006; 188(6): 2262 - 2274. [Abstract] [Full Text] [PDF] |
||||
![]() |
V. Gonzalez, R. I. Santamaria, P. Bustos, I. Hernandez-Gonzalez, A. Medrano-Soto, G. Moreno-Hagelsieb, S. C. Janga, M. A. Ramirez, V. Jimenez-Jacinto, J. Collado-Vides, et al. The partitioned Rhizobium etli genome: Genetic and metabolic redundancy in seven interacting replicons PNAS, March 7, 2006; 103(10): 3834 - 3839. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Dalevi, D. Dubhashi, and M. Hermansson Bayesian classifiers for detecting HGT using fixed and variable order markov models of genomic signatures Bioinformatics, March 1, 2006; 22(5): 517 - 522. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Bryan, P. Roesch, L. Davis, R. Moritz, S. Pellett, and R. A. Welch Regulation of Type 1 Fimbriae by Unlinked FimB- and FimE-Like Recombinases in Uropathogenic Escherichia coli Strain CFT073 Infect. Immun., February 1, 2006; 74(2): 1072 - 1083. [Abstract] [Full Text] [PDF] |
||||
![]() |
H.-Y. Ou, L.-L. Chen, J. Lonnen, R. R. Chaudhuri, A. B. Thani, R. Smith, N. J. Garton, J. Hinton, M. Pallen, M. R. Barer, et al. A novel strategy for the identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites in closely related bacteria Nucleic Acids Res., January 9, 2006; 34(1): e3 - e3. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Kosuge, T. Abe, T. Okido, N. Tanaka, M. Hirahata, Y. Maruyama, J. Mashima, A. Tomiki, M. Kurokawa, R. Himeno, et al. Exploration and Grading of Possible Genes from 183 Bacterial Strains by a Common Protocol to Identification of New Genes: Gene Trek in Prokaryote Space (GTPS) DNA Res, January 1, 2006; 13(6): 245 - 254. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Nielsen and A. Krogh Large-scale prokaryotic gene prediction and comparison to genome annotation Bioinformatics, December 15, 2005; 21(24): 4322 - 4329. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. Thieme, R. Koebnik, T. Bekel, C. Berger, J. Boch, D. Buttner, C. Caldana, L. Gaigalat, A. Goesmann, S. Kay, et al. Insights into Genome Plasticity and Pathogenicity of the Plant Pathogenic Bacterium Xanthomonas campestris pv. vesicatoria Revealed by the Complete Genome Sequence J. Bacteriol., November 1, 2005; 187(21): 7254 - 7266. [Abstract] [Full Text] [PDF] |
||||
![]() |
O. C. Kulkarni, R. Vigneshwar, V. K. Jayaraman, and B. D. Kulkarni Identification of coding and non-coding sequences using local Holder exponent formalism Bioinformatics, October 15, 2005; 21(20): 3818 - 3823. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Tettelin, V. Masignani, M. J. Cieslewicz, C. Donati, D. Medini, N. L. Ward, S. V. Angiuoli, J. Crabtree, A. L. Jones, A. S. Durkin, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial "pan-genome" PNAS, September 27, 2005; 102(39): 13950 - 13955. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Chibana, N. Oka, H. Nakayama, T. Aoyama, B. B. Magee, P. T. Magee, and Y. Mikami Sequence Finishing and Gene Mapping for Candida albicans Chromosome 7 and Syntenic Analysis Against the Saccharomyces cerevisiae Genome Genetics, August 1, 2005; 170(4): 1525 - 1537. [Abstract] [Full Text] [PDF] |
||||
![]() |
I. Ben-Gal, A. Shani, A. Gohr, J. Grau, S. Arviv, A. Shmilovici, S. Posch, and I. Grosse Identification of transcription factor binding sites with variable-order Bayesian networks Bioinformatics, June 1, 2005; 21(11): 2657 - 2666. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Dziejman, D. Serruto, V. C. Tam, D. Sturtevant, P. Diraphat, S. M. Faruque, M. H. Rahman, J. F. Heidelberg, J. Decker, L. Li, et al. Genomic characterization of non-O1, non-O139 Vibrio cholerae reveals genes for a type III secretion system PNAS, March 1, 2005; 102(9): 3465 - 3470. [Abstract] [Full Text] [PDF] |
||||
![]() |
X.-F. Wan, N. C. VerBerkmoes, L. A. McCue, D. Stanek, H. Connelly, L. J. Hauser, L. Wu, X. Liu, T. Yan, A. Leaphart, et al. Transcriptomic and Proteomic Characterization of the Fur Modulon in the Metal-Reducing Bacterium Shewanella oneidensis J. Bacteriol., December 15, 2004; 186(24): 8385 - 8400. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. B. Lobocka, D. J. Rose, G. Plunkett III, M. Rusin, A. Samojedny, H. Lehnherr, M. B. Yarmolinsky, and F. R. Blattner Genome of Bacteriophage P1 J. Bacteriol., November 1, 2004; 186(21): 7032 - 7068. [Abstract] [Full Text] [PDF] |
||||
![]() |
Z. I. Johnson and S. W. Chisholm Properties of overlapping genes are conserved across microbial genomes Genome Res., November 1, 2004; 14(11): 2268 - 2272. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. Lerat and H. Ochman {Psi}-{Phi}: Exploring the outer limits of bacterial pseudogenes Genome Res., November 1, 2004; 14(11): 2273 - 2278. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. C. Nierman, D. DeShazer, H. S. Kim, H. Tettelin, K. E. Nelson, T. Feldblyum, R. L. Ulrich, C. M. Ronning, L. M. Brinkac, S. C. Daugherty, et al. From the Cover: Structural flexibility in the Burkholderia mallei genome PNAS, September 28, 2004; 101(39): 14246 - 14251. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. M. Alsmark, A. C. Frank, E. O. Karlberg, B.-A. Legault, D. H. Ardell, B. Canback, A.-S. Eriksson, A. K. Naslund, S. A. Handley, M. Huvet, et al. The louse-borne human pathogen Bartonella quintana is a genomic derivative of the zoonotic agent Bartonella henselae PNAS, June 29, 2004; 101(26): 9716 - 9721. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. D. Fletcher, L. Bernfield, V. Barniak, J. E. Farley, A. Howell, M. Knauf, P. Ooi, R. P. Smith, P. Weise, M. Wetherell, et al. Vaccine Potential of the Neisseria meningitidis 2086 Lipoprotein Infect. Immun., April 1, 2004; 72(4): 2088 - 2100. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. T. Dobbins, M. George Jr., D. A. Basham, M. E. Ford, J. M. Houtz, M. L. Pedulla, J. G. Lawrence, G. F. Hatfull, and R. W. Hendrix Complete Genomic Sequence of the Virulent Salmonella Bacteriophage SP6 J. Bacteriol., April 1, 2004; 186(7): 1933 - 1944. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. A. Rasko, J. Ravel, O. A. Okstad, E. Helgason, R. Z. Cer, L. Jiang, K. A. Shores, D. E. Fouts, N. J. Tourasse, S. V. Angiuoli, et al. The genome sequence of Bacillus cereus ATCC 10987 reveals metabolic adaptations and a large plasmid related to Bacillus anthracis pXO1 Nucleic Acids Res., February 11, 2004; 32(3): 977 - 988. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Westberg, A. Persson, A. Holmberg, A. Goesmann, J. Lundeberg, K.-E. Johansson, B. Pettersson, and M. Uhlen The Genome Sequence of Mycoplasma mycoides subsp. mycoides SC Type Strain PG1T, the Causative Agent of Contagious Bovine Pleuropneumonia (CBPP) Genome Res., February 1, 2004; 14(2): 221 - 227. [Abstract] [Full Text] [PDF] |
||||
![]() |
C.-Y. Chen, K.-M. Wu, Y.-C. Chang, C.-H. Chang, H.-C. Tsai, T.-L. Liao, Y.-M. Liu, H.-J. Chen, A. B.-T. Shen, J.-C. Li, et al. Comparative Genome Analysis of Vibrio vulnificus, a Marine Pathogen Genome Res., December 1, 2003; 13(12): 2577 - 2587. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. R. de la Torre, L. M. Christianson, O. Beja, M. T. Suzuki, D. M. Karl, J. Heidelberg, and E. F. DeLong Proteorhodopsin genes are distributed among divergent marine bacterial taxa PNAS, October 28, 2003; 100(22): 12830 - 12835. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. Delhon, M. P. Moraes, Z. Lu, C. L. Afonso, E. F. Flores, R. Weiblen, G. F. Kutish, and D. L. Rock Genome of Bovine Herpesvirus 5 J. Virol., October 1, 2003; 77(19): 10339 - 10347. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Baar, M. Eppinger, G. Raddatz, J. Simon, C. Lanz, O. Klimmek, R. Nandakumar, R. Gross, A. Rosinus, H. Keller, et al. Complete genome sequence and analysis of Wolinella succinogenes PNAS, September 30, 2003; 100(20): 11690 - 11695. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. E. Nelson, R. D. Fleischmann, R. T. DeBoy, I. T. Paulsen, D. E. Fouts, J. A. Eisen, S. C. Daugherty, R. J. Dodson, A. S. Durkin, M. Gwinn, et al. Complete Genome Sequence of the Oral Pathogenic Bacterium Porphyromonas gingivalis Strain W83 J. Bacteriol., September 15, 2003; 185(18): 5591 - 5601. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. S. Miller, J. F. Heidelberg, J. A. Eisen, W. C. Nelson, A. S. Durkin, A. Ciecko, T. V. Feldblyum, O. White, I. T. Paulsen, W. C. Nierman, et al. Complete Genome Sequence of the Broad-Host-Range Vibriophage KVP40: Comparative Genomics of a T4-Related Bacteriophage J. Bacteriol., September 1, 2003; 185(17): 5220 - 5233. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Barrangou, E. Altermann, R. Hutkins, R. Cano, and T. R. Klaenhammer Functional and comparative genomic analyses of an operon involved in fructooligosaccharide utilization by Lactobacillus acidophilus PNAS, July 22, 2003; 100(15): 8957 - 8962. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. H. Majoros, M. Pertea, C. Antonescu, and S. L. Salzberg GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders Nucleic Acids Res., July 1, 2003; 31(13): 3601 - 3604. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Bocs, S. Cruveiller, D. Vallenet, G. Nuel, and C. Medigue AMIGene: Annotation of MIcrobial Genes Nucleic Acids Res., July 1, 2003; 31(13): 3723 - 3726. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Schiex, J. Gouzy, A. Moisan, and Y. de Oliveira FrameD: a flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences Nucleic Acids Res., July 1, 2003; 31(13): 3738 - 3741. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. R. Liles, B. F. Manske, S. B. Bintrim, J. Handelsman, and R. M. Goodman A Census of rRNA Genes and Linked Genomic Sequences within a Soil Metagenomic Library Appl. Envir. Microbiol., May 1, 2003; 69(5): 2684 - 2691. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. D. Read, G. S. A. Myers, R. C. Brunham, W. C. Nelson, I. T. Paulsen, J. Heidelberg, E. Holtzapple, H. Khouri, N. B. Federova, H. A. Carty, et al. Genome sequence of Chlamydophila caviae (Chlamydia psittaci GPIC): examining the role of niche-specific genes in the evolution of the Chlamydiaceae Nucleic Acids Res., April 15, 2003; 31(8): 2134 - 2147. [Abstract] [Full Text] [PDF] |
||||
![]() |
F.-B. Guo, H.-Y. Ou, and C.-T. Zhang ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes Nucleic Acids Res., March 15, 2003; 31(6): 1780 - 1789. [Abstract] [Full Text] [PDF] |
||||
![]() |
E.-M. Lai, N. D. Phadke, M. T. Kachman, R. Giorno, S. Vazquez, J. A. Vazquez, J. R. Maddock, and A. Driks Proteomic Analysis of the Spore Coats of Bacillus subtilis and Bacillus anthracis J. Bacteriol., February 15, 2003; 185(4): 1443 - 1454. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Chan, S. Baker, C. C. Kim, C. S. Detweiler, G. Dougan, and S. Falkow Genomic Comparison of Salmonella enterica Serovars and Salmonella bongori by Use of an S. enterica Serovar Typhimurium DNA Microarray J. Bacteriol., January 15, 2003; 185(2): 553 - 563. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. A. Welch, V. Burland, G. Plunkett III, P. Redford, P. Roesch, D. Rasko, E. L. Buckles, S.-R. Liou, A. Boutin, J. Hackett, et al. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli PNAS, December 24, 2002; 99(26): 17020 - 17024. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Ma, A. Campbell, and S. Karlin Correlations between Shine-Dalgarno Sequences and Gene Features Such as Predicted Expression Levels and Operon Structures J. Bacteriol., October 15, 2002; 184(20): 5733 - 5745. [Abstract] [Full Text] [PDF] |
||||
![]() |
Q. Jin, Z. Yuan, J. Xu, Y. Wang, Y. Shen, W. Lu, J. Wang, H. Liu, J. Yang, F. Yang, et al. Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157 Nucleic Acids Res., October 15, 2002; 30(20): 4432 - 4441. [Abstract] [Full Text] [PDF] |
||||