Nucleic Acids Research, 2001, Vol. 29, No. 1 41-43
© 2001 Oxford University Press
TIGRFAMs: a protein family resource for the functional identification of proteins
The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
Received September 5, 2000; Revised and Accepted November 1, 2000.
| ABSTRACT |
|---|
TIGRFAMs is a collection of protein families featuring curated multiple sequence alignments, hidden Markov models and associated information designed to support the automated functional identification of proteins by sequence homology. We introduce the term equivalog to describe members of a set of homologous proteins that are conserved with respect to function since their last common ancestor. Related proteins are grouped into equivalog families where possible, and otherwise into protein families with other hierarchically defined homology types. TIGRFAMs currently contains over 800 protein families, available for searching or downloading at www.tigr.org/TIGRFAMs. Classification by equivalog family, where achievable, complements classification by orthology, superfamily, domain or motif. It provides the information best suited for automatic assignment of specific functions to proteins from large-scale genome sequencing projects.
| INTRODUCTION |
|---|
The correct assignment of protein function by homology across genomes is a difficult task. Variable evolutionary clock rates mean the most similar sequences may not be the most recently diverged. Differing patterns of gene loss across species may cause proteins of distinct function, paralogous in the last common ancestral species, to appear to be orthologous. True orthologous families may contain members with new activities. For these and other reasons, the consensus of top-scoring pairwise matches may easily misidentify a new protein. Errors in the iterated transfer of annotations among uncharacterized proteins and the relatively poor signal-to-noise ratio inherent in pairwise sequence alignment further complicate the task of protein functional identification. To address these problems, we have created a protein family resource to represent functional and not just evolutionary classifications of proteins.
Most current protein classification methods are oriented toward detection of sets of distantly related proteins that do not necessarily have the same function (1–4). In some cases, only short regions of conserved protein sequence are used to define these sets of proteins. This strategy places proteins with diverse functions in a single family (e.g. all proteins containing a pyridoxal phosphate binding domain). A strategy that defines families more narrowly, represented in clusters of orthologous groups (COGs) (5), uses automated clustering based on bi-directional best hit relationships across diverged species. While similar function is implied between the most similar sequences from different species, conserved function is not a formal criterion used to build COGs.
| BUILDING TIGRFAMs |
|---|
We have built a collection of protein families, TIGRFAMs, most of which are predicted to have uniform function. The families are represented by curated multiple sequence alignments (seed alignments), hidden Markov models (HMMs) and associated annotations and cutoff scores. Models are developed using the HMMER package (6), version 2.1.1. This package allows control of HMM architecture, prior probability tables reflecting amino acid relatedness and other parameters during the building of models. Searches with the HMMs yield scores in bits that are compared to high and low stringency reference values, called the trusted and noise cutoffs. The scope of each model, that is, the set of proteins that are recognized by the HMM, is determined by which sequences are in the seed alignment, how they are aligned, the input parameters of the program hmmbuild and the cutoff value settings.
Initial clusters of related proteins from completed microbial genomes were constructed in various ways, including single linkage clustering based on all-versus-all sequence searches (7) and a BLAST (8) bi-directional best hit clustering method similar to that used in COGs (5). These initial clusters frequently contain genes of heterogeneous function. Curation of alignments, consideration of phylogenetic and/or distance trees and re-examination of protein functional assignments are performed with the objective of refining or partitioning the initial clusters into subclusters that are homogeneous in function. For each resulting subcluster, several versions of HMM are tested. Comparison of full database search results among the HMMs built from the same or different subclusters enables the selection of the best model as well as cutoff scores for each group of proteins.
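The initial clustering step can be illustrated with a small sketch. It is not the authors' pipeline: it assumes a hypothetical dictionary of pairwise similarity scores from an all-versus-all search, keeps only bi-directional best hits between genomes, and then joins those pairs by single linkage (connected components). The function names and the input layout are assumptions made for illustration, and COGs itself applies further criteria that are not modeled here.

```python
from collections import defaultdict


def best_hits(scores):
    """Map (query, target_genome) to the best-scoring subject in that genome.

    scores: dict of (query, subject) -> similarity score, where query and
    subject are (genome, protein) tuples. Hypothetical input layout.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query[0] == subject[0]:
            continue  # ignore hits within the same genome
        key = (query, subject[0])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: subject for key, (subject, _) in best.items()}


def bidirectional_best_hits(scores):
    """Return the set of protein pairs that are each other's best hit."""
    best = best_hits(scores)
    pairs = set()
    for (query, _genome), subject in best.items():
        if best.get((subject, query[0])) == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def single_linkage(pairs):
    """Join linked pairs into clusters (connected components) via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return sorted(sorted(members) for members in clusters.values())
```

Clusters produced this way can still mix functions, which is why the curation and partitioning described above follow the automated step.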
Generally, a dubious member of a functionally conserved protein family is eliminated from the seed and used instead to help set an upper limit for the trusted cutoff score. Inclusion in the HMM seed would compromise the specificity of the model, since any member of the seed alignment is sure to score above any reasonable trusted cutoff score. The care taken in building these models makes them useful in predicting specific protein function and reduces the risk that incorrect historical annotations will be propagated to new sequences.
| HOMOLOGY TYPES |
|---|
We introduce the term equivalog to describe proteins homologous to each other and conserved in function since their last common ancestor. Any one member of a set of orthologous proteins that differs in function from the others is not an equivalog. Sets of equivalogs, therefore, are not necessarily monophyletic. Proteins related by lateral transfer can be equivalogs, although by definition they are not orthologs.
Figure 1 shows a possible phylogeny for protein evolution from a single ancestral sequence. In this model tree, paralogous proteins A and B have distinct functions. Subsequent speciation expanded each paralog into its own orthologous branch. Within each branch, the members are equivalogs if the function is conserved. The term superfamily as introduced by Dayhoff et al. (9) and developed in the PIR-Protein Sequence Database (10) describes the complete set of proteins having sequence homology over essentially their full length. The two branches in the phylogeny belong to the same superfamily. They represent the whole of the superfamily if no other full-length homologs can be found. Otherwise, they represent a subfamily within the superfamily. Homology may be restricted to a domain rather than the full lengths of the respective proteins. If so, the homology type is termed domain, subfamily domain and equivalog domain, in place of superfamily, subfamily and equivalog, respectively.
The majority of the profile HMMs in TIGRFAMs are designed to identify equivalog families from among currently available sequences. Models with other homology types have different uses. Superfamily and domain models hit relatively large numbers of proteins, provide sensitivity for the identification of remote homologs and provide insight into the possible general function of proteins whose specific role is not known. Equivalog models, in contrast, identify functionally equivalent members from larger sets of related proteins. This assignment of specific protein function is a primary goal in genome annotation.
The TIGRFAMs dataset currently consists of 854 models, of which 516 are classified as equivalog models and 24 as equivalog domain. An additional 125 models describe small families whose function, although uncharacterized, may also be equivalent (hypothetical equivalog). The rest represent subfamily, superfamily, domain and other homology types.
| EQUIVALOG HMM PERFORMANCE |
|---|
Each model has a trusted cutoff, above which there should be no false positive hits, and a noise cutoff below which hits to the model are considered uninteresting. The range between trusted cutoff and noise cutoff represents scores that may or may not be true hits. Annotations attached to equivalog models for assignment to matching proteins include protein names, role categories, explanatory comments and database cross-references. Over two-thirds have been assigned prokaryotic gene symbols and nearly half have been assigned Enzyme Commission (EC) numbers. Proteins scoring above the trusted cutoff can be assigned these annotations automatically and with fairly high confidence.
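The two-cutoff scheme amounts to a three-way decision on each bit score. The sketch below is a minimal illustration rather than TIGR's software: the `Cutoffs` class and `classify_hit` function are hypothetical names, the numeric values in the usage note are invented, and treating a score exactly equal to a cutoff as passing it is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    """Per-model reference scores in bits (hypothetical container)."""
    trusted: float
    noise: float


def classify_hit(score, cutoffs):
    """Place a bit score relative to the trusted and noise cutoffs.

    Assumption: a score equal to a cutoff counts as reaching it.
    """
    if score >= cutoffs.trusted:
        return "trusted"    # annotation may be transferred automatically
    if score < cutoffs.noise:
        return "noise"      # considered uninteresting
    return "uncertain"      # may or may not be a true hit; needs review
```

For example, with invented cutoffs `Cutoffs(trusted=200.0, noise=100.0)`, a score of 250 is classified as `"trusted"`, 150 as `"uncertain"` and 50 as `"noise"`.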
The set of proteins for which equivalog models have been built is heavily weighted toward those present in published complete microbial genomes. The behavior of the equivalog model subset against genomic data suggests that these models act substantially as intended. It can be expected from first principles that most equivalog families, unlike superfamilies and domain families, will have no more than one member in most small genomes. A small genome suggests strong selective pressures against maintaining redundancies in protein function. Maintenance of distinct isozymes should be the exceptional case. Of 516 equivalog HMMs in TIGRFAMs, only 95 hit a second protein in any of the first 25 different prokaryotes whose completed genomes became available. Sixty-three of those have a second hit in exactly one genome. Only three species (Escherichia coli, Bacillus subtilis and Synechocystis sp. strain PCC 6803) have as many as three hits to any equivalog HMM. These cases generally appear to identify functionally equivalent proteins, such as the three isozymes of phospho-2-dehydro-3-deoxyheptonate aldolase found in E.coli by HMM TIGR00034.
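The multi-copy check described above is a tally of trusted hits per model and genome. The following sketch shows one way such a tally might be computed. The input layout of (model, genome, protein) tuples and the function name are assumptions for illustration, and it does not reproduce the actual counts reported here.

```python
from collections import Counter


def multi_copy_summary(trusted_hits):
    """Summarize how often a model hits more than one protein per genome.

    trusted_hits: iterable of (model, genome, protein) tuples for hits
    scoring above the trusted cutoff (hypothetical input layout).
    """
    per_model_genome = Counter()
    for model, genome, _protein in set(trusted_hits):  # drop duplicate rows
        per_model_genome[(model, genome)] += 1

    genomes_with_extra = Counter()  # model -> genomes with two or more hits
    max_copies = {}                 # model -> largest hit count in a genome
    for (model, _genome), n in per_model_genome.items():
        max_copies[model] = max(max_copies.get(model, 0), n)
        if n > 1:
            genomes_with_extra[model] += 1

    return {
        "models_with_second_hit": sorted(genomes_with_extra),
        "second_hit_in_one_genome_only": sorted(
            m for m, count in genomes_with_extra.items() if count == 1
        ),
        "max_copies": max_copies,
    }
```

Under the expectation stated above, `models_with_second_hit` should be a small fraction of all equivalog models, and a model with a large `max_copies` value is a candidate for re-examination.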
A second test of equivalog model behavior is that the same region of the same protein should not be described by two different equivalog models. Of over 5500 predictions made by TIGRFAMs equivalog, equivalog domain and hypothetical equivalog models in 25 prokaryotic species, only B.subtilis PabB scores above the trusted cutoff for the same stretch of sequence to two different models, para-aminobenzoate synthase component I (TIGR00553) and anthranilate synthase component I, the TrpE protein of tryptophan biosynthesis (TIGR00564). Interestingly, the adjacent TrpG protein has been shown to be amphibolic, functioning in the synthesis of both tryptophan and para-aminobenzoic acid (11).
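This second test can be expressed as a search for overlapping hit regions on the same protein. The sketch below is illustrative only: the tuple layout, the `min_overlap` parameter and the function name are assumptions, and the coordinates in the usage note are invented rather than taken from the actual PabB alignments.

```python
from collections import defaultdict
from itertools import combinations


def overlapping_model_pairs(hits, min_overlap=1):
    """Find proteins whose same region is claimed by two different models.

    hits: iterable of (protein, model, start, end) for trusted hits, with
    1-based inclusive coordinates (hypothetical input layout).
    Returns sorted (protein, model_a, model_b, overlap_length) tuples.
    """
    by_protein = defaultdict(list)
    for protein, model, start, end in hits:
        by_protein[protein].append((model, start, end))

    conflicts = []
    for protein, regions in by_protein.items():
        for (m1, s1, e1), (m2, s2, e2) in combinations(regions, 2):
            if m1 == m2:
                continue  # repeated domains of one model are not a conflict
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap >= min_overlap:
                conflicts.append((protein, *sorted((m1, m2)), overlap))
    return sorted(conflicts)
```

With invented coordinates such as `("PabB", "TIGR00553", 10, 450)` and `("PabB", "TIGR00564", 30, 460)`, the function reports a single conflict between the two models with an overlap of 421 residues.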
For each protein scoring above the trusted cutoff of an equivalog model, a strong prediction is made that the protein functions as described for the model. Comparison of prediction based on HMM searches to prediction based on other means (annotated protein databases, literature references and new analyses of probable protein function) is used first to pick a model from several candidates. Study may reveal functional heterogeneity among closely related proteins such that no equivalog model can be made. A subfamily or superfamily model may be made instead. After an equivalog model has been created, examination of its predictions provides feedback on model performance. TIGRFAMs has been used in annotation of microbial genomic sequences at the Institute for Genomic Research (TIGR), such as for Vibrio cholerae (12). Comparison to results from manual annotation based on multiple pairwise alignments (see www.tigr.org/CMR2/db_assignmentextver2.html for an outline of homology-based annotation standards at TIGR) has validated predictions for many models and led to improvement of a few.
| USING TIGRFAMs |
|---|
TIGRFAMs may be downloaded for use as a library of HMMs for protein identification or searched for text or sequence matches at its web site, http://www.tigr.org/TIGRFAMs. For any protein that scores greater than the trusted cutoff to an equivalog-type TIGRFAMs model, a prediction is made not only that this sequence shares common ancestry with the members of the seed alignment, but also that it shares a common function. Pre-calculated results of HMM searches with TIGRFAMs models against a collection of completed microbial genomes can be found in the Comprehensive Microbial Resource (CMR) (13) accessible from the CMR homepage www.tigr.org/CMR. The assigned homology type (equivalog or other), associated annotations, seed alignments, full alignments and tables of hits to microbial genomic sequences are presented for each model.
The TIGRFAMs collection is intended to complement the Pfam A collection (1) of profile HMMs. It uses the same scoring system, the same suite of programs for generating and using HMMs and a similar representation for ancillary data. The two sets of models may be combined in a single library and searched simultaneously.
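Combining the two collections can be as simple as concatenating their library files. The sketch below assumes both libraries are HMMER ASCII flat files in which each model record ends with a line containing only `//`; the function name and file paths are hypothetical, and the libraries should be checked for a matching HMMER format version before they are merged.

```python
def combine_hmm_libraries(library_paths, combined_path):
    """Concatenate HMM library files and return the number of models written.

    Assumption: each model record in the input files is terminated by a
    line consisting only of "//".
    """
    model_count = 0
    with open(combined_path, "w") as out:
        for path in library_paths:
            with open(path) as src:
                for line in src:
                    if not line.endswith("\n"):
                        line += "\n"  # guard against a missing final newline
                    out.write(line)
                    if line.strip() == "//":
                        model_count += 1
    return model_count
```

A call such as `combine_hmm_libraries(["TIGRFAMs.hmm", "Pfam_A.hmm"], "combined.hmm")`, with placeholder file names, would write one library that a single search run can scan.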
Among Pfam HMMs with at least one trusted hit to a protein sequence from a complete prokaryotic genome, the models average just over 20 hits in 25 genomes. Among TIGRFAMs equivalog, equivalog domain and hypothetical equivalog HMMs, the average is just under 10 hits in 25 genomes. For proteins hit by both, the TIGRFAMs equivalog model usually describes a family equal in size to or smaller than the overlapping Pfam model, and only rarely larger. The TIGRFAMs model hit region averages 40% longer than the corresponding Pfam hit region and is only rarely shorter. Longer hits to fewer proteins are expected for models designed for functional identification of whole proteins rather than detection of domain and superfamily relationships.
Several TIGRFAMs equivalog models may describe different branches of a superfamily or domain family described by a single Pfam model. Other proteins from the same superfamily may fail to score above the cutoff value for any current equivalog model. This amounts to a negative prediction, a warning that calling these orphan proteins functionally equivalent to those within the scope of an equivalog model may be unwarranted.
Using the domain and superfamily classification systems of Pfam HMMs, PIR, COGs and other resources for the general classification of proteins has undeniable value in support of the prediction of protein function by homology. Addition of the specific functional predictions afforded by TIGRFAMs equivalog HMMs offers a hierarchical classification system that should enhance both the automation and the accuracy of protein annotation.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +1 301 838 0200; Fax: +1 301 838 0209; Email: owhite@tigr.org
| REFERENCES |
|---|
1 Bateman,A., Birney,E., Durbin,R., Eddy,S.R., Howe,K.L. and Sonnhammer,E.L. (2000) The Pfam Protein Families Database. Nucleic Acids Res., 28, 263–266.
2 Sonnhammer,E.L., Eddy,S.R. and Durbin,R. (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins, 28, 405–420.
3 Srinivasarao,G.Y., Yeh,L.S., Marzec,C.R., Orcutt,B.C. and Barker,W.C. (1999) PIR-ALN: a database of protein sequence alignments. Bioinformatics, 15, 382–390.
4 Henikoff,J.G., Greene,E.A., Pietrokovski,S. and Henikoff,S. (2000) Increased coverage of protein families with the Blocks Database servers. Nucleic Acids Res., 28, 228–230.
5 Tatusov,R.L., Galperin,M.Y., Natale,D.A. and Koonin,E.V. (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res., 28, 33–36. Updated article in this issue: Nucleic Acids Res. (2001), 29, 22–28.
6 Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
7 Pearson,W.R. and Lipman,D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.
8 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
9 Dayhoff,M.O. (1976) The origin and evolution of protein superfamilies. Fed. Proc., 35, 2132–2138.
10 Barker,W.C., Pfeiffer,F. and George,D.G. (1996) Superfamily classification in PIR-International Protein Sequence Database. Methods Enzymol., 266, 59–71.
11 Slock,J., Stahly,D.P., Han,C.Y., Six,E.W. and Crawford,I.P. (1990) An apparent Bacillus subtilis folic acid biosynthetic operon containing pab, an amphibolic trpG gene, a third gene required for synthesis of para-aminobenzoic acid, and the dihydropteroate synthase gene. J. Bacteriol., 172, 7211–7226.
12 Heidelberg,J.F., Eisen,J.A., Nelson,W.C., Clayton,R.A., Gwinn,M.L., Dodson,R.J., Haft,D.H., Hickey,E.K., Peterson,J.D., Umayam,L., Gill,S.R., Nelson,K.E., Read,T.D., Tettelin,H., Richardson,D., Ermolaeva,M.D., Vamathevan,J., Bass,S., Qin,H., Dragoi,I., Sellers,P., McDonald,L., Utterback,T., Fleishmann,R.D., Nierman,W.C. and White,O. (2000) DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature, 406, 477–483.
13 Peterson,J.D., Umayam,L.A., Hickey,E.K. and White,O. (2001) The Comprehensive Microbial Resource. Nucleic Acids Res., 29, 123–125.