Nucleic Acids Research, 2002, Vol. 30, No. 1 276-280
© 2002 Oxford University Press
The Pfam Protein Families Database
Wellcome Trust Sanger Institute and 1The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK, 2SIB, ISREC, 155, ch. des Boveresses, CH-1066 Epalinges s/Lausanne, Switzerland, 3Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA and 4Center for Genomics and Bioinformatics, Karolinska Institutet, S-171 77 Stockholm, Sweden
Received September 19, 2001; Accepted September 25, 2001.
| ABSTRACT |
|---|
Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at http://www.sanger.ac.uk/Software/Pfam/, in Sweden at http://www.cgb.ki.se/Pfam/, in France at http://pfam.jouy.inra.fr/ and in the US at http://pfam.wustl.edu/. The latest version (6.6) of Pfam contains 3071 families, which match 69% of proteins in SWISS-PROT 39 and TrEMBL 14. Structural data, where available, have been utilised to ensure that Pfam families correspond with structural domains, and to improve domain-based annotation. Predictions of non-domain regions are now also included. In addition to secondary structure, Pfam multiple sequence alignments now contain active site residue mark-up. New search tools, including taxonomy search and domain query, greatly add to the functionality and usability of the Pfam resource.
| INTRODUCTION |
|---|
Pfam is a manually curated collection of protein families available via the web and in flat file form (1). Genome projects, including those for human and fly, have used Pfam extensively for large-scale functional annotation of genomic data (2,3). The multiple sequence alignments around which Pfam families are built are important tools for understanding protein structure and function, and form the basis for techniques such as secondary structure prediction, fold recognition, phylogenetic analysis and mutation design. The latest version of Pfam (6.6) contains 3071 families that have matches to 69% of sequences and cover 49% of residues in the sequence database.
Each curated family in Pfam is represented by a seed and full alignment. The seed contains representative members of the family, while the full alignment contains all members of the family as detected with a profile hidden Markov model (HMM) constructed from the seed alignment using the HMMER2 software (http://hmmer.wustl.edu/). Full alignments can be large, with the top 20 families now each containing over 2500 sequences. The majority of known protein sequences come from just a few thousand protein families. However, in an effort to be comprehensive, the curated families in Pfam-A are augmented by Pfam-B, an automatically generated supplement derived from the ProDom database (4).
Pfam is available at four locations around the world, each providing a core set of functionality for accessing each family: in Europe at http://www.sanger.ac.uk/Software/Pfam/ (UK), http://www.cgb.ki.se/Pfam/ (Sweden) and http://pfam.jouy.inra.fr/ (France), and in the US at http://pfam.wustl.edu/. These web sites also provide documentation on Pfam alignments, mark-up and family annotation. The alignments in Pfam are in Stockholm format, which is described in detail at http://www.cgb.ki.se/cgb/groups/sonnhammer/Stockholm.html, and the HMMER software is documented at http://hmmer.wustl.edu/.
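As a rough illustration of the flat file form, the following Python sketch reads a single Stockholm-format alignment. The tiny alignment, the accession PF99999, the sequence names and the function name `parse_stockholm` are invented for illustration. Only `#=GF` annotation lines, sequence lines and the `//` terminator are handled; the format description cited above remains the authority on the full syntax.

```python
def parse_stockholm(text):
    """Parse one Stockholm alignment into (annotations, sequences)."""
    annotations = {}   # #=GF feature code -> list of values
    sequences = {}     # sequence name -> aligned string
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("# STOCKHOLM"):
            continue
        if line == "//":          # end of this alignment
            break
        if line.startswith("#=GF"):
            parts = line.split(None, 2)
            if len(parts) == 3:
                annotations.setdefault(parts[1], []).append(parts[2])
            continue
        if line.startswith("#"):
            continue              # other mark-up (#=GC, #=GS, #=GR) is ignored here
        name, aligned = line.split(None, 1)
        # Interleaved files repeat names, so append rather than overwrite.
        sequences[name] = sequences.get(name, "") + aligned.replace(" ", "")
    return annotations, sequences


# Hypothetical alignment, not a real Pfam entry.
EXAMPLE = """# STOCKHOLM 1.0
#=GF ID   Example_dom
#=GF AC   PF99999
#=GF TP   Domain
SEQ1_HUMAN/10-19   MKV.LAGHTR
SEQ2_MOUSE/12-21   MKIALAGH.R
#=GC SS_cons       HHHH..EEEE
//
"""

annotations, sequences = parse_stockholm(EXAMPLE)
```

Keeping per-file annotation and aligned sequences in two plain dictionaries is a deliberate simplification; per-column and per-residue mark-up would need their own structures.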
| Pfam ANNOTATION |
|---|
Pfam contains annotation of each family in the form of textual descriptions, links to other resources and literature references. Pfam is a member of the InterPro consortium (5) and has, like the other member databases, contributed annotation and families to the InterPro project. InterPro aims to provide an integrated view of the diverse protein family databases and one of its strengths is that a comprehensive set of annotations has been created through the merging of information from each member. The InterPro annotation is often more comprehensive than the Pfam annotation, and so is imported into the Pfam web pages and can be accessed by following links to InterPro. Further improvements in the quality of Pfam family annotation are outlined in the following sections.
| STRUCTURAL DATA IMPROVES DOMAIN BOUNDARIES AND ANNOTATION |
|---|
Domains are the structural and functional building blocks of proteins, and so where the data are available, structural information has been used to ensure that Pfam families correspond to single structural domains. The domain boundaries used are currently those defined by the SCOP database (6) and a new web-based tool allows direct cross-linking from domains on the SCOP web site to the corresponding Pfam families. This matching of families and domains enables enhanced understanding of the function of multi-domain proteins. For example, the OTCace family contains two related enzymes, aspartate carbamoyl transferase and ornithine carbamoyl transferase. Structural data have shown that these approximately 300 amino acid proteins consist of two structurally similar domains: the N-terminal domain binds carbamoyl phosphate and the C-terminal domain binds aspartate/ornithine. Each domain is now represented by a well annotated Pfam family. These two activities are also found at the C-terminus of glutamate-dependent carbamoyl phosphate synthase, a large multi-domain protein whose Pfam-based annotation also now clearly describes ATP-binding and oligomerisation domains among others. In some cases, splitting a single family into two or more structural domains also reveals additional instances of a particular domain, sometimes in novel protein contexts. For example, the cytochrome reductase family has been split into its constituent FAD and NAD binding domains, which are found more generally in a number of oxidoreductases. In all, approximately 300 Pfam families have been split into two or more domains, with the domain boundaries of many more refined to better match the available structural data.
To help clarify these changes, we have introduced a new annotation field type (TP). At present, a Pfam family can be classified as a family, domain, repeat or motif. Family type is the default class which simply states that the members are related. A domain is defined as an autonomous structural unit, or a reusable sequence unit that may be found in multiple protein contexts. In contrast, a repeat is not usually stable in isolation; rather, multiple tandem repeats are usually required to form a globular domain or extended structure. Motifs generally describe shorter sequence units found outside globular domains. Pfam release 6.6 contains 2032 families, 980 domains, 54 repeats and 5 motifs.
Protein–protein interaction data provide an important source of information for studying protein families and their cellular roles. We have used data from known three-dimensional protein complexes in the PDB (7) to infer protein–protein interactions between Pfam domains. NCBI BLAST2 (8) was used to find the correspondence between known structures (PDB chains) and sequences in the sequence databases. These data were used to analyse structural complexes between Pfam domains. An example of the graphical interface to these data provided on the UK web site is shown in Figure 1.
| NON-DOMAIN ANNOTATION |
|---|
Although Pfam attempts to classify proteins into domains where possible, some regions of proteins are not expected to form stable globular domains. These include regions of biased amino acid composition [termed low sequence complexity regions (9)], coiled-coils, transmembrane regions and signal peptides. However, these regions are of considerable interest and so predictions are reported on the UK web site. These predictions are pre-computed over the sequence database by the following third party programs: TMHMM (10) (transmembrane regions), SignalP (11) (signal peptide regions), ncoils (12) (coiled-coil regions) and SEG (9) (low complexity regions). The regions and associated scores are stored in the Pfam relational database (see below).
Non-Pfam regions require a different web-based graphical representation. In contrast with Pfam-A and Pfam-B regions, non-Pfam regions can overlap with each other and with Pfam regions. Overlapping regions are resolved for the graphical display by a hierarchical approach. The default hierarchy (signal peptide > Pfam-A > transmembrane > Pfam-B > low complexity > coiled-coil) is easily changed by the user, to enable the visualisation of different features.
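The hierarchical resolution described above can be sketched in Python as follows. This is only one plausible reading, not the web site's actual code: each residue is assigned to the highest-ranking region type covering it, and runs of identical assignments become display segments. The function name `resolve_overlaps` and the example coordinates are hypothetical, and adjacent regions of the same type are merged, which a real display might not want.

```python
DEFAULT_HIERARCHY = ["signal peptide", "Pfam-A", "transmembrane",
                     "Pfam-B", "low complexity", "coiled-coil"]


def resolve_overlaps(regions, hierarchy=DEFAULT_HIERARCHY):
    """regions: list of (kind, start, end), 1-based inclusive coordinates.

    Returns non-overlapping (kind, start, end) segments for display.
    """
    if not regions:
        return []
    rank = {kind: i for i, kind in enumerate(hierarchy)}  # 0 = highest priority
    length = max(end for _, _, end in regions)
    winner = [None] * (length + 1)                        # index 0 is unused
    # Paint the lowest priority first so higher priorities overwrite it.
    for kind, start, end in sorted(regions, key=lambda r: rank[r[0]], reverse=True):
        for pos in range(start, end + 1):
            winner[pos] = kind
    segments = []
    for pos in range(1, length + 1):
        kind = winner[pos]
        if kind is None:
            continue
        if segments and segments[-1][0] == kind and segments[-1][2] == pos - 1:
            segments[-1] = (kind, segments[-1][1], pos)   # extend the current run
        else:
            segments.append((kind, pos, pos))
    return segments


# Hypothetical regions on one protein.
regions = [("signal peptide", 1, 20), ("Pfam-A", 15, 100),
           ("low complexity", 90, 120), ("transmembrane", 95, 110)]
default_view = resolve_overlaps(regions)

# A user-modified hierarchy that promotes transmembrane regions.
tm_first = ["transmembrane", "signal peptide", "Pfam-A",
            "Pfam-B", "low complexity", "coiled-coil"]
tm_view = resolve_overlaps(regions, tm_first)
```

With the default hierarchy the transmembrane prediction is visible only where it extends beyond the Pfam-A match (residues 101 to 110); promoting it exposes the full predicted helix (95 to 110) at the expense of the Pfam-A segment.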
| ACTIVE SITE INFORMATION |
|---|
When viewing multiple sequence alignments it is useful to be able to see the sequence location of features of interest. Structural features have previously been incorporated into Pfam alignments, and more recently we have included active site residues. We have used the ACT_SITE feature table lines from SWISS-PROT as the data source. The alignments with added mark-up clearly show whether active site residues are conserved in all members of a family. The most frequent active site residues in SWISS-PROT are C, D, E, H, K, R, S and Y (Fig. 2). Other non-polar residues do occur, but at a much lower frequency. The glycine residues are found to be reactive bonds in trypsin inhibitors, which are not true active site residues. We can gain information about the nature of active site residue substitutions by examining the distribution of amino acids within columns that correspond to active site residues, also shown in Figure 2.
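The column-wise view of active site substitutions can be illustrated with a short Python sketch. It assumes an alignment held as a dictionary of equal-length gapped strings and a residue position counted along the aligned fragment of the annotated sequence; a position taken from a SWISS-PROT feature table would first need the fragment's start offset subtracted. The names and the toy alignment are hypothetical.

```python
from collections import Counter


def column_distribution(alignment, annotated_name, residue_index):
    """Count residues in the column holding a given residue of one sequence.

    residue_index is 1-based and counts only non-gap characters of the
    annotated sequence as it appears in the alignment.
    """
    aligned = alignment[annotated_name]
    seen = 0
    column = None
    for i, ch in enumerate(aligned):
        if ch not in ".-":            # skip gap characters
            seen += 1
            if seen == residue_index:
                column = i
                break
    if column is None:
        raise ValueError("residue index beyond sequence length")
    return Counter(seq[column] for seq in alignment.values())


# Toy alignment; sequence "A" carries the annotated active site residue (its third residue, H).
alignment = {
    "A": "MK.HSD",
    "B": "MKAHSD",
    "C": "M-AHTD",
    "D": "MKAQSD",
}
dist = column_distribution(alignment, "A", 3)
```

Here the annotated histidine is conserved in three of the four sequences and replaced by glutamine in the fourth.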
| TAXONOMY |
|---|
The taxonomy search tool (UK web site) allows the user to find Pfam entries specific to a group of organisms using a taxonomy query language. Complex queries are possible by using logical operators (AND, OR, NOT) and parentheses. The taxonomic information for each protein match is extracted from the SWISS-PROT/TrEMBL databases (13).
One use of this tool is to aid identification of putative drug targets. For example, as part of a screen for possible drug targets unique to the malaria parasite, one might want to identify all Pfam domains present in Plasmodium falciparum but not in the vertebrate host. The taxonomic query Plasmodium falciparum AND NOT Vertebrata returns 26 Pfam domains, 10 of which have already been postulated as drug targets against P.falciparum.
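A query of this kind can be evaluated with ordinary set operations. The Python sketch below is a minimal stand-in for the taxonomy query language, whose actual grammar is not specified here: it assumes the usual precedence (NOT binds tighter than AND, which binds tighter than OR), treats NOT as the complement within all known families, and uses a flat, invented mapping from taxon name to family identifiers rather than a real taxonomic tree.

```python
import re


def tokenize(query):
    """Split a query into operators, parentheses and (multi-word) taxon names."""
    tokens, words = [], []
    for part in re.findall(r"\(|\)|[^\s()]+", query):
        if part in ("AND", "OR", "NOT", "(", ")"):
            if words:
                tokens.append(" ".join(words))
                words = []
            tokens.append(part)
        else:
            words.append(part)
    if words:
        tokens.append(" ".join(words))
    return tokens


def evaluate(query, families_by_taxon):
    """Return the set of families satisfying the boolean taxonomy query."""
    universe = set().union(*families_by_taxon.values())
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():
        result = parse_not()
        while peek() == "AND":
            take()
            result = result & parse_not()
        return result

    def parse_not():
        if peek() == "NOT":
            take()
            return universe - parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        return set(families_by_taxon.get(tok, set()))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


# Invented data: family identifiers are placeholders, not real Pfam entries.
families_by_taxon = {
    "Plasmodium falciparum": {"FamA", "FamB", "FamC"},
    "Vertebrata": {"FamB", "FamD"},
    "Bacteria": {"FamC", "FamE"},
}
parasite_only = evaluate("Plasmodium falciparum AND NOT Vertebrata", families_by_taxon)
strict = evaluate("Plasmodium falciparum AND NOT (Vertebrata OR Bacteria)", families_by_taxon)
```

In this toy data set the first query keeps FamA and FamC, and additionally excluding bacterial families leaves only FamA.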
Using the taxonomy search software we have evaluated how the four major kingdoms (eukaryota, bacteria, archaea and viruses) are represented in the Pfam collection. The results are shown in Table 1. The data clearly show a bias towards eukaryotes, with over two-thirds of Pfam families containing a eukaryotic representative. A large number of these families are specific to eukaryotes, perhaps reflecting the invention of novel proteins in this kingdom, or possibly simply the biases in known protein sequence databases. Archaeal proteins occur in just over one-third of Pfam families, reflecting the relatively small number of sequences, and only 49 families are restricted solely to archaea. Viral sequences are found in 571 Pfam families.
| ANALYSIS OF DOMAIN ARCHITECTURE EVOLUTION |
|---|
Pfam is an excellent resource for studying the evolution of domain architecture in proteins. To make such analyses possible even by the casual user, we have equipped the Pfam web servers with a number of tools. NIFAS allows visual inspection of domain architectures in an evolutionary tree, and has been described previously (14). Two new tools have been developed and are described below.
Similar domain organisation
One of Pfam's main uses is to return the domain organisation of a protein of interest. This will inform the user which domain families it belongs to, as a valuable complement to traditional similarity searching. Another way to analyse sequence similarity is to look for proteins that share the same overall domain organisation, although these may not be the most sequence-similar proteins. This search functionality is now available on the Sweden web server. There is no obviously correct way to assign a score to similarity in domain organisation, so the proteins are heuristically ranked by the number of domains in common, from identical domain architectures, through re-ordered combinations, to smaller numbers of common domains. All proteins are listed as schematic graphics of the domain architectures, and their functional description may be shown.
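One way to express such a heuristic ranking is sketched below in Python. It is an interpretation of the ordering described above, not the server's implementation: identical architectures come first, then re-ordered combinations of the same domains, then partial matches sorted by the number of shared domains. The protein names are hypothetical and the domain labels are used only as illustrative strings.

```python
from collections import Counter


def similarity_key(query, candidate):
    """Sort key for a candidate architecture; smaller tuples rank higher."""
    q, c = Counter(query), Counter(candidate)
    shared = sum((q & c).values())    # domains in common, counting repeats
    if candidate == query:
        tier = 0                      # identical architecture
    elif q == c:
        tier = 1                      # same domains in a different order
    else:
        tier = 2                      # only some domains in common
    return (tier, -shared)


def rank_by_architecture(query, proteins):
    """proteins: dict of name -> list of domains in N- to C-terminal order."""
    hits = [(name, arch) for name, arch in proteins.items()
            if Counter(query) & Counter(arch)]     # at least one shared domain
    hits.sort(key=lambda h: similarity_key(query, h[1]) + (h[0],))
    return [name for name, _ in hits]


query = ["Fz", "Kringle", "Pkinase"]
proteins = {                          # hypothetical proteins
    "P4": ["Pkinase"],
    "P2": ["Kringle", "Fz", "Pkinase"],
    "P1": ["Fz", "Kringle", "Pkinase"],
    "P5": ["PH"],
    "P3": ["Fz", "Pkinase"],
}
ranking = rank_by_architecture(query, proteins)
```

Proteins sharing no domain with the query (P5 here) are dropped, and ties are broken alphabetically by name so that the ordering is reproducible.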
Domain query tool
To ask other questions about the presence or absence of certain domain architecture features, a general purpose tool has been installed on the Sweden web server. A menu-driven interface allows the user to specify a query consisting of a set of Pfam domains, with or without ordering or gap constraints, similar to regular expressions. The user can retrieve a list of all proteins with a certain domain combination motif, e.g. all proteins with an Fz, a kringle and a protein kinase domain. It is also possible to perform negative queries, e.g. retrieve all proteins with an Fz and protein kinase domain that do not have a kringle domain in between. The results are ordered with the same graphical schematics as the previous tool.
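The three kinds of query mentioned above (unordered combination, ordered combination and a negative constraint between two domains) can be sketched as simple predicates over a protein's domain list. This is an illustrative Python sketch under the assumption that each protein is stored as an ordered list of domain names; the function names and example proteins are hypothetical and do not reflect the menu-driven tool's internals.

```python
def has_all(arch, required):
    """True if every required domain occurs somewhere in the architecture."""
    return all(d in arch for d in required)


def has_in_order(arch, required):
    """True if the required domains occur in this order (gaps allowed)."""
    it = iter(arch)
    # `d in it` consumes the iterator up to the match, enforcing the order.
    return all(d in it for d in required)


def has_pair_without(arch, first, second, forbidden):
    """True if some `first` ... `second` pair has no `forbidden` domain between."""
    for i, d in enumerate(arch):
        if d != first:
            continue
        for j in range(i + 1, len(arch)):
            if arch[j] == forbidden:
                break                 # any later `second` would have it in between
            if arch[j] == second:
                return True
    return False


proteins = {                          # hypothetical proteins
    "P1": ["Fz", "Kringle", "Pkinase"],
    "P2": ["Fz", "Pkinase"],
    "P3": ["Pkinase", "Fz", "Kringle"],
    "P4": ["Kringle", "Fz", "Ig", "Pkinase"],
}
wanted = ["Fz", "Kringle", "Pkinase"]
any_order = sorted(n for n, a in proteins.items() if has_all(a, wanted))
in_order = sorted(n for n, a in proteins.items() if has_in_order(a, wanted))
no_kringle_between = sorted(n for n, a in proteins.items()
                            if has_pair_without(a, "Fz", "Pkinase", "Kringle"))
```

Gap constraints (for example, at most one domain between two query domains) could be added by bounding `j - i` in the inner loop.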
| CHANGES TO Pfam SEARCHING |
|---|
Previously, Pfam families were based on hits to either global (ls) or fragment (fs) model HMMs. The latter does not penalise long gaps, thus allowing partial matches to the HMM to be found. The decision on which model to use for a specific family was largely arbitrary, but influenced by membership criteria. For example, families such as the REV family of viral anti-repression trans-activator proteins contain many proteins annotated as fragments in SWISS-PROT/TrEMBL, many of which are missed by an HMM search using the ls model. However, with increased emphasis on domain families, it seems more intuitive to base families on the global model to match whole domains where possible. To solve this problem, we have recently rebuilt all Pfam families using both ls and fs model HMMs, and calculated membership from the global model, but adding hits to the fs model which were not considered significant matches to the ls model. This approach has led to a substantial increase in the number of protein matches to many families and also in coverage at the residue level.
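The membership rule just described can be written down in a few lines. The Python sketch below simplifies it to one score per sequence and model (real matches are per region) and assumes that each model has its own significance threshold; the function name, sequence identifiers, scores and thresholds are all invented for illustration.

```python
def combine_hits(ls_hits, fs_hits, ga_ls, ga_fs):
    """Merge ls and fs hits into one membership table.

    ls_hits / fs_hits: dict of sequence id -> bit score for that model.
    ga_ls / ga_fs: per-model thresholds (assumed to be given).
    Returns dict of sequence id -> (model, score).
    """
    # Global (ls) matches define the core membership.
    members = {seq: ("ls", score)
               for seq, score in ls_hits.items() if score >= ga_ls}
    # Fragment (fs) matches are added only where ls found nothing significant.
    for seq, score in fs_hits.items():
        if seq not in members and score >= ga_fs:
            members[seq] = ("fs", score)
    return members


# Invented scores: seqB is a fragment that only the fs model detects.
ls_hits = {"seqA": 45.0, "seqB": 12.0}
fs_hits = {"seqA": 40.0, "seqB": 30.0, "seqC": 5.0}
members = combine_hits(ls_hits, fs_hits, ga_ls=25.0, ga_fs=20.0)
```

Recording which model produced each match keeps the provenance visible, which matters when a fragment match is later superseded by a full-length sequence.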
A number of small format changes have been necessary as a result of this global change. Each model requires separate gathering thresholds (GA), and each has associated trusted (TC) and noise (NC) cutoffs. These numbers are all specified in the family annotation. Web-based searches now provide the option to search using global or fragment models.
As well as providing searches of the Pfam HMMs, the UK web site now offers the option to search against SMART (15) and TIGRFAM (16) HMM collections. Pfam, SMART and TIGRFAM domains may overlap so a tool has been provided to allow the display priority to be altered.
| THE Pfam RELATIONAL DATABASE |
|---|
The traditional implementation of Pfam, as a directory-structure of text files, one directory for each family, has proved to be extremely stable and robust. The revision control system has been used to provide an update history for the database, and allows us to re-create any release of the database. However, the text file based implementation is not well suited to performing cross-family queries on the live database, for example querying for all Pfam domains lying on a specific protein sequence. This kind of query is performed extensively in Pfam to enforce one of the key quality controls, the overlap criterion, which states that no residue of any protein can belong to more than one family. In the past, the only way to perform queries of this nature has been to search through the alignment files for every family, looking for occurrences of the sequence of interest. This is slow, and becomes slower as the number of families increases.
PfamRDB is a MySQL relational database consisting of approximately 10 tables adhering to a tight relational schema. It is updated in-phase with the live Pfam database to maintain absolute consistency. Some data (for example HMMs and alignments) are not currently stored in PfamRDB. PfamRDB also contains additional information, for example non-domain mark-up of sequences (low-complexity, coiled-coil, transmembrane and signal peptide, as described above), and also projections of Pfam domains onto solved structures in the PDB.
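To show why a relational layout suits cross-family queries, the following Python sketch uses an in-memory SQLite database as a stand-in for MySQL. The single `region` table, its column names and the example rows are hypothetical and are not the PfamRDB schema; the two queries correspond to "all domains on a given protein" and a check of the overlap criterion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per family match on a protein.
conn.execute("""CREATE TABLE region (
    family TEXT, protein TEXT, seq_start INTEGER, seq_end INTEGER)""")
conn.executemany("INSERT INTO region VALUES (?, ?, ?, ?)", [
    ("FamA", "PROT1", 10, 120),
    ("FamB", "PROT1", 130, 250),
    ("FamC", "PROT1", 240, 300),   # deliberately overlaps FamB on PROT1
    ("FamA", "PROT2", 5, 110),
])


def domains_on(protein):
    """All family matches on one protein, ordered along the sequence."""
    rows = conn.execute(
        "SELECT family, seq_start, seq_end FROM region "
        "WHERE protein = ? ORDER BY seq_start", (protein,))
    return rows.fetchall()


def overlaps():
    """Pairs of different families sharing at least one residue on a protein."""
    rows = conn.execute("""
        SELECT a.protein, a.family, b.family
        FROM region a JOIN region b
          ON a.protein = b.protein
         AND a.family < b.family
         AND a.seq_start <= b.seq_end
         AND b.seq_start <= a.seq_end
        ORDER BY a.protein, a.family, b.family""")
    return rows.fetchall()


prot1 = domains_on("PROT1")
violations = overlaps()
```

A single self-join replaces the scan through every family's alignment file, and an index on `protein` would keep the lookup fast as the number of families grows.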
| ACKNOWLEDGEMENTS |
|---|
We are grateful to the many people who have submitted data to Pfam. In particular, William Mifsud, Matthew Bashton and Nina Mian have added many of the new families in Pfam. We thank Christian Storm and Volker Hollich for implementing the NIFAS and Domain Query tools. We are also grateful to Roman Laskowski for allowing us to incorporate protein structure pictures from the PDBsum resource (17) and to Rob Finn for helpful comments.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +44 1223 494950; Fax: +44 1223 494919; Email: agb{at}sanger.ac.uk
| REFERENCES |
|---|
1 Sonnhammer,E.L.L., Eddy,S.R. and Durbin,R. (1997) Pfam: A comprehensive database of protein domain families based on seed alignments. Proteins, 28, 405–420.
2 Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D., Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F. et al. (2000) The genome sequence of Drosophila melanogaster. Science, 287, 2185–2195.
3 Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
4 Corpet,F., Servant,F., Gouzy,J. and Kahn,D. (2000) ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res., 28, 267–269.
5 Apweiler,R., Attwood,T.K., Bairoch,A., Bateman,A., Birney,E., Biswas,M., Bucher,P., Cerutti,L., Corpet,F., Croning,M.D. et al. (2000) InterPro--an integrated documentation resource for protein families, domains and functional sites. Bioinformatics, 16, 1145–1150.
6 Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.
7 Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) The protein data bank: a computer-based archival file for macromolecular structure. J. Mol. Biol., 112, 535–542.
8 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
9 Wootton,J.C. (1994) Sequences with unusual amino acid compositions. Curr. Opin. Struct. Biol., 4, 413–421.
10 Krogh,A., Larsson,B., von Heijne,G. and Sonnhammer,E.L.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol., 305, 567–580.
11 Nielsen,H., Brunak,S. and von Heijne,G. (1999) Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng., 12, 3–9.
12 Lupas,A., Van Dyke,M. and Stock,J. (1991) Predicting coiled coils from protein sequences. Science, 252, 1162–1164.
13 Bairoch,A. and Apweiler,R. (1999) The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res., 27, 49–54.
14 Storm,C.E. and Sonnhammer,E.L.L. (2001) NIFAS: visual analysis of domain evolution in proteins. Bioinformatics, 17, 343–348.
15 Ponting,C.P., Schultz,J., Milpetz,F. and Bork,P. (1999) SMART: identification and annotation of domains from signalling and extracellular protein sequences. Nucleic Acids Res., 27, 229–232. Updated article in this issue: Nucleic Acids Res. (2002), 30, 242–244.
16 Haft,D.H., Loftus,B.J., Richardson,D.L., Yang,F., Eisen,J.A., Paulsen,I.T. and White,O. (2001) TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Res., 29, 41–43.
17 Laskowski,R.A. (2001) PDBsum: summaries and analyses of PDB structures. Nucleic Acids Res., 29, 221–222.
||||
![]() |
J. Zucko, N. Skunca, T. Curk, B. Zupan, P.F. Long, J. Cullum, R.H. Kessin, and D. Hranueli Polyketide synthase genes and the natural products potential of Dictyostelium discoideum Bioinformatics, October 1, 2007; 23(19): 2543 - 2549. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. O. Allen, C. M. Fauron, P. Minx, L. Roark, S. Oddiraju, G. N. Lin, L. Meyer, H. Sun, K. Kim, C. Wang, et al. Comparisons Among Two Fertile and Three Male-Sterile Mitochondrial Genomes of Maize Genetics, October 1, 2007; 177(2): 1173 - 1192. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. Lunter HMMoC a compiler for hidden Markov models Bioinformatics, September 15, 2007; 23(18): 2485 - 2487. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Fujisawa, M. Ito, and T. A. Krulwich Three two-component transporters with channel-like properties have monovalent cation/proton antiport activity PNAS, August 14, 2007; 104(33): 13289 - 13294. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Vilches, R. Canals, M. Wilhelms, M. T. Salo, Y. A. Knirel, E. Vinogradov, S. Merino, and J. M. Tomas Mesophilic Aeromonas UDP-glucose pyrophosphorylase (GalU) mutants show two types of lipopolysaccharide structures and reduced virulence Microbiology, August 1, 2007; 153(8): 2393 - 2404. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. Beaussart, J. Weiner 3rd, and E. Bornberg-Bauer Automated Improvement of Domain ANnotations using context analysis of domain arrangements (AIDAN) Bioinformatics, July 15, 2007; 23(14): 1834 - 1836. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. del Val, P. Ernst, M Falkenhahn, C. Fladerer, K. H. Glatting, S. Suhai, and A. Hotz-Wagenblatt ProtSweep, 2Dsweep and DomainSweep: protein analysis suite at DKFZ Nucleic Acids Res., July 13, 2007; 35(suppl_2): W444 - W450. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Nikolajewa, R. Pudimat, M. Hiller, M. Platzer, and R. Backofen BioBayesNet: a web server for feature extraction and Bayesian network modeling of biological sequence data Nucleic Acids Res., July 13, 2007; 35(suppl_2): W688 - W693. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. G. Kann, S. L. Sheetlin, Y. Park, S. H. Bryant, and J. L. Spouge The identification of complete domains within protein sequences using accurate E-values for semi-global alignment Nucleic Acids Res., July 9, 2007; 35(14): 4678 - 4685. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. Wang, A. V. Perepelov, L. Feng, S. D. Shevelev, Q. Wang, S. N. Senchenkova, W. Han, Y. Li, A. S. Shashkov, Y. A. Knirel, et al. A group of Escherichia coli and Salmonella enterica O antigens sharing a common backbone structure Microbiology, July 1, 2007; 153(7): 2159 - 2167. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. W. Udwary, L. Zeigler, R. N. Asolkar, V. Singan, A. Lapidus, W. Fenical, P. R. Jensen, and B. S. Moore Genome sequencing reveals complex secondary metabolome in the marine actinomycete Salinispora tropica PNAS, June 19, 2007; 104(25): 10376 - 10381. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. M. Harrison, X. Lu, and H. R. Horvitz LIN-61, One of Two Caenorhabditis elegans Malignant-Brain-Tumor-Repeat-Containing Proteins, Acts With the DRM and NuRD-Like Protein Complexes in Vulval Development but Not in Certain Other Biological Processes Genetics, May 1, 2007; 176(1): 255 - 271. [Abstract] [Full Text] [PDF] |
||||
![]() |
U. Wegmann, M. O'Connell-Motherway, A. Zomer, G. Buist, C. Shearman, C. Canchaya, M. Ventura, A. Goesmann, M. J. Gasson, O. P. Kuipers, et al. Complete Genome Sequence of the Prototype Lactic Acid Bacterium Lactococcus lactis subsp. cremoris MG1363 J. Bacteriol., April 15, 2007; 189(8): 3256 - 3270. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Canals, S. Vilches, M. Wilhelms, J. G. Shaw, S. Merino, and J. M. Tomas Non-structural flagella genes affecting both polar and lateral flagella-mediated motility in Aeromonas hydrophila Microbiology, April 1, 2007; 153(4): 1165 - 1175. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Takahashi, T. Kumagai, K. Kitani, M. Mori, Y. Matoba, and M. Sugiyama Cloning and Characterization of a Streptomyces Single Module Type Non-ribosomal Peptide Synthetase Catalyzing a Blue Pigment Synthesis J. Biol. Chem., March 23, 2007; 282(12): 9073 - 9081. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Contreras, P. Gomez-Puertas, M. Iijima, K. Kobayashi, T. Saheki, and J. Satrustegui Ca2+ Activation Kinetics of the Two Aspartate-Glutamate Mitochondrial Carriers, Aralar and Citrin: ROLE IN THE HEART MALATE-ASPARTATE NADH SHUTTLE J. Biol. Chem., March 9, 2007; 282(10): 7098 - 7106. [Abstract] [Full Text] [PDF] |
||||
![]() |
Z. Zhao, J. H. Thomas, N. Chen, J. A. Sheps, and D. L. Baillie Comparative Genomics and Adaptive Selection of the ATP-Binding-Cassette Gene Family in Caenorhabditis Species Genetics, March 1, 2007; 175(3): 1407 - 1418. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. C. Y. Woo, M. Wang, S. K. P. Lau, H. Xu, R. W. S. Poon, R. Guo, B. H. L. Wong, K. Gao, H.-w. Tsoi, Y. Huang, et al. Comparative Analysis of Twelve Genomes of Three Novel Group 2c and Group 2d Coronaviruses Reveals Unique Group and Subgroup Features J. Virol., February 15, 2007; 81(4): 1574 - 1585. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. F. Collins, K. Beis, C. Dong, C. H. Botting, C. McDonnell, R. C. Ford, B. R. Clarke, C. Whitfield, and J. H. Naismith The 3D structure of a periplasm-spanning platform required for assembly of group 1 capsular polysaccharides in Escherichia coli PNAS, February 13, 2007; 104(7): 2390 - 2395. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Sletvold, P. J. Johnsen, G. S. Simonsen, B. Aasnaes, A. Sundsfjord, and K. M. Nielsen Comparative DNA Analysis of Two vanA Plasmids from Enterococcus faecium Strains Isolated from Poultry and a Poultry Farmer in Norway Antimicrob. Agents Chemother., February 1, 2007; 51(2): 736 - 739. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Canals, N. Jimenez, S. Vilches, M. Regue, S. Merino, and J. M. Tomas Role of Gne and GalE in the Virulence of Aeromonas hydrophila Serotype O34 J. Bacteriol., January 15, 2007; 189(2): 540 - 550. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Nikolski and D. J. Sherman Family relationships: should consensus reign?--consensus clustering for protein families Bioinformatics, January 15, 2007; 23(2): e71 - e76. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. R. Jefferson, T. P. Walsh, T. J. Roberts, and G. J. Barton SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein-Protein Interactions Nucleic Acids Res., January 12, 2007; 35(suppl_1): D580 - D589. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Y. Galperin The Molecular Biology Database Collection: 2007 update Nucleic Acids Res., January 12, 2007; 35(suppl_1): D3 - D4. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. Portugaly, N. Linial, and M. Linial EVEREST: a collection of evolutionary conserved protein domains Nucleic Acids Res., January 12, 2007; 35(suppl_1): D241 - D246. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. E. Snyder, N. Kampanya, J. Lu, E. K. Nordberg, H. R. Karur, M. Shukla, J. Soneja, Y. Tian, T. Xue, H. Yoo, et al. PATRIC: The VBI PathoSystems Resource Integration Center Nucleic Acids Res., January 12, 2007; 35(suppl_1): D401 - D406. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Feng, A. V. Perepelov, G. Zhao, S. D. Shevelev, Q. Wang, S. N. Senchenkova, A. S. Shashkov, Y. Geng, P. R. Reeves, Y. A. Knirel, et al. Structural and genetic evidence that the Escherichia coli O148 O antigen is the precursor of the Shigella dysenteriae type 1 O antigen and identification of a glucosyltransferase gene Microbiology, January 1, 2007; 153(1): 139 - 147. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Tian, J. Win, J. Song, R. van der Hoorn, E. van der Knaap, and S. Kamoun A Phytophthora infestans Cystatin-Like Protein Targets a Novel Tomato Papain-Like Apoplastic Protease Plant Physiology, January 1, 2007; 143(1): 364 - 377. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. M. Kim, L. J. Lu, Y. Xia, and M. B. Gerstein Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights Science, December 22, 2006; 314(5807): 1938 - 1941. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Klose and J. W. Kronstad The Multifunctional {beta}-Oxidation Enzyme Is Required for Full Symptom Development by the Biotrophic Maize Pathogen Ustilago maydis Eukaryot. Cell, December 1, 2006; 5(12): 2047 - 2061. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. A. Rampey, A. W. Woodward, B. N. Hobbs, M. P. Tierney, B. Lahner, D. E. Salt, and B. Bartel An Arabidopsis Basic Helix-Loop-Helix Leucine Zipper Protein Modulates Metal Homeostasis and Auxin Conjugate Responsiveness Genetics, December 1, 2006; 174(4): 1841 - 1857. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. G. Muszynski, T. Dam, B. Li, D. M. Shirbroun, Z. Hou, E. Bruggemann, R. Archibald, E. V. Ananiev, and O. N. Danilevskaya delayed flowering1 Encodes a Basic Leucine Zipper Protein That Mediates Floral Inductive Signals at the Shoot Apex in Maize Plant Physiology, December 1, 2006; 142(4): 1523 - 1536. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. S. Mendiratta, N. Sekulic, F. G. Hernandez-Guzman, B. E. Close, A. Lavie, and K. J. Colley A Novel {alpha}-Helix in the First Fibronectin Type III Repeat of the Neural Cell Adhesion Molecule Is Critical for N-Glycan Polysialylation J. Biol. Chem., November 24, 2006; 281(47): 36052 - 36059. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Eichhorn, F. Lessing, B. Winterberg, J. Schirawski, J. Kamper, P. Muller, and R. Kahmann A Ferroxidation/Permeation Iron Uptake System Is Required for Virulence in Ustilago maydis PLANT CELL, November 1, 2006; 18(11): 3332 - 3345. [Abstract] [Full Text] [PDF] |
||||