Pfam: clans, web tools and services
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; ¹Center for Genomics and Bioinformatics, Karolinska Institutet, S-171 77 Stockholm, Sweden; ²Department of Genetics, Howard Hughes Medical Institute, Washington University School of Medicine, St Louis, MO 63110, USA
*To whom correspondence should be addressed. Tel: +44 1223 495330; Fax: +44 1223 494919; Email: rdf{at}sanger.ac.uk
Received September 15, 2005. Revised October 19, 2005. Accepted October 28, 2005.
| ABSTRACT |
|---|
Pfam is a database of protein families that currently contains 7973 entries (release 18.0). A recent development in Pfam has enabled the grouping of related families into clans. Pfam clans are described in detail, together with the new associated web pages. Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are also presented. Pfam is available on the web in the UK (http://www.sanger.ac.uk/Software/Pfam/), the USA (http://pfam.wustl.edu/), France (http://pfam.jouy.inra.fr/) and Sweden (http://pfam.cgb.ki.se/).
| INTRODUCTION |
|---|
Pfam is a comprehensive database of protein families, containing 7973 families in the current release (18.0). Each family is manually curated and is represented by two multiple sequence alignments, two profile-Hidden Markov Models (profile-HMMs) and an annotation file. All data are available for download in flatfile format from the FTP sites linked from each Pfam website and also as a set of MySQL relational database files. Pfam families are periodically updated; on average, each family has been modified four times since its creation. The data and additional features are accessible via the four websites (http://www.sanger.ac.uk/Software/Pfam/, http://pfam.wustl.edu, http://pfam.jouy.inra.fr/ and http://Pfam.cgb.ki.se/). The structure and use of Pfam are well established and are documented elsewhere (1,2).
Several new features have been added to Pfam in the past 2 years. The main focus of this paper will be to describe a change in Pfam philosophy that has allowed us to group protein families into a hierarchical classification of clans. In the latter sections, we will describe new web tools and Pfam web services. An additional feature, iPfam (a sister database containing details of Pfam domain-domain interactions), has been described in a recent publication (3).
| THE GROWTH OF PFAM |
|---|
Pfam has increased by 1783 families since Pfam release 10.0 (1). Despite the near doubling of sequences in the underlying sequence database over the past 2 years, the fraction of sequences in UniProt (4) that match a Pfam family remains at 75%. One of the main uses of Pfam is genome annotation; an important measure is therefore the coverage of the non-redundant set of proteins encoded by a genome, called proteome coverage. Table 1 shows the increase in Pfam coverage of a selected set of proteomes since Pfam began 9 years ago. The proteomes analysed were the bacteria Escherichia coli and Rickettsia prowazekii, the archaeon Methanococcus jannaschii, the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans and Homo sapiens. Release 5.5 contained 2478 families/models, with an average protein coverage (the fraction of proteins with at least one hit to Pfam) of 53% and an average residue coverage (the fraction of residues matched to a Pfam family) of 34%. Despite a large increase in the number of models (an additional 3712 models) between releases 5.5 and 10.0, there was an average increase in protein coverage of 14% and residue coverage of 12%. A less than proportional increase in coverage is observed for the 1783 models added between Pfam 10.0 and 18.0, such that release 18.0 matches an average of 71% of sequences and 48% of residues. This illustrates a law of diminishing returns for adding new families.
Nevertheless, over the past 2 years there has been a steady increase in both measures of proteome coverage. Pfam now matches 60–84% of proteins in each proteome, compared with 47–62% 5 years ago and 57–76% just 2 years ago.
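The diminishing returns described in this section can be made concrete with a little arithmetic on the figures quoted above. The sketch below is only an illustration: it assumes that the quoted increases of 14% and 12% are percentage-point changes in the average coverage, and the function name is our own.

```python
# Figures quoted in the text; coverage values are averages over the six proteomes.
# Assumption: the "14%" and "12%" increases are percentage-point changes.
releases = [
    # (release, number of models, protein coverage %, residue coverage %)
    ("5.5", 2478, 53, 34),
    ("10.0", 2478 + 3712, 53 + 14, 34 + 12),
    ("18.0", 2478 + 3712 + 1783, 71, 48),
]


def gain_per_thousand_models(earlier, later):
    """Return (protein, residue) coverage gain in points per 1000 added models."""
    added = later[1] - earlier[1]
    return (
        1000 * (later[2] - earlier[2]) / added,
        1000 * (later[3] - earlier[3]) / added,
    )


for earlier, later in zip(releases, releases[1:]):
    protein, residue = gain_per_thousand_models(earlier, later)
    print(f"{earlier[0]} -> {later[0]}: "
          f"{protein:.2f} protein, {residue:.2f} residue points per 1000 models")
```

Under these assumptions, each 1000 models added between releases 5.5 and 10.0 bought roughly 3.8 points of protein coverage and 3.2 points of residue coverage, against roughly 2.2 and 1.1 points between releases 10.0 and 18.0.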
| PFAM CLANS |
|---|
One of the fundamental philosophies of Pfam is that new protein families are not allowed to overlap with existing Pfam entries (2). Thus, any residue in a given sequence can only appear in one Pfam family. Building new Pfam families and/or revisiting existing families often highlights two important points. (i) Many Pfam families are related and may have artificially high thresholds to stop them from overlapping. (ii) For some large, divergent families we cannot build a single HMM that detects all examples of the family. To resolve these issues, we have introduced Pfam clans.
What are Pfam clans?
A clan contains two or more Pfam families that have arisen from a single evolutionary origin. We use up to four independent pieces of evidence to help assess whether families are related: related structure, related function, significant matching of the same sequence to HMMs from different families and profile-profile comparisons. To perform profile-profile comparisons we use PRC 1.5.2 (downloadable from http://supfam.mrc-lmb.cam.ac.uk/PRC/). Currently, the presence of related structures and significant profile-profile comparison scores are our primary indicators of a relationship between families. From an analysis comparing Pfam families with a known structure, we deem a significant profile-profile comparison score as one with an E-value of <0.001. Profile-profile comparison E-values in the range of 0.1–0.001 can indicate a true relationship, but we require additional evidence to include the family in the clan.
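The E-value rule described above can be written as a small decision function. This is a minimal sketch rather than the curation procedure itself: the category labels and the treatment of values that fall exactly on a boundary are our assumptions.

```python
SIGNIFICANT_E = 0.001  # E-values below this are deemed significant
MARGINAL_E = 0.1       # upper end of the range that needs supporting evidence


def classify_evalue(evalue):
    """Classify a profile-profile comparison E-value.

    Labels are illustrative; boundary handling (0.001 and 0.1 inclusive in
    the marginal range) is an assumption.
    """
    if evalue < SIGNIFICANT_E:
        return "significant"
    if evalue <= MARGINAL_E:
        return "needs additional evidence"
    return "not significant"


for e in (1e-5, 0.05, 0.5):
    print(e, classify_evalue(e))
```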
After identifying a set of related families, our first aim is to try to merge them to make a single, comprehensive model that detects all of the proteins detected by the individual models. If this cannot be achieved, we create a clan with the maximum coverage using the minimum number of models. However, as mentioned previously, having a set of related families has historically led to artificially high thresholds to prevent the families from overlapping. To remedy this situation, thresholds are redefined to include the maximum number of significant matches while excluding all false positives. This can cause overlaps between the members of a clan. To maintain the no-overlap rule in Pfam, only the best scoring match is reported and presented in the set of full alignments. For example, the sequence Q5Z855 matches the ENTH domain (PF01417) with a score of 16.5 bits and the ANTH domain (PF07651) with a score of 327.5 bits, but the match only appears in the alignment of ANTH. We have updated the software that allows searching of Pfam models locally (pfam_scan.pl) so that it resolves overlaps between clan members in a similar fashion. The clans are annotated in an analogous way to Pfam entries, including a stable accession of the form CL0001, a short identifier, a one-line description and a summary of the clan. Where appropriate, cross-references to other databases are included (Figure 1A).
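The overlap rule can be sketched as follows. This is an illustration of the idea only, not the logic of pfam_scan.pl: the `Hit` record, the function names, the clan label and the residue coordinates are all hypothetical, and only the two bit scores come from the Q5Z855 example above.

```python
from collections import namedtuple

# Hypothetical record for one match of a family model to a sequence.
Hit = namedtuple("Hit", "family clan start end score")


def overlaps(a, b):
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_clan_overlaps(hits):
    """Keep only the best scoring hit among overlapping hits from one clan."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        clash = any(
            hit.clan is not None and hit.clan == k.clan and overlaps(hit, k)
            for k in kept
        )
        if not clash:
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


# Scores from the text; clan label and coordinates are made up.
hits = [
    Hit("PF01417", "clan_A", 20, 140, 16.5),   # ENTH
    Hit("PF07651", "clan_A", 25, 290, 327.5),  # ANTH
]
print([h.family for h in resolve_clan_overlaps(hits)])
```

Hits from families outside any clan (clan set to `None`) are never discarded by this function, which mirrors the fact that overlaps are only permitted between members of the same clan.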
As of Pfam 18.0, there were 172 clans, containing 1181 Pfam families. This represents 15% of Pfam families and, as these tend to be the larger protein families, they account for 31% of the domain hits in Pfam. Clans help us to improve the annotation of families. For example, knowing the 3D structure of a domain is an essential part of understanding the biology of that domain. Pfam clans are helping to identify previously undetected structural homologues. Currently, 66% of all families in clans contain at least one sequence with a known 3D structure. A further 418 families (30%) where a structural homologue is not found in the family are in a clan where at least one family contains a known 3D structure. In addition to relating families with unknown structure to those with a known structure, we can also use Pfam clans to improve annotation. Currently, there are 81 domains of unknown function (DUFs/UPFs) in clans. We can assign a putative function to 78 of these DUFs, based on similarities of these DUFs to characterized families in a clan.
Pfam clans provide a hierarchical view of a diverse range of protein families. How do Pfam clans relate to other classifications of protein families? There are many databases providing a hierarchical view of protein sequence space, using a variety of techniques, e.g. SCOP (5), CATH (6), SUPFAM (7), Protomap (8) and Superfamily (9). Below, we consider two of the more closely related databases.
SCOP is a classification of proteins of known structure (5). Many of the Pfam clans have a similar family membership to SCOP superfamilies, as both classification systems use structures to relate families. However, there is not a one-to-one correlation between Pfam clans and SCOP superfamilies. Profile-profile comparisons can detect significant similarities between families occurring in different SCOP superfamilies. For example, the TIM Barrel (CL0036) and NADP Rossmann (CL0063) clans cover multiple, related SCOP superfamilies (8 and 4, respectively). There are some Pfam clans, though, that are not as comprehensive as SCOP superfamilies, often owing to a lack of protein sequence coverage preventing the generation of effective seed alignments. The primary difference, though, is that the Pfam classification is not confined to those families with a known 3D structure. Indeed, some Pfam clans contain groups of related families where none of the members have a determined 3D structure. For example, the Major Facilitator Superfamily is a clan of 19 Pfam families and should be a high priority for structural genomics.
The SUPFAM database (7) classifies Pfam families into superfamilies based on SCOP and RPS-BLAST profile comparisons. A brief comparison of SUPFAM superfamilies to Pfam clans is of interest. At the time of writing, SUPFAM was based on Pfam version 14.0, making this comparison less straightforward. The automated approach used by SUPFAM means that many more Pfam families have been classified into SUPFAM superfamilies. Many of these additional superfamilies contain a single Pfam family. In such cases a Pfam clan would not be created, as Pfam clans are only created when there are two or more related Pfam families. Generally, corresponding clans/superfamilies have a similar membership. Where SUPFAM uses SCOP for the classification of families, the differences described above for SCOP are paralleled. In addition, where there are differences in domain definitions between SCOP and Pfam, there has been some misclassification of Pfam domains. Interestingly, Pfam clans with no known structures tend to have a larger membership than the corresponding SUPFAM superfamilies, even accounting for the 524 families added since release 14.0.
| ACCESSING CLAN DATA |
|---|
There are three different ways of accessing the clan information. First, there is an additional release flatfile, Pfam-C, which contains all of the clan information and a list of the Pfam families that are members of the clan. Second, all of the information is contained in the Pfam MySQL database that we make available for download. Third, clan information can be accessed via the websites. There are two web entry points to the clan information. A user can browse by a list of clans or follow links from clan member families (Table 2). For each clan, we display annotation and a list of Pfam families that constitute the clan (Figure 1A). In addition, there are links to two additional features: a clan relationship diagram and a clan alignment.
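For readers who prefer the flatfile route, a clan file can be read with a few lines of code. The sketch below rests on an assumption the text does not state: that Pfam-C uses Stockholm-style `#=GF` annotation lines, with `AC`, `ID`, `DE` and `MB` tags and `//` closing each record. The tags should be checked against the release documentation. The sample record is entirely made up.

```python
def parse_clan_flatfile(text):
    """Parse clan records from Stockholm-style '#=GF' lines (assumed layout)."""
    clans = []
    current = {"members": []}
    for line in text.splitlines():
        if line.startswith("//"):            # assumed end-of-record marker
            if "accession" in current:
                clans.append(current)
            current = {"members": []}
        elif line.startswith("#=GF"):
            parts = line.split(None, 2)
            if len(parts) < 3:
                continue
            tag, value = parts[1], parts[2].strip()
            if tag == "AC":
                current["accession"] = value
            elif tag == "ID":
                current["identifier"] = value
            elif tag == "DE":
                current["description"] = value
            elif tag == "MB":                # one member family per line
                current["members"].append(value.rstrip(";"))
    return clans


# Made-up record for illustration only; not taken from a Pfam release.
sample = """#=GF AC   CL9999
#=GF ID   Example_clan
#=GF DE   Hypothetical example clan
#=GF MB   PF99998;
#=GF MB   PF99999;
//
"""
print(parse_clan_flatfile(sample))
```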
The clan relationship diagrams show how the individual families are related to each other (Figure 1B). To produce these diagrams, we perform an all-against-all profile-profile comparison between the clan members. In the relationship diagram, Pfam families are graph nodes. Edges are added between nodes when a significant profile-profile score is observed between two nodes (represented by solid lines). After all edges have been added in this way, any nodes/domains that have no connecting edges are identified. Where possible, each detached node is connected by adding an edge between it and the node in the clan with the highest scoring profile-profile comparison that falls above the 0.001 threshold (i.e. E-values of 0.001–10). A dashed line represents these edges in the final graph. Domains that have been brought into the clan based on a structural similarity may remain detached, indicating that profile-profile comparisons are not able to detect all distant relationships. The E-values used to construct the edges are displayed. These E-values are also clickable links to a visualization of the profile-profile alignment (10) (Figure 1C).
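The edge-building procedure for the relationship diagram can be sketched as a short function. This is an illustration under stated assumptions, not the code behind the website: "highest scoring" is taken to mean the lowest E-value, the data structure (a dictionary keyed by unordered family pairs) is our own choice, and the family names and E-values in the example are invented.

```python
SIGNIFICANT_E = 0.001  # solid edges: E-value below this threshold
RESCUE_MAX_E = 10.0    # dashed edges: E-value between 0.001 and 10


def build_clan_graph(families, evalues):
    """Return (solid_edges, dashed_edges, detached_nodes).

    evalues maps frozenset({family_a, family_b}) to the E-value of their
    profile-profile comparison.
    """
    solid = {pair for pair, e in evalues.items() if e < SIGNIFICANT_E}
    connected = {fam for pair in solid for fam in pair}

    dashed = set()
    for fam in families:
        if fam in connected:
            continue
        candidates = [
            (e, pair) for pair, e in evalues.items()
            if fam in pair and SIGNIFICANT_E <= e <= RESCUE_MAX_E
        ]
        if candidates:
            # Assumption: the "highest scoring" comparison has the lowest E-value.
            _, best_pair = min(candidates, key=lambda c: c[0])
            dashed.add(best_pair)

    detached = [
        fam for fam in families
        if fam not in connected and not any(fam in pair for pair in dashed)
    ]
    return solid, dashed, detached


# Invented example with four families.
families = ["A", "B", "C", "D"]
evalues = {
    frozenset({"A", "B"}): 1e-8,
    frozenset({"A", "C"}): 0.02,
    frozenset({"B", "C"}): 0.5,
    frozenset({"A", "D"}): 50.0,
    frozenset({"B", "D"}): 30.0,
    frozenset({"C", "D"}): 12.0,
}
solid, dashed, detached = build_clan_graph(families, evalues)
print(solid, dashed, detached)
```

In this example A and B are joined by a solid edge, C is attached to A by a dashed edge, and D stays detached, as would a family placed in the clan on structural evidence alone.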
The clan alignment is an alignment of all the clan seed alignments (Figure 1D). Clan alignments are produced by an option in MUSCLE (11) that aligns two input multiple sequence alignments without altering their local alignments. Where more than two seed alignments are being aligned, we use the profile-profile comparison scores to guide the progressive alignment procedure so that the most similar seed alignments are aligned first, before more divergent alignments. These alignments are pushing the boundaries of what is feasible to align by sequence alone, so the alignments must be treated with some caution.
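One way to read "the most similar seed alignments are aligned first" is as a greedy ordering over the pairwise comparison scores. The sketch below shows that reading only. The actual guide procedure is not described in the text, so the greedy strategy, the function name and the example values are assumptions, and the MUSCLE step itself is not shown.

```python
def merge_order(families, evalues):
    """Return the order in which seed alignments would be combined.

    Starts with the most similar pair (lowest E-value), then repeatedly adds
    the remaining family that is most similar to any family already merged.
    evalues maps frozenset({a, b}) to a profile-profile E-value.
    """
    first_pair = min(evalues, key=lambda pair: evalues[pair])
    order = sorted(first_pair)
    remaining = set(families) - set(first_pair)

    def best_link(fam):
        return min(
            evalues.get(frozenset({fam, done}), float("inf")) for done in order
        )

    while remaining:
        nxt = min(sorted(remaining), key=best_link)
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Invented example values.
example_evalues = {
    frozenset({"A", "B"}): 1e-8,
    frozenset({"A", "C"}): 0.02,
    frozenset({"B", "C"}): 0.5,
    frozenset({"A", "D"}): 50.0,
    frozenset({"B", "D"}): 30.0,
    frozenset({"C", "D"}): 12.0,
}
print(merge_order(["A", "B", "C", "D"], example_evalues))
```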
| NEW WEBSITE FEATURES |
|---|
All of the Pfam mirror sites use the same underlying data and provide the same basic features. New tools and features based on the common dataset are being developed independently at the different sites. The new features that are available from the different mirror sites are described in Table 2.
Domain images
Influenced by SMART (12) and PROSITE (13), the graphical representation of domains has been updated on the website (Table 2). In addition to each domain having more of a 3D look, additional sequence features are now visualized. For example, we now include disulphide bonds and active sites (Figure 2A). The disulphide bond and active site data are derived from UniProt annotation (4). Pfam is often approached about the use of domain graphics in publications. To make our domain graphics more accessible and flexible, we have developed an interface so that users can customize a graphical view of a sequence. The user controls the style of a domain image using a simple XML file, enabling user-defined domains and sequence features to be added to the view (see XML schema at http://www.sanger.ac.uk/Software/Pfam/xml/pfamDomainGraphics.xsd). After uploading the XML file, the image is rendered and can be downloaded from the resulting page.
HMM logos
HMM logos are graphical representations of an HMM, which allow the visualization of its distinguishing features (14). HMM logos are provided for every Pfam family via the LogoMat-M tool (Table 2). As a variation of classical sequence logos, LogoMat-M uses relative entropy to identify residues that are of particular interest. The HMM logos are related to the profile-profile alignment logos shown in Figure 1C.
Coloured alignments
A new HTML/JavaScript multiple alignment view has been added to the website (Table 2). The alignments are shown in their natural linear format to avoid splitting up conserved blocks, which may happen in wrap-around formats (Figure 2B). In the new view, the sequence name column stays fixed when scrolling horizontally. Residues are coloured using the same methods as in Belvu (http://www.cgb.ki.se/cgb/groups/sonnhammer/Belvu.html), either according to conservation based on the average BLOSUM62 score or by residue type.
Domain query tools
The domain query tools have undergone significant reconstruction, making them more powerful, flexible and user friendly. Domain query functionality is now offered as a web interface, in the form of the PfamAlyzer Java applet, and as a web service for automated searches (Table 2). The web interface allows the user to select a set of Pfam domains, arrange them in order and, optionally, define the gap size between them. The domain query can be asked to widen the results, where appropriate, by exchanging a specified domain for all domains within its clan.
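To make the idea of an ordered domain query with gap sizes concrete, the following sketch tests one protein architecture against a query. It is not the query engine used by the websites: we assume that the queried domains must be adjacent in the architecture and that the gap is counted in residues between consecutive domains, and all names and coordinates are invented.

```python
def matches_query(architecture, query, max_gaps=None):
    """Check whether a protein's domain architecture satisfies a domain query.

    architecture: list of (family, start, end) tuples sorted by start.
    query: list of family names in the required order.
    max_gaps: optional list with one entry per consecutive query pair, giving
        the largest allowed number of residues between the two domains
        (None means no limit for that pair).
    """
    n = len(query)
    if max_gaps is None:
        max_gaps = [None] * (n - 1)
    for i in range(len(architecture) - n + 1):
        window = architecture[i:i + n]
        if [domain[0] for domain in window] != query:
            continue
        gaps_ok = all(
            limit is None or window[j + 1][1] - window[j][2] - 1 <= limit
            for j, limit in enumerate(max_gaps)
        )
        if gaps_ok:
            return True
    return False


# Invented architecture: three domains on one protein.
protein = [("DomA", 10, 100), ("DomB", 150, 240), ("DomC", 260, 300)]
print(matches_query(protein, ["DomB", "DomC"], max_gaps=[30]))
print(matches_query(protein, ["DomB", "DomC"], max_gaps=[10]))
```

Widening a query to a clan could be layered on top of this by replacing each queried family with the set of families in its clan before comparing names.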
PfamAlyzer is an applet that combines and extends many functions from the Pfam sites and integrates them into one tool. The domain query of PfamAlyzer is more user friendly and more powerful than the web interface. A graphical query language using drag and drop formulates the query. PfamAlyzer adds taxonomic analysis functionality to the domain query. Queries can be limited to specific taxonomic groups such as Chordata, which is especially helpful when studying architectures with a large number of members. PfamAlyzer can also display the query results as a species distribution that shows the resulting proteins as leaves on the species tree.
| ACCESSING PFAM USING WEB SERVICES |
|---|
We have recently implemented a range of web services that allow machine-interoperable access to Pfam. Currently, the web services cover the following basic operations: annotation of a UniProt sequence based on an accession or identifier, and access to Pfam family annotation. In the coming months, we plan to add services that allow the automatic download of alignments and searching of sequences. A basic Perl client is available for accessing these web services (Table 2).
The Pfam domain query described in the new website features section has also been implemented as a web service. An example to run within the JBossIDE (http://www.jboss.org/products/jbosside) is available at http://pfam.cgb.ki.se/pfamalyzer/example.zip.
| ACKNOWLEDGEMENTS |
|---|
We would like to thank all users who have contributed new families and annotation to Pfam. In addition, we would like to thank Matthew Fenech, Song Choon Lee, Rafaella Rossi, Arthur Wuster and Corin Yeates who have added many new families and clans to Pfam over the past 2 years. We would also like to thank Lorenzo Cerutti for maintaining the French Pfam website. This work was funded by The Wellcome Trust and an MRC (UK) E-science grant (G0100305). Funding to pay the Open Access publication charges for this article was provided by The Wellcome Trust.
Conflict of interest statement. None declared.
| REFERENCES |
|---|
1. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L.L., et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.
2. Sonnhammer, E.L.L., Eddy, S.R., Birney, E., Bateman, A., Durbin, R. (1998) Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res., 26, 320–322.
3. Finn, R.D., Marshall, M., Bateman, A. (2005) iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics, 21, 410–412.
4. Bairoch, A., Apweiler, R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res., 33, D154–D159.
5. Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J., Chothia, C., Murzin, A.G. (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res., 32, D226–D229.
6. Pearl, F., Todd, A., Sillitoe, I., Dibley, M., Redfern, O., Lewis, T., Bennett, C., Marsden, R., Grant, A., Lee, D., et al. (2005) The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res., 33, D247–D251.
7. Pandit, S.B., Bhadra, R., Gowri, V.S., Balaji, S., Anand, B., Srinivasan, N. (2004) SUPFAM: a database of sequence superfamilies of protein domains. BMC Bioinformatics, 5, 28.
8. Yona, G., Linial, N., Linial, M. (2000) ProtoMap: automatic classification of protein sequences and hierarchy of protein families. Nucleic Acids Res., 28, 49–55.
9. Madera, M., Vogel, C., Kummerfeld, S.K., Chothia, C., Gough, J. (2004) The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res., 32, D235–D239.
10. Schuster-Bockler, B. and Bateman, A. (2005) Visualizing profile-profile alignment: pairwise HMM logos. Bioinformatics, 21, 2912–2913.
11. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792–1797.
12. Letunic, I., Goodstadt, L., Dickens, N.J., Doerks, T., Schultz, J., Mott, R., Ciccarelli, F., Copley, R.R., Ponting, C.P., Bork, P. (2002) Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res., 30, 242–244.
13. Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P., Bairoch, A. (2004) Recent improvements to the PROSITE database. Nucleic Acids Res., 32, D134–D137.
14. Schuster-Bockler, B., Schultz, J., Rahmann, S. (2004) HMM Logos for visualization of protein families. BMC Bioinformatics, 5, 7.
15. Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P., Bork, P. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res., 32, D142–D144.
16. Pruess, M., Kersey, P., Apweiler, R. (2005) The Integr8 project - a resource for genomic and proteomic data. In Silico Biol., 5, 179–185.
||||
![]() |
G. Zheng, H. Li, C. Wang, Q. Sheng, H. Fan, S. Yang, B. Liu, J. Dai, R. Zeng, and L. Xie A platform to standardize, store, and visualize proteomics experimental data Acta Biochim Biophys Sin, April 1, 2009; 41(4): 273 - 279. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Hou, Z. Xu, W. Zhang, W. A. McLaughlin, D. A. Case, Y. Xu, and W. Wang Characterization of Domain-Peptide Interaction Interface: A Generic Structure-based Model to Decipher the Binding Specificity of SH3 Domains Mol. Cell. Proteomics, April 1, 2009; 8(4): 639 - 649. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Zhu, S. Dai, S. McClung, X. Yan, and S. Chen Functional Differentiation of Brassica napus Guard Cells and Mesophyll Cells Revealed by Comparative Proteomics Mol. Cell. Proteomics, April 1, 2009; 8(4): 752 - 766. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. W. Porter, Y. J. Zhu, D. T. Webb, and D. A. Christopher Novel thigmomorphogenetic responses in Carica papaya: touch decreases anthocyanin levels and stimulates petiole cork outgrowths Ann. Bot., April 1, 2009; 103(6): 847 - 858. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. E. Wozniak, C. Lee, and K. T. Hughes T-POP Array Identifies EcnR and PefI-SrgD as Novel Regulators of Flagellar Gene Expression J. Bacteriol., March 1, 2009; 191(5): 1498 - 1508. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. G. Gibbons and A. Rokas Comparative and Functional Characterization of Intragenic Tandem Repeats in 10 Aspergillus Genomes Mol. Biol. Evol., March 1, 2009; 26(3): 591 - 602. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Russo, J.-E. Schweitzer, T. Polen, M. Bott, and E. Pohl Crystal Structure of the Caseinolytic Protease Gene Regulator, a Transcriptional Activator in Actinomycetes J. Biol. Chem., February 20, 2009; 284(8): 5208 - 5216. [Abstract] [Full Text] [PDF] |
||||
![]() |
I. D. Hay, U. Remminghorst, and B. H. A. Rehm MucR, a Novel Membrane-Associated Regulator of Alginate Biosynthesis in Pseudomonas aeruginosa Appl. Envir. Microbiol., February 15, 2009; 75(4): 1110 - 1120. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Yin, F. Bangs, I. R. Paton, A. Prescott, J. James, M. G. Davey, P. Whitley, G. Genikhovich, U. Technau, D. W. Burt, et al. The Talpid3 gene (KIAA0586) encodes a centrosomal protein that is essential for primary cilia formation Development, February 15, 2009; 136(4): 655 - 664. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Lassmann, O. Frings, and E. L. L. Sonnhammer Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features Nucleic Acids Res., February 1, 2009; 37(3): 858 - 865. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. M. Cockell, L. Lo Presti, L. Cerutti, E. Cano Del Rosario, P. M. Hauser, and V. Simanis Functional Differentiation of tbf1 Orthologues in Fission and Budding Yeasts Eukaryot. Cell, February 1, 2009; 8(2): 207 - 216. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Wang and T. Okamoto Involvement of Polypyrimidine Tract-Binding Protein (PTB)-Related Proteins in Pollen Germination in Arabidopsis Plant Cell Physiol., February 1, 2009; 50(2): 179 - 190. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Brochier-Armanet, E. Talla, and S. Gribaldo The Multiple Evolutionary Histories of Dioxygen Reductases: Implications for the Origin and Evolution of Aerobic Respiration Mol. Biol. Evol., February 1, 2009; 26(2): 285 - 297. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Battesti and E. Bouveret Bacteria Possessing Two RelA/SpoT-Like Proteins Have Evolved a Specific Stringent Response Involving the Acyl Carrier Protein-SpoT Interaction J. Bacteriol., January 15, 2009; 191(2): 616 - 624. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. B. Lawrenz, J. D. Lenz, and V. L. Miller A Novel Autotransporter Adhesin Is Required for Efficient Colonization during Bubonic Plague Infect. Immun., January 1, 2009; 77(1): 317 - 326. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Iida, T. Waki, K. Nakamura, Y. Mukouzaka, and T. Kudo The GAF-Like-Domain-Containing Transcriptional Regulator DfdR Is a Sensor Protein for Dibenzofuran and Several Hydrophobic Aromatic Compounds J. Bacteriol., January 1, 2009; 191(1): 123 - 134. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. W. Sayers, T. Barrett, D. A. Benson, S. H. Bryant, K. Canese, V. Chetvernin, D. M. Church, M. DiCuccio, R. Edgar, S. Federhen, et al. Database resources of the National Center for Biotechnology Information Nucleic Acids Res., January 1, 2009; 37(suppl_1): D5 - D15. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. P. Davis, C. G. Murphy, C. A. Saraceni-Richards, M. C. Rosenstein, T. C. Wiegers, and C. J. Mattingly Comparative Toxicogenomics Database: a knowledgebase and discovery tool for chemical-gene-disease networks Nucleic Acids Res., January 1, 2009; 37(suppl_1): D786 - D792. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. B. K. Reddy, R. Riley, F. Wymore, P. Montgomery, D. DeCaprio, R. Engels, M. Gellesch, J. Hubble, D. Jen, H. Jin, et al. TB database: an integrated platform for tuberculosis research Nucleic Acids Res., January 1, 2009; 37(suppl_1): D499 - D508. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. J. V. Nordstrom, M. C. Lagerstrom, L. M. J. Waller, R. Fredriksson, and H. B. Schioth The Secretin GPCRs Descended from the Family of Adhesion GPCRs Mol. Biol. Evol., January 1, 2009; 26(1): 71 - 84. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. E. Donald and E. I. Shakhnovich SDR: a database of predicted specificity-determining residues in proteins Nucleic Acids Res., January 1, 2009; 37(suppl_1): D191 - D194. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Shulman-Peleg, R. Nussinov, and H. J. Wolfson RsiteDB: a database of protein binding pockets that interact with RNA nucleotide bases Nucleic Acids Res., January 1, 2009; 37(suppl_1): D369 - D373. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Y. K. Lam, E. Khurana, G. Fang, P. Cayting, N. Carriero, K.-H. Cheung, and M. B. Gerstein Pseudofam: the pseudogene families database Nucleic Acids Res., January 1, 2009; 37(suppl_1): D738 - D743. [Abstract] [Full Text] [PDF] |
||||
![]() |
V. G. Tarcea, T. Weymouth, A. Ade, A. Bookvich, J. Gao, V. Mahavisno, Z. Wright, A. Chapman, M. Jayapandian, A. Ozgur, et al. Michigan molecular interactions r2: from interacting proteins to pathways Nucleic Acids Res., January 1, 2009; 37(suppl_1): D642 - D646. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. A. Laskowski PDBsum new things Nucleic Acids Res., January 1, 2009; 37(suppl_1): D355 - D359. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Wilson, R. Pethica, Y. Zhou, C. Talbot, C. Vogel, M. Madera, C. Chothia, and J. Gough SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny Nucleic Acids Res., January 1, 2009; 37(suppl_1): D380 - D386. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. E. Mabey Gilsenan, G. Atherton, J. Bartholomew, P. F. Giles, T. K. Attwood, D. W. Denning, and P. Bowyer Aspergillus Genomes and the Aspergillus Cloud Nucleic Acids Res., January 1, 2009; 37(suppl_1): D509 - D514. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. J. Bordner Predicting small ligand binding sites in proteins using backbone structure Bioinformatics, December 15, 2008; 24(24): 2865 - 2871. [Abstract] [Full Text] [PDF] |
||||
![]() |
J.-P. Gagne, M. Isabelle, K. S. Lo, S. Bourassa, M. J. Hendzel, V. L. Dawson, T. M. Dawson, and G. G. Poirier Proteome-wide identification of poly(ADP-ribose) binding proteins and poly(ADP-ribose)-associated protein complexes Nucleic Acids Res., December 1, 2008; 36(22): 6959 - 6976. [Abstract] [Full Text] [PDF] |
||||
![]() |
M.-C. Shun, Y. Botbol, X. Li, F. Di Nunzio, J. E. Daigle, N. Yan, J. Lieberman, M. Lavigne, and A. Engelman Identification and Characterization of PWWP Domain Residues Critical for LEDGF/p75 Chromatin Binding and Human Immunodeficiency Virus Type 1 Infectivity J. Virol., December 1, 2008; 82(23): 11555 - 11567. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Chalkia, N. Nikolaidis, W. Makalowski, J. Klein, and M. Nei Origins and Evolution of the Formin Multigene Family That Is Involved in the Formation of Actin Filaments Mol. Biol. Evol., December 1, 2008; 25(12): 2717 - 2733. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Fukui, N. Nakagawa, Y. Kitamura, Y. Nishida, R. Masui, and S. Kuramitsu Crystal Structure of MutS2 Endonuclease Domain and the Mechanism of Homologous Recombination Suppression J. Biol. Chem., November 28, 2008; 283(48): 33417 - 33427. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. J. Robinson, C. Overy, and E. R. S. Kunji The mechanism of transport by mitochondrial carriers based on analysis of symmetry PNAS, November 18, 2008; 105(46): 17766 - 17771. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Zhang, J. Ju, H. Peng, F. Gao, C. Zhou, Y. Zeng, Y. Xue, Y. Li, B. Henrissat, G. F. Gao, et al. Biochemical and Structural Characterization of the Intracellular Mannanase AaManA of Alicyclobacillus acidocaldarius Reveals a Novel Glycoside Hydrolase Family Belonging to Clan GH-A J. Biol. Chem., November 14, 2008; 283(46): 31551 - 31558. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. B. Rosenblum, J. E. Stajich, N. Maddox, and M. B. Eisen Global gene expression profiles for life stages of the deadly amphibian pathogen Batrachochytrium dendrobatidis PNAS, November 4, 2008; 105(44): 17034 - 17039. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. F. Del Papa and M. Perego Ethanolamine Activates a Sensor Histidine Kinase Regulating Its Utilization in Enterococcus faecalis J. Bacteriol., November 1, 2008; 190(21): 7147 - 7156. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. W. Vasconcelos, G. W. Li, M. A. Lubkowitz, and M. A. Grusak Characterization of the PT Clade of Oligopeptide Transporters in Rice The Plant Genome, November 1, 2008; 1(2): 77 - 88. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. F. de Souza, V. Anantharaman, S. J. de Souza, L. Aravind, and F. J. Gueiros-Filho AMIN domains have a predicted role in localization of diverse periplasmic protein complexes Bioinformatics, November 1, 2008; 24(21): 2423 - 2426. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. Jadeau, E. Bechet, A. J. Cozzone, G. Deleage, C. Grangeasse, and C. Combet Identification of the idiosyncratic bacterial protein tyrosine kinase (BY-kinase) family signature Bioinformatics, November 1, 2008; 24(21): 2427 - 2430. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Somogyi, B. Sipos, Z. Penzes, E. Kurucz, J. Zsamboki, D. Hultmark, and I. Ando Evolution of Genes and Repeats in the Nimrod Superfamily Mol. Biol. Evol., November 1, 2008; 25(11): 2337 - 2347. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. S. Rajangam, M. Kumar, H. Aspeborg, G. Guerriero, L. Arvestad, P. Pansri, C. J.-L. Brown, S. Hober, K. Blomqvist, C. Divne, et al. MAP20, a Microtubule-Associated Protein in the Secondary Cell Walls of Hybrid Aspen, Is a Target of the Cellulose Synthesis Inhibitor 2,6-Dichlorobenzonitrile Plant Physiology, November 1, 2008; 148(3): 1283 - 1294. [Abstract] [Full Text] [PDF] |
||||