Nucleic Acids Research, 2000, Vol. 28, No. 1 263-266
© 2000 Oxford University Press
The Pfam Protein Families Database
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK, 1Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA and 2Center for Genomics Research, Karolinska Institutet, S-171 77 Stockholm, Sweden
Received October 1, 1999; Accepted October 4, 1999.
| ABSTRACT |
|---|
Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the WWW in the UK at http://www.sanger.ac.uk/Software/Pfam/ , in Sweden at http://www.cgr.ki.se/Pfam/ and in the US at http://pfam.wustl.edu/ . The latest version (4.3) of Pfam contains 1815 families. These Pfam families match 63% of proteins in SWISS-PROT 37 and TrEMBL 9. For complete genomes Pfam currently matches up to half of the proteins. Genomic DNA can be directly searched against the Pfam library using the Wise2 package.
| INTRODUCTION |
|---|
Pfam is a database of protein domain families. Pfam contains curated multiple sequence alignments for each family, as well as profile hidden Markov models (profile HMMs) for finding these domains in new sequences. Pfam contains functional annotation, literature references and database links for each family. There are two multiple alignments for each Pfam family: the seed alignment, which contains a relatively small number of representative members of the family, and the full alignment, which contains all members in the database that can be detected. All alignments use sequences taken from pfamseq, a non-redundant protein set composed of SWISS-PROT and SP-TrEMBL. The profile HMM is built from the seed alignment using the HMMER package (see http://hmmer.wustl.edu/ ), and the HMM is then used to search the pfamseq sequence database. All the matches found above the curated thresholds are aligned using the profile HMM to make the full alignment. The largest full alignment in Pfam, for the HIV GP120 glycoprotein, has >16 000 members, yet its seed alignment has only 24 representative members. The latest version of Pfam (4.3) contains 1815 families that match 63% of sequences and cover 45% of residues in the sequence database.
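The two coverage figures above measure different things: the fraction of sequences with at least one Pfam match, and the fraction of all residues that fall inside a matched domain. The following minimal Python sketch computes both. It assumes hits are supplied as hypothetical `(seq_id, start, end)` tuples with 1-based inclusive coordinates, plus a dict of sequence lengths. This is an illustration, not Pfam's own code.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hit record: (sequence id, start, end), 1-based and inclusive.
Hit = Tuple[str, int, int]


def coverage(lengths: Dict[str, int], hits: List[Hit]) -> Tuple[float, float]:
    """Return (fraction of sequences hit, fraction of residues covered)."""
    covered: Dict[str, Set[int]] = {}
    for seq_id, start, end in hits:
        # Store residue positions in a set so that overlapping hits on the
        # same sequence count each residue only once.
        covered.setdefault(seq_id, set()).update(range(start, end + 1))
    seq_fraction = len(covered) / len(lengths)
    residue_fraction = sum(len(p) for p in covered.values()) / sum(lengths.values())
    return seq_fraction, residue_fraction


lengths = {"P1": 100, "P2": 200, "P3": 100}
hits = [("P1", 1, 50), ("P2", 11, 110), ("P2", 101, 150)]
print(coverage(lengths, hits))  # (0.6666666666666666, 0.475)
```

Here P1 contributes 50 residues and P2 contributes 140 (positions 11-150), so 190 of 400 residues are covered.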
One of the main goals of Pfam was to aid the annotation of the Caenorhabditis elegans genome (1). Traditional approaches to large-scale sequence annotation use a pairwise sequence comparison method such as BLAST (2) to find similarity to proteins of known function. Annotations are then transferred from the protein of known function to the predicted protein. However, a pairwise similarity search does not give a clear indication of the domain structure of the proteins, and mistakes in annotation can result from not considering the domain organisation of proteins (3). For example, a protein may be misannotated as an enzyme when the similarity is only to a regulatory domain. Since its inception, Pfam has been developed to provide broad support for automated protein sequence classification and annotation. During the last year there have been significant changes and extensions to Pfam that further this role.
| Pfam WEBSITES |
|---|
There are currently three Pfam websites that are maintained independently. All of the sites contain core functionality, including searching the Pfam library of HMMs, searching the text annotation of Pfam and viewing the multiple alignments for each family. A few new features are not yet implemented on all sites.
The Pfam WWW servers can present the domain architecture of a protein graphically as beads on a string with a colour-coded and hyperlinked bead for each domain (4). To get an overview of the different domains involved in a family, it is possible to list graphical schematics for all family members in one view. By browsing the sequence annotation together with these schematics, one can get a rough idea of the evolution and functional implications of domain combinations. For instance, if a certain combination is uniquely associated with proteins of a distinct functional class, this would suggest that other proteins with this combination have the same function. Likewise, if a certain combination is present in a certain taxonomic group only, it may confer a function that is specific for those organisms. If a combination is found scattered over a range of taxa, this might suggest that it arose multiple times independently.
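The taxonomic reasoning above can be approximated by a simple tabulation. For each domain architecture (the ordered tuple of domains along the chain), record the set of taxa in which it occurs. The Python sketch below does this. The record layout, domain names and taxon labels are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input record: (protein id, taxon, domains in N- to C-terminal order).
Record = Tuple[str, str, List[str]]
Architecture = Tuple[str, ...]


def taxa_per_architecture(records: Iterable[Record]) -> Dict[Architecture, Set[str]]:
    """Map each domain architecture to the set of taxa in which it occurs."""
    table: Dict[Architecture, Set[str]] = defaultdict(set)
    for _protein, taxon, domains in records:
        # An ordered tuple, so DomA-DomB and DomB-DomA are different architectures.
        table[tuple(domains)].add(taxon)
    return dict(table)


def taxon_specific(table: Dict[Architecture, Set[str]]) -> Dict[Architecture, str]:
    """Architectures seen in only one taxon."""
    return {arch: next(iter(taxa)) for arch, taxa in table.items() if len(taxa) == 1}


records = [
    ("p1", "TaxonX", ["DomA", "DomB"]),
    ("p2", "TaxonY", ["DomC", "DomA", "DomB"]),
    ("p3", "TaxonY", ["DomC", "DomA", "DomB"]),
    ("p4", "TaxonY", ["DomA", "DomB"]),
]
print(taxon_specific(taxa_per_architecture(records)))
# {('DomC', 'DomA', 'DomB'): 'TaxonY'}
```

An architecture confined to one taxon is only a candidate for a lineage-specific function. Conversely, an architecture scattered across many taxa would need tree-based analysis before concluding that it arose more than once.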
For a more fine-grained analysis of the evolution of domain architectures, we have developed a novel tool that displays the graphical domain schematics of each sequence connected in an evolutionary tree. This tool is implemented as a Java applet, NIFAS, which at present is available from the Pfam servers in Sweden and the UK. It requires Netscape 4.5 or Internet Explorer 4.0. An example of a NIFAS view is shown in Figure 1. Trees are calculated from Pfam seed or full multiple alignments. We are currently using the neighbour-joining tree construction method in Clustalw (5). NIFAS can be used to analyse whether two or more domains have co-evolved or have recombined recently. For instance, the bacterial sugar transferase proteins PTF1_RHOCA (P23388) and PTF1_XANCP (P45597) are clustered together in Figure 1, in which the tree was calculated for the enzymatic domain. The NIFAS view based on the EIIA_2 domain shows the same two sequences grouped, and based on the HPR domain they are grouped too, although not as reliably (data not shown). This analysis thus suggests that an ancestral protein existed with all three domains, and the two present proteins are its direct descendants.
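For readers unfamiliar with neighbour-joining, the core loop is short. The Python sketch below is a textbook version using the usual Q-criterion. It is not the ClustalW implementation that NIFAS relies on. It returns only the unrooted topology as nested tuples and omits branch lengths. The distances in the example are invented.

```python
from itertools import combinations
from typing import Dict, List, Tuple, Union

Tree = Union[str, tuple]


def neighbour_joining(names: List[str], d: Dict[Tuple[str, str], float]) -> tuple:
    """Return an unrooted neighbour-joining topology as nested tuples.

    `d` gives each unordered pair of names once, as (a, b) -> distance.
    """
    dist: Dict[Tuple[Tree, Tree], float] = {}
    for (a, b), value in d.items():
        dist[(a, b)] = dist[(b, a)] = value
    nodes: List[Tree] = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {x: sum(dist[(x, y)] for y in nodes if y != x) for x in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])
        u = (i, j)
        for k in nodes:
            if k != i and k != j:
                # Distance from the new internal node u to each remaining node.
                dist[(u, k)] = dist[(k, u)] = (dist[(i, k)] + dist[(j, k)] - dist[(i, j)]) / 2
        nodes = [k for k in nodes if k != i and k != j] + [u]
    # Three nodes remain: they meet at one central node of the unrooted tree.
    return tuple(nodes)


distances = {
    ("A", "B"): 2, ("A", "C"): 7, ("A", "D"): 7,
    ("B", "C"): 7, ("B", "D"): 7, ("C", "D"): 2,
}
print(neighbour_joining(["A", "B", "C", "D"], distances))  # ('C', 'D', ('A', 'B'))
```

In a NIFAS-style comparison, you would build one such tree per domain from the corresponding columns of the alignment. You would then check whether the same pair of proteins (here A and B) groups together in each tree.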
Pfam-A is supplemented by Pfam-B; however, it has previously not been possible to annotate new proteins with matches to Pfam-B families. Protein sequences submitted to the UK Pfam search server are now automatically searched for Pfam-B domains as well as by the standard search for Pfam-A domains. This is done by using BLAST2 to search a database of the sequence fragments that form Pfam-B, followed by some post-processing of the results. Sequence segments matching a Pfam-B family can then be aligned to the family using a profile HMM. These profile HMMs are built on the fly; profile HMMs for Pfam-B families are not currently part of the Pfam distribution.
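The paper does not spell out the post-processing applied to the BLAST2 hits. One plausible step is shown here purely as an illustration: keep hits below an E-value cutoff, then merge overlapping query segments per Pfam-B family. Each merged segment could then be aligned to that family's profile HMM. The tuple layout, the family labels and the 1e-3 cutoff are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical BLAST hit: (Pfam-B family, query start, query end, E-value).
BlastHit = Tuple[str, int, int, float]


def candidate_segments(hits: List[BlastHit],
                       max_evalue: float = 1e-3) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping significant query segments, separately per Pfam-B family."""
    by_family: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for family, start, end, evalue in hits:
        if evalue <= max_evalue:  # assumed significance cutoff
            by_family[family].append((start, end))
    merged: Dict[str, List[Tuple[int, int]]] = {}
    for family, segs in by_family.items():
        segs.sort()
        out = [segs[0]]
        for start, end in segs[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlaps the previous segment: extend it
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[family] = out
    return merged


hits = [("PB_31", 10, 80, 1e-20), ("PB_31", 60, 120, 1e-8),
        ("PB_31", 200, 260, 1e-5), ("PB_1234", 5, 40, 0.5)]
print(candidate_segments(hits))  # {'PB_31': [(10, 120), (200, 260)]}
```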
A further enhancement of Pfam's utility is the addition of structural information to alignments with members of known 3D structure. As of Pfam 4.3, secondary structure and relative solvent accessibility values extracted from the DSSP database (6) are included as alignment markup (labels #=GR.. SS and #=GR.. SA). Furthermore, the corresponding entries in the PDB database (7) are referenced with residue coordinates. These references are linked to RasMol (8) for visualisation of the structural entity that corresponds to the Pfam domain.
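The #=GR lines follow the general Stockholm per-residue markup, `#=GR <seqname> <feature> <annotation>`, with one annotation character per alignment column. The minimal Python reader below collects a single feature, such as SS or SA. It assumes the alignment lines are already in memory. The example alignment and its annotation strings are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable


def residue_annotations(lines: Iterable[str], feature: str) -> Dict[str, str]:
    """Collect one per-residue (#=GR) feature, e.g. 'SS' or 'SA', per sequence.

    If an alignment is split into several blocks, the pieces are concatenated.
    """
    result: Dict[str, str] = defaultdict(str)
    for line in lines:
        if not line.startswith("#=GR"):
            continue
        # '#=GR', sequence name, feature tag, annotation string
        parts = line.split(maxsplit=3)
        if len(parts) == 4 and parts[2] == feature:
            result[parts[1]] += parts[3].strip()
    return dict(result)


example = """# STOCKHOLM 1.0
seq1/1-10          ACDEFGHIKL
#=GR seq1/1-10 SS  CCHHHHHHCC
#=GR seq1/1-10 SA  9876543210
//""".splitlines()
print(residue_annotations(example, "SS"))  # {'seq1/1-10': 'CCHHHHHHCC'}
```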
| CHANGES TO Pfam-B |
|---|
Pfam-B is an automatically generated supplement to Pfam-A that provides completeness in terms of coverage. Pfam-B has also been a useful source of new Pfam-A families. Pfam version 4 has seen a marked change in the way that Pfam-B is constructed. Up to and including the 3.4 release of Pfam, Pfam-B was constructed using the Domainer algorithm (9). This algorithm was based on a computationally expensive all-against-all BLAST comparison of the subsequences not found in Pfam-A. As a result, it became infeasible to reconstruct Pfam-B at every monthly release.
Since the 4.0 release of Pfam, Pfam-B has been constructed using the ProDom database of protein domain families (10), a high-quality, automatically generated protein families database constructed over the same underlying sequence database as Pfam (SWISS-PROT and TrEMBL). The new construction process for Pfam-B is fast, and as a result Pfam-B is now rebuilt at every monthly point release. In principle, Pfam-B is made from the parts of ProDom not covered by Pfam-A. Conceptually, the construction process is a function that takes a ProDom alignment as input and gives between zero and three Pfam-B families as output; this function is applied to every family in ProDom to form Pfam-B. In some cases, a ProDom family is effectively subsumed by one or more Pfam-A families; these ProDom families are ignored. In other cases, a ProDom family has no overlap with any Pfam-A family; these alignments become Pfam-B families without alteration. More interesting are cases where the ProDom alignment is truncated or bisected by a Pfam-A family, as shown in Figure 2. In these cases, the ProDom alignment is cut at the maximal extent of the intruding Pfam-A family to form one Pfam-B family (Fig. 2a) or, in the case of bisection, two or three Pfam-B families (Fig. 2b). Here, the domain boundaries of Pfam-A are used to infer domain boundaries for Pfam-B families. New Pfam-B alignments are only included if they are wider than 20 columns. Cases such as the bisection example shown in Figure 2 are particularly useful for Pfam curation. In such cases the ProDom family has more members than the Pfam-A family it subsumes, which suggests that the Pfam-A family may be missing some members. By adding a link from the new Pfam-B family 2 to the Pfam-A family, this potential deficit is flagged for future consideration.
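The column-cutting step can be sketched as a small function over half-open column intervals. It handles the no-overlap, subsumed, truncated and simple bisection cases, and applies the 20-column rule to newly cut pieces. It does not attempt the case in Fig. 2b that yields a third family. The interval representation is an assumption of this sketch.

```python
from typing import List, Tuple

MIN_WIDTH = 20  # newly cut Pfam-B alignments must be wider than 20 columns


def cut_prodom(width: int, pfam_a_regions: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Split a ProDom alignment of `width` columns around an intruding Pfam-A family.

    `pfam_a_regions` holds the Pfam-A match for each member as a half-open
    column interval [start, end); an empty list means no overlap.
    Returns the half-open column intervals kept as new Pfam-B families.
    """
    if not pfam_a_regions:
        return [(0, width)]  # no overlap: the ProDom family is used unaltered
    # Cut at the maximal extent of the Pfam-A family across all members.
    start = max(0, min(s for s, _ in pfam_a_regions))
    end = min(width, max(e for _, e in pfam_a_regions))
    flanks = [(0, start), (end, width)]
    return [(s, e) for s, e in flanks if e - s > MIN_WIDTH]


print(cut_prodom(100, [(42, 58), (40, 60)]))  # [(0, 40), (60, 100)]
```

A fully subsumed family (`[(0, 100)]`) yields an empty list. A truncated one (`[(0, 70)]`) yields the single 30-column piece `[(70, 100)]`.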
| QUALITY CONTROL |
|---|
Curating a large number of families presents many challenges for quality control, both for annotation and family membership. We have recently added a spell checking functionality to Pfam, allowing us to store a dictionary of words that are allowed in the free text lines of Pfam.
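The paper does not describe how the spell checker is implemented. A minimal dictionary-based check over free-text lines might look like the Python sketch below. The tokenising regular expression, the case-folding and the example dictionary are all assumptions. A custom dictionary matters here because domain names such as "Rieske" would be rejected by a generic checker.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumed tokenisation: runs of letters, optionally with one apostrophe part.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def unknown_words(free_text: Iterable[str], allowed: Set[str]) -> List[Tuple[int, str]]:
    """Return (line number, word) for each word missing from the allowed dictionary."""
    flagged = []
    for lineno, line in enumerate(free_text, start=1):
        for word in WORD.findall(line):
            if word.lower() not in allowed:  # dictionary stored in lower case
                flagged.append((lineno, word))
    return flagged


allowed = {"the", "rieske", "domain", "binds", "an", "iron", "sulphur", "cluster"}
print(unknown_words(["The Rieske domain binds an iron-sulphur clustr."], allowed))
# [(1, 'clustr')]
```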
Pfam-B now provides useful quality control for Pfam-A that was not available before. The comparison of Pfam-A and ProDom that occurs during Pfam-B construction has given Pfam an excellent way to detect missing members of families, and this has led to large increases in membership for some families. For example, in Pfam version 4.1 the Rieske domain family (PF00355) had 51 members. The ProDom comparison showed this family to be related to Pfam-B family 31. After some of the related Rieske domains from Pfam-B 31 were included in the seed alignment, the new Pfam-A profile HMM found 192 Rieske domains.
One of the most important quality controls is the overlap check, which requires that no residue of any protein belongs to more than one family. As new families are added to Pfam, an overlap with an existing family may signify that the new family is related to a pre-existing family; in this case we can extend the existing family to include the members of the new family. The overlap could also be due to incorrectly chosen domain boundaries for a family, which can easily be fixed by trimming the seed alignment. As Pfam's residue coverage increases, this control becomes more stringent and therefore more useful.
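The overlap rule can be checked protein by protein. Any two domain assignments from different families that share at least one residue constitute a violation. The Python sketch below implements this check. It assumes hypothetical `(family, start, end)` tuples with 1-based inclusive coordinates and placeholder family names.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical domain assignment: (family, start, end), 1-based and inclusive.
Domain = Tuple[str, int, int]


def overlap_violations(assignments: Dict[str, List[Domain]]) -> List[Tuple[str, Domain, Domain]]:
    """Find pairs of different families that claim a common residue in one protein."""
    violations = []
    for protein, domains in assignments.items():
        for a, b in combinations(sorted(domains, key=lambda d: d[1]), 2):
            different_family = a[0] != b[0]
            # Two closed intervals intersect if each starts before the other ends.
            shares_residue = a[1] <= b[2] and b[1] <= a[2]
            if different_family and shares_residue:
                violations.append((protein, a, b))
    return violations


assignments = {
    "P1": [("FamX", 1, 100), ("FamY", 90, 200)],  # residues 90-100 claimed twice
    "P2": [("FamX", 1, 100), ("FamX", 101, 200)],  # adjacent repeats: allowed
}
print(overlap_violations(assignments))
# [('P1', ('FamX', 1, 100), ('FamY', 90, 200))]
```

A pairwise check is quadratic in the number of domains per protein. This is acceptable because proteins rarely carry more than a few dozen domains.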
| SEARCHING GENOMES WITH Pfam |
|---|
An important goal of Pfam is to enable rapid automatic classification of predicted proteins into protein domain families. Pfam is used around the world as an aid to genomic annotation in one of two ways: (i) Pfam can be used to annotate protein translations using the HMMER software; or (ii) Pfam can be used to predict genes and annotate genomic DNA using the Wise2 package.
Although Pfam's coverage across the sequence databases is high (63%), we know that these databases are biased towards some protein families and organisms. Therefore it is useful to know what fraction of protein sequences in whole genome sequencing projects is annotated by Pfam analysis. Table 1 shows a summary of a Pfam/HMMER analysis of the predicted proteins from five representative genomes: the bacteria Escherichia coli and Rickettsia prowazekii, the nematode Caenorhabditis elegans, the yeast Saccharomyces cerevisiae and the archaeon Methanococcus jannaschii. Pfam identifies domains in 40-50% of the proteins in each genome, except for the archaeal M.jannaschii genome, where the fraction is somewhat lower (33%). This compares favourably to the fraction of proteins that can be annotated by standard pairwise BLAST analysis: for example, the worm genome project reported that ~42% of worm proteins had an informative BLAST similarity to a non-nematode protein (11).
Increasing the number of models in Pfam will increase the hit rate, of course. However, the expected return on such an effort is less than one might guess, as illustrated in Table 2. As a rough rule of thumb, the 10-20 largest protein families can account for ~10% of each genome. To cover 20%, it takes ~50-100 families; to get 30%, it takes ~150-300 families; and to get our full current coverage in each genome, it takes ~500-1000 families. The representation of a given protein family varies substantially from genome to genome. The top 10 families that account for 10% of one genome are not the same as the top 10 families in another genome. For example, the largest bacterial protein family is the ABC transporter family; in the two eukaryotes, the protein kinases are the most numerous. The last line in Table 2 gives the number of Pfam families that have one or more hits in one genome but no hits in any of the other genomes, indicating substantial non-overlap in the representation of Pfam families in various genomes. Also, 471 of the 1664 Pfam 4.2 models showed no hits to any proteins in these five genomes; many of these models cover protein families specific to vertebrates or viruses.
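The rule of thumb above can be computed for any genome. Rank families by how many of the genome's proteins they hit, then add families in that order until the union of hit proteins reaches the target fraction. The Python sketch below follows this approach. The paper does not say exactly how Table 2 was computed, so ranking by protein count (rather than domain count) is an assumption. The example data are invented.

```python
from collections import Counter
from typing import Dict, Set


def families_needed(hits: Dict[str, Set[str]], n_proteins: int, target: float) -> int:
    """Smallest k such that the k largest families cover >= target of the proteins.

    `hits` maps protein id -> set of families hitting it. Families are ranked
    by the number of proteins they hit in this genome. Returns -1 if the
    target is never reached.
    """
    size = Counter(fam for fams in hits.values() for fam in fams)
    ranked = [fam for fam, _ in size.most_common()]
    covered: Set[str] = set()
    for k, family in enumerate(ranked, start=1):
        # Proteins can carry several families, so count the union, not the sum.
        covered.update(p for p, fams in hits.items() if family in fams)
        if len(covered) / n_proteins >= target:
            return k
    return -1


hits = {"p1": {"A"}, "p2": {"A"}, "p3": {"A", "B"}, "p4": {"B"}, "p5": {"C"}}
print(families_needed(hits, n_proteins=10, target=0.35))  # 2
```

Family B adds only one new protein (p4) because p3 is already covered by A. This illustrates why the returns diminish as more families are added.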
A considerable amount of sequence data is released as raw genomic sequence. Analysis of this sequence is greatly hampered by (i) introns and (ii) frameshifting sequencing errors in the DNA sequence, which make it difficult to deduce the protein sequences of the genes contained in the genomic DNA. It is estimated that ~50% of metazoan exons are predicted correctly when standard gene prediction programs are run (T.Hubbard, personal communication). If Pfam is searched against protein translations of genomic DNA, valid protein domains are in many cases missed because of inaccurate gene prediction. The GeneWise algorithm (12) allows a protein profile HMM to be compared directly to genomic DNA without the need for any prior gene prediction, while allowing for potential frameshifting sequencing errors. GeneWise contains a gene prediction method, which it integrates with the profile HMM during the comparison. Tests of GeneWise show that it produces 98% accurate gene predictions in the region of the homology (R.Guigo, personal communication). Unfortunately, GeneWise is very CPU-intensive: comparing 100 kb of DNA sequence to the entire Pfam library takes ~30 h on a Unix server machine (Compaq Alpha).
To allow the large scale application of Pfam to genomic DNA we used a pre-filter that incorporated a Perl script called halfwise based on BLASTX (2) to cut the running time down to an average of 2 h. The BLAST search is of the DNA sequence against a constructed protein database which attempts to represent Pfam hits sensibly. This is made by taking the Pfam full alignments and making them non-redundant to a maximum pairwise identity of 75%. This pre-filter is run with a low threshold to select candidate profile HMMs to be compared to the DNA sequence using GeneWise. In tests, the sensitivity loss of using this pre-filter was ~10%, and it also showed greater robustness towards low complexity regions in the genomic data, such as unmasked microsatellite repeats. Halfwise is part of the Wise2 package that provides access to the GeneWise algorithm in a number of different forms (see http://www.sanger.ac.uk/Software/Wise2/ ).
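The redundancy reduction applied to the Pfam full alignments can be sketched as a greedy pass. A sequence is kept only if it is at most 75% identical to every sequence already kept. The paper states only the 75% threshold. The identity definition (computed over columns where neither sequence has a gap) and the reliance on input order are assumptions of this Python sketch. The BLASTX search and the GeneWise step are not shown.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Identity over alignment columns where neither sequence has a gap ('-' or '.')."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-." and y not in "-."]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def make_nonredundant(aligned: List[Tuple[str, str]], max_identity: float = 75.0) -> List[str]:
    """Greedily keep sequences no more than `max_identity` % identical to any kept one."""
    kept: List[Tuple[str, str]] = []
    for name, seq in aligned:  # input order decides which representative survives
        if all(percent_identity(seq, other) <= max_identity for _, other in kept):
            kept.append((name, seq))
    return [name for name, _ in kept]


aligned = [("s1", "ACDEFGHIKL"), ("s2", "ACDEFGHIKM"),
           ("s3", "ACDWWWWIKL"), ("s4", "AC--FGHIKL")]
print(make_nonredundant(aligned))  # ['s1', 's3']
```

Here s2 (90% identical to s1) and s4 (100% over its ungapped columns) are dropped, while s3 (60%) is kept.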
| AVAILABILITY OF Pfam |
|---|
Pfam is available on the WWW in Europe at http://www.sanger.ac.uk/Software/Pfam/ and http://www.cgr.ki.se/Pfam/ , and in the US at http://pfam.wustl.edu/ . The Pfam distribution contains a number of files: Pfam-A.seed and Pfam-A.full contain the seed and full alignments with annotation in Stockholm format; Pfam is a file containing the library of Pfam profile HMMs; PfamFrag is a library of profile HMMs designed specifically to find matches to protein fragments; SwissPfam is a file containing the domain organisation for each protein in the database; Pfam-B contains the data for Pfam-B families in Stockholm format; diff is a file containing the changes between releases to allow incremental updates of Pfam-derived data; pfamseq contains the underlying sequence database, in FASTA format, from which all sequences in Pfam are taken.
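Pfam-A.seed and Pfam-A.full are multi-record Stockholm files, so comparing the seed and full size of a family needs only a short reader. The Python sketch below counts distinct sequence names per family. It assumes each record carries a `#=GF AC` line and ends with `//`. The accessions and sequences in the example are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def family_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct sequences per family in a multi-record Stockholm file."""
    sizes: Dict[str, int] = {}
    accession: Optional[str] = None
    names: Set[str] = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#=GF AC"):
            accession = line.split()[2]
        elif line.strip() == "//":  # end of one family's record
            if accession is not None:
                sizes[accession] = len(names)
            accession, names = None, set()
        elif line.strip() and not line.startswith("#"):
            # Sequence line: name, then aligned residues. A set handles
            # alignments split into several blocks.
            names.add(line.split()[0])
    return sizes


example = """# STOCKHOLM 1.0
#=GF ID   ExampleFamOne
#=GF AC   PF_EXAMPLE1
seqA/1-50   ACDE
seqB/3-52   ACDF
//
# STOCKHOLM 1.0
#=GF ID   ExampleFamTwo
#=GF AC   PF_EXAMPLE2
seqC/1-40   MKLV
//""".splitlines()
print(family_sizes(example))  # {'PF_EXAMPLE1': 2, 'PF_EXAMPLE2': 1}
```

To compare seed and full sizes for each accession, run the same reader over both files.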
| ACKNOWLEDGEMENTS |
|---|
We are grateful to the many people who have submitted data to Pfam. In particular we thank Matthew Bashton who added many of the new families in Pfam, Christian Storm for writing NIFAS, and Michael Åsman and Mats Jonsson for adding new features to the websites.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +44 1223 494950; Fax: +44 1223 494919; Email: agb@sanger.ac.uk
| REFERENCES |
|---|
1 Sonnhammer,E.L.L., Eddy,S.R. and Durbin,R. (1997) Proteins, 28, 405-420.
2 Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) J. Mol. Biol., 215, 403-410.
3 Galperin,M.Y. and Koonin,E.V. (1998) In Silico Biol., 1, 55-67.
4 Bateman,A., Birney,E., Durbin,R., Eddy,S.R., Finn,R.D. and Sonnhammer,E.L.L. (1999) Nucleic Acids Res., 27, 260-262.
5 Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673-4680.
6 Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577-2637.
7 Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535-542.
8 Sayle,R.A. and Milner-White,E.J. (1995) Trends Biochem. Sci., 20, 374.
9 Sonnhammer,E.L.L. and Kahn,D. (1994) Protein Sci., 3, 482-492.
10 Corpet,F., Gouzy,J. and Kahn,D. (1999) Nucleic Acids Res., 27, 263-267. Updated article in this issue: Nucleic Acids Res. (2000), 28, 267-269.
11 The C.elegans Sequencing Consortium (1998) Science, 282, 2012-2018.
12 Birney,E. and Durbin,R. (1997) ISMB, 5, 56-64.
||||
![]() |
M. Kim, H. Yang, S.-K. Kim, P. A. Reche, R. S. Tirabassi, R. E. Hussey, Y. Chishti, J. G. Rheinwald, T. J. Morehead, T. Zech, et al. Biochemical and Functional Analysis of Smallpox Growth Factor (SPGF) and Anti-SPGF Monoclonal Antibodies J. Biol. Chem., June 11, 2004; 279(24): 25838 - 25848. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Srikumar, L. G. Mikael, P. D. Pawelek, A. Khamessan, B. F. Gibbs, M. Jacques, and J. W. Coulton Molecular cloning of haemoglobin-binding protein HgbA in the outer membrane of Actinobacillus pleuropneumoniae Microbiology, June 1, 2004; 150(6): 1723 - 1734. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. S. Baliga, S. J. Bjork, R. Bonneau, M. Pan, C. Iloanusi, M. C.H. Kottemann, L. Hood, and J. DiRuggiero Systems Level Insights Into the Stress Response to UV Radiation in the Halophilic Archaeon Halobacterium NRC-1 Genome Res., June 1, 2004; 14(6): 1025 - 1035. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Comfort and R. T. Clubb A Comparative Genome Analysis Identifies Distinct Sorting Pathways in Gram-Positive Bacteria Infect. Immun., May 1, 2004; 72(5): 2710 - 2722. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. F. Wright, J. Christodoulou, C. M. Dobson, and J. Clarke The importance of loop length in the folding of an immunoglobulin domain Protein Eng. Des. Sel., May 1, 2004; 17(5): 443 - 453. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Seshadri, G. S. A. Myers, H. Tettelin, J. A. Eisen, J. F. Heidelberg, R. J. Dodson, T. M. Davidsen, R. T. DeBoy, D. E. Fouts, D. H. Haft, et al. Comparison of the genome of the oral pathogen Treponema denticola with other spirochete genomes PNAS, April 13, 2004; 101(15): 5646 - 5651. [Abstract] [Full Text] [PDF] |
||||
![]() |
U. Vitt, D. Gietzen, K. Stevens, J. Wingrove, S. Becha, S. Bulloch, J. Burrill, N. Chawla, J. Chien, M. Crawford, et al. Identification of Candidate Disease Genes by EST Alignments, Synteny, and Expression and Verification of Ensembl Genes on Rat Chromosome 1q43-54 Genome Res., April 1, 2004; 14(4): 640 - 650. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Liu, M. Gerstein, and D. M. Engelman Transmembrane protein domains rarely use covalent domain recombination as an evolutionary mechanism PNAS, March 9, 2004; 101(10): 3495 - 3497. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Fernandez, R. Scott, and R. S. Berry The nonconserved wrapping of conserved protein folds reveals a trend toward increasing connectivity in proteomic networks PNAS, March 2, 2004; 101(9): 2823 - 2827. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. DLAKIC and D. TOLLERVEY The Noc proteins involved in ribosome synthesis and export contain divergent HEAT repeats RNA, March 1, 2004; 10(3): 351 - 354. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Baiker, C. Bagowski, H. Ito, M. Sommer, L. Zerboni, K. Fabel, J. Hay, W. Ruyechan, and A. M. Arvin The Immediate-Early 63 Protein of Varicella-Zoster Virus: Analysis of Functional Domains Required for Replication In Vitro and for T-Cell and Skin Tropism in the SCIDhu Model In Vivo J. Virol., February 1, 2004; 78(3): 1181 - 1194. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. M. Choi, E. Y. Park, J. H. Kim, S. K. Chang, and Y. Cho Probing the Functional Importance of the Hexameric Ring Structure of RNase PH J. Biol. Chem., January 2, 2004; 279(1): 755 - 764. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Bateman, L. Coin, R. Durbin, R. D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon, E. L. L. Sonnhammer, et al. The Pfam protein families database Nucleic Acids Res., January 1, 2004; 32(90001): D138 - 141. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. L. Vettore, F. R. da Silva, E. L. Kemper, G. M. Souza, A. M. da Silva, M. I. T. Ferro, F. Henrique-Silva, E. A. Giglioti, M. V.F. Lemos, L. L. Coutinho, et al. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane Genome Res., December 1, 2003; 13(12): 2725 - 2735. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. Gu, J. E. Jackman, A. J. Lohan, M. W. Gray, and E. M. Phizicky tRNAHis maturation: An essential yeast protein catalyzes addition of a guanine nucleotide to the 5' end of tRNAHis Genes & Dev., December 1, 2003; 17(23): 2889 - 2901. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. Eisenhaber, M. Wildpaner, C. J. Schultz, G. H.H. Borner, P. Dupree, and F. Eisenhaber Glycosylphosphatidylinositol Lipid Anchoring of Plant Proteins. Sensitive Prediction from Sequence- and Genome-Wide Studies for Arabidopsis and Rice Plant Physiology, December 1, 2003; 133(4): 1691 - 1701. [Abstract] [Full Text] |
||||
![]() |
B. Borud, G. Mellgren, J. Lund, and M. Bakke Cloning and Characterization of a Novel Zinc Finger Protein that Modulates the Transcriptional Activity of Nuclear Receptors Mol. Endocrinol., November 1, 2003; 17(11): 2303 - 2319. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. D. Thorell, K. Stenklo, J. Karlsson, and T. Nilsson A Gene Cluster for Chlorate Metabolism in Ideonella dechloratans Appl. Envir. Microbiol., September 1, 2003; 69(9): 5585 - 5592. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Wei, T. W. Southworth, H. Kloster, M. Ito, A. A. Guffanti, A. Moir, and T. A. Krulwich Mutational Loss of a K+ and NH4+ Transporter Affects the Growth and Endospore Formation of Alkaliphilic Bacillus pseudofirmus OF4 J. Bacteriol., September 1, 2003; 185(17): 5133 - 5147. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Papazisi, T. S. Gorton, G. Kutish, P. F. Markham, G. F. Browning, D. K. Nguyen, S. Swartzell, A. Madan, G. Mahairas, and S. J. Geary The complete genome sequence of the avian pathogen Mycoplasma gallisepticum strain Rlow Microbiology, September 1, 2003; 149(9): 2307 - 2316. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Casals, P. Gomez-Puertas, J. Pie, C. Mir, R. Roca, B. Puisac, R. Aledo, J. Clotet, S. Menao, D. Serra, et al. Structural ({beta}{alpha})8 TIM Barrel Model of 3-Hydroxy-3-methylglutaryl-Coenzyme A Lyase J. Biol. Chem., August 1, 2003; 278(31): 29016 - 29023. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. T.S. Nogueira, V. E. De Rosa Jr., M. Menossi, E. C. Ulian, and P. Arruda RNA Expression Profiles and Data Mining of Sugarcane Response to Low Temperature Plant Physiology, August 1, 2003; 132(4): 1811 - 1824. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Teplyakov, G. Obmolova, S. Y. Chu, J. Toedt, E. Eisenstein, A. J. Howard, and G. L. Gilliland Crystal Structure of the YchF Protein Reveals Binding Sites for GTP and Nucleic Acid J. Bacteriol., July 15, 2003; 185(14): 4031 - 4037. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. McDermott and R. Samudrala Bioverse: functional, structural and contextual annotation of proteins and proteomes Nucleic Acids Res., July 1, 2003; 31(13): 3736 - 3737. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Koike, Y. Kobayashi, and T. Takagi Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource Genome Res., June 1, 2003; 13(6): 1231 - 1243. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Tajul-Arifin, R. Teasdale, T. Ravasi, D. A. Hume, RIKEN GER Group, GSL Members, and J. S. Mattick Identification and Analysis of Chromodomain-Containing Proteins Encoded in the Mouse Transcriptome Genome Res., June 1, 2003; 13(6): 1416 - 1429. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Lee, J. Chen, L. Sun, S. Wu, K. R. Gray, A. Rich, M. Huang, J.-H. Lin, J. N. Feder, E. B. Janovitz, et al. Expression and Characterization of Human Transient Receptor Potential Melastatin 3 (hTRPM3) J. Biol. Chem., May 30, 2003; 278(23): 20890 - 20897. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Chapuy-Regaud, A. D. Ogunniyi, N. Diallo, Y. Huet, J.-F. Desnottes, J. C. Paton, S. Escaich, and M.-C. Trombe RegR, a Global LacI/GalR Family Regulator, Modulates Virulence and Competence in Streptococcus pneumoniae Infect. Immun., May 1, 2003; 71(5): 2615 - 2625. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Seshadri, I. T. Paulsen, J. A. Eisen, T. D. Read, K. E. Nelson, W. C. Nelson, N. L. Ward, H. Tettelin, T. M. Davidsen, M. J. Beanan, et al. Complete genome sequence of the Q-fever pathogen Coxiellaburnetii PNAS, April 29, 2003; 100(9): 5455 - 5460. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. F. Markham, A. Kanci, G. Czifra, B. Sundquist, P. Hains, and G. F. Browning Homologue of Macrophage-Activating Lipoprotein in Mycoplasmagallisepticum Is Not Essential for Growth and Pathogenicity in Tracheal Organ Cultures J. Bacteriol., April 15, 2003; 185(8): 2538 - 2547. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Li, E. I. Shakhnovich, and L. A. Mirny Amino acids determining enzyme-substrate specificity in prokaryotic and eukaryotic protein kinases PNAS, April 15, 2003; 100(8): 4463 - 4468. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Seoighe, C. R. Johnston, and D. C. Shields Significantly Different Patterns of Amino Acid Replacement After Gene Duplication as Compared to After Speciation Mol. Biol. Evol., April 1, 2003; 20(4): 484 - 490. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. F. Brunk, L. C. Lee, A. B. Tran, and J. Li Complete sequence of the mitochondrial genome of Tetrahymena thermophila and comparative methods for identifying highly divergent genes Nucleic Acids Res., March 15, 2003; 31(6): 1673 - 1682. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Kerk, J. Bulgrien, D. W. Smith, and M. Gribskov Arabidopsis Proteins Containing Similarity to the Universal Stress Protein Domain of Bacteria Plant Physiology, March 1, 2003; 131(3): 1209 - 1219. [Abstract] [Full Text] [PDF] |
||||