

Nucleic Acids Research Advance Access originally published online on May 21, 2007
Nucleic Acids Research 2007 35(Web Server issue):W653-W658; doi:10.1093/nar/gkm293

Nucleic Acids Research, 2007, Vol. 35, No. suppl_2 W653-W658
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Articles

COMPASS server for remote homology inference

Ruslan I. Sadreyev1,*, Ming Tang1, Bong-Hyun Kim2 and Nick V. Grishin1,2

1Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA and 2Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA

*To whom correspondence should be addressed. Tel: 214-645-5951; Fax: 214-645-5948; Email: sadreyev{at}chop.swmed.edu

Received January 31, 2007. Revised March 30, 2007. Accepted April 12, 2007.


    ABSTRACT
 
COMPASS is a method for homology detection and local alignment construction based on the comparison of multiple sequence alignments (MSAs). The method derives numerical profiles from given MSAs, constructs local profile-profile alignments and analytically estimates E-values for the detected similarities. Until now, COMPASS was only available for download and local installation. Here, we present a new web server featuring the latest version of COMPASS, which provides (i) increased sensitivity and selectivity of homology detection; (ii) longer, more complete alignments; and (iii) faster computational speed. After submission of the query MSA or single sequence, the server performs searches against a user-specified database. The server includes detailed and intuitive control of the search parameters. A flexible output format, structured similarly to BLAST and PSI-BLAST, provides an easy way to read and analyze the detected profile similarities. Brief help sections are available for all input parameters and output options, along with detailed documentation. To illustrate the value of this tool for the prediction of protein structure and function, we present two examples of detecting distant homologs for uncharacterized protein families. Available at http://prodata.swmed.edu/compass


    INTRODUCTION
 
Accurate detection of sequence similarity between distantly related proteins is essential for many fields, including protein structure prediction, protein engineering, and comparative genomics. The performance of an automatic method for sequence comparison can be characterized by sensitivity, selectivity and accuracy of produced sequence alignments. All these parameters can be significantly improved by comparing multiple sequence alignments (MSAs) rather than individual sequences. The improvement comes from evolutionary information about residue preferences at sequence positions in the family represented by the MSA. This information can be extracted from MSAs in two numerical forms: ‘traditional’ position-specific profiles and hidden Markov models (HMMs). Well-known and popular methods for profile-sequence or HMM-sequence comparison include PSI-BLAST (1,2), HMMER (3), SAM-T (4,5) and others. A newer generation of methods involves the comparison of two profiles (6–10) or two HMMs (11,12), with several corresponding web servers available (13–16). These methods further improve the quality of homology detection and alignment construction (17,18). A number of publicly available web servers aimed at protein structure prediction use these and a variety of other techniques [for example, (19–23)].

COMPASS (9) is an established method for profile-based comparison of MSAs. COMPASS derives numerical profiles from given MSAs, constructs optimal local profile-profile alignments, and analytically estimates E-values for the detected similarities. As previously shown by us (9) and independently verified by others (12,18), COMPASS is a sensitive and selective tool for detection of remote sequence similarity that offers accurate local alignments. In many cases, COMPASS provides accurate homology detection and structure prediction that would be difficult or impossible to produce by PSI-BLAST (9,24).

As a standalone package, COMPASS has been used by different research groups (24–31). Until now, COMPASS was only available for download and local installation. Here, we present a new web server featuring the recently improved version of COMPASS.


    METHODS
 
To compare two MSAs, COMPASS performs four steps: (i) processing input MSAs and generating numerical profiles; (ii) calculating scores between individual positions of the compared profiles; (iii) finding optimal local alignment of the two profiles; and (iv) assessing statistical significance of the optimal alignment score (9).

Methodologically, COMPASS is a generalization to profile-profile comparison of the PSI-BLAST approach to profile-sequence comparison. Numerical profiles represent effective counts and frequencies of 21 symbols (20 residue types and gaps) at each position of the input MSAs. To search with a query MSA against a database of MSAs, the database profiles are pre-computed. Scores for the similarity between individual profile positions are calculated using our original formula (9) and then rescaled so that their distribution is similar to a standard distribution with well-known properties (such as BLOSUM62 substitution scores). Rescaled positional scores are used to find the optimal local alignment using the Smith–Waterman algorithm. The statistical significance of the optimal alignment score is estimated using a simple formula for E-value (the expected number of hits in a random database with a score equal to or greater than the observed score). The parameters of this formula are based on our extensive simulations of random profile comparisons (9). As the final result of the search, a list of the most significant hits for the submitted query is displayed, followed by the optimal profile-profile alignments.
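The last two steps described above, local alignment and significance estimation, can be sketched as follows. This is a minimal illustration, not the COMPASS implementation: the positional scores are taken as given (the scoring formula of (9) is not reproduced), the gap penalties are placeholder values, and the E-value is written in the Karlin–Altschul form E = K·m·n·exp(−λS), which is an assumption here because the text only refers to a ‘simple formula’. The function names `smith_waterman_score` and `evalue`, and the values of K and λ, are hypothetical.

```python
import math

def smith_waterman_score(scores, gap_open=11.0, gap_extend=1.0):
    """Best local alignment score for a matrix of positional scores.

    scores[i][j] is the (already rescaled) score between position i of
    the first profile and position j of the second profile.
    A gap of length k costs gap_open + k * gap_extend (placeholder values).
    """
    n, m = len(scores), len(scores[0])
    neg_inf = float("-inf")
    # H: best score of a local alignment ending at (i, j)
    # E, F: best score ending at (i, j) with a gap in one or the other profile
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend,
                          H[i][j - 1] - gap_open - gap_extend)
            F[i][j] = max(F[i - 1][j] - gap_extend,
                          H[i - 1][j] - gap_open - gap_extend)
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + scores[i - 1][j - 1],
                          E[i][j],
                          F[i][j])
            best = max(best, H[i][j])
    return best

def evalue(score, query_length, database_length, K, lam):
    """E-value in the Karlin-Altschul form (assumed, see lead-in)."""
    return K * query_length * database_length * math.exp(-lam * score)

# Toy positional scores: three matching pairs, one unmatched position.
scores = [
    [10, -4, -4],
    [-4, -4, -4],
    [-4, 10, -4],
    [-4, -4, 10],
]
s = smith_waterman_score(scores, gap_open=3, gap_extend=1)
print(s)  # 26.0: three matches (30) minus one gap of length 1 (4)
print(evalue(s, query_length=4, database_length=1000, K=0.1, lam=0.3))
```

In this form the E-value is proportional to the assumed database length, so doubling `database_length` doubles the E-value for the same score.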

According to our results (9) and independent evaluations (12,18), COMPASS is among the top methods for profile comparison in both the quality of homology detection and the accuracy of local alignment construction. The web server presented here features a newer version of COMPASS, with several major modifications that improve performance.

  1. Higher quality of homology detection. Evaluation of the statistical significance of hits is improved by using a more realistic null model of random profile comparison. The original random model involved the profiles composed of randomly sampled positions from real MSAs. The score statistics were modeled depending on the profile lengths only, and a rough linear approximation of the dependency was used (9). We developed a new random model that captures additional important features of real profiles. First, in order to reproduce local correlations between different positions of MSA, we generate random profiles from fragments of real profiles corresponding to individual elements of secondary structure. Second, to model more accurately the distribution parameters K and λ (2,9) for optimal profile-profile scores, we introduce their dependence on the profile ‘thickness’ (sequence divergence within the profiles). Finally, we use more precise non-linear functions (combinations of quadratic and square-root) to describe the dependency of these parameters on profile length and ‘thickness’. According to our preliminary results, the new version of COMPASS shows roughly 20–25% improvement in the quality of similarity detection.
  2. Longer, more complete local alignments. Rescaling of individual positional scores is modified, so that alignment coverage increases. In the original version, this procedure was similar to the composition-based statistics in PSI-BLAST (2), which standardized positional scores by adjusting the distribution parameter λ (describing mainly the distribution width). In the new version, in order to make the rescaled distribution closer to standard, the mean of the distribution is also forced to a fixed value. As a result, positional scores are more compatible with the gap penalties that were empirically optimized for the standard substitution matrices (e.g. BLOSUM62). The optimal alignments on average become longer and cover similarity regions better without compromising the overall alignment accuracy.
  3. Improved speed. Several algorithmic modifications, as well as a general code optimization, lead to an order of magnitude improvement in computational speed over the original version. The resulting computational efficiency is now comparable to that of the fastest profile-profile methods (12,15), with a typical search taking a few minutes on one processor. This time period may increase when the server is heavily loaded or when the user requires generation of the query profile by PSI-BLAST search, which may take longer for queries with a large number of homologs in the sequence database.
  4. Flexible control of input options. The server's front page (Figure 1A) allows the user to upload the query in several common alignment formats, choose the database and adjust search parameters and output options. The query MSA or single sequence can be either pasted in the input window or uploaded from a file. The available profile databases currently include PFAM (32), COG, KOG (33,34) and PSI-BLAST alignments produced from sequences with known 3D structure: chain representatives of the PDB database (35) and domain representatives of SCOP classification (36). The PDB representatives are full chains extracted from the whole set of available 3D structures (35), based on a 70% cutoff of sequence identity. The SCOP representatives are structural domains defined and classified by expert analysis into families, superfamilies, folds and classes (36). These representatives are based on 40% identity and are taken from the ASTRAL database (37). The PDB and ASTRAL sequences are used as queries for PSI-BLAST searches against NCBI nr database. The resulting MSAs of detected homologs are used to generate COMPASS profiles. To allow for the choice of different levels of sequence divergence within MSAs, the user can choose profiles corresponding to different numbers of PSI-BLAST iterations. PFAM (32), COG and KOG (33,34) databases include families of both known and unknown 3D structure, which cover protein sequence space more completely and provide alternative ways of family classification. These databases typically represent tighter sequence grouping, with more consideration of protein function, and clustering of orthologs from different genomes. PFAM profiles are generated by COMPASS from full family alignments provided by PFAM. COG and KOG profiles are generated from MSAs produced from the database sequences by MUSCLE (38). The profile databases are regularly updated when new versions of original databases are available.


Figure 1. (A) Front page of the COMPASS server. The main section allows the user to submit the query (by pasting in the window or by specifying the file), to choose the search database, and (if needed) to enter the email address to receive the results. The section of input processing options allows the user to choose whether a PSI-BLAST run is needed to enrich the query profile with additional sequence homologs and to define the parameters of profile construction. The section of search options can be used to adjust the main parameters of the search. The section of output options allows for flexible formatting of the search results. A brief explanation of each option is available by clicking on the option's name. Additional sections include the links to more detailed documentation and to the FTP page with standalone COMPASS package. (B) Search results for uncharacterized PFAM DUF185 as a query, supporting the structure and function prediction for this family. The list of hits among SCOP domains consistently includes members of the same superfamily of S-adenosyl-L-methionine-dependent methyltransferases (SAM-Mtases) (c.66.1). (C) Example of profile-profile alignment. The header includes brief information about the hit: database identifier, protein description, full length of the MSA (‘length’), the length of the profile after purging positions with high gap content (‘filtered length’), effective number of sequences as a characteristic of sequence divergence within MSA (‘Neff’), followed by COMPASS score and E-value. In this example, the top and consensus sequences for compared profiles are displayed. Position matches with positive scores are marked by ‘+’, identical residues in the two consensus sequences are marked by the residue symbol. Invariant glutamates of Motifs I and II (39) involved in ligand binding are marked with red dots, glycine-rich motif is circled. (D) A recently solved structure for a member of DUF185 family (PDB ID 1zkd) confirms our prediction. Side chains of the invariant glutamate residues are shown in red, glycine-rich loop is circled.

 
In order to gain more confidence in detected similarities and to find the best search conditions for a specific query, tuning the parameters controlling the generation of profiles and the construction of profile-profile alignments is advisable. The user can modify several such parameters. First, the input MSA (or sequence) can be used as a query for PSI-BLAST search, in order to produce a more diverse MSA of this family. The user can adjust the maximal number of iterations, as well as the requirements for a detected homolog to be included in the alignment: maximal E-value, minimal coverage of the query and minimal sequence identity to the query. Second, ‘Gap fraction threshold’ allows the user to control the maximal content of gaps in the MSA columns included in the COMPASS profile. If a column contains too many gaps, it is disregarded in the process of profile comparison, and shown in the final output as lower-case letters for residues and dots for gaps. The default value of this parameter is 0.5.

In the construction of profile-profile alignments, ‘Gap penalties’ are score penalties for opening and extending a new gap. ‘Effective length of the database’ is the parameter used in the calculation of E-values for the profile-profile alignments. For a given optimal alignment score, there is roughly a linear dependence of E-value on the assumed database length. ‘Matrix’ is a substitution matrix of the user's choice, BLOSUM62 by default. As described above, the choice of the matrix affects the rescaling of scores between individual profile positions that are used in the construction of the profile-profile alignment. Changing the scale of the positional scores would (i) make gap insertion more or less likely, affecting the resulting alignments, and (ii) change the optimal alignment scores and E-values.

Among the output formatting options, many are similar to those of PSI-BLAST. ‘Expect’ and ‘significance threshold’ are, respectively, the E-value cutoffs for the hit to be included in the output and to be considered significant. The hits outside the significance threshold are shown as potentially not meaningful. The user can also limit the total number of hits to display (‘Display up to’). Some output options are specific to profile-profile comparison. For example, the displayed profile-profile alignments can include different numbers of top sequences from the input MSAs (‘Top sequences to show’), as well as consensus sequences (‘Show consensus sequences’). Brief help sections are provided for every adjustable parameter, as well as a link to more detailed documentation (Figure 1A).

  5. User-friendly output. The general structure of the output is similar to that of PSI-BLAST: the list of top hits is sorted by E-value and split into those below and above the significance threshold, followed by optimal profile-profile alignments with brief information about each hit. However, there are several significant differences, mainly in the format of alignments. The user can display the consensus sequences of profiles, as well as multiple top sequences from the input MSA. The number of top sequences displayed can range from zero (to show consensus only) to all sequences of the MSA. The complete query MSA is retrieved by clicking on the consensus link. Links to the original databases are another feature for fast and convenient analysis, providing immediate access to information available for detected protein families.
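The profile construction and the ‘Gap fraction threshold’ option described above can be illustrated with a short sketch. It is a simplified illustration under stated assumptions rather than the COMPASS code: it uses raw column frequencies over the 21 symbols instead of the effective counts used by COMPASS, and it assumes that a column is disregarded when its gap fraction is strictly greater than the threshold. The function names are hypothetical.

```python
from collections import Counter

SYMBOLS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 residue types plus the gap symbol

def column_frequencies(msa):
    """Raw frequencies of the 21 symbols in each column of an MSA.

    msa is a list of equal-length aligned sequences (upper-case, '-' for gaps).
    """
    n_seqs = len(msa)
    profile = []
    for column in zip(*msa):
        counts = Counter(column)
        profile.append({s: counts.get(s, 0) / n_seqs for s in SYMBOLS})
    return profile

def mask_gappy_columns(msa, gap_fraction_threshold=0.5):
    """Return (kept_columns, display_rows).

    Columns with a gap fraction above the threshold (assumed: strictly
    greater) are left out of the comparison and displayed with
    lower-case residues and dots for gaps.
    """
    profile = column_frequencies(msa)
    kept = [i for i, col in enumerate(profile)
            if col["-"] <= gap_fraction_threshold]
    kept_set = set(kept)
    rows = []
    for seq in msa:
        chars = []
        for i, ch in enumerate(seq):
            if i in kept_set:
                chars.append(ch)
            else:
                chars.append("." if ch == "-" else ch.lower())
        rows.append("".join(chars))
    return kept, rows

# Toy alignment: the third column is 75% gaps and is masked.
msa = ["MK-L", "MR-L", "M--L", "MKAL"]
kept, rows = mask_gappy_columns(msa)
print(kept)  # [0, 1, 3]
print(rows)  # ['MK.L', 'MR.L', 'M-.L', 'MKaL']
```

The number of kept columns corresponds to the ‘filtered length’ reported in the alignment header, while the full number of columns corresponds to ‘length’.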

Examples of remote similarity detection
As an illustration, we describe the detection of distant sequence similarities that lead to fold predictions for two uncharacterized PFAM families annotated as ‘DUF’ (domain of unknown function). First, the COMPASS server detects homology between DUF185 (corresponding to COG1565 of the COG database) and SCOP domains of the S-adenosyl-L-methionine-dependent methyltransferase (SAM-Mtase) fold. Using the full DUF185 (PFAM 19.0) alignment as a query, with the default input parameters (Figure 1A), the server returns a list of hits that consistently belong to the same SCOP superfamily (c.66.1), both above and below the E-value cutoff (Figure 1B). In this list, each line consists of four fields: the identifier in the original database (implemented as a link to the database), a brief description of the protein, the COMPASS score and the corresponding E-value.
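The structure of the hit list can be made concrete with a small sketch. It models each line with the four fields named above and applies the ‘Expect’, ‘significance threshold’ and ‘Display up to’ options. The class and function names, the default cutoffs and the example entries are hypothetical placeholders, not actual server output or defaults.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    identifier: str   # identifier in the original database
    description: str  # brief description of the protein
    score: float      # COMPASS score
    evalue: float     # corresponding E-value

def split_hits(hits, expect=10.0, significance_threshold=0.001,
               display_up_to=None):
    """Sort hits by E-value and split them at the significance threshold.

    Hits with an E-value above `expect` are not shown at all.
    Returns (significant_hits, hits_outside_the_threshold).
    """
    shown = sorted((h for h in hits if h.evalue <= expect),
                   key=lambda h: h.evalue)
    if display_up_to is not None:
        shown = shown[:display_up_to]
    significant = [h for h in shown if h.evalue <= significance_threshold]
    outside = [h for h in shown if h.evalue > significance_threshold]
    return significant, outside

# Hypothetical entries, for illustration only.
hits = [
    Hit("hit_A", "example domain A", 120.0, 1e-10),
    Hit("hit_B", "example domain B", 40.0, 0.5),
    Hit("hit_C", "example domain C", 20.0, 50.0),
    Hit("hit_D", "example domain D", 90.0, 1e-5),
]
significant, outside = split_hits(hits)
for hit in significant:
    print(hit.identifier, hit.score, hit.evalue)
```

Keeping the hits outside the threshold in a separate list mirrors the way the server still shows them, flagged as potentially not meaningful, so that consistent weak hits to one superfamily remain visible.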

The next section of the output includes profile-profile alignments between the query and the hits. Each alignment is accompanied by a header with brief information about the hit. Unlike the PSI-BLAST format, the alignments can include different numbers of top sequences from input MSAs and/or consensus sequences. Figure 1C shows an example of such an alignment, with a single top sequence and consensus displayed for each profile. To distinguish the gaps introduced by COMPASS from the gaps that already occur in the input alignments, the former are shown as equal signs (=). The alignment in Figure 1C includes the region of similarity between the query (profile for DUF185) and a homologous profile based on the PSI-BLAST alignment for structural domain 1i4wA. In addition to similar patterns of hydrophobicity and small residues, DUF185 shows a strong conservation of SAM-Mtase signature motifs [reviewed in (39)]. The SAM-binding loop GxGxG (circled) and conserved acidic residue in the preceding β-strand (marked with a red dot) are parts of Motif I, whereas the invariant glutamate at the end of the next β-strand (marked with a red dot) is a part of Motif II (39).

This previously published prediction had been difficult to produce by PSI-BLAST, even for an expert user (24). However, it was more recently confirmed by the solved structure of a DUF185 member. This structure (PDB ID 1zkd, Northeast Structural Genomics Consortium) has been neither functionally annotated nor classified by SCOP or CATH, but possesses typical features of the SAM-Mtase fold (Figure 1D). The core of the domain contains a mixed β-sheet of seven β-strands surrounded by two sheets of α-helices. The strand order is 3214576, with strand 7 (colored red) anti-parallel to the rest and forming a characteristic methyltransferase β-hairpin with strand 6 (colored orange). In this domain, the β-hairpin contains an additional α-helical insert (orange helices). The presence of a glycine-rich loop (circled) and other signature motifs, including glutamates marked in Figure 1C (side chains shown in red), suggests that this domain is a functional methyltransferase.

The second prediction originates from searching with RrnaAD methylase family as a query. This search reveals a newly identified similarity to a PFAM family of mainly hypothetical bacterial proteins with unknown structure and function, DUF519 (corresponding to COG2961 in the COG database). Thus, we suggest that DUF519/COG2961 proteins also possess the structural SAM-Mtase fold. This hypothesis is supported by the results of a search with the PFAM 19.0 DUF519 alignment as a query against the database of SCOP profiles (PSI-BLAST iteration 3). Homologs detected above the significance threshold, as well as multiple hits below the threshold, consistently belong to the SAM-Mtase fold.

Figure 2A shows the COMPASS alignment between DUF519 and the detected homolog, a domain of the SAM-Mtase fold (PDB ID 1qyrA). This domain (not shown) possesses typical features of the fold and is similar to the structure shown in Figure 1D. The alignment includes the signature motifs of SAM-Mtases. Figure 2B shows the MSA of representatives from both families that covers SAM-Mtase Motifs I and II (39). In DUF519, this region includes the invariant glutamate aligned to a ligand-binding glutamate of SAM-Mtases (E95 in the top sequence, marked with a red dot), the characteristic location of conserved small residues in the SAM-binding loop (marked with a line) and a similar hydrophobicity pattern. Secondary structure prediction for this part of DUF519 is also consistent with the secondary structure of the SAM-Mtase fold. This prediction is additionally supported by other tools, e.g. by (i) significant scores for the similarity with the SCOP SAM-Mtase domains produced by the FFAS03 server (14); and (ii) the results of multiple iterations of PSI-BLAST search in a sequence database with a family representative as a query. After four iterations, PSI-BLAST detects the similarity between a DUF519 sequence Q9PHA1_XYLFA (gi|15836648, residues 32-291) and two proteins of known structure possessing the SAM-Mtase fold (PDB IDs 2ift and 2fpo).


Figure 2. Search results for PFAM DUF519 suggest that this family possesses the structural fold of SAM-Mtases. (A) DUF519 is used as a query for the COMPASS search against the databases of PSI-BLAST alignments (iteration 3) for SCOP representatives. The COMPASS alignment between the query and the detected homolog (domain 1qyrA) includes characteristic motifs of the SAM-Mtase superfamily. In this example, only consensus sequences are displayed. Positions corresponding to the conserved acidic residues of Motifs I and II (39) are marked with red dots. The region of the SAM-binding loop is circled. (B) Multiple alignment including representatives from DUF519 (top) and 1qyrA homologs (bottom). Sequences are denoted by NCBI GI numbers. Positions corresponding to conserved acidic residues of SAM-Mtase are marked with red dots. The region of the ligand-binding loop is marked with a line. Invariant residues are boxed in black. Uncharged residues (all amino acids except D, E, K, R) in mostly hydrophobic sites are highlighted in yellow; non-hydrophobic residues (all amino acids except W, F, Y, M, L, I, V) at mostly hydrophilic sites are highlighted in light gray. The secondary structure of 1qyrA is shown below the alignment, with α-helices and β-strands displayed as cylinders and arrows, respectively.

 


    ACKNOWLEDGEMENTS
 
The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing high-performance computing resources. We would like to thank Lisa Kinch and James Wrabl for discussions and critical reading of the manuscript. Funding to pay the Open Access publication charges for this article was provided by Howard Hughes Medical Institute.

Conflict of interest statement. None declared.


    REFERENCES
 

  1. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 25:3389–3402.

  2. Schaffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, Altschul SF. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res (2001) 29:2994–3005.

  3. Eddy SR. Profile hidden Markov models. Bioinformatics (1998) 14:755–763.

  4. Karplus K, Barrett C, Cline M, Diekhans M, Grate L, Hughey R. Predicting protein structure using only sequence information. Proteins (1999) 37(Suppl. 3):121–125.

  5. Karplus K, Karchin R, Draper J, Casper J, Mandel-Gutfreund Y, Diekhans M, Hughey R. Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins (2003) 53(Suppl. 6):491–496.

  6. Pietrokovski S. Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Res (1996) 24:3836–3845.

  7. Rychlewski L, Jaroszewski L, Li W, Godzik A. Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci (2000) 9:232–241.

  8. Yona G, Levitt M. Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. J. Mol. Biol (2002) 315:1257–1275.

  9. Sadreyev RI, Grishin NV. COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J. Mol. Biol (2003) 326:317–336.

  10. Ginalski K, von Grotthuss M, Grishin NV, Rychlewski L. Detecting distant homology with Meta-BASIC. Nucleic Acids Res (2004) 32:W576–581.

  11. Edgar RC, Sjolander K. COACH: profile-profile alignment of protein families using hidden Markov models. Bioinformatics (2004) 20:1309–1318.

  12. Soding J. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21:951–960.

  13. Frenkel-Morgenstern M, Singer A, Bronfeld H, Pietrokovski S. One-Block CYRCA: an automated procedure for identifying multiple-block alignments from single block queries. Nucleic Acids Res (2005) 33:W281–W283.

  14. Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A. FFAS03: a server for profile–profile sequence alignments. Nucleic Acids Res (2005) 33:W284–W288.

  15. Soding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res (2005) 33:W244–W248.

  16. Soding J, Remmert M, Biegert A, Lupas AN. HHsenser: exhaustive transitive profile search using HMM-HMM comparison. Nucleic Acids Res (2006) 34:W374–W378.

  17. Ohlson T, Wallner B, Elofsson A. Profile-profile methods provide improved fold-recognition: a study of different profile-profile alignment methods. Proteins (2004) 57:188–197.

  18. Wang G, Dunbrack RL Jr. Scoring profile-to-profile sequence alignments. Protein Sci (2004) 13:1612–1626.

  19. Chivian D, Kim DE, Malmstrom L, Schonbrun J, Rohl CA, Baker D. Prediction of CASP6 structures using automated Robetta protocols. Proteins (2005) 61(Suppl. 7):157–166.

  20. Ginalski K, Elofsson A, Fischer D, Rychlewski L. 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics (2003) 19:1015–1018.

  21. Kelley LA, MacCallum RM, Sternberg MJ. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol (2000) 299:499–520.

  22. Shi J, Blundell TL, Mizuguchi K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol (2001) 310:243–257.

  23. Zhou H, Zhou Y. SPARKS 2 and SP3 servers in CASP6. Proteins (2005) 61(Suppl. 7):152–156.

  24. Sadreyev RI, Baker D, Grishin NV. Profile-profile comparisons by COMPASS predict intricate homologies between protein families. Protein Sci (2003) 12:2262–2272.

  25. Birtle Z, Ponting CP. Meisetz and the birth of the KRAB motif. Bioinformatics (2006) 22:2841–2845.

  26. Kim BH, Sadreyev R, Grishin NV. COG4849 is a novel family of nucleotidyltransferases. J. Mol. Recognit (2005) 18:422–425.

  27. Theobald DL, Cervantes RB, Lundblad V, Wuttke DS. Homology among telomeric end-protection proteins. Structure (2003) 11:1049–1050.

  28. Theobald DL, Wuttke DS. Prediction of multiple tandem OB-fold domains in telomere end-binding proteins Pot1 and Cdc13. Structure (2004) 12:1877–1879.

  29. Theobald DL, Wuttke DS. Divergent evolution within protein superfolds inferred from profile-based phylogenetics. J. Mol. Biol (2005) 354:722–737.

  30. Wels M, Francke C, Kerkhoven R, Kleerebezem M, Siezen RJ. Predicting cis-acting elements of Lactobacillus plantarum by comparative genomics with different taxonomic subgroups. Nucleic Acids Res (2006) 34:1947–1958.

  31. Winter EE, Ponting CP. Mammalian BEX, WEX and GASP genes: coding and non-coding chimaerism sustained by gene conversion events. BMC Evol. Biol (2005) 5:54.

  32. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, et al. Pfam: clans, web tools and services. Nucleic Acids Res (2006) 34:D247–D251.

  33. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science (1997) 278:631–637.

  34. Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, et al. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res (2001) 29:22–28.

  35. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res (2000) 28:235–242.

  36. Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res (2004) 32:D226–D229.

  37. Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE. The ASTRAL Compendium in 2004. Nucleic Acids Res (2004) 32:D189–D192.

  38. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics (2004) 5:113.

  39. Schubert HL, Blumenthal RM, Cheng X. Many paths to methyltransfer: a chronicle of convergence. Trends Biochem. Sci (2003) 28:329–335.




This article has been cited by other articles:


C. Rancurel, M. Khosravi, A. K. Dunker, P. R. Romero, and D. Karlin
Overlapping Genes Produce Proteins with Unusual Sequence Properties and Offer Insight into De Novo Protein Creation
J. Virol., October 15, 2009; 83(20): 10719 - 10736.

B. W. Brandt and J. Heringa
webPRC: the Profile Comparer for alignment-based searching of public domain databases
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W48 - W52.

R. I. Sadreyev, M. Tang, B.-H. Kim, and N. V. Grishin
COMPASS server for homology detection: improved statistical accuracy, speed and functionality
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W90 - W94.

B.-H. Kim, H. Cheng, and N. V. Grishin
HorA web server to infer homology between proteins using sequence and structural similarity
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W532 - W538.

I. Jung and D. Kim
SIMPRO: simple protein homology detection method by using indirect signals
Bioinformatics, March 15, 2009; 25(6): 729 - 735.

R. I. Sadreyev and N. V. Grishin
Accurate statistical model of comparison between multiple sequence alignments
Nucleic Acids Res., April 1, 2008; 36(7): 2240 - 2248.

A. J. Reid, C. Yeats, and C. A. Orengo
Methods of remote homology detection can be combined to increase coverage by 10% in the midnight zone
Bioinformatics, September 15, 2007; 23(18): 2353 - 2360.

