Nucleic Acids Research, 2004, Vol. 32, Database issue D189-D192
© 2004 Oxford University Press
The ASTRAL Compendium in 2004
1 Berkeley Structural Genomics Center, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, 2 Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA, 3 Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA, 4 MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK and 5 Department of Structural Biology, D-109 Fairchild, Stanford University, Stanford, CA 94305, USA
*To whom correspondence should be addressed at Department of Plant and Microbial Biology, 461A Koshland Hall, University of California, Berkeley, CA 94720-3102, USA. Tel: +1 510 643 9131; Fax: +1 208 279 8978; Email: brenner{at}compbio.berkeley.edu
Received September 11, 2003; Revised and Accepted September 16, 2003
| ABSTRACT |
|---|
The ASTRAL Compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. Partially derived from the SCOP database of protein structure domains, it includes sequences for each domain and other resources useful for studying these sequences and domain structures. The current release of ASTRAL contains 54 745 domains, more than three times as many as the initial release 4 years ago. ASTRAL has undergone major transformations in the past 2 years. In addition to several complete updates each year, ASTRAL is now updated on a weekly basis with preliminary classifications of domains from newly released PDB structures. These classifications are available as a stand-alone database, as well as integrated into other ASTRAL databases such as representative subsets. To enhance the utility of ASTRAL to structural biologists, all SCOP domains are now made available as PDB-style coordinate files as well as sequences. In addition to sequences and representative subsets based on SCOP domains, sequences and subsets based on PDB chains are newly included in ASTRAL. Several search tools have been added to ASTRAL to facilitate retrieval of data by individual users and automated methods. ASTRAL may be accessed at http://astral.stanford.edu/.
| BACKGROUND |
|---|
The Protein Data Bank (PDB) is a centralized repository of protein structures (1) containing over 22 000 entries in August 2003. The SCOP database (2,3) provides a manually curated set of domains from all PDB entries, classified in a hierarchy indicating different levels of structural and evolutionary relationship between the domains. SCOP thus provides a broad survey of all known protein folds, detailed information about relatives of proteins of known structure and a framework for classification of additional structures as they are solved.
Many tools for bioinformatic analysis rely on sequence information, but the nature of PDB files makes it challenging to accurately extract the sequence corresponding to a given domain definition. ASTRAL addresses this issue by providing an explicit mapping between the PDB ATOM and SEQRES records, which is used to derive databases of sequences corresponding to SCOP domains, as described previously (4,5). These Rapid Access Format (RAF) maps are manually curated to eliminate errors in automatic parsing of PDB files, and to translate chemically modified amino acids back to the original sequence. The RAF maps are used to derive databases of sequences corresponding to each domain and PDB chain included in SCOP. Representative subsets of these full sequence sets are also available, chosen according to different thresholds and measures of sequence similarity.
Recent improvements to ASTRAL include the creation of PDB-style coordinate files for each SCOP domain. Sequences are now provided for each PDB chain as well as for SCOP domains; representative subsets of PDB chains are also provided. The highest quality representative in each subset is now chosen using Aberrant Entry Re-Ordered SPACI (AEROSPACI) scores rather than the SPACI scores (4) used previously; PDB entries manually annotated by the SCOP authors as aberrant are penalized so that they are less likely to be chosen as the representative structure for a given subset. Genetic domain sequences for multi-chain SCOP domains, introduced in a previous release of ASTRAL (5), are now the default. Residues appearing in PDB files which have been chemically modified after translation are replaced by the original sequence where possible in both the RAF maps and ASTRAL sequences. Many of these replacements are done automatically using the table reported previously (5); others are extracted using manual or automated curation from comments in the PDB file.
Although several complete releases of ASTRAL are produced each year, synchronized to new SCOP releases, the number of new protein structures that become available between releases of SCOP continues to increase. For example, an additional 1646 proteins (5247 domains) were added in the 5 months between the release of ASTRAL 1.63 and the current version, 1.65, compared with 1540 proteins (5170 domains) added in the previous 6 months since version 1.61. As the new data represent about 10% of the total number of domains in ASTRAL, it is important to rapidly incorporate these new structures. A major new feature in ASTRAL is integration of preliminary domain classifications of newly released structures using hidden Markov models (6,7) trained on superfamilies of previously classified domains.
| CURATED MAPPINGS |
|---|
The RAF maps (5) provide explicit mappings between the sequence of PDB chains studied (the SEQRES records) and the experimentally observed atoms (the ATOM records) for every PDB chain in SCOP. Manual curation ensures that the mapping presented in the RAF file is an exact representation of the data in the original PDB file, even when the PDB file itself is erroneous. In cases where residues have been post-translationally modified, efforts are made to represent the original sequence in the RAF maps. Many standard chemical modifications are translated automatically, as described previously (5). A great amount of additional manual and automatic curation has been added in recent ASTRAL releases. Hundreds of additional translations are parsed from comments in SEQADV records, in cases where a residue is annotated as modified. Several thousand more translations are manually curated from comments in the PDB files that indicate which amino acid was chemically modified to derive a non-standard heterogen. In some cases, a single heterogen is derived from multiple amino acids, e.g. the chromophores of luminescent proteins which are cyclizations of three adjacent amino acids; these are mapped to multiple residues in the RAF maps and sequences. All non-standard residue translations are documented on our website in a format that is easily parsed by humans or automated methods. The RAF format is designed to be rapidly accessed in various computer languages, and we will soon release open source Perl modules to facilitate development of software that interacts with the RAF database.
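The translation of modified residues back to the original sequence can be illustrated with a small lookup. This is only a sketch: the table below is not the ASTRAL translation table, the function name `translate_residues` is made up for illustration, and the code `XX3` is a hypothetical placeholder for a chromophore-like heterogen derived from three adjacent amino acids. The entries for MSE, SEP, TPO and PTR are common modified residues.

```python
# Standard three-letter to one-letter amino acid codes.
STANDARD = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Illustrative translations only (NOT the ASTRAL table).
TRANSLATIONS = {
    "MSE": "M",    # selenomethionine -> methionine
    "SEP": "S",    # phosphoserine -> serine
    "TPO": "T",    # phosphothreonine -> threonine
    "PTR": "Y",    # phosphotyrosine -> tyrosine
    "XX3": "SYG",  # hypothetical heterogen formed from three residues
}


def translate_residues(three_letter_codes):
    """Return a one-letter sequence, mapping modified residues back to
    the original amino acid(s); unknown codes become 'X'."""
    letters = []
    for code in three_letter_codes:
        if code in STANDARD:
            letters.append(STANDARD[code])
        elif code in TRANSLATIONS:
            # A single heterogen may expand to several residues.
            letters.append(TRANSLATIONS[code])
        else:
            letters.append("X")
    return "".join(letters)
```

In this sketch, `translate_residues(["MET", "MSE", "XX3", "LYS"])` yields a six-letter sequence, because the single placeholder heterogen expands to three residues.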
| ASTRAL SCOP REPRESENTATIVE SUBSETS |
|---|
An overview of the ASTRAL build process is shown in Figure 1. Using the RAF maps, four complete sequence sets are created for every domain in the first seven classes of the SCOP database. Two sets (the genetic domain sets) include the genetic domain sequences described previously (5), and the other two (the original-style sequence sets) use the prior method of splitting each multi-chain domain into multiple sequences (4). For each of these methodologies, one complete sequence set is derived from sequences in the PDB ATOM records, and another from sequences in the SEQRES records. Genetic domain sequence sets mapped from SEQRES records are now the default ASTRAL sequences.
The SEQRES sets (for both genetic domain and original-style methods) are used to derive representative subsets. As shown in Figure 1, each set is fully compared against itself using BLAST (8), and subsets are created using the three similarity criteria (BLAST E-values, sequence identity and SCOP classification) described previously (4). Representatives are chosen according to AEROSPACI scores, which are derived from calculated SPACI scores and manual annotation by SCOP authors. SPACI scores, a first-order guide to the resolution, R-factor and stereochemical accuracy of crystallographically determined structures, have been described previously (4). AEROSPACI scores add an additional penalty of 2.0 to structures annotated as chimeric, circularly permuted, disordered, missing large regions, erroneous, misfolded, mistraced, mutant or truncated. Theoretical structures, which are not present in SCOP but still assigned AEROSPACI scores, are assigned an additional penalty of 5.0.
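The penalty scheme can be written down as a short calculation. The sketch below assumes that higher scores are better, that a penalty is subtracted from the SPACI score, and that the two penalties would stack if both applied; none of these details is stated explicitly here, and the function name `aerospaci` is chosen for illustration. The SPACI score itself is taken as an input, since its calculation is described elsewhere (4).

```python
ABERRANT_PENALTY = 2.0      # chimeric, mutant, truncated, etc.
THEORETICAL_PENALTY = 5.0   # theoretical models


def aerospaci(spaci, aberrant=False, theoretical=False):
    """Apply the annotation penalties to a precomputed SPACI score.

    Assumes penalties are subtracted and are cumulative."""
    score = spaci
    if aberrant:
        score -= ABERRANT_PENALTY
    if theoretical:
        score -= THEORETICAL_PENALTY
    return score
```

Under these assumptions, a structure with a SPACI score of 1.5 that is annotated as aberrant ends up with a negative score, so any non-aberrant structure with a positive SPACI score in the same subset would be preferred as the representative.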
| PDB CHAIN SEQUENCE SETS |
|---|
A set of sequences is created which includes the sequence of every PDB chain in SCOP, based on SEQRES records. Selected subsets are also derived from this set using the same method as used to derive SCOP domain subsets. Because PDB chains often contain multiple domains, we create subsets only at high sequence identity (90–100% ID); lower thresholds would produce incorrect results in cases where several multi-domain proteins share a single common domain.
| CONTINUOUS UPDATES |
|---|
ASTEROIDS (ASTral newER pOtentIal Domain Set) is a set of sequences of newly released PDB entries, divided into domains and optionally available integrated into the ASTRAL representative subsets. Because new PDB files are available each week, ASTEROIDS are created using a fully automated method for predicting domains similar to those already classified in the manually curated databases SCOP and Pfam (9).
An overview of the ASTEROIDS build process is shown on the right side of Figure 1. The 100% ID representative subset of genetic domain sequences is grouped by superfamily. Sequences from each superfamily are aligned with MAFFT (10), using the fftnsi algorithm and all default options. A hidden Markov model (7) (HMM) is created from the multiple sequence alignment for each superfamily using the HMMER (6) tools hmmbuild and hmmcalibrate (with all default options). These HMMs are built once during a full release of ASTRAL, and then used to predict domains in the sequences of newly released PDB entries on a weekly basis, using an E-value cutoff of 10^-4. HMMs from the Pfam-A database (9) are also used to predict domains in the remaining unassigned regions of sequence; in these cases, an E-value equal to the trusted cutoff or 10^-4, whichever is more significant, is used to assign domain predictions. Overlaps of up to 10 residues between multiple HMMs are allowed, and regions of sequence matching several domains are assigned to the one with the most significant E-value. Longer overlaps result in the entire domain prediction being rejected, and automatically flagged for later manual review to prevent further erroneous predictions. After domain assignment using HMMs, any unassigned region of at least 50 consecutive residues is also predicted to contain at least one potential domain, and included in the ASTEROIDS set. All ASTEROIDS are assigned SCOP sid-style identifiers (3) beginning with the letter u; for example, u1abcd1 would be the first of several predicted domains in chain D of the PDB entry 1ABC. The FASTA headers for the ASTEROID sequences indicate the chain and region boundaries, the source of the domain prediction (ASTRAL superfamily, Pfam or remaining unassigned region), and the version of the database and E-value of the prediction for domains identified using HMMER.
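Two of the simpler steps above, collecting leftover regions of at least 50 residues and building a u-prefixed identifier, can be sketched as follows. This is an illustration under stated assumptions rather than the ASTEROIDS implementation: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and the identifier function covers only the multi-domain pattern given in the example (it does not attempt the convention for chains with a single domain).

```python
def unassigned_regions(chain_length, assigned, min_length=50):
    """Return (start, end) regions not covered by any assigned domain
    that span at least min_length consecutive residues.

    assigned: list of (start, end) tuples, 1-based and inclusive."""
    regions = []
    position = 1  # first residue not yet accounted for
    for start, end in sorted(assigned):
        if start - position >= min_length:
            regions.append((position, start - 1))
        position = max(position, end + 1)
    # Check the tail of the chain after the last assigned domain.
    if chain_length - position + 1 >= min_length:
        regions.append((position, chain_length))
    return regions


def asteroid_sid(pdb_code, chain, index):
    """Build an sid-style identifier such as 'u1abcd1'."""
    return "u" + pdb_code.lower() + chain.lower() + str(index)
```

For a 300-residue chain with one HMM-assigned domain at residues 61–180, this sketch reports the regions 1–60 and 181–300 as additional potential domains, and `asteroid_sid("1ABC", "D", 1)` reproduces the identifier used in the example above.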
ASTEROID sequences are integrated into representative ASTRAL subsets selected according to two similarity criteria (BLAST E-value and % identity) at a variety of thresholds using previously described methods for creating representative subsets (4). ASTEROID sequences are assigned AEROSPACI scores of −9.99; other PDB chains have AEROSPACI scores ranging from −5.9 to 1.91, so an ASTEROID is only chosen as a structural representative if no similar sequence already classified in SCOP is available. Representative subsets of SCOP/ASTRAL domains are available with or without ASTEROIDS, and all ASTEROIDS may also be downloaded in a single FASTA file. Multiple alignments and HMMs for ASTRAL superfamilies are also available.
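The effect of giving ASTEROIDS a score below that of every classified domain can be seen in a simple greedy selection. The sketch below is not the published subset procedure (4); it is a generic greedy scheme that assumes higher scores are better, that ASTEROIDS carry a score of -9.99 (lower than any classified domain), and that similarity is supplied as a precomputed list of pairs. The function name and the SCOP-style identifier `d1xyza_` are hypothetical.

```python
ASTEROID_SCORE = -9.99  # assumed: lower than any classified domain


def select_representatives(scores, similar_pairs):
    """Greedy selection: visit sequences from best to worst score and
    keep each one not already covered by a chosen, similar sequence.

    scores: dict mapping identifier -> score (higher is better)
    similar_pairs: iterable of (id_a, id_b) judged similar"""
    neighbours = {name: set() for name in scores}
    for a, b in similar_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    chosen = []
    covered = set()
    # Ties are broken alphabetically so the result is deterministic.
    for name in sorted(scores, key=lambda n: (-scores[n], n)):
        if name in covered:
            continue
        chosen.append(name)
        covered.add(name)
        covered |= neighbours[name]
    return chosen
```

In this sketch an ASTEROID that is similar to a classified domain is never selected, because the classified domain is visited first and covers it; an ASTEROID with no classified relative becomes its own representative.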
| PDB-STYLE FILES |
|---|
To facilitate use of ASTRAL by structural biologists, we provide PDB-style files containing coordinates for each SCOP domain. These files also contain REMARK records documenting the original PDB file used as a data source, as well as information on the domain's classification in ASTRAL and SCOP, such as identifiers and AEROSPACI scores.
| IMPROVED SCRIPTS |
|---|
CGI scripts are now provided which retrieve individual sequences and PDB-style files. Both genetic domain and original-style sequences can be retrieved, as well as sequences derived from either SEQRES or ATOM records. Data may be searched using a variety of identifiers, including PDB codes, SCOP sid identifiers and SCOP sccs identifiers (3).
| ACKNOWLEDGEMENTS |
|---|
This work is supported by grants from the NIH (1-P50-GM62412, 1-K22-HG00056) and the Searle Scholars Program (01-L-116), and by the US Department of Energy under contract DE-AC03-76SF00098.
| REFERENCES |
|---|
1. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
2. Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.
3. Lo Conte,L., Brenner,S.E., Hubbard,T.J., Chothia,C. and Murzin,A.G. (2002) SCOP Database in 2002: refinements accommodate structural genomics. Nucleic Acids Res., 30, 264–267.
4. Brenner,S.E., Koehl,P. and Levitt,M. (2000) The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res., 28, 254–256.
5. Chandonia,J.M., Walker,N.S., Lo Conte,L., Koehl,P., Levitt,M. and Brenner,S.E. (2002) ASTRAL compendium enhancements. Nucleic Acids Res., 30, 260–263.
6. Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
7. Krogh,A., Brown,M., Mian,I.S., Sjolander,K. and Haussler,D. (1994) Hidden Markov models in computational biology. Applications to protein modeling. J. Mol. Biol., 235, 1501–1531.
8. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
9. Bateman,A., Birney,E., Cerruti,L., Durbin,R., Etwiller,L., Eddy,S.R., Griffiths-Jones,S., Howe,K.L., Marshall,M. and Sonnhammer,E.L. (2002) The Pfam protein families database. Nucleic Acids Res., 30, 276–280.
10. Katoh,K., Misawa,K., Kuma,K. and Miyata,T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res., 30, 3059–3066.