Nucleic Acids Research, 2001, Vol. 29, No. 14 2994-3005
© 2001 Oxford University Press
Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements
National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
PSI-BLAST is an iterative program to search a database for proteins with distant similarity to a query sequence. We investigated over a dozen modifications to the methods used in PSI-BLAST, with the goal of improving accuracy in finding true positive matches. To evaluate performance we used a set of 103 queries for which the true positives in yeast had been annotated by human experts, and a popular measure of retrieval accuracy (ROC) that can be normalized to take on values between 0 (worst) and 1 (best). The modifications we consider novel improve the ROC score from 0.758 ± 0.005 to 0.895 ± 0.003. This improvement does not include the benefits from four modifications we included in the baseline version, even though they were not implemented in PSI-BLAST version 2.0. The improvement in accuracy was confirmed on a small second test set, which involved analyzing three protein families with curated lists of true positives from the non-redundant protein database. The modification that accounts for the majority of the improvement is the use, for each database sequence, of a position-specific scoring system tuned to that sequence's amino acid composition. The use of composition-based statistics is particularly beneficial for large-scale automated applications of PSI-BLAST.
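As an illustration of the normalized ROC measure mentioned above, the following is a minimal sketch of a truncated ROC score (often written ROC_n) for a ranked list of search hits. It assumes the common definition: for each of the first n false positives, count the true positives ranked above it, sum these counts, and divide by n times the total number of true positives. The abstract does not state which n was used or how short hit lists were handled, so the function name `roc_n`, its parameters, and the padding rule for lists with fewer than n false positives are illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence


def roc_n(ranked_labels: Sequence[bool], total_true: int, n: int) -> float:
    """Truncated ROC score in [0, 1] for a ranked hit list (hypothetical helper).

    ranked_labels: hits ordered best first; True marks a true positive.
    total_true:    number of true positives that exist in the database.
    n:             number of false positives at which the curve is truncated.
    """
    if total_true <= 0 or n <= 0:
        raise ValueError("total_true and n must be positive")

    true_seen = 0   # true positives ranked so far
    false_seen = 0  # false positives ranked so far
    area = 0        # sum over false positives of true positives ranked above

    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumption: if the list ends before n false positives appear, the
    # missing false positives are treated as ranked after every listed hit.
    area += (n - false_seen) * true_seen

    return area / (n * total_true)


if __name__ == "__main__":
    hits = [True, True, False, True, False]
    print(roc_n(hits, total_true=3, n=2))
```

A perfect ranking (all true positives ahead of the first false positive) scores 1, and a ranking in which n false positives precede every true positive scores 0, which matches the 0 (worst) to 1 (best) range described in the abstract.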
* To whom correspondence should be addressed. Tel: +1 301 435 5884; Fax: +1 301 480 2918; Email: schaffer@helix.nih.gov