Nucleic Acids Research Advance Access published online on November 11, 2009
Nucleic Acids Research, doi:10.1093/nar/gkp949
Database Issue
SIMAP—a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters
1 Technische Universität München, Department of Genome Oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, Freising, Germany, 2 Bioinformatics Department, Centro de Investigación Príncipe Felipe, Valencia, Spain and 3 Institute for Bioinformatics and Systems Biology (MIPS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg, Germany
*To whom correspondence should be addressed. Tel: +49 8161 712136; Fax: +49 8161 712186; Email: t.rattei@wzw.tum.de
Received September 15, 2009. Revised October 10, 2009. Accepted October 12, 2009.
Large-scale sequence comparison is still the most powerful tool in sequence analysis for predicting protein function and reconstructing evolutionary origins. Because the number of known protein sequences grows exponentially and the similarity matrix grows quadratically with that number, computing the Similarity Matrix of Proteins (SIMAP) has become a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space through the inclusion of databases such as ENSEMBL, as well as the integration of metagenomes with consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP, and the data access and query functions have been improved. SIMAP helps biologists query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. SIMAP is freely accessible to individuals through the web portal (http://mips.gsf.de/simap/) and programmatically through DAS (http://webclu.bio.wzw.tum.de/das/) and a Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
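To give a sense of the quadratic growth mentioned in the abstract, the following minimal sketch counts the unordered sequence pairs in an all-against-all comparison. It assumes a full symmetric matrix without self-comparisons; the function name `pairwise_comparisons` is hypothetical, and the sketch says nothing about how SIMAP itself schedules or prunes comparisons. The figure of 23 million is the lower bound on non-redundant sequences quoted for September 2009.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n distinct sequences (illustrative helper)."""
    return n * (n - 1) // 2


# Lower bound on non-redundant sequences quoted in the abstract (September 2009).
n_2009 = 23_000_000

pairs = pairwise_comparisons(n_2009)
print(f"pairs for {n_2009:,} sequences: {pairs:.2e}")

# Doubling the number of sequences roughly quadruples the number of pairs.
ratio = pairwise_comparisons(2 * n_2009) / pairs
print(f"growth factor when the sequence count doubles: {ratio:.2f}")
```

Under these assumptions, 23 million sequences correspond to roughly 2.6 × 10^14 pairs, and doubling the sequence count multiplies that number by about four. This is why pre-calculating the matrix once and updating it incrementally is attractive.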