Nucleic Acids Research Advance Access originally published online on November 16, 2006
Nucleic Acids Research 2007 35(Database issue):D193-D197; doi:10.1093/nar/gkl929
Nucleic Acids Research, 2007, Vol. 35, Database issue D193-D197
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Universal Protein Resource (UniProt)
1 Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven St. NW, Suite 1200, Washington, DC 20007, USA
2 The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
3 Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland
Correspondence should be addressed to Rolf Apweiler. Tel: +44 1223 494435; Fax: +44 1223 494468; Email: rolf.apweiler{at}ebi.ac.uk
Received September 21, 2006. Accepted October 12, 2006.
ABSTRACT
The ability to store and interconnect all available information on proteins is crucial to modern biological research. Accordingly, the Universal Protein Resource (UniProt) plays an increasingly important role by providing a stable, comprehensive, freely accessible central resource on protein sequences and functional annotation. UniProt is produced by the UniProt Consortium, formed in 2002 by the European Bioinformatics Institute (EBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB). The core activities include manual curation of protein sequences assisted by computational analysis, sequence archiving, development of a user-friendly UniProt web site and the provision of additional value-added information through cross-references to other databases. UniProt comprises three major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase and the UniProt Reference Clusters. An additional component consisting of metagenomic and environmental sequences has recently been added to UniProt to ensure that such sequences are available in a timely fashion. UniProt is updated and distributed on a bi-weekly basis and can be accessed online for searches or download at http://www.uniprot.org.
INTRODUCTION
High-throughput genome sequencing is producing a rapid and accelerating accumulation of predicted protein sequences for a large number of organisms. At the same time, protein functions are being analyzed using a wide range of approaches, ranging from traditional small-scale experiments to large-scale methods such as gene expression profiling, protein-protein interactions and structural genomics as well as in silico prediction of protein functions. To accommodate these data, various individual resources are available to the research community. However, there is a widely recognized need for a centralized repository of protein sequences with comprehensive coverage and a systematic approach to protein annotation, incorporating, integrating and standardizing data from these various sources.
UniProt is the central resource for storing and interconnecting information from large and disparate sources and the most comprehensive catalog of protein sequence and functional annotation. It has three components optimized for different uses. The UniProt Knowledgebase (UniProtKB) is an expertly curated database and a central access point for integrated protein information with cross-references to multiple sources. The UniProt Archive (UniParc) is a comprehensive sequence repository reflecting the history of all protein sequences (1). UniProt Reference Clusters (UniRef) merge closely related sequences based on sequence identity to speed up searches. UniProt is built upon the extensive bioinformatics infrastructure and scientific expertise at the European Bioinformatics Institute (EBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB). It is freely and easily accessible, allowing researchers to conduct interactive and custom-tailored analyses of proteins of interest to facilitate hypothesis generation and knowledge discovery.
CONTENT
UniProtKB
UniProtKB consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. The former contains manually annotated records with information extracted from the literature and curator-evaluated computational analysis. To achieve accuracy, annotation is performed by biologists with specific expertise. Annotated information includes function, catalytic activity, subcellular location, disease, structure and post-translational modifications. An important part of the annotation process involves merging different reports for a single protein. After careful inspection of the sequences, the annotator selects the reference sequence, performs the corresponding merge and lists the splice and genetic variants, along with disease information when available. Any discrepancies between the different sequence sources are also annotated. Cross-references are provided to the underlying nucleotide sequence sources as well as to many other useful databases, including organism-specific, domain, family and disease databases.
UniProtKB/TrEMBL contains high-quality, computationally analyzed records enriched with automatic annotation and classification. The computer-assisted annotation is created using automatically generated rules, as in Spearmint (2), or manually curated rules based on protein families, including HAMAP family rules (3), RuleBase rules (4) and PIRSF classification-based name rules and site rules (5,6). UniProtKB/TrEMBL contains the translations of all coding sequences present in the EMBL/GenBank/DDBJ Nucleotide Sequence Databases, the sequences of PDB structures and data derived from amino acid sequences that are directly submitted to UniProtKB or scanned from the literature. We exclude some types of data, such as EMBL/GenBank/DDBJ entries that encode small fragments, synthetic sequences, most non-germline immunoglobulins and T-cell receptors, most patent sequences and some highly over-represented data.
Records are selected for full manual annotation and integration into UniProtKB/Swiss-Prot according to defined annotation priorities.
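Family-based rule annotation can be pictured as follows. This is a minimal sketch, assuming a hypothetical record type and a hypothetical rule format in which a name rule fires when an entry belongs to a given family and carries required residues at given positions (a crude stand-in for a site rule). It is not the actual HAMAP, RuleBase or PIRSF rule engine, and the family identifier and accessions in the usage lines are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stand-in for an unreviewed UniProtKB/TrEMBL record."""
    accession: str
    sequence: str
    families: set[str]  # family memberships assigned by classification
    protein_name: str = "Uncharacterized protein"


@dataclass
class NameRule:
    """Hypothetical family-based name rule with optional site conditions."""
    family: str
    name: str
    # 1-based sequence position -> residue that must be present there
    required_sites: dict[int, str] = field(default_factory=dict)

    def matches(self, entry: Entry) -> bool:
        if self.family not in entry.families:
            return False
        # Every required residue must be present at its position.
        return all(
            pos <= len(entry.sequence) and entry.sequence[pos - 1] == residue
            for pos, residue in self.required_sites.items()
        )


def annotate(entries: list[Entry], rules: list[NameRule]) -> None:
    """Give each entry the name from the first rule that matches it."""
    for entry in entries:
        for rule in rules:
            if rule.matches(entry):
                entry.protein_name = rule.name
                break


# Usage with a made-up family identifier "FAM0001".
rules = [NameRule("FAM0001", "Example dehydrogenase", {3: "H"})]
records = [Entry("X00001", "MKHAL", {"FAM0001"}), Entry("X00002", "MKQAL", {"FAM0001"})]
annotate(records, rules)
```

Applying only the first matching rule keeps the result deterministic when rules overlap; a production system would also need rule priorities and conflict checks, which this sketch omits.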
UniRef
The UniRef databases provide three clustered sets (UniRef100, 90 and 50) of sequences from UniProtKB and selected UniParc records in order to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences from view. The UniRef100 database combines identical sequences and sub-fragments with 11 or more residues into a single UniRef entry, which displays the sequence of a representative protein, with the accession numbers of all the UniProtKB entries within the cluster and links to the corresponding UniProtKB and UniParc records. UniRef90 and UniRef50 are built by further clustering UniRef100 sequences with 11 or more residues using the CD-HIT algorithm (7) such that each cluster is composed of sequences that have at least 90 or 50% sequence identity, respectively, to the representative sequence. Selection of the representative sequence in each UniRef cluster is based on the ranking of all the sequences in the cluster using the following criteria in descending precedence:
- Quality of the entry: member entries from UniProtKB/Swiss-Prot section are preferred.
- Meaningful name: entries whose names do not contain non-biological or non-descriptive words such as hypothetical or probable are preferred.
- Organism: entries from model organisms are preferred.
- Length of the sequence: longer sequences are preferred.
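Because each criterion only breaks ties left by the ones before it, the ranking behaves like a lexicographic sort key. A minimal sketch, assuming a hypothetical Member record; the word list and the set of model organisms below are illustrative placeholders, since the text names only hypothetical and probable and does not enumerate the model organisms.

```python
from dataclasses import dataclass

# Illustrative placeholders, not the actual UniRef word list or organism set.
NON_DESCRIPTIVE_WORDS = {"hypothetical", "probable", "putative", "uncharacterized"}
MODEL_ORGANISMS = {
    "Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae",
    "Escherichia coli", "Drosophila melanogaster", "Arabidopsis thaliana",
}


@dataclass
class Member:
    """Hypothetical cluster member record."""
    accession: str
    name: str
    organism: str
    sequence: str
    reviewed: bool  # True for UniProtKB/Swiss-Prot entries


def has_meaningful_name(name: str) -> bool:
    words = {w.strip("(),;").lower() for w in name.split()}
    return not (words & NON_DESCRIPTIVE_WORDS)


def rank_key(m: Member) -> tuple:
    # Tuples compare element by element, mirroring "descending precedence":
    # quality, then name, then organism, then sequence length.
    return (m.reviewed, has_meaningful_name(m.name),
            m.organism in MODEL_ORGANISMS, len(m.sequence))


def choose_representative(members: list[Member]) -> Member:
    return max(members, key=rank_key)
```

For example, a reviewed entry named "Probable kinase" still outranks a longer unreviewed entry with a descriptive name, because entry quality is compared first.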
The UniRef databases are generated from the UniProtKB and UniParc databases and thus provide up-to-date collections of sequences. UniRef100 is the most comprehensive non-redundant protein sequence dataset. UniRef90 and UniRef50 yield database size reductions of 40 and 65%, respectively, allowing significantly faster sequence similarity searches. In addition, the UniRef databases reduce the bias in sequence searches by providing a more even sampling of sequence space.
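The UniRef100 merging rule described above (identical sequences, plus sub-fragments of 11 or more residues) can be sketched with exact string containment. This is a minimal illustration under stated assumptions: the quadratic search and the "first containing seed wins" tie-break are simplifications chosen here, not the production pipeline, and the function name is hypothetical.

```python
MIN_FRAGMENT_LENGTH = 11  # sub-fragments shorter than this are not merged


def uniref100_clusters(seqs: dict[str, str]) -> list[list[str]]:
    """Cluster accessions whose sequences are identical or exact sub-fragments.

    seqs maps an accession to its amino acid sequence. Each sequence joins the
    first existing seed that equals it or, if it has at least
    MIN_FRAGMENT_LENGTH residues, contains it as a substring.
    """
    # Longest first, so a fragment is always processed after its parent.
    order = sorted(seqs, key=lambda acc: len(seqs[acc]), reverse=True)
    clusters: dict[str, list[str]] = {}  # seed accession -> member accessions
    for acc in order:
        s = seqs[acc]
        home = next(
            (seed for seed in clusters
             if s == seqs[seed]
             or (len(s) >= MIN_FRAGMENT_LENGTH and s in seqs[seed])),
            None,
        )
        if home is None:
            clusters[acc] = [acc]  # this sequence seeds a new cluster
        else:
            clusters[home].append(acc)
    return list(clusters.values())
```

Processing the longest sequences first guarantees that every fragment meets its parent. UniRef90 and UniRef50 instead require identity-based clustering such as CD-HIT (7), which this sketch does not attempt.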
UniParc
UniParc is the main sequence storehouse, a comprehensive repository that reflects the history of all protein sequences (1). UniParc houses all new and revised protein sequences from various sources to ensure that complete coverage is available at a single site. It includes not only UniProtKB but also translations from the EMBL-Bank/DDBJ/GenBank Nucleotide Sequence Databases, the Ensembl database of animal genomes, the International Protein Index (IPI), the Protein Data Bank (PDB), NCBI's Reference Sequence Collection (RefSeq), the model organism databases FlyBase and WormBase, and protein sequences from the European, American and Japanese Patent Offices. To avoid redundancy, sequences are handled as strings: all sequences that are 100% identical over their entire length are merged, regardless of source organism. New and updated sequences are loaded on a daily basis, cross-referenced to the source database accession number and provided with a sequence version that increments upon changes to the underlying sequence. The basic information stored within each UniParc entry is the identifier, the sequence, a cyclic redundancy check (CRC) number, the source database(s) with accession and version numbers, and a time stamp. In addition, each source database accession number is tagged with its status in that database, indicating whether the sequence still exists or has been deleted there. UniParc records are deliberately designed without annotation, because annotation is only valid in the real biological context of a sequence: proteins with the same sequence may have different functions depending on species, tissue, developmental stage, etc.
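The string-based merging can be sketched as below. Entries are keyed on the full sequence string, and each carries a 64-bit CRC of the kind UniProt flat files report on the SQ line. The table-driven CRC64 uses the ISO 3309 polynomial in bit-reflected form, which we assume matches the SWISS-PROT convention. The SequenceArchive class, its method names and the identifier format are hypothetical illustrations, not the UniParc implementation.

```python
from dataclasses import dataclass, field

# CRC64, ISO 3309 polynomial (bit-reflected), initial value 0, no final XOR.
_POLY = 0xD800000000000000
_TABLE = []
for _i in range(256):
    _crc = _i
    for _ in range(8):
        _crc = (_crc >> 1) ^ _POLY if _crc & 1 else _crc >> 1
    _TABLE.append(_crc)


def crc64(sequence: str) -> str:
    crc = 0
    for byte in sequence.encode("ascii"):
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return f"{crc:016X}"


@dataclass
class XRef:
    database: str
    accession: str
    version: int
    active: bool = True  # set to False once deleted in the source database


@dataclass
class ArchiveEntry:
    identifier: str
    sequence: str
    checksum: str
    xrefs: list[XRef] = field(default_factory=list)


class SequenceArchive:
    """Hypothetical UniParc-like store: one entry per distinct sequence string."""

    def __init__(self) -> None:
        self._by_sequence: dict[str, ArchiveEntry] = {}

    def load(self, database: str, accession: str, version: int,
             sequence: str) -> ArchiveEntry:
        entry = self._by_sequence.get(sequence)
        if entry is None:
            # Illustrative identifier scheme only.
            ident = f"UPI{len(self._by_sequence) + 1:010X}"
            entry = ArchiveEntry(ident, sequence, crc64(sequence))
            self._by_sequence[sequence] = entry
        # Identical sequences from any source or organism share one entry.
        entry.xrefs.append(XRef(database, accession, version))
        return entry

    def mark_deleted(self, database: str, accession: str) -> None:
        for entry in self._by_sequence.values():
            for x in entry.xrefs:
                if x.database == database and x.accession == accession:
                    x.active = False
```

The store is keyed on the sequence itself rather than on the checksum because different sequences can, in principle, share a CRC64 value; the checksum serves as a quick fingerprint, not as the identity test.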
NEW FEATURES
UniSave
The UniProtKB Sequence/Annotation Version database (UniSave) is a comprehensive archive of UniProtKB entry versions (8). All changed UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entries are added to UniSave on a bi-weekly basis to coincide with the UniProtKB releases. UniSave is available at http://www.ebi.ac.uk/uniprot/unisave.
ID mapping
UniProt provides a mapping service to convert common gene and protein identifiers to UniProtKB accession numbers (AC) and entry names (ID), and vice versa. Some mappings are inherited from cross-references within UniProtKB entries, some are based on the existing mappings between EMBL and GenBank entries, and others make use of cross-references obtained from the iProClass database (9). This service is available at http://www.uniprot.org/search/idmapping.shtml, where users can map between UniProtKB and more than 30 other data sources such as NCBI (e.g. gi numbers, RefSeq accession numbers, Entrez Gene IDs, PubMed IDs), GO (www.geneontology.org/), PFAM (www.sanger.ac.uk/Software/Pfam/) and PIRSF (pir.georgetown.edu/pirsf.shtml). Users can also download selected mappings as tab-delimited tables from ftp://ftp.pir.georgetown.edu/databases/iproclass/.
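Because mappings run in both directions and are often many-to-many, a downloaded table is naturally loaded into a forward and a reverse index. A minimal sketch, assuming a hypothetical two-column layout (UniProtKB accession, then one or more external IDs separated by semicolons) and made-up identifiers; the real iProClass tables have their own documented column layout, which should be checked before parsing.

```python
import csv
import io
from collections import defaultdict


def load_mapping(tsv_text: str) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Build forward (accession -> external IDs) and reverse lookups.

    Hypothetical layout: column 1 is a UniProtKB accession, column 2 holds
    one or more external IDs separated by ';'. Lines starting with '#'
    are treated as comments.
    """
    forward: dict[str, set[str]] = defaultdict(set)
    reverse: dict[str, set[str]] = defaultdict(set)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        accession = row[0].strip()
        for external_id in (x.strip() for x in row[1].split(";")):
            if external_id:
                forward[accession].add(external_id)
                reverse[external_id].add(accession)
    return dict(forward), dict(reverse)
```

Using sets rather than single values keeps one-to-many and many-to-one mappings intact in both directions.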
Format changes
Recent format changes
A number of UniProtKB format changes have recently been introduced to improve data consistency:
- The DT (DaTe) line changed from showing only the dates corresponding to full UniProtKB releases to displaying the date of the bi-weekly release at which an entry was integrated or updated. The release number has been dropped from the DT lines and replaced by entry and sequence version numbers. The sequence version number of an entry is incremented by one when its amino acid sequence is modified, whereas the entry version number is incremented by one whenever any data in the flat file representation of the entry is modified.
- A new line type has been introduced to viral entries to indicate the host(s) either as a specific organism or taxonomic group of organisms. This line has been termed OH for Organism Host and contains the host name and taxonomy ID.
- The CC line (Comment) topic DATABASE has been replaced by WEB RESOURCE to clarify the conceptual difference between the content of these lines and the DR (Database cross-Reference) lines.
- Pre-translational events have so far been represented by several feature keys. To improve the consistency of annotation of pre- and co-translational events, the feature key VARSPLIC was removed and the new feature key VAR_SEQ created for the description of alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting.
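The two DT-line version counters follow a simple rule that can be sketched as below. The EntryState record is hypothetical, and comparing the whole flat-file text is an assumed stand-in for "any data in the flat file representation is modified".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryState:
    """Hypothetical snapshot of one UniProtKB entry."""
    flat_file: str  # full flat-file text of the entry
    sequence: str
    entry_version: int
    sequence_version: int


def next_versions(old: EntryState, new_flat_file: str, new_sequence: str) -> EntryState:
    """Apply the DT-line versioning rule to a revised entry."""
    seq_changed = new_sequence != old.sequence
    # A sequence change also changes the flat file, so it always counts as data.
    data_changed = seq_changed or new_flat_file != old.flat_file
    return EntryState(
        flat_file=new_flat_file,
        sequence=new_sequence,
        entry_version=old.entry_version + (1 if data_changed else 0),
        sequence_version=old.sequence_version + (1 if seq_changed else 0),
    )
```

An annotation-only edit therefore bumps the entry version alone, while a sequence edit bumps both counters.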
Forthcoming format changes
- The format of the ID line will be changed to better reflect the annotation status of an entry. The current STANDARD and PRELIMINARY data classes will be replaced by Reviewed (entries that have been manually reviewed and annotated by UniProtKB curators) and Unreviewed (computer-annotated entries that have not been reviewed by UniProtKB curators), respectively. In addition, the MoleculeType field, which is a legacy of compatibility with the EMBL flat file format, will be dropped.
- Since in most cases protein sequences are derived from translation of nucleotide sequences and there may or may not be definitive experimental evidence for their existence, a new line type will be introduced to indicate the evidence for the existence of a protein (PE line). The PE line will have one of the following values: evidence at protein level, evidence at transcript level, inferred from homology or predicted. Unreviewed entries will have an additional value, unassigned. However, it should be noted that the PE line will not give information on the correctness of the sequence.
- The feature key INIT_MET is used only to indicate that the initiator methionine has been cleaved off. Currently, in such a case the initiator methionine is not included in the sequence of a UniProtKB entry, and the INIT_MET sequence coordinates are therefore 0. The initiator methionine will be added back to such protein sequences and the sequence coordinates of the feature key INIT_MET changed accordingly to 1.
- The FASTA header line of UniProtKB and UniRef entries will be standardized. For UniProtKB, the format will be >UniqueIdentifier|EntryName ProteinNameOrganismName, whereas UniRef headers will use the format >UniqueIdentifier Cluster: ClusterName; n = Members; Taxon|Rep: ProteinNameOrganismName.
- For UniProtKB, the UniqueIdentifier is the primary accession number of the UniProtKB entry or, in the case of entries that describe several protein isoforms, an isoform identifier; EntryName is the entry name of the UniProtKB entry; ProteinName is the recommended or submitted protein name of the UniProtKB entry (this is the name before the first bracket, excluding precursor but including Fragment if appropriate); OrganismName is the scientific name of the organism of the UniProtKB entry. Examples: >P24856|ANP_NOTCO Ice-structuring glycoprotein (Fragment)Notothenia coriiceps neglecta >P51650-1|SSDH_RAT Succinate semialdehyde dehydrogenaseRattus norvegicus
- For UniRef, the UniqueIdentifier is the primary accession number of the UniRef cluster; ClusterName is the name of the UniRef cluster; Members is the number of UniRef cluster members; Taxon is the scientific name of the lowest common taxon shared by all UniRef cluster members; ProteinName is the protein name of the representative member of the UniRef cluster; OrganismName is the scientific name of the organism of the representative member of the UniRef cluster. Example: >UniRef50_P24856 Cluster: Ice-structuring glycoprotein (Fragment); n = 15; Holacanthopterygii|Rep: Ice-structuring glycoprotein (Fragment)Notothenia coriiceps neglecta
- UniParc is not represented here as its header is purely the UniParc accession number.
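The header layouts above can be assembled and split with plain string operations. The separator between ProteinName and OrganismName is not legible in the format strings as given here, so this sketch treats it as a parameter; its " - " default, like the function names, is an assumption rather than the official format.

```python
def uniprotkb_header(identifier: str, entry_name: str, protein_name: str,
                     organism: str, sep: str = " - ") -> str:
    # >UniqueIdentifier|EntryName ProteinName<sep>OrganismName
    return f">{identifier}|{entry_name} {protein_name}{sep}{organism}"


def uniref_header(cluster_id: str, cluster_name: str, members: int, taxon: str,
                  rep_protein: str, rep_organism: str, sep: str = " - ") -> str:
    # >UniqueIdentifier Cluster: ClusterName; n = Members; Taxon|Rep: ProteinName<sep>OrganismName
    return (f">{cluster_id} Cluster: {cluster_name}; n = {members}; "
            f"{taxon}|Rep: {rep_protein}{sep}{rep_organism}")


def parse_uniprotkb_header(header: str, sep: str = " - ") -> dict[str, str]:
    """Split a UniProtKB header built by uniprotkb_header back into fields."""
    ids, _, description = header.lstrip(">").partition(" ")
    identifier, _, entry_name = ids.partition("|")
    # Split at the last separator, since protein names may contain hyphens.
    protein_name, _, organism = description.rpartition(sep)
    return {"identifier": identifier, "entry_name": entry_name,
            "protein_name": protein_name, "organism": organism}
```

For example, uniprotkb_header("P24856", "ANP_NOTCO", "Ice-structuring glycoprotein (Fragment)", "Notothenia coriiceps neglecta") yields a header that parse_uniprotkb_header splits back into the same four fields.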
RECENT CHANGES
New documents
A number of documents, available both by ftp and on the Web site, have been added. The document nameprot.txt lists rules for naming proteins, intended to standardize the nomenclature for a given protein across related organisms, in the hope that authors and laboratories will follow these rules as closely as possible when naming new proteins. The document orysa.txt lists all Oryza sativa (rice) entries in the UniProtKB/Swiss-Prot section with the corresponding chromosome locus, UniProtKB accession number, entry name, description and gene name(s). The document scorpktx.txt lists the potassium-channel-specific scorpion toxins known to date (10,11), along with their UniProtKB accession numbers, entry names and systematic names. Finally, the document ptmlist.txt contains the controlled vocabulary and associated feature keys for post-translational modifications.
Metagenomic and environmental sequences
The Swiss-Prot and TrEMBL sections of UniProtKB contain entries with a known taxonomic source. However, a new development in sequence production, namely the availability of metagenomic data, has necessitated the creation of a separate section, UniProt Metagenomic and Environmental Sequences (UniMES).
UPCOMING DEVELOPMENTS
Annotation of UniProtKB/TrEMBL entries
To improve the quality of protein names in UniProtKB/TrEMBL, an effort is underway to manually curate protein names using PIRSF families and their accompanying name rules and/or site rules (5,6). All protein names will be checked manually before the individual UniProtKB/TrEMBL records are updated. In the context of the Human Proteome Initiative annotation program, the protein names and gene symbols of representative human entries in UniProtKB/TrEMBL will be updated to provide the community with a cleaner human proteome set that encompasses the 15 000 reviewed human entries in UniProtKB/Swiss-Prot and about 5000 as yet unreviewed human UniProtKB/TrEMBL entries.
DATABASE ACCESS AND FEEDBACK
UniProt is freely available for both commercial and non-commercial use. Please see http://www.uniprot.org/terms for details. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published on a bi-weekly basis.
ACKNOWLEDGEMENTS
UniProt is mainly supported by the National Institutes of Health (NIH) grant 1 U01 HG02712-01. Additional support for the EBI's involvement in UniProt comes from the European Commission (EC)'s FELICS grant (021902RII3) and from the NIH grant 1R01HGO2273-01. UniProtKB/Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science. PIR activities are also supported by the NIH grants for NIAID proteomic resource (HHSN266200400061C) and grid enablement (NCI-caBIGICR-10-10-01) and BioThesaurus (ITR-0205470). Funding to pay the Open Access publication charges for this article was provided by the UniProt NIH grant 1 U01 HG02712-01.
Conflict of interest statement. None declared.
Footnotes
The UniProt Consortium: Amos Bairoch, Lydie Bougueleret, Severine Altairac, Valeria Amendolia, Andrea Auchincloss, Ghislaine Argoud Puy, Kristian Axelsen, Delphine Baratin, Marie-Claude Blatter, Brigitte Boeckmann, Laurent Bollondi, Emmanuel Boutet, Silvia Braconi Quintaje, Lionel Breuza, Alan Bridge, Edouard deCastro, Danielle Coral, Elisabeth Coudert, Isabelle Cusin, Pavel Dobrokhotov, Dolnide Dornevil, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Sebastian Gehant, Nathalie Farriol-Mathis, Serenella Ferro, Elisabeth Gasteiger, Alain Gateau, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, Vassilios Ioannidis, Ivan Ivanyi, Janet James, Eric Jain, Silvia Jimenez, Florence Jungo, Vivien Junker, Guillaume Keller, Corinne Lachaize, Lydie Lane-Guermonprez, Petra Langendijk-Genevaux, Vicente Lara, Philippe Lemercier, Virginie Le Saux, Damien Lieberherr, Tania de Oliveira Lima, Veronique Mangold, Xavier Martin, Karine Michoud, Madelaine Moinat, Cristiano Moreira, Anne Morgat, Marisa Nicolas, Shoko Ohji, Salvo Paesano, Ivo Pedruzzi, David Perret, Isabelle Phan, Sandrine Pilbout, Violaine Pillet, Sylvain Poux, Nicole Redaschi, Sorogini Reynaud, Catherine Rivoire, Bernd Roechert, Claudia Sapsezian, Michel Schneider, Christian Sigrist, Mauricio da Silva, Karin Sonesson, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue, Anne-Lise Veuthey, Claudia Vitorello and Lina Yip at the Swiss Institute of Bioinformatics (SIB) and the Medical Biochemistry Department of the University of Geneva; Rolf Apweiler, Yasmin Alam-Faruque, Daniel Barrell, Lawrence Bower, Paul Browne, Wei Mun Chan, Louise Daugherty, Emilio Salazar Donate, Ruth Eberhardt, Alexander Fedotov, Rebecca Foulger, Gill Fraser, Gabriella Frigerio, John Garavelli, Renato Golin, Alan Horne, Julius Jacobsen, Michael Kleen, Paul Kersey, Ernst Kretschmann, Kati Laiho, Rasko Leinonen, Duncan Legge, Michele Magrane, Maria Jesus Martin, Patricia Monteiro, Claire O'Donovan, Sandra Orchard, John O'Rourke, Samuel Patient, Manuela Pruess, Andrey Sitnov, Nataliya Sklyar, Eleanor Whitfield, Daniela Wieser, Quan Lin, Mark Rynbeek, Giuseppe di Martino, Mike Donnelly and Pieter van Rensburg at the European Bioinformatics Institute (EBI); and Cathy Wu, Cecilia Arighi, Leslie Arminski, Winona Barker, Yongxing Chen, Sehee Chung, Christina Fang, Vincent Hermoso, Zhang-Zhi Hu, Hsing-Kuo Hua, Hongzhan Huang, Robel Kahsay, Raja Mazumder, Peter McGarvey, Darren Natale, Anastasia Nikolskaya, Natalia Petrova, Baris Suzek, Sona Vasudevan, C. R. Vinayaka, Lai Su Yeh, Xin Yuan and Jian Zhang at the Protein Information Resource (PIR).
The authors wish it to be known that, in their opinion, all authors should be regarded as joint First Authors.
REFERENCES
1. Leinonen, R., Diez, F.G., Binns, D., Fleischmann, W., Lopez, R. and Apweiler, R. (2004) UniProt archive. Bioinformatics, 20, 3236-3237.
2. Wieser, D., Kretschmann, E. and Apweiler, R. (2004) Filtering erroneous protein annotation. Bioinformatics, 20, i342-i347.
3. Gattiker, A., Michoud, K., Rivoire, C., Auchincloss, A.H., Coudert, E., Lima, T., Kersey, P., Pagni, M., Sigrist, C.J., Lachaize, C., et al. (2003) Automated annotation of microbial proteomes in SWISS-PROT. Comput. Biol. Chem., 27, 49-58.
4. Fleischmann, W., Moller, S., Gateau, A. and Apweiler, R. (1999) A novel method for automatic functional annotation of proteins. Bioinformatics, 15, 228-233.
5. Wu, C.H., Nikolskaya, A., Huang, H., Yeh, L.S., Natale, D.A., Vinayaka, C.R., Hu, Z.Z., Mazumder, R., Kumar, S., Kourtesis, P., et al. (2004) PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Res., 32, D112-D114.
6. Natale, D.A., Vinayaka, C.R. and Wu, C.H. (2004) Large-scale, classification-driven, rule-based functional annotation of proteins. In Subramaniam, S. (ed.), Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. Bioinformatics Volume. John Wiley & Sons, Ltd.
7. Li, W., Jaroszewski, L. and Godzik, A. (2001) Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics, 17, 282-283.
8. Leinonen, R., Nardone, F., Zhu, W. and Apweiler, R. (2006) UniSave: the UniProtKB sequence/annotation version database. Bioinformatics, 22, 1284-1285.
9. Wu, C.H., Huang, H., Nikolskaya, A., Hu, Z. and Barker, W.C. (2004) The iProClass integrated database for protein functional analysis. Comput. Biol. Chem., 28, 87-96.
10. Tytgat, J., Chandy, K.G., Garcia, M.L., Gutman, G.A., Martin-Eauclaire, M.F., van der Walt, J.J. and Possani, L.D. (1999) A unified nomenclature for short-chain peptides isolated from scorpion venoms: alpha-KTx molecular subfamilies. Trends Pharmacol. Sci., 20, 444-447.
11. Rodriguez de la Vega, R.C. and Possani, L.D. (2004) Current views on scorpion toxins specific for K+-channels. Toxicon, 43, 865-875.
||||
![]() |
S. Michael, G. Trave, C. Ramu, C. Chica, and T. J. Gibson Discovery of candidate KEN-box motifs using Cell Cycle keyword enrichment combined with native disorder prediction and motif conservation Bioinformatics, February 15, 2008; 24(4): 453 - 457. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Bashton, I. Nobeli, and J. M. Thornton PROCOGNATE: a cognate ligand domain mapping for enzymes Nucleic Acids Res., January 11, 2008; 36(suppl_1): D618 - D622. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Caboche, M. Pupin, V. Leclere, A. Fontaine, P. Jacques, and G. Kucherov NORINE: a database of nonribosomal peptides Nucleic Acids Res., January 11, 2008; 36(suppl_1): D326 - D331. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Degtyarenko, P. de Matos, M. Ennis, J. Hastings, M. Zbinden, A. McNaught, R. Alcantara, M. Darsow, M. Guedj, and M. Ashburner ChEBI: a database and ontology for chemical entities of biological interest Nucleic Acids Res., January 11, 2008; 36(suppl_1): D344 - D350. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Wanchana, S. Thongjuea, V. J. Ulat, M. Anacleto, R. Mauleon, M. Conte, M. Rouard, M. Ruiz, N. Krishnamurthy, K. Sjolander, et al. The Generation Challenge Programme comparative plant stress-responsive gene catalogue Nucleic Acids Res., January 11, 2008; 36(suppl_1): D943 - D946. [Abstract] [Full Text] [PDF] |
||||
![]() |
V. M. Markowitz, E. Szeto, K. Palaniappan, Y. Grechkin, K. Chu, I-M. A. Chen, I. Dubchak, I. Anderson, A. Lykidis, K. Mavromatis, et al. The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions Nucleic Acids Res., January 11, 2008; 36(suppl_1): D528 - D533. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. P. Gauthier, M. E. Larsen, R. Wernersson, U. de Lichtenberg, L. J. Jensen, S. Brunak, and T. S. Jensen Cyclebase.org a comprehensive multi-organism online database of cell-cycle experiments Nucleic Acids Res., January 11, 2008; 36(suppl_1): D854 - D859. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Gunther, M. Kuhn, M. Dunkel, M. Campillos, C. Senger, E. Petsalaki, J. Ahmed, E. G. Urdiales, A. Gewiess, L. J. Jensen, et al. SuperTarget and Matador: resources for exploring drug-target relationships Nucleic Acids Res., January 11, 2008; 36(suppl_1): D919 - D922. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. V. Kriventseva, N. Rahman, O. Espinosa, and E. M. Zdobnov OrthoDB: the hierarchical catalog of eukaryotic orthologs Nucleic Acids Res., January 11, 2008; 36(suppl_1): D271 - D275. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. Saunders, S. Lyon, M. Day, B. Riley, E. Chenette, and S. Subramaniam The Molecule Pages database Nucleic Acids Res., January 11, 2008; 36(suppl_1): D700 - D706. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Ruepp, B. Brauner, I. Dunger-Kaltenbach, G. Frishman, C. Montrone, M. Stransky, B. Waegele, T. Schmidt, O. N. Doudieu, V. Stumpflen, et al. CORUM: the comprehensive resource of mammalian protein complexes Nucleic Acids Res., January 11, 2008; 36(suppl_1): D646 - D650. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. A. Bruford, M. J. Lush, M. W. Wright, T. P. Sneddon, S. Povey, and E. Birney The HGNC Database in 2008: a resource for the human genome Nucleic Acids Res., January 11, 2008; 36(suppl_1): D445 - D448. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Okuno, A. Tamon, H. Yabuuchi, S. Niijima, Y. Minowa, K. Tonomura, R. Kunimoto, and C. Feng GLIDA: GPCR ligand database for chemical genomics drug discovery database and tools update Nucleic Acids Res., January 11, 2008; 36(suppl_1): D907 - D912. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Andreeva, D. Howorth, J.-M. Chandonia, S. E. Brenner, T. J. P. Hubbard, C. Chothia, and A. G. Murzin Data growth and its impact on the SCOP database: new developments Nucleic Acids Res., January 11, 2008; 36(suppl_1): D419 - D425. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Pagel, M. Oesterheld, O. Tovstukhina, N. Strack, V. Stumpflen, and D. Frishman DIMA 2.0 predicted and known domain interactions Nucleic Acids Res., January 11, 2008; 36(suppl_1): D651 - D655. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Gracy, D. Le-Nguyen, J.-C. Gelly, Q. Kaas, A. Heitz, and L. Chiche KNOTTIN: the knottin or inhibitor cystine knot scaffold in 2007 Nucleic Acids Res., January 11, 2008; 36(suppl_1): D314 - D319. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Yeats, J. Lees, A. Reid, P. Kellam, N. Martin, X. Liu, and C. Orengo Gene3D: comprehensive structural and functional annotation of genomes Nucleic Acids Res., January 11, 2008; 36(suppl_1): D414 - D418. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Lechat, L. Hummel, S. Rousseau, and I. Moszer GenoList: an integrated environment for comparative analysis of microbial genomes Nucleic Acids Res., January 11, 2008; 36(suppl_1): D469 - D474. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. D. Finn, J. Tate, J. Mistry, P. C. Coggill, S. J. Sammut, H.-R. Hotz, G. Ceric, K. Forslund, S. R. Eddy, E. L. L. Sonnhammer, et al. The Pfam protein families database Nucleic Acids Res., January 11, 2008; 36(suppl_1): D281 - D288. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Duvick, A. Fu, U. Muppirala, M. Sabharwal, M. D. Wilkerson, C. J. Lawrence, C. Lushbough, and V. Brendel PlantGDB: a resource for comparative plant genomics Nucleic Acids Res., January 11, 2008; 36(suppl_1): D959 - D965. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Henrick, Z. Feng, W. F. Bluhm, D. Dimitropoulos, J. F. Doreleijers, S. Dutta, J. L. Flippen-Anderson, J. Ionides, C. Kamada, E. Krissinel, et al. Remediation of the protein data bank archive Nucleic Acids Res., January 11, 2008; 36(suppl_1): D426 - D433. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Karolchik, R. M. Kuhn, R. Baertsch, G. P. Barber, H. Clawson, M. Diekhans, B. Giardine, R. A. Harte, A. S. Hinrichs, F. Hsu, et al. The UCSC Genome Browser Database: 2008 update Nucleic Acids Res., January 11, 2008; 36(suppl_1): D773 - D779. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Reumers, L. Conde, I. Medina, S. Maurer-Stroh, J. Van Durme, J. Dopazo, F. Rousseau, and J. Schymkowitz Joint annotation of coding and non-coding single nucleotide polymorphisms and mutations in the SNPeffect and PupaSuite databases Nucleic Acids Res., January 11, 2008; 36(suppl_1): D825 - D829. [Abstract] [Full Text] [PDF] |
||||
![]() |
I. Pedroso, G. Rivera, F. Lazo, M. Chacon, F. Ossandon, F. A. Veloso, and D. S. Holmes AlterORF: a database of alternate open reading frames Nucleic Acids Res., January 11, 2008; 36(suppl_1): D517 - D518. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Castellano, V. N. Gladyshev, R. Guigo, and M. J. Berry SelenoDB 1.0 : a database of selenoprotein genes, proteins and SECIS elements Nucleic Acids Res., January 11, 2008; 36(suppl_1): D332 - D338. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. L. Tress, J.-J. Wesselink, A. Frankish, G. Lopez, N. Goldman, A. Loytynoja, T. Massingham, F. Pardi, S. Whelan, J. Harrow, et al. Determination and validation of principal gene products Bioinformatics, January 1, 2008; 24(1): 11 - 17. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. F. Schwarz, O. Hadicke, J. Erdmann, A. Ziegler, D. Bayer, and S. Moller SNPtoGO: characterizing SNPs by enriched GO terms Bioinformatics, January 1, 2008; 24(1): 146 - 148. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. J.K. Wee, T. W. Tan, and S. Ranganathan CASVM: web server for SVM-based prediction of caspase substrates cleavage sites Bioinformatics, December 1, 2007; 23(23): 3241 - 3243. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. K Lewis, J. L Farmer, R. C Burghardt, G. R Newton, G. A Johnson, D. L Adelson, F. W Bazer, and T. E Spencer Galectin 15 (LGALS15): A Gene Uniquely Expressed in the Uteri of Sheep and Goats that Functions in Trophoblast Attachment Biol Reprod, December 1, 2007; 77(6): 1027 - 1036. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. G Lemay, A. M Zivkovic, and J B. German Building the bridges to bioinformatics in nutrition research Am. J. Clinical Nutrition, November 1, 2007; 86(5): 1261 - 1269. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Cattley and J. W. Arthur BioManager: the use of a bioinformatics web application as a teaching tool in undergraduate bioinformatics training Brief Bioinform, November 1, 2007; 8(6): 457 - 465. [Abstract] [Full Text] [PDF] |
||||