Nucleic Acids Research Advance Access originally published online on October 4, 2008
Nucleic Acids Research 2009 37(Database issue):D169-D174; doi:10.1093/nar/gkn664

Nucleic Acids Research, 2009, Vol. 37, Database issue D169-D174
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article appears in the following Nucleic Acids Research issue: Database issue


The Universal Protein Resource (UniProt) 2009

The UniProt Consortium1,2,3,*

1The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, 2Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven St NW, Suite 1200, Washington, DC 20007, USA and 3Swiss Institute of Bioinformatics, Centre Medical Universitaire 1 rue Michel Servet, 1211 Geneva 4, Switzerland

*To whom correspondence should be addressed. Tel: +44 1223 494435; Fax: +44 1223 494468; Email: apweiler{at}ebi.ac.uk

Received September 15, 2008. Accepted September 19, 2008.


    ABSTRACT
 
The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information that is essential for modern biological research. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute, the Protein Information Resource and the Swiss Institute of Bioinformatics. The core activities include manual curation of protein sequences assisted by computational analysis, sequence archiving, a user-friendly UniProt website and the provision of additional value-added information through cross-references to other databases. UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. One of the key achievements of the UniProt Consortium in 2008 is the completion of the first draft of the complete human proteome in UniProtKB/Swiss-Prot. This manually annotated representation of all currently known human protein-coding genes was made available in UniProt release 14.1 with 20 325 entries. UniProt is updated and distributed every three weeks and can be accessed online for searches or downloaded at www.uniprot.org.


    INTRODUCTION
 
High-throughput genome sequencing and the exponential growth of proteomics data are driving a rapid and accelerating accumulation of predicted protein sequences and associated data for a large number of organisms. There is a widely recognized need for a centralized repository of protein sequences with comprehensive coverage and a systematic approach to protein annotation that incorporates, integrates and standardizes data from these various sources. UniProt strives to provide this.

UniProt is the central resource for storing and interconnecting information from large and disparate sources, and the most comprehensive catalogue of protein sequence and functional annotation. It has four components optimized for different uses. The UniProt Knowledgebase (UniProtKB) is an expertly curated database, a central access point for integrated protein information with cross-references to multiple sources. The UniProt Archive (UniParc) is a comprehensive sequence repository, reflecting the history of all protein sequences (1). UniProt Reference Clusters (UniRef) merge closely related sequences based on sequence identity to speed up searches while the UniProt Metagenomic and Environmental Sequences database (UniMES) was created to respond to the expanding area of metagenomic data. UniProt is freely and easily accessible by researchers to conduct interactive and custom-tailored analyses for proteins of interest to facilitate hypothesis generation and knowledge discovery.


    CONTENT
 
The UniProt Knowledgebase (UniProtKB)
UniProtKB consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. The former contains manually annotated records with information extracted from literature and curator-evaluated computational analysis. Biologists with specific expertise do the annotation to achieve accuracy. In UniProtKB, annotation consists of the description of the following: function(s), enzyme-specific information, biologically relevant domains and sites, post-translational modifications, subcellular location(s), tissue specificity, developmentally specific expression, structure, interactions, splice isoform(s), associated diseases or deficiencies or abnormalities, etc. Another important part of the annotation process involves the merging of different reports for a single protein. After a careful inspection of the sequences, the curator selects the reference sequence, does the corresponding merging, and lists the splice and genetic variants along with disease information when available. Any discrepancies between the different sequence sources are also annotated. Cross-references are provided to the underlying nucleotide sequence sources as well as many other useful databases including organism-specific, domain, family and disease databases. UniProtKB/TrEMBL contains high-quality computationally analysed records enriched with automatic annotation and classification. The computer-assisted annotation is created using automatically generated rules as in Spearmint (2), or manually curated rules based on protein families, including HAMAP family rules (3), RuleBase rules (4) and PIRSF classification-based name rules and site rules (5,6). UniProtKB/TrEMBL contains the translations of all coding sequences (CDS) present in the EMBL-Bank/GenBank/DDBJ Nucleotide Sequence Databases (7) and sequences from The Arabidopsis Information Resource (TAIR) (8), SGD (9) and Ensembl Homo sapiens (10).
We exclude some types of data such as EMBL-Bank/GenBank/DDBJ entries that encode small fragments, synthetic sequences, most non-germline immunoglobulins and T-cell receptors, most patent sequences and some highly over-represented data. Records are selected for full manual annotation and integration into UniProtKB/Swiss-Prot according to defined annotation priorities.

The UniProt Reference Clusters (UniRef)
UniRef provides clustered sets of all sequences from the UniProt Knowledgebase (including splice forms as separate entries) and selected UniProt Archive records to obtain complete coverage of sequence space at resolutions of 100, 90 and 50% identity while hiding redundant sequences (11). The UniRef clusters provide a hierarchical set of sequence clusters where each individual member sequence can exist in only one UniRef cluster at each resolution and has only one parent or child cluster at another resolution. The UniRef100 database combines identical sequences and sub-fragments into a single UniRef entry. UniRef90 is built from UniRef100 clusters and UniRef50 is built from UniRef90 clusters. UniRef100, UniRef90 and UniRef50 yield a database size reduction of ~10, 40 and 70%, respectively. Each cluster record contains source database, protein name and taxonomy organism information on each member sequence but is represented by a single selected representative protein sequence and name; the number of members and the highest common taxonomy node for the membership are included. UniRef100 is one of the most comprehensive and non-redundant protein sequence datasets available. The reduced size of the UniRef90 and UniRef50 datasets provides faster sequence similarity searches and reduces the research bias in similarity searches by providing a more even sampling of sequence space. UniRef is currently being used for a broad range of applications in the areas of automated genome annotation, family classification, systems biology, structural genomics, phylogenetic analysis and mass spectrometry. The UniRef clusters are updated with every release of UniProtKB.
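As an illustration of the UniRef100 idea described above (identical sequences and sub-fragments collapsed into one entry, each member belonging to exactly one cluster), the following is a minimal sketch in Python. It is not the UniRef production algorithm: the function name, the accessions and the sequences are invented for the example, and the naive substring test ignores details such as minimum fragment length or how the representative is actually chosen.

```python
from typing import Dict, List


def cluster_identical_and_subfragments(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group sequences so that identical sequences and sub-fragments share a cluster.

    `seqs` maps a (hypothetical) accession to its amino-acid sequence.
    Returns a mapping from representative accession to member accessions.
    """
    # Visit the longest sequences first, so a fragment always meets
    # the sequence that contains it before it can found its own cluster.
    ordered = sorted(seqs, key=lambda acc: (-len(seqs[acc]), acc))
    clusters: Dict[str, List[str]] = {}
    for acc in ordered:
        for rep in clusters:
            if seqs[acc] in seqs[rep]:      # identical or contained in the representative
                clusters[rep].append(acc)   # a member joins exactly one cluster
                break
        else:
            clusters[acc] = [acc]           # no container found: new representative
    return clusters


# Invented example data, for illustration only.
example = {
    "A1": "MKTAYIAKQRQISFVKSHFSRQ",
    "A2": "MKTAYIAKQRQISFVKSHFSRQ",           # identical to A1
    "A3": "AKQRQISFVK",                       # sub-fragment of A1
    "B1": "MSLLTEVETYVLSIIPSGPLKAEIAQRLEDV",  # unrelated sequence
}
clusters = cluster_identical_and_subfragments(example)
```

Here `clusters` groups A1, A2 and A3 under the representative A1, while B1 forms a cluster of its own. Building the 90% and 50% levels would additionally require a sequence identity measure, which this sketch deliberately leaves out.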

UniProt Archive (UniParc)
UniParc is the main sequence storehouse and is a comprehensive repository that reflects the history of all protein sequences (1). UniParc houses all new and revised protein sequences from various sources to ensure that complete coverage is available at a single site. It includes not only UniProtKB but also translations from the EMBL-Bank/DDBJ/GenBank Nucleotide Sequence Databases, the Ensembl database of eukaryotic genomes, the H-Invitational Database (H-Inv), the Vertebrate Genome Annotation Database (VEGA), the International Protein Index (IPI), Protein Research Foundation (PRF), the Protein Data Bank (PDB), NCBI's Reference Sequence Collection (RefSeq), model organism databases FlyBase, SGD, TAIR and WormBase, TROME and protein sequences from the American, European, Korean and Japanese Patent Offices. To avoid redundancy, sequences are handled as strings—all sequences 100% identical over the entire length are merged, regardless of the source organism. New and updated sequences are loaded on a daily basis, cross-referenced to the source database accession number and provided with a sequence version that increments upon changes to the underlying sequence. The basic information stored within each UniParc entry is the identifier, the sequence, cyclic redundancy check number, source database(s) with accession and version numbers and a time stamp. If a UniParc entry does not have a cross-reference to a UniProtKB entry, the reason for the exclusion of that sequence from UniProtKB is provided (e.g. pseudogene). In addition, each source database accession number is tagged with its status in that database, indicating if the sequence still exists or has been deleted in the source database, and cross-references to NCBI GI and TaxId if appropriate. 
UniParc records are designed to be without annotation since the annotation will be only true in the real biological context of the sequence: proteins with the same sequence may have different functions depending on species, tissue, developmental stage, etc.
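The archive behaviour described above (sequences treated as strings, identical sequences merged regardless of source, per-source cross-references with a version that increments when the underlying sequence changes) can be sketched as follows. This is a simplified illustration, not UniParc's implementation: the class names, the `ARC…` identifier scheme and the database names are hypothetical, `zlib.crc32` merely stands in for whatever checksum the real archive uses, and the time stamp is omitted.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CrossRef:
    database: str
    accession: str
    version: int          # sequence version within the source record
    active: bool = True   # False once the source record points at another sequence


@dataclass
class ArchiveEntry:
    identifier: str       # hypothetical identifier scheme
    sequence: str
    checksum: int         # CRC-32 here, as a stand-in checksum
    xrefs: List[CrossRef] = field(default_factory=list)


class SequenceArchive:
    def __init__(self) -> None:
        self.entries: Dict[str, ArchiveEntry] = {}                 # keyed by sequence string
        self.latest: Dict[Tuple[str, str], Tuple[str, int]] = {}   # source record -> (sequence, version)

    def load(self, database: str, accession: str, sequence: str) -> ArchiveEntry:
        key = (database, accession)
        previous = self.latest.get(key)
        if previous is not None and previous[0] == sequence:
            return self.entries[sequence]           # unchanged record: nothing to do
        version = 1
        if previous is not None:
            version = previous[1] + 1               # the sequence changed: bump the version
            for xref in self.entries[previous[0]].xrefs:
                if (xref.database, xref.accession) == key:
                    xref.active = False             # keep the history, mark it superseded
        entry = self.entries.get(sequence)
        if entry is None:                           # first time this exact string is seen
            entry = ArchiveEntry(
                identifier=f"ARC{len(self.entries) + 1:06d}",
                sequence=sequence,
                checksum=zlib.crc32(sequence.encode("ascii")),
            )
            self.entries[sequence] = entry
        entry.xrefs.append(CrossRef(database, accession, version))
        self.latest[key] = (sequence, version)
        return entry


archive = SequenceArchive()
first = archive.load("DB_ONE", "X1", "MKT")
second = archive.load("DB_TWO", "Y9", "MKT")     # same string, different source: merged
revised = archive.load("DB_ONE", "X1", "MKTA")   # X1 was revised: new entry, version 2
```

Keying the store on the sequence string itself is what makes the merge independent of source organism; the old entry is never deleted, so the history of each source record remains visible through its inactive cross-references.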

The UniProt Metagenomic and Environmental Sequences database (UniMES)
The Swiss-Prot and TrEMBL sections of the UniProt Knowledgebase contain entries with a known taxonomic source. However, the expanding area of metagenomic data has necessitated the creation of a separate database, the UniProt Metagenomic and Environmental Sequences database (UniMES). UniMES currently contains data from the Global Ocean Sampling Expedition (GOS), which was originally submitted to the International Nucleotide Sequence Database Collaboration (INSDC). The initial GOS dataset is composed of 25 million DNA sequences, primarily from oceanic microbes, and predicts nearly 6 million proteins. By combining the predicted protein sequences with automatic classification by InterPro, the integrated resource for protein families, domains and functional sites, UniMES uniquely provides free access to the array of genomic information gathered from the sampling expeditions, enhanced by links to further analytical resources. The environmental sample data contained within this database are not present in UniProtKB and UniRef but are integrated into UniParc. UniMES is available on the ftp site in FASTA format, together with a file of UniMES matches to InterPro methods.


    NEW DEVELOPMENTS
 
New UniProt unified website
The UniProt Consortium released its new official unified website: a new interface, a new search engine and many new options to serve its user community better. The individual mirrors (www.ebi.uniprot.org, www.expasy.uniprot.org, www.pir.uniprot.org and parts of www.expasy.org) are no longer maintained. User feedback and the analysis of the use of our previous sites have led us to put more emphasis on supporting the most frequently used functionalities: database searches with simple (and sometimes less simple) queries that often consist of only a few terms have been enhanced by a good scoring system and a suggestion mechanism. Searching with ontology terms is assisted by auto-completion, and we also provide the possibility of using ontologies to browse search results. The viewing of database entries is improved with configurable views, a simplified terminology and a better integration of documentation. Medium-to-large sized result sets can now be retrieved directly on the site, so people no longer need to be referred to commercial, third party services. Access to the most common bioinformatics tools has been simplified: sequence similarity searches, multiple sequence alignments, batch retrieval and a database identifier mapping tool can now be launched directly from any page, and the output of these tools can be combined, filtered and browsed like normal database searches. Programmatic access to all data and results is possible via simple HTTP (REST) requests (www.uniprot.org/help/technical). In addition to the existing formats that support the different data sets (e.g. plain text, FASTA and XML for UniProtKB), the site now also provides (configurable) tab-delimited, RSS and GFF downloads where possible, and all data are available in RDF (www.w3.org/RDF/), a W3C standard for publishing data on the Semantic Web. Extensive documentation on how to best use this new resource is available at www.uniprot.org/help/.
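To give a flavour of what programmatic access through simple HTTP requests can look like, the sketch below only assembles a query URL; it performs no network access. The base URL layout, the dataset path segment and the parameter names (`query`, `format`, `limit`) are assumptions made for illustration and may not match the service; the technical documentation referenced above is the authority on the actual interface.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://www.uniprot.org"  # assumption: site root used as the REST base


def build_query_url(dataset: str, query: str, fmt: str = "tab",
                    limit: Optional[int] = None) -> str:
    """Assemble a GET URL for a search against one data set.

    `dataset`, `query`, `fmt` and `limit` are hypothetical names chosen for
    this sketch; check them against the site's technical documentation.
    """
    params = {"query": query, "format": fmt}
    if limit is not None:
        params["limit"] = str(limit)
    # urlencode percent-encodes the values and joins them with '&'.
    return f"{BASE_URL}/{dataset}/?{urlencode(params)}"


url = build_query_url("uniprot", "insulin human", fmt="fasta", limit=10)
```

The resulting string could then be fetched with any HTTP client, for instance `urllib.request.urlopen(url)`, and the response parsed according to the requested format.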

UniProtKB additional protein bibliography information
UniProt strives to provide comprehensive literature citations on which UniProtKB protein annotations are based. Currently, ~218 000 PubMed citations are annotated in ~4.1 million UniProtKB sequences, and 66% of the citations are in UniProtKB/Swiss-Prot. Various other public databases, such as Entrez Gene and model organism databases (MODs, e.g. SGD and MGI), also provide curated literature information for their respective gene or protein entries. For genes commonly annotated in different databases, each database often provides unique literature annotations reflecting the bias or the different priorities of the databases. Therefore, it is of great benefit to the scientific community to integrate additional sources of curated literature into UniProtKB. We have now integrated literature annotations from five external curated gene or protein databases covering human, mouse, yeast and other organisms:

GeneRIF of Entrez Gene (www.ncbi.nlm.nih.gov/projects/GeneRif),

SGD (www.yeastgenome.org), MGI (www.informatics.jax.org),

GAD (geneticassociationdb.nih.gov) and PDB (www.rcsb.org/pdb/).

The five external sources contribute ~244 000 unique PubMed citations not annotated in UniProtKB, covering ~110 000 UniProtKB entries. The additional bibliography is directly linked from the protein entry view on the UniProt website. We will continue to identify more sources of bibliography information from other MODs and databases of protein functions to enhance the UniProtKB bibliography. The additional bibliography information will not only facilitate the curation of UniProtKB entries, but also help users explore the existing knowledge on their proteins of interest.

Format changes
UniProt format changes are made to improve data consistency and usability. We strongly urge our users to monitor our newsfeeds in order to benefit fully from these changes. Below are some of the major changes from recent months and those planned for the near future. Full details are available at www.uniprot.org/

Recent format changes

  1. The UniProtKB FASTA headers were unfortunately incompatible with the -o option of the NCBI's program formatdb. We have been working with the NCBI to remedy this and changes were required on both sides. The new version of formatdb now accepts a database code for UniProtKB/TrEMBL and we have modified our UniProtKB FASTA headers accordingly. For consistency reasons, we also changed the FASTA headers of the other UniProt databases.
  2. We have structured the UniProtKB DE lines. The new format includes three categories:
    • ‘RecName’ is the protein name recommended by the UniProt Consortium;
    • ‘AltName’ represents synonyms found in the literature or in other databases;
    • ‘SubName’ is the name provided by the submitters of the underlying nucleotide sequence. It is found in UniProtKB/TrEMBL only.

Three subcategories allow the fine-tuning of the nomenclature:
  • Abbreviations and acronyms are available in the ‘Short’ subcategory;
  • WHO INN (International Nonproprietary Names) are found in the ‘INN’ subcategory;
  • EC (Enzyme nomenclature) numbers are located in the ‘EC’ subcategory.

Each block of DE lines may also contain the sections: ‘Includes’ or ‘Contains’ and the field ‘Flags’, which indicate, for instance, whether the sequence shown is a fragment and/or a precursor.
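The structured DE lines described above lend themselves to simple parsing. The following is a minimal sketch, assuming a flat-file layout in which each category line has the form `DE   RecName: Full=...;` and subcategory lines such as `Short=...;` or `EC=...;` follow on their own DE lines; the `Full=` key and the exact spacing are assumptions not spelled out in the text above, and the sample protein is invented. ‘Includes’ and ‘Contains’ sections are not handled.

```python
import re
from typing import Dict, List

# A category keyword followed by the first key=value pair on the same line.
CATEGORY = re.compile(r"^(RecName|AltName|SubName):\s*(.*)$")
# A single key=value pair, with an optional trailing semicolon.
FIELD = re.compile(r"^(\w+)=(.*?);?$")


def parse_de_lines(lines: List[str]) -> Dict[str, list]:
    """Parse structured DE lines into a list of names plus a list of flags."""
    names: List[dict] = []
    flags: List[str] = []
    current = None
    for raw in lines:
        # Drop the two-letter line code, then surrounding whitespace.
        text = raw[2:].strip() if raw.startswith("DE") else raw.strip()
        if text.startswith("Flags:"):
            body = text[len("Flags:"):].rstrip(";")
            flags = [item.strip() for item in body.split(";") if item.strip()]
            continue
        category = CATEGORY.match(text)
        if category:
            current = {"category": category.group(1)}
            names.append(current)
            text = category.group(2)
        pair = FIELD.match(text)
        if pair and current is not None:
            # Subcategories (Short, INN, EC) attach to the most recent name.
            current.setdefault(pair.group(1), []).append(pair.group(2))
    return {"names": names, "flags": flags}


# Invented sample entry, for illustration only.
sample = [
    "DE   RecName: Full=Example dehydrogenase;",
    "DE            Short=ExD;",
    "DE            EC=1.1.1.1;",
    "DE   AltName: Full=Example synonym;",
    "DE   Flags: Precursor; Fragment;",
]
parsed = parse_de_lines(sample)
```

Keeping each name as its own dictionary preserves the link between a recommended or alternative name and its abbreviations, INN and EC numbers, which a flat list of strings would lose.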

Forthcoming format changes

(i) The CC line topic INTERACTION conveys information about binary protein-protein interactions. Currently, all interaction data are automatically derived from the IntAct database. In the future, we will start to add manually curated binary protein-protein interactions to this topic (these are currently described in the CC line topic SUBUNIT). In order to represent isoform- and chain-specific interactions (e.g. for viral polyproteins) and to add interactor-specific comments (e.g. PTMs and binding regions), we are going to modify the format of the INTERACTION lines. Each binary interaction will be represented by a block of three to four lines:
  • The first line of a block indicates the experimental evidence for the interaction and the data source (literature reference or ‘By similarity’ and/or cross-reference to the database from which the data was derived).
  • The next line is an optional comment about the interaction.
  • The last two lines give details on the interacting proteins: the Protein1 = line represents the currently displayed entry, the Protein2 = line the other interacting protein. If Protein2 is from a different species than Protein1, its species or taxonomic range is indicated.

Example:

CC -!- INTERACTION:
CC Interact = Yes (PubMed:11533489);
CC Comment = HDAC3 mediates the deacetylation of RELA;
CC Protein1 = RELA [Q04206];
CC Protein2 = HDAC3 [O15379].

(ii) We are going to introduce the new CC line topic DISRUPTION PHENOTYPE to describe the effects caused by the disruption of the gene coding for a protein. Note that we only describe effects caused by the complete absence of a gene and thus of a protein in vivo (null mutants caused by random or targeted deletions, insertions of a transposable element, etc.). To avoid description of phenotypes due to partial or dominant negative mutants, missense mutations will not be described in this topic, but in FT MUTAGEN instead. Not all defects caused by transient inactivation using methods such as RNA interference or blockage by antibodies will be described due to the difficulty of interpreting results.
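The proposed INTERACTION block layout from item (i) can be read with a small parser. The sketch below is based solely on the single example given in item (i), so the field names, the `=` separators and the `;`/`.` terminators are taken from that example and may not cover every case of the final format; the function name and the output structure are illustrative choices.

```python
import re
from typing import Dict, List

# One field per CC line: keyword, '=', value, optional ';' or '.' terminator.
FIELD = re.compile(r"^CC\s+(Interact|Comment|Protein1|Protein2)\s*=\s*(.*?)[;.]?$")
# A protein value: name followed by an accession in square brackets.
PROTEIN = re.compile(r"^(\S+)\s+\[([A-Z0-9-]+)\]")


def parse_interaction_blocks(lines: List[str]) -> List[Dict[str, object]]:
    """Turn INTERACTION lines into one dictionary per binary interaction."""
    blocks: List[Dict[str, object]] = []
    current = None
    for line in lines:
        match = FIELD.match(line.strip())
        if not match:
            continue                      # topic header or unrelated line
        key, value = match.groups()
        if key == "Interact":             # the evidence line opens a new block
            current = {"evidence": value, "comment": None}
            blocks.append(current)
        elif current is None:
            continue                      # field without an opening evidence line
        elif key == "Comment":            # optional second line
            current["comment"] = value
        else:                             # Protein1 / Protein2
            protein = PROTEIN.match(value)
            current[key.lower()] = (
                {"name": protein.group(1), "accession": protein.group(2)}
                if protein else {"raw": value}
            )
    return blocks


example_lines = [
    "CC -!- INTERACTION:",
    "CC Interact = Yes (PubMed:11533489);",
    "CC Comment = HDAC3 mediates the deacetylation of RELA;",
    "CC Protein1 = RELA [Q04206];",
    "CC Protein2 = HDAC3 [O15379].",
]
blocks = parse_interaction_blocks(example_lines)
```

Because the comment line is optional, the parser starts a new block on each evidence line rather than counting lines, which handles both the three-line and four-line variants.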

UniProtKB ANNOTATION
UniProtKB consists of two sections, Swiss-Prot and TrEMBL.

UniProtKB/Swiss-Prot contains manually annotated records with information extracted from literature and curator-evaluated computational analysis. Manual annotation consists of a critical review of experimentally proven or computer-predicted data about each protein, including the protein sequences. Data are continuously updated by an expert team of biologists.

The annotation activities of the UniProtKB/Swiss-Prot can be divided into two parts:

Model organism-oriented annotation
UniProtKB/Swiss-Prot provides annotated entries for many species, but concentrates on the annotation of entries from model organisms of distinct taxonomic groups to ensure the presence of high quality annotation for representative members of all protein families:

  • Human and other mammals (HPI);
  • Bacteria and Archaea (HAMAP);
  • Plants (PPAP);
  • Fungi (FPAP);
  • Viruses;
  • Toxins (Tox-Prot);
  • Drosophila, Xenopus, Zebrafish and C. elegans.

Transversal annotation
Transversal annotation focuses on issues common to all organisms, such as post-translational modifications (PTMs), structural information and protein–protein interactions. For more information, please see www.uniprot.org/help/projects.

First draft of the complete human proteome
A recent result of the annotation approach outlined above is the first draft of the complete human proteome in UniProtKB/Swiss-Prot. This manually annotated representation of all currently known human protein-coding genes was made available in UniProt release 14.1. At the time of release, it comprised 20 325 entries. More than a third of these contain additional sequences representing isoforms generated by alternative splicing, alternative promoter usage and/or alternative translation initiation, resulting in close to 34 000 human protein sequences. Approximately 46 000 single amino acid polymorphisms (SAPs), mostly disease-linked, are also described, as are 60 000 PTMs.

It is not the first time that UniProtKB/Swiss-Prot has provided a fully annotated proteome set for a model organism (e.g. Escherichia coli or Saccharomyces cerevisiae) and there are many more planned in the near and more distant future (Arabidopsis thaliana, Bacillus subtilis, Dictyostelium discoideum, mouse, rice, Staphylococcus aureus, Schizosaccharomyces pombe, etc.). However, there is unlikely to be anything as important as this proteome. For the first time, we can present to the life sciences community a clean set of what we believe to be a full (although still imperfect) representation of human proteins. It is the ultimate goal of the life sciences to fully understand Homo sapiens at the molecular level and we hope this set will significantly contribute to this. There are still many challenging tasks in front of us. We will create entries for newly discovered human proteins, review and update the existing set, increase the number of splice variants, explore the full range of PTMs and continue to build a comprehensive view of protein variation in the human population. The characterization at the molecular level will also need to be placed in its physiological context: subcellular location, tissue expression, protein-protein interaction, etc.


    DATABASE ACCESS AND FEEDBACK
 
UniProt is freely available for both commercial and non-commercial use. Please see www.uniprot.org/help/license for details. The UniProt databases can be accessed online (www.uniprot.org) or downloaded in several formats (ftp.uniprot.org/pub/databases). New releases are published every three weeks except for UniMES which is updated only when the underlying source data are updated. Statistics are available with each release at www.uniprot.org.

We are constantly trying to improve our database in terms of accuracy and representation and hence, consider your feedback extremely valuable. Please contact us if you have any questions (www.uniprot.org/contact) or updates (www.uniprot.org/help/submissions) or email us directly at help{at}uniprot.org. You can also subscribe to e-mail alerts (www.uniprot.org/help/alerts) for the latest information on UniProt databases.


    FUNDING
 
UniProt is mainly supported by the National Institutes of Health (NIH) grant (2U01HG02712-04). Additional support for the EBI's involvement in UniProt comes from the European Commission contract FELICS grant (021902RII3) and from the NIH grant (2P41HG02273-07). UniProtKB/Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science. Additional support comes from the European Commission contract FELICS (021902RII3) and from the PATRIC BRC (NIH/NIAID contract HHSN 266200400035C). PIR activities are also supported by the NIH grants and contracts on proteomics (HHSN266200400061C), protein ontology (1R01GM080646-01) and grid enablement (NCI-1435-04-04-CT-73980). Funding for open access charges: National Institutes of Health (NIH) grant 2 U01 HG02712-05.

Conflict of interest statement. None declared.


    APPENDIX
 
UniProt has been prepared by: Amos Bairoch, Lydie Bougueleret, Severine Altairac, Valeria Amendolia, Andrea Auchincloss, Ghislaine Argoud-Puy, Kristian Axelsen, Delphine Baratin, Marie-Claude Blatter, Brigitte Boeckmann, Jerven Bolleman, Laurent Bollondi, Emmanuel Boutet, Silvia Braconi Quintaje, Lionel Breuza, Alan Bridge, Edouard deCastro, Luciane Ciapina, Danielle Coral, Elisabeth Coudert, Isabelle Cusin, Gwennaelle Delbard, Dolnide Dornevil, Paula Duek Roggli, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Sebastian Gehant, Nathalie Farriol-Mathis, Serenella Ferro, Elisabeth Gasteiger, Alain Gateau, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, Janet James, Silvia Jimenez, Florence Jungo, Vivien Junker, Thomas Kappler, Guillaume Keller, Corinne Lachaize, Lydie Lane-Guermonprez, Petra Langendijk-Genevaux, Vicente Lara, Philippe Lemercier, Virginie Le Saux, Damien Lieberherr, Tania de Oliveira Lima, Veronique Mangold, Xavier Martin, Patrick Masson, Karine Michoud, Madelaine Moinat, Anne Morgat, Anais Mottaz, Salvo Paesano, Ivo Pedruzzi, Isabelle Phan, Sandrine Pilbout, Violaine Pillet, Sylvain Poux, Monica Pozzato, Nicole Redaschi, Sorogini Reynaud, Catherine Rivoire, Bernd Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue, Anne-Lise Veuthey, Lina Yip and Luiz Zuletta at the Swiss Institute of Bioinformatics (SIB) and the Bioinformatics and Structural Biology Department of the University of Geneva; Rolf Apweiler, Yasmin Alam-Faruque, Ricardo Antunes, Daniel Barrell, David Binns, Lawrence Bower, Paul Browne, Wei Mun Chan, Emily Dimmer, Ruth Eberhardt, Alexander Fedotov, Rebecca Foulger, John Garavelli, Renato Golin, Alan Horne, Rachael Huntley, Julius Jacobsen, Michael Kleen, Paul Kersey, Kati Laiho, Rasko Leinonen, Duncan Legge, Quan Lin, Michele Magrane, Maria Jesus Martin, Claire O'Donovan, Sandra Orchard, John O'Rourke, Samuel Patient, Manuela Pruess, Andrey Sitnov, Eleanor Stanley, Matt Corbett, Giuseppe di Martino, Mike Donnelly, Jie Luo and Pieter van Rensburg at the European Bioinformatics Institute (EBI); Cathy Wu, Cecilia Arighi, Leslie Arminski, Winona Barker, Yongxing Chen, Zhang-Zhi Hu, Hsing-Kuo Hua, Hongzhan Huang, Raja Mazumder, Peter McGarvey, Darren A. Natale, Anastasia Nikolskaya, Natalia Petrova, Baris E. Suzek, Sona Vasudevan, C. R. Vinayaka, Lai Su Yeh and Jian Zhang at the Protein Information Resource (PIR).


    REFERENCES
 

  1. Leinonen R, Diez FG, Binns D, Fleischmann W, Lopez R, Apweiler R. UniProt archive. Bioinformatics (2004) 20:3236–3237.

  2. Wieser D, Kretschmann E, Apweiler R. Filtering erroneous protein annotation. Bioinformatics (2004) 20:i342–i347.

  3. Gattiker A, Michoud K, Rivoire C, Auchincloss AH, Coudert E, Lima T, Kersey P, Pagni M, Sigrist CJ, Lachaize C, et al. Automated annotation of microbial proteomes in SWISS-PROT. Comput. Biol. Chem. (2003) 27:49–58.

  4. Fleischmann W, Moller S, Gateau A, Apweiler R. A novel method for automatic functional annotation of proteins. Bioinformatics (1999) 15:228–233.

  5. Wu CH, Nikolskaya A, Huang H, Yeh L.-S, Natale DA, Vinayaka CR, Hu ZZ, Mazumder R, Kumar S, Kourtesis P, et al. PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Res. (2004) 32:D112–D114.

  6. Natale DA, Vinayaka CR, Wu CH. Large-scale, classification-driven, rule-based functional annotation of proteins. In: Subramaniam S, ed. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics, Bioinformatics Volume (2004) West Sussex, England: John Wiley & Sons, Ltd.

  7. Cochrane G, Akhtar R, Aldebert P, Althorpe N, Baldwin A, Bates K, Bhattacharyya S, Bonfield J, Bower L, Browne P, et al. Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database. Nucleic Acids Res. (2008) 36:D5–D12.

  8. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. (2008) 36:D1009–D1014.

  9. Hong EL, Balakrishnan R, Dong Q, Christie KR, Park J, Binkley G, Costanzo MC, Dwight SS, Engel SR, Fisk DG, et al. Gene Ontology annotations at SGD: new data sources and annotation methods. Nucleic Acids Res. (2008) 36:D577–D581.

  10. Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, et al. Ensembl 2008. Nucleic Acids Res. (2008) 36:D707–D714.

  11. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics (2007) 23:1282–1288.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Y. Bulliard, P. Turelli, U. F. Rohrig, V. Zoete, B. Mangeat, O. Michielin, and D. Trono
Functional Analysis and Structural Modeling of Human APOBEC3G Reveal the Role of Evolutionarily Conserved Elements in the Inhibition of Human Immunodeficiency Virus Type 1 Infection and Alu Transposition
J. Virol., December 1, 2009; 83(23): 12611 - 12621.

R. Easty and N. Nikolov
Client-side integration of life science literature resources
Bioinformatics, December 1, 2009; 25(23): 3194 - 3196.

N. Terrapon, O. Gascuel, E. Marechal, and L. Brehelin
Detection of new protein domains using co-occurrence: application to Plasmodium falciparum
Bioinformatics, December 1, 2009; 25(23): 3077 - 3083.

Y. Yamazaki, R. Akashi, Y. Banno, T. Endo, H. Ezura, K. Fukami-Kobayashi, K. Inaba, T. Isa, K. Kamei, F. Kasai, et al.
NBRP databases: databases of biological resources in Japan
Nucleic Acids Res., November 24, 2009; (2009) gkp996v1.

R. Leplae, G. Lima-Mendez, and A. Toussaint
ACLAME: A CLAssification of Mobile genetic Elements, update 2010
Nucleic Acids Res., November 23, 2009; (2009) gkp938v1.

C. Yamasaki, K. Murakami, J.-i. Takeda, Y. Sato, A. Noda, R. Sakate, T. Habara, H. Nakaoka, F. Todokoro, A. Matsuya, et al.
H-InvDB in 2009: extended database and data mining resources for human genes and transcripts
Nucleic Acids Res., November 23, 2009; (2009) gkp1020v1.

A. Schlicker and M. Albrecht
FunSimMat update: new features for exploring functional similarity
Nucleic Acids Res., November 18, 2009; (2009) gkp979v1.

C. M. Gould, F. Diella, A. Via, P. Puntervoll, C. Gemund, S. Chabanis-Davidson, S. Michael, A. Sayadi, J. C. Bryne, C. Chica, et al.
ELM: the status of the 2010 eukaryotic linear motif resource
Nucleic Acids Res., November 17, 2009; (2009) gkp1016v1.

P. Smialowski, P. Pagel, P. Wong, B. Brauner, I. Dunger, G. Fobo, G. Frishman, C. Montrone, T. Rattei, D. Frishman, et al.
The Negatome database: a reference set of non-interacting protein pairs
Nucleic Acids Res., November 17, 2009; (2009) gkp1026v1.

D. Binns, E. Dimmer, R. Huntley, D. Barrell, C. O'Donovan, and R. Apweiler
QuickGO: a web-based tool for Gene Ontology searching
Bioinformatics, November 15, 2009; 25(22): 3045 - 3046.

A. Marsico, K. Scheubert, A. Tuukkanen, A. Henschel, C. Winter, R. Winnenburg, and M. Schroeder
MeMotif: a database of linear motifs in α-helical transmembrane proteins
Nucleic Acids Res., November 12, 2009; (2009) gkp1042v1.

M. Kapushesky, I. Emam, E. Holloway, P. Kurnosov, A. Zorin, J. Malone, G. Rustici, E. Williams, H. Parkinson, and A. Brazma
Gene Expression Atlas at the European Bioinformatics Institute
Nucleic Acids Res., November 11, 2009; (2009) gkp936v1.

B. Rhead, D. Karolchik, R. M. Kuhn, A. S. Hinrichs, A. S. Zweig, P. A. Fujita, M. Diekhans, K. E. Smith, K. R. Rosenbloom, B. J. Raney, et al.
The UCSC genome browser database: update 2010
Nucleic Acids Res., November 11, 2009; (2009) gkp939v1.

J. A. Vizcaino, R. Cote, F. Reisinger, H. Barsnes, J. M. Foster, J. Rameseder, H. Hermjakob, and L. Martens
The Proteomics Identifications database: 2010 update
Nucleic Acids Res., November 11, 2009; (2009) gkp964v1.

P. S. Dehal, M. P. Joachimiak, M. N. Price, J. T. Bates, J. K. Baumohl, D. Chivian, G. D. Friedland, K. H. Huang, K. Keller, P. S. Novichkov, et al.
MicrobesOnline: an integrated portal for comparative and functional genomics
Nucleic Acids Res., November 11, 2009; (2009) gkp919v1.

P. Flicek, B. L. Aken, B. Ballester, K. Beal, E. Bragin, S. Brent, Y. Chen, P. Clapham, G. Coates, S. Fairley, et al.
Ensembl's 10th year
Nucleic Acids Res., November 11, 2009; (2009) gkp972v1.

J. Lees, C. Yeats, O. Redfern, A. Clegg, and C. Orengo
Gene3D: merging structure and function for a Thousand genomes
Nucleic Acids Res., November 11, 2009; (2009) gkp987v1.

L. Klucar, M. Stano, and M. Hajduk
phiSITE: database of gene regulation in bacteriophages
Nucleic Acids Res., November 9, 2009; (2009) gkp911v1.

L. E. Ulrich and I. B. Zhulin
The MiST2 database: a comprehensive genomics resource on microbial signal transduction
Nucleic Acids Res., November 9, 2009; (2009) gkp940v1.

J. Muller, D. Szklarczyk, P. Julien, I. Letunic, A. Roth, M. Kuhn, S. Powell, C. von Mering, T. Doerks, L. J. Jensen, et al.
eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations
Nucleic Acids Res., November 9, 2009; (2009) gkp951v1.

P. J. Kersey, D. Lawson, E. Birney, P. S. Derwent, M. Haimel, J. Herrero, S. Keenan, A. Kerhornou, G. Koscielny, A. Kahari, et al.
Ensembl genomes: Extending ensembl across the taxonomic space
Nucleic Acids Res., November 1, 2009; (2009) gkp871v1.

C. Soderlund
Computational techniques for elucidating plant-pathogen interactions from large-scale experiments on fungi and oomycetes
Brief Bioinform, November 1, 2009; 10(6): 654 - 663.

M. Libault, T. Joshi, V. A. Benedito, D. Xu, M. K. Udvardi, and G. Stacey
Legume Transcription Factor Genes: What Makes Legumes So Special?
Plant Physiology, November 1, 2009; 151(3): 991 - 1001.

P. Vanhee, J. Reumers, F. Stricher, L. Baeten, L. Serrano, J. Schymkowitz, and F. Rousseau
PepX: a structural database of non-redundant protein-peptide complexes
Nucleic Acids Res., October 30, 2009; (2009) gkp893v1.

M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, and M. Hirakawa
KEGG for representation and analysis of molecular networks involving diseases and drugs
Nucleic Acids Res., October 30, 2009; (2009) gkp896v1.

S. Velankar, C. Best, B. Beuth, C. H. Boutselakis, N. Cobley, A. W. Sousa Da Silva, D. Dimitropoulos, A. Golovin, M. Hirshberg, M. John, et al.
PDBe: Protein Data Bank in Europe
Nucleic Acids Res., October 25, 2009; (2009) gkp916v1.

L. Y. Geer, A. Marchler-Bauer, R. C. Geer, L. Han, J. He, S. He, C. Liu, W. Shi, and S. H. Bryant
The NCBI BioSystems database
Nucleic Acids Res., October 23, 2009; (2009) gkp858v1.

B. Aranda, P. Achuthan, Y. Alam-Faruque, I. Armean, A. Bridge, C. Derow, M. Feuermann, A. T. Ghanbarian, S. Kerrien, J. Khadake, et al.
The IntAct molecular interaction database in 2010
Nucleic Acids Res., October 22, 2009; (2009) gkp878v1.

R. J. Roberts, T. Vincze, J. Posfai, and D. Macelis
REBASE--a database for DNA restriction and modification: enzymes, genes and genomes
Nucleic Acids Res., October 21, 2009; (2009) gkp874v1.

P. Gaudet, L. Lane, P. Fey, A. Bridge, S. Poux, A. Auchincloss, K. Axelsen, S. Braconi Quintaje, E. Boutet, P. Brown, et al.
Collaborative annotation of genes and proteins between UniProtKB/Swiss-Prot and dictyBase
Database, October 15, 2009; 2009(0): bap016 - bap016.

M. Frenkel-Morgenstern, A. A. Cohen, N. Geva-Zatorsky, E. Eden, J. Prilusky, I. Issaeva, A. Sigal, C. Cohen-Saidon, Y. Liron, L. Cohen, et al.
Dynamic Proteomics: a database for dynamics and localizations of endogenous fluorescently-tagged proteins in living human cells
Nucleic Acids Res., October 9, 2009; (2009) gkp808v1.

J. C McCann and B. N Ames
Vitamin K, an example of triage theory: is micronutrient inadequacy linked to diseases of aging?
Am. J. Clinical Nutrition, October 1, 2009; 90(4): 889 - 907.

N. Jimenez-Lozano, J. Segura, J. R. Macias, J. Vega, and J. M. Carazo
aGEM: an integrative system for analyzing spatial-temporal gene-expression information
Bioinformatics, October 1, 2009; 25(19): 2566 - 2572.

J. Ren, Z. Liu, X. Gao, C. Jin, M. Ye, H. Zou, L. Wen, Z. Zhang, Y. Xue, and X. Yao
MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore
Nucleic Acids Res., September 25, 2009; (2009) gkp784v1.

L. Richardson, S. Venkataraman, P. Stevenson, Y. Yang, N. Burton, J. Rao, M. Fisher, R. A. Baldock, D. R. Davidson, and J. H. Christiansen
EMAGE mouse embryo spatial gene expression database: 2010 update
Nucleic Acids Res., September 18, 2009; (2009) gkp763v1.

L. A. Florez, S. F. Roppel, A. G. Schmeisky, C. R. Lammers, and J. Stulke
A community-curated consensual annotation that is continuously updated: the Bacillus subtilis centred wiki SubtiWiki
Database, September 17, 2009; 2009(0): bap012 - bap012.

J. W. Huss III, P. Lindenbaum, M. Martone, D. Roberts, A. Pizarro, F. Valafar, J. B. Hogenesch, and A. I. Su
The Gene Wiki: community intelligence applied to human gene annotation
Nucleic Acids Res., September 15, 2009; (2009) gkp760v1.

W. Xiong, T. Li, K. Chen, and K. Tang
Local combinational variables: an approach used in DNA-binding helix-turn-helix motif prediction with sequence information
Nucleic Acids Res., September 1, 2009; 37(17): 5632 - 5640.

C. Faye, E. Chautard, B. R. Olsen, and S. Ricard-Blum
The First Draft of the Endostatin Interaction Network
J. Biol. Chem., August 14, 2009; 284(33): 22041 - 22047.

H. Mcwilliam, F. Valentin, M. Goujon, W. Li, M. Narayanasamy, J. Martin, T. Miyar, and R. Lopez
Web services at the European Bioinformatics Institute-2009
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W6 - W10.

T. Imanishi and H. Nakaoka
Hyperlink Management System and ID Converter System: enabling maintenance-free hyperlinks among major biological databases
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W17 - W22.

A. Gattiker, C. Dessimoz, A. Schneider, I. Xenarios, M. Pagni, and J. Rougemont
The Microbe browser for comparative genomics
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W296 - W299.

L. Yang, H. Luo, J. Chen, Q. Xing, and L. He
SePreSA: a server for the prediction of populations susceptible to serious adverse drug reactions implementing the methodology of a chemical-protein interactome
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W406 - W412.

D. T.-H. Chang, T.-Y. Chien, and C.-Y. Chen
seeMotif: exploring and visualizing sequence motifs in 3D structures
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W552 - W558.

H. Blankenburg, F. Ramirez, J. Buch, and M. Albrecht
DASMIweb: online integration, analysis and assessment of distributed protein interaction data
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W122 - W128.

Y. Makita, N. Kobayashi, Y. Mochizuki, Y. Yoshida, S. Asano, N. Heida, M. Deshpande, R. Bhatia, A. Matsushima, M. Ishii, et al.
PosMed-plus: An Intelligent Search Engine that Inferentially Integrates Cross-Species Information Resources for Molecular Breeding of Plants
Plant Cell Physiol., July 1, 2009; 50(7): 1249 - 1259.

