Nucleic Acids Research Advance Access originally published online on November 27, 2006
Nucleic Acids Research 2007 35(Database issue):D61-D65; doi:10.1093/nar/gkl842
Published by Oxford University Press 2006
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rm 6An.12J, 45 Center Drive, Bethesda, MD 20892-6510, USA
*To whom correspondence should be addressed. Tel: +1 301 435 5898; Fax: +1 301 480 2918; Email: pruitt{at}ncbi.nlm.nih.gov
Received September 20, 2006. Revised October 6, 2006. Accepted October 6, 2006.
ABSTRACT
NCBI's reference sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. The database includes 3774 organisms spanning prokaryotes, eukaryotes and viruses, and has records for 2 879 860 proteins (RefSeq release 19). RefSeq records integrate information from multiple sources when additional data are available from those sources, and therefore represent a current description of the sequence and its features. Annotations include coding regions, conserved domains, tRNAs, sequence tagged sites (STS), variation, references, gene and protein product names, and database cross-references. Sequences are reviewed and features are added using a combined approach of collaboration and other input from the scientific community, prediction, propagation from GenBank and curation by NCBI staff. The format of all RefSeq records is validated, and an increasing number of tests are being applied to evaluate the quality of sequence and annotation, especially in the context of complete genomic sequence.
INTRODUCTION
RefSeq is a public database of nucleotide and protein sequences with feature and bibliographic annotation. The RefSeq database is built and distributed by the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine located at the US National Institutes of Health. NCBI makes RefSeq publicly available, at no cost, over the internet through the Entrez query (1) and Basic Local Alignment Search Tool (BLAST) (2,3) programs, and through its incorporation into a wide range of NCBI resources. The RefSeq collection is also available for FTP download as bi-monthly (every two months) comprehensive releases, incremental daily updates and updates of other frequencies for select species.
NCBI builds RefSeq from the sequence data available in the archival database GenBank (4), which is a comprehensive public repository of sequences submitted to, and exchanged among, GenBank in the United States, the EMBL Data Library in the United Kingdom, and the DNA Data Bank of Japan. RefSeq records are not part of GenBank, although they can be retrieved from NCBI via the same interfaces such as Entrez Nucleotide and Entrez Protein. GenBank represents submissions from multiple groups, and thus includes sequence information generated at different times, with alternate technologies, and with diverse names and other information. In contrast, a RefSeq record is a synthesis of information and may include feature annotation, names or other data not available in the GenBank records from which it was derived. Also, RefSeq records can be updated more frequently by collaborating groups and by NCBI staff.
The RefSeq collection is unique in providing a curated, non-redundant, explicitly linked nucleotide and protein database representing significant taxonomic diversity. The collection is non-redundant in the sense that the goal is to represent distinct biological molecules that are observed for an organism, strain or haplotype. However, the molecules may themselves appear more than once in the collection if alternatively spliced transcripts encode the same protein product, if there are multiple genomic locations in one species or among species that encode the same products, or if RefSeqs are generated to represent alternate haplotypes and some mRNA and protein sequences are the same in all of them. RefSeq provides linked genomic and protein sequence records for the majority of organisms in the database; transcript records are currently limited to a subset of the eukaryotic collection. The RefSeq database provides a critical foundation for integrating sequence, genetic and functional information, and is used internationally as a standard for genome annotation. RefSeq records are accessible in several NCBI resources, including Entrez Nucleotide, Protein, Gene, Map Viewer and BLAST. RefSeq records can be identified by a distinct accession format, which includes an underscore (_). A full definition of the RefSeq accession space is available on the RefSeq website (http://www.ncbi.nlm.nih.gov/RefSeq/key.html#accessions).
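Because the underscore makes RefSeq accessions easy to recognize, a few lines of code can separate them from GenBank accessions. The sketch below is illustrative only: the prefix table is a partial list based on our reading of the RefSeq accession documentation (the accessions page cited above is authoritative), and `parse_refseq_accession` is a hypothetical helper, not an NCBI API.

```python
import re
from typing import Dict, Optional

# Assumption: a partial list of RefSeq accession prefixes; the accessions page
# on the RefSeq website is the authoritative and current source.
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NG": "genomic region",
    "NT": "genomic contig or scaffold",
    "NW": "genomic contig or scaffold",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted mRNA",
    "XR": "predicted non-coding RNA",
    "XP": "predicted protein",
}

# Two uppercase letters, an underscore, digits, then an optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def parse_refseq_accession(acc: str) -> Optional[Dict[str, object]]:
    """Return prefix, molecule type and version, or None for non-RefSeq accessions."""
    match = ACCESSION_RE.match(acc.strip())
    if match is None:
        return None  # no "XX_" prefix, e.g. a GenBank accession
    prefix, _number, version = match.groups()
    return {
        "prefix": prefix,
        "molecule": REFSEQ_PREFIXES.get(prefix, "other RefSeq prefix"),
        "version": int(version) if version else None,
    }


print(parse_refseq_accession("NM_000546.5"))  # illustrative accession string
print(parse_refseq_accession("U14680"))       # GenBank-style accession -> None
```

Prefixes missing from the table are still reported as RefSeq (the underscore is the defining feature), just without a molecule label.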
SCOPE
The RefSeq collection includes complete or incomplete genome sequences, transcripts and proteins. Genomic sequence records are added when whole genome sequences are submitted to GenBank, and are updated as those genome sequencing projects submit updates. Genomic sequences include nuclear chromosomes, organelles, bacterial and viral genomes, and naturally occurring plasmids. For many organisms, RefSeq transcripts and proteins reflect the annotation on the GenBank submissions; however, if a whole genome sequencing project is submitted to GenBank without annotation, NCBI may compute the annotation and provide annotated proteins in the RefSeq representation. For some eukaryotic species, RefSeq transcript and protein records are provided independently of the genome sequence and are used as a reagent to annotate the genome sequence when it becomes available. For some eukaryotic species, additional genomic sequence records are provided to represent non-transcribed pseudogenes, alternate haplotypes, gene clusters or gene-specific regions.
GROWTH
The size of the comprehensive bi-monthly RefSeq release continues to grow, keeping pace with large-scale genome and cDNA sequencing projects (see Table 1 and Supplementary Table 1). As of July 2006, the release included records from 3695 species and represented 2 762 164 protein records, the majority from bacterial genomes (1 990 849 proteins), followed by mammalian species (251 785). As new genome assemblies become available for organelles, chromosomes or complete genomes, they are incorporated into the RefSeq collection. Most organisms are represented in the collection only after some genomic sequence data (nuclear, plastid, mitochondrial or other genomic molecules) become available; however, transcript and protein records may be provided for a subset of eukaryotic organisms before genomic sequence data are available. From July 2005 (RefSeq release 12) to July 2006 (RefSeq release 18), the number of species included in the RefSeq release increased by 24%, and the total number of records increased by 46%, with the largest increase (62%) occurring in the protein collection.
ACCESS
The RefSeq collection can be accessed in multiple ways at NCBI, including by Entrez query, BLAST, FTP and links provided from NCBI databases and resources (see Supplementary Table 3). For some services, such as BLAST and queries against the Entrez nucleotide and protein databases, result sets can be restricted to RefSeq records using Limits, Filters, Tabs or additional query restrictions. A subset of the available access methods is described here.
Entrez queries and links
RefSeq records are included in the results returned by queries against the Entrez nucleotide or protein databases, and the relatively new tab-oriented results page facilitates access to the RefSeq subset (Figure 1). The display of tabs and links can be customized by logging into My NCBI. RefSeq records are extensively cross-linked with other resources. Entrez nucleotide and protein query results include numerous links, both to sets of related sequences that may include RefSeq records and to several additional databases and display pages (5). Further links may be available from the RefSeq feature annotations as dbXrefs, including links to the Consensus CDS (CCDS) project (human, mouse) and to model organism databases such as FlyBase, MGD, WormBase or TAIR (6–9). Entrez queries can also be formatted to retrieve only RefSeq records, or to retrieve a subset of interest such as records that have been curated by a collaborating group or by NCBI staff. For example, a query to retrieve all RefSeq nucleotide records that are annotated with a status of REVIEWED and include the name BRCA1 somewhere in the record is formatted as BRCA1 AND srcdb_refseq_reviewed[prop]. The RefSeq website provides definitions of the available property restrictions (http://www.ncbi.nlm.nih.gov/RefSeq/key.html#query).
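The same property-restricted query can be issued programmatically. The sketch below assumes NCBI's E-utilities `esearch.fcgi` service, which accepts Entrez query syntax but is not described in this article; it only builds the request URL, and the helper names are hypothetical. Property names such as `srcdb_refseq_reviewed` may have changed since publication, so check the current RefSeq documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities esearch endpoint (not described in this article).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def refseq_query(term: str, prop: str = "srcdb_refseq_reviewed") -> str:
    """Restrict a free-text Entrez term with a RefSeq property, e.g. the REVIEWED subset."""
    return f"{term} AND {prop}[prop]"


def esearch_url(db: str, query: str, retmax: int = 20) -> str:
    """Build (but do not send) an esearch request for the given Entrez database."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


url = esearch_url("nucleotide", refseq_query("BRCA1"))
print(url)
```

Fetching `url` (for example with `urllib.request.urlopen`) should return an XML list of matching record identifiers; NCBI's E-utilities usage guidelines, including request-rate limits, apply.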
Entrez queries from the Entrez home page, where it is possible to query all of the Entrez databases at once, also return results from other databases, including Gene (10) and Genomes (11), which are both components of the RefSeq project. Entrez Gene integrates gene-specific annotation from RefSeq records with other sources of information, and thus provides a gene-oriented view of the genes annotated on RefSeqs. When there is sequence for a complete genome or chromosome, the data are also included in the Entrez Genome database, which provides multiple tools to display and analyze the information.
FTP
The complete RefSeq collection is made available by anonymous FTP as bi-monthly releases, together with daily and cumulative updates between release cycles. The RefSeq release is structured to provide access to the full collection or to portions of it organized by major taxonomic category or by molecule type (e.g. mitochondrion), to facilitate downloading subsets of interest. As a result, the release itself is redundant, because records can appear in more than one category; for example, a sequence may be included in the complete directory and also in a taxonomic category such as the plant directory, and may also occur in an organelle-oriented grouping. Extensive documentation describes the release contents, including reports of the files and sequences (accessions) included per category, sequences removed since the previous release, species (NCBI taxonomy identifiers) added since the previous release, and a full description of the release structure and content. Announcements about major changes, problems and the availability of each RefSeq release are emailed to the refseq-announce mailing list (see Supplementary Table 2). Additional FTP data are provided for some organisms of interest, including the transcript and protein datasets for human and mouse.
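Because a record can be listed in several directories, anyone combining downloaded subsets needs to deduplicate before counting or comparing. A minimal sketch, assuming a hypothetical in-memory catalog that maps each directory name to the accessions it contains (this is not the format of NCBI's actual release catalog files), shows deduplication and an added/removed comparison between two releases:

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical catalog shape: directory name -> accessions listed in it.
Catalog = Dict[str, Iterable[str]]


def unique_accessions(catalog: Catalog) -> Set[str]:
    """Collapse per-directory listings; a record listed under 'complete' and
    under a taxonomic directory is counted once."""
    return {acc for accessions in catalog.values() for acc in accessions}


def release_diff(previous: Catalog, current: Catalog) -> Tuple[List[str], List[str]]:
    """Return (added, removed) accessions between two releases, sorted."""
    prev, curr = unique_accessions(previous), unique_accessions(current)
    return sorted(curr - prev), sorted(prev - curr)


# Toy accession strings, for illustration only.
previous = {"complete": ["NC_000001.1", "NC_000002.1"], "plant": ["NC_000002.1"]}
current = {"complete": ["NC_000002.1", "NC_000003.1"], "plant": ["NC_000002.1", "NC_000003.1"]}
added, removed = release_diff(previous, current)
print(added, removed)
```

Comparing on `accession.version` makes a version update appear as one removal plus one addition; stripping the version suffix first would track records rather than sequence versions.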
COLLABORATIONS
The RefSeq project is supported by numerous collaborations that provide a variety of information including the definitions of the reference sequence standards, feature annotation and standard names. These collaborations also support the Entrez Gene database and are described in more detail in the NCBI handbook chapters for Gene, RefSeq and the Consensus CDS (CCDS) project.
Model organism databases
For some species, the RefSeq collection is curated entirely by a collaborating authoritative group that provides both the sequences and annotation. Thus RefSeq records may contain information provided by an external authoritative source and/or analyses and curation at NCBI. The collaborating group is identified on RefSeq records.
Nomenclature
Collaborations are established with official nomenclature groups, when such authorities exist for an organism, so that official names can be used on annotated genes. If there is no official nomenclature group, an effort is made to work with the research community to establish a policy for naming genes and protein products.
Consensus CDS
Annotation of genes on the human and mouse genomes is provided by multiple public resources, using different methods and resulting in information that is similar but not always identical. The human and mouse genome sequences are now sufficiently stable to start identifying those gene placements that are identical, and to make the results of those analyses public and supported as a core set by the three major public human genome browsers. The CCDS project is a collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality. Consistently annotated CDS regions are assigned a stable identifier and version number (e.g. CCDS1.1), which is cited on the RefSeq sequence records as a dbXref and reported in the CCDS website, Map Viewer and Entrez Gene displays (see Supplementary Table 2). The long-term goal is to support convergence toward a standard set of gene annotations on the human and mouse genomes. The CCDS set is built by consensus among the collaborating members, which include (i) the European Bioinformatics Institute (EBI); (ii) the National Center for Biotechnology Information (NCBI); (iii) the Wellcome Trust Sanger Institute (WTSI); and (iv) the University of California, Santa Cruz (UCSC).
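The core idea of "consistently annotated" can be made concrete: a CDS is shared when every source places it on the same chromosome and strand with identical coding-exon coordinates. The sketch below shows only that comparison; the tuple representation and the function name are assumptions for illustration, and the article does not describe the project's actual matching procedure, quality review or identifier assignment.

```python
from typing import Dict, Set, Tuple

# Hypothetical CDS model: chromosome, strand, and the ordered (start, end)
# genomic coordinates of its coding exons.
CdsModel = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def consistently_annotated(sources: Dict[str, Set[CdsModel]]) -> Set[CdsModel]:
    """Return CDS models whose coding-exon structure is identical in every source."""
    model_sets = list(sources.values())
    if not model_sets:
        return set()
    return set.intersection(*model_sets)


shared = ("chr1", "+", ((1000, 1200), (1500, 1650)))
variant = ("chr1", "+", ((1000, 1200), (1500, 1680)))  # differs in one exon end
sources = {
    "source_a": {shared, variant},
    "source_b": {shared},
    "source_c": {shared, variant},
}
print(consistently_annotated(sources))
```

Exact tuple equality is deliberately strict: a single-base difference in one exon boundary excludes the model, which mirrors the intent of identifying placements that are identical rather than merely similar.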
QUALITY TESTING AND CURATION
All RefSeq sequences are validated to confirm accurate nucleotide-to-protein sequence correspondence and valid ASN.1 format. Additional validation or quality testing is carried out for different subsets of the collection.
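The nucleotide-to-protein correspondence check can be sketched with the standard genetic code. This is not NCBI's validator: the sketch assumes translation table 1, uses hypothetical helper names, and ignores exactly the special cases that curation has to handle, such as non-AUG initiation, selenocysteine recoding at UGA, and alternative genetic codes (e.g. mitochondrial).

```python
# Standard genetic code (NCBI translation table 1); codons are ordered with
# bases T, C, A, G at each position, so index = 16*first + 4*second + third.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons; ambiguous codons become 'X', stops become '*'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def cds_matches_protein(cds: str, protein: str) -> bool:
    """True if the CDS translation (terminal stop removed) equals the protein."""
    peptide = translate(cds)
    if peptide.endswith("*"):
        peptide = peptide[:-1]
    return peptide == protein


print(cds_matches_protein("ATGGCCTAA", "MA"))
```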
NCBI staff review and manually modify a subset of the RefSeq collection (Table 2). The goal of NCBI's manual curation is to provide accurate and full-length sequence data, to ensure accurate sequence-to-gene associations, to expand the collection by adding previously unrepresented genes and/or alternate splice products, and to provide additional feature annotation to represent mature peptide products, regions of interest, and/or to highlight less frequent biological events such as non-AUG initiation sites (12) or selenoproteins (13). The curation status is annotated on RefSeq records in the COMMENT section; the status terms used include model, predicted, provisional, inferred, validated and reviewed, with the latter two indicating that sequence-level curation has taken place. Curation status terms are documented on the RefSeq website (http://www.ncbi.nlm.nih.gov/RefSeq/key.html#status).
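Since the status term appears in record text, it can be extracted when processing downloaded records. The sketch assumes the status is written as "<TERM> REFSEQ:" within the COMMENT text (for example "REVIEWED REFSEQ: ..."); treat that pattern as an assumption to check against real records. The function names are hypothetical.

```python
import re
from typing import Optional

STATUS_TERMS = ("MODEL", "PREDICTED", "PROVISIONAL", "INFERRED", "VALIDATED", "REVIEWED")
# Per the text above, these two statuses indicate sequence-level curation.
SEQUENCE_CURATED = {"VALIDATED", "REVIEWED"}

# Assumption: the status is written as "<TERM> REFSEQ:" in the COMMENT text.
STATUS_RE = re.compile(r"\b(" + "|".join(STATUS_TERMS) + r") REFSEQ:")


def refseq_status(comment: str) -> Optional[str]:
    """Return the curation status term found in a COMMENT, or None."""
    match = STATUS_RE.search(comment)
    return match.group(1) if match else None


def is_sequence_curated(comment: str) -> bool:
    """True for VALIDATED or REVIEWED records."""
    return refseq_status(comment) in SEQUENCE_CURATED


print(refseq_status("REVIEWED REFSEQ: This record has been curated by NCBI staff."))
```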
With high-quality genomic sequence available for the human and mouse genomes, review of cDNA-based RefSeqs relative to the genome has been a primary focus. The CCDS collaboration has also helped focus attention on areas where representations of mRNA and protein sequences differ. Many tests have been added to identify possible annotation problems and thus target review to the areas of most concern. QA tests include the following:
- Short CDS (length < 100 amino acids).
- Invalid start or stop codon.
- Transcript has a stop codon in CDS.
- Annotated CDS may be partial (in-frame upstream start site).
- Sequence is low complexity.
- Protein sequence has no similarity to other protein records.
- Non-consensus splice sites.
- Has a very short (<5 bp) or long (>7 kb) exon, or very short (<25 bp) intron.
- Single exon gene.
- Gene has a spliced 5'-UTR and CDS is located in the terminal exon.
- Indel: transcript has insertions or deletions versus the reference genome sequence.
- Mismatches: transcript has one or more mismatches versus the reference genome sequence.
- Transcript does not align completely to the reference genome.
- Nonsense-mediated decay (NMD) candidate (distance from stop codon to 3'-most intron following stop >55 nt).
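Several of the tests in the list above reduce to simple arithmetic on a transcript model. The sketch below implements a subset (short CDS, start/stop codons, internal stop, exon and intron length, single exon, NMD candidate) using the thresholds stated in the list. `TranscriptModel` is a hypothetical structure, and measuring the NMD distance from the end of the stop codon is our assumption; measuring from its first base would shift the distance by 3 nt.

```python
from dataclasses import dataclass, field
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class TranscriptModel:
    """Hypothetical minimal transcript model used only for this sketch."""
    seq: str                    # transcript sequence, DNA alphabet
    cds_start: int              # 0-based CDS start in transcript coordinates
    cds_end: int                # 0-based exclusive CDS end, stop codon included
    exon_lengths: List[int]     # exon lengths, 5' to 3'
    intron_lengths: List[int] = field(default_factory=list)  # genomic intron lengths


def qa_flags(t: TranscriptModel) -> List[str]:
    """Return the names of the QA tests this transcript fails."""
    flags: List[str] = []
    cds = t.seq[t.cds_start:t.cds_end].upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    has_stop = bool(codons) and codons[-1] in STOP_CODONS
    protein_length = len(codons) - 1 if has_stop else len(codons)

    if protein_length < 100:
        flags.append("short CDS")
    if not codons or codons[0] != "ATG" or not has_stop:
        flags.append("invalid start or stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]):
        flags.append("stop codon in CDS")
    if any(n < 5 or n > 7000 for n in t.exon_lengths):
        flags.append("very short or long exon")
    if any(n < 25 for n in t.intron_lengths):
        flags.append("very short intron")
    if len(t.exon_lengths) == 1:
        flags.append("single exon gene")
    elif len(t.exon_lengths) > 1:
        # Transcript position of the 3'-most exon-exon junction.
        last_junction = sum(t.exon_lengths[:-1])
        if last_junction - t.cds_end > 55:
            flags.append("NMD candidate")
    return flags


cds = "ATG" + "GCC" * 100 + "TAA"
print(qa_flags(TranscriptModel(cds + "A" * 200, 0, 306, [400, 106], [500])))
```

As the text notes, a flag marks a record for review; it does not by itself mean the record is wrong (a non-AUG start, for instance, is flagged here even when it is genuine).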
Several of the tests were initially implemented to support the CCDS project; the scope has been expanded to include all human and mouse records. Many of the tests are designed to identify potential problems and a test failure does not necessarily indicate a real error. For example, records that do not meet minimum protein length thresholds have a higher probability of being invalid, but some very short proteins are known to exist.
Records that fail quality tests are prioritized for curation, with the highest priority given to reviewing records with potential problems in the CDS. The curation workflow stores database attributes that record the outcome for each quality-test category: the record was reviewed and the RefSeq updated; no problem was found with the RefSeq transcript and protein record, so the reported error should be ignored; or the problem is due to the genome assembly at that location. Assembly problems can include known gaps in the assembly, and in some cases the assembled genome sequence represents a known mutation or rare polymorphism that is not the ideal sequence to represent in the transcript and protein records.
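The prioritization and outcome tracking described above can be sketched as a sort plus a small log. Everything here is illustrative: which flag names count as CDS-related, the tie-breaker (more failed tests first, then accession) and the accession strings are assumptions, while the three outcomes follow the categories listed in the text.

```python
from enum import Enum
from typing import Dict, List, Set

# Assumption: QA flag names treated as CDS-related problems.
CDS_FLAGS = {"short CDS", "invalid start or stop codon", "stop codon in CDS", "partial CDS"}


class ReviewOutcome(Enum):
    """The three review outcomes described in the text."""
    REFSEQ_UPDATED = "reviewed and RefSeq updated"
    NO_PROBLEM = "no problem found; ignore reported error"
    ASSEMBLY_PROBLEM = "problem due to genome assembly"


def curation_queue(failures: Dict[str, Set[str]]) -> List[str]:
    """Order records that failed any test: CDS-related failures first, then
    (assumed tie-breaker) more failed tests, then accession."""
    def priority(acc: str):
        flags = failures[acc]
        return (0 if flags & CDS_FLAGS else 1, -len(flags), acc)
    return sorted((acc for acc, flags in failures.items() if flags), key=priority)


def record_review(log: Dict[str, Dict[str, ReviewOutcome]], acc: str,
                  test: str, outcome: ReviewOutcome) -> None:
    """Store the outcome per record and per quality-test category."""
    log.setdefault(acc, {})[test] = outcome


failures = {  # placeholder accessions
    "NM_A": {"single exon gene"},
    "NM_B": {"stop codon in CDS"},
    "NM_C": set(),
    "NM_D": {"single exon gene", "very short intron"},
}
print(curation_queue(failures))
```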
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
This work was supported by the Intramural Research Program of the NIH, National Library of Medicine. Funding to pay the Open Access publication charges for this article was provided by the Intramural Research Program of the NIH, National Library of Medicine.
Conflict of interest statement. None declared.
REFERENCES
1. Schuler, G.D., Epstein, J.A., Ohkawa, H., Kans, J.A. (1996) Entrez: molecular biology database and retrieval system. Methods Enzymol., 266, 141–162.
2. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
3. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
4. Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Wheeler, D.L. (2007) GenBank. Nucleic Acids Res., in press.
5. Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Edgar, R., Federhen, S., et al. (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., in press.
6. Drysdale, R.A., Crosby, M.A. and the FlyBase Consortium (2005) FlyBase: genes and gene models. Nucleic Acids Res., 33, D390–D395.
7. Blake, J.A., Eppig, J.T., Bult, C.J., Kadin, J.A., Richardson, J.E. and the Mouse Genome Database Group (2006) The Mouse Genome Database (MGD): updates and enhancements. Nucleic Acids Res., 34, D562–D567.
8. Schwarz, E.M., Antoshechkin, I., Bastiani, C., Bieri, T., Blasiar, D., Canaran, P., Chan, J., Chen, N., Chen, W.J., Davis, P., et al. (2006) WormBase: better software, richer content. Nucleic Acids Res., 34, D475–D478.
9. Rhee, S.Y., Beavis, W., Berardini, T.Z., Chen, G., Dixon, D., Doyle, A., Garcia-Hernandez, M., Huala, E., Lander, G., Montoya, M., et al. (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res., 31, 224–228.
10. Maglott, D., Ostell, J., Pruitt, K.D., Tatusova, T. (2007) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., in press.
11. Tatusova, T.A., Karsch-Mizrachi, I., Ostell, J.A. (1999) Complete genomes in WWW Entrez: data representation and analysis. Bioinformatics, 15, 536–543.
12. Touriol, C., Bornes, S., Bonnal, S., Audigier, S., Prats, H., Prats, A.C., Vagner, S. (2003) Generation of protein isoform diversity by alternative initiation of translation at non-AUG codons. Biol. Cell, 95, 169–178.
13. Copeland, P.R. (2003) Regulation of gene expression by stop codon recoding: selenocysteine. Gene, 312, 17–25.
This article has been cited by other articles:
![]() |
D. Vallenet, S. Engelen, D. Mornico, S. Cruveiller, L. Fleury, A. Lajus, Z. Rouy, D. Roche, G. Salvignol, C. Scarpelli, et al. MicroScope: a platform for microbial genome annotation and comparative genomics Database, November 25, 2009; 2009(0): bap021 - bap021. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Yamasaki, K. Murakami, J.-i. Takeda, Y. Sato, A. Noda, R. Sakate, T. Habara, H. Nakaoka, F. Todokoro, A. Matsuya, et al. H-InvDB in 2009: extended database and data mining resources for human genes and transcripts Nucleic Acids Res., November 23, 2009; (2009) gkp1020v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. L. Barbosa-Morais, M. J. Dunning, S. A. Samarajiwa, J. F. J. Darot, M. E. Ritchie, A. G. Lynch, and S. Tavare A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data Nucleic Acids Res., November 18, 2009; (2009) gkp942v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. Rhead, D. Karolchik, R. M. Kuhn, A. S. Hinrichs, A. S. Zweig, P. A. Fujita, M. Diekhans, K. E. Smith, K. R. Rosenbloom, B. J. Raney, et al. The UCSC genome browser database: update 2010 Nucleic Acids Res., November 11, 2009; (2009) gkp939v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Rattei, P. Tischler, S. Gotz, M.-A. Jehl, J. Hoser, R. Arnold, A. Conesa, and H.-W. Mewes SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters Nucleic Acids Res., November 11, 2009; (2009) gkp949v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. S. Dehal, M. P. Joachimiak, M. N. Price, J. T. Bates, J. K. Baumohl, D. Chivian, G. D. Friedland, K. H. Huang, K. Keller, P. S. Novichkov, et al. MicrobesOnline: an integrated portal for comparative and functional genomics Nucleic Acids Res., November 11, 2009; (2009) gkp919v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. S. Syed, M. D'Antonio, and F. D. Ciccarelli Network of Cancer Genes: a web resource to analyze duplicability, orthology and network properties of cancer genes Nucleic Acids Res., November 11, 2009; (2009) gkp957v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Lees, C. Yeats, O. Redfern, A. Clegg, and C. Orengo Gene3D: merging structure and function for a Thousand genomes Nucleic Acids Res., November 11, 2009; (2009) gkp987v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Muller, D. Szklarczyk, P. Julien, I. Letunic, A. Roth, M. Kuhn, S. Powell, C. von Mering, T. Doerks, L. J. Jensen, et al. eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations Nucleic Acids Res., November 9, 2009; (2009) gkp951v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Ceol, A. Chatr Aryamontri, L. Licata, D. Peluso, L. Briganti, L. Perfetto, L. Castagnoli, and G. Cesareni MINT, the molecular interaction database: 2009 update Nucleic Acids Res., November 6, 2009; (2009) gkp983v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. Ostlund, T. Schmitt, K. Forslund, T. Kostler, D. N. Messina, S. Roopra, O. Frings, and E. L. L. Sonnhammer InParanoid 7: new algorithms and tools for eukaryotic orthology analysis Nucleic Acids Res., November 5, 2009; (2009) gkp931v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Zhang, J. Lv, H. Liu, J. Zhu, J. Su, Q. Wu, Y. Qi, F. Wang, and X. Li HHMD: the human histone modification database Nucleic Acids Res., November 5, 2009; (2009) gkp968v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. Li, H. McWilliam, A. R. de la Torre, A. Grodowski, I. Benediktovich, M. Goujon, S. Nauche, and R. Lopez Non-redundant patent sequence databases with value-added annotations at two levels Nucleic Acids Res., November 1, 2009; (2009) gkp960v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. Meyer, R. Overbeek, and A. Rodriguez FIGfams: yet another set of protein families Nucleic Acids Res., November 1, 2009; 37(20): 6643 - 6654. [Abstract] [Full Text] [PDF] |
||||
![]() |
V. M. Markowitz, I-M. A. Chen, K. Palaniappan, K. Chu, E. Szeto, Y. Grechkin, A. Ratner, I. Anderson, A. Lykidis, K. Mavromatis, et al. The integrated microbial genomes system: an expanding comparative analysis resource Nucleic Acids Res., October 28, 2009; (2009) gkp887v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. S. Horner, G. Pavesi, T. Castrignano, P. D. De Meo, S. Liuni, M. Sammeth, E. Picardi, and G. Pesole Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing Brief Bioinform, October 27, 2009; (2009) bbp046v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. J. Roberts, T. Vincze, J. Posfai, and D. Macelis REBASE--a database for DNA restriction and modification: enzymes, genes and genomes Nucleic Acids Res., October 21, 2009; (2009) gkp874v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Yamaguchi, H. Fukuoka, T. Arao, A. Ohyama, T. Nunome, K. Miyatake, and S. Negoro Gene expression analysis in cadmium-stressed roots of a low cadmium-accumulating solanaceous plant, Solanum torvum J. Exp. Bot., October 16, 2009; (2009) erp313v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. S. Kim, T. Murphy, J. Xia, D. Caragea, Y. Park, R. W. Beeman, M. D. Lorenzen, S. Butcher, J. R. Manak, and S. J. Brown BeetleBase in 2010: revisions to provide comprehensive genomic information for Tribolium castaneum Nucleic Acids Res., October 9, 2009; (2009) gkp807v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Suzuki, T. Gojobori, and S. Kumar Methods for Incorporating the Hypermutability of CpG Dinucleotides in Detecting Natural Selection Operating at the Amino Acid Sequence Level Mol. Biol. Evol., October 1, 2009; 26(10): 2275 - 2284. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Richardson, S. Venkataraman, P. Stevenson, Y. Yang, N. Burton, J. Rao, M. Fisher, R. A. Baldock, D. R. Davidson, and J. H. Christiansen EMAGE mouse embryo spatial gene expression database: 2010 update Nucleic Acids Res., September 18, 2009; (2009) gkp763v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Zhang, I. Thiele, D. Weekes, Z. Li, L. Jaroszewski, K. Ginalski, A. M. Deacon, J. Wooley, S. A. Lesley, I. A. Wilson, et al. Three-Dimensional Structural View of the Central Metabolic Network of Thermotoga maritima Science, September 18, 2009; 325(5947): 1544 - 1549. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Zeng, S. Zhu, and H. Yan Towards accurate human promoter recognition: a review of currently used sequence features and classification methods Brief Bioinform, September 1, 2009; 10(5): 498 - 508. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Kumar, M. P. Suleski, G. J. Markov, S. Lawrence, A. Marco, and A. J. Filipski Positional conservation and amino acids shape the correct diagnosis and population frequencies of benign and damaging personal amino acid mutations Genome Res., September 1, 2009; 19(9): 1562 - 1569. [Abstract] [Full Text] [PDF] |
||||
![]() |
V. M. Markowitz, K. Mavromatis, N. N. Ivanova, I-M. A. Chen, K. Chu, and N. C. Kyrpides IMG ER: a system for microbial genome annotation expert review and curation Bioinformatics, September 1, 2009; 25(17): 2271 - 2278. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. A. Waterland, R. Kellermayer, M.-T. Rached, N. Tatevian, M. V. Gomes, J. Zhang, L. Zhang, A. Chakravarty, W. Zhu, E. Laritsky, et al. Epigenomic profiling indicates a role for DNA methylation in early postnatal liver development Hum. Mol. Genet., August 15, 2009; 18(16): 3026 - 3038. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. A. Wu, S.-R. Jun, G. E. Sims, and S.-H. Kim Whole-proteome phylogeny of large dsDNA virus families by an alignment-free method PNAS, August 4, 2009; 106(31): 12826 - 12831. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Monier, A. Pagarete, C. de Vargas, M. J. Allen, B. Read, J.-M. Claverie, and H. Ogata Horizontal gene transfer of an entire metabolic pathway between a eukaryotic alga and its DNA virus Genome Res., August 1, 2009; 19(8): 1441 - 1449. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Wang, P. Alexander, L. Wu, R. Hammer, O. Cleaver, and S. L. McKnight Dependence of Mouse Embryonic Stem Cells on Threonine Catabolism Science, July 24, 2009; 325(5939): 435 - 439. [Abstract] [Full Text] [PDF] |
||||
![]() |
S.-H. Nam, D.-W. Kim, T.-S. Jung, Y.-S. Choi, D.-W. Kim, H.-S. Choi, S.-H. Choi, and H.-S. Park PESTAS: a web server for EST analysis and sequence mining Bioinformatics, July 15, 2009; 25(14): 1846 - 1848. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. J. Heinze, L. Giron-Monzon, A. Solovyova, S. L. Elliot, S. Geisler, C. G. Cupples, B. A. Connolly, and P. Friedhoff Physical and functional interactions between Escherichia coli MutL and the Vsr repair endonuclease Nucleic Acids Res., July 1, 2009; 37(13): 4453 - 4463. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Hijazi, W. Masson, B. Auge, L. Waltzer, M. Haenlin, and F. Roch boudin is required for septate junction organisation in Drosophila and codes for a diffusible protein of the Ly6 superfamily Development, July 1, 2009; 136(13): 2199 - 2209. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. D. Pruitt, J. Harrow, R. A. Harte, C. Wallin, M. Diekhans, D. R. Maglott, S. Searle, C. M. Farrell, J. E. Loveland, B. J. Ruef, et al. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes Genome Res., July 1, 2009; 19(7): 1316 - 1323. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Mcwilliam, F. Valentin, M. Goujon, W. Li, M. Narayanasamy, J. Martin, T. Miyar, and R. Lopez Web services at the European Bioinformatics Institute-2009 Nucleic Acids Res., July 1, 2009; 37(suppl_2): W6 - W10. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Kwon, Y. Shigemoto, Y. Kuwana, and H. Sugawara Web API for biology with a workflow navigation system Nucleic Acids Res., July 1, 2009; 37(suppl_2): W11 - W16. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. W. Klingelhoefer, L. Moutsianas, and C. Holmes Approximate Bayesian feature selection on a large meta-dataset offers novel insights on factors that effect siRNA potency Bioinformatics, July 1, 2009; 25(13): 1594 - 1601. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. M. Phillippy, K. Ayanbule, N. J. Edwards, and S. L. Salzberg Insignia: a DNA signature search web server for diagnostic assay development Nucleic Acids Res., July 1, 2009; 37(suppl_2): W229 - W234. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. V. Antonov, S. Dietmann, P. Wong, D. Lutter, and H. W. Mewes GeneSet2miRNA: finding the signature of cooperative miRNA activities in the gene lists Nucleic Acids Res., July 1, 2009; 37(suppl_2): W323 - W328. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Kuzniar, K. Lin, Y. He, H. Nijveen, S. Pongor, and J. A. M. Leunissen ProGMap: an integrated annotation resource for protein orthology Nucleic Acids Res., July 1, 2009; 37(suppl_2): W428 - W434. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Zhou, M. Pertea, A. L. Delcher, and L. Florea Sim4cc: a cross-species spliced alignment program Nucleic Acids Res., June 1, 2009; 37(11): e80 - e80. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Marchais, M. Naville, C. Bohn, P. Bouloc, and D. Gautheret Single-pass classification of all noncoding sequences in a bacterial genome using phylogenetic profiles Genome Res., June 1, 2009; 19(6): 1084 - 1092. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. C. Hannibal, E. K. Ruzzo, L. R. Miller, B. Betz, J. G. Buchan, D. M. Knutzen, K. Barnett, M. L. Landsverk, A. Brice, E. LeGuern, et al. SEPT9 gene sequencing analysis reveals recurrent mutations in hereditary neuralgic amyotrophy Neurology, May 19, 2009; 72(20): 1755 - 1759. [Abstract] [Full Text] [PDF] |
R. T. Moreland, J. F. Ryan, C. Pan, and A. D. Baxevanis. The Homeodomain Resource: a comprehensive collection of sequence, structure, interaction, genomic and functional information on the homeodomain protein family. Database, April 28, 2009; 2009(0): bap004.
A. L. Menon, F. L. Poole II, A. Cvetkovic, S. A. Trauger, E. Kalisiak, J. W. Scott, S. Shanmukh, J. Praissman, F. E. Jenney Jr., W. R. Wikoff, et al. Novel Multiprotein Complexes Identified in the Hyperthermophilic Archaeon Pyrococcus furiosus by Non-denaturing Fractionation of the Native Proteome. Mol. Cell. Proteomics, April 1, 2009; 8(4): 735-751.
T. Hachiya, Y. Osana, K. Popendorf, and Y. Sakakibara. Accurate identification of orthologous segments among multiple genomes. Bioinformatics, April 1, 2009; 25(7): 853-860.
A. Necsulea, C. Guillet, J.-C. Cadoret, M.-N. Prioleau, and L. Duret. The Relationship between DNA Replication and Human Genome Organization. Mol. Biol. Evol., April 1, 2009; 26(4): 729-741.
C. Chelala, A. Khan, and N. R. Lemoine. SNPnexus: a web database for functional annotation of newly discovered and public domain single nucleotide polymorphisms. Bioinformatics, March 1, 2009; 25(5): 655-661.
L. Taher and I. Ovcharenko. Variable locus length in the human genome leads to ascertainment bias in functional inference for non-coding elements. Bioinformatics, March 1, 2009; 25(5): 578-584.
S. Kudaravalli, J.-B. Veyrieras, B. E. Stranger, E. T. Dermitzakis, and J. K. Pritchard. Gene Expression Levels Are a Target of Recent Natural Selection in the Human Genome. Mol. Biol. Evol., March 1, 2009; 26(3): 649-658.
G. A. Reeves, D. Talavera, and J. M. Thornton. Genome and proteome annotation: organization, interpretation and integration. J. R. Soc. Interface, February 6, 2009; 6(31): 129-147.
M. Oyama, H. Kozuka-Hata, S. Tasaki, K. Semba, S. Hattori, S. Sugano, J.-i. Inoue, and T. Yamamoto. Temporal Perturbation of Tyrosine Phosphoproteome Dynamics Reveals the System-wide Regulatory Networks. Mol. Cell. Proteomics, February 1, 2009; 8(2): 226-231.
A. J. Vilella, J. Severin, A. Ureta-Vidal, L. Heng, R. Durbin, and E. Birney. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res., February 1, 2009; 19(2): 327-335.
G. Golfier, S. Lemoine, A. van Miltenberg, A. Bendjoudi, J. Rossier, S. Le Crom, and M.-C. Potier. Selection of oligonucleotides for whole-genome microarrays with semi-automatic update. Bioinformatics, January 1, 2009; 25(1): 128-129.
B. Waegele, I. Dunger-Kaltenbach, G. Fobo, C. Montrone, H.-W. Mewes, and A. Ruepp. CRONOS: the cross-reference navigation server. Bioinformatics, January 1, 2009; 25(1): 141-143.
D. E. Newburger and M. L. Bulyk. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D77-D82.
L. Everett, A. Vo, and S. Hannenhalli. PTM-Switchboard--a database of posttranslational modifications of transcription factors, the mediating enzymes and target genes. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D66-D71.
P. P. Chan and T. M. Lowe. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D93-D97.
L. G. Almeida, N. J. Sakabe, A. R. deOliveira, M. C. C. Silva, A. S. Mundstein, T. Cohen, Y.-T. Chen, R. Chua, S. Gurung, S. Gnjatic, et al. CTdatabase: a knowledge-base of high-throughput and curated data on cancer-testis antigens. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D816-D819.
S. Keerthikumar, R. Raju, K. Kandasamy, A. Hijikata, S. Ramabadran, L. Balakrishnan, M. Ahmed, S. Rani, L. D. N. Selvan, D. S. Somanathan, et al. RAPID: Resource of Asian Primary Immunodeficiency Diseases. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D863-D867.
K. J. V. Nordstrom, M. C. Lagerstrom, L. M. J. Waller, R. Fredriksson, and H. B. Schioth. The Secretin GPCRs Descended from the Family of Adhesion GPCRs. Mol. Biol. Evol., January 1, 2009; 26(1): 71-84.
W. Fu, B. E. Sanders-Beer, K. S. Katz, D. R. Maglott, K. D. Pruitt, and R. G. Ptak. Human immunodeficiency virus type 1, human protein interaction database at NCBI. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D417-D422.
W. Klimke, R. Agarwala, A. Badretdin, S. Chetvernin, S. Ciufo, B. Fedorov, B. Kiryutin, K. O'Neill, W. Resch, S. Resenchuk, et al. The National Center for Biotechnology Information's Protein Clusters Database. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D216-D223.
M. C. Walter, T. Rattei, R. Arnold, U. Guldener, M. Munsterkotter, K. Nenova, G. Kastenmuller, P. Tischler, A. Wolling, A. Volz, et al. PEDANT covers all complete RefSeq genomes. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D408-D411.
S. Lefever, J. Vandesompele, F. Speleman, and F. Pattyn. RTPrimerDB: the portal for real-time PCR primers and probes. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D942-D945.
A. J. Harmar, R. A. Hills, E. M. Rosser, M. Jones, O. P. Buneman, D. R. Dunbar, S. D. Greenhill, V. A. Hale, J. L. Sharman, T. I. Bonner, et al. IUPHAR-DB: the IUPHAR database of G protein-coupled receptors and ion channels. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D680-D685.
J. W. Whitaker, I. Letunic, G. A. McConkey, and D. R. Westhead. metaTIGER: a metabolic evolution resource. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D531-D538.
A. Bhasi, P. Philip, V. Manikandan, and P. Senapathy. ExDom: an integrated database for comparative analysis of the exon-intron structures of protein domains in eukaryotes. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D703-D711.
M. D. McDowall, M. S. Scott, and G. J. Barton. PIPs: human protein-protein interaction prediction database. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D651-D656.
R. M. Kuhn, D. Karolchik, A. S. Zweig, T. Wang, K. E. Smith, K. R. Rosenbloom, B. Rhead, B. J. Raney, A. Pohl, M. Pheasant, et al. The UCSC Genome Browser Database: update 2009. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D755-D761.
M. Shionyu, A. Yamaguchi, K. Shinoda, K.-i. Takahashi, and M. Go. AS-ALPS: a database for analyzing the effects of alternative splicing on protein structure, interaction and network in human and mouse. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D305-D309.
T. J. P. Hubbard, B. L. Aken, S. Ayling, B. Ballester, K. Beal, E. Bragin, S. Brent, Y. Chen, P. Clapham, L. Clarke, et al. Ensembl 2009. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D690-D697.
N. Axelrod, Y. Lin, P. C. Ng, T. B. Stockwell, J. Crabtree, J. Huang, E. Kirkness, R. L. Strausberg, M. E. Frazier, J. C. Venter, et al. The HuRef Browser: a web resource for individual human genomics. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D1018-D1024.
E. M. Muro, R. Herrington, S. Janmohamed, C. Frelin, M. A. Andrade-Navarro, and N. N. Iscove. Identification of gene 3' ends by automated EST cluster analysis. PNAS, December 23, 2008; 105(51): 20286-20290.
H. Noguchi, T. Taniguchi, and T. Itoh. MetaGeneAnnotator: Detecting Species-Specific Patterns of Ribosomal Binding Site for Precise Gene Prediction in Anonymous Prokaryotic and Phage Genomes. DNA Res., December 1, 2008; 15(6): 387-396.
S. J. Diskin, M. Li, C. Hou, S. Yang, J. Glessner, H. Hakonarson, M. Bucan, J. M. Maris, and K. Wang. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res., November 1, 2008; 36(19): e126.
J. R. ten Bosch and W. W. Grody. Keeping Up With the Next Generation: Massively Parallel Sequencing in Clinical Diagnostics. J. Mol. Diagn., November 1, 2008; 10(6): 484-492.