Skip Navigation


Nucleic Acids Research Advance Access originally published online on October 16, 2008
Nucleic Acids Research 2009 37(Database issue):D32-D36; doi:10.1093/nar/gkn721
This Article
Right arrow Abstract Freely available
Right arrow Print PDF (85K) Freely available
Right arrow Screen PDF (97K) Freely available
Right arrowOA All Versions of this Article:
37/suppl_1/D32    most recent
gkn721v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Pruitt, K. D.
Right arrow Articles by Maglott, D. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Pruitt, K. D.
Right arrow Articles by Maglott, D. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Nucleic Acids Research, 2009, Vol. 37, Database issue D32-D36
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article appears in the following Nucleic Acids Research issue: Database issue [View the issue table of contents]

Articles

NCBI Reference Sequences: current status, policy and new initiatives

Kim D. Pruitt*, Tatiana Tatusova, William Klimke and Donna R. Maglott

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rm 4As.47B, 45 Center Drive, Bethesda, MD, USA

*To whom correspondence should be addressed. Tel: +1 301 435 5898; Fax: +1 301 480 2918; Email: pruitt{at}ncbi.nlm.nih.gov

Received September 15, 2008. Revised September 26, 2008. Accepted September 30, 2008.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 GROWTH
 NEW FEATURES FOR EUKARYOTIC...
 REFSEQ POLICIES
 RELATED INITIATIVES
 FUTURE DIRECTIONS
 FUNDING
 REFERENCES
 
NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 x 106 proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 GROWTH
 NEW FEATURES FOR EUKARYOTIC...
 REFSEQ POLICIES
 RELATED INITIATIVES
 FUTURE DIRECTIONS
 FUNDING
 REFERENCES
 
NCBI's Reference Sequence (RefSeq) is a public database of nucleotide and protein sequences with feature and bibliographic annotation. The RefSeq database is built and distributed by the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine located at the US National Institutes of Health. RefSeq records are made publicly available, at no cost, by multiple methods: (i) interactive query over the internet using text via Entrez (1); (ii) Basic Local Alignment Search Tool (BLAST) (2,3) programs; (iii) scripted query using E-Utilities (4); and (iv) download by FTP. There is a formal bi-monthly release cycle (odd-numbered months) for RefSeq, with each release, incremental daily updates and special reports for some species provided from the FTP site. RefSeqs are an integral part of many resources at NCBI, including the Gene database (5), the Map Viewer and HomoloGene.

NCBI builds RefSeqs from the sequence data available in the archival database GenBank (6), which is a comprehensive public repository of sequences submitted to, and exchanged among the International Nucleotide Sequence Database Collaboration (INSDC). RefSeq records are not part of GenBank although they can be retrieved from NCBI using the same interfaces, such as Entrez Nucleotide and Entrez Protein. For a comparison of the two databases, please see the GenBank chapter, appendix, in the online NCBI Handbook (http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.section.GenBank_ASM).

The RefSeq collection is unique in providing a heavily curated, non-redundant, explicitly linked nucleotide and protein database representing significant taxonomic diversity. The RefSeq database provides a critical foundation for integrating sequence, genetic and functional information and is used internationally as a standard for genome annotation. RefSeq records can be identified by a distinct accession format, which includes an underscore (‘_’) at the third position. A full definition of the RefSeq accession space and availability is available on the RefSeq Web site. This web site provides many details about the RefSeq project including links to additional documentation about the curation process in the RefSeq chapter of the NCBI Handbook (http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.ch18).


    GROWTH
 TOP
 ABSTRACT
 INTRODUCTION
 GROWTH
 NEW FEATURES FOR EUKARYOTIC...
 REFSEQ POLICIES
 RELATED INITIATIVES
 FUTURE DIRECTIONS
 FUNDING
 REFERENCES
 
The size of the comprehensive bi-monthly RefSeq release continues to grow in pace with the large-scale genome and cDNA sequencing projects. As of July 2008, the release included records from 5395 species and represented 5 590 364 protein records with the majority from bacterial genomes (4 120 701 proteins) and the next largest number provided for fungal, then mammalian species (346 470 and 313 549, respectively; see Table 1). From July 2007 (RefSeq release 24) to July 2008 (RefSeq release 30) the number of species included in the RefSeq release increased by 20% and the total number of records increased by 53% with a 45% growth in the protein collection. The number of curated records also continues to increase (Table 2); this growth represents curation by NCBI staff in addition to curation by collaborators, and an expansion in the number of genomes that are supported by collaborations. As shown in Table 3, there is at least one curated record for 41% of the species represented in the RefSeq release 30 and 5% of the 5.5 x 106 proteins have been curated. These percentages vary with taxonomic groups, because there may be more significant support for some by collaborators as well as NCBI curation staff. For example, there is at least one curated record for 86% of the species in the mammalian group but when calculated in terms of the total number of mammalian proteins, only11% are curated. This discrepancy reflects both the large amount of curation at the species level (88%) for 233 mammalian mitochondrial genomes, which encode a relatively small number of proteins (3055, 88% of which are curated), and a smaller amount of curation of the large number of proteins from the nuclear genomes of some of the mammals included in RefSeq, because the focuses are on the human and mouse reference genomes. For the human genome, 53% of all proteins annotated on all human genome assemblies have been curated. This includes the set of proteins that are generated as a product of NCBI's genome annotation pipeline (with a XP_ accession prefix) in addition to the proteins that are based on transcript data and publications, which are the primary focus of curation (with a NP_ accession prefix). When considering curation of proteins annotated on the human reference genome assembly or proteins with a NP_ accession prefix, then 79% of the human RefSeq proteins have been curated. The focus on human and mouse is supported by the Consensus CDS (CCDS) collaboration (see http://www.ncbi.nlm.nih.gov/projects/CCDS).


View this table:
[in this window]
[in a new window]

 
Table 1. Annual growth of the RefSeq ftp releasea

 

View this table:
[in this window]
[in a new window]

 
Table 2. Annual growth of curated records

 

View this table:
[in this window]
[in a new window]

 
Table 3. Percent curation of release 30 per taxonomic group

 

    NEW FEATURES FOR EUKARYOTIC REFSEQ RECORDS
 TOP
 ABSTRACT
 INTRODUCTION
 GROWTH
 NEW FEATURES FOR EUKARYOTIC...
 REFSEQ POLICIES
 RELATED INITIATIVES
 FUTURE DIRECTIONS
 FUNDING
 REFERENCES
 
Support for eukaryotic, primarily vertebrate, records was modified to represent a larger number of pseudogene and non-coding transcripts, and to add feature annotation, as described below.

Non-transcribed pseudogenes
For human and mouse, the sequence defining each non-transcribed pseudogene is derived from the reference genome assembly when possible. Previously, pseudogenes were defined based on any genomic record available in GenBank. A subset may still be defined on records other than those used for the reference assembly if the reference assembly is incomplete at that location, or if the pseudogene is known not to occur in the reference assembly (e.g. a known haplotype or strain difference).

Non-coding transcripts
More non-coding RNAs are being included in the RefSeq collection; this subset includes transcribed pseudogenes, antisense transcripts, known functional RNAs and transcribed loci of unknown function that do not appear to be protein coding. In addition, alternatively spliced transcripts of protein-coding loci that severely truncate or otherwise render the transcript unlikely to be capable of supporting translation are also represented as a non-coding sequence, including transcripts from protein-coding loci that are candidates for nonsense-mediated decay [NMD; (7)]. However, proteins are still represented for a subset of NMD candidate transcripts if there is publication support or if all transcripts available consistently exhibit the same extended UTR pattern for a known gene. For example, see NR_024147.1 and NM_001005845.1. Non-coding RNA records use the accession prefix NR_ or XR_. Representative records can be retrieved with the Entrez nucleotide query ‘srcdb_refseq[prop] AND biomol_RNA[prop] NOT biomol_mRNA[prop]’. Coverage of this molecule type is known to be incomplete.

Exons
Exon feature annotation is now calculated for transcripts and some non-transcribed pseudogenes for human and mouse records. Exon annotation on transcript records is computed by aligning the transcript record to the reference genome assembly, using the program Splign (8) and interpreting the alignment result. Exon names are incremented according to 5' to 3' order of all exons identified based on available RefSeq transcripts for the gene. This annotation highlights transcript variant differences; for example, it is more apparent when a variant omits an exon as there is a gap in the exon names. Exon annotation (pseudoexons) for non-transcribed pseudogenes is calculated by aligning the RefSeq transcript from the corresponding functional gene (using Splign) to the pseudogene genomic region and interpreting the alignment result. Exon information is displayed in the flat file and included in files provided for FTP. This annotation provides a more complete description of RefSeq transcript variants by providing information on the locations on a spliced transcript that correspond to gene exons and exon names. Exon features are calculated on a weekly basis after a record becomes publicly available in NCBI databases.

Primary block
Multiple submissions to the INSDC are often used to construct the RefSeq record in order to represent a more complete transcript; to assemble a genomic region that is manually annotated (with NG_ accession prefix); or to select a nucleotide polymorphic variant that is thought to be the better representative. The PRIMARY block displayed on a flat file record indicates the specific coordinates in the RefSeq record (REFSEQ_SPAN) and the corresponding coordinates from each GenBank source that was used to assemble the RefSeq (PRIMARY_IDENTIFIER and PRIMARY_SPAN). The PRIMARY block follows the COMMENT block and is available in the ASN.1 as a seq hist assembly block. This information is provided for vertebrates and a small number of other species. For example, see accessions: (i) NG_008407 [GenBank] .1 which represents a RefSeqGene genomic record (see below); (ii) NM_000539 [GenBank] .3 which is a human transcript record that was assembled from genomic sequence based on transcript alignments; or, (iii) NM_000207 [GenBank] .2 which is a human transcript record that was assembled from two GenBank transcripts to represent a 3'-UTR that is both more complete and more consistent with the reference genome assembly. Although a given RefSeq transcript may be assembled from more than one GenBank record, please note that the set of GenBank transcripts that derive from the gene in question must generally support, and not contradict, the final exon combination represented in RefSeq transcripts.

Comments
Comments about curation decisions made by the CCDS collaboration are now included in human and mouse RefSeq transcript and protein records. These comments are provided when the CDS structure, as annotated on the reference genome, is modified resulting in a change to the protein to include or exclude alternate coding exons, use alternate splice donor or acceptor sites or modify the N-terminal length by using a different in-frame start codon. For example, see NM_000031 [GenBank] .5.


    REFSEQ POLICIES
 TOP
 ABSTRACT
 INTRODUCTION
 GROWTH
 NEW FEATURES FOR EUKARYOTIC...
 REFSEQ POLICIES
 RELATED INITIATIVES
 FUTURE DIRECTIONS
 FUNDING
 REFERENCES
 
Species included
Species are considered for inclusion in the RefSeq collection based on a combination of factors including the availability and quality of a whole genome assembly, the number of available cDNA submissions to INSDC, medical relevance and the availability of additional support data from the research community. The RefSeq project is supported by several different process flows and a species may be selected to proceed via a particular process flow depending on the type and abundance of available data as well as the availability of community-maintained annotation. In general, the RefSeq project includes representation of genomes for species that have any of these features:

  1. Model organism.
  2. Causative agent of or otherwise associated with disease.
  3. Eukaryotic genomes that are considered to be a reasonably complete representation of the whole genome with a sequence project goal of >6x sequence depth.
  4. Prokaryotic genomes that are considered to be either a complete genome sequence or a whole genome shotgun sequence (WGS) assembly. WGS assemblies that represent strain-specific differences are included in the RefSeq collection.
  5. Complete organelles and plastid genomes.
  6. Targeted sequence regions that support specific reporting or identification needs; for example, the RefSeqGene project (below) or other gene-specific benchmarks that are used for identification purposes.

Genome annotation decisions
It is important to note that RefSeq records are maintained by NCBI whereas the primary sequence records available in the international nucleotide sequence databases (INSD) are maintained and updated by the submitters of those records. Thus, annotation presented on a RefSeq genomic record may differ from the original submission for some species or loci. Annotation is provided on genome assemblies represented in RefSeq by several different process flows including:

  1. Collaboration.
  2. Annotation by NCBI.
  3. Propagation of annotation from GenBank with targeted curation.
We collaborate with model organism database groups that have the resources and interest in maintaining and submitting genome annotation updates at some frequency and represent their curated annotation in RefSeq. The model organism group is considered the primary authority for the annotated RefSeq genome. For some species, including Saccharomyces cerevisiae and Caenhorhabditis elegans, annotation updates are submitted directly to RefSeq. For other species, including Drosophila melanogaster and Arabidopsis thaliana, annotation updates are submitted directly to the INSDC and propagated to RefSeq.

NCBI's computational genome annotation pipelines are used to annotate some prokaryotic and eukaryotic genomes. Genome annotation is calculated by NCBI upon request from a genome sequencing project, for genomes that are submitted without comprehensive whole genome annotation (and the submitting group does not plan to provide annotation) and for genomes that are considered to be of critical importance to continuing medical and basic research. For more information, see the Genome Annotation chapter of the NCBI Handbook (http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.ch14). A prokaryotic annotation pipeline has been developed to support annotation collaborations with interested sequencing centres wherein the NCBI annotation pipeline is used to calculate the genome annotation submitted to the INSDC and propagated to RefSeq. Annotation of the larger eukaryotic genomes is more complex and additional considerations are given to genomes that are submitted to the INSDC unannotated but with intent to submit annotation at a future date. If annotation is not submitted in a reasonable period of time (~12 months), then NCBI may proceed to calculate annotation for the RefSeq genome records as this provides valuable information to the research community and puts the RefSeq genome into the queue for periodic updates and ongoing annotation maintenance. Note that the eukaryotic genomes annotated by NCBI are updated periodically and RefSeqs for RNAs and proteins may be updated independent of the annotated genome as new transcript data are submitted to the archival INSDC.

Most genome submissions do include annotation information, and this is frequently propagated to the RefSeq representation for the genome. The RefSeq record may differ from the original INSD record in small details to conform to RefSeq feature annotation standards. In these cases, NCBI staff may curate individual loci based on requests from the scientific community or based on on-going curation efforts that are oriented on protein homology approaches (see Targeted annotation for microbial and organelle genomes section).


    RELATED INITIATIVES
 TOP
 ABSTRACT
 INTRODUCTION
 GROWTH
 NEW FEATURES FOR EUKARYOTIC...
 REFSEQ POLICIES
 RELATED INITIATIVES
 FUTURE DIRECTIONS
 FUNDING
 REFERENCES
 
CCDS
Collaborators for the Consensus CDS (CCDS) project have made a significant focus on coordinating curation of the human proteome in the past 2 years. The last annotation update for the human genome, NCBI build 36.3, resulted in 20 159 tracked CCDS IDs (for over 17 000 genes). A significant outcome of the CCDS collaboration is convergence towards common curation criteria for coding sequence (CDS) annotation by all curators. Comments (displayed as ‘Public Note’) and an expanded history report are now provided; public notes explain the curation logic applied when modifying, or removing, a CCDS ID. For example, see CCDS7726.2 (http://www.ncbi.nlm.nih.gov/projects/CCDS/CcdsBrowse.cgi).

RefSeqGene
RefSeqGene is a subset of NCBI's RefSeq project. RefSeqGene records provide a stable standard genomic sequence for reporting sequence variants, especially those of clinical relevance. As reported by Gulley et al. (9), reference sequence standards are needed to support clear communication about the position of variation that is biologically significant, without uncertainty that may result from identification of the position of the initiation codon or splice sites. RefSeq mRNA sequences are being used as a reporting standard, but they have the obvious limitation of not providing explicit coordinates for flanking or intronic sequence. RefSeq chromosome sequences are also used, but the coordinate system is unappealingly large and, perhaps more importantly, not as stable because the human reference sequence continues to be updated. When requested by an authoritative group for a gene, RefSeqGene sequences are constructed to differ from that of the reference genome to represent a specific allele. The default range of a RefSeqGene sequence begins 5 kb upstream of the first exon and extends 2 kb downstream of the last exon, but curators can modify this upon request. RefSeqGene records can be retrieved from Entrez nucleotide using the query ‘refseqgene[keyword]’ and are provided for ftp as weekly updates (see ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/RefSeqGene/). The RefSeqGene web site (http://www.ncbi.nlm.nih.gov/projects/RefSeq/RSG/) provides a report of the >650 genes currently included, and the RefSeqGene accession version, links to view the record in graphical or traditional flat file format, and links to access more information in Entrez Gene or OMIM. Records are considered curated and stable; provision of RefSeqGene records is done in consultation with locus-specific databases or other experts as needed.

Targeted annotation for microbial and organelle genomes
During the last 2 years, NCBI has been working on automatic methods to improve the quality of microbial and organellar genome annotation. Curated clusters from the Protein Clusters database (10) are used to provide updated and consistent functional annotation to RefSeq genomic records. Structural RNAs are added by using tRNAscanSE for tRNAs, Infernal and covariance models for 5S rRNAs and an internal database for 16S and 23S rRNAs (11–13). Existing annotations are checked by comparison to this tool and the results have been used to correct a number of annotations on numerous RefSeq records including additions (751 rRNAs; 3679 tRNAs), deletions (4 rRNAs, 47 tRNAs) and strand corrections (103 rRNAs, 115 tRNAs). NCBI has also developed a genome validation tool for assessing the quality of functional annotation (incorrectly annotated RNAs as above, overlapping features and potential frameshifts) with both a web component (http://www.ncbi.nlm.nih.gov/genomes/frameshifts/frameshifts.cgi) as well as standalone package for local installation (ftp://ftp.ncbi.nih.gov/genomes/TOOLS/subcheck/). This tool is used in-house for curation of RefSeq records as well as by various submitters of prokaryotic genomic records to improve annotations prior to submission.


    FUTURE DIRECTIONS
 TOP
 ABSTRACT
 INTRODUCTION
 GROWTH
 NEW FEATURES FOR EUKARYOTIC...
 REFSEQ POLICIES
 RELATED INITIATIVES
 FUTURE DIRECTIONS
 FUNDING
 REFERENCES
 
The RefSeq collection is expected to grow as more genomes are sequenced. For human, collaborations among CCDS, HUGO Gene Nomenclature Committee (HGNC), Swiss-Prot, the ENCODE project, locus-specific databases and others will continue to improve the coverage and accuracy of RefSeq sequence and annotation. For microbial genomes, a major initiative to curate bacterial protein annotation across the group of related proteins (10) is expected to result in a significant increase in the number of curated bacterial proteins. In addition, improvements to the RefSeq release processing pipeline and quality assurance testing provided are in progress.


    FUNDING
 TOP
 ABSTRACT
 INTRODUCTION
 GROWTH
 NEW FEATURES FOR EUKARYOTIC...
 REFSEQ POLICIES
 RELATED INITIATIVES
 FUTURE DIRECTIONS
 FUNDING
 REFERENCES
 
Intramural Research Program of the National Institute of Health; National Library of Medicine. Funding for open access charge: Intramural Research Program of the NIH, National Library of Medicine.

Conflict of interest statement. None declared.


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 GROWTH
 NEW FEATURES FOR EUKARYOTIC...
 REFSEQ POLICIES
 RELATED INITIATIVES
 FUTURE DIRECTIONS
 FUNDING
 REFERENCES
 

  1. Schuler GD, Epstein JA, Ohkawa H, Kans JA. Entrez: molecular biology database and retrieval system. Methods Enzymol. (1996) 266:141–162.[Web of Science][Medline]

  2. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. (1990) 215:403–410.[CrossRef][Web of Science][Medline]

  3. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.[Abstract/Free Full Text]

  4. Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S, et al. Database resources of the National Center for Biotechnology Information. In: Nucleic Acids Res (2009) in press.

  5. Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez gene: gene-centered information at NCBI. Nucleic Acids Res. (2007) 35:D26–D31.[Abstract/Free Full Text]

  6. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. In: Nucleic Acids Res (2009) in press.

  7. Maquat LE. Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat. Rev. Mol. Cell Biol (2004) 5:89–99.[CrossRef][Web of Science][Medline]

  8. Kapustin Y, Souvorov A, Tatusova T, Lipman D. Splign: algorithms for computing spliced alignments with identification of paralogs. Biol. Direct. (2008) 21:20.

  9. Gulley ML, Braziel RM, Halling KC, Hsi ED, Kant JA, Nikiforova MN, Nowak JA, Ogino S, Oliveira A, Polesky HF, et al. Clinical laboratory reports in molecular pathology. Arch. Pathol. Lab Med. (2007) 131:852–863.[Web of Science][Medline]

  10. Klimke W, Agarwala R, Badretdin A, Chetvernin S, Ciufo S, Fedorov B, Kiryutin B, O’Neill K, Resch W, Resenchuk S, et al. The national center for biotechnology information's protein clusters database. In: Nucleic Acids Res (2009) in press.

  11. Eddy SR. A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics (2002) 3:18.[CrossRef][Medline]

  12. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. (2005) 33:D121–D124.[Abstract/Free Full Text]

  13. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. (1997) 25:955–964.[Abstract/Free Full Text]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Nucleic Acids ResHome page
E. W. Sayers, T. Barrett, D. A. Benson, E. Bolton, S. H. Bryant, K. Canese, V. Chetvernin, D. M. Church, M. DiCuccio, S. Federhen, et al.
Database resources of the National Center for Biotechnology Information
Nucleic Acids Res., November 12, 2009; (2009) gkp967v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
B. Rhead, D. Karolchik, R. M. Kuhn, A. S. Hinrichs, A. S. Zweig, P. A. Fujita, M. Diekhans, K. E. Smith, K. R. Rosenbloom, B. J. Raney, et al.
The UCSC genome browser database: update 2010
Nucleic Acids Res., November 11, 2009; (2009) gkp939v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
R. N. Nurtdinov, M. O. Vasiliev, A. S. Ershova, I. S. Lossev, and A. S. Karyagina
PLANdbAffy: probe-level annotation database for Affymetrix expression microarrays
Nucleic Acids Res., November 11, 2009; (2009) gkp969v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. S. Syed, M. D'Antonio, and F. D. Ciccarelli
Network of Cancer Genes: a web resource to analyze duplicability, orthology and network properties of cancer genes
Nucleic Acids Res., November 11, 2009; (2009) gkp957v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
P. Flicek, B. L. Aken, B. Ballester, K. Beal, E. Bragin, S. Brent, Y. Chen, P. Clapham, G. Coates, S. Fairley, et al.
Ensembl's 10th year
Nucleic Acids Res., November 11, 2009; (2009) gkp972v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
Y. Zhang, J. Lv, H. Liu, J. Zhu, J. Su, Q. Wu, Y. Qi, F. Wang, and X. Li
HHMD: the human histone modification database
Nucleic Acids Res., November 5, 2009; (2009) gkp968v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, and M. Hirakawa
KEGG for representation and analysis of molecular networks involving diseases and drugs
Nucleic Acids Res., October 30, 2009; (2009) gkp896v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
G. Grillo, A. Turi, F. Licciulli, F. Mignone, S. Liuni, S. Banfi, V. A. Gennarino, D. S. Horner, G. Pavesi, E. Picardi, et al.
UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs
Nucleic Acids Res., October 30, 2009; (2009) gkp902v1.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
K. D. Pruitt, J. Harrow, R. A. Harte, C. Wallin, M. Diekhans, D. R. Maglott, S. Searle, C. M. Farrell, J. E. Loveland, B. J. Ruef, et al.
The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes
Genome Res., July 1, 2009; 19(7): 1316 - 1323.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
C.-H. Chou, W.-C. Chang, C.-M. Chiu, C.-C. Huang, and H.-D. Huang
FMM: a web server for metabolic pathway reconstruction and comparative analysis
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W129 - W134.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. Gattiker, C. Dessimoz, A. Schneider, I. Xenarios, M. Pagni, and J. Rougemont
The Microbe browser for comparative genomics
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W296 - W299.
[Abstract] [Full Text] [PDF]


Home page
Hum Mol GenetHome page
M. L. Landsverk, E. K. Ruzzo, H. C. Mefford, K. Buysse, J. G. Buchan, E. E. Eichler, E. M. Petty, E. A. Peterson, D. M. Knutzen, K. Barnett, et al.
Duplication within the SEPT9 gene associated with a founder effect in North American families with hereditary neuralgic amyotrophy
Hum. Mol. Genet., April 1, 2009; 18(7): 1200 - 1208.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow Print PDF (85K) Freely available
Right arrow Screen PDF (97K) Freely available
Right arrowOA All Versions of this Article:
37/suppl_1/D32    most recent
gkn721v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Pruitt, K. D.
Right arrow Articles by Maglott, D. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Pruitt, K. D.
Right arrow Articles by Maglott, D. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?