Nucleic Acids Research, 2005, Vol. 33, Database issue D501-D504
© 2005, the authors
Nucleic Acids Research, Vol. 33, Database issue © Oxford University Press 2005; all rights reserved
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rm 6An.12J, 45 Center Drive, Bethesda, MD 20892-6510, USA
* To whom correspondence should be addressed. Tel: +1 301 435 5950; Fax: +1 301 480 2918; Email: pruitt{at}ncbi.nlm.nih.gov
Received September 15, 2004; Revised and Accepted September 21, 2004
| ABSTRACT |
|---|
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.
| INTRODUCTION |
|---|
RefSeq is a public database of nucleotide and protein sequences with corresponding feature and bibliographic annotation. The RefSeq database is built and distributed by the NCBI, a division of the National Library of Medicine located at the US National Institutes of Health. NCBI makes RefSeq publicly available, at no cost, over the internet via FTP, Entrez query (1), Basic Local Alignment Search Tool (BLAST) (2,3) programs, and incorporation in a wide range of NCBI resources.
NCBI builds RefSeq from the sequence data available in the archival database GenBank (4), which is a comprehensive public repository of sequences submitted to, and exchanged among, GenBank in the US, the EMBL Data Library in the UK and the DNA Data Bank of Japan. In addition, the annotated RefSeq record and/or supplementary information may be provided by multiple collaborations established with nomenclature groups, model organism databases and other facets of the scientific community. RefSeq records indicate the source GenBank data, include references and annotations relevant to the gene, transcript and protein, and indicate curation with attribution to the curation group.
The RefSeq collection is unique in providing a curated, non-redundant, explicitly linked nucleotide and protein database representing significant taxonomic diversity. Genomic and protein sequence datasets are provided for the majority of organisms included; transcript records are currently provided for a subset of the eukaryotic collection. The RefSeq database provides a critical foundation for integrating sequence, genetic and functional information, and is used internationally as a standard for genome annotation. The collection is curated on an ongoing basis by collaborating groups and by NCBI staff. Sequence records are presented in a standard format and are subject to computational validation.
| DISTINCTION FROM GENBANK |
|---|
The RefSeq collection is derived from the primary submissions available in GenBank. GenBank is a redundant archival database that represents sequence information generated at different times, and may represent several alternate views of the protein, names or other information. In contrast, RefSeq represents a nearly non-redundant collection that is a synthesis and summary of available information, and represents the current view of the sequence information, names and other annotations.
RefSeq records can be distinguished from GenBank records by the format of the accession series. RefSeq accession numbers are formatted as two alphabetic characters, followed by an underscore (_), optionally followed by four alphabetic characters (specific to the NZ_ prefix), followed by six, eight or nine numerals. GenBank accessions never include an underscore. Different alphabetic prefixes have implied meaning in terms of both the process of generation and the type of molecule represented. A full definition of the RefSeq accession numbers is available on the RefSeq Web site (http://www.ncbi.nlm.nih.gov/RefSeq/key.html#accessions).
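The accession format described above can be checked mechanically. The following is a minimal sketch that encodes only the rule as stated in this section; the function name `is_refseq_accession` is hypothetical, and stripping a trailing `.version` suffix is an added assumption rather than part of the description.

```python
import re

# Two letters, an underscore, an optional four-letter block,
# then six, eight or nine digits (as described in the text).
_REFSEQ_RE = re.compile(r"^([A-Z]{2})_([A-Z]{4})?(\d{6}|\d{8}|\d{9})$")


def is_refseq_accession(accession: str) -> bool:
    """Return True if the string matches the RefSeq accession format."""
    # Assumption: a trailing ".N" sequence version is ignored.
    base = accession.strip().split(".", 1)[0]
    match = _REFSEQ_RE.match(base)
    if match is None:
        return False
    prefix, letters, _digits = match.groups()
    # The four-letter block is specific to the NZ_ prefix.
    return letters is None or prefix == "NZ"


if __name__ == "__main__":
    for acc in ("NM_000546", "NZ_AAAA01000001", "U12345", "NM_ABCD123456"):
        print(acc, is_refseq_accession(acc))
```

Because GenBank accessions never contain an underscore, any GenBank-style identifier is rejected by this check without further logic.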
| GROWTH |
|---|
The RefSeq database continues to grow in pace with the large-scale genome and cDNA sequencing projects (see Table 1). As new complete genome assemblies become available, they are incorporated into the RefSeq collection. Most organisms are represented in the collection only after some genomic sequence data (nuclear, plastid, mitochondrial or other genomic molecules) become available; however, transcript and protein records may be provided for a subset of eukaryotic model organisms prior to the availability of genomic sequence data.
| ANNOTATION |
|---|
Annotation of RefSeq records originates from several sources including the original GenBank submission, collaborating groups, NCBI computational analysis, user feedback and manual curation at NCBI. For example, collaboration supports the RefSeq representation of Saccharomyces cerevisiae, Drosophila melanogaster and Arabidopsis thaliana, which are directly contributed by the Saccharomyces Genome Database (SGD)(5), FlyBase (6) and The Institute for Genomic Research (TIGR), respectively. Similarly, the entire viral RefSeq collection is reviewed and curated by the NCBI Viral Genome Advisors group. See the RefSeq Collaborators page for more information about contributions from collaborators (http://www.ncbi.nlm.nih.gov/RefSeq/collaborators.html). All RefSeq records include explicit cross-links between the nucleotide and protein cognates and to Entrez Gene (7), which provides gene-oriented access to the RefSeq collection. Additional links, annotated as db_xref notations, are provided on some records to organism-specific genome resources such as Mouse Genome Informatics (MGI) (8) or FlyBase.
For other species, including Apis mellifera (honey bee), Gallus gallus (chicken), Homo sapiens (human), Mus musculus (mouse) and Rattus norvegicus (rat), genome annotation is provided by an NCBI computational process that utilizes transcript alignments, protein support and a hidden Markov model (HMM) ab initio prediction algorithm (see the NCBI Handbook; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books). Genomic RefSeq records that are annotated by this process represent genes, transcripts and proteins, and include additional feature annotation to represent STS markers. The available RefSeq transcript dataset, with the NM_ accession prefix, is an important reagent in this annotation pipeline.
Comprehensive representation of the proteins, explicitly linked to a RefSeq nucleotide record, is a major focus of the RefSeq project. The goal is to represent the full-length protein product; however, partial protein products are represented for some genomes when partial protein annotation is contributed by a collaborator or when proteins are predicted from incomplete genome sequence data. Proteins are annotated by computation and curation. Conserved domains are calculated by an automatic process using data maintained in the NCBI Conserved Domain Database (CDD) (9); this annotation provides hints about possible function. Likewise, variation features that are located in the coding region are automatically calculated from data available in the NCBI dbSNP database (10). Additional features including Enzyme Commission (EC) numbers, other landmark regions of the protein sequence and references may be added by curation either by an external collaborator or by NCBI staff.
Transcript records are provided for a subset of eukaryotic species, including those in the Chordata taxonomic lineage, to represent protein-coding sequences, transcribed pseudogenes, ribosomal RNAs and other small RNAs. Annotation results from a mixture of automated and curatorial analysis. Variation features are calculated automatically from data in the dbSNP database, and the nucleotide regions corresponding to the annotated protein conserved domains are also provided automatically (as a miscellaneous feature, or misc_feat). Other features, such as polyadenylation signals and sites, alternate transcription start sites and RNA editing sites, are provided by curation.
| CURATION AND QUALITY CONTROL |
|---|
RefSeq sequences are validated to confirm the following: (i) accurate nucleotide-to-protein sequence correspondence; (ii) valid ASN.1 format; and (iii) for species supported by collaboration with official nomenclature groups, current preferred name and symbol designations. Validation of map location is available for species that are annotated via the NCBI annotation pipeline.
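To make check (i) concrete, the sketch below translates a coding region with the standard genetic code and compares the result with a protein sequence. It illustrates the idea only and is not the NCBI validation code; the function names are hypothetical, and it deliberately ignores the exceptions mentioned elsewhere in this article (non-AUG initiation sites, selenoproteins) as well as alternative genetic codes.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3)
    )


def cds_matches_protein(cds: str, protein: str) -> bool:
    """Return True if the CDS translates exactly to the protein."""
    try:
        translated = translate(cds)
    except ValueError:
        return False
    # A terminal stop codon is not part of the protein record.
    if translated.endswith("*"):
        translated = translated[:-1]
    return translated == protein.upper()


if __name__ == "__main__":
    print(cds_matches_protein("ATGGCCTAA", "MA"))
```

A production check would also need to handle partial coding regions and organism-specific translation tables, which this sketch leaves out.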
NCBI staff review and manually modify a subset of the RefSeq collection including those provided for viruses, some bacteria, mammals and some additional species. The goal of this manual curation is to provide accurate and full-length sequence data, to ensure accurate sequence-to-gene associations, to expand the collection by adding previously unrepresented genes and/or alternate splice products, and to provide additional feature annotation to represent mature peptide products, regions of interest and/or to highlight less frequent biological events such as non-AUG initiation sites (11) or selenoproteins (12). The curation status is annotated on RefSeq records, as a COMMENT feature; the status terms used include model, predicted, provisional, inferred, validated and reviewed, with the latter two indicating that sequence-level curation has taken place. Curation status terms are documented on the RefSeq Web site (http://www.ncbi.nlm.nih.gov/RefSeq/key.html#status).
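The status vocabulary listed above lends itself to a simple lookup. The sketch below records the six terms named in the text and flags the two that indicate sequence-level curation; the names `STATUS_TERMS` and `has_sequence_level_curation` are hypothetical, and the case-insensitive matching is an assumption about how a status string might appear in a record.

```python
# Status terms as listed in the text, in the order given there.
STATUS_TERMS = (
    "model",
    "predicted",
    "provisional",
    "inferred",
    "validated",
    "reviewed",
)

# Per the text, only the latter two imply sequence-level curation.
SEQUENCE_CURATED = frozenset({"validated", "reviewed"})


def has_sequence_level_curation(status: str) -> bool:
    """Return True if the status term implies sequence-level curation."""
    term = status.strip().lower()  # assumption: match case-insensitively
    if term not in STATUS_TERMS:
        raise ValueError(f"unknown RefSeq status term: {status!r}")
    return term in SEQUENCE_CURATED


if __name__ == "__main__":
    for term in STATUS_TERMS:
        print(term, has_sequence_level_curation(term))
```

Raising on unknown terms, rather than returning False, keeps a misspelled or newly introduced status from being silently treated as uncurated.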
Several processes are used to identify records that will benefit most from staff review. For instance, records targeted for review include those that differ relative to available genomic sequence, those with significant protein length variation compared to homologous groups calculated by the NCBI HomoloGene resource (13), and those for which there are no related proteins other than the GenBank record used to construct the RefSeq. Several additional tests for transcript and protein quality are in place but are not enumerated here. In addition, review is based on user feedback that identifies additional data or errors. We welcome user feedback to help maintain and improve the RefSeq collection. A feedback form is provided online, or users can contact the main NCBI Help Desk (see Table 2).
| RETRIEVING DATA |
|---|
The RefSeq collection can be accessed in multiple ways at NCBI, including by Entrez query, BLAST, FTP, and links provided from NCBI databases and resources (see Table 2).
Entrez query
RefSeq results are included in the results returned when performing a global query of the Entrez databases from the NCBI or Entrez homepage. Returned results can be restricted to include only RefSeq records by going to the homepage of the nucleotide or protein database and either using the Entrez Limits page to select Only from RefSeq or adding one of the RefSeq-specific property restrictions directly to the entered text query. For example, a query to retrieve all RefSeq nucleotide records that include the name BRCA1 somewhere in the record is formatted as BRCA1 AND srcdb_refseq[prop]. The RefSeq Web site provides definitions of the available property restrictions (http://www.ncbi.nlm.nih.gov/RefSeq/key.html#query).
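The same property restriction can be added to a query programmatically. The sketch below only builds the query string and a request URL; it makes no network call. The E-utilities `esearch.fcgi` endpoint and its `db` and `term` parameters are not described in this article and are used here as an assumption about how such a query might be submitted, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities ESearch endpoint (not part of this article).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def refseq_only(term: str) -> str:
    """Append the RefSeq property restriction to an Entrez text query."""
    # Compound queries may need their own parentheses before this is added.
    return f"{term} AND srcdb_refseq[prop]"


def esearch_url(term: str, db: str = "nucleotide") -> str:
    """Build a URL-encoded search request for the given database."""
    return f"{ESEARCH_URL}?{urlencode({'db': db, 'term': term})}"


if __name__ == "__main__":
    query = refseq_only("BRCA1")
    print(query)
    print(esearch_url(query))
```

Passing `db="protein"` would target the protein database in the same way, mirroring the choice between the nucleotide and protein homepages described above.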
Entrez queries from the Entrez home page, where it is possible to query against all of the Entrez databases at once, will also return results from the Entrez Gene and Genomes (14) databases, which are both components of the RefSeq project. Entrez Gene integrates gene-specific annotation from RefSeq records with other sources of information, and thus provides a gene-oriented view of data about genes (7). When there is sequence for a complete genome or chromosome, the data are also included in the Entrez Genome database, which provides multiple tools to display and analyze the information.
BLAST and BLink
RefSeq records are included in the main BLAST nr databases and are also made available in genome-specific BLAST database collections (listed at http://www.ncbi.nlm.nih.gov/BLAST/). Hits to RefSeq records can be immediately identified by the distinct format of the accession numbers. BLAST nr results can be configured to show only those hits to the RefSeq collection by entering the Entrez property query on the format page (e.g. srcdb_refseq[prop]).
RefSeq records are also included in the pre-computed BLAST analysis that is done to provide Entrez links to related sequences (nucleotide or protein) and to BLink, a visualization tool for the related protein sequences dataset. The BLink interface includes an option to show only RefSeq proteins.
FTP
The complete RefSeq collection is made available for anonymous FTP as bi-monthly releases in conjunction with daily and cumulative updates between the release cycles. The RefSeq release is structured to provide access to the full RefSeq collection or to a portion of the collection organized by main taxonomic categories (e.g. plant, viral, vertebrate_mammalian) or molecules of interest (e.g. organelle, plasmid). Documentation includes an indication of files and sequences provided, sequences that have been removed since the previous release, and a full description of the release structure and content. Additional FTP data are provided for some organisms of interest, including the transcript and protein dataset for human, mouse and rat. Announcements about large changes, problems and the availability of a RefSeq release are emailed to the refseq-announce email list (see Table 2); users can subscribe to refseq-announce{at}ncbi.nlm.nih.gov to receive information about RefSeq releases and planned modifications as they occur.
Links
Many NCBI Entrez databases and resources include links to RefSeq records, including Gene, UniGene, HomoloGene, Map Viewer and UniSTS.
| Notes |
|---|
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use permissions, please contact journals.permissions{at}oupjournals.org.
| REFERENCES |
|---|
1. Schuler,G.D., Epstein,J.A., Ohkawa,H. and Kans,J.A. (1996) Entrez: molecular biology database and retrieval system. Methods Enzymol., 266, 141–162.
2. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
3. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
4. Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Wheeler,D.L. (2005) GenBank. Nucleic Acids Res., 33, D34–D38.
5. Christie,K.R., Weng,S., Balakrishnan,R., Costanzo,M.C., Dolinski,K., Dwight,S.S., Engel,S.R., Feierbach,B., Fisk,D.G., Hirschman,J.E. et al. (2004) Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res., 32, 311–314.
6. FlyBase Consortium (2003) The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res., 31, 172–175.
7. Maglott,D., Ostell,J., Pruitt,K.D. and Tatusova,T. (2005) Entrez Gene: Gene-centered information at NCBI. Nucleic Acids Res., 33, D54–D58.
8. Bult,C.J., Blake,J.A., Richardson,J.E., Kadin,J.A., Eppig,J.T., Baldarelli,R.M., Barsanti,K., Baya,M., Beal,J.S., Boddy,W.J. et al. (2004) The Mouse Genome Database (MGD): integrating biology with the genome. Nucleic Acids Res., 32, 476–481.
9. Marchler-Bauer,A., Anderson,J.B., DeWeese-Scott,C., Fedorova,N.D., Geer,L.Y., He,S., Hurwitz,D.I., Jackson,J.D., Jacobs,A.R., Lanczycki,C.J. et al. (2003) CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res., 31, 383–387.
10. Sherry,S.T., Ward,M.H., Kholodov,M., Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
11. Touriol,C., Bornes,S., Bonnal,S., Audigier,S., Prats,H., Prats,A.C. and Vagner,S. (2003) Generation of protein isoform diversity by alternative initiation of translation at non-AUG codons. Biol. Cell., 95, 169–178.
12. Copeland,P.R. (2003) Regulation of gene expression by stop codon recoding: selenocysteine. Gene, 312, 17–25.
13. Wheeler,D.L., Church,D.M., Edgar,R., Federhen,S., Helmberg,W., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Sequeira,E. et al. (2005) Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res., 33, D39–D45.
14. Tatusova,T.A., Karsch-Mizrachi,I. and Ostell,J.A. (1999) Complete genomes in WWW Entrez: data representation and analysis. Bioinformatics, 15, 536–543.
||||
![]() |
Q.-H. Zhu, A.-Y. Guo, G. Gao, Y.-F. Zhong, M. Xu, M. Huang, and J. Luo DPTF: a database of poplar transcription factors Bioinformatics, May 15, 2007; 23(10): 1307 - 1308. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Hanada, X. Zhang, J. O. Borevitz, W.-H. Li, and S.-H. Shiu A large number of novel coding small open reading frames in the intergenic regions of the Arabidopsis thaliana genome are transcribed and/or under purifying selection Genome Res., May 1, 2007; 17(5): 632 - 640. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. E. Greenstein, N. Echols, T. N. Lombana, D. S. King, and T. Alber Allosteric Activation by Dimerization of the PknD Receptor Ser/Thr Protein Kinase from Mycobacterium tuberculosis J. Biol. Chem., April 13, 2007; 282(15): 11427 - 11435. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. A. F. Galante, J. Trimarchi, C. L. Cepko, S. J. de Souza, L. Ohno-Machado, and W. P. Kuo Automatic correspondence of tags and genes (ACTG): a tool for the analysis of SAGE, MPSS and SBS data Bioinformatics, April 1, 2007; 23(7): 903 - 905. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. Keibler, M. Arumugam, and M. R. Brent The Treeterbi and Parallel Treeterbi algorithms: efficient, optimal decoding for ordinary, generalized and pair HMMs Bioinformatics, March 1, 2007; 23(5): 545 - 554. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. Kim, A. Magen, and G. Ast Different levels of alternative splicing among eukaryotes Nucleic Acids Res., January 12, 2007; 35(1): 125 - 131. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. L. Wheeler, T. Barrett, D. A. Benson, S. H. Bryant, K. Canese, V. Chetvernin, D. M. Church, M. DiCuccio, R. Edgar, S. Federhen, et al. Database resources of the National Center for Biotechnology Information Nucleic Acids Res., January 12, 2007; 35(suppl_1): D5 - D12. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. Lee, T. Kim, S.-K. Kim, K. H. Lee, and D. Lee Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications Nucleic Acids Res., January 12, 2007; 35(suppl_1): D47 - D50. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Kim, A. V. Alekseyenko, M. Roy, and C. Lee The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species Nucleic Acids Res., January 12, 2007; 35(suppl_1): D93 - D98. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Igarashi, A. Eroshkin, S. Gramatikova, K. Gramatikoff, Y. Zhang, J. W. Smith, A. L. Osterman, and A. Godzik CutDB: a proteolytic event database Nucleic Acids Res., January 12, 2007; 35(suppl_1): D546 - D549. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. E. Kazakov, M. J. Cipriano, P. S. Novichkov, S. Minovitsky, D. V. Vinogradov, A. Arkin, A. A. Mironov, M. S. Gelfand, and I. Dubchak RegTransBase--a database of regulatory sequences and interactions in a wide range of prokaryotic genomes Nucleic Acids Res., January 12, 2007; 35(suppl_1): D407 - D412. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. G. Jegga, S. Gowrisankar, J. Chen, and B. J. Aronow PolyDoms: a whole genome database for the identification of non-synonymous coding SNPs with the potential to impact disease Nucleic Acids Res., January 12, 2007; 35(suppl_1): D700 - D706. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. E. Snyder, N. Kampanya, J. Lu, E. K. Nordberg, H. R. Karur, M. Shukla, J. Soneja, Y. Tian, T. Xue, H. Yoo, et al. PATRIC: The VBI PathoSystems Resource Integration Center Nucleic Acids Res., January 12, 2007; 35(suppl_1): D401 - D406. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. E. Ulrich and I. B. Zhulin MiST: a microbial signal transduction database Nucleic Acids Res., January 12, 2007; 35(suppl_1): D386 - D390. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. K. McNeil, C. Reich, R. K. Aziz, D. Bartels, M. Cohoon, T. Disz, R. A. Edwards, S. Gerdes, K. Hwang, M. Kubal, et al. The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation Nucleic Acids Res., January 12, 2007; 35(suppl_1): D347 - D353. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Higasa, K. Miyatake, Y. Kukita, T. Tahira, and K. Hayashi D-HaploDB: a database of definitive haplotypes determined by genotyping complete hydatidiform mole samples Nucleic Acids Res., January 12, 2007; 35(suppl_1): D685 - D689. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. A. Eyre, M. W. Wright, M. J. Lush, and E. A. Bruford HCOP: a searchable database of human orthology predictions Brief Bioinform, January 1, 2007; 8(1): 2 - 5. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. L. Andersen, C. Wiuf, M. Kruhoffer, M. Korsgaard, S. Laurberg, and T. F. Orntoft Frequent occurrence of uniparental disomy in colorectal cancer Carcinogenesis, January 1, 2007; 28(1): 38 - 48. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. Tembe, N. Zavaljevski, E. Bode, C. Chase, J. Geyer, L. Wasieloski, G. Benson, and J. Reifman Oligonucleotide fingerprint identification for microarray-based pathogen diagnostic assays Bioinformatics, January 1, 2007; 23(1): 5 - 13. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. J. Martinez, A. D. Smith, B. Li, M. Q. Zhang, and K. S. Harrod Computational prediction of novel components of lung transcriptional networks Bioinformatics, January 1, 2007; 23(1): 21 - 29. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. Loots and I. Ovcharenko ECRbase: database of evolutionary conserved regions, promoters, and transcription factor binding sites in vertebrate genomes Bioinformatics, January 1, 2007; 23(1): 122 - 124. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Cao, J.-L. Li, D. Li, J. F. Tobin, and R. E. Gimeno Molecular identification of microsomal acyl-CoA:glycerol-3-phosphate acyltransferase, a key enzyme in de novo triacylglycerol synthesis PNAS, December 26, 2006; 103(52): 19695 - 19700. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. Puerta-Fernandez, J. E. Barrick, A. Roth, and R. R. Breaker Identification of a large noncoding RNA in extremophilic eubacteria PNAS, December 19, 2006; 103(51): 19490 - 19495. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Du, J. S. Rozowsky, J. O. Korbel, Z. D. Zhang, T. E. Royce, M. H. Schultz, M. Snyder, and M. Gerstein A supervised hidden markov model framework for efficiently segmenting tiling array data in transcriptional and chIP-chip experiments: systematically incorporating validated biological knowledge Bioinformatics, December 15, 2006; 22(24): 3016 - 3024. [Abstract] [Full Text] [PDF] |
||||