Nucleic Acids Research, 2000, Vol. 28, No. 1 126-128
© 2000 Oxford University Press
NCBI's LocusLink and RefSeq
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
Received September 2, 1999; Revised and Accepted October 4, 1999.
| ABSTRACT |
|---|
The NCBI has introduced two new web resources, LocusLink and RefSeq, that facilitate retrieval of gene-based information and provide reference sequence standards. These resources are designed to provide a non-redundant view of current knowledge about human genes, transcripts and proteins. Additional information about these resources is available on the LocusLink web site at http://www.ncbi.nlm.nih.gov/LocusLink/
| BACKGROUND |
|---|
The LocusLink and RefSeq databases were initiated to address data-access problems resulting from significant increases in both sequence data and the number of web sites presenting information about genes. For example, it is increasingly difficult to identify unambiguously which sequence, of the many publicly available, is an appropriate, complete representative of a given mRNA or protein. Conversely, given an mRNA or protein sequence, it can be a challenge to determine the official name or symbol for the gene from which the sequence was derived. And once a gene symbol or name is known, identifying other web resources that include information about that gene may be very time-consuming. In its role as a web directory, LocusLink provides a single point of access to a variety of gene-specific information sources, including web resources and RefSeq. RefSeq provides a non-redundant data set of reference sequences representing transcripts and proteins of known genes. RefSeq records include links to LocusLink, making it easier to connect sequence data, gene names and related biological information. Together, the LocusLink and RefSeq resources establish reference sequences and stable database identifiers (LocusID) that can be used in variation, mutation and expression analyses.
| SCOPE |
|---|
LocusLink
LocusLink offers a simple query interface to retrieve information about human genes and some non-gene loci. It supports text-based queries using official nomenclature provided through collaboration with the Human Gene Nomenclature Committee (HGNC; http://www.gene.ucl.ac.uk/nomenclature/ ) (1), as well as cytogenetic locations, aliases and historical names for both a gene and its products. LocusLink provides direct connections to related information available from several resources at NCBI (Table 1) as well as to external web sites including the Genome Database (GDB; http://gdbwww.gdb.org/ ), the Human Gene Mutation Database (HGMD; http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html ) (2), GeneCard (http://bioinfo.weizmann.ac.il/cards/ ), GeneClinics (http://www.geneclinics.org/ ), and locus- or gene family-specific web sites. Some of the links to NCBI resources listed in Table 1 are represented by icons that, when displayed, give an immediate indication that additional information is available. The goal of the PubMed and GenBank/GenPept (3) links is not to be comprehensive, but to establish sufficient connections to facilitate information retrieval via NCBI's ENTREZ (4) 'related sequences' or 'related publications' links, or through BLAST (5). LocusLink also provides a unique stable identifier for each locus (LocusID).
RefSeq
Although the goal of RefSeq in general is to provide reference sequences representing chromosomes, transcripts and proteins, discussion here is restricted to the subset of human mRNAs and proteins. A RefSeq record is made for an mRNA if the function of the gene product has been studied, and if the sequence of the complete coding region is known. Separate RefSeq records are made for experimentally supported alternate transcripts and their products. The sequence presented in a RefSeq record is usually derived from available GenBank records, although additional information is at times added from the literature or from communications with the research community. RefSeq records are provided in one of two states, either provisional or reviewed. Records initially released as provisional include much of the annotation from the GenBank record used as the source, but incorporate gene and protein names, PubMed links, summary text, and map and chromosome data from LocusLink when available (Table 2). Provisional records are subjected to a manual curation and review process, with the reviewed record being the end product. The reviewed record might differ from the original provisional record by including: (i) more extensive 5' and 3' untranslated regions derived from other GenBank records or the literature, (ii) additional mRNA and/or protein features, (iii) more publications and (iv) a summary text describing the gene. Table 2 lists additional annotation that may be added to provisional and reviewed RefSeq records. RefSeq records can be distinguished from GenBank records by the inclusion of a REFSEQ statement in a COMMENT field, and by the unique format of the accession number. The first three characters of the RefSeq mRNA and protein accession numbers are NM_ and NP_, respectively, followed by six numerals (e.g. NM_000280, NP_000337).
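The accession format described above lends itself to a simple programmatic check. The following Python sketch is illustrative only: the function name `classify_accession` is hypothetical, and the pattern encodes nothing beyond the rule stated in the text (the prefix NM_ or NP_ followed by six numerals).

```python
import re

# Pattern based solely on the format described in the text:
# "NM_" (mRNA) or "NP_" (protein) followed by exactly six digits.
REFSEQ_ACCESSION = re.compile(r"^(NM|NP)_\d{6}$")


def classify_accession(accession):
    """Return 'mRNA', 'protein', or None when the string does not match.

    Hypothetical helper for illustration; it is not part of any NCBI tool.
    """
    match = REFSEQ_ACCESSION.match(accession.strip())
    if match is None:
        return None
    return "mRNA" if match.group(1) == "NM" else "protein"


if __name__ == "__main__":
    # The first two identifiers are the examples given in the text;
    # the third is a made-up, GenBank-style accession used as a negative case.
    for identifier in ("NM_000280", "NP_000337", "U12345"):
        print(identifier, classify_accession(identifier))
```

A check of this kind only tests the shape of the identifier; it does not confirm that a record with that accession exists.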
| ACCESS |
|---|
RefSeq records can be retrieved by text word queries (gene or protein names or symbols, accession numbers, etc.) or by sequence homology. LocusLink (see Table 3 for URLs) and ENTREZ both support accessing RefSeq records by text. BLAST-based sequence queries must be done against the nucleotide or protein nr databases. The RefSeq records in a BLAST query result can be readily identified by the 'ref' prefix and the distinct accession number format described above. More query details and examples are provided in the LocusLink and RefSeq help and FAQ pages available from the LocusLink home page.
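As a rough illustration of picking RefSeq hits out of BLAST results by their prefix, the Python sketch below assumes that hit identifiers are pipe-delimited, with a database tag followed by an accession (for example `gi|...|ref|NM_000280.1|`). That layout, the function name `refseq_accessions`, and the gi numbers in the usage example are assumptions made for illustration and are not taken from this article.

```python
def refseq_accessions(defline):
    """Return the accessions that follow a 'ref' tag in one hit identifier.

    Assumes a pipe-delimited identifier such as 'gi|1234567|ref|NM_000280.1|',
    optionally followed by free-text description after whitespace.
    """
    tokens = defline.lstrip(">").split()
    if not tokens:
        return []
    fields = tokens[0].split("|")
    # Each 'ref' tag is expected to be followed directly by the accession.
    return [
        fields[i + 1]
        for i in range(len(fields) - 1)
        if fields[i] == "ref" and fields[i + 1]
    ]


if __name__ == "__main__":
    # Placeholder hit lines; the gi numbers are invented for this example.
    hits = [
        "gi|1234567|ref|NM_000280.1| example RefSeq mRNA hit",
        "gi|7654321|gb|U12345.1| example GenBank hit",
    ]
    for hit in hits:
        print(hit.split()[0], "->", refseq_accessions(hit))
```

Filtering on the tag rather than on the accession pattern alone keeps the check tied to how the database source is labeled in the result line.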
LocusLink and RefSeq records are also freely available on the NCBI FTP site (see Table 3). Note that RefSeq records are not in GenBank and must be downloaded separately.
| SEARCHING |
|---|
Comprehensive descriptions of query strategies and of navigation from LocusLink and RefSeq are available from the LocusLink home page. Note that multiple sites within NCBI include links to LocusLink and RefSeq by specific identifiers. These include Online Mendelian Inheritance in Man (OMIM; http://www.ncbi.nlm.nih.gov/omim/ ), UniGene (http://www.ncbi.nlm.nih.gov/UniGene/ ), GeneMap99 (http://www.ncbi.nlm.nih.gov/genemap/ ) and dbSNP (http://www.ncbi.nlm.nih.gov/SNP/ ) (6).
| MAINTENANCE |
|---|
LocusLink and RefSeq records are created and maintained by an ongoing process as described by Pruitt et al. (7) and on the LocusLink web site. The LocusLink web pages are currently refreshed weekly. RefSeq records may be modified at any time, either because of text changes (such as nomenclature) or because a provisional record is replaced by a reviewed one. In the latter case the accession number is maintained, but the version number and sequence ID numbers change if the sequence data has changed.
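The update rule above (same accession number, new version number when the sequence changes) can be sketched in Python as follows. The sketch assumes identifiers are written as `accession.version`, for example `NM_000280.2`; that notation and the helper names `split_accession` and `sequence_changed` are assumptions for illustration and are not specified in this article.

```python
def split_accession(identifier):
    """Split 'NM_000280.2' into ('NM_000280', 2).

    The version is None when no '.version' suffix is present.
    """
    accession, separator, version = identifier.partition(".")
    return accession, (int(version) if separator else None)


def sequence_changed(old_identifier, new_identifier):
    """Report whether two identifiers for one record differ in version.

    Raises ValueError when the accessions differ, because the comparison
    is only meaningful for two states of the same record.
    """
    old_accession, old_version = split_accession(old_identifier)
    new_accession, new_version = split_accession(new_identifier)
    if old_accession != new_accession:
        raise ValueError("identifiers refer to different records")
    return old_version != new_version


if __name__ == "__main__":
    # Hypothetical before/after identifiers for one record.
    print(sequence_changed("NM_000280.1", "NM_000280.2"))
    print(sequence_changed("NM_000280.1", "NM_000280.1"))
```

Under these assumptions, an analysis pipeline can keep the accession as its stable key and use the version only to detect that cached sequence data need refreshing.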
| CONTACT |
|---|
Questions, comments and suggestions can be emailed to info@ncbi.nlm.nih.gov . We welcome collaborations with and contributions from the research community.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +1 301 435 5898; Fax: +1 301 480 9241; Email: pruitt@ncbi.nlm.nih.gov
| REFERENCES |
|---|
1 White,J.A., McAlpine,P.J., Antonarakis,S., Cann,H., Eppig,J.T., Frazer,K., Frezal,J., Lancet,D., Nahmias,J., Pearson,P., Peters,J., Scott,A., Scott,H., Spurr,N., Talbot,C.,Jr and Povey,S. (1997) Genomics, 45, 468-471.
2 Cooper,D.N., Ball,E.V. and Krawczak,M. (1998) Nucleic Acids Res., 26, 285-287.
3 Benson,D.A. (1999) Nucleic Acids Res., 27, 12-17. Updated article in this issue: Nucleic Acids Res. (2000), 28, 15-18.
4 Schuler,G.D., Epstein,J.A., Ohkawa,H. and Kans,J.A. (1996) Methods Enzymol., 266, 141-162.
5 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402.
6 Sherry,S.T. (2000) Nucleic Acids Res., 28, 352-355.
7 Pruitt,K.D., Katz,K.S., Sicotte,H. and Maglott,D.R. (2000) Trends Genet., in press.