Nucleic Acids Research, 2002, Vol. 30, No. 1 1-12
© 2002 Oxford University Press
The Molecular Biology Database Collection: 2002 update
Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Building 50, Room 5222, Bethesda, MD 20892-8002, USA
Received October 9, 2001; Accepted November 20, 2001.
| ABSTRACT |
|---|
The Molecular Biology Database Collection is an online resource listing key databases of value to the biological community. This Collection is intended to bring high-quality databases available throughout the world to fellow scientists' attention, rather than simply to provide a lengthy listing of all available databases. As such, this up-to-date listing is intended to serve as the initial point from which to find specialized databases that may be of use in biological research. The databases included in this Collection provide new value to the underlying data by virtue of curation, new data connections or other innovative approaches. Short, searchable summaries and updates for each of the databases included in the Collection are available through the Nucleic Acids Research Web site at http://nar.oupjournals.org.
One of the most significant scientific events in the year 2001 was the publication of the initial sequence and analysis of the human genome resulting from both public (1) and private sector (2) efforts. With these publications, we have entered a new era for modern biology, one in which the majority of biological and biomedical research will use sequence data as its basic underpinning. Such a rich source of information will prove invaluable to basic researchers whose findings will, in time, lead to improved strategies for the diagnosis, treatment and prevention of diseases having a genetic basis. In short, the stage has been set for genetic medicine to play a prominent role in the delivery of healthcare in the future (3).
A number of significant insights have already been gained into the secrets hidden within the 3 billion bases that make up the human genome (1). There is marked variation in the distribution of features such as genes, transposable elements, GC content, CpG islands and recombination rate; this uneven distribution may provide important clues about the functions of these features and how they may be involved in regulation. Alu elements are preferentially retained in GC-rich regions, correlating them (in a loose sense) with actively transcribed genes. These elements may actually turn out not to be merely junk DNA, instead providing a tangible benefit to their human hosts. More generally, repetitive elements may not have a direct function per se, but may influence chromosome structure. Probably the most telling finding is that the total number of genes in the human genome is only on the order of 30 000 to 35 000. Previously, numbers in the 80 000 range (and as high as 140 000) had been put forward. Although the new estimate gives humans only about twice the number of genes seen in Caenorhabditis elegans or Drosophila, the genes themselves have a more complex structure. This sharp downward revision in the number of genes immediately calls into question the one gene–one protein hypothesis: we are now finding more and more examples of alternative splicing generating a larger number of protein products (consistent with a more complex gene structure), as well as cases where identical proteins serve different functions depending on their compartmentalization (4).
While the near-completion of human genome sequencing marks a significant milestone, many other sequence-based efforts currently under way will have just as much impact on the scientific and medical community. The most eagerly anticipated model organism map is that of the mouse. The most recent physical map released on the Ensembl Web site (http://mouse.ensembl.org, September 2001) provides an estimated 95% coverage of the mouse genome, with 15 694 genes confirmed over 361 Mb. With respect to human health, single nucleotide polymorphisms (SNPs) continue to be identified at a breakneck pace. Over 1 million SNPs have already been identified, and a random sample chosen for validation shows that 95% of these are indeed both polymorphic and unique (http://snp.cshl.org/data/). SNP alleles can be used as genetic markers, and often the SNP itself is the variant that causes or contributes to the risk of developing a particular genetic disorder. To increase the power of SNPs as markers for human disease, efforts are under way to develop a haplotype map, in which blocks of SNPs (rather than individual SNPs) could be used to find chromosomal regions associated with disease.
The sequence data generated by these and other systematic sequencing projects can be browsed and downloaded from a variety of Web sites, the major portals being located at NCBI (http://www.ncbi.nlm.nih.gov), Ensembl (http://www.ensembl.org) and UCSC (http://genome.cse.ucsc.edu). The problem that many investigators encounter, however, is that these larger databases often do not contain specialized information of interest to specific groups within the scientific community. Many databases have emerged to fill this void, and they often provide not just sequence-based information but also data such as phenotypes, experimental conditions, strain crosses and map features: data that might not fit neatly onto a large physical map of a genome. Most importantly, data in these smaller databases tend to be curated by experts in a particular speciality and are often experimentally verified, meaning that they represent the best state of knowledge in that particular area. The savvy user will, therefore, make use of both types of databases in experimental planning and design. For the last several years, this journal has devoted its first issue of each year to documenting the availability and features of these specialized databases, in order to better serve its readership and to promote the use of these resources in the design and analysis of experiments. These reviewed databases are collectively listed in the Molecular Biology Database Collection.
The databases included in the current version of the Collection are shown in Table 1. This year, the total number of databases listed is 335, up from 281 the year before. Several new databases have been added to the Collection, while others that are no longer actively curated or no longer available have been removed. These databases all distinguish themselves by their approach to presenting the underlying data, for example by adding value through curation, by providing new types of data connections or by implementing other innovative approaches that facilitate biological discovery. The individual entries are classified by type, but the reader should recognize that the distinctions between these classes are often arbitrary and that many of these databases provide more than one type of information.
In addition to the list presented in this paper, an electronic version of the Database Issue and Collection can be accessed online and is freely available to everyone, regardless of subscription status, at http://nar.oupjournals.org. While the list contains the databases described in the papers comprising the current issue, it should be immediately apparent to the reader that there are simply not enough pages in this journal to accommodate full-length, printed descriptions of all 335 databases featured here. To address this, the online version of the Collection now includes short summaries of many of the databases, provided directly by the investigators responsible for the individual databases. We have also asked contributors to point out new features of their databases in the Recent Developments section of their entry. It is hoped that this approach will provide the reader with an additional source of information that will facilitate finding and selecting the sources of data that would be of most value in addressing a specific biological problem. Contributors will be encouraged to keep their entries up-to-date.
Suggestions for the inclusion of additional database resources in this collection are encouraged and may be directed to the author (andy{at}nhgri.nih.gov).
| ACKNOWLEDGEMENT |
|---|
I wish to thank Yi-Chi Barash for designing the new Web-based submission tool for this Collection, as well as for her technical support.
| FOOTNOTES |
|---|
* Tel: +1 301 496 8570; Fax: +1 301 402 6858; Email: andy{at}nhgri.nih.gov
| REFERENCES |
|---|
1 International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
2 Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J., Sutton,G.G., Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.
3 Collins,F.S. and McKusick,V.A. (2001) Implications of the Human Genome Project for medical science. J. Am. Med. Assoc., 285, 540–544.
4 Jeffery,C.J. (1999) Moonlighting proteins. Trends Biochem. Sci., 24, 8–11.