Nucleic Acids Research, 2000, Vol. 28, No. 1 94-96
© 2000 Oxford University Press
Gene discovery using the maize genome database ZmDB
Department of Zoology and Genetics, Iowa State University, Ames, IA 50011-3260, USA and 1Department of Biological Sciences, Stanford University, Stanford, CA 94305-5020, USA
Received October 5, 1999; Accepted October 7, 1999.
| ABSTRACT |
|---|
|
|
|---|
Zea mays DataBase (ZmDB) is a repository and analysis tool for sequence, expression and phenotype data of the major crop plant maize. The data accessible in ZmDB are mostly generated in a large collaborative project of maize gene discovery, sequencing and phenotypic analysis using a transposon tagging strategy and expressed sequence tag (EST) sequencing. ESTs constitute most of the current content. Database search tools, convenient links to external databases, and novel sequence analysis programs for spliced alignment are provided and together serve as an efficient protocol for gene discovery by sequence inspection. ZmDB can be accessed at http://zmdb. iastate.edu . ZmDB also provides web-based ordering of materials generated in the project, including EST and genomic DNA clones, seeds of mutant plants and microarrays of amplified EST and genomic DNA sequences.
| INTRODUCTION |
|---|
|
|
|---|
Maize is the primary model plant for addressing fundamental biological questions in monocotyledonous plants. This taxon, which includes all the cereal crops, currently provides >70% of the caloric value of the human diet worldwide (1). Maize is thought to be a segmental allotetraploid reflecting the hybridization of two distinct diploid progenitors ~1120 million years ago (2); today the genome contains an estimated 50 00080 000 genes in ~2.3 x 109 base pairs present on 10 chromosomes (3). As in other plants, maize genes are compact; the distance between genes is large as a result of retrotranspson insertions (4). These features make a direct sequencing strategy for gene discovery currently impractical.
EST sequencing has emerged as the most effective way of identifying genes with moderately or highly abundant transcripts (5). Transposon tagging is a complementary technique that requires more effort but provides more information by efficiently combining gene discovery (identification and cloning) and functional genomics (analysis of the phenotypic consequences of altered gene expression) (6). Our project utilizes Mu transposons, which insert preferentially into low copy gene-like sequences (7). Transgenic maize carrying a Mu element, RescueMu, that was engineered to contain a cloning vector, experience new insertion mutations. The RescueMu plasmid is cloned directly into Escherichia coli creating an immortalized library of insertions. Sequencing from the ends of the transposon yield maize genomic sequences highly enriched for genes.
ZmDB was created to display and analyze data generated in our project of maize gene discovery, and the site also includes details of our techniques and strategies. Most sequence data in ZmDB will consist of ~1.2 kb segments of maize genomic sequence flanking independent RescueMu transposon insertion sites. The plan is to sequence 150 000 such insertion locations, yielding 23-fold coverage of the haploid gene equivalent. To aid with accurate gene identification on the genomic clones, a collection of 50 000 ESTs is now being developed. One-fifth of these will be sequenced from both the 5' and 3' ends. Of the 4000 pairs of forward and reverse sequences available now, approximately half overlap to form continuous EST sequences, ranging from an average of ~750 to 950 bp depending on the source library. The up-to-date EST collection is assembled into tentative unique genes (TUGs) by contig and consensus building. The TUGs are annotated as tentative unique clusters (TUCs) or tentative unique singlets (TUSs). Necessarily, this annotation is temporary, as more sequence data may turn TUSs into TUCs and provide links between previously separate TUCs. The exon/intron structure of a genomic DNA segment can be readily delineated by spliced alignment to a cognate or homologous EST, if such exists, as described in the next section.
Annotation of the TUGs is initially automated. BLAST (8) searches are performed for all TUGs against public databases. The top three highly significant similarities are used as provisional tags for the corresponding TUG. The descriptions and keywords of those entries are carried over to the TUG entry. In that way, a text search for a keyword (for example, glutathione transferase, alternative splicing or nuclear location) will display all TUGs with similarities to entries described in some way by that keyword. Although crude, this automatic annotation procedure proves highly useful in providing the database user with easy starting points for a refined analysis, using ZmDB or their own tools. An ongoing attempt is made to incorporate refined annotation by human experts into the database.
| NOVEL TOOLS FOR SPLICED ALIGNMENT |
|---|
|
|
|---|
Alignment of an EST sequence with its genomic DNA origin is straightforward in the absence of sequence errors or polymorphisms: the introns, spliced out in the EST, will simply show up as long gaps in the alignment. The alignment task becomes more challenging when the matching is less than perfect because of sequencing errors, sequence variation (allelic variants, duplicate genes or gene families) or matching of non-cognate ESTs derived from a homologous locus. Alternative splicing outcomes represented in multiple ESTs can also impede exon identification. ZmDB provides the GeneSeqer program (9), which implements a dynamic programming algorithm to efficiently derive an optimal scoring spliced alignment. GeneSeqer simultaneously assesses the significance of the sequence alignment and the intrinsic quality of the implied splice sites. Figure 1 displays an example. In this case, two EST sequences were aligned and are shown to differ by a putative intron of 93 bases that is retained in one of the sequences.
|
| GENE DISCOVERY AT ZmDB |
|---|
|
|
|---|
ZmDB is designed to serve a dual function as both data repository and analysis workbench. Our protocol for potential gene discovery allows many entry points, depending on the data items available as starting points. For example, the relatedness of the ESTs in Figure 1 may have been discovered by either a BLAST (8) search of one of the sequences against ZmDB, or it may have been obtained indirectly from their mutual similarity to a different nucleotide or protein query. The putative retained intron suggests that these ESTs might derive from an alternatively spliced gene. A BLASTX (8) search of GenBank suggests that these ESTs encode a serine/threonine protein kinase. Using a novel function of the SplicePredictor program (10), the putative protein homologs may be used directly for a spliced alignment to the EST (Fig. 2). The strong alignment at the protein level, the good splice site scores by the SplicePredictor algorithm, and two in-frame stop codons suggest that the intron would normally be spliced out. Retention of the intron would lead to a truncated translation product, that could be functional, or the retained intron could function as a post-transcriptional control point to reduce synthesis of the full-length protein.
|
| FUTURE DIRECTIONS |
|---|
|
|
|---|
Data to be added to ZmDB include RescueMu-derived maize genomic sequences, phenotypic data from maize plants expressing RescueMu-induced mutations, mapping data and microarray data to be generated by this project. The ResueMu-derived sequences will be annotated and (in conjunction with the EST sequences) assembled to an approximate unique set of maize genes. In doing so, other data such as possible alternative splicing of some genes will also be identified and added to the database. Any links between the phenotypic data, mapping data, microarray data and genomic and EST sequences will be identified and made accessible in the database. Users will have several new starting points for correlating gene discovery with possible functions, such as particular phenotypes or expression data. In the meantime we will continue developing computational tools to make the ZmDB site a convenient workbench for plant biologists.
| AVAILABILITY |
|---|
|
|
|---|
ZmDB is accessible at the URL http://zmdb.iastate.edu . Data files and source code for some of the algorithms used at ZmDB can be downloaded by anonymous ftp to ftp.zmdb.iastate.edu The manager of the database can be contacted by Email at zmdb@iastate.edu
| ACKNOWLEDGEMENTS |
|---|
The text of our NSF proposal is available for viewing at the ZmDB web site. Team members include the authors; Vicki Chandler, David Galbraith and Brian Larkins at the University of Arizona, Tucson; Michael Freeling and Sarah Hake at the University of California, Berkeley; Robert Schmidt and Laurie Smith at the University of California, San Diego; Marty Sachs at the University of Illinois, Urbana; and their collaborators at those locations. ZmDB is supported by the USA National Science Foundation grant NSF#9872657 (V.W. principal investigator).
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +1 515 294 9884; Fax: +1 515 294 6755; Email: vbrendel@iastate.edu
| REFERENCES |
|---|
|
|
|---|
-
1 Chrispeels,M.J. and Sadava,D.E. (1977) Plants, Food, and People. W.H.Freeman, San Francisco, CA.
2 Gaut,B.S. and Doebley,J.F. (1997) Proc. Natl Acad. Sci. USA, 94, 68096814.
3 Gale,M.D. and Devos,K.M. (1998) Proc. Natl Acad. Sci. USA, 95, 19711974.
4 SanMiguel,P., Tikhonov,A., Jin,Y.-K., Motchoulskaia,N., Zakharov,D., Melake-Berhan,A., Springer,P.S., Edwards,K.J., Lee,M., Avramova,Z. and Bennetzen,J.L. (1996) Science, 274, 765768.
5 Adams,M.D., Kerlavage,A.R., Fleischmann,R.D., Fuldner,R.A., Bult,C.J., Lee,N.H., Kirkness,E.F., Weinstock,K.G., Gocayne,J.D., White,O. et al. (1995) Nature, 377, 3174.[Medline]
6 Walbot,V. (1992) Annu. Rev. Plant Phys. Plant Mol. Biol., 43, 4982.[Web of Science]
7 Cresse,A.D., Hulbert,S.H., Brown,W.E., Lucas,J.R. and Bennetzen,J.L. (1995) Genetics, 140, 315324.[Abstract]
8 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 33893402.
9 Usuka,J., Zhu,W. and Brendel,V. (1999) Bioinformatics, in press.
10 Brendel,V. and Kleffe,J. (1998) Nucleic Acids Res., 26, 47484757.
This article has been cited by other articles:
![]() |
A. Antofie, M. Lateur, R. Oger, A. Patocchi, C. E. Durel, and W. E. Van de Weg A new versatile database created for geneticists and breeders to link molecular and phenotypic data in perennial crops: the AppleBreed DataBase Bioinformatics, April 1, 2007; 23(7): 882 - 891. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Schnee, T. G. Kollner, M. Held, T. C. J. Turlings, J. Gershenzon, and J. Degenhardt The products of a single maize sesquiterpene synthase form a volatile defense signal that attracts natural enemies of maize herbivores PNAS, January 24, 2006; 103(4): 1129 - 1134. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Luo, P. Dang, B. Z. Guo, G. He, C. C. Holbrook, M. G. Bausher, and R. D. Lee Generation of Expressed Sequence Tags (ESTs) for Gene Discovery and Marker Development in Cultivated Peanut Crop Sci., January 1, 2005; 45(1): 346 - 353. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Lai, N. Dey, C.-S. Kim, A. K. Bharti, S. Rudd, K. F.X. Mayer, B. A. Larkins, P. Becraft, and J. Messing Characterization of the Maize Endosperm Transcriptome and Its Comparison to the Rice Genome Genome Res., October 1, 2004; 14(10a): 1932 - 1937. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. G. Kollner, C. Schnee, J. Gershenzon, and J. Degenhardt The Variability of Sesquiterpenes Emitted from Two Zea mays Cultivars Is Controlled by Allelic Variation of Two Terpene Synthase Genes Encoding Stereoselective Multiple Product Enzymes PLANT CELL, May 1, 2004; 16(5): 1115 - 1131. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. B. Kermekchiev, A. Tzekov, and W. M. Barnes Cold-sensitive mutants of Taq DNA polymerase provide a hot start for PCR Nucleic Acids Res., November 1, 2003; 31(21): 6139 - 6147. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Batley, G. Barker, H. O'Sullivan, K. J. Edwards, and D. Edwards Mining for Single Nucleotide Polymorphisms and Insertions/Deletions in Maize Expressed Sequence Tag Data Plant Physiology, May 1, 2003; 132(1): 84 - 91. [Abstract] [Full Text] [PDF] |
||||
![]() |
Q. Dong, L. Roy, M. Freeling, V. Walbot, and V. Brendel ZmDB, an integrated database for maize genome research Nucleic Acids Res., January 1, 2003; 31(1): 244 - 247. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Fu, R. Meeley, and M. J. Scanlon empty pericarp2 Encodes a Negative Regulator of the Heat Shock Response and Is Required for Maize Embryogenesis PLANT CELL, December 1, 2002; 14(12): 3119 - 3132. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Piquemal, S. Chamayou, I. Nadaud, M. Beckert, Y. Barriere, I. Mila, C. Lapierre, J. Rigau, P. Puigdomenech, A. Jauneau, et al. Down-Regulation of Caffeic Acid O-Methyltransferase in Maize Revisited Using a Transgenic Approach Plant Physiology, December 1, 2002; 130(4): 1675 - 1685. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Schnee, T. G. Kollner, J. Gershenzon, and J. Degenhardt The Maize Gene terpene synthase 1 Encodes a Sesquiterpene Synthase Catalyzing the Formation of (E)-beta -Farnesene, (E)-Nerolidol, and (E,E)-Farnesol after Herbivore Damage Plant Physiology, December 1, 2002; 130(4): 2049 - 2060. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. B. Bruce, G. O. Edmeades, and T. C. Barker Molecular and physiological approaches to maize improvement for drought tolerance J. Exp. Bot., January 1, 2002; 53(366): 13 - 25. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Sheen Signal Transduction in Maize and Arabidopsis Mesophyll Protoplasts Plant Physiology, December 1, 2001; 127(4): 1466 - 1475. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. J. Okagaki, R. G. Kynast, S. M. Livingston, C. D. Russell, H. W. Rines, and R. L. Phillips Mapping Maize Sequences to Chromosomes Using Oat-Maize Chromosome Addition Materials Plant Physiology, March 1, 2001; 125(3): 1228 - 1235. [Abstract] [Full Text] |
||||
![]() |
L. V. Sidorenko and T. Peterson Transgene-Induced Silencing Identifies Sequences Involved in the Establishment of Paramutation of the Maize p1 Gene PLANT CELL, February 1, 2001; 13(2): 319 - 335. [Abstract] [Full Text] |
||||
![]() |
J. Fernandes, V. Brendel, X. Gai, S. Lal, V. L. Chandler, R. P. Elumalai, D. W. Galbraith, E. A. Pierson, and V. Walbot Comparison of RNA Expression Profiles Based on Maize Expressed Sequence Tag Frequency Analysis and Micro-Array Hybridization Plant Physiology, March 1, 2002; 128(3): 896 - 910. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

, and the intron is indicated by dots.







