

Nucleic Acids Research Advance Access originally published online on September 10, 2008
Nucleic Acids Research 2009 37(Database issue):D567-D570; doi:10.1093/nar/gkn583

Nucleic Acids Research, 2009, Vol. 37, Database issue D567-D570
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article appears in the following Nucleic Acids Research issue: Database issue


FLYSNPdb: a high-density SNP database of Drosophila melanogaster

Doris Chen1,2,*, Jürg Berger1, Michaela Fellner1,2 and Takashi Suzuki1

1Research Institute of Molecular Pathology (IMP), Dr Bohr-Gasse 7 and 2Institute of Molecular Biotechnology (IMBA), Dr Bohr-Gasse 3, A-1030 Vienna, Austria

*To whom correspondence should be addressed. Tel: +43 1 79044 4513; Email: doris.chen{at}univie.ac.at

Received August 15, 2008. Accepted August 28, 2008.


    ABSTRACT
 
FLYSNPdb provides high-resolution single nucleotide polymorphism (SNP) data for Drosophila melanogaster. The database currently contains 27 367 polymorphisms, including >3700 indels (insertions/deletions), covering all major chromosomes. These SNPs are clustered into 2238 markers, which are evenly distributed with an average density of one marker every 50.3 kb or 6.6 genes. SNPs were identified automatically, filtered for high quality and partly manually curated. The database provides detailed information on the SNP data, including molecular and cytological locations (genome Releases 3–5), alleles of up to five commonly used laboratory stocks, flanking sequences, SNP marker amplification primers, quality scores and genotyping assays. Data specific to a certain region, particular stocks or a certain genome assembly version are easily retrievable through the interface of a publicly accessible website (http://flysnp.imp.ac.at/flysnpdb.php).


    INTRODUCTION
 
Drosophila melanogaster is one of the best-studied model organisms owing to its short generation time and ease of genetic manipulation. Hence, it continues to provide major insights into biological processes that are conserved in multicellular organisms. Single nucleotide polymorphisms (SNPs) are widely used as genetic markers in mapping experiments, quantitative trait loci (QTL) analyses, and population genetic or evolutionary studies, since they are frequent, mostly phenotypically neutral and molecularly defined. FLYSNPdb contains data of a polymorphism map with an unprecedented resolution of ~50 kb between SNP markers, a density significantly higher than that of previous Drosophila SNP maps (1–4). Polymorphisms >1 nt were also counted, including indels, which are particularly useful for genotyping assays based on PCR-product length polymorphisms [PLP; (2)] or denaturing high performance liquid chromatography [DHPLC; (5)], as well as for evolutionary analyses (6,7). The map comprises SNPs from five different D. melanogaster stocks (Supplementary Table 1). Since polymorphisms in Drosophila are generally bi-allelic and randomly distributed among the utilized strains, we anticipate that most of our SNP markers can be used to discriminate almost any other pair of Drosophila stocks. FLYSNPdb is part of the FLYSNP website (http://flysnp.imp.ac.at/), which provides detailed information on the practical aspects of SNP mapping and genotyping in Drosophila (8) as well as a user guide for the database, a glossary and protocols. With this database, we aim to provide a versatile SNP data resource that is easy to use through a user-friendly web interface.


    DATA SOURCE
 
For SNP identification, we designed primer pairs to amplify fragments which are ~1 kb long (9), equally distributed along each major chromosome arm (X, 2L, 2R, 3L and 3R), and which preferentially lie in unique, non-protein coding regions (Figure 1). Genomic DNA of up to five standard laboratory stocks per amplicon served as template: besides the wild-type stocks Canton S and Oregon R, we selected for each chromosome arm one strain that carries visible recessive markers, one stock with a Flp recombinase target (FRT) element (10) close to the centromere, and one stock with an enhancer-promoter P- (EP) element at the chromosome tip and a visible white+ marker (11) (see also Supplementary Table 1). The wild-type and FRT stocks are commonly utilized in mutagenesis screens, and the recessive marker as well as EP stocks are useful for identification of recombination events in defined chromosomal regions (2,4). PCR products were sequenced in both orientations, each using one of the amplification primers as sequencing primer. In total, >2.3 Mb (1.7%) of the 117 Mb long euchromatic region of the D. melanogaster genome were resequenced and analysed. After sequencing the PCR fragments, the Phred/Cross_match/PolyBayes software package (12–14) was used for trace quality assessment, alignment to the reference genome (strain y; cn bw sp) (15,16), and automated SNP discovery. To obtain high-quality data, SNPs within the first or last 75 bases of an amplicon, or with a Phred score below 20, were omitted. In addition, ~27% of the alignments were visually inspected [with the help of Consed 11.0 (17)], which was particularly necessary for detection of long indels (>6 bases). If multiple sequence reads from the same stock were available at one site, the allele with the highest Phred score was selected. Moreover, SNPs located at adjacent loci were considered as a single polymorphic site. Of the analysed amplicons, 86.9% contained at least one polymorphism in any of the examined stocks.
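
The filtering rules described above (edge trimming, Phred threshold, best read per stock, merging of adjacent loci) can be illustrated with a minimal Python sketch. The record layout (`SnpCall` and its fields) and the function names are assumptions made for illustration; they do not reflect the custom scripts actually used in the pipeline.

```python
from dataclasses import dataclass

EDGE_MARGIN = 75   # bases ignored at each amplicon end (value from the text)
MIN_PHRED = 20     # calls below this Phred score are dropped (from the text)


@dataclass(frozen=True)
class SnpCall:
    """One candidate SNP call; a hypothetical record layout."""
    stock: str
    pos: int       # 1-based position within the amplicon
    allele: str
    phred: int


def passes_filter(call, amplicon_len):
    """Keep calls outside the first/last 75 bases with Phred >= 20."""
    inside = EDGE_MARGIN < call.pos <= amplicon_len - EDGE_MARGIN
    return inside and call.phred >= MIN_PHRED


def best_allele_per_stock(calls):
    """For each (stock, position), keep the call with the highest Phred score."""
    best = {}
    for c in calls:
        key = (c.stock, c.pos)
        if key not in best or c.phred > best[key].phred:
            best[key] = c
    return sorted(best.values(), key=lambda c: (c.pos, c.stock))


def merge_adjacent(positions):
    """Group directly adjacent positions into single sites as (start, end)."""
    sites = []
    for p in sorted(set(positions)):
        if sites and p == sites[-1][1] + 1:
            sites[-1] = (sites[-1][0], p)   # extend the current site
        else:
            sites.append((p, p))            # start a new site
    return sites
```

In this sketch, a run of neighbouring polymorphic positions such as 100, 101 and 102 collapses into one site spanning 100 to 102, mirroring the rule that SNPs at adjacent loci count as a single polymorphic site.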
The SNP positions were updated to Release 5 (FB2006_01) of the D. melanogaster genome by aligning 40 bp of the sequences (from Release 3 or 4) flanking each SNP site to the new reference sequence using Blastn (18). Prediction of restriction fragment length polymorphism (RFLP) sites was accomplished with the help of Remap [EMBOSS software suite (19,20)] and the REBASE list of commercially available restriction enzymes with cut sites ≥4 bp (21,22).
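
To illustrate what RFLP site prediction involves, the following Python sketch checks whether two alleles of a polymorphism differ in the presence of a restriction enzyme recognition site. It is not Remap and does not read REBASE files; the three-enzyme table, the function names and the example sequences are assumptions for illustration, and only simple, non-degenerate, palindromic recognition sequences are handled.

```python
# Tiny illustrative enzyme table. The real analysis used the REBASE list of
# commercially available enzymes; these three entries are just examples.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def sites_in(seq, recognition):
    """Count (possibly overlapping) occurrences of a recognition sequence."""
    seq = seq.upper()
    count = 0
    start = seq.find(recognition)
    while start != -1:
        count += 1
        start = seq.find(recognition, start + 1)
    return count


def differential_enzymes(left, allele_a, allele_b, right):
    """Return enzymes whose site count differs between the two alleles.

    `left` and `right` are the flanking sequences; an empty allele string
    can stand for the deleted form of an indel.
    """
    hap_a = left + allele_a + right
    hap_b = left + allele_b + right
    return sorted(
        name for name, rec in ENZYMES.items()
        if sites_in(hap_a, rec) != sites_in(hap_b, rec)
    )
```

Because the listed recognition sequences are palindromic, scanning one strand is sufficient here; a fuller treatment would also need to handle degenerate and asymmetric sites.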


 
Figure 1. Data source pipeline for SNP identification, data retrieval and curation. Software tools are displayed below each task (for references please see text); if not otherwise stated, custom-made scripts were used.

 

    DATABASE CONTENT
 
The FLYSNPdb data set currently comprises >81 700 SNP alleles at 27 367 sites in 2238 amplicons of about 1 kb length (Table 1). One SNP marker contains on average 12 polymorphisms; the maximum SNP count per marker is 73. The average distance between SNP markers is 50.3 kb, a region that contains on average 6.6 genes [according to the FlyBase Release 5.10, FB2008_07 annotation (23)]. The biggest gap between markers is 360 kb long and lies at the tip of chromosome arm 3R, between cytological regions 82A1 and 82C3. Only 169 polymorphic loci (0.6% of total SNPs) are tri-allelic; the rest are bi-allelic. A total of 13.7% (3743) of the SNPs are indels, which are up to 360 bp long, but predominantly (96.4%) <10 bp (46.6% of the indels are 1 nt long). For any given stock pair, the average percentage of SNP markers with a sequence divergence between these two stocks is 76.6%, ranging from 35.3% to 92.0%. Furthermore, the database provides information on the molecular and cytological SNP locations for three genome assembly versions [Release 3–5; (15)], together with the 30 bp flanking sequences as an additional site identification feature. For data quality assessment, PolyBayes probability scores (14), Phred trace quality scores (12), as well as the number of sequence reads per alignment are available, and manually curated SNPs are indicated. Since non-coding regions are more polymorphic, we have focused on non-exonic regions. If SNPs lie within an intron or exon (according to FlyBase Release 5.10), the corresponding gene name is also retrievable. In addition, information on SNP marker amplification primers is available for genotyping assays which are based on sequencing.
Polymorphisms that are suitable for RFLP assays (SNPs which result in differential restriction enzyme sites) or for which verified PLP or tag-array mini-sequencing (TAMS) assays (8) are available, are also indicated, including further information like verified primers or suitable restriction enzymes.
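
The stock-pair divergence figure quoted above can be read as the fraction of SNP markers in which two stocks differ at one or more sites. A minimal Python sketch of that calculation follows; it assumes a hypothetical in-memory layout (marker id mapped to a list of per-site allele dictionaries) and invented marker ids, not the actual database schema.

```python
def marker_differs(sites, stock_a, stock_b):
    """True if any site has known, different alleles in the two stocks."""
    for alleles in sites:
        a = alleles.get(stock_a)
        b = alleles.get(stock_b)
        if a is not None and b is not None and a != b:
            return True
    return False


def divergence_percent(markers, stock_a, stock_b):
    """Percentage of markers that discriminate between two stocks."""
    if not markers:
        return 0.0
    n_diff = sum(marker_differs(s, stock_a, stock_b) for s in markers.values())
    return 100.0 * n_diff / len(markers)


# Toy data: marker id -> list of sites, each site mapping stock -> allele.
toy_markers = {
    "m1": [{"CantonS": "A", "OregonR": "G"}, {"CantonS": "T", "OregonR": "T"}],
    "m2": [{"CantonS": "C", "OregonR": "C"}],
    "m3": [{"CantonS": "G"}],                    # OregonR not sequenced here
    "m4": [{"CantonS": "-", "OregonR": "AT"}],   # an indel also counts
}
print(divergence_percent(toy_markers, "CantonS", "OregonR"))
```

Sites where one of the two stocks has no sequenced allele are treated as uninformative in this sketch; whether the published percentages handle missing data the same way is not stated in the text.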



 
Table 1. Number of SNPs in FLYSNPdb, per chromosome arm and in total

 

    IMPLEMENTATION, USAGE AND ACCESS
 
All data are organized and stored in a relational database. To speed up web queries, several summary tables were precomputed and stored in a MySQL database, which is accessed through PHP scripts.
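
The idea of a precomputed summary table can be sketched as follows. The example uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table and column names (`snp_site`, `marker_summary` and so on) are invented for illustration; the actual FLYSNPdb schema is not described in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snp_site (
        site_id   INTEGER PRIMARY KEY,
        marker_id TEXT NOT NULL,
        arm       TEXT NOT NULL,      -- chromosome arm, e.g. '2L'
        position  INTEGER NOT NULL    -- molecular coordinate
    );
    INSERT INTO snp_site (marker_id, arm, position) VALUES
        ('m1', '2L', 1000), ('m1', '2L', 1040), ('m1', '2L', 1500),
        ('m2', '2L', 52000), ('m3', 'X', 7000), ('m3', 'X', 7100);

    -- Precompute one row per marker so list views need no aggregation.
    CREATE TABLE marker_summary AS
        SELECT marker_id, arm,
               MIN(position) AS first_pos,
               MAX(position) AS last_pos,
               COUNT(*)      AS snp_count
        FROM snp_site
        GROUP BY marker_id, arm;
    CREATE INDEX idx_summary_region ON marker_summary (arm, first_pos);
""")


def markers_in_region(arm, start, end):
    """Return (marker_id, snp_count) for markers overlapping a region."""
    rows = conn.execute(
        "SELECT marker_id, snp_count FROM marker_summary "
        "WHERE arm = ? AND last_pos >= ? AND first_pos <= ? "
        "ORDER BY first_pos",
        (arm, start, end),
    )
    return rows.fetchall()
```

The trade-off in such a design is that the summary table must be rebuilt whenever the underlying site data change, which suits a data set that is updated in occasional releases.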

The form on the first page asks the user to specify the chromosomal region and two stocks for which data on differential polymorphisms will be retrieved (Figure 2). The region can be indicated as molecular coordinates (position 1 – position 2 or position 1 + length) or as a cytological segment (region 1 – region 2). Furthermore, it is possible to select whole chromosome arms by leaving the ‘Location’ field blank, or to retrieve all data by selecting the ‘Browse all’ option. SNP data can be viewed as a list of SNP markers (including SNP count and amplification primer sequences) or as a table of SNP sites (with alleles, flanking sequences, etc.). Additional information concerning quality scores, genotyping assay suitability or coding information (genic, intronic or exonic) can be optionally selected. For users of the previous FLYSNP database version, old identifiers (ids) are retrievable and a link to this version is provided. On each query result page, sub-selections can be made by clicking on the checkboxes at the left side of each row, or by entering search parameters in the fields below each column (Figure 2). The tables are downloadable, e.g. as tab-separated text files which can be easily imported into commonly used databases or Excel spreadsheets, or as track files which can be uploaded to the FlyBase genome viewer [GBrowse; (24)]. As an additional feature, a link to FlyBase GBrowse is provided for the graphical display of the region previously specified by the user.
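
The two molecular-coordinate notations accepted by the form ('position 1 – position 2' or 'position 1 + length') could be normalised along the following lines. This is a sketch only: the function name, the accepted separators, the stripping of thousands separators and the handling of a blank field are assumptions, not the site's actual PHP code.

```python
import re

# Accepts a hyphen, an en dash or a plus sign between two integers.
_REGION = re.compile(r"^\s*(\d+)\s*([-–+])\s*(\d+)\s*$")


def parse_region(text):
    """Return (start, end) for 'a - b' or 'a + length'; None if blank."""
    if not text.strip():
        return None                      # blank field: whole chromosome arm
    match = _REGION.match(text.replace(",", ""))
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    first = int(match.group(1))
    sep = match.group(2)
    second = int(match.group(3))
    if sep == "+":
        # Assumption: the end coordinate is start + length.
        return first, first + second
    return min(first, second), max(first, second)
```

For example, `parse_region("1000 + 500")` and `parse_region("1000 - 1500")` both yield the same interval under the assumption stated in the code comment.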


 
Figure 2. Screenshots of FLYSNPdb input form and query result. On the first page, the user selects chromosomal region and stocks as well as different view options. On the search result page, further features such as table download or sub-queries are available.

 

    RECENT AND FUTURE DEVELOPMENTS
 
The FLYSNPdb data were recently submitted to dbSNP (NCBI, Release 129; http://www.ncbi.nlm.nih.gov/projects/SNP/) so that direct linkage to the FlyBase data repository is feasible. Furthermore, sequence traces and alignments will be provided for users who would like to see the raw data for detailed quality assessment. We are happy to help users with their individual needs and will implement suggestions that are of general use.


    SUPPLEMENTARY DATA
 
Supplementary data are available at NAR Online.


    FUNDING
 
European Union Fifth Framework Programme (QLRI-CT-2001-00004); Boehringer Ingelheim GmbH; Japan Society for the Promotion of Science. Funding for open access charge: IMP.


    ACKNOWLEDGEMENTS
 
We thank the whole FLYSNP Consortium, especially Barry Dickson (IMP) for initiation and coordination of the FLYSNP project; Montserrat Aguadé and Dorcas Orengo (Univ. of Barcelona) for providing primers. We are also grateful to Cerebrum Web Consulting for initial setup of the first FLYSNP database version, Werner Kubina and Christian Brandstaetter (IMP) for IT support, Gotthold Schaffner (IMP) for sequencing and Angela Graf (IMP) for stock keeping.


    Footnotes
 
Present addresses: Doris Chen, Department of Biochemistry, University of Vienna, Max F. Perutz Laboratories (MFPL), c/o IMBA, Dr Bohr-Gasse 3, A-1030 Vienna, Austria

Jürg Berger, Roche Austria GmbH, Engelhorngasse 3, A-1210 Vienna, Austria

Michaela Fellner, Vienna Drosophila Research Center (VDRC), Dr Bohr-Gasse 3, A-1030 Vienna, Austria

Takashi Suzuki, Max Planck Institute of Neurobiology, Am Klopferspitz 18, D-82152 Martinsried, Germany


    REFERENCES
 

  1. Teeter K, Naeemuddin M, Gasperini R, Zimmerman E, White KP, Hoskins R, Gibson G. Haplotype dimorphism in a SNP collection from Drosophila melanogaster. J. Exp. Zool. (2000) 288:63–75.

  2. Berger J, Suzuki T, Senti KA, Stubbs J, Schaffner G, Dickson BJ. Genetic mapping with SNP markers in Drosophila. Nat. Genet. (2001) 29:475–481.

  3. Hoskins RA, Phan AC, Naeemuddin M, Mapa FA, Ruddy DA, Ryan JJ, Young LM, Wells T, Kopczynski C, Ellis MC. Single nucleotide polymorphism markers for genetic mapping in Drosophila melanogaster. Genome Res. (2001) 11:1100–1113.

  4. Martin SG, Dobi KC, St Johnston D. A rapid method to map mutations in Drosophila. Genome Biol. (2001) 2:RESEARCH0036.

  5. Nairz K, Stocker H, Schindelholz B, Hafen E. High-resolution SNP mapping by denaturing HPLC. Proc. Natl Acad. Sci. USA (2002) 99:10575–10580.

  6. Lunter G. Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes. Bioinformatics (2007) 23:i289–i296.

  7. Lunter G, Ponting CP, Hein J. Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput. Biol. (2006) 2:e5.

  8. Chen D, Ahlford A, Schnorrer F, Kalchhauser I, Fellner M, Viragh E, Kiss I, Syvanen AC, Dickson BJ. High-resolution, high-throughput SNP mapping in Drosophila melanogaster. Nat. Methods (2008) 5:323–329.

  9. Rozen S, Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. (2000) 132:365–386.

  10. Xu T, Rubin GM. Analysis of genetic mosaics in developing and adult Drosophila tissues. Development (1993) 117:1223–1237.

  11. Rorth P, Szabo K, Bailey A, Laverty T, Rehm J, Rubin GM, Weigmann K, Milan M, Benes V, Ansorge W, et al. Systematic gain-of-function genetics in Drosophila. Development (1998) 125:1049–1057.

  12. Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. (1998) 8:175–185.

  13. Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. (1998) 8:186–194.

  14. Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR. A general approach to single-nucleotide polymorphism discovery. Nat. Genet. (1999) 23:452–456.

  15. Celniker SE, Rubin GM. The Drosophila melanogaster genome. Annu. Rev. Genomics Hum. Genet. (2003) 4:89–117.

  16. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, et al. The genome sequence of Drosophila melanogaster. Science (2000) 287:2185–2195.

  17. Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res. (1998) 8:195–202.

  18. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. (1990) 215:403–410.

  19. Rice P, Longden I, Bleasby A. EMBOSS: the European molecular biology open software suite. Trends Genet. (2000) 16:276–277.

  20. Olson SA. EMBOSS opens up sequence analysis. European molecular biology open software suite. Brief Bioinform. (2002) 3:87–91.

  21. Roberts RJ, Macelis D. REBASE–restriction enzymes and methylases. Nucleic Acids Res. (1993) 21:3125–3137.

  22. Roberts RJ, Vincze T, Posfai J, Macelis D. REBASE–enzymes and genes for DNA restriction and modification. Nucleic Acids Res. (2007) 35:D269–D270.

  23. Drysdale RA, Crosby MA. FlyBase: genes and gene models. Nucleic Acids Res. (2005) 33:D390–D395.

  24. Crosby MA, Goodman JL, Strelets VB, Zhang P, Gelbart WM. FlyBase: genomes by the dozen. Nucleic Acids Res. (2007) 35:D486–D491.




This article has been cited by other articles:


M. Fujioka, X. Wu, and J. B. Jaynes
A chromatin insulator mediates transgene homing and very long-range enhancer-promoter communication
Development, September 15, 2009; 136(18): 3077 - 3087.

