ABSTRACT
FlyBase is a database of genetic and molecular data concerning Drosophila.
FlyBase is maintained as a relational database (in Sybase). The scope of
FlyBase includes: genes, alleles (and phenotypes), aberrations, pointers to
sequence data, clones, stock lists, Drosophila workers and bibliographic
references. FlyBase is also available on CD-ROM for Macintosh systems
(Encyclopaedia of Drosophila).
Drosophila melanogaster is one of the most studied eukaryotic organisms.
Introduced to `modern' biology in the early years of this century, research
with D.melanogaster has been at the forefront of most areas of biology, from
genetics to ecology, from neurobiology to evolution. Drosophila geneticists
have been well served by a series of `catalogs' of mutations, the first of
which was published in 1925, and by regular publication (again, dating from
1925) of bibliographies of the Drosophila literature. The last `conventional'
catalog of the genes and mutations of D.melanogaster was published in 1992
(3), although data collection ceased in late 1989.
Before Lindsley and Zimm's catalog was published, discussions in the community
had led to the conclusion that an `electronic' database was essential, if the
rapidly increasing knowledge of Drosophila was to be available in a
convenient form. As a consequence, beginning in
October 1992, the National Center for Human Genome Research of the NIH has
funded the FlyBase project with the objective of designing, building and
releasing a database of genetic and molecular information concerning this
insect. FlyBase has also received support from the Medical Research Council,
London.
The `core' of FlyBase is data concerning the genes and mutations of Drosophila:
Gene name: gene symbol; Synonym(s) name; Synonym(s) symbol; Genetic map
position; Polytene chromosome map position; Nature of gene product(s);
Molecular data; Gene expression pattern data; Similar genes in other organisms;
Database cross-references.
Allele(s) name: Allele(s) symbol; Allele synonym(s) name; Allele synonym(s) symbol; Origin of allele; Phenotypic information; Molecular data.
Clones (cosmids, P1s, YACs)
P-elements
Transposon constructs and their components
Chromosome aberrations
Bibliographic references
Stock lists
People
Allied data
FlyBase identifier numbers.
In their present form these data are a mixture of information in a highly
structured, parseable form and of free text. All of the data are available as
flat (ASCII) files, the majority being output of selected data sets from the
relational database implementation of FlyBase.
The taxonomic scope of FlyBase is the family Drosophilidae. Genetic data on
species other than D.melanogaster are few, although mutant catalogs for
D.buzzatii (from J.S.F. Barker) and D.ananassae (from Y.N. Tobari) have been
incorporated.
FlyBase and the Berkeley Drosophila Genome Project jointly produce the
Encyclopaedia of Drosophila. This incorporates their data in ACeDB format and
is available for both Unix and Macintosh systems. The Mac version is published
as a CD-ROM (see below).
All genes, alleles, aberrations, bibliographic records and species in FlyBase
have unique identifiers that allow them to be referenced both within FlyBase
and externally. FlyBase identifiers are of the form: FBxxnnnnnnn, where xx is a
two-letter code signifying the type of identifier and nnnnnnn is a 7 digit
number padded with leading zeros. Identifier codes now used are:
FBgn: gene identifier (e.g. FBgn0001234)
FBal: allele identifier
FBab: aberration identifier
FBrf: bibliographic reference identifier
FBsp: species identifier
FBmc: construct identifier
FBba: balancer
FBtp: engineered transposon.
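The identifier scheme above can be checked mechanically. The following is a minimal sketch in Python, not part of FlyBase itself: the function and type names (`parse_fb_id`, `format_fb_id`, `FlyBaseId`) are hypothetical, and the only facts assumed are the ones stated in the text (the prefix FB, a two-letter type code from the list above, and a 7 digit number padded with leading zeros).

```python
import re
from typing import NamedTuple

# Two-letter type codes, copied from the list in the text.
FB_TYPES = {
    "gn": "gene",
    "al": "allele",
    "ab": "aberration",
    "rf": "bibliographic reference",
    "sp": "species",
    "mc": "construct",
    "ba": "balancer",
    "tp": "engineered transposon",
}

# "FB", a two-letter code, then exactly seven ASCII digits.
_FB_ID = re.compile(r"FB([a-z]{2})([0-9]{7})")


class FlyBaseId(NamedTuple):
    """Hypothetical container for a parsed identifier."""
    code: str
    kind: str
    number: int


def parse_fb_id(text: str) -> FlyBaseId:
    """Parse an identifier such as FBgn0001234; raise ValueError if malformed."""
    match = _FB_ID.fullmatch(text.strip())
    if match is None or match.group(1) not in FB_TYPES:
        raise ValueError(f"not a FlyBase identifier: {text!r}")
    code = match.group(1)
    return FlyBaseId(code, FB_TYPES[code], int(match.group(2)))


def format_fb_id(code: str, number: int) -> str:
    """Build an identifier, padding the number with leading zeros to 7 digits."""
    if code not in FB_TYPES or not 0 <= number <= 9_999_999:
        raise ValueError("unknown type code or number out of range")
    return f"FB{code}{number:07d}"
```

For example, `parse_fb_id("FBgn0001234")` yields the type code `gn`, the description `gene` and the number 1234, and `format_fb_id("al", 42)` rebuilds a padded allele identifier.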
In October 1995, FlyBase included information on >9000 genes, nearly 26 000
alleles and >11 000 chromosome aberrations. Except for historical data,
inherited from Lindsley and Zimm (3) for example, all data are attributed to a
single publication (including personal communications to FlyBase; these are
archived and made accessible to users). The data are a mixture of free text
and controlled syntax. FlyBase uses a standard controlled vocabulary of terms
to describe, for example, mutagens and anatomical parts of Drosophila.
The genetic nomenclature of D.melanogaster is chaotic (though perhaps not in
the technical sense of this word). FlyBase has written a document on
nomenclatural standards for the community
(flybase/nomenclature/nomenclature.txt; FlyBase, 1995). The synonymy of
Drosophila gene, allele and aberration names is very extensive. Valid gene,
allele and aberration names are accessible from a file of >31 000 synonyms.
All map data are stored in FlyBase in a common form, regardless of whether these
data are genetic, cytogenetic or molecular. This allows FlyBase to output
integrated maps in a variety of formats. Tools have been written to output
information from the relational tables in a variety of forms, including
graphical and tabular. For example, the CytoSearch tool on the FlyBase World
Wide Web (WWW) server (see below) allows users to query the map data in a
number of different ways (e.g. `output all of the genes known to map between
35B1 and 35C1 on the polytene chromosomes', `output all of the deletions that
uncover 35B1', `output all of the cosmid clones on the X chromosome').
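A query of the first kind can be sketched as an interval-overlap test on polytene band labels. This Python sketch is an illustration only and does not describe how CytoSearch is implemented. It assumes that each object carries a start and end band written as division, lettered subdivision and band number (as in 35B1). The record names in `EXAMPLE` are invented placeholders, and all function names are hypothetical.

```python
import re
from typing import List, Tuple

# (division, lettered subdivision, band number), e.g. 35B1 -> (35, "B", 1)
Band = Tuple[int, str, int]

_BAND = re.compile(r"([0-9]{1,3})([A-F])([0-9]{1,2})")


def parse_band(label: str) -> Band:
    """Turn a label such as '35B1' into a tuple that sorts in map order."""
    match = _BAND.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognised band label: {label!r}")
    return int(match.group(1)), match.group(2), int(match.group(3))


def overlaps(start: str, end: str, lo: str, hi: str) -> bool:
    """True if the interval start..end shares any band with lo..hi."""
    return parse_band(start) <= parse_band(hi) and parse_band(lo) <= parse_band(end)


def objects_between(records: List[Tuple[str, str, str]], lo: str, hi: str) -> List[str]:
    """Names of all records whose mapped interval overlaps lo..hi."""
    return [name for name, start, end in records if overlaps(start, end, lo, hi)]


# Invented placeholder records: (name, first band, last band).
EXAMPLE = [
    ("geneA", "35B1", "35B3"),
    ("geneB", "35C1", "35C1"),
    ("geneC", "36A2", "36A4"),
    ("deletionD", "34F1", "35B2"),
]

hits = objects_between(EXAMPLE, "35B1", "35C1")
```

Storing positions as comparable tuples rather than strings is what makes the range test a simple pair of comparisons; the same function answers the `deletions that uncover 35B1' style of query when both bounds are set to 35B1.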
A graphical map tool that allows the display of selected classes of object
(genes, aberrations, clones) from an image that represents the chromosomes can
be accessed through either FlyBase server (see below) with a graphics-capable
browser such as Mosaic or Netscape. Graphical maps of genes, clones and clone
contigs are incorporated in the Encyclopaedia of Drosophila.
A key feature of FlyBase is a comprehensive bibliography of conventional and
unconventional (e.g. films, archival material) publications on the family
Drosophilidae, covering all aspects of study. This includes the complete texts
of all of the published Drosophila bibliographies and information from major
external resources, such as MEDLINE, BIOSIS, the Zoological Record and the
Environmental Mutagen Information Center (by permission). The bibliography is
updated from these and other sources. To ensure consistency there is a
satellite file of all `multi-publication' sources, for example journals and
edited publications, which includes full names, dates and places of
publication, volume number ranges and ISBNs or ISSNs and CODENs. By far the
greater part of these data have been checked on the Library of Congress and
University of Cambridge online catalogs. Bibliographic records are coded as to
type (e.g. journal article, thesis, book, film). As of October 1995 the number
of bibliographic records was nearly 75 000. This includes >4000 theses.
FlyBase uses a controlled vocabulary to summarise the (molecular) nature of a
gene's product. For enzymes, the EC names and EC numbers are included; for non-enzyme proteins a `trivial' name (e.g. actin, calmodulin) is used but the
data are redundant so as to ease access. At present the data are a mixture of
classification by function, or, more often, inferred function (e.g.
transcription factor), classification by structure (e.g. homeodomain protein)
and classification by both structure and function (e.g. tRNA). These fields
also store cross-references to the PROSITE database, by PROSITE name and number.
FlyBase is now working with others to construct a hierarchical classification of
the functions of gene products for use in genomic databases.
FlyBase extensively cross references its objects with those in other genetic and
molecular databases. FlyBase receives daily updates of new and revised records
from the EMBL/DDBJ/GenBank databases and stores their primary accession numbers
in the gene, allele or aberration records. FlyBase also stores cross-references (by accession no.) to both SwissProt and PIR, as well as to the
Eukaryotic Promoter Database (EPD), dbSTS, dbEST, TRANSFAC, PDB, NRL_3D and GCR
databases. FlyBase now includes >5300 accession no. cross-references to the EMBL/DDBJ/GenBank database, 900 to SwissProt and 1800 to
PIR. FlyBase provides these external databases with flat file tables of their
accession numbers linked to FlyBase accession numbers, encouraging reciprocal
DBXREF links.
FlyBase is developing reports and query tools for exploring molecular data.
These reports will include the molecular organization of genes and detailed
information on transcript and protein products.
FlyBase collaborates very closely with both the Berkeley and European
Drosophila Genome Projects. An integrated list of P1, cosmid and YAC clones
from these projects is available and can be searched by cytological location.
A prototype report for P-element and vector constructs has recently been
incorporated within the SymbolSearch tool (see below). These reports provide
graphical maps of transposons and plasmids linked to sequence data and, where
appropriate, to mutant alleles and publications. For each transposon and
vector there are links to the components used in its construction. At present
the data are only complete for 142 P-element transposons (and their
progenitors and components) found in stocks at the Bloomington Drosophila
Stock Center. FlyBase is now extending curation to all published transposons
used by Drosophila biologists.
One of the most urgent needs for those building genetic databases is a stable
mechanism to cross-reference genes (and other objects) between organisms. In the absence of
such a mechanism FlyBase now simply includes the gene symbol and organism of
loci said, by investigators, to encode a similar (or homologous) product.
FlyBase provides access to the stock lists of the three major stock centers
for D.melanogaster (Bloomington, Mid-America and Umea) and to that of the
Drosophila species stock center at Bowling Green. It also provides access to
the stock lists of individual laboratories, if these are provided to FlyBase.
FlyBase works with the major Drosophila stock centers to ensure consistency of
nomenclature.
FlyBase maintains a directory of names, addresses, telephone and fax numbers
and e-mail addresses of people in the Drosophila community. Those with
particular roles in the community (e.g. principal investigators,
stock-keepers, members of the Drosophila Board) are tagged. There are now
>4900 records in this directory.
FlyBase cannot, and should not, be wholly comprehensive. We encourage others to
build specialised databases. At present FlyBase offers help in linking these to
FlyBase (by the use of FB identifiers, for example) and in making these
available through the FlyBase servers. Several databases of allied data are now
available: these include a complete list of valid species in the family
Drosophilidae (Dr G. Bachli, Zurich), a catalog of polytene chromosome sites
that are recognised by antibodies (Dr S. Amero, Chicago, IL), a Drosophila
codon usage table and the Drosophila records of the Transcription Termination
Signal Database. All Drosophila records of the Environmental Mutagen
Information Center are available from FlyBase, as allied data.
FlyBase has a depository for images (flybase/allied-data/images). A project, with Dr N. Patel, to capture images of enhancer
trap lines is underway. These images will be linked to other objects (e.g.
stocks, alleles) in FlyBase.
Although not allied data, FlyBase makes the complete unchanged text of
Lindsley and Zimm (3) available (by permission of Academic Press) and keeps a
file of errors in this book that have been noticed. The text of the earlier
Lindsley and Grell (2) is also available on FlyBase.
FlyBase is built with a relational database management system (Sybase). The
present schema has been implemented for most of the data and most files
accessed via the FlyBase servers are the products of the Sybase tables. The
schema is now being extended to accommodate physical maps and sequences from
the major Drosophila genome projects.
FlyBase data are maintained by curators working from the literature and filling
in a standard form that is parsed into the Sybase tables.
FlyBase provides users with a variety of modes of access: WWW, gopher, ftp of
flat files and the Encyclopaedia of Drosophila, which uses a version of ACeDB.
FlyBase is currently accessible at two sites, Harvard and Indiana
Universities. The FlyBase WWW server at Harvard (see below for addresses) gives
access to the search tools CytoSearch (for searching on the basis of
cytological map position) and SymbolSearch (for searching by gene, aberration
or transposon symbol, with full support of wild cards) as well as access to the
full set of FlyBase data services available through the Indiana server. The
Harvard server only supports http, and therefore requires a browser such as
Mosaic, Netscape or Lynx. The server at Indiana supports multiple protocols,
including http, Gopher and ftp and is accessible by a wide range of clients.
Structured flat files that are output from Sybase are available to query, copy
or browse. Users of interactive clients (e.g. Gopher+, Mosaic, Netscape) can
request stocks, update or add to the directory of Drosophila workers, and send
e-mail to the FlyBase consortium from within FlyBase.
The flat files derived from the Sybase tables are often available in several
formats, as well as being indexed for queries. For example, the bibliography is
available in Unix REFER format (which can be used by many bibliographic
packages) as well as in text and `comma separated values' formats. The genetic
data are available in readable text formats and in a format in which different
fields are coded (the latter allow users to write simple code to construct
their own queries on the data).
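As an illustration of the `simple code' a user might write against these files, the sketch below reads records in Unix REFER format. It relies only on the general conventions of that format: records are separated by blank lines, each field starts with `%` and a one-letter key (for example `%A` author, `%T` title, `%J` journal, `%D` date), and lines without a leading `%` continue the previous field. Which keys the FlyBase bibliography actually uses is not stated here, so the sample record is an invented placeholder and the function name `parse_refer` is hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# One record maps each one-letter key to the list of values seen for it,
# because keys such as %A (author) may repeat within a record.
Record = Dict[str, List[str]]


def parse_refer(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one dictionary per blank-line-separated REFER record."""
    record: Record = {}
    key: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current record, if there is one.
            if record:
                yield record
            record, key = {}, None
        elif line.startswith("%") and len(line) >= 2:
            key = line[1]
            record.setdefault(key, []).append(line[2:].strip())
        elif key is not None:
            # Continuation line: append to the most recent field value.
            record[key][-1] += " " + line.strip()
    if record:
        yield record


# Invented placeholder data, not taken from the FlyBase bibliography.
SAMPLE = """%A Author, A.
%A Author, B.
%T A placeholder title
that continues on a second line
%J Placeholder Journal
%D 1995

%A Author, C.
%T Another placeholder title
%D 1994
"""

records = list(parse_refer(SAMPLE.splitlines()))
titles_1995 = [r["T"][0] for r in records if r.get("D") == ["1995"]]
```

Keeping every field as a list avoids losing repeated authors, and a query such as `all titles from 1995' becomes a one-line list comprehension over the parsed records.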
FlyBase publishes a subset of the data in printed form as special issues of the
Drosophila Information Service. Two such issues were published in June 1994:
DIS 73 includes data on gene loci, gene function and gene and allele synonyms;
DIS 74 is a bibliography of the Drosophila literature for the period 1982-1993.
FlyBase and the Berkeley Drosophila Genome Project jointly publish The
Encyclopaedia of the Drosophila Genome (version 2, October 1995). This presents
a merge of the information in FlyBase with the data of the Berkeley project,
viewable via an ACeDB client. This collaborative project has involved the
customisation of ACeDB for Drosophila (by Suzanna Lewis in Berkeley), a port of
ACeDB to Mac platforms (by Cyrus Harmon in Berkeley) and an interface between
Sybase and ACeDB (by Eddy Welbourne in Cambridge).
The Encyclopaedia is available from FlyBase as a CD-ROM (for Macs) or by ftp
from the Indiana FlyBase server for Unix and Mac platforms.
Interaction with the user community is vital for the success of FlyBase. We
encourage the submission of new data, the correction of errors and ideas for
making this database of even greater use to the community.
A complete FlyBase Reference Manual is available from FlyBase servers as either
text or Postscript files (flybase/docs/Reference-manual.txt; flybase/docs/Reference-manual.ps). A shorter User Manual is also available
(flybase/docs/User-manual.txt and .ps) as is a brief introduction `About FlyBase'
(flybase/About-flybase).
News about changes to FlyBase is posted to the bionet.drosophila news group.
We suggest that FlyBase be referenced as follows:
FlyBase (1995). FlyBase - The Drosophila Database. Available from the
flybase.bio.indiana.edu network server and Gopher site and at the URL
http://morgan.harvard.edu/. Nucleic Acids Res., 24, 53-56.
We suggest that the abbreviation FB be used for FlyBase, regardless of the
particular FlyBase product.
The Harvard FlyBase server has the URL http://morgan.harvard.edu/.
The Indiana FlyBase server has the URLs http://flybase.bio.indiana.edu:82/ and
gopher://flybase.bio.indiana.edu:72/ for use with WWW browsers. The gopher
server can be addressed from gopher clients at flybase.bio.indiana.edu. These
are mirrored at the European Bioinformatics Institute
(http://www.ebi.ac.uk/flybase/).
FTP to flybase.bio.indiana.edu (129.79.225.25) with the username anonymous and
your e-mail address as password. FlyBase is in the directory /flybase.
A CD-ROM of the Encyclopaedia of Drosophila (for Macs) can be purchased at
nominal cost from Ms D. Palmer, Biological Laboratories, 16 Divinity Avenue,
Harvard University, Cambridge, MA 02138, USA (FAX +1 617 495 9300).
The Encyclopaedia of Drosophila is available for Unix systems (Sun, SGI and DEC
Alpha) and for Macs by ftp from flybase.bio.indiana.edu (login with username
eofd and password FlyBase).
Questions about the Indiana FlyBase server may be addressed to
flybase@bio.indiana.edu.
Requests for help and questions about FlyBase should be addressed to
flybase-help@morgan.harvard.edu. Reports of errors in FlyBase, or data updates,
should be addressed to flybase-updates@morgan.harvard.edu. Mail may be
addressed to FlyBase, Biological Laboratories, Harvard University, 16 Divinity
Avenue, Cambridge, MA 02138, USA.
FlyBase is supported by a grant from the National Institutes of Health (National
Center for Human Genome Research). It has also been supported by a grant from
the Medical Research Council, London. John Merriam (UCLA) was a member of the
consortium until July 1994. We thank him for his invaluable contributions.
REFERENCES