Nucleic Acids Research, 2002, Vol. 30, No. 1 38-41
© 2002 Oxford University Press
The Ensembl genome database project
The Wellcome Trust Sanger Institute and 1European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
Received August 20, 2001; Revised and Accepted October 31, 2001.
| ABSTRACT |
|---|
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.
| INTRODUCTION |
|---|
A genome sequence provides a natural framework about which to organise biological data. In the short time in which genome sequences have been available, genome databases have proved invaluable resources to researchers. In the case of human, the range of existing biological data and the types of researchers is even wider than for other organisms, stretching from clinical genetics to molecular biology. The availability of the draft human genome sequence enables these huge amounts of data, ranging from records of disease in our species to the sequences of related organisms, to be brought together systematically for the first time.
The Ensembl project is actively addressing this by providing a database of human genome annotation (http://www.ensembl.org/). This is being continuously expanded to include an increasing range of data types (vertical integration) as well as to build comparative genome sequence views as sequences of vertebrate genomes, such as mouse, rat and zebrafish, become available (horizontal integration). The database is being built on a very general and carefully engineered software framework that is being developed in parallel with the data integration. By making all software freely available and designing the system to be completely portable, Ensembl aims to provide a bioinformatics framework that is easy to apply to different organisms and types of data. The hope is that in the spirit of open source community projects such as Linux, Ensembl will be widely adopted and will allow database researchers and developers more time to focus on innovation.
| Ensembl GENOME ANNOTATION |
|---|
Ensembl annotates known genes and predicts novel genes, with functional annotation from the InterPro (1) protein family databases and with additional annotation by OMIM disease (2), SAGE expression (3,4) and by gene family (5).
Prediction of genes is the most important part of genome annotation, connecting the DNA sequence with the wide array of experimental data. In eukaryotic organisms with large introns, ab initio predictions are useful but have a high false positive rate and often predict partially incorrect gene structures. Thus, incorporation of all available evidence for gene prediction is necessary.
The Ensembl gene build system incorporates a wide range of methods including ab initio gene predictions, homology and gene prediction HMMs. Genes are placed in the genome using a three-step process. First, best-in-genome positions for all known human proteins from SPTREMBL (6) are found using a fast protein-to-DNA matcher (pmatch, R. Durbin, unpublished software). These positions are refined using genewise (7) to provide an accurate gene structure. UTRs are also aligned to each gene structure using full-length cDNAs where known. Secondly, a similar process is used to align paralogous human proteins and proteins from other organisms to the genome to form a set of novel human genes. Finally, the ab initio program genscan (8) is run across the entire genome to create a set of genscan peptides. Exons from these predicted peptides that are confirmed by blast matches to proteins, vertebrate mRNA and UniGene clusters are assembled into genes.
The above process creates a set of transcripts and these are grouped into genes wherever an exon is shared. These Ensembl genes are regarded as being accurate predicted gene structures with a low false positive rate, since they are all supported by experimental evidence of at least one form via sequence homology. Ensembl human genes are identified by numbers beginning ENSG (transcripts begin ENST, exons begin ENSE and translations begin ENSP). These identifiers are kept stable, as far as is possible, between assemblies of the human genome.
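The grouping rule described above (transcripts belong to the same gene wherever an exon is shared) can be illustrated with a small union-find sketch. This is not the Ensembl implementation; the function name, the transcript ids and the representation of an exon as a `(start, end, strand)` tuple are assumptions made for the example.

```python
from collections import defaultdict


def group_transcripts_into_genes(transcripts):
    """Cluster transcripts that share at least one exon (transitively).

    `transcripts` maps a transcript id to a list of exon keys; an exon key
    here is a hypothetical (start, end, strand) tuple.
    Returns a list of sets of transcript ids, one set per gene.
    """
    parent = {t: t for t in transcripts}

    def find(t):
        # Walk to the cluster representative, shortening the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Remember the first transcript seen for each exon; merge later ones with it.
    first_seen = {}
    for t, exons in transcripts.items():
        for exon in exons:
            if exon in first_seen:
                union(first_seen[exon], t)
            else:
                first_seen[exon] = t

    genes = defaultdict(set)
    for t in transcripts:
        genes[find(t)].add(t)
    return list(genes.values())


# Illustrative data only: T1 and T2 share the exon (300, 400, 1).
example = {
    "T1": [(100, 200, 1), (300, 400, 1)],
    "T2": [(300, 400, 1), (500, 600, 1)],
    "T3": [(900, 950, -1)],
}
genes = group_transcripts_into_genes(example)
```

Because the merge is transitive, two transcripts with no exon in common still end up in the same gene if a third transcript shares an exon with each of them.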
Ensembl is continuously refining and extending its gene building process, calibrating it against regions of the genome that have been hand annotated and experimentally investigated, such as human chromosome 22 (9). We are in the process of integrating EST data into Ensembl gene building. ESTs offer a considerable advantage in aiding the prediction of non-coding exons, especially those located within the 3'-UTR. Two EST/genome alignment algorithms, namely exonerate (G. Slater, unpublished) and EST_genome (10), have been integrated with the Ensembl gene-building pipeline to yield gene predictions incorporating EST alignments. Because EST data are notorious for their high sequence error rate, strict quality measures have been introduced such that only spliced ESTs are considered, and priority is given to those ESTs which align on the genome into clusters.
The whole genome shotgun (WGS) sequence of the mouse genome (data generated by the mouse sequencing consortium) is another rich source for identifying human genes. We have developed a very fast gapped DNA-DNA alignment algorithm, exonerate, and have used it to align 14 million mouse reads to the assembled human genome. We have found that matches between human and mouse can be assessed using genscan to indicate those which are potentially novel coding exons.
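The two EST quality measures mentioned above (keep only ESTs whose alignment is spliced, and prefer ESTs that fall into clusters on the genome) could be sketched as follows. The dictionary layout, the function names and the choice to rank clusters by size are assumptions for illustration; the text does not specify how the pipeline implements these filters.

```python
def spliced_only(alignments):
    """Keep alignments with at least two genomic blocks, i.e. at least one intron."""
    return [a for a in alignments if len(a["blocks"]) >= 2]


def cluster_by_overlap(alignments):
    """Group alignments whose genomic spans overlap on the same strand.

    Each alignment is a dict with 'id', 'strand' and 'blocks', where blocks
    are (start, end) pairs sorted by start (a hypothetical layout).
    Clusters are returned largest first.
    """
    def span(a):
        return a["blocks"][0][0], a["blocks"][-1][1]

    clusters = []
    for strand in {a["strand"] for a in alignments}:
        same = sorted((a for a in alignments if a["strand"] == strand), key=span)
        current, current_end = [], None
        for a in same:
            start, end = span(a)
            if current and start <= current_end:
                # Overlaps the running cluster: extend it.
                current.append(a)
                current_end = max(current_end, end)
            else:
                if current:
                    clusters.append(current)
                current, current_end = [a], end
        if current:
            clusters.append(current)
    # Larger clusters first: more ESTs agree on the locus.
    return sorted(clusters, key=len, reverse=True)


# Illustrative data only.
ests = [
    {"id": "est1", "strand": 1, "blocks": [(100, 200), (300, 400)]},
    {"id": "est2", "strand": 1, "blocks": [(150, 200), (300, 450)]},
    {"id": "est3", "strand": 1, "blocks": [(120, 380)]},  # unspliced, dropped
    {"id": "est4", "strand": -1, "blocks": [(900, 950), (1000, 1100)]},
]
ranked = cluster_by_overlap(spliced_only(ests))
```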
| Ensembl WEB SITE |
|---|
The Ensembl automatic annotation of the human genome sequence is available as an interactive web service (http://www.ensembl.org/).
A view of a region of genomic sequence is shown in Figure 1. Ensembl contigview web pages feature the ability to scroll along entire chromosomes, while viewing the features within a selected region in detail. Features are integrated from external data sources such as HUGO gene names, genetic markers, disease genes and SNPs, with links to primary databases. The user can control which features are displayed and dynamically integrate external DAS data sources as well as their own annotation (see below). Matches between WGS mouse reads and the human genome from exonerate are also displayed. The individual mouse reads can be accessed via the EBI trace server which is also provided via Ensembl (http://trace.ensembl.org/). There is an integrated, context sensitive, searchable help system which can be accessed by selecting the help button on any page.
The Ensembl web site provides a variety of alternate views of the data. These include mapview web pages, which show relationships between cytogenetic bands and the genome sequence via markers and display feature distribution plots; geneview web pages, showing information about individual Ensembl genes with their transcripts and gene structures; and proteinview web pages, showing information about individual Ensembl translations with functional annotation from InterPro. Similarity searching is also integrated into the web site. BLAST (11) and SSAHA (http://www.sanger.ac.uk/Software/analysis/SSAHA/) search tools are available against the entire human genome sequence, predicted gene datasets and mouse genome trace and whole genome assembly datasets.
Ensembl can be accessed in a variety of ways apart from web pages. Ensembl annotation can also be viewed interactively using the Apollo Java viewer, which is being developed as a collaborative project between the Berkeley Drosophila Genome Project (http://www.bdgp.org/) and Ensembl. The Ensembl FTP site provides a variety of data download formats, e.g. FASTA files of gene and protein sequences; EMBL and GenBank formats containing annotation of the raw genomic sequence. This includes the full dumps of the MySQL database used by the web site (see below). Extensive data dumping tools are also available from the contigview web pages, allowing regions to be selected and dumped in many flat file formats. Regions can also be dumped as graphical images for printing in a variety of formats and layouts.
Currently Ensembl has annotated human and mouse sequence available via its web site. We are in the process of annotating worm, fly, fugu and mosquito in collaboration with their respective genome communities.
| Ensembl SOFTWARE SYSTEM |
|---|
To achieve scalability and consistency of annotation we have developed a portable software system based around a relational database and a series of reusable components. We use Bioperl as a base bioinformatics library (http://www.bioperl.org/) and the free MySQL relational database. The entire Ensembl source code is freely available under an Apache open source licence. It is written mainly in Perl, with extensions in C; some alternative interfaces are available in Java.
The architecture of the software is split into biologically meaningful objects (business objects) and database connectivity objects (adaptors). This separation makes it easier to evolve the schema to address new datatypes or analyses while maintaining code stability. New datasets can be added easily by providing the necessary adaptors and business objects.
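The split between business objects and adaptors can be illustrated with a minimal sketch. Ensembl itself is written mainly in Perl on MySQL; the Python classes, the `gene` table layout and the use of an in-memory SQLite database below are stand-ins chosen so that the example is self-contained, not the project's actual schema or API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Gene:
    """Business object: biological meaning only, no SQL."""
    stable_id: str
    start: int
    end: int

    def length(self):
        return self.end - self.start + 1


class GeneAdaptor:
    """Adaptor: the only place that knows the (hypothetical) table layout."""

    def __init__(self, connection):
        self.connection = connection

    def store(self, gene):
        self.connection.execute(
            "INSERT INTO gene (stable_id, seq_start, seq_end) VALUES (?, ?, ?)",
            (gene.stable_id, gene.start, gene.end),
        )

    def fetch_by_stable_id(self, stable_id):
        row = self.connection.execute(
            "SELECT stable_id, seq_start, seq_end FROM gene WHERE stable_id = ?",
            (stable_id,),
        ).fetchone()
        return Gene(*row) if row else None


# In-memory SQLite stands in for MySQL so the sketch runs anywhere.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE gene (stable_id TEXT PRIMARY KEY, seq_start INTEGER, seq_end INTEGER)"
)
adaptor = GeneAdaptor(connection)
adaptor.store(Gene("ENSG_EXAMPLE", 1000, 1999))
fetched = adaptor.fetch_by_stable_id("ENSG_EXAMPLE")
```

If the schema changes (for example, a column is renamed), only `GeneAdaptor` needs to be edited; code that works with `Gene` objects is unaffected, which is the stability benefit the paragraph describes.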
One of the core design features of the system is the Virtual Contig (VC) object, which allows access to genomic sequence and its annotation as if it were a continuous piece of DNA in a 1-N coordinate space, regardless of how it is stored in the database. This is important since it is impractical to store large genome sequences as continuous pieces of DNA, not least because this would mean updating the entire genome entry whenever any single base changed. The VC object handles reading and writing of features and behaves identically regardless of whether the underlying sequence is stored as a single real piece of DNA (a single raw contig) or an assembly of many fragments of DNA (many raw contigs). Because features are always stored at the raw contig level, virtual contigs really are virtual and as a result are less fragile to sequence assembly changes. It is this feature that allows Ensembl to handle draft genome data in a seamless way and makes it possible to change between different genome assemblies relatively painlessly. This feature should also put us in a good position to handle haplotype sequences efficiently as they become available.
Access to the software is via FTP to stable snapshots or via a CVS server to live development code. As an open source project we have an active community of both academic and commercial developers using CVS. The entire system, as well as its individual components including the web site and analysis pipeline, is portable. This allows users to install the system to process their own genome data as well. By downloading the MySQL dumps it is also possible to set up a full mirror of the pre-computed analysis of the human genome provided by the Ensembl web site. Currently there are over 20 remote installations of the web site. However, the power of the system is not limited to a web interface: the object interfaces, such as virtual contigs, to our pre-computed data stored in MySQL provide a new way for research groups to carry out analysis of human genome data without the huge effort of having to first organise the raw sequence.
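The coordinate mapping behind a virtual contig can be sketched in a few lines. The class and method names below are hypothetical, and the sketch assumes the raw contigs abut with no gaps and all lie on the forward strand, which a real assembly would not guarantee.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class RawContig:
    name: str
    sequence: str


class VirtualContig:
    """Presents an ordered list of raw contigs as one 1-N coordinate space."""

    def __init__(self, raw_contigs):
        self.raw_contigs = list(raw_contigs)
        # starts[i] is the 1-based virtual coordinate of the first base of contig i.
        self.starts = []
        position = 1
        for contig in self.raw_contigs:
            self.starts.append(position)
            position += len(contig.sequence)
        self.length = position - 1

    def to_raw(self, virtual_pos):
        """Map a 1-based virtual position to (contig name, 1-based raw position)."""
        if not 1 <= virtual_pos <= self.length:
            raise IndexError(virtual_pos)
        i = bisect_right(self.starts, virtual_pos) - 1
        return self.raw_contigs[i].name, virtual_pos - self.starts[i] + 1

    def to_virtual(self, contig_name, raw_pos):
        """Map a position stored on a raw contig into virtual coordinates."""
        for contig, start in zip(self.raw_contigs, self.starts):
            if contig.name == contig_name:
                return start + raw_pos - 1
        raise KeyError(contig_name)

    def subseq(self, start, end):
        """Return bases start..end (inclusive, 1-based), crossing contig boundaries.

        Joining every contig is wasteful for a real genome; it keeps the sketch short.
        """
        whole = "".join(c.sequence for c in self.raw_contigs)
        return whole[start - 1:end]


vc = VirtualContig([RawContig("c1", "ACGT"), RawContig("c2", "GGCC")])
```

Since features stay attached to raw contigs, reassembling the genome only means rebuilding the list of contigs and their offsets; the stored features themselves are not rewritten.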
| Ensembl DATA ANALYSIS PIPELINE |
|---|
The human genome sequence is more than an order of magnitude larger than the previous largest genomes of worm and fly, which are in themselves an order of magnitude larger than most of the other genomes that have been sequenced. Also, the human genome sequence is made up of fragments and is rapidly changing as the draft sequence is finished (now >50% of the genome is finished). Ensembl works closely with the primary providers of data in the international human genome sequencing consortium (12) and hosts one of the two Genome Central sites, with links to primary HGP data sources (http://www.ensembl.org/genome/central/). Ensembl currently tracks the sequence assemblies (referred to as the golden path) provided by Jim Kent at UCSC and the Ensembl web interface links at the DNA level to the UCSC web interface (http://genome.cse.ucsc.edu). Ensembl reanalyses the human genome whenever a new assembly becomes available, maintaining the stability of its gene identifiers between releases wherever possible (see above).
One of the major challenges for the Ensembl project has been handling the required scale of analysis, which is dynamic because the assembly changes continuously, in contrast to the storage and display of static data. For example, many millions of individual BLAST sequence comparisons alone must be run successfully without any failures. To make this possible, the Ensembl software system contains a full analysis pipeline. The analysis components are designed around two generic interfaces, one of which encapsulates running a single analysis process and another which encapsulates reading and writing the input and results of an analysis from a database. This separation allows us to write new components rapidly and in particular allows the construction of composite processes. These generic interfaces are then controlled by a scheduling system, which can handle dependencies and retries on top of low level task schedulers, such as LSF.
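The two generic interfaces and the scheduler described above can be sketched as follows. The class names `Runnable` and `RunnableDB`, the dictionary used as a stand-in for the database and the simple in-process scheduler are all assumptions for illustration; submission to a batch system such as LSF is not modelled.

```python
class Runnable:
    """Interface 1: run a single analysis on in-memory input."""

    def run(self, data):
        raise NotImplementedError


class RunnableDB:
    """Interface 2: read input from a store, run a Runnable, write the result."""

    def __init__(self, name, runnable, store, input_key):
        self.name = name
        self.runnable = runnable
        self.store = store          # a dict stands in for the database
        self.input_key = input_key

    def execute(self):
        data = self.store[self.input_key]
        self.store[self.name] = self.runnable.run(data)


def schedule(jobs, depends_on, max_retries=2):
    """Run jobs in dependency order, retrying a failed job up to max_retries times."""
    done, order = set(), []
    pending = dict(jobs)  # job name -> RunnableDB
    while pending:
        ready = [n for n in pending if set(depends_on.get(n, ())) <= done]
        if not ready:
            raise RuntimeError("unsatisfiable dependencies: %s" % sorted(pending))
        for name in ready:
            for attempt in range(max_retries + 1):
                try:
                    pending[name].execute()
                    break
                except Exception:
                    if attempt == max_retries:
                        raise
            done.add(name)
            order.append(name)
            del pending[name]
    return order


class Upper(Runnable):
    def run(self, data):
        return data.upper()


class FlakyGC(Runnable):
    """Computes GC fraction; fails on its first call to exercise the retry path."""

    def __init__(self):
        self.calls = 0

    def run(self, data):
        self.calls += 1
        if self.calls == 1:
            raise IOError("transient failure")
        return sum(base in "GC" for base in data) / len(data)


store = {"raw": "acgtgg"}
jobs = {
    "gc": RunnableDB("gc", FlakyGC(), store, "upper"),
    "upper": RunnableDB("upper", Upper(), store, "raw"),
}
order = schedule(jobs, {"gc": ["upper"]})
```

Keeping the analysis logic (`Runnable`) apart from the input and output handling (`RunnableDB`) is what lets one analysis feed another: the `gc` job simply reads the key that the `upper` job wrote.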
| DISTRIBUTED ANNOTATION SYSTEM (DAS) |
|---|
While Ensembl aims to provide baseline annotation, genomes are far too complex for any organisation to have a monopoly of ideas or data (13). Ensembl has been actively developing software to support the DAS standard (http://www.biodas.org/) (14), to enable users to easily view and compare annotation from different sources that are distributed across the Internet. Traditionally, different sources of information have been integrated on the Internet via links. However, from the user's point of view this means continuously jumping from one data provider's user interface to another, and it also makes it very difficult to compare, for example, several alternative gene predictions. DAS addresses this through clients that integrate data served by a number of different DAS servers.
Ensembl makes use of DAS in several ways. First, it makes its annotation data available (http://servlet.sanger.ac.uk:8080/das/) using the biojava DAS server DAZZLE (http://www.biojava.org/) for users with third party DAS clients. Secondly, for users who want to view annotation from human genome DAS servers without setting up third party clients, Ensembl contigview can be configured to act as a DAS client (by default contigview is pre-configured with a selection of useful DAS sources). Thirdly, for users who wish to serve DAS without setting up a server, limited amounts of user annotation can be uploaded to the Ensembl DAS server.
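What a DAS client does, in outline, is request features for the same genomic segment from several servers and lay the results out on one coordinate axis. The sketch below builds a features request URL in the general shape used by DAS and merges two hand-written, heavily simplified responses; the server address, data source names and XML fragments are invented for the example, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def features_url(server, dsn, reference, start, stop):
    """Build a features request for one segment of a reference sequence."""
    query = urlencode({"segment": "%s:%d,%d" % (reference, start, stop)}, safe=":,")
    return "%s/das/%s/features?%s" % (server.rstrip("/"), dsn, query)


def parse_features(xml_text, source):
    """Extract id, type, start and end from each FEATURE element."""
    root = ET.fromstring(xml_text)
    features = []
    for feature in root.iter("FEATURE"):
        features.append({
            "source": source,
            "id": feature.get("id"),
            "type": feature.findtext("TYPE"),
            "start": int(feature.findtext("START")),
            "end": int(feature.findtext("END")),
        })
    return features


def merge_tracks(responses):
    """Combine features from several sources, ordered along the sequence."""
    merged = []
    for source, xml_text in responses.items():
        merged.extend(parse_features(xml_text, source))
    return sorted(merged, key=lambda f: (f["start"], f["end"], f["source"]))


# Hypothetical server and data source name.
url = features_url("http://das.example.org/", "ensembl_demo", "1", 1000, 2000)

# Simplified, invented responses standing in for two different DAS servers.
responses = {
    "genes": """<DASGFF><GFF><SEGMENT id="1" start="1000" stop="2000">
        <FEATURE id="g1"><TYPE>exon</TYPE><START>1100</START><END>1200</END></FEATURE>
      </SEGMENT></GFF></DASGFF>""",
    "user": """<DASGFF><GFF><SEGMENT id="1" start="1000" stop="2000">
        <FEATURE id="u1"><TYPE>note</TYPE><START>1050</START><END>1080</END></FEATURE>
      </SEGMENT></GFF></DASGFF>""",
}
merged = merge_tracks(responses)
```

Each merged feature keeps its `source`, so a viewer can draw one track per server and let the user compare, for instance, alternative gene predictions side by side.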
| CONTACTING Ensembl |
|---|
Ensembl is a joint project of the European Bioinformatics Institute (EBI) and the Sanger Centre, both of which are located on the Wellcome Trust Genome Campus, Cambridge, UK. To receive announcements about updates, subscribe to the announce mailing list: majordomo{at}ebi.ac.uk subscribe ensembl-announce. To follow the day to day development of Ensembl subscribe to the development mailing list: majordomo{at}ebi.ac.uk subscribe ensembl-dev. Requests for information and support can be sent to helpdesk{at}ensembl.org, which is a fully supported helpdesk. Extensive additional documentation can be found on the Ensembl web site, including installation guides and tutorials, both about using the software system and the web interface.
| ACKNOWLEDGEMENTS |
|---|
We are grateful to users of our web site and the developers on our mailing lists for much useful feedback and discussion. The Ensembl project is principally funded by the Wellcome Trust with additional funding from EMBL.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +44 1223 494420; Fax: +44 1223 494468; Email: birney{at}ebi.ac.uk
| REFERENCES |
|---|
1 Apweiler,R., Attwood,T.K., Bairoch,A., Bateman,A., Birney,E., Biswas,M., Bucher,P., Cerutti,L., Corpet,F., Croning,M.D. et al. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res., 29, 37-40.
2 Antonarakis,S.E. and McKusick,V.A. (2000) OMIM passes the 1,000-disease-gene mark. Nature Genet., 25, 11.
3 Velculescu,V.E., Zhang,L., Vogelstein,B. and Kinzler,K.W. (1995) Serial analysis of gene expression. Science, 270, 484-487.
4 Wheeler,D.L., Church,D.M., Lash,A.E., Leipe,D.D., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Tatusova,T.A., Wagner,L. and Rapp,B.A. (2001) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 29, 11-16. Updated article in this issue: Nucleic Acids Res. (2002), 30, 13-16.
5 Enright,A.J., Iliopoulos,I., Kyrpides,N.C. and Ouzounis,C.A. (1999) Protein interaction maps for complete genomes based on gene fusion events. Nature, 402, 86-90.
6 Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45-48.
7 Birney,E. and Durbin,R. (2000) Using GeneWise in the Drosophila annotation experiment. Genome Res., 10, 547-548.
8 Burge,C. and Karlin,S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol., 268, 78-94.
9 Dunham,I., Shimizu,N., Roe,B.A., Chissoe,S., Hunt,A.R., Collins,J.E., Bruskiewich,R., Beare,D.M., Clamp,M., Smink,L.J. et al. (1999) The DNA sequence of human chromosome 22. Nature, 402, 489-495.
10 Mott,R. (1997) EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA. Comp. Appl. Biosci., 13, 477-478.
11 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J.H., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
12 International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860-921.
13 Hubbard,T.J.P. and Birney,E. (2000) Open annotation offers a democratic solution to genome sequencing. Nature, 403, 825.
14 Dowell,R.D., Jokerst,R.M., Day,A., Eddy,S.R. and Stein,L. (2001) The Distributed Annotation System. BMC Bioinformatics, 2, 7.
||||
![]() |
A. W. Hart, L. McKie, J. E. Morgan, P. Gautier, K. West, I. J. Jackson, and S. H. Cross Genotype-Phenotype Correlation of Mouse Pde6b Mutations Invest. Ophthalmol. Vis. Sci., September 1, 2005; 46(9): 3443 - 3450. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. R. Johnston and D. C. Shields A sequence sub-sampling algorithm increases the power to detect distant homologues Nucleic Acids Res., July 8, 2005; 33(12): 3772 - 3778. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Alland, F. Moreews, D. Boens, M. Carpentier, S. Chiusa, M. Lonquety, N. Renault, Y. Wong, H. Cantalloube, J. Chomilier, et al. RPBS: a web resource for structural bioinformatics Nucleic Acids Res., July 1, 2005; 33(suppl_2): W44 - W49. [Abstract] [Full Text] [PDF] |
||||
![]() |
O. Krishnadev, N. Rekha, S. B. Pandit, S. Abhiman, S. Mohanty, L. S. Swapna, S. Gore, and N. Srinivasan PRODOC: a resource for the comparison of tethered protein domain architectures with in-built information on remotely related domain families Nucleic Acids Res., July 1, 2005; 33(suppl_2): W126 - W129. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Brocchieri and S. Karlin Protein length in eukaryotic and prokaryotic proteomes Nucleic Acids Res., June 10, 2005; 33(10): 3390 - 3400. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Kamalakaran, S. K. Radhakrishnan, and W. T. Beck Identification of Estrogen-responsive Genes Using a Genome-wide Analysis of Promoter Elements for Transcription Factor Binding Sites J. Biol. Chem., June 3, 2005; 280(22): 21491 - 21497. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. A. Baldwin, S. Y. M. Yao, R. J. Hyde, A. M. L. Ng, S. Foppolo, K. Barnes, M. W. L. Ritzel, C. E. Cass, and J. D. Young Functional Characterization of Novel Human and Mouse Equilibrative Nucleoside Transporters (hENT3 and mENT3) Located in Intracellular Membranes J. Biol. Chem., April 22, 2005; 280(16): 15880 - 15887. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Gough Convergent evolution of domain architectures (is rare) Bioinformatics, April 15, 2005; 21(8): 1464 - 1471. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. R. Marshall, J. A. Fox, S. L. Butland, B. F. F. Ouellette, F. S. L. Brinkman, and G. F. Tibbits Phylogeny of Na+/Ca2+ exchanger (NCX) genes from genomic data identifies new gene duplications and a new family member in fish species Physiol Genomics, April 14, 2005; 21(2): 161 - 173. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. M. Zdobnov, Món. Campillos, E. D. Harrington, D. Torrents, and P. Bork Protein coding potential of retroviruses and other transposable elements in vertebrate genomes Nucleic Acids Res., February 16, 2005; 33(3): 946 - 954. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Liu, H. Walch, S. Wu, and A. Grigoriev Significant expansion of exon-bordering protein domains during animal proteome evolution Nucleic Acids Res., January 7, 2005; 33(1): 95 - 105. [Abstract] [Full Text] [PDF] |
||||
![]() |
V. L. Ramprasad, A. Thool, S. Murugan, D. Nancarrow, P. Vyas, S. K. Rao, A. Vidhya, K. Ravishankar, and G. Kumaramanickavel Truncating Mutation in the NHS Gene: Phenotypic Heterogeneity of Nance-Horan Syndrome in an Asian Indian Family Invest. Ophthalmol. Vis. Sci., January 1, 2005; 46(1): 17 - 23. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Florea, V. Di Francesco, J. Miller, R. Turner, A. Yao, M. Harris, B. Walenz, C. Mobarry, G. V. Merkulov, R. Charlab, et al. Gene and alternative splicing annotation with AIR Genome Res., January 1, 2005; 15(1): 54 - 66. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Lu, D. Szafron, R. Greiner, D. S. Wishart, A. Fyshe, B. Pearcy, B. Poulin, R. Eisner, D. Ngo, and N. Lamb PA-GOSUB: a searchable database of model organism protein sequences with their predicted Gene Ontology molecular function and subcellular localization Nucleic Acids Res., January 1, 2005; 33(suppl_1): D147 - D153. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Bairoch, R. Apweiler, C. H. Wu, W. C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang, R. Lopez, M. Magrane, et al. The Universal Protein Resource (UniProt) Nucleic Acids Res., January 1, 2005; 33(suppl_1): D154 - D159. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. de la Cruz, S. Bromberg, D. Pasko, M. Shimoyama, S. Twigger, J. Chen, C.-F. Chen, C. Fan, C. Foote, G. R. Gopinath, et al. The Rat Genome Database (RGD): developments towards a phenome database Nucleic Acids Res., January 1, 2005; 33(suppl_1): D485 - D491. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Bertone, V. Stolc, T. E. Royce, J. S. Rozowsky, A. E. Urban, X. Zhu, J. L. Rinn, W. Tongprasit, M. Samanta, S. Weissman, et al. Global Identification of Human Transcribed Sequences with Genome Tiling Arrays Science, December 24, 2004; 306(5705): 2242 - 2246. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Aijaz, B. J. Clark, K. Williamson, V. van Heyningen, D. Morrison, D. FitzPatrick, R. Collin, N. Ragge, A. Christoforou, A. Brown, et al. Absence of SIX6 Mutations in Microphthalmia, Anophthalmia, and Coloboma Invest. Ophthalmol. Vis. Sci., November 1, 2004; 45(11): 3871 - 3876. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Tamimi, M. Lines, M. Coca-Prados, and M. A. Walter Identification of Target Genes Regulated by FOXC1 Using Nickel Agarose-Based Chromatin Enrichment Invest. Ophthalmol. Vis. Sci., November 1, 2004; 45(11): 3904 - 3913. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. REHMSMEIER, P. STEFFEN, M. HOCHSMANN, and R. GIEGERICH Fast and effective prediction of microRNA/target duplexes RNA, October 20, 2004; 10(10): 1507 - 1517. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Huminiecki and K. H. Wolfe Divergence of Spatial Gene Expression Profiles Following Species-Specific Gene Duplications in Human and Mouse Genome Res., October 1, 2004; 14(10a): 1870 - 1879. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Baross, Y. S.N. Butterfield, S. M. Coughlin, T. Zeng, M. Griffith, O. L. Griffith, A. S. Petrescu, D. E. Smailus, J. Khattra, H. L. McDonald, et al. Systematic Recovery and Analysis of Full-ORF Human cDNA Clones Genome Res., October 1, 2004; 14(10b): 2083 - 2092. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. R. Beckley, B. U. Pauli, and R. C. Elble Re-expression of Detachment-inducible Chloride Channel mCLCA5 Suppresses Growth of Metastatic Breast Cancer Cells J. Biol. Chem., October 1, 2004; 279(40): 41634 - 41641. [Abstract] [Full Text] [PDF] |
||||
![]() |
A.-M. Mallon, L. Wilming, J. Weekes, J. G.R. Gilbert, J. Ashurst, S. Peyrefitte, L. Matthews, M. Cadman, R. McKeone, C. A. Sellick, et al. Organization and Evolution of a Gene-Rich Region of the Mouse Genome: A 12.7-Mb Region Deleted in the Del(13)Svea36H Mouse Genome Res., October 1, 2004; 14(10a): 1888 - 1901. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Leipzig, P. Pevzner, and S. Heber The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome Nucleic Acids Res., August 3, 2004; 32(13): 3977 - 3983. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. Wagner, A. Ansorge, U. Wirkner, V. Eckstein, C. Schwager, J. Blake, K. Miesala, J. Selig, R. Saffrich, W. Ansorge, et al. Molecular evidence for stem cell function of the slow-dividing fraction among human hematopoietic progenitor cells by genome-wide analysis Blood, August 1, 2004; 104(3): 675 - 686. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. R. Weil, P. Widlak, J. D. Minna, and H. R. Garner Global Survey of Chromatin Accessibility Using DNA Microarrays Genome Res., July 1, 2004; 14(7): 1374 - 1381. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Harte, V. Silventoinen, E. Quevillon, S. Robinson, K. Kallio, X. Fustero, P. Patel, P. Jokinen, and R. Lopez Public web-based services from the European Bioinformatics Institute Nucleic Acids Res., July 1, 2004; 32(suppl_2): W3 - W9. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Weckx, P. De Rijk, C. Van Broeckhoven, and J. Del-Favero SNPbox: web-based high-throughput primer design from gene to genome Nucleic Acids Res., July 1, 2004; 32(suppl_2): W170 - W172. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Kim, S. Shin, and S. Lee ASmodeler: gene modeling of alternative splicing from genomic alignment of mRNA, EST and protein sequences Nucleic Acids Res., July 1, 2004; 32(suppl_2): W181 - W186. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Kankainen and L. Holm POBO, transcription factor binding site verification with bootstrapping Nucleic Acids Res., July 1, 2004; 32(suppl_2): W222 - W229. [Abstract] [Full Text] [PDF] |
||||
![]() |
I. Ovcharenko, M. A. Nobrega, G. G. Loots, and L. Stubbs ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes Nucleic Acids Res., July 1, 2004; 32(suppl_2): W280 - W286. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Szafron, P. Lu, R. Greiner, D. S. Wishart, B. Poulin, R. Eisner, Z. Lu, J. Anvik, C. Macdonell, A. Fyshe, et al. Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations Nucleic Acids Res., July 1, 2004; 32(suppl_2): W365 - W371. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. R. Pacheco, A. Q. Gomes, N. L. Barbosa-Morais, V. Benes, W. Ansorge, M. Wollerton, C. W. Smith, J. Valcarcel, and M. Carmo-Fonseca Diversity of Vertebrate Splicing Factor U2AF35: IDENTIFICATION OF ALTERNATIVELY SPLICED U2AF1 mRNAs J. Biol. Chem., June 25, 2004; 279(26): 27039 - 27049. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Ergun, C. Buschmann, J. Heukeshoven, K. Dammann, F. Schnieders, H. Lauke, F. Chalajour, N. Kilic, W. H. Stratling, and G. G. Schumann Cell Type-specific Expression of LINE-1 Open Reading Frames 1 and 2 in Fetal and Adult Human Tissues J. Biol. Chem., June 25, 2004; 279(26): 27753 - 27763. [Abstract] [Full Text] [PDF] |
||||