
Nucleic Acids Research 2006 34(Database Issue):D556-D561; doi:10.1093/nar/gkj133

© The Author 2006. Published by Oxford University Press. All rights reserved
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions{at}oxfordjournals.org



Ensembl 2006

E. Birney1,*, D. Andrews1, M. Caccamo, Y. Chen1, L. Clarke1, G. Coates1, T. Cox1, F. Cunningham1, V. Curwen1, T. Cutts1, T. Down1, R. Durbin, X. M. Fernandez-Suarez, P. Flicek, S. Gräf, M. Hammond, J. Herrero1, K. Howe1, V. Iyer1, K. Jekosch, A. Kähäri, A. Kasprzyk, D. Keefe1, F. Kokocinski1, E. Kulesha, D. London, I. Longden, C. Melsopp1, P. Meidl1, B. Overduin1, A. Parker, G. Proctor1, A. Prlic1, M. Rae, D. Rios1, S. Redmond, M. Schuster1, I. Sealy1, S. Searle, J. Severin, G. Slater, D. Smedley1, J. Smith, A. Stabenau1, J. Stalker1, S. Trevanion, A. Ureta-Vidal1, J. Vogel1, S. White, C. Woodwark1 and T. J. P. Hubbard

European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK

1Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK

*To whom correspondence should be addressed. Tel: +44 1223 494420; Fax: +44 1223 494470; Email: birney{at}ebi.ac.uk

Received September 14, 2005. Revised October 25, 2005. Accepted October 25, 2005.


    ABSTRACT
 
The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased from 4 to 19, with the addition of the mammalian genomes of Rhesus macaque and Opossum, the chordate genome of Ciona intestinalis and the import and integration of the yeast genome. The year has also seen extensive improvements to both data analysis and presentation, with the introduction of a redesigned website, the addition of RNA gene and regulatory annotation and substantial improvements to the integration of human genome variation data.


    INTRODUCTION
 
The genome sequence of an organism provides the natural index for organizing and understanding biological data. Ensembl provides a software system to store, analyse, use and display genomic information. The genomes of 14 chordates are currently available through Ensembl, from mammals such as Human and Mouse through to the ‘primitive’ chordate Ciona intestinalis. The genomes of three key model eukaryotes, yeast, fly and worm, are also imported from their respective databases to provide easy integration of information from these organisms with chordates. Finally a limited number of insect genomes are also available through Ensembl owing to our participation in the Vectorbase consortium.

Ensembl continues to improve in both its analysis of genome information and its usability, via programmatic interfaces as well as web-based browsers. This paper details the improvements since the last report (1), in particular the quality of gene structures, a new RNA gene building system, regulatory regions, comparative genomics infrastructure, data mining interfaces, web services based integration, code portability and web-based user interfaces.


    RESULTS
 
RNA gene annotation in Ensembl
Ensembl has a traditional strength in predicting protein-coding gene sets that are as accurate and complete as possible, even in the absence of direct cDNA evidence (2). This is achieved by integrating a number of lines of evidence around genes, often making use of partial cDNA or expressed sequence tag sets and similarity to protein-coding genes in other organisms. However, protein-coding genes are not the only functional transcripts in a genome. There is in addition a range of functional RNA gene products, from structural RNAs such as U6 RNA through to more recently discovered regulatory RNAs such as micro RNAs (miRNAs). The Rfam resource (3) organizes all known functional RNAs into families and builds sophisticated covariance models of these sequences. We have collaborated with Rfam to provide an RNA gene build across all the Ensembl genomes, which includes both a covariance model matching step and an RNA folding estimation. The details of this method will be published in a separate paper. Table 1 shows the number of protein-coding and RNA genes predicted in a number of key organisms in Ensembl. The miRNA set is relatively constant between mammalian organisms, whereas other ncRNAs vary considerably. This is due to the high lineage-specific expansion of some ncRNAs, usually coupled with a high level of pseudogenes alongside some functional copies.


 
Table 1 Number of genes of different classes for selected species

 
Improvements to protein-coding genes
Providing gene sets that are as accurate as possible is one of the major goals of Ensembl. Even when there is a large amount of cDNA evidence in an organism, which is the case for both Human and Mouse, reconciling large cDNA collections to form accurate gene sets is not trivial. This is due to the presence of large numbers of pseudogenes in mammalian genomes (4,5), the presence of truncated and chimaeric cDNAs in cDNA collections (6) and polymorphisms between the genome and cDNA collections. Figure 1 shows the increase in quality of our gene resources in Human and Mouse. These improvements are due to assembly improvements, improvements in cDNA collections, careful screening of cDNA collections for contamination and algorithmic improvements in the gene build. The algorithmic improvements mainly come from careful parameterization of different alignment programs, in particular genewise (7) and exonerate (8), in combination with more advanced logic for deciding when each alignment program is appropriate. We are continuing to work in collaboration with the RefSeq group at NCBI, the Havana group at the Sanger Institute (9) and the UCSC genome group to develop a stable set of protein-coding gene structures on which we agree to the base pair. The project, called CCDS, made its first release in March 2005, identifying 14 795 transcripts in 13 142 genes on which all groups agree. These are labelled with CCDS identifiers in the genome browser of each participant. By improving all three pipelines, we expect to expand this set over the next year to around 18 000 transcripts, taking in those on which we currently differ by only one or two amino acids.



 
Figure 1 The progressive improvement in the quality of human and mouse gene builds by comparison to curated protein and mRNA reference sequences is shown. The column legends indicate the species, reference dataset and assembly release number. UniSw indicates the Swiss-Prot (curated) part of UniProt. RefSeq indicates the curated part of RefSeq (i.e. excluding XP entries). Identical trends are seen in all four comparisons of human and mouse against UniSw and RefSeq. The four colours indicate the quality of the match to the reference dataset: blue indicates an exact match; maroon indicates matched ends with some internal mismatch/indel; yellow indicates an incomplete match and green indicates reference sequences that are missing from the gene build. There are multiple reasons for this improvement, including improvements in assembly quality, cDNA resources and algorithmic improvements to the gene build.

 
Over the next year we anticipate incorporating into Ensembl the genomes of a number of mammals that have been sequenced at low coverage [2× whole genome shotgun (WGS)] and will therefore be highly fragmentary. The standard Ensembl gene build pipeline is unsuitable for such assemblies, so we have been developing a new method that utilizes a whole genome alignment to an annotated reference genome. In this method, gene structures on the low-coverage assembly are derived largely by projecting gene structures from the reference genome. We have tested this approach on the initial cow genome assembly (Btau_1.0: 3× WGS), in this case using Homo sapiens as the reference genome. We were able to build good quality gene models from around 17 000 of the 22 000 available human genes. The projection was also used to organize cow assembly fragments into gene_scaffolds, although many of the gene annotations are still fragmented. A new, higher quality cow genome assembly is now available (Btau_2.0: 6× WGS), which is more suitable for the standard Ensembl gene build pipeline. We therefore plan to compare the two gene sets to further evaluate and refine this low-coverage build procedure.
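The projection idea can be illustrated with a minimal Python sketch. It is not the Ensembl implementation: it assumes the whole genome alignment has already been reduced to ungapped, forward-strand blocks, and the names `Block`, `project_exon` and `project_gene` are hypothetical, introduced here for illustration only.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One ungapped, forward-strand alignment block (a simplifying assumption)."""
    ref_start: int   # 1-based inclusive start on the reference genome
    ref_end: int     # 1-based inclusive end on the reference genome
    tgt_name: str    # name of the low-coverage assembly fragment
    tgt_start: int   # target coordinate aligned to ref_start


def project_exon(start: int, end: int,
                 blocks: List[Block]) -> Optional[Tuple[str, int, int]]:
    """Project one reference exon onto the target, or return None if no
    single block covers it completely."""
    for b in blocks:
        if b.ref_start <= start and end <= b.ref_end:
            offset = b.tgt_start - b.ref_start
            return (b.tgt_name, start + offset, end + offset)
    return None


def project_gene(exons: List[Tuple[int, int]],
                 blocks: List[Block]) -> List[Optional[Tuple[str, int, int]]]:
    """Project every exon of a reference gene; unprojectable exons stay None."""
    return [project_exon(s, e, blocks) for s, e in exons]


blocks = [Block(100, 200, "scaf_1", 10), Block(300, 400, "scaf_7", 50)]
exons = [(120, 150), (310, 390), (500, 520)]
print(project_gene(exons, blocks))
```

In this toy case the projected exons land on two different target fragments and the third exon is lost. That is the situation in which grouping fragments by the gene they share, and accepting fragmented annotations, becomes relevant.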

Regulatory regions
The genome encodes far more than just the protein and RNA genes; in addition, the regulation of gene expression is a crucial area. The regulatory code for large eukaryotic genomes remains opaque to comprehensive analysis. However, a number of recently developed resources have started to produce genome-wide prediction sets for regulatory regions. We have developed a database schema and visualization schemes for storing, manipulating and using these regulatory regions, allowing a user to move from a gene to its putative regulation to (where assigned) its putative regulator. The first datasets that we will put into this system are the CisRED resource (http://www.cisred.org) and the MiRanda miRNA target prediction (10), but we hope to expand this area rapidly as new techniques are developed.

Variation resources
A number of genomes, in particular human, have extensive resources on natural polymorphisms. These are predominantly single nucleotide polymorphisms (SNPs) but also include small scale insertions and deletions. For a number of variations, large-scale genotyping projects have provided reference datasets for human variation, e.g. the HapMap project (11). We have developed a new system for handling variations which can store both variations in ‘natural’ populations (such as Human) and variations between lab managed strains (such as Mouse). These variants are cross-correlated with functional information, such as coding regions, splice sites and regulatory regions to provide potential consequences of a variation.
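As a rough illustration of cross-referencing a variation against functional annotation, the following Python sketch assigns a position to one of a few categories relative to a single forward-strand transcript. The category names, the two-base splice window and the function `classify_variant` are assumptions made for this example and do not reflect the consequence types or code used by Ensembl.

```python
from typing import List, Tuple


def classify_variant(pos: int,
                     exons: List[Tuple[int, int]],
                     cds_start: int,
                     cds_end: int,
                     splice_window: int = 2) -> str:
    """Classify a 1-based position against one forward-strand transcript.

    exons must be sorted, non-overlapping (start, end) pairs, inclusive.
    cds_start and cds_end bound the coding region in genomic coordinates.
    """
    if pos < exons[0][0] or pos > exons[-1][1]:
        return "outside_transcript"
    for start, end in exons:
        if start <= pos <= end:
            return "coding" if cds_start <= pos <= cds_end else "utr"
    # Not in any exon, so the position lies in an intron: check how close
    # it is to the flanking exon boundaries.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if prev_end < pos < next_start:
            near_donor = pos - prev_end <= splice_window
            near_acceptor = next_start - pos <= splice_window
            return "splice_site" if near_donor or near_acceptor else "intronic"
    return "intronic"


exons = [(100, 200), (300, 400)]
for position in (50, 120, 160, 201, 250, 298):
    print(position, classify_variant(position, exons, cds_start=150, cds_end=350))
```

A real consequence calculation would also need strand handling, multiple transcripts per gene and the effect on the encoded amino acid, all of which are left out here.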

The genotyping of large numbers of individuals provides important information on the correlation of variation between individuals. This correlation, known as linkage disequilibrium (LD), is due to both the ancestry of individuals and the variability in recombination rates. These correlations are invaluable in both the design and the interpretation of human variation studies. We have precomputed the two common measures of pairwise linkage disequilibrium, r² and D′, for all pairs of SNPs less than 100 kb apart that have been genotyped in the Perlegen (12) and HapMap (13) populations. In theory this would generate over one billion pairwise LD values, but in many cases these values are low (and so uninteresting). We store only values where r² is >0.05, which results in around 135 million stored LD values. Calculating these correlations requires some additional estimation of the missing phase information, which we have achieved with a simple expectation maximization of the double heterozygote. These precomputed tables are invaluable for researchers who do not have access to large computational resources, but of course do not replace more sophisticated methods for variation analysis, e.g. haplotype reconstruction using Haploview (14).
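To make the calculation concrete, the sketch below shows one common way of estimating r² and D′ for a pair of biallelic SNPs from unphased genotypes, using expectation maximization only to resolve the phase of double heterozygotes. It is a generic textbook formulation written for illustration, not the code used to build the Ensembl tables. The function name `pairwise_ld`, the 0/1/2 genotype coding and the constant `R2_THRESHOLD` (set to the 0.05 cut-off quoted above) are assumptions of this example.

```python
from collections import Counter

R2_THRESHOLD = 0.05  # storage cut-off quoted in the text


def pairwise_ld(geno_a, geno_b, max_iter=200, tol=1e-10):
    """Return (r2, d_prime) for two biallelic SNPs, or None if either SNP
    is monomorphic. Genotypes are counts (0, 1 or 2) of the alternate allele."""
    if len(geno_a) != len(geno_b) or not geno_a:
        raise ValueError("genotype lists must be non-empty and of equal length")
    counts = Counter(zip(geno_a, geno_b))
    n_hap = 2 * len(geno_a)

    # Haplotype counts that are unambiguous (at most one heterozygous site).
    known = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    for (ga, gb), n in counts.items():
        if ga == 1 and gb == 1:
            continue  # phase unknown, handled by EM below
        for a, b in zip(alleles[ga], alleles[gb]):
            known[(a, b)] += n
    n_dh = counts.get((1, 1), 0)  # number of double heterozygotes

    # Start by splitting double heterozygotes evenly between the two phases.
    freq = {h: (known[h] + 0.5 * n_dh) / n_hap for h in known}
    for _ in range(max_iter):
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        new = {
            (0, 0): (known[(0, 0)] + n_dh * p_cis) / n_hap,
            (1, 1): (known[(1, 1)] + n_dh * p_cis) / n_hap,
            (0, 1): (known[(0, 1)] + n_dh * (1 - p_cis)) / n_hap,
            (1, 0): (known[(1, 0)] + n_dh * (1 - p_cis)) / n_hap,
        }
        delta = max(abs(new[h] - freq[h]) for h in freq)
        freq = new
        if delta < tol:
            break

    p_a = freq[(1, 0)] + freq[(1, 1)]  # alternate allele frequency, SNP A
    p_b = freq[(0, 1)] + freq[(1, 1)]  # alternate allele frequency, SNP B
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        return None
    d = freq[(1, 1)] - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


r2, d_prime = pairwise_ld([0, 0, 1, 2, 2, 1], [0, 0, 1, 2, 2, 1])
print(r2, d_prime, r2 > R2_THRESHOLD)
```

Only pairs passing the `r2 > R2_THRESHOLD` test would be kept under the storage rule described above.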

In addition we can efficiently store resequencing data, which is expected to become a larger source of polymorphism information in the future. For resequencing data we store both the individual variations (in the case of unphased data, as genotype calls) and the areas in which variation could have been observed. This latter ‘coverage’ information is crucial for understanding the potential variants between two individuals.

In the future we see increasing utility for these variation resources, in particular for the assessment of purifying selection on particular regions of the genome and in describing the potential functional variation between individuals or between laboratory strains.

Comparative genomics
The ability to calculate and display integrated comparative genomics resources has been an important part of Ensembl. We have extended the comparative genomics systems in two ways. First, we have the ability to calculate, store and visualize multiple alignments of genome sequence. This is achieved by having a general schema for storing multiple alignments which does not require any particular reference sequence for the alignment. We will publish details of this storage method in a subsequent paper. This schema can be populated by a combination of a genome-wide orthology mapper, such as Mercator, and a region based multiple alignment tool, such as MAVID (10) or Mlagan (15). We can also visualize the resulting alignment with annotations mapped on to a common coordinate system, as shown in Figure 2. Importantly this common coordinate system need not be any of the aligned genomes, but could, for example, be the hypothesized ancestral sequence.
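The notion of a common coordinate system that is independent of any one genome can be sketched with alignment columns: each column is a position in the shared system and maps to at most one base in every aligned genome. The Python example below is an illustration only. The function `column_to_positions`, the toy sequences and the start coordinates are hypothetical and do not describe the Ensembl schema.

```python
from typing import Dict, Optional


def column_to_positions(rows: Dict[str, str],
                        starts: Dict[str, int],
                        column: int) -> Dict[str, Optional[int]]:
    """Map a 0-based alignment column to a coordinate in each genome.

    rows maps a genome name to its gapped sequence ('-' marks a gap); all
    rows have equal length. starts gives the genomic coordinate of the first
    aligned base of each row. A genome with a gap at this column gets None.
    """
    positions: Dict[str, Optional[int]] = {}
    for name, seq in rows.items():
        if seq[column] == "-":
            positions[name] = None
        else:
            bases_before = len(seq[:column].replace("-", ""))
            positions[name] = starts[name] + bases_before
    return positions


rows = {"human": "ACG-TA", "mouse": "A-GCTA", "rat": "ACGCT-"}
starts = {"human": 1000, "mouse": 500, "rat": 42}
print(column_to_positions(rows, starts, 3))
```

Because no row is treated as the reference, annotation from any of the genomes can be placed on the column axis in the same way.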



 
Figure 2 A screenshot of the new alignslice view that is enabled by the multiple genome alignment. The top panel shows the human, rat and mouse genomes around the BRCA2 locus. The lower panel shows the base-pair alignment at the end of an exon (highlighted in the top panel by the central red box on human). In the base-pair view, exonic bases are blue and intronic bases are pink, with darker shades indicating conservation. Exon boundaries are highlighted with a red inverted L and SNPs are shown in red.

 
The gene-level comparative genomics resources have also been updated to be based around protein tree calculations rather than best reciprocal similarity relationships. This provides better coverage (in particular for deeper relationships, e.g. vertebrate to Drosophila), better resolution of paralogy events and a more consistent way to examine evolutionarily interesting events, e.g. positive selection detection via Ka/Ks studies.

Code portability and reuse
Ensembl not only provides a user-friendly website, but also provides a number of programmatic interfaces. The Ensembl system can be remotely installed on any UNIX-based system and many of its components can be extended or reused. To maximize the utility of Ensembl we have both improved documentation resources and labelled each API (application programming interface) function as ‘stable’, ‘moderate risk’ or ‘at risk’. We guarantee that stable functions will exist in our API with an unchanged function signature for at least 2 years. At risk functions are those which we know are likely to change in the future as they are under development. Currently we have 512 (82%) stable functions in our API.

The Ensembl pipeline is also improving in its modularity and documentation. There is extensive documentation for running the pipeline in the openly accessible CVS repository. We have successfully installed the Ensembl pipeline at a separate institute, Baylor College of Medicine, where it is currently being used for their own annotation needs. Our experience is that the most complicated aspects of running the Ensembl pipeline are the precise layout of the computer resources and the correct configuration of the analysis routines to use for a particular organism.

The Ensembl website has not only had improvements in usability but also has a far more flexible plug-in system in the HTML generation. This allows remote sites which have extended or adapted Ensembl far more control over their local pages, with the ability to override nearly any aspect of the Ensembl website with local plug-in scripts.

We are open towards collaborations; all of Ensembl is openly licensed and can be easily downloaded without any registration. In addition we are happy to host other bioinformatics researchers on site for people to rapidly learn or adapt Ensembl. Interested researchers should contact helpdesk{at}ensembl.org. For more general wet-laboratory usage we regularly organize courses at different institutions that can be tailored to the specific biological areas of interest to attendees.

Data mining interfaces
We deployed a full featured BioMart (16,17) for Ensembl in Spring 2005. BioMart is a data mining federation technology which was spun out from the main Ensembl group as it is appropriate for more than just Ensembl. The BioMart system allows easy query federation across Internet accessible BioMarts. Currently these include Ensembl, WormBase, UniProt and MSD. We expect many other Marts to be developed over time.

Web usability and web service integration
We investigated new layouts of the Ensembl pages to provide better discoverability of information. With the help of specific focus groups spanning a variety of scientific backgrounds, we settled on the new design with a context-dependent link bar to the left of the main pages. This bar suggests relevant ‘next links’ for investigation. We will continue to make reasonable changes to the web interface, aiming to make information about genomic regions and genes as intuitive as possible.

We have also continued to integrate with other resources using the distributed annotation system (DAS) protocol (18). We have reused the DAS protocol to work at both the protein and ‘gene’ levels, allowing remote sites to show features from their servers directly on Ensembl displays. The website uses the coordinate remapping features of the core Ensembl API to allow DAS sources provided on one coordinate system to be projected onto another. For example, this allows annotation on UniProt coordinates to be displayed on Ensembl protein pages, projected onto Ensembl peptide coordinates. The EU BioSapiens collaborative project (www.biosapiens.info), in which a large number of different groups are providing genome and protein sequence annotation, has adopted DAS and more than 50 sources are already available. With the Ensembl website's coordinate projection facilities, all this annotation, much of which is on UniProt coordinates, can be displayed in Ensembl as well as in other DAS clients (19).
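The following Python sketch shows the general shape of this workflow: features are read from a DAS-style features document and then shifted from one coordinate system to another using precomputed aligned runs. The XML is an invented sample laid out in the manner of a DAS 1.x features response, the accession `P00000` and the example.org URL are placeholders, and the functions `parse_features` and `project` are hypothetical helpers. None of this is the Ensembl API.

```python
import xml.etree.ElementTree as ET

# Invented sample document; element names follow the DAS 1.x features layout
# as understood here, and should be checked against the specification.
SAMPLE = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features">
    <SEGMENT id="P00000" start="1" stop="120">
      <FEATURE id="f1" label="domain A">
        <TYPE id="domain">domain</TYPE>
        <START>10</START>
        <END>40</END>
      </FEATURE>
      <FEATURE id="f2" label="site B">
        <TYPE id="site">site</TYPE>
        <START>95</START>
        <END>95</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Extract features as dictionaries with integer start and end."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "label": feature.get("label"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


def project(feature, blocks):
    """Shift a feature into the destination coordinate system.

    blocks is a list of (src_start, src_end, dst_start) ungapped aligned runs.
    Returns a new feature dictionary, or None if no run contains the feature.
    """
    for src_start, src_end, dst_start in blocks:
        if src_start <= feature["start"] and feature["end"] <= src_end:
            shift = dst_start - src_start
            return dict(feature,
                        start=feature["start"] + shift,
                        end=feature["end"] + shift)
    return None


# Hypothetical alignment between source and destination protein coordinates.
blocks = [(1, 50, 4), (90, 120, 100)]
for feat in parse_features(SAMPLE):
    print(project(feat, blocks))
```

Features that straddle an alignment gap are simply dropped in this sketch; a production client would need a policy for splitting or truncating them.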

With the increasing number of DAS sources it had become hard to keep track of them and their different coordinate systems. To address this, a DAS registry has been developed (http://das.sanger.ac.uk/registry/, A. Prlic et al., manuscript in preparation) as a central point where authors of DAS sources can register them. The Ensembl website is integrated with the registry, making it easier to attach DAS sources to Ensembl displays. DAS has also allowed Ensembl annotation data to be made available in other specialist DAS clients. For example, the SPICE DAS client (20) allows annotation on UniProt coordinates to be displayed on protein 3D structures (Figure 3).



 
Figure 3 The integration between Ensembl and the DAS protein 3D structure viewer SPICE is shown. The proteinview page of Ensembl shows the beta-globin gene HBB on chromosome 11. One of the non-synonymous SNPs is the sickle cell mutation at residue 7 (glutamic acid to valine). The PDB_spice DAS track shows a link to the PDB entry 1A3N [PDB] chain B. In the SPICE window, which was opened by clicking on this track, the four chain structure of haemoglobin is shown on the left. The DAS annotations for the selected chain (B) are shown on the right. The uniprot_exon SNP DAS source is selected and the six SNPs are highlighted in the sequence of the chain (bottom right) and shown in the structure (dark green side chains with yellow highlights). Holding the mouse over residues in the structure panel shows the position of residue 7. Ensembl exposes its precalculated alignments between UniProt and Ensembl gene annotation as DAS sources (uniprot_exon).

 

    CONCLUSIONS
 
Ensembl continues to grow in size, quality and functionality. Fundamentally, our main goal of making large vertebrate genomes useful to the scientific community has not changed, but the number of species, depth of analysis and usability of our systems are constantly improving. We are looking forward to new resources such as ChIP-chip datasets and proteomic resources, many of which are in successful pilot phases (11,21). Overall we provide a robust and accurate database of information on chordate genomes, aimed at enabling other groups to exploit these genomes to the full.


    ACKNOWLEDGEMENTS
 
The Ensembl project is principally funded by the Wellcome Trust with additional funding from EMBL, NIH-NIAID and BBSRC. We are grateful to users of our website and the developers on our mailing lists for much useful feedback and discussion. We would like to thank the anonymous reviewers for their comments on this paper. Funding to pay the Open Access publication charges for this article was provided by The Wellcome Trust.

Conflict of interest statement. None declared.


    REFERENCES
 

  1. Hubbard, T., Andrews, D., Caccamo, M., Cameron, G., Chen, Y., Clamp, M., Clarke, L., Coates, G., Cox, T., Cunningham, F., et al. (2005) Ensembl 2005. Nucleic Acids Res., 33, D447–D453.

  2. Curwen, V., Eyras, E., Andrews, T.D., Clarke, L., Mongin, E., Searle, S.M., Clamp, M. (2004) The Ensembl automatic gene annotation system. Genome Res., 14, 942–950.

  3. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.

  4. Torrents, D., Suyama, M., Zdobnov, E., Bork, P. (2003) A genome-wide survey of human pseudogenes. Genome Res., 13, 2559–2567.

  5. Zhang, Z. and Gerstein, M. (2004) Large-scale analysis of pseudogenes in the human genome. Curr. Opin. Genet. Dev., 14, 328–335.

  6. Furey, T.S., Diekhans, M., Lu, Y., Graves, T.A., Oddy, L., Randall-Maher, J., Hillier, L.W., Wilson, R.K., Haussler, D. (2004) Analysis of human mRNAs with the reference genome sequence reveals potential errors, polymorphisms, and RNA editing. Genome Res., 14, 2034–2040.

  7. Birney, E., Clamp, M., Durbin, R. (2004) GeneWise and Genomewise. Genome Res., 14, 988–995.

  8. Slater, G.S. and Birney, E. (2005) Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics, 6, 31.

  9. Ashurst, J.L. and Collins, J.E. (2003) Gene annotation: prediction and testing. Annu. Rev. Genomics Hum. Genet., 4, 69–88.

  10. Enright, A.J., John, B., Gaul, U., Tuschl, T., Sander, C., Marks, D.S. (2003) MicroRNA targets in Drosophila. Genome Biol., 5, R1.

  11. The ENCODE Project Consortium. (2004) The ENCODE (ENCyclopedia Of DNA Elements) Project. Science, 306, 636–640.

  12. Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A., Cox, D.R. (2005) Whole-genome patterns of common DNA variation in three human populations. Science, 307, 1072–1079.

  13. The International HapMap Consortium. (2003) The International HapMap Project. Nature, 426, 789–796.

  14. Barrett, J.C., Fry, B., Maller, J., Daly, M.J. (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 263–265.

  15. Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., Batzoglou, S. (2003) LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res., 13, 721–731.

  16. Kasprzyk, A., Keefe, D., Smedley, D., London, D., Spooner, W., Melsopp, C., Hammond, M., Rocca-Serra, P., Cox, T., Birney, E. (2004) EnsMart: a generic system for fast and flexible access to biological data. Genome Res., 14, 160–169.

  17. Durinck, S., Moreau, Y., Kasprzyk, A., Davis, S., De Moor, B., Brazma, A., Huber, W. (2005) BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics, 21, 3439–3440.

  18. Dowell, R.D., Jokerst, R.M., Day, A., Eddy, S.R., Stein, L. (2001) The distributed annotation system. BMC Bioinformatics, 2, 7.

  19. Jones, P., Vinod, N., Down, T., Hackmann, A., Kahari, A., Kretschmann, E., Quinn, A., Wieser, D., Hermjakob, H., Apweiler, R. (2005) Dasty and UniProt DAS: a perfect pair for protein feature visualization. Bioinformatics, 21, 3198–3199.

  20. Prlic, A., Down, T., Hubbard, T.J.P. (2005) Adding some SPICE to DAS. Bioinformatics, 21 (Suppl. 2), ii40–ii41.

  21. Desiere, F., Deutsch, E.W., Nesvizhskii, A.I., Mallick, P., King, N.L., Eng, J.K., Aderem, A., Boyle, R., Brunner, E., Donohoe, S., et al. (2005) Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol., 6, R9.




This article has been cited by other articles:


Home page
Nucleic Acids ResHome page
P. Flicek, B. L. Aken, B. Ballester, K. Beal, E. Bragin, S. Brent, Y. Chen, P. Clapham, G. Coates, S. Fairley, et al.
Ensembl's 10th year
Nucleic Acids Res., November 11, 2009; (2009) gkp972v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. Frenkel-Morgenstern, A. A. Cohen, N. Geva-Zatorsky, E. Eden, J. Prilusky, I. Issaeva, A. Sigal, C. Cohen-Saidon, Y. Liron, L. Cohen, et al.
Dynamic Proteomics: a database for dynamics and localizations of endogenous fluorescently-tagged proteins in living human cells
Nucleic Acids Res., October 9, 2009; (2009) gkp808v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
R. S. Datta, C. Meacham, B. Samad, C. Neyer, and K. Sjolander
Berkeley PHOG: PhyloFacts orthology group prediction web server
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W84 - W89.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. V. Antonov, S. Dietmann, P. Wong, D. Lutter, and H. W. Mewes
GeneSet2miRNA: finding the signature of cooperative miRNA activities in the gene lists
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W323 - W328.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
D. N. Messina and E. L. L. Sonnhammer
DASher: a stand-alone protein sequence client for DAS, the Distributed Annotation System
Bioinformatics, May 15, 2009; 25(10): 1333 - 1334.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
H. G. Roider, T. Manke, S. O'Keeffe, M. Vingron, and S. A. Haas
PASTAA: identifying transcription factors associated with sets of co-regulated genes
Bioinformatics, February 15, 2009; 25(4): 435 - 442.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
N. E. Davey, D. C. Shields, and R. J. Edwards
Masking residues using context-specific evolutionary conservation significantly improves short linear motif discovery
Bioinformatics, February 15, 2009; 25(4): 443 - 450.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
T. Juliusdottir, F. Pettersson, and R. R. Copley
POPE--a tool to aid high-throughput phylogenetic analysis
Bioinformatics, December 1, 2008; 24(23): 2778 - 2779.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. A. A. Castro, R. J. S. Dalmolin, J. C. F. Moreira, J. C. M. Mombach, and R. M. C. de Almeida
Evolutionary origins of human apoptosis and genome-stability gene networks
Nucleic Acids Res., November 1, 2008; 36(19): 6269 - 6283.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
D.-Q. Nguyen, C. Webber, J. Hehir-Kwa, R. Pfundt, J. Veltman, and C. P. Ponting
Reduced purifying selection prevails over positive selection in human copy number variant evolution
Genome Res., November 1, 2008; 18(11): 1711 - 1723.
[Abstract] [Full Text] [PDF]


Home page
Mol. Cell. ProteomicsHome page
Y. Quan, Z.-L. Ji, X. Wang, A. M. Tartakoff, and T. Tao
Evolutionary and Transcriptional Analysis of Karyopherin {beta} Superfamily Proteins
Mol. Cell. Proteomics, July 1, 2008; 7(7): 1254 - 1269.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
S. Wong and M. A. Ragan
MACHOS: Markov clusters of homologous subsequences
Bioinformatics, July 1, 2008; 24(13): i77 - i85.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. V. Antonov, T. Schmidt, Y. Wang, and H. W. Mewes
ProfCom: a web tool for profiling the complex functionality of gene groups identified from high-throughput data
Nucleic Acids Res., July 1, 2008; 36(suppl_2): W347 - W351.
[Abstract] [Full Text] [PDF]


Home page
Mol. Cell. Biol.Home page
P. Hatzis, L. G. van der Flier, M. A. van Driel, V. Guryev, F. Nielsen, S. Denissov, I. J. Nijman, J. Koster, E. E. Santo, W. Welboren, et al.
Genome-Wide Pattern of TCF7L2/TCF4 Chromatin Occupancy in Colorectal Cancer Cells
Mol. Cell. Biol., April 15, 2008; 28(8): 2732 - 2744.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
S. Nakagawa, Y. Niimura, T. Gojobori, H. Tanaka, and K.-i. Miura
Diversity of preferred nucleotide sequences around the translation initiation codon in eukaryote genomes
Nucleic Acids Res., February 11, 2008; 36(3): 861 - 871.
[Abstract] [Full Text] [PDF]


Home page
Mol. Cell. Biol.Home page
L. Lee, D. R. Campagna, J. L. Pinkus, H. Mulhern, T. A. Wyatt, J. H. Sisson, J. A. Pavlik, G. S. Pinkus, and M. D. Fleming
Primary Ciliary Dyskinesia in Mice Lacking the Novel Ciliary Protein Pcdp1
Mol. Cell. Biol., February 1, 2008; 28(3): 949 - 957.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
J. Huerta-Cepas, A. Bueno, J. Dopazo, and T. Gabaldon
PhylomeDB: a database for genome-wide collections of gene phylogenies
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D491 - D496.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. Matsuya, R. Sakate, Y. Kawahara, K. O. Koyanagi, Y. Sato, Y. Fujii, C. Yamasaki, T. Habara, H. Nakaoka, F. Todokoro, et al.
Evola: Ortholog database of all human genes in H-InvDB with manual curation of phylogenetic trees
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D787 - D792.


P. Flicek, B. L. Aken, K. Beal, B. Ballester, M. Caccamo, Y. Chen, L. Clarke, G. Coates, F. Cunningham, T. Cutts, et al.
Ensembl 2008
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D707 - D714.


P. M. Kim, J. O. Korbel, and M. B. Gerstein
Positive selection at the protein network periphery: Evaluation in terms of structural constraints and cellular context
PNAS, December 18, 2007; 104(51): 20274 - 20279.


A. Heger and C. P. Ponting
Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes
Genome Res., December 1, 2007; 17(12): 1837 - 1849.


H. K. Saini, S. Griffiths-Jones, and A. J. Enright
Genomic analysis of human microRNA transcripts
PNAS, November 6, 2007; 104(45): 17719 - 17724.


A. Heger and C. P. Ponting
Variable Strength of Translational Selection Among 12 Drosophila Species
Genetics, November 1, 2007; 177(3): 1337 - 1348.


G. Spudich, X. M. Fernandez-Suarez, and E. Birney
Genome browsing with Ensembl: a practical overview
Brief Funct Genomic Proteomic, October 29, 2007; (2007) elm025v1.


P. Khuu, M. Sandor, J. DeYoung, and P. S. Ho
Phylogenomic analysis of the emergence of GC-rich transcription elements
PNAS, October 16, 2007; 104(42): 16528 - 16533.


Y. Nakatani, H. Takeda, Y. Kohara, and S. Morishita
Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates
Genome Res., September 1, 2007; 17(9): 1254 - 1265.


P. Fisher, C. Hedeler, K. Wolstencroft, H. Hulme, H. Noyes, S. Kemp, R. Stevens, and A. Brass
A systematic strategy for large-scale analysis of genotype phenotype correlations: identification of candidate genes involved in African trypanosomiasis
Nucleic Acids Res., August 20, 2007; (2007) gkm623v1.


M. Laakso, S. Tuupanen, A. Karhu, R. Lehtonen, L. A. Aaltonen, and S. Hautaniemi
Computational identification of candidate loci for recessively inherited mutation using high-throughput SNP arrays
Bioinformatics, August 1, 2007; 23(15): 1952 - 1961.


A. Kerhornou and R. Guigo
BioMoby web services to support clustering of co-regulated genes based on similarity of promoter configurations
Bioinformatics, July 15, 2007; 23(14): 1831 - 1833.


C. D. Schmid, T. Sengstag, P. Bucher, and M. Delorenzi
MADAP, a flexible clustering tool for the interpretation of one-dimensional genome annotation data
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W201 - W205.


G. Roma, G. Cobellis, P. Claudiani, F. Maione, P. Cruz, G. Tripoli, M. Sardiello, I. Peluso, and E. Stupka
A novel view of the transcriptome revealed from gene trapping in mouse embryonic stem cells
Genome Res., July 1, 2007; 17(7): 1051 - 1060.


R. D. Finn, J. W. Stalker, D. K. Jackson, E. Kulesha, J. Clements, and R. Pettett
ProServer: a simple, extensible Perl DAS server
Bioinformatics, June 15, 2007; 23(12): 1568 - 1570.


M. Taylor, W. Valdar, A. Kumar, J. Flint, and R. Mott
Management, presentation and interpretation of genome scans using GSCANDB
Bioinformatics, June 15, 2007; 23(12): 1545 - 1549.


C. Huebner, I. Petermann, B. L. Browning, A. N. Shelling, and L. R. Ferguson
Triallelic Single Nucleotide Polymorphisms and Genotyping Error in Genetic Epidemiology Studies: MDR1 (ABCB1) G2677/T/A as an Example
Cancer Epidemiol. Biomarkers Prev., June 1, 2007; 16(6): 1185 - 1192.


C. M. Koch, R. M. Andrews, P. Flicek, S. C. Dillon, U. Karaoz, G. K. Clelland, S. Wilcox, D. M. Beare, J. C. Fowler, P. Couttet, et al.
The landscape of histone modifications across 1% of the human genome in five human cell lines
Genome Res., June 1, 2007; 17(6): 691 - 707.


J. S. Rozowsky, D. Newburger, F. Sayward, J. Wu, G. Jordan, J. O. Korbel, U. Nagalakshmi, J. Yang, D. Zheng, R. Guigo, et al.
The DART classification of unannotated transcription within the ENCODE regions: Associating transcription with known and novel loci
Genome Res., June 1, 2007; 17(6): 732 - 745.


J. Ponjavic, C. P. Ponting, and G. Lunter
Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs
Genome Res., May 1, 2007; 17(5): 556 - 565.


H. Kikuta, M. Laplante, P. Navratilova, A. Z. Komisarczuk, P. G. Engstrom, D. Fredman, A. Akalin, M. Caccamo, I. Sealy, K. Howe, et al.
Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates
Genome Res., May 1, 2007; 17(5): 545 - 555.


K. J. Gaulton, K. L. Mohlke, and T. J. Vision
A computational system to select candidate genes for complex human traits
Bioinformatics, May 1, 2007; 23(9): 1132 - 1140.


M. A. A. Castro, J. C. M. Mombach, R. M. C. de Almeida, and J. C. F. Moreira
Impaired expression of NER gene network in sporadic solid tumors
Nucleic Acids Res., March 19, 2007; 35(6): 1859 - 1867.


M. R. de la Vega, R. G. Sevilla, A. Hermoso, J. Lorenzo, S. Tanco, A. Diez, L. D. Fricker, J. M. Bautista, and F. X. Aviles
Nna1-like proteins are active metallocarboxypeptidases of a new and diverse M14 subfamily
FASEB J, March 1, 2007; 21(3): 851 - 865.


M. A. Saunders, H. Liang, and W.-H. Li
Human polymorphism at microRNAs and microRNA target sites
PNAS, February 27, 2007; 104(9): 3300 - 3305.


M. R. Miller, J. P. Dunham, A. Amores, W. A. Cresko, and E. A. Johnson
Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers
Genome Res., February 1, 2007; 17(2): 240 - 248.


S. J. Cooper, N. D. Trinklein, L. Nguyen, and R. M. Myers
Serum response factor binding sites differ in three human cell types
Genome Res., February 1, 2007; 17(2): 136 - 144.


S. L. Kuslak, J. L. Thielen, and P. C. Marker
The mouse seminal vesicle shape mutation is allelic with Fgfr2
Development, February 1, 2007; 134(3): 557 - 565.


M. M. Hoffman and E. Birney
Estimating the Neutral Rate of Nucleotide Substitution Using Introns
Mol. Biol. Evol., February 1, 2007; 24(2): 522 - 531.


S.-Y. Tsang, S.-K. Ng, Z. Xu, and H. Xue
The Evolution of GABAA Receptor-Like Genes
Mol. Biol. Evol., February 1, 2007; 24(2): 599 - 610.


D. Liu, J. M. Brockman, B. Dass, L. N. Hutchins, P. Singh, J. R. McCarrey, C. C. MacDonald, and J. H. Graber
Systematic variation in mRNA 3'-processing signals during mouse spermatogenesis
Nucleic Acids Res., January 12, 2007; 35(1): 234 - 246.


J. Barthelmes, C. Ebeling, A. Chang, I. Schomburg, and D. Schomburg
BRENDA, AMENDA and FRENDA: the enzyme information system in 2007
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D511 - D514.


N. Kim, A. V. Alekseyenko, M. Roy, and C. Lee
The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D93 - D98.


P. Groth, N. Pavlova, I. Kalev, S. Tonov, G. Georgiev, H.-D. Pohlenz, and B. Weiss
PhenomicDB: a new cross-species genotype/phenotype resource
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D696 - D699.


J. Ruan, Y. Guo, H. Li, Y. Hu, F. Song, X. Huang, K. Kristiensen, L. Bolund, and J. Wang
PigGIS: Pig Genomic Informatics System
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D654 - D657.


D. Wilson, M. Madera, C. Vogel, C. Chothia, and J. Gough
The SUPERFAMILY database in 2007: families and functions
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D308 - D313.


R. M. Kuhn, D. Karolchik, A. S. Zweig, H. Trumbower, D. J. Thomas, A. Thakkapallayil, C. W. Sugnet, M. Stanke, K. E. Smith, A. Siepel, et al.
The UCSC genome browser database: update 2007
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D668 - D673.


C. M. Smith, J. H. Finger, T. F. Hayamizu, I. J. McCright, J. T. Eppig, J. A. Kadin, J. E. Richardson, and M. Ringwald
The mouse Gene Expression Database (GXD): 2007 update
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D618 - D623.


H. Parkinson, M. Kapushesky, M. Shojatalab, N. Abeygunawardena, R. Coulson, A. Farne, E. Holloway, N. Kolesnykov, P. Lilja, M. Lukk, et al.
ArrayExpress--a public database of microarray experiments and gene expression profiles
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D747 - D750.


S. Li, L. Ma, H. Li, S. Vang, Y. Hu, L. Bolund, and J. Wang
Snap: an integrated SNP annotation platform
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D707 - D710.


A. Gattiker, C. Niederhauser-Wiederkehr, J. Moore, L. Hermida, and M. Primig
The GermOnline cross-species systems browser provides comprehensive information on genes and gene products relevant for sexual reproduction
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D457 - D462.


D. Lawson, P. Arensburger, P. Atkinson, N. J. Besansky, R. V. Bruggner, R. Butler, K. S. Campbell, G. K. Christophides, S. Christley, E. Dialynas, et al.
VectorBase: a home for invertebrate vectors of human pathogens
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D503 - D505.


T. J. P. Hubbard, B. L. Aken, K. Beal, B. Ballester, M. Caccamo, Y. Chen, L. Clarke, G. Coates, F. Cunningham, T. Cutts, et al.
Ensembl 2007
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D610 - D617.


S. N. Twigger, M. Shimoyama, S. Bromberg, A. E. Kwitek, H. J. Jacob, and the RGD Team
The Rat Genome Database, update 2007--Easing the path from disease to data and back again
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D658 - D662.


K. Hervold, A. Martin, R. A. Kirkpatrick, P. F. Mc Kenna, and F. A. Ramirez-Weber
Hedgehog Signaling Pathway Database: a repository of current annotation efforts and resources for the Hh research community
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D595 - D598.


M. A. Bogue, S. C. Grubb, T. P. Maddatu, and C. J. Bult
Mouse Phenome Database (MPD)
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D643 - D649.


E. M. Hulbert, L. J. Smink, E. C. Adlem, J. E. Allen, D. B. Burdick, O. S. Burren, C. C. Cavnor, G. E. Dolman, D. Flamez, K. F. Friery, et al.
T1DBase: integration and presentation of complex data for type 1 diabetes research
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D742 - D746.


T. A. Eyre, M. W. Wright, M. J. Lush, and E. A. Bruford
HCOP: a searchable database of human orthology predictions
Brief Bioinform, January 1, 2007; 8(1): 2 - 5.


J. Cao, J.-L. Li, D. Li, J. F. Tobin, and R. E. Gimeno
Molecular identification of microsomal acyl-CoA:glycerol-3-phosphate acyltransferase, a key enzyme in de novo triacylglycerol synthesis
PNAS, December 26, 2006; 103(52): 19695 - 19700.


L. Elnitski, V. X. Jin, P. J. Farnham, and S. J. M. Jones
Locating mammalian transcription factor binding sites: A survey of computational and experimental techniques
Genome Res., December 1, 2006; 16(12): 1455 - 1464.


E. J. Vallender, J. E. Paschall, C. M. Malcom, B. T. Lahn, and G. J. Wyckoff
SPEED: a molecular-evolution-based database of mammalian orthologous groups
Bioinformatics, November 15, 2006; 22(22): 2835 - 2837.


W. H. Press and H. Robins
Isochores Exhibit Evidence of Genes Interacting With the Large-Scale Genomic Environment
Genetics, October 1, 2006; 174(2): 1029 - 1040.

