Nucleic Acids Research Advance Access published online on November 13, 2007
Nucleic Acids Research, doi:10.1093/nar/gkm988
Database Issue
Ensembl 2008
¹European Bioinformatics Institute (EMBL-EBI) and ²Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
*To whom correspondence should be addressed. Tel: +44 1223 492581; Fax: +44 1223 494468; Email: flicek{at}ebi.ac.uk
Received September 15, 2007. Revised October 18, 2007. Accepted October 19, 2007.
ABSTRACT
The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases and other information for chordate and selected model organism and disease vector genomes. As of release 47 (October 2007), Ensembl fully supports 35 species, with preliminary support for six additional species. New species in the past year include platypus and horse. Major additions and improvements to Ensembl since our previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein–DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing. We have also introduced new comparative genomics-based data mining options and report on the continued development of our software infrastructure.
INTRODUCTION
The availability of complete genome sequences for an increasing number of chordates has had a dramatic impact on biomedical research in the 21st century. Now 7 years beyond the initial publications of the draft human genome sequence (1,2), both the number of sequenced genomes and the total amount of genome-wide data that can be naturally organized on the genome sequence continue to increase rapidly. The Ensembl project provides a comprehensive genome information system consisting of data storage, integration, analysis and visualization of a wide variety of biological data. In comparison to similar projects based at the University of California Santa Cruz (3) and the National Center for Biotechnology Information (4), the distinguishing characteristics of the Ensembl project include:
- The Ensembl genome browser, available at http://www.ensembl.org, providing visualization for our own and collaborators' genome annotations, alignments, variation and functional genomics data and supporting additional data integration through the DAS protocol.
- Ensembl gene sets created using an automated analysis pipeline that has been significantly optimized based on the completeness of the genome sequence and the availability of species-specific supporting data.
- The Ensembl application programming interface (API) that allows programmatic access to all of our data sets including annotations, genomic alignments and variation data.
- BioMart data mining tools, which support sophisticated Ensembl-specific queries and federated queries with other BioMart-compliant data resources.
- An entirely open resource with all of our code and data freely available to all users.
Ensembl generally releases updates six times each year in February, April, June, August, October and December. Specific data updates are driven by the availability of new or updated genome sequence assemblies, significant increases in supporting evidence for genome annotations, updated releases of major external data sets [such as dbSNP (5)] that are incorporated into Ensembl, and new biological data resources such as protein–DNA interaction maps based on genome-wide ChIP-chip and ChIP-seq data sets. Each new Ensembl release may also include new data visualization options and improvements to the underlying software infrastructure.
This report lists only some of the new features, new data and other improvements that we have added to Ensembl since our last report (7). Users interested in the most up-to-date details of the Ensembl project should visit the Ensembl main page (http://www.ensembl.org) and follow the What's new link and/or subscribe to the low-volume Ensembl announce mailing list by sending an email with subscribe ensembl-announce as the message body to majordomo{at}ebi.ac.uk. Other information about Ensembl features is available on the Ensembl help pages or by email at helpdesk{at}ensembl.org.
RESULTS
Ensembl regulatory build
The Ensembl regulatory build is designed to automatically annotate all of the functional regulatory regions in the genome and assign putative functions to as many of these regions as possible. The initial release of the Ensembl regulatory build, in June 2007, integrated eight genome-wide data sets, mainly in pre-publication resource status, to identify 110 000 regulatory features across the human genome. Briefly, the integration procedure starts with likely regulatory regions (such as DNase I hypersensitive sites) and seeks to identify the function of each site by analysing specific patterns of histone modification immediately adjacent to the region. We identified a number of patterns highly enriched for gene starts, genic regions and distal regions. Ensembl regulatory features are displayed on ContigView (Figure 1).
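The integration idea described above can be illustrated with a small sketch. This is not the Ensembl regulatory build code: the function name, the 500 bp flank and the example coordinates are assumptions made purely for illustration. Given a set of focus regions (for example DNase I hypersensitive sites), it reports which histone modification marks overlap or lie immediately adjacent to each region.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, on a single chromosome


def adjacent_mark_patterns(
    focus_regions: List[Interval],
    marks: Dict[str, List[Interval]],
    flank: int = 500,  # hypothetical window size, not taken from the paper
) -> List[Tuple[Interval, Tuple[str, ...]]]:
    """For each focus region, list the marks overlapping the flanked region."""
    results = []
    for start, end in focus_regions:
        lo, hi = start - flank, end + flank
        present = sorted(
            name
            for name, intervals in marks.items()
            if any(s <= hi and e >= lo for s, e in intervals)
        )
        results.append(((start, end), tuple(present)))
    return results


# Invented example data, for illustration only.
focus = [(10_000, 10_200), (50_000, 50_150)]
marks = {
    "H3K4me3": [(9_800, 10_400)],
    "H3K36me3": [(50_500, 52_000), (80_000, 81_000)],
}
patterns = adjacent_mark_patterns(focus, marks)
```

Each resulting tuple of mark names could then be treated as a pattern and tested for enrichment near gene starts, genic regions or distal regions; that step is not shown here.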
Functional genomics database
As noted above, the Ensembl Functional Genomics Database is the fourth species-specific database that is part of the standard Ensembl release. The Functional Genomics Database and its associated API provide a platform for the storage, analysis and visualization of array-based functional genomics data. We have created an initial infrastructure for analysis of these data based on the Ensembl analysis pipeline (8). This structure supports the modular incorporation of analysis tools dedicated to various aspects of tiling array analysis such as normalization and platform-specific hit identification.
The database is currently used to support the Ensembl regulatory build (see above) and the display of ChIP-chip data and analysis within Ensembl (Figure 2). The database and API feature a fully automated data import structure, an extensible array model and support for the Tab2MAGE metadata format (9). Additionally, the database is designed for deployment in external research laboratories and supports local data processing and visualization through DAS.
Ensembl customization: user accounts and groups
The major new Ensembl website functionality over the past year is the addition of user and group accounts. These accounts enable users to create bookmarks, customize their Ensembl interface and share their bookmarks and configurations with other users in an Ensembl group. We note, importantly, that all Ensembl data is equally accessible to users whether or not they create a user account.
Ensembl user accounts are designed to personalize the Ensembl interface. As the number of data tracks in Ensembl has grown, the default visualization settings are not ideal for every user. For example, some users may be interested in displaying only the Ensembl genes track together with mapping of gene expression arrays and SNP locations, while other users may want a display consisting of constrained elements, RNA genes, the underlying clone tilepath, or any of more than one hundred available data tracks. These personalized interfaces can now be saved and shared through Ensembl accounts.
Ensembl Groups have several functions. The primary function is to share configurations, bookmarks, or notes with other members of the group. Single users can also create groups as virtual folders to organize bookmarks, configurations and notes based on separate projects. Groups may be created and administered by any user with an Ensembl account. Group administrators can invite anyone to join their group and users can be members of several groups simultaneously. All group members must also have Ensembl accounts.
Notes are currently supported on GeneView pages and give users the option of creating their own annotations and having these integrated into the web display. Notes will be added to other pages in the future.
New species and improved gene annotations
The Ensembl website currently displays data for 41 species. In the past year, we have added data for seven new high coverage genomes and generated updated gene sets for eight species. Previously, we reported that four low-coverage (2x) genome gene sets were available with five more underway (7). During this year we have finished both the gene sets in progress and sets for an additional five species [Spermophilus tridecemlineatus (squirrel), Tupaia belangeri (tree shrew), Cavia porcellus (guinea pig), Microcebus murinus (mouse lemur), Ochotona princeps (pika)]. This set of 14 low-coverage annotated genome sequences provides an extensive resource for mammalian comparative genomics.
We have continued the CCDS (Consensus Coding Sequence) collaboration with the Sanger Institute's Havana group (http://www.sanger.ac.uk/HGP/havana/), UCSC (3) and NCBI (4). CCDS is a stable set of protein-coding gene structures on which all consortium members agree to the base pair. We have released an update to the set that includes 18 290 CDSs from 16 003 genes. This is a substantial improvement in gene coverage over the previous set, which contained 14 795 CDSs from 13 142 genes. A CCDS set has also been generated for mouse, which includes 13 374 CDSs from 13 014 genes. Further updates to CCDS sets are in progress based on new human and mouse Ensembl gene builds, RefSeq (10) builds and Havana annotation. Additional details regarding the CCDS project are available from http://www.ncbi.nlm.nih.gov/CCDS/.
The Ensembl gene build process is based on alignments of protein and cDNA sequences. To produce a high-quality gene set, it is crucial to maximize the value of species-specific sequence data and ensure the suitability of all input sequences. In light of this, we have made improvements to several stages of the automatic annotation process. Improved use of species-specific sequences primarily addressed gene models characterized by a short first CDS exon followed by a long (>10 000 bp) intron as well as those with non-GT–AG splice sites. Using standard GeneWise (11) parameters, neither case was predicted well by the Ensembl pipeline. To address these cases, we now run GeneWise with two different parameter sets and also run exonerate (12), a faster alignment algorithm more suited to the longer genomic sequences required for accurate long intron prediction. The results of these three analyses for each protein are compared and the best gene prediction is chosen on the basis of a set of rules including percentage identity of the model to the original protein. Using this improved method, the percentage of RefSeq genes for which we produce at least one identical CDS model increased from 78% to 88% and for Havana genes from 79% to 88%. We have also improved the quality of the input sequence data by a careful filtering process that identifies anomalous sequences such as chimeric cDNAs, cDNAs with retained introns and viral proteins, and protein sequences derived from repeats. For example, we remove from our input sequence data all of the cDNAs annotated as chimeric by the Mammalian Gene Collection (13). Removing these protein and cDNA sequences from the Ensembl gene build input reduced artefactual gene merging and overprediction.
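The selection step among the three alignment analyses can be sketched as follows. This is a minimal illustration, not the Ensembl pipeline's rule set: the text names percentage identity as one rule, while the coverage tie-breaker, the method labels and the example values are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Prediction:
    method: str              # label for the analysis that produced the model
    percent_identity: float  # identity of the model to the source protein
    percent_coverage: float  # hypothetical tie-breaker, not named in the text


def choose_best(
    predictions: List[Prediction], min_identity: float = 0.0
) -> Optional[Prediction]:
    """Return the prediction with the highest identity, then highest coverage."""
    candidates = [p for p in predictions if p.percent_identity >= min_identity]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p.percent_identity, p.percent_coverage))


# Invented values for one protein aligned by three analyses.
models = [
    Prediction("genewise_default", 91.0, 80.0),
    Prediction("genewise_alternative", 97.5, 95.0),
    Prediction("exonerate", 97.5, 99.0),
]
best = choose_best(models)
```

Ordering candidates by a tuple keeps the rule list explicit and easy to extend; a real rule set would presumably include further criteria.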
Two other notable gene build improvements represent incorporation of information not previously used by Ensembl. The first development concerns UTRs that are added from cDNAs, when the cDNA exon boundaries match those from the protein model. Often there is a choice of possible cDNAs with differing UTRs. We are now prioritizing these cDNA choices on whether they match the boundaries of paired end tags (ditags) experimentally derived from the starts and ends of cDNAs, providing a second source of evidence to accurately determine UTR boundaries. We are mapping ditag sequences from the Genome Institute of Singapore and from the Fantom project for human and mouse (14–16). The second enhancement is specific to immunoglobulin segments, which present problems for standard gene prediction methods because the somatic rearrangements of gene segment clusters make complete cDNAs difficult to align. We now align annotated segments from the IMGT database (17) for mouse and human. The predictions based on these replace any overlapping gene models produced by the standard Ensembl pipeline in the immunoglobulin gene clusters.
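The ditag-based prioritization of cDNAs can be pictured with a short sketch. It assumes each ditag has been mapped to a start and end coordinate and counts how many ditags agree with each candidate cDNA's boundaries. The 20 bp tolerance, the type and function names and the coordinates are hypothetical and are not taken from the Ensembl pipeline.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    name: str
    start: int
    end: int


def ditag_support(cdna: Span, ditags: List[Span], tolerance: int = 20) -> int:
    """Count ditags whose start and end both lie near the cDNA boundaries."""
    return sum(
        1
        for tag in ditags
        if abs(tag.start - cdna.start) <= tolerance
        and abs(tag.end - cdna.end) <= tolerance
    )


def rank_cdnas(cdnas: List[Span], ditags: List[Span], tolerance: int = 20) -> List[Span]:
    """Order candidate cDNAs from most to least ditag support."""
    return sorted(
        cdnas, key=lambda c: ditag_support(c, ditags, tolerance), reverse=True
    )


# Invented coordinates: two candidate cDNAs with different UTR extents.
cdnas = [Span("cdna_A", 1000, 5000), Span("cdna_B", 950, 5400)]
ditags = [Span("tag1", 955, 5395), Span("tag2", 948, 5410), Span("tag3", 1005, 4990)]
ranked = rank_cdnas(cdnas, ditags)
```

In this example cdna_B is supported by two ditags and cdna_A by one, so cdna_B would be preferred as the source of UTR boundaries.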
New gene builds in 2007 included updates to both human and mouse, which both benefited from the methodological improvements described above. For the case of mouse, the new gene build was in support of the newly released NCBI build 37 genome assembly, while the updated human gene build incorporates the latest Havana manual annotation set.
Resequencing data: new resources and visualization
New sequencing technologies are expected to make whole genome resequencing feasible on a large scale (18,19). The genome sequence for a single individual is already available using previous generation sequencing technology (20). We recently reported on TranscriptSNPView, a transcript-based visualization for resequencing data and our SSAHA-based (21) alignment of resequencing reads to the mouse genome (22). We have extended this technique and TranscriptSNPView over the past year to include resequenced human individuals and rat strains. This year we have developed additional resources for analysis and visualization of resequencing data. The new SequenceAlignView (Figure 3) displays the reference genome sequence together with the genome sequence of individuals (or strains in the case of mouse and rat). With this view, the exact sequence of the individual can be quickly determined and the differences between the sequenced individual and the reference genome assembly highlighted. Resequencing data is also provided in structured EMF (Ensembl Multi-Format) text files. On our FTP site, users doing comparative genomics will also find EMF files available for multiple sequence alignments.
DAS extensions
Ensembl continues to make extensive use of the DAS protocol (23). During this year, we have released two new DAS resources. Previously, we extended the Ensembl genome browser with DAS client functionality, which allows researchers around the world to remotely host data sources and view these on major Ensembl displays including CytoView, ContigView, GeneView and ProtView (24). This year, we extended our client visualization support through DAS to include a colour gradient, histogram and tiling array wiggle format (Figure 4). These new visualization options are particularly applicable to dense genome data such as that produced by whole-genome tiling array experiments. We now also serve current Ensembl data for integration into other DAS clients. Data available for integration into our DAS clients includes transcripts, ditag data, markers, karyotype information, repeats and DNA and protein align features including cDNA alignments and UniProt alignments. DAS sources set up by Ensembl are also automatically registered with the DAS registry (25). Instructions for using DAS with Ensembl are available from http://www.ensembl.org/info/data/external_data/das/index.html.
Ensembl software infrastructure
The Ensembl core software system (26) provides an efficient way of representing genome data in a relational database and providing access to it via an object-oriented API. This API is used by our computational pipelines to generate and store genome annotation, and by the Ensembl website to retrieve information that is to be displayed to the user. Bioinformaticians can use the API to access Ensembl databases remotely (Ensembl databases are available at mysql://ensembldb.ensembl.org:3306; Ensembl BioMart databases use mysql://martdb.ensembl.org:3316) or local databases containing their own data. We maintain full unit test coverage for the API.
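As a small illustration of the remote access points quoted above, the sketch below splits the two mysql:// locations into host and port values that a MySQL client library could use. It uses only the Python standard library and does not open a connection. The helper name is hypothetical, and the default "anonymous" user name is an assumption that is not stated in this report.

```python
from urllib.parse import urlparse


def connection_params(url: str, user: str = "anonymous") -> dict:
    """Split a mysql:// URL into host, port and user for a MySQL client.

    The default user name is an assumption, not taken from the text.
    """
    parts = urlparse(url)
    if parts.scheme != "mysql":
        raise ValueError(f"expected a mysql:// URL, got {url!r}")
    return {"host": parts.hostname, "port": parts.port, "user": user}


# The two public database servers named in the text.
ENSEMBL_CORE = connection_params("mysql://ensembldb.ensembl.org:3306")
ENSEMBL_MART = connection_params("mysql://martdb.ensembl.org:3316")
```

The resulting dictionaries could be passed as keyword arguments to whichever MySQL client library is installed locally; most such libraries accept host, port and user parameters.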
The database representation and API are being continuously developed to address bottlenecks affecting website and pipeline performance and increase flexibility. While most of this development is incremental in nature, two significant improvements over the past year merit special mention. First, the mechanism that links the identifiers between Ensembl genes, transcripts and translations and their counterparts in external databases has been significantly improved and extended, including a new configuration system allowing us to appropriately address specific data types and relationships between external and Ensembl data. Second, we have expanded the automatic data quality checks that are vital to ensuring that the billions of individual pieces of Ensembl data are as accurate as possible. There are now nearly 300 such tests that run in advance of each Ensembl release.
Comparative genomics
The protein tree calculation pipeline has evolved since last year with closer collaboration with the TreeFam project (http://www.treefam.org). TreeBeST software (http://treesoft.sourceforge.net) is used to both build a protein tree and reconcile it with the species tree. This reconciliation step allows us to call duplication and speciation events in the tree. Next, we check for dubious duplication events. These correspond to predictions where a duplication event is followed by a large number of gene loss events. Finally, we can infer paralogy and orthology relationships between the genes using the resulting protein tree.
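The final inference step can be illustrated with a toy example. The sketch below does not reproduce TreeBeST's reconciliation against a species tree; it swaps in the simpler species-overlap heuristic, which labels an internal node a duplication when both child subtrees contain a gene from the same species and a speciation otherwise. Gene pairs are then labelled by the event at their last common ancestor. All class names, function names and gene identifiers are invented.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    gene: str
    species: str


@dataclass(frozen=True)
class Node:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def leaves(tree: Tree) -> List[Leaf]:
    """Collect all leaves below a node."""
    if isinstance(tree, Leaf):
        return [tree]
    return leaves(tree.left) + leaves(tree.right)


def classify_pairs(tree: Tree) -> Dict[Tuple[str, str], str]:
    """Label each gene pair by the event at its last common ancestor."""
    relations: Dict[Tuple[str, str], str] = {}
    if isinstance(tree, Leaf):
        return relations
    left, right = leaves(tree.left), leaves(tree.right)
    # Species-overlap heuristic: shared species below both children => duplication.
    shared = {leaf.species for leaf in left} & {leaf.species for leaf in right}
    label = "paralogue" if shared else "orthologue"
    for a, b in product(left, right):
        relations[tuple(sorted((a.gene, b.gene)))] = label
    relations.update(classify_pairs(tree.left))
    relations.update(classify_pairs(tree.right))
    return relations


# Toy tree: a duplication before the human/mouse split gives two gene copies.
toy_tree = Node(
    Node(Leaf("hsA", "human"), Leaf("mmA", "mouse")),
    Node(Leaf("hsB", "human"), Leaf("mmB", "mouse")),
)
relations = classify_pairs(toy_tree)
```

Here hsA and mmA come out as orthologues because their last common ancestor is a speciation, while hsA and hsB, and also hsA and mmB, are paralogues because they are separated by the root duplication.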
Multiple genomic alignments are now calculated using Pecan (http://www.ebi.ac.uk/~bjp/pecan/) as it has been shown to be one of the best algorithms in terms of specificity and sensitivity (27). The new set of alignments includes the platypus genome. Each position in these alignments is further analysed to evaluate the level of evolutionary constraint using GERP as previously described (28). GERP also defines stretches of the Pecan alignments with a high level of conservation called constrained elements (Figure 1).
Data mining for comparative genomics
ComparaMart is a new data mining tool created to allow researchers to create intuitive queries against the Ensembl Compara multi-species database. ComparaMart uses the BioMart (6) data federation technology and provides a powerful, flexible tool to access a subset of the Compara data including predictions of homologous proteins and whole genome alignments.
As noted above, the Compara database stores results of genome-wide species comparisons calculated for each release. The ComparaMart database includes three main data sets: Ensembl homology, Ensembl pair-wise alignments and Ensembl multiple alignments. Through the ComparaMart interface, users may access the Ensembl homology data set to retrieve orthology or paralogy information for two species including various identifiers, homology descriptions, DNA/peptide sequences and peptide alignments. Additionally, the Ensembl homology data can be linked to any Ensembl species-specific data set to build more complex queries such as a list of all SNPs in human and mouse one-to-one orthologues. Specific data mining for pair-wise and multi-species whole-genome alignments is accessible through their respective data sets, although the multiple alignments data set includes only the constrained elements defined by GERP (28) from the Pecan alignments of 10 amniota vertebrates.
Outreach
Ensembl continuously tries to enhance the user experience, and for this purpose we stay in touch with our user community. This year we added video tutorials at http://www.ensembl.org/info/helpdesk/tutorials/index.html and continue to provide on-site courses on request. In an effort to gather information from Ensembl users and better understand how people use Ensembl, we recently conducted our second major user survey. More than 450 people responded, primarily from Europe and North America. The results show overall satisfaction with Ensembl's tools and resources. For example, the most important aspects of Ensembl are accurate information (60% of respondents), followed by high-quality data visualization (41%), constant availability (36%), and good data mining tools (33%). Interestingly, the most common user concern was also related to data visualization, specifically the complexity of the Ensembl web interface. We have already responded to several aspects of the survey and plan to make significant improvements to the web interface in 2008 to address the concerns raised.
FUTURE DIRECTIONS
The success of massively parallel sequencing technologies is a significant challenge for bioinformatics resources, although one that has been at least partially anticipated by Ensembl. We envision many ways this new technology will impact Ensembl over the coming year. We expect that resequencing data will be a significant part of Ensembl development over the next year and are working to scale our resequencing and variation resources appropriately. The sequencing technologies have likely made whole genome tiling array analysis obsolete (at least for ChIP) and we are adapting our functional genomics database for ChIP-seq analysis support. We anticipate continued enhancements of the Ensembl regulatory build as new genome-wide data sets become available through projects such as ENCODE. Finally we expect that new transcriptomics data sets will help us guide the Ensembl gene build both in terms of improving currently supported species and mapping transcription in newly sequenced genomes.
ACKNOWLEDGEMENTS
The Ensembl project receives primary funding from the Wellcome Trust. Additional funding is provided by EMBL, NHGRI, NIH-NIAID, BBSRC, MRC and the European Union. We acknowledge those researchers and organizations (especially Greg Crawford, Martin Hirst and the STAR Consortium) that have provided data to Ensembl prior to publication under the understandings of the Fort Lauderdale meeting discussing Community Resource Projects. We thank all of the users of our website and other resources, and those who have provided useful feedback through our mailing list. Funding to pay the Open Access publication charges for this article was provided by the Wellcome Trust.
Conflict of interest statement. None declared.
REFERENCES
1. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature (2001) 409:860–921.
2. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, et al. The sequence of the human genome. Science (2001) 291:1304–1351.
3. Kuhn RM, Karolchik D, Zweig AS, Trumbower H, Thomas DJ, Thakkapallayil A, Sugnet CW, Stanke M, Smith KE, et al. The UCSC genome browser database: update 2007. Nucleic Acids Res. (2007) 35:D668–D673.
4. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. (2007) 35:D5–D12.
5. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. (2001) 29:308–311.
6. Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, et al. EnsMart: a generic system for fast and flexible access to biological data. Genome Res. (2004) 14:160–169.
7. Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, et al. Ensembl 2007. Nucleic Acids Res. (2007) 35:D610–D617.
8. Potter SC, Clarke L, Curwen V, Keenan S, Mongin E, Searle SM, Stabenau A, Storey R, Clamp M. The Ensembl analysis pipeline. Genome Res. (2004) 14:934–941.
9. Rayner TF, Rocca-Serra P, Spellman PT, Causton HC, Farne A, Holloway E, Irizarry RA, Liu J, Maier DS, et al. A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB. BMC Bioinformatics (2006) 7:489.
10. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. (2007) 35:D61–D65.
11. Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. (2004) 14:988–995.
12. Slater GS, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics (2005) 6:31.
13. MGC Project Team. The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). Genome Res. (2004) 14:2121–2127.
14. Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A, et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat. Methods (2005) 2:105–111.
15. Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, et al. The transcriptional landscape of the mammalian genome. Science (2005) 309:1559–1563.
16. Ruan Y, Ooi HS, Choo SW, Chiu KP, Zhao XD, Srinivasan KG, Yao F, Choo CY, Liu J, et al. Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res. (2007) 17:828–838.
17. Lefranc MP, Giudicelli V, Kaas Q, Duprat E, Jabado-Michaloud J, Scaviner D, Ginestoux C, Clément O, Chaume D, et al. IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res. (2005) 33:D593–D597.
18. Mardis ER. Anticipating the 1,000 dollar genome. Genome Biol. (2006) 7:112.
19. Bentley DR. Whole-genome re-sequencing. Curr. Opin. Genet. Dev. (2006) 16:545–552.
20. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, et al. The diploid genome sequence of an individual human. PLoS Biol. (2007) 5:e254.
21. Ning Z, Cox AJ, Mullikin JC. SSAHA: a fast search method for large DNA databases. Genome Res. (2001) 11:1725–1729.
22. Cunningham F, Rios D, Griffiths M, Smith J, Ning Z, Cox T, Flicek P, Marin-Garcia P, Herrero J, et al. TranscriptSNPView: a genome-wide catalog of mouse coding variation. Nat. Genet. (2006) 38:853.
23. Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L. The distributed annotation system. BMC Bioinformatics (2001) 2:7.
24. Birney E, Andrews D, Caccamo M, Chen Y, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, et al. Ensembl 2006. Nucleic Acids Res. (2006) 34:D556–D561.
25. Prlic A, Down TA, Kulesha E, Finn RD, Kahari A, Hubbard TJ. Integrating sequence and structural biology with DAS. BMC Bioinformatics (2007) 8:333.
26. Stabenau A, McVicker G, Melsopp C, Proctor G, Clamp M, Birney E. The Ensembl core software libraries. Genome Res. (2004) 14:929–933.
27. Margulies EH, Cooper GM, Asimenos G, Thomas DJ, Dewey CN, Siepel A, Birney E, Keefe D, Schwartz AS, et al. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. (2007) 17:760–774.
28. Cooper GM, Stone EA, Asimenos G, NISC Comparative Sequencing Program, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. (2005) 15:901–913.
29. Regha K, Sloane MA, Huang R, Pauler FM, Warczok KE, Melikant B, Radolf M, Martens JH, Schotta G, et al. Active and repressive chromatin are interspersed without spreading in an imprinted gene cluster in the mammalian genome. Mol. Cell (2007) 27:353–366.
30. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature (2007) 447:661–678.
This article has been cited by other articles:
![]() |
S. Podder and T. C. Ghosh Exploring the Differences in Evolutionary Rates between Monogenic and Polygenic Disease Genes in Human Mol. Biol. Evol., April 1, 2010; 27(4): 934 - 941. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. J. Cai, E. Borenstein, R. Chen, and D. A. Petrov Similarly Strong Purifying Selection Acts on Human Disease Genes of All Evolutionary Ages Gen Biol Evol, March 1, 2010; 2009(0): 131 - 144. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Polak and P. F. Arndt Long-Range Bidirectional Strand Asymmetries Originate at CpG Islands in the Human Genome Gen Biol Evol, March 1, 2010; 2009(0): 189 - 197. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Melzer, C. Villmann, K. Becker, K. Harvey, R. J. Harvey, N. Vogel, C. J. Kluck, M. Kneussel, and C.-M. Becker Multifunctional Basic Motif in the Glycine Receptor Intracellular Domain Induces Subunit-specific Sorting J. Biol. Chem., February 5, 2010; 285(6): 3730 - 3739. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Ruzanov and D. L. Riddle Deep SAGE analysis of the Caenorhabditis elegans transcriptome Nucleic Acids Res., February 3, 2010; (2010): gkq035v1 - gkq035. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. Belinky, O. Cohen, and D. Huchon Large-Scale Parsimony Analysis of Metazoan Indels in Protein-Coding Genes Mol. Biol. Evol., February 1, 2010; 27(2): 441 - 451. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. V. Olsen, M. Vermeulen, A. Santamaria, C. Kumar, M. L. Miller, L. J. Jensen, F. Gnad, J. Cox, T. S. Jensen, E. A. Nigg, et al. Quantitative Phosphoproteomics Reveals Widespread Full Phosphorylation Site Occupancy During Mitosis Sci. Signal., January 12, 2010; 3(104): ra3 - ra3. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. Turro, A. Lewin, A. Rose, M. J. Dallman, and S. Richardson MMBGX: a method for estimating expression at the isoform level and detecting differential splicing using whole-transcript Affymetrix arrays Nucleic Acids Res., January 1, 2010; 38(1): e4 - e4. [Abstract] [Full Text] [PDF] |
||||
![]() |
O. Aparicio, E. Carnero, X. Abad, N. Razquin, E. Guruceaga, V. Segura, and P. Fortes Adenovirus VA RNA-derived miRNAs target cellular genes involved in cell growth, gene expression and DNA repair Nucleic Acids Res., January 1, 2010; 38(3): 750 - 763. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Binkley, K. Karra, A. Kirby, M. Hosobuchi, E. A. Stone, and A. Sidow ProPhylER: A curated online resource for protein function and structure based on evolutionary constraint analyses Genome Res., January 1, 2010; 20(1): 142 - 154. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. W. Huss III, P. Lindenbaum, M. Martone, D. Roberts, A. Pizarro, F. Valafar, J. B. Hogenesch, and A. I. Su The Gene Wiki: community intelligence applied to human gene annotation Nucleic Acids Res., January 1, 2010; 38(suppl_1): D633 - D639. [Abstract] [Full Text] [PDF] |
||||
![]() |
The UniProt Consortium The Universal Protein Resource (UniProt) in 2010 Nucleic Acids Res., January 1, 2010; 38(suppl_1): D142 - D148. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Flicek, B. L. Aken, B. Ballester, K. Beal, E. Bragin, S. Brent, Y. Chen, P. Clapham, G. Coates, S. Fairley, et al. Ensembl's 10th year Nucleic Acids Res., January 1, 2010; 38(suppl_1): D557 - D562. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Boros, A. O'Donnell, I. J. Donaldson, A. Kasza, L. Zeef, and A. D. Sharrocks. Overlapping promoter targeting by Elk-1 and other divergent ETS-domain transcription factor family members. Nucleic Acids Res., December 1, 2009; 37(22): 7368 - 7380.
S. E. Degn, A. G. Hansen, R. Steffensen, C. Jacobsen, J. C. Jensenius, and S. Thiel. MAp44, a Human Protein Associated with Pattern Recognition Molecules of the Complement System and Regulating the Lectin Pathway of Complement Activation. J. Immunol., December 1, 2009; 183(11): 7371 - 7378.
A. L. Hufton, S. Mathia, H. Braun, U. Georgi, H. Lehrach, M. Vingron, A. J. Poustka, and G. Panopoulou. Deeply conserved chordate noncoding sequences preserve genome synteny but do not drive gene duplicate retention. Genome Res., November 1, 2009; 19(11): 2036 - 2051.
H. G. Roider, B. Lenhard, A. Kanhere, S. A. Haas, and M. Vingron. CpG-depleted promoters harbor tissue-specific transcription factor binding signals--implications for motif overrepresentation analyses. Nucleic Acids Res., October 1, 2009; 37(19): 6305 - 6315.
S. Ahmed, E. Valen, A. Sandelin, and J. Matthews. Dioxin Increases the Interaction Between Aryl Hydrocarbon Receptor and Estrogen Receptor Alpha at Human Promoters. Toxicol. Sci., October 1, 2009; 111(2): 254 - 266.
J. D. Freeman, R. L. Warren, J. R. Webb, B. H. Nelson, and R. A. Holt. Profiling the T-cell receptor beta-chain repertoire by massively parallel sequencing. Genome Res., October 1, 2009; 19(10): 1817 - 1824.
J. Zeng, S. Zhu, and H. Yan. Towards accurate human promoter recognition: a review of currently used sequence features and classification methods. Brief Bioinform, September 1, 2009; 10(5): 498 - 508.
F. Bollig, B. Perner, B. Besenbeck, S. Kothe, C. Ebert, S. Taudien, and C. Englert. A highly conserved retinoic acid responsive element controls wt1a expression in the zebrafish pronephros. Development, September 1, 2009; 136(17): 2883 - 2892.
R. K. Auerbach, G. Euskirchen, J. Rozowsky, N. Lamarre-Vincent, Z. Moqtaderi, P. Lefrancois, K. Struhl, M. Gerstein, and M. Snyder. Mapping accessible chromatin regions using Sono-Seq. PNAS, September 1, 2009; 106(35): 14926 - 14931.
M. J. Lawson and L. Zhang. Sexy gene conversions: locating gene conversions on the X-chromosome. Nucleic Acids Res., August 1, 2009; 37(14): 4570 - 4579.
M. Hilger, T. Bonaldi, F. Gnad, and M. Mann. Systems-wide Analysis of a Phosphatase Knock-down by Quantitative Proteomics and Phosphoproteomics. Mol. Cell. Proteomics, August 1, 2009; 8(8): 1908 - 1920.
C. S. H. Tan, B. Bodenmiller, A. Pasculescu, M. Jovanovic, M. O. Hengartner, C. Jorgensen, G. D. Bader, R. Aebersold, T. Pawson, and R. Linding. Comparative Analysis Reveals Conserved Protein Phosphorylation Networks Implicated in Multiple Diseases. Sci. Signal., July 28, 2009; 2(81): ra39 - ra39.
R. J. Taft, E. A. Glazov, T. Lassmann, Y. Hayashizaki, P. Carninci, and J. S. Mattick. Small RNAs derived from snoRNAs. RNA, July 1, 2009; 15(7): 1233 - 1240.
I. R. E. Nett, D. M. A. Martin, D. Miranda-Saavedra, D. Lamont, J. D. Barber, A. Mehlert, and M. A. J. Ferguson. The Phosphoproteome of Bloodstream Form Trypanosoma brucei, Causative Agent of African Sleeping Sickness. Mol. Cell. Proteomics, July 1, 2009; 8(7): 1527 - 1538.
D. Glez-Pena, G. Gomez-Lopez, D. G. Pisano, and F. Fdez-Riverola. WhichGenes: a web-based tool for gathering, building, storing and exporting gene sets with application in gene set enrichment analysis. Nucleic Acids Res., July 1, 2009; 37(suppl_2): W329 - W334.
A. Kuzniar, K. Lin, Y. He, H. Nijveen, S. Pongor, and J. A. M. Leunissen. ProGMap: an integrated annotation resource for protein orthology. Nucleic Acids Res., July 1, 2009; 37(suppl_2): W428 - W434.
K. Mochida, T. Yoshida, T. Sakurai, Y. Ogihara, and K. Shinozaki. TriFLDB: A Database of Clustered Full-Length Coding Sequences from Triticeae with Applications to Comparative Grass Genomics. Plant Physiology, July 1, 2009; 150(3): 1135 - 1146.
H. A. Blomster, V. Hietakangas, J. Wu, P. Kouvonen, S. Hautaniemi, and L. Sistonen. Novel Proteomics Strategy Brings Insight into the Prevalence of SUMO-2 Target Sites. Mol. Cell. Proteomics, June 1, 2009; 8(6): 1382 - 1390.
S. T. Bradford, R. Hiramatsu, M. P. Maddugoda, P. Bernard, M.-C. Chaboissier, A. Sinclair, A. Schedl, V. Harley, Y. Kanai, P. Koopman, et al. The Cerebellin 4 Precursor Gene Is a Direct Target of SRY and SOX9 in Mice. Biol Reprod, June 1, 2009; 80(6): 1178 - 1188.
F.-O. Desmet, D. Hamroun, M. Lalande, G. Collod-Beroud, M. Claustres, and C. Beroud. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res., May 1, 2009; 37(9): e67 - e67.
R. de Sousa Abreu, P. C. Sanchez-Diaz, C. Vogel, S. C. Burns, D. Ko, T. L. Burton, D. T. Vo, S. Chennasamudaram, S.-Y. Le, B. A. Shapiro, et al. Genomic Analyses of Musashi1 Downstream Targets Show a Strong Association with Cancer-related Processes. J. Biol. Chem., May 1, 2009; 284(18): 12125 - 12135.
D. Talavera, R. A. Laskowski, and J. M. Thornton. WSsas: a web service for the annotation of functional residues through structural homologues. Bioinformatics, May 1, 2009; 25(9): 1192 - 1194.
T. Wang and T. S. Furey. Analysis of Complex Disease Association and Linkage Studies Using the University of California Santa Cruz Genome Browser. Circ Cardiovasc Genet, April 1, 2009; 2(2): 199 - 204.
A. T. Garnett, T. M. Han, M. J. Gilchrist, J. C. Smith, M. B. Eisen, F. C. Wardle, and S. L. Amacher. Identification of direct T-box target genes in the developing zebrafish mesoderm. Development, March 1, 2009; 136(5): 749 - 760.
M. Toll-Riera, N. Bosch, N. Bellora, R. Castelo, L. Armengol, X. Estivill, and M. Mar Alba. Origin of Primate Orphan Genes: A Comparative Genomics Approach. Mol. Biol. Evol., March 1, 2009; 26(3): 603 - 612.
P. Tongaonkar and M. E. Selsted. SDF2L1, a Component of the Endoplasmic Reticulum Chaperone Complex, Differentially Interacts with α-, β-, and θ-Defensin Propeptides. J. Biol. Chem., February 27, 2009; 284(9): 5602 - 5609.
R. L. Warren, B. H. Nelson, and R. A. Holt. Profiling model T-cell metagenomes with short reads. Bioinformatics, February 15, 2009; 25(4): 458 - 464.
G. A. Reeves, D. Talavera, and J. M. Thornton. Genome and proteome annotation: organization, interpretation and integration. J R Soc Interface, February 6, 2009; 6(31): 129 - 147.
B. Waegele, I. Dunger-Kaltenbach, G. Fobo, C. Montrone, H.-W. Mewes, and A. Ruepp. CRONOS: the cross-reference navigation server. Bioinformatics, January 1, 2009; 25(1): 141 - 143.
E. Portales-Casamar, D. Arenillas, J. Lim, M. I. Swanson, S. Jiang, A. McCallum, S. Kirov, and W. W. Wasserman. The PAZAR database of gene regulatory information coupled to the ORCA toolkit for the study of regulatory sequences. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D54 - D60.
G. Cochrane, R. Akhtar, J. Bonfield, L. Bower, F. Demiralp, N. Faruque, R. Gibson, G. Hoad, T. Hubbard, C. Hunter, et al. Petabyte-scale innovations at the European Nucleotide Archive. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D19 - D25.
T. Hulsen, P. M. A. Groenen, J. de Vlieg, and W. Alkema. PhyloPat: an updated version of the phylogenetic pattern database contains gene neighborhood. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D731 - D737.
The UniProt Consortium. The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D169 - D174.
S. Keerthikumar, R. Raju, K. Kandasamy, A. Hijikata, S. Ramabadran, L. Balakrishnan, M. Ahmed, S. Rani, L. D. N. Selvan, D. S. Somanathan, et al. RAPID: Resource of Asian Primary Immunodeficiency Diseases. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D863 - D867.
A. Kamburov, C. Wierling, H. Lehrach, and R. Herwig. ConsensusPathDB--a database for integrating human functional interaction networks. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D623 - D628.
J.-M. Rouillard and E. Gulari. OligoArrayDb: pangenomic oligonucleotide microarray probe sets database. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D938 - D941.
F. Gnad, M. Oroshi, E. Birney, and M. Mann. MAPU 2.0: high-accuracy proteomes mapped to genomes. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D902 - D906.
S. Lefever, J. Vandesompele, F. Speleman, and F. Pattyn. RTPrimerDB: the portal for real-time PCR primers and probes. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D942 - D945.
U. Pieper, N. Eswar, B. M. Webb, D. Eramian, L. Kelly, D. T. Barkan, H. Carter, P. Mankoo, R. Karchin, M. A. Marti-Renom, et al. MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D347 - D354.
D. Barrell, E. Dimmer, R. P. Huntley, D. Binns, C. O'Donovan, and R. Apweiler. The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D396 - D403.
G. L. Papadopoulos, M. Reczko, V. A. Simossis, P. Sethupathy, and A. G. Hatzigeorgiou. The database of experimentally supported targets: a functional update of TarBase. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D155 - D158.
H. Y. K. Lam, E. Khurana, G. Fang, P. Cayting, N. Carriero, K.-H. Cheung, and M. B. Gerstein. Pseudofam: the pseudogene families database. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D738 - D743.
R. Nogales-Cadenas, F. Abascal, J. Diez-Perez, J. M. Carazo, and A. Pascual-Montano. CentrosomeDB: a human centrosomal proteins database. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D175 - D180.
G. Ding, P. Lorenz, M. Kreutzer, Y. Li, and H.-J. Thiesen. SysZNF: the C2H2 zinc finger gene database. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D267 - D273.
I. Letunic, T. Doerks, and P. Bork. SMART 6: recent updates and new developments. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D229 - D232.
M. D. R. Croning, M. C. Marshall, P. McLaren, J. D. Armstrong, and S. G. N. Grant. G2Cdb: the Genes to Cognition database. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D846 - D851.
A. Bhasi, P. Philip, V. Manikandan, and P. Senapathy. ExDom: an integrated database for comparative analysis of the exon-intron structures of protein domains in eukaryotes. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D703 - D711.
D. Miranda-Saavedra, S. De, M. W. Trotter, S. A. Teichmann, and B. Gottgens. BloodExpress: a database of gene expression in mouse haematopoiesis. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D873 - D879.
C. J. Richardson, Q. Gao, C. Mitsopoulous, M. Zvelebil, L. H. Pearl, and F. M. G. Pearl. MoKCa database--mutations of kinases in cancer. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D824 - D831.
S. A. Samarajiwa, S. Forster, K. Auchettl, and P. J. Hertzog. INTERFEROME: the database of interferon regulated genes. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D852 - D857.
R. M. Kuhn, D. Karolchik, A. S. Zweig, T. Wang, K. E. Smith, K. R. Rosenbloom, B. Rhead, B. J. Raney, A. Pohl, M. Pheasant, et al. The UCSC Genome Browser Database: update 2009. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D755 - D761.
D. Lawson, P. Arensburger, P. Atkinson, N. J. Besansky, R. V. Bruggner, R. Butler, K. S. Campbell, G. K. Christophides, S. Christley, E. Dialynas, et al. VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D583 - D587.
T. J. P. Hubbard, B. L. Aken, S. Ayling, B. Ballester, K. Beal, E. Bragin, S. Brent, Y. Chen, P. Clapham, L. Clarke, et al. Ensembl 2009. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D690 - D697.
H. Xi, J. Park, G. Ding, Y.-H. Lee, and Y. Li. SysPIMP: the web-based systematical platform for identifying human disease-related mutated sequences from mass spectrometry. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D913 - D920.
J. E. Mabey Gilsenan, G. Atherton, J. Bartholomew, P. F. Giles, T. K. Attwood, D. W. Denning, and P. Bowyer. Aspergillus Genomes and the Aspergillus Cloud. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D509 - D514.
J. Robinson, M. J. Waller, S. C. Fail, H. McWilliam, R. Lopez, P. Parham, and S. G. E. Marsh. The IMGT/HLA database. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D1013 - D1017.
D. Smedley, M. A. Swertz, K. Wolstencroft, G. Proctor, M. Zouberakis, J. Bard, J. M. Hancock, and P. Schofield. Solutions for data integration in functional genomics: a critical assessment and case study. Brief Bioinform, November 1, 2008; 9(6): 532 - 544.
Y. Andachi. A novel biochemical method to identify target genes of individual microRNAs: Identification of a new Caenorhabditis elegans let-7 target. RNA, November 1, 2008; 14(11): 2440 - 2451.
A. Kosmrlj, A. K. Jha, E. S. Huseby, M. Kardar, and A. K. Chakraborty. How the thymus designs antigen-specific and self-tolerant T cell receptor sequences. PNAS, October 28, 2008; 105(43): 16671 - 16676.