Abstract

The regulation of protein function through reversible phosphorylation by protein kinases and phosphatases is a general mechanism controlling virtually every cellular activity. Eukaryotic protein kinases can be classified into distinct, well-characterized groups based on amino acid sequence similarity and function. We recently reported a highly sensitive and accurate hidden Markov model-based method for the automatic detection and classification of protein kinases into these specific groups. The Kinomer v. 1.0 database presented here contains annotated classifications for the protein kinase complements of 43 eukaryotic genomes. These span the taxonomic range and include fungi (16 species), plants (6), diatoms (1), amoebas (2), protists (1) and animals (17). The kinomes are stored in a relational database and are accessible through a web interface on the basis of species, kinase group or a combination of both. In addition, the Kinomer v. 1.0 HMM library is made available for users to perform classification on arbitrary sequences. The Kinomer v. 1.0 database is a continually updated resource where direct comparison of kinase sequences across kinase groups and across species can give insights into kinase function and evolution. Kinomer v. 1.0 is available at http://www.compbio.dundee.ac.uk/kinomer/.

INTRODUCTION

The regulation of protein function through reversible phosphorylation by protein kinases and phosphatases is a widespread cellular mechanism thought to control virtually every cellular activity (1), and abnormal levels of phosphorylation are known to be responsible for severe diseases (2).

Hanks and Hunter were the first to report that sequence similarity of kinase catalytic domains reflects protein kinase function and/or mode of regulation (3,4). Observation of distinct clades where function segregated with sequence similarity allowed Hanks and Hunter to divide the protein kinase superfamily into specific ‘groups’. The currently accepted classification of the eukaryotic protein kinase superfamily considers eight ‘conventional’ protein kinase groups (ePKs) and four ‘atypical’ groups (aPKs) (5,6). Among the ePKs are the AGC group (including cyclic-nucleotide and calcium-phospholipid-dependent kinases, ribosomal S6-phosphorylating kinases, G protein-coupled kinases and all close relatives of these sets); the CAMKs (calmodulin-regulated kinases); the CK1 group (casein kinase 1, and close relatives); the CMGC group (including cyclin-dependent kinases, mitogen-activated protein kinases, glycogen synthase kinases and CDK-like kinases); the RGC group (receptor guanylate cyclase); the STEs (including many kinases functioning in MAP kinase cascades); the TKs (tyrosine kinases) and the TKLs (tyrosine kinase-like kinases). However, there is a significant proportion of kinases which, whilst exhibiting some degree of sequence similarity to the eight groups above, could not be classified easily into particular groups. These form a ninth group called ‘Other’.

The aPKs are a small set of protein kinases that do not share clear sequence similarity with ePKs, but have been shown experimentally to have protein kinase activity. The bona fide aPKs (6) are the alpha-kinase group (exemplified by myosin heavy chain kinase of Dictyostelium discoideum), PIKK (phosphatidyl inositol 3′ kinase-related kinases), RIO and PDHK (pyruvate dehydrogenase kinases).

The sequencing of complete genomes for many eukaryotic species has allowed the determination and comparison of their complete kinase complements (kinomes). These include the kinomes of Saccharomyces cerevisiae (7), Caenorhabditis elegans (8), Drosophila melanogaster (9), Mus musculus (10), Homo sapiens (5), Dictyostelium discoideum (11), Strongylocentrotus purpuratus (12), Tetrahymena thermophila (13), and the plants Arabidopsis thaliana and Oryza sativa (14). Several parasite kinomes have been determined, including the malaria parasite Plasmodium falciparum (15), its comparison with Plasmodium yoelii (16) and those of the three Trypanosomatid species Leishmania major, Trypanosoma brucei and Trypanosoma cruzi (17). The kinomes of H. sapiens, M. musculus, S. purpuratus, D. melanogaster, C. elegans, S. cerevisiae, D. discoideum and T. thermophila are available through Kinbase (http://www.kinase.com/kinbase/). In particular, the observation that many important protein kinases of parasitic protozoa are significantly dissimilar from their eukaryotic counterparts has raised the prospects for therapeutics based on the selective inhibition of parasitic protein kinases (18–20).

We have recently exploited the sequence similarity of protein kinases in developing a multi-level Hidden Markov Model (HMM) library that is capable of classifying protein kinases into their correct functional group (6). The protein kinase HMM library was shown to be three times more sensitive than BLAST for identifying kinase catalytic domains. It was also shown to be more sensitive than a general Pfam model of the kinase catalytic domain, with the added advantage that the HMM library is capable of discriminating among protein kinase groups. The validated HMM library was applied to improve the group-level classification of the S. cerevisiae ePKs from 66.96% to 90.43% by classifying many of the yeast kinases previously assigned to the ‘Other’ group. In this article, we describe the extension of this analysis to the complete classification at the kinase group level of 43 curated eukaryotic kinomes and a web-based resource through which these annotations can be examined. In addition, we provide an interface to the HMM library, allowing for the classification of arbitrary sequences.

MATERIALS AND METHODS

Sequence data sources

The complete translated protein coding sequences were obtained for the fungi Aspergillus fumigatus (21), Aspergillus nidulans (22), Aspergillus niger (23), Aspergillus oryzae (24), Candida glabrata (25), Cryptococcus neoformans (26), Debaryomyces hansenii (25), Kluyveromyces lactis (25), Magnaporthe grisea (27), Neurospora crassa (28), Phanerochaete chrysosporium (29), Ustilago maydis (30) and Yarrowia lipolytica (25). Among the photosynthetic organisms we have included A. thaliana (31), the red alga Cyanidioschyzon merolae (32), the rice subspecies Oryza sativa ssp. japonica (33), the green algae Ostreococcus lucimarinus (34) and Ostreococcus tauri (35), and the poplar tree Populus trichocarpa (36). The metazoan genomes include the yellow fever mosquito Aedes aegypti (37), the malaria mosquito vector Anopheles gambiae (38), the silkworm Bombyx mori (39), the dog Canis familiaris (40), the early chordate Ciona intestinalis (41), the chicken Gallus gallus (42), the Rhesus macaque Macaca mulatta (43), the marsupial opossum Monodelphis domestica (44), the fishes Oryzias latipes (medaka) (45), Takifugu rubripes (46) and Tetraodon nigroviridis (47), the laboratory rat Rattus norvegicus (48) and the chimpanzee Pan troglodytes (49). Finally, we have also included the amoeba Entamoeba histolytica (50), the diatom Thalassiosira pseudonana (51) and the pathogenic protist Trichomonas vaginalis (52). The manually annotated kinomes of Caenorhabditis elegans (8), Dictyostelium discoideum (11), Drosophila melanogaster, Homo sapiens (5) and M. musculus (10) were downloaded from Kinbase (http://www.kinase.com/kinbase/) on 28 September 2008. The kinomes of Encephalitozoon cuniculi, Saccharomyces cerevisiae and Schizosaccharomyces pombe had previously been manually annotated and analysed in detail (53).

Kinase classification

The predicted peptide sequences for each of the genomes were searched individually against the Kinomer v. 1.0 multi-level HMM library (6) with the hmmpfam program of the HMMer package (54). Partial matches to the kinase catalytic domain were excluded through manual curation. Empirical cutoffs for association of kinase matches with each of the specific kinase groups were determined through analysis of the significance scores for the matches of the library HMMs to the well-annotated kinases in Kinbase for the organisms H. sapiens, C. elegans, D. melanogaster and S. cerevisiae (6). For each group, the highest observed E-value was taken as the cutoff for confident assignment. The cutoffs are AGC (2.7e−7), CAMK (3.2e−14), CK1 (3.2e−5), CMGC (1.2e−7), RGC (4.8e−5), STE (1.4e−6), TK (1.1e−9), TKL (1.7e−12), Alpha (8.5e−66), PDHK (2.7e−10), PIKK (8.4e−6) and RIO (2.3e−3). Protein kinase catalytic domains with E-values above the relevant cutoff were automatically classified as belonging to the ‘Other’ group. Table 1 lists the protein kinase complements of the 43 eukaryotic genomes contained in Kinomer v. 1.0, split by kinase group. All kinase matches were stored in a relational database, linking the sequence to the library matches and the subsequent assignments to a functional group.
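
The cutoff rule described above can be illustrated with a minimal Python sketch. The cutoff values are those listed in the text; the function name, the input format (a list of group and E-value pairs already extracted from the search output) and the choice of the lowest E-value as the best match are assumptions made for illustration, not the published implementation.

```python
# Group-specific E-value cutoffs as listed in the text.
CUTOFFS = {
    "AGC": 2.7e-7, "CAMK": 3.2e-14, "CK1": 3.2e-5, "CMGC": 1.2e-7,
    "RGC": 4.8e-5, "STE": 1.4e-6, "TK": 1.1e-9, "TKL": 1.7e-12,
    "Alpha": 8.5e-66, "PDHK": 2.7e-10, "PIKK": 8.4e-6, "RIO": 2.3e-3,
}


def classify_kinase(hits):
    """Assign a kinase group from (group, evalue) pairs (hypothetical input format).

    The best hit is assumed to be the one with the lowest E-value. It is
    accepted only if its E-value does not exceed the cutoff for its group;
    otherwise the domain is placed in 'Other'.
    """
    hits = list(hits)
    if not hits:
        return None  # no kinase catalytic domain detected
    group, evalue = min(hits, key=lambda hit: hit[1])
    # Groups without a listed cutoff can never pass the threshold.
    cutoff = CUTOFFS.get(group, 0.0)
    return group if evalue <= cutoff else "Other"


if __name__ == "__main__":
    print(classify_kinase([("AGC", 1e-50), ("CAMK", 1e-20)]))
    print(classify_kinase([("TK", 1e-5)]))
```

In this sketch a sequence whose best match is to a TK model with an E-value of 1e−5 falls above the TK cutoff of 1.1e−9 and is therefore reported as ‘Other’.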

Table 1.

The kinomes of the 43 genomes analysed split into the major kinase groups

                                 Predicted   Protein kinase group
                                  peptides   AGC  CAMK   CK1  CMGC   RGC   STE    TK   TKL Other Total ePKs Alpha  PDHK  PIKK   RIO Total aPKs
Fungi
Ascomycete fungi
    Aspergillus fumigatus             9630    20    27     3    30     0    13     1     0     8        102     0     3     4     1          8
    Aspergillus nidulans             10701    19    23     2    27     0    12     0     0    17        100     0     3     4     1          8
    Aspergillus niger                11200    21    21     3    44     0    12     1     0    16        118     0     3     4     1          8
    Aspergillus oryzae               12074    18    23     3    32     0    13     1     0     8         98     0     3     4     1          8
    Candida glabrata                  5215    25    30     4    23     0    11     0     0    11        104     0     2     5     1          8
    Debaryomyces hansenii             6319    19    18     3    23     0    13     0     0    15         91     0     3     3     1          7
    Encephalitozoon cuniculi          1997     4     5     2    12     0     0     0     1     5         29     0     0     2     1          3
    Kluyveromyces lactis              5327    22    22     3    23     0    12     0     0     8         90     0     3     4     1          8
    Magnaporthe grisea               11109    21    15     2    42     0     0     1     0    19        100     0     3     3     0          6
    Neurospora crassa                 9822    19    20     2    21     0    14     1     0    18         95     0     3     4     1          8
    Saccharomyces cerevisiae          6717    20    36     4    25     0    14     0     0    18        117     0     2     5     2          9
    Schizosaccharomyces pombe         5021    20    28     5    26     0    13     0     0    17        109     0     1     5     2          8
    Yarrowia lipolytica               6436    19    19     2    21     0    11     0     0     4         76     0     3     4     1          8
Basidiomycete fungi
    Cryptococcus neoformans           6578    19    19     4    25     0    13     0     1     9         90     0     3     5     2         10
    Phanerochaete chrysosporium      10048    33    23     5    25     0    16     1     3    10        116     0     3     4     1          8
    Ustilago maydis                   6522    17    19     2    18     0    16     0     2    10         84     0     3     2     1          6
Plants
Streptophytes
    Arabidopsis thaliana             30690    76   116    20   119     0    73     3   625    86       1118     0     1     4     3          8
    Oryza sativa ssp. japonica       66710    72   131    31   147     0    74     5  1179   139       1778     0     4     8     2         14
    Populus trichocarpa              58036    56   107    19    96     0    76     3  1033   136       1526     0     1     7     2         10
Green algae
    Ostreococcus lucimarinus          7651    16    24     4    21     0     9     2    11    13        100     0     1     5     1          7
    Ostreococcus tauri                7892    15    19     4    23     0     9     2    13    13         98     0     1     4     1          6
Red algae
    Cyanidioschyzon merolae           5014    10     9     2    16     0     7     0     9     9         62     0     1     3     1          5
Diatoms
    Thalassiosira pseudonana         11390    33    39     3    24     0     8     0     4    26        137     0     2     4     2          8
Amoebozoa
    Dictyostelium discoideum         13463    43    27     5    38     0    43     3    69    27        255     6     0     5     2         13
    Entamoeba histolytica             9772    37    49     9    47     0    29     7   109    34        321     0     0     6     3          9
Excavates/Trichomonads
    Trichomonas vaginalis            59681   154   321    64   131     1    39     1    90    86        887     0     0    42     2         44
Metazoans
Arthropods/Nematodes
    Aedes aegypti                    16789    48    35    10    43     7    26    35    18     8        230     0     4     6     3         13
    Anopheles gambiae                13133    37    34     7    31     6    25    32    17     7        196     0     1     5     3          9
    Bombyx mori                      21302    24    20     3    19     6    18    25     9     7        131     0     1     5     3          9
    Caenorhabditis elegans           27258    38    49    84    50    27    31    82    17    38        416     1     2     5     4         12
    Drosophila melanogaster          20815    41    41    10    38     6    21    33    22    11        223     0     1     5     3          9
Chordata/Fishes
    Ciona intestinalis               19858    71    72    13    51     3    43    83    23    25        384     2     1    12     4         19
    Oryzias latipes                  25107   116   146    16    98    10    79   135    56    25        681     2     5     5     4         16
    Takifugu rubripes                21974    92   111    13   100    12    62   113    54    25        582     1     6     6     4         17
    Tetraodon nigroviridis           28005    94   102    12    73    14    55   108    53    33        544     1     5     5     3         14
Chordata/Birds
    Gallus gallus                    22195    81    89    14    63     3    72   117    59    16        514     6     3     9     4         22
Chordata/Mammals
    Canis familiaris                 25559    99   116    22    98     9    78   124    61    14        621     7     5     6     4         22
    Homo sapiens                     46704    82    95    12    68     5    61    91    48    16        478     6     5     6     3         20
    Macaca mulatta                   36423   133   153    23   127     6   102   134    71    27        776    12     7    11     2         32
    Monodelphis domestica            32612   126   149    27   113    13   118   213    67    27        853     9     8    10     4         31
    Mus musculus                     39667    79   118    11    67     7    60    91    49    16        498     6     5     6     3         20
    Pan troglodytes                  32834   116   136    19   118     5    97   149    75    17        732    10     6    12     3         31
    Rattus norvegicus                33438   127     0    29    96     7     0   148    67    10        484     7     6     6     3         22

User interface

The Kinomer v. 1.0 web server provides a comprehensive search interface for accessing the database. Sequences can be retrieved by kinase group, by species or by a combination of both. A summary table illustrates the quality of match of each sequence to the HMM library, as well as providing direct clickable links to the public databases (Figure 1). In addition, an option is available to allow data sets to be downloaded as FASTA format sequence files. The multiple sequence alignment analysis program Jalview (55) is integrated into the Kinomer v. 1.0 interface and allows visualization of the query results. Kinase sequences retrieved are grouped by type and aligned. Jalview allows colouring of the sequences by protein secondary structural properties or amino acid chemical character and on-the-fly calculation of Neighbour-Joining and average distance phylogenetic trees. The web-applet form of Jalview can launch the full Jalview application via the ‘File->View in Full Application’ option. This gives access to further tools for the generation of multiple sequence alignments by Muscle (56), MAFFT (57,58) or ClustalW (59) and secondary structure prediction by JNet (60,61).
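
Retrieval by species, by kinase group or by both can be sketched against a small relational table. The schema, table and column names, sequence identifiers and E-values below are hypothetical and serve only to illustrate the query pattern; they do not reflect the actual Kinomer database layout.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical one-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kinase ("
        "seq_id TEXT PRIMARY KEY, species TEXT NOT NULL, "
        "kinase_group TEXT NOT NULL, evalue REAL)"
    )
    # Placeholder rows; identifiers and E-values are invented for the demo.
    conn.executemany(
        "INSERT INTO kinase VALUES (?, ?, ?, ?)",
        [
            ("demo_seq_1", "Saccharomyces cerevisiae", "AGC", 1e-80),
            ("demo_seq_2", "Saccharomyces cerevisiae", "CMGC", 1e-95),
            ("demo_seq_3", "Homo sapiens", "AGC", 1e-90),
        ],
    )
    return conn


def find_kinases(conn, species=None, group=None):
    """Return sequence IDs filtered by species, kinase group, or both."""
    clauses, params = [], []
    if species is not None:
        clauses.append("species = ?")
        params.append(species)
    if group is not None:
        clauses.append("kinase_group = ?")
        params.append(group)
    sql = "SELECT seq_id FROM kinase"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY seq_id"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    db = build_demo_db()
    print(find_kinases(db, species="Saccharomyces cerevisiae", group="AGC"))
```

Building the WHERE clause from optional filters with bound parameters keeps the three query modes in one function and avoids interpolating user input into the SQL string.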

Figure 1.

The precalculated kinomes may be downloaded from the Kinomer v. 1.0 website and selected by species, kinase group or a combination of both.

In addition, a separate web interface allows users to classify arbitrary sequences with the HMM library. This web-based tool allows a user to upload a sequence in any of the many sequence formats supported by EMBOSS (62), including the popular FASTA, GCG, PIR and SwissProt formats. The sequence is subjected to basic quality assurance checks before the hmmpfam search job is queued for execution on a multi-node Linux cluster. The interface is asynchronous: the user is given a job ID and a status page that is updated automatically, and can bookmark the results page and return at a later time. An optional field allows the user to associate arbitrary comments with a job, which is useful for distinguishing otherwise similar jobs. There are no other user-selectable parameters, which keeps the interface form clean and straightforward.

The results are displayed as a formatted HTML page (Figure 2) that clearly indicates the protein kinase group to which Kinomer v. 1.0 has assigned the sequence. In addition, alternative assignments are given and a summary of all potentially significant matches is shown. Kinomer v. 1.0 will typically show matches to many HMMs spanning several kinase groups. The most significant matches are usually all to HMMs of one particular group, followed by matches to closely related groups. The detailed alignment for each HMM match is linked further down the page. For users who want more detail, the results page also provides a link to the raw HMMer output.
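
The way a primary assignment and its alternatives might be derived from a list of HMM matches can be sketched as follows. The input format (HMM name, group and E-value triples), the HMM names and the function names are hypothetical; the sketch only shows one plausible way of ranking groups by their best-scoring HMM and is not the server's actual code.

```python
def rank_groups(hits):
    """Rank kinase groups by the lowest E-value of any of their HMMs.

    hits: iterable of (hmm_name, group, evalue) triples (hypothetical format).
    Returns a list of (group, best_evalue), most significant first.
    """
    best = {}
    for _hmm_name, group, evalue in hits:
        if group not in best or evalue < best[group]:
            best[group] = evalue
    return sorted(best.items(), key=lambda item: item[1])


def summarise(hits):
    """Return the top-ranked group and the remaining groups as alternatives."""
    ranked = rank_groups(hits)
    if not ranked:
        return None, []
    primary = ranked[0][0]
    alternatives = [group for group, _ in ranked[1:]]
    return primary, alternatives


if __name__ == "__main__":
    demo_hits = [
        ("agc_model_a", "AGC", 1e-90),
        ("agc_model_b", "AGC", 1e-85),
        ("camk_model_a", "CAMK", 1e-40),
    ]
    print(summarise(demo_hits))
```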

Figure 2.

Results of searching a peptide sequence for kinase catalytic domains using the Kinomer v. 1.0 HMM library. A list of hits is displayed at the top followed by the alignment of the peptide sequence to the individual sub-group HMMs that constitute the HMM library.

DISCUSSION

The 43 species considered here span a number of phylogenetic lineages and genome sizes, and display a range of adaptations to their environment. The genome-wide kinase group assignments are consistent with our previously published results (6) in that seven protein kinase groups (AGC, CAMK, CK1, CMGC, STE, PIKK and RIO) are present in all species surveyed (Table 1) and some kinases in these groups are likely to be essential. Kinases of the groups RGC, TK, TKL, Alpha and PDHK are late innovations in specific phyla or have been lost secondarily in specific lines of descent. The presence of a discrete number of putative TKs in photosynthetic organisms and in the pathogen Entamoeba histolytica suggests that TKs are also likely to have had an ancient origin. This observation has recently been strengthened by the finding of animal-like signalling molecules in the green alga Chlamydomonas reinhardtii (63). These include scavenger receptor cysteine-rich (SRCR) and C-type lectin domain (CTLD) proteins, both of which play key roles in the innate immune system of metazoa. The identification of SH2 domain proteins in photosynthetic organisms (63,64) suggests that phosphotyrosine-SH2 domain signalling also has an ancient origin and that important cell signalling and adhesion domains evolved before the divergence of the animal lineage.

The observation that many species outside the Opisthokont group lack important kinase groups, as is the case for TKs in Apicomplexa (Miranda-Saavedra, D. et al., manuscript submitted for publication), and possess many lineage-specific groups of kinases suggests that the group level is the most specific level for the automatic classification of kinomes based on models constructed from sequences outside the taxonomic clade under investigation. With the availability of a number of Deuterostome, Protostome and pre-bilaterian genome sequences, having all kinases belonging to a particular kinase group enables novel analyses to be performed. For example, it is now possible to trace the evolution of receptor tyrosine kinase families and that of their ligands. Since receptor tyrosine kinases are multi-domain proteins, the diverging rates of evolution of the various domains, and their incorporation in the receptor molecule in select phylogenetic lineages, are informative of distinct selection pressures and of functions newly acquired through the acquisition of new ligand-binding domains. This is the case with the Trk family of receptor tyrosine kinases, which are the receptors for the neurotrophins [nerve growth factor (NGF), brain-derived neurotrophic factor (BDNF), neurotrophin-3 (NT-3) and neurotrophin-4 (NT-4)]. The neurotrophin receptors are an ancient family whose function has been lost in multiple lineages and the roles of the receptors have been modified over time (65).

Kinomer v. 1.0 also includes the manually annotated kinomes of the model fungi S. cerevisiae and S. pombe, and that of the unicellular fungus-like parasite Encephalitozoon cuniculi (53). We have recently shown that the two model fungi share ∼85% of their kinomes (53), a degree of similarity much higher than that previously reported. The kinomes of budding and fission yeasts are therefore a useful dataset for annotating the kinomes of other fungi, among which we have included species of importance in basic and medical research, and in biotechnology. The manually annotated kinomes of C. elegans, D. discoideum, D. melanogaster, H. sapiens and M. musculus, as provided in Kinbase (http://www.kinase.com/kinbase/), have also been included in the Kinomer v. 1.0 database. These will facilitate the manual annotation of other kinomes in the database that belong to the same taxonomic clades. The Kinomer v. 1.0 HMM group scores suggest that the classification of a number of kinases in the kinomes of C. elegans, D. discoideum, D. melanogaster, H. sapiens and M. musculus could be improved. However, careful manual annotation of the kinomes of other species in the same taxonomic clades will be performed in the future to make a more informed decision about the re-classification of such kinases.

To our knowledge, Kinomer v. 1.0 is unique in being based on a high-accuracy validated kinase-group classification method (6). Other databases of protein kinases exist, but none of these offers the combination of breadth and accuracy of kinase classification that is present in Kinomer v. 1.0. These include KinMutBase (66), a database of clinically validated mutations in human kinases that lead to disease, and RTK.db (67), a database of receptor tyrosine kinases. The Protein Kinase Resource (68) collates data from several databases and includes a subset of protein kinase 3D structures to produce high-quality multiple structure-based alignments. Kinbase (http://www.kinase.com/kinbase/) contains manually curated kinomes classified according to the Hanks and Hunter classification of protein kinases (4). Although of high quality, Kinbase only contains kinomes for nine species. Finally, KinG (69) includes protein kinases identified in completed genomes that have been classified by a variety of metazoan kinome-based sequence search methods, but does not provide the confidence in kinase classification that is seen in Kinomer v. 1.0. Different eukaryotic lineages possess lineage-specific kinase groups and families that are just beginning to be characterized and which constitute as much as 50% of their kinomes (17). The applicability of the KinG approach to non-metazoan kinases therefore needs further testing. A similar limitation is encountered by the PANTHER (70) database. Although not specific to protein kinases, PANTHER provides an extensive and detailed HMM library for kinase families and sub-families. These family and sub-family HMM libraries are trained on metazoan sequences, which precludes their use for confidently annotating non-metazoan sequences into kinase families and sub-families that may not exist in non-metazoan species. Kinomer v. 1.0 annotates to the group level only and, in our view, annotating to the family/sub-family level requires manual curation.

In summary, Kinomer v. 1.0 is an easy-to-use interface to a novel database of both manually and automatically annotated kinomes. The availability of 43 eukaryotic kinomes in a relational database allows the easy querying of protein kinases by species and/or protein kinase group. In addition, the Kinomer v. 1.0 website includes a web server interface to the previously validated HMM library for the classification of peptide sequences into protein kinase groups. In the future, Kinomer v. 1.0 will be enhanced with the addition of a number of manually annotated kinomes of fungal, metazoan and photosynthetic organisms (Miranda-Saavedra, D., et al., manuscript in preparation). These will include the kinomes of pathogenic fungi of the Rhizopus and Fusarium genera, and the kinomes of several unicellular and multicellular photosynthetic organisms including diatoms, red, brown and green algae, and vascular plants. Thus, Kinomer v. 1.0 is a useful and developing repository of expert and automatically annotated kinomes.

FUNDING

D.M.S. was a Wellcome Trust Prize Student at the University of Dundee. Funding for open access charge: Wellcome Trust.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank Drs Tom Walsh and Jonathan Monk for assistance with computing.

REFERENCES

1. Cohen P. The regulation of protein function by multisite phosphorylation—a 25 year update. Trends Biochem Sci., 2000, vol. 25 (pg. 596-601)
2. Cohen P. The role of protein phosphorylation in human health and disease. The Sir Hans Krebs Medal Lecture. Eur. J. Biochem., 2001, vol. 268 (pg. 5001-5010)
3. Hanks SK, Quinn AM, Hunter T. The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science, 1988, vol. 241 (pg. 42-52)
4. Hanks SK, Hunter T. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J., 1995, vol. 9 (pg. 576-596)
5. Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S. The protein kinase complement of the human genome. Science, 2002, vol. 298 (pg. 1912-1934)
6. Miranda-Saavedra D, Barton GJ. Classification and functional annotation of eukaryotic protein kinases. Proteins, 2007, vol. 68 (pg. 893-914)
7. Hunter T, Plowman GD. The protein kinases of budding yeast: six score and more. Trends Biochem Sci., 1997, vol. 22 (pg. 18-22)
8. Plowman GD, Sudarsanam S, Bingham J, Whyte D, Hunter T. The protein kinases of Caenorhabditis elegans: a model for signal transduction in multicellular organisms. Proc. Natl Acad. Sci. USA, 1999, vol. 96 (pg. 13603-13610)
9. Morrison DK, Murakami MS, Cleghon V. Protein kinases and phosphatases in the Drosophila genome. J. Cell Biol., 2000, vol. 150 (pg. F57-F62)
10. Caenepeel S, Charydczak G, Sudarsanam S, Hunter T, Manning G. The mouse kinome: discovery and comparative genomics of all mouse protein kinases. Proc. Natl Acad. Sci. USA, 2004, vol. 101 (pg. 11707-11712)
11. Goldberg JM, Manning G, Liu A, Fey P, Pilcher KE, Xu Y, Smith JL. The dictyostelium kinome—analysis of the protein kinases from a simple model organism. PLoS Genet., 2006, vol. 2 pg. e38
12. Bradham CA, Foltz KR, Beane WS, Arnone MI, Rizzo F, Coffman JA, Mushegian A, Goel M, Morales J, Geneviere AM, et al. The sea urchin kinome: a first look. Dev. Biol., 2006, vol. 300 (pg. 180-193)
13. Eisen JA, Coyne RS, Wu M, Wu D, Thiagarajan M, Wortman JR, Badger JH, Ren Q, Amedeo P, Jones KM, et al. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biol., 2006, vol. 4 pg. e286
14. Krupa A, Anamika, Srinivasan N. Genome-wide comparative analyses of domain organisation of repertoires of protein kinases of Arabidopsis thaliana and Oryza sativa. Gene, 2006, vol. 380 (pg. 1-13)
15. Ward P, Equinet L, Packer J, Doerig C. Protein kinases of the human malaria parasite Plasmodium falciparum: the kinome of a divergent eukaryote. BMC Genomics, 2004, vol. 5 pg. 79
16. Anamika K, Srinivasan N. Comparative kinomics of Plasmodium organisms: unity in diversity. Protein Pept. Lett., 2007, vol. 14 (pg. 509-517)
17. Parsons M, Worthey EA, Ward PN, Mottram JC. Comparative analysis of the kinomes of three pathogenic trypanosomatids: Leishmania major, Trypanosoma brucei and Trypanosoma cruzi. BMC Genomics, 2005, vol. 6 pg. 127
18. Doerig C, Billker O, Pratt D, Endicott J. Protein kinases as targets for antimalarial intervention: kinomics, structure-based design, transmission-blockade, and targeting host cell enzymes. Biochim. Biophys. Acta, 2005, vol. 1754 (pg. 132-150)
19. Doerig C, Meijer L. Antimalarial drug discovery: targeting protein kinases. Expert Opin. Ther. Targets, 2007, vol. 11 (pg. 279-290)
20. Naula C, Parsons M, Mottram JC. Protein kinases as drug targets in trypanosomes and Leishmania. Biochim. Biophys. Acta, 2005, vol. 1754 (pg. 151-159)
21. Nierman WC, Pain A, Anderson MJ, Wortman JR, Kim HS, Arroyo J, Berriman M, Abe K, Archer DB, Bermejo C, et al. Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature, 2005, vol. 438 (pg. 1151-1156)
22. Galagan JE, Calvo SE, Cuomo C, Ma LJ, Wortman JR, Batzoglou S, Lee SI, Basturkmen M, Spevak CC, Clutterbuck J, et al. Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature, 2005, vol. 438 (pg. 1105-1115)
23. Pel HJ, de Winde JH, Archer DB, Dyer PS, Hofmann G, Schaap PJ, Turner G, de Vries RP, Albang R, Albermann K, et al. Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88. Nat. Biotechnol., 2007, vol. 25 (pg. 221-231)
24. Machida M, Asai K, Sano M, Tanaka T, Kumagai T, Terai G, Kusumoto K, Arima T, Akita O, Kashiwagi Y, et al. Genome sequencing and analysis of Aspergillus oryzae. Nature, 2005, vol. 438 (pg. 1157-1161)
25. Dujon B, Sherman D, Fischer G, Durrens P, Casaregola S, Lafontaine I, De Montigny J, Marck C, Neuveglise C, Talla E, et al. Genome evolution in yeasts. Nature, 2004, vol. 430 (pg. 35-44)
26. Loftus BJ, Fung E, Roncaglia P, Rowley D, Amedeo P, Bruno D, Vamathevan J, Miranda M, Anderson IJ, Fraser JA, et al. The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science, 2005, vol. 307 (pg. 1321-1324)
27. Dean RA, Talbot NJ, Ebbole DJ, Farman ML, Mitchell TK, Orbach MJ, Thon M, Kulkarni R, Xu JR, Pan H, et al. The genome sequence of the rice blast fungus Magnaporthe grisea. Nature, 2005, vol. 434 (pg. 980-986)
28. Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S, et al. The genome sequence of the filamentous fungus Neurospora crassa. Nature, 2003, vol. 422 (pg. 859-868)
29. Martinez D, Larrondo LF, Putnam N, Gelpke MD, Huang K, Chapman J, Helfenbein KG, Ramaiya P, Detter JC, Larimer F, et al. Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78. Nat. Biotechnol., 2004, vol. 22 (pg. 695-700)
30. Kamper J, Kahmann R, Bolker M, Ma LJ, Brefort T, Saville BJ, Banuett F, Kronstad JW, Gold SE, Muller O, et al. Insights from the genome of the biotrophic fungal plant pathogen Ustilago maydis. Nature, 2006, vol. 444 (pg. 97-101)
31. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 2000, vol. 408 (pg. 796-815)
32. Matsuzaki M, Misumi O, Shin IT, Maruyama S, Takahara M, Miyagishima SY, Mori T, Nishida K, Yagisawa F, Yoshida Y, et al. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature, 2004, vol. 428 (pg. 653-657)
33. Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science, 2002, vol. 296 (pg. 92-100)
34. Palenik B, Grimwood J, Aerts A, Rouze P, Salamov A, Putnam N, Dupont C, Jorgensen R, Derelle E, Rombauts S, et al. The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc. Natl Acad. Sci. USA, 2007, vol. 104 (pg. 7705-7710)
35. Derelle E, Ferraz C, Rombauts S, Rouze P, Worden AZ, Robbens S, Partensky F, Degroeve S, Echeynie S, Cooke R, et al. Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc. Natl Acad. Sci. USA, 2006, vol. 103 (pg. 11647-11652)
36. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 2006, vol. 313 (pg. 1596-1604)
37. Nene V, Wortman JR, Lawson D, Haas B, Kodira C, Tu ZJ, Loftus B, Xi Z, Megy K, Grabherr M, et al. Genome sequence of Aedes aegypti, a major arbovirus vector. Science, 2007, vol. 316 (pg. 1718-1723)
38. Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, et al. The genome sequence of the malaria mosquito Anopheles gambiae. Science, 2002, vol. 298 (pg. 129-149)
39. Xia Q, Zhou Z, Lu C, Cheng D, Dai F, Li B, Zhao P, Zha X, Cheng T, Chai C, et al. A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science, 2004, vol. 306 (pg. 1937-1940)
40. Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, Clamp M, Chang JL, Kulbokas EJ III, Zody MC, et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature, 2005, vol. 438 (pg. 803-819)
41. Dehal P, Satou Y, Campbell RK, Chapman J, Degnan B, De Tomaso A, Davidson B, Di Gregorio A, Gelpke M, Goodstein DM, et al. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science, 2002, vol. 298 (pg. 2157-2167)
42. International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature, 2004, vol. 432 (pg. 695-716)
43. Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, et al. Evolutionary and biomedical insights from the rhesus macaque genome. Science, 2007, vol. 316 (pg. 222-234)
44. Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A, et al. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature, 2007, vol. 447 (pg. 167-177)
45. Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, Ahsan B, Yamada T, Nagayasu Y, Doi K, Kasai Y, et al. The medaka draft genome and insights into vertebrate genome evolution. Nature, 2007, vol. 447 (pg. 714-719)
46. Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science, 2002, vol. 297 (pg. 1301-1310)
47. Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature, 2004, vol. 431 (pg. 946-957)
48. Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature, 2004, vol. 428 (pg. 493-521)
49. Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature, 2005, vol. 437 (pg. 69-87)
50. Loftus B, Anderson I, Davies R, Alsmark UC, Samuelson J, Amedeo P, Roncaglia P, Berriman M, Hirt RP, Mann BJ, et al. The genome of the protist parasite Entamoeba histolytica. Nature, 2005, vol. 433 (pg. 865-868)
51. Armbrust EV, Berges JA, Bowler C, Green BR, Martinez D, Putnam NH, Zhou S, Allen AE, Apt KE, Bechner M, et al. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science, 2004, vol. 306 (pg. 79-86)
52. Carlton JM, Hirt RP, Silva JC, Delcher AL, Schatz M, Zhao Q, Wortman JR, Bidwell SL, Alsmark UC, Besteiro S, et al. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science, 2007, vol. 315 (pg. 207-212)
53. Miranda-Saavedra D, Stark MJ, Packer JC, Vivares CP, Doerig C, Barton GJ. The complement of protein kinases of the microsporidium Encephalitozoon cuniculi in relation to those of Saccharomyces cerevisiae and Schizosaccharomyces pombe. BMC Genomics, 2007, vol. 8 pg. 309
54. Eddy SR. Profile hidden Markov models. Bioinformatics, 1998, vol. 14 (pg. 755-763)
55. Clamp M, Cuff J, Searle SM, Barton GJ. The Jalview Java alignment editor. Bioinformatics, 2004, vol. 20 (pg. 426-427)
56. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 2004, vol. 5 pg. 113
57. Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res., 2005, vol. 33 (pg. 511-518)
58. Katoh K, Kuma K, Miyata T, Toh H. Improvement in the accuracy of multiple sequence alignment program MAFFT. Genome Inform., 2005, vol. 16 (pg. 22-33)
59. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al. Clustal W and Clustal X version 2.0. Bioinformatics, 2007, vol. 23 (pg. 2947-2948)
60. Cole C, Barber JD, Barton GJ. The Jpred 3 secondary structure prediction server. Nucleic Acids Res., 2008, vol. 36 (pg. W197-W201)
61. Cuff JA, Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins, 2000, vol. 40 (pg. 502-511)
62. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet., 2000, vol. 16 (pg. 276-277)
63. Wheeler GL, Miranda-Saavedra D, Barton GJ. Genome analysis of the unicellular green alga Chlamydomonas reinhardtii indicates an ancient evolutionary origin for key pattern recognition and cell-signaling protein families. Genetics, 2008, vol. 179 (pg. 193-197)
64. Williams JG, Zvelebil M. SH2 domains in plants imply new signalling scenarios. Trends Plant Sci., 2004, vol. 9 (pg. 161-163)
65. Lanave C, Colangelo AM, Saccone C, Alberghina L. Molecular evolution of the neurotrophin family members and their Trk receptors. Gene, 2007, vol. 394 (pg. 1-12)
66. Ortutay C, Valiaho J, Stenberg K, Vihinen M. KinMutBase: a registry of disease-causing mutations in protein kinase domains. Hum. Mutat., 2005, vol. 25 (pg. 435-442)
67. Grassot J, Mouchiroud G, Perriere G. RTKdb: database of Receptor Tyrosine Kinase. Nucleic Acids Res., 2003, vol. 31 (pg. 353-358)
68. Niedner RH, Buzko OV, Haste NM, Taylor A, Gribskov M, Taylor SS. Protein kinase resource: an integrated environment for phosphorylation research. Proteins, 2006, vol. 63 (pg. 78-86)
69. Krupa A, Abhinandan KR, Srinivasan N. KinG: a database of protein kinases in genomes. Nucleic Acids Res., 2004, vol. 32 (pg. D153-D155)
70. Mi H, Lazareva-Ulitsky B, Loo R, Kejariwal A, Vandergriff J, Rabkin S, Guo N, Muruganujan A, Doremieux O, Campbell MJ, et al. The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res., 2005, vol. 33 (pg. D284-D288)

Author notes

Present address: Diego Miranda-Saavedra, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0XY, UK.

The authors wish it to be known that, in their opinion, the first two authors should be regarded as the joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
