Nucleic Acids Research Advance Access published online on September 12, 2008

Nucleic Acids Research, doi:10.1093/nar/gkn568
Published by Oxford University Press 2008
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Database Issue

OKCAM: an ontology-based, human-centered knowledgebase for cell adhesion molecules

Chuan-Yun Li1,2, Qing-Rong Liu1, Ping-Wu Zhang1, Xiao-Mo Li2, Liping Wei2 and George R. Uhl1,*

1Molecular Neurobiology Branch, NIH-IRP (NIDA), Baltimore, MD 21224, USA and 2Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, China

*To whom correspondence should be addressed. Tel: +1 410 550 2843 x146; Fax: +1 410 550 1535; Email: guhl@intra.nida.nih.gov

Received June 24, 2008. Revised August 18, 2008. Accepted August 21, 2008.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 IDENTIFICATION OF HUMAN CAM...
 DATA ANNOTATIONS
 CONSTRUCTION OF A CAMO
 OKCAM WEB INTERFACE DESIGN
 APPLICATIONS OF OKCAM
 GLOBAL FEATURES OF CAMs
 CAM REGULATORY MODES
 CAM EXPRESSION PATTERNS
 CAM DISEASE ASSOCIATIONS
 DISCUSSION
 SUPPLEMENTARY DATA
 FUNDING
 REFERENCES
 
‘Cell adhesion molecules’ (CAMs) are essential elements of cell/cell communication that are important for proper development and plasticity of a variety of organs and tissues. In the brain, appropriate assembly and tuning of neuronal connections is likely to require appropriate function of many cell adhesion processes. Genetic studies have linked and/or associated CAM variants with psychiatric, neurologic, neoplastic, immunologic and developmental phenotypes. However, despite increasing recognition of their functional and pathological significance, no systematic study has enumerated CAMs or documented their global features. We now report compilation of 496 human CAM genes in six gene families based on manual curation of protein domain structures, Gene Ontology annotations, and 1487 NCBI Entrez annotations. We map these genes onto a cell adhesion molecule ontology that contains 850 terms, up to seven levels of depth and provides a hierarchical description of these molecules and their functions. We develop OKCAM, a CAM knowledgebase that provides ready access to these data and ontologic system at http://okcam.cbi.pku.edu.cn. We identify global CAM properties that include: (i) functional enrichment, (ii) over-represented regulation modes and expression patterns and (iii) relationships to human Mendelian and complex diseases, and discuss the strengths and limitations of these data.


    INTRODUCTION
‘Cell adhesion molecules’ play central roles in much of the connection and communication between cells and their synapses (1). Cell adhesion-related communication is essential for many aspects of the proper development of a variety of organs and tissues (1). This cellular communication also plays substantial roles in the plasticity of cell recognition processes in the developed organism (2).

Cell adhesion molecules (CAMs) may be especially important in the brain. The brain requires proper connections of many trillions of synapses to develop properly as well as substantial plasticity in many of these synapses to facilitate learning and memory. The dynamics of neuronal synaptic recognition, connection and disconnection appear to make substantial contributions to disorders that display mnemonic features, including addictions and autism (3,4). Current physiologic and cell biologic studies have implicated CAMs as good candidates to play important roles in synapse adhesion (1,5), neuronal connectivity and communication (1), signal transduction (5–8) and proper arrangement of pre-synaptic active zones and postsynaptic densities at classical synapses (9,10).

Current genetic studies have linked and/or associated variants in cell adhesion molecule genes with psychiatric, neurologic, neoplastic, immunologic and developmental phenotypes. The importance of CAMs in learning and memory-associated disorders is demonstrated in recent genome wide association studies (11). Vulnerabilities to addictions are associated with variants in CAM genes in studies of several independent samples (12–14). Genetic variants of the CAM genes NRXN1 and CNTNAP2 have been associated with autism (4,15). Variants in neuregulin have been associated with vulnerability to schizophrenia (16). Variants in an adhesion-like protein KIAA0319 have been associated with dyslexia (17,18).

These data underscore the importance of cell adhesion molecules in both Mendelian and complex disorders of brain and other organs and suggest that a more comprehensive view of these genes and molecules would be valuable. However, there is currently no systematic study that enumerates: (i) the number of genes and gene families that function as CAMs; (ii) common and/or global CAM functions, including those that might extend beyond their cell/cell recognition functions; (iii) common CAM genetic variants that might provide individual differences in CAM structures and functions; (iv) over-represented regulation modes and expression patterns and (v) CAM associations with diseases, especially with brain disorders.

We now report compilation of a list of 496 human CAM genes and construction of a corresponding cell adhesion molecule ontology (CAMO) to systematically address these questions. Detailed annotations on CAM genes are provided. Global properties of CAM genes, overrepresented types of variation, overrepresented regulation modes and expression patterns, and disease associations are identified. We report a knowledgebase for cell adhesion molecules (OKCAM) that provides ready access to these data and the associated ontologic system that we describe here.


    IDENTIFICATION OF HUMAN CAM GENES AND RODENT HOMOLOGS
CAMs were identified based on compilation of data from manual curation of protein domain structures, Gene Ontology annotations, and 1487 annotation entries from keyword queries based on NCBI Entrez Gene annotations (Figure 1). First, we identified features of common protein domains for CAM families based on common motifs from cadherin, immunoglobulin/FibronectinIII (IgFn), integrin, neurexin, neuroligin and catenin families. Using these features, we developed Perl scripts to retrieve and standardize related InterPro domain architectures and the proteins that contain such architectures (19). After manual curation, 44 types of protein domains with 202 detailed domain architectures were identified. These included 532 human proteins that map onto 218 human genes. We used similar protocols to identify cell adhesion gene lists for rat and mouse; these genes were then further mapped to the human genome using Homologene (20). We next extracted CAMs using the Gene Ontology term ‘cell adhesion’ (GO:0007155) (21). We focus on curated entries; entries that are identified only by annotations that display Evidence Code IEA (Inferred from Electronic Annotation) are noted in Supplementary Table 7. Two hundred eighteen human proteins were identified, which mapped onto 196 human genes. Finally, we manually curated 1487 annotation entries selected from results of the Entrez Gene query ‘adhesion AND Homo sapiens [organism]’ (20). This approach added 136 more human genes to the list of cell adhesion molecules. In total, we thus identified 496 unique human CAM genes and their homologs in other species.
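
The merging step described above, in which three candidate collections are combined into one list of unique genes, can be sketched as follows. This is a minimal illustration only, not the authors' Perl pipeline: the function name `merge_cam_sources`, the source labels and the placeholder gene identifiers are all hypothetical, and the toy inputs do not reflect the real curated lists.

```python
def merge_cam_sources(domain_genes, go_genes, entrez_genes):
    """Union three candidate gene collections and record which source(s) support each gene."""
    support = {}
    for source_name, genes in (
        ("domain", domain_genes),
        ("go", go_genes),
        ("entrez", entrez_genes),
    ):
        for gene in genes:
            support.setdefault(gene, set()).add(source_name)
    return support


# Hypothetical placeholder identifiers; not real curation results.
domain_genes = {"GENE_A", "GENE_B", "GENE_C"}
go_genes = {"GENE_A", "GENE_D"}
entrez_genes = {"GENE_D", "GENE_E"}

support = merge_cam_sources(domain_genes, go_genes, entrez_genes)

# Unique genes across all three sources.
unique_genes = sorted(support)

# Genes contributed only by the keyword-query source.
entrez_only = sorted(g for g, s in support.items() if s == {"entrez"})
```

Keeping the supporting sources for each gene, rather than only the final union, makes it straightforward to report how many genes each step adds beyond the previous ones.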


Figure 1 Collection of Human CAMs. CAMs were compiled by integrating Gene Ontology annotations, domain structure information and keywords query against NCBI Entrez Gene annotations. Four hundred and ninety-six unique human genes were identified as CAMs (additional genes that may also function in this way are identified in the supplement).

 
Meta-data about the domain architectures for CAMs in nonhuman species provided information about CAM evolutionary histories. Across the 113 types of protein domains assessed in our dataset, 705 detailed domain architectures were noted. Among these, only 44 domains with 202 domain architectures were identified in all three species: human, rat and mouse. For example, in the cadherin superfamily, only one human gene encodes a protein with enzymatic activity, though several dozen cadherins with enzymatic activities are found in bacteria and yeast. Several categories with large numbers of domain architectures that can be detected in lower species, including Caenorhabditis elegans, Drosophila melanogaster and Danio rerio, are totally absent from human, rat and/or mouse. These categories include ‘IgCAM-like cadherins’ that display 29 such domain architectures, ‘cadherins with Leucine-rich structures’ that display two such domain architectures, ‘toxin-related cadherins’ that display 36 such domain architectures and ‘cadherins with surface anchor structures’ that display seven such domain architectures. In striking contrast, 119 of the 123 ‘cadherin’ genes that can be identified in humans fall into the category of ‘simple cadherins’, which includes genes with only simple combinations of cadherin prodomains, cadherin domains and cadherin cytoplasmic domains. Although only 79% of the proteins that we identify in this study display characteristic InterPro domains, the domain architecture patterns we identify do imply the specification of the CAMs in mammals.


    DATA ANNOTATIONS
To elucidate the functions of CAMs, detailed annotations were given to each CAM gene. These data allow interpretation of features of each CAM at five levels: gene family and basic information, genetics, regulation, expression, and Mendelian or complex disease linkage/association.

Information about gene family and basic characterization comes from NCBI Entrez gene annotations (20), Gene Ontology (21), InterPro domains (19), protein interaction databases (22–24), knowledgebases for molecular pathways including KEGG (25), BioCarta and Pathway Interaction Database (PID) and the NCBI PubMed database (20). Genetic variations in these genes, including chromosome recombination hotspots (26), SNPs (20), insertion/deletions (27), chromosomal translocations (27) and CNVs (27), were retrieved from the UCSC Genome Browser Database (26), HapMap (28), NCBI dbSNP database (20) and Database of Genomic Variants (27), respectively. Information about potential or actual modes of regulation was annotated based on the presence of experimentally validated transcription factor binding sites (TFBS) (29), experimentally validated (30,31) and putative miRNA targets (32), noncoding RNA loci (33), cis/trans-natural antisense transcripts (NATs) (34,35), alternative splicing and post-translational modifications (36) from databases that included TransFac (29), Argonaute (31), TarBase (30), PicTar (32), NatsDB (34,35), NONCODE (33) and dbPTM (36). Information about mRNA expression levels came from: (i) integrated human expressed sequence tag profiles based on developmental stages and tissue distributions, as deposited in Unigene (20) and (ii) mouse brain region expression profiles described in the Allen Brain Atlas (37), with mapping of these data to human orthologs using Homologene (20). We integrated gene expression information at peptide/protein levels by collecting expressed proteins and peptides deposited in the PRIDE database (38). To assess potential disease linkages or associations, we integrated OMIM (20) and genome-wide association datasets (39), from public data deposited in the Genetic Association Database (39) and an additional 12 in-house genome wide association datasets.

Full descriptions of the annotation statistics are provided in Table 1. These annotations, extending from genome to post-translational modification, provide a novel avenue for studies of the global properties of CAM genes, overrepresented types of variation, overrepresented regulation modes and expression patterns, and disease associations, as we discuss in the following sections.


Table 1 Annotations for CAM genes

 

    CONSTRUCTION OF A CAMO
We iteratively organized the information and knowledge for CAMs to construct a novel CAMO. CAMO was constructed as a directed acyclic graph (DAG) using DAG-Edit (40) to input, manage and update data, as shown in the screenshot (Supplementary Figure 1). We annotated each term with name, definition and source references. We added its relationship to other terms based on manual reviews of domain architecture and functional annotations at the five levels noted above.

In a DAG, vertices represent terms and directed edges represent the relationships between terms; no path along the edges leads from a term back to itself. CAMO thus provides a hierarchical description of functions and properties of CAMs with five top-level categories: CAM gene families, CAM genetics, CAM regulation, CAM expression and CAM diseases. Each top-level term is further divided into several categories to describe the functions in detail (Figure 2). In toto, CAMO has 850 terms with up to seven levels of depth. We mapped the 496 human genes that function in cell adhesion onto CAMO, providing a novel systematic description of CAMs (Figure 2). CAMO thus provides more specific, complete and resolved information about CAMs to scientists, especially to neuroscientists, than is available in general-purpose ontologies such as MeSH (41) and Gene Ontology (21).
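
The DAG structure and the notion of 'levels of depth' can be illustrated with a small sketch. It assumes a simple parent-to-child edge list; the root label "CAMO", the placement of 'psychiatric disorders' directly under 'CAM diseases', and the function names are hypothetical choices for illustration, not the actual CAMO file format or term layout.

```python
from collections import defaultdict


def build_children(edges):
    """Map each parent term to the list of its child terms."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return children


def max_depth(children, root):
    """Number of levels on the longest path from root to a leaf (root is level 1).

    Raises ValueError if a cycle is reachable from root, since a DAG must be acyclic.
    """
    memo = {}
    on_path = set()

    def visit(term):
        if term in on_path:
            raise ValueError(f"cycle detected at {term!r}")
        if term in memo:
            return memo[term]
        on_path.add(term)
        depth = 1 + max((visit(c) for c in children.get(term, [])), default=0)
        on_path.discard(term)
        memo[term] = depth
        return depth

    return visit(root)


# Hypothetical fragment of an ontology, for illustration only.
edges = [
    ("CAMO", "CAM diseases"),
    ("CAMO", "CAM genetics"),
    ("CAM diseases", "psychiatric disorders"),
    ("psychiatric disorders", "drug addiction"),
]
children = build_children(edges)
camo_depth = max_depth(children, "CAMO")
```

Because a term in a DAG may have more than one parent, the memo table avoids recomputing the depth of shared sub-branches.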


Figure 2 Structure of CAMO. CAMO provides a hierarchical description of functions and properties of CAMs with five top-level categories (A): CAM expression (B), CAM diseases (C), CAM genetics (D), CAM gene families (E) and CAM regulation (F). Each top-level term is further divided into several categories that allow more detailed functional descriptions.

 

    OKCAM WEB INTERFACE DESIGN
We developed a PostgreSQL database termed ‘OKCAM (Ontology-based Knowledgebase for Cell Adhesion Molecules)’ to manage the CAM gene list, annotations and ‘CAMO’. We implemented a web-based user interface of this database that uses PHP and PHP/SQL query scripts. Cross-references to key external databases were included to integrate functional information about CAM genes. These external databases provide annotations for CAM gene families, CAM genetics and genomics, CAM regulation modes and expression patterns, and relationships between CAMs and human diseases (Figure 3).


Figure 3 Structure of OKCAM Web Server. Several interactive browsing options were implemented to facilitate user queries of OKCAM. These include ontology overview (A), full gene list overview (B), chromosomal overview (C), text search (D) and BLAST search (D). Each interactive browsing interface returns CAM gene/gene lists that meet query requirements (F). Users can then obtain further detailed annotations mentioned above by clicking on gene names (G). A download page makes all data, database schema and PostgreSQL commands available (E).

 
The information for each CAM gene is integrated and presented in a single graphical web page. For example, the OKCAM entry page for cadherin 1 (CDH1) (http://okcam.cbi.pku.edu.cn/entry-info.php?id=999) shows that CDH1 is located on chromosome 16 in a chromosome region that contains a recombination hotspot, copy number variations and insertion/deletions (‘CAM genetics information’). CDH1 transcripts are relatively highly expressed in adult (‘developmental stage’), mammary gland (‘tissue distribution’) and cerebral cortex (‘brain region’). Translation products are also expressed in placenta/blood serum (‘protein expression’). CDH1 is implicated in neoplasia by genome-wide association studies and OMIM annotations (‘CAM disease’). Potential CDH1 regulatory modes include alternative splicing regulation, cis-NATs regulation, miRNA regulation as well as post-translational modifications (‘CAM regulation’). Links to the original databases and other resources facilitate information tracing.

We implemented four interactive browsing options in OKCAM to facilitate user queries. Users can browse cell adhesion genes by ‘CAMO’, displayed as hierarchical trees on the homepage. They can zoom in on a particular branch of the ontology by clicking the ‘+’ sign to expand the branch. For example, a user interested in ‘psychiatric disorders’ may expand this category, focus on ‘drug addiction’ and see the 49 CAM genes currently mapped on this term by clicking the number that follows this term (Figures 2 and 3). A ‘Chromosomal Overview’ browser supports browsing the CAM genes by clicks on chromosomal locations marked by ‘+++’ (Figure 3). A text search interface facilitates database queries that use either gene IDs or names. A fourth interface supports sequence searching based on BLAST nucleotide and amino acid sequence similarities. Each interactive browsing interface returns CAM gene/gene lists that meet query requirements. Users can then obtain further detailed annotation by clicking on the gene name (Figure 3). A download page makes all data, database schema and PostgreSQL commands available at http://okcam.cbi.pku.edu.cn/download.php.
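
A text search by gene ID or name, of the kind described above, might look roughly like the sketch below. It is only an illustration under stated assumptions: OKCAM itself uses PostgreSQL with PHP, whereas this stand-in uses Python's built-in `sqlite3` module; the table `cam_gene`, its columns and the function `search_genes` are hypothetical and are not taken from the published schema. The CDH1 row reuses the id from the entry URL quoted earlier; the other row is made up.

```python
import sqlite3


def search_genes(conn, query):
    """Match an exact numeric gene ID, or a substring of the symbol or name.

    SQLite's LIKE is case-insensitive for ASCII text by default.
    Wildcard characters in the query (% and _) are not escaped in this sketch.
    """
    if query.isdigit():
        sql = "SELECT gene_id, symbol, name FROM cam_gene WHERE gene_id = ?"
        params = (int(query),)
    else:
        sql = (
            "SELECT gene_id, symbol, name FROM cam_gene "
            "WHERE symbol LIKE ? OR name LIKE ? ORDER BY gene_id"
        )
        pattern = f"%{query}%"
        params = (pattern, pattern)
    return conn.execute(sql, params).fetchall()


# In-memory database with a hypothetical two-row table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cam_gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO cam_gene VALUES (?, ?, ?)",
    [(999, "CDH1", "cadherin 1"), (1, "GENE_A", "hypothetical gene A")],
)

by_name = search_genes(conn, "cadherin")
by_id = search_genes(conn, "999")
```

Passing the query through bound parameters, rather than string concatenation, keeps user input out of the SQL text.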


    APPLICATIONS OF OKCAM
The comprehensive annotations and ontology system of OKCAM facilitate studies of the global properties of the CAM genes, overrepresented types of variation, overrepresented regulation modes and expression patterns, and disease associations.


    GLOBAL FEATURES OF CAMs
CAMs in our dataset were annotated using Gene Ontology (GO) (21) and the pathway databases KEGG (25), BioCarta and Pathway Interaction Database (PID). We can thus identify significantly enriched Gene Ontology terms and pathways using DAVID (42) and KOBAS (43,44), respectively. We selected the functional categories that were more likely to be biologically meaningful by calculating the statistical significance of each functional category in the input set of genes versus all annotated genes in the human genome. There was statistically significant enrichment for CAM genes in 16 ‘molecular function’ terms (Supplementary Table 1), 11 ‘subcellular localization’ terms (Supplementary Table 2) and 45 ‘biological processes’ terms (Supplementary Table 3), when compared to corresponding data for the whole genome.
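
The comparison of an input gene set against all annotated genes is often framed as a one-sided hypergeometric test. The sketch below shows that general formulation only; it is an assumption that it resembles the statistics applied here, and DAVID and KOBAS each implement their own variants and corrections. The function name and the small numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided enrichment P-value, P(X >= hits).

    population: number of annotated genes in the background (e.g. the genome)
    annotated:  number of background genes carrying the term of interest
    sample:     size of the input gene set
    hits:       number of input genes carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers for illustration: 10 background genes, 4 with the term,
# an input set of 3 genes of which 2 carry the term.
p_toy = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
```

When many terms are tested at once, as with the 'molecular function', 'subcellular localization' and 'biological processes' categories, a multiple-testing correction would normally follow this per-term calculation.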

Identification of functional enrichment for several of the ‘molecular function’ and ‘subcellular localization’ terms is reassuring. This identification provides relatively little additional information, however, since CAMs do function as ‘adhesion molecules’. Most are well documented to sit within (or be anchored to) plasma membranes. However, there is also significant enrichment for other molecular functions that might not have been so readily anticipated, including calcium binding, protein kinase, and protein phosphatase activities (Supplementary Tables 1 and 4). The significant overrepresentation of CAM localizations within receptor complexes and extracellular matrix is also of interest (Supplementary Table 2). It is interesting that the CAMs identified in this work are overrepresented in not only ‘cell adhesion’ but also in biological processes that include signal transduction, responses to external stimuli, cell motility, migration, and nervous system development (Supplementary Table 3). Reassuringly, the molecular pathway enrichment analyses that used each of the three different pathway databases provided results that implicated their roles in largely similar functional pathways (Supplementary Table 5).

Data from OKCAM annotations for protein interactions allowed us to develop a molecular network based on proteins that could interact with the CAMs identified here (Supplementary Figure 2). As for other established biological networks (45,46), the connectivity distribution of the network that we nominated in this way appears to follow scale-free rules. CAMs appear to interact with each other to form a relatively tight ‘core’ that interrelates with hundreds of other signal transduction genes. Focus on the ‘hub nodes’ in this apparent network (Supplementary Figure 2) may even help to elucidate novel CAM roles in signal transduction that come from their partnerships with other signaling molecules.
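
In a scale-free network, the fraction of nodes with k interaction partners falls off roughly as a power law in k, so a few 'hub' nodes carry many connections while most nodes carry few. A first step toward inspecting this is to tabulate the connectivity (degree) distribution from an interaction list, as in the minimal sketch below. The function name and the toy node labels are hypothetical, and the sketch does not fit or test a power law.

```python
from collections import Counter


def degree_distribution(edges):
    """Return (degree per node, number of nodes per degree) for an undirected network.

    Self-interactions are ignored and duplicate pairs are counted once.
    """
    unique_pairs = {frozenset((a, b)) for a, b in edges if a != b}
    degree = Counter()
    for pair in unique_pairs:
        for node in pair:
            degree[node] += 1
    return degree, Counter(degree.values())


# Hypothetical toy interaction list; ("B", "A") duplicates ("A", "B").
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "A")]
degree, histogram = degree_distribution(edges)

# The most connected nodes are candidate hubs.
hubs = [node for node, _ in degree.most_common(1)]
```

Plotting the histogram on log-log axes is a common visual check; an approximately straight line is consistent with, though not proof of, scale-free connectivity.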


    CAM REGULATORY MODES
Mapping the CAMs in our dataset onto CAMO and detailed gene structural/regulatory terms allows us to identify specific potential regulatory modes for these CAMs. We can then perform Monte Carlo analyses to test whether these structural/regulatory modes are overrepresented among CAMs. At the human genomic level, both recombination ‘hotspots’ (Monte Carlo P = 0.024) and copy number variations (Monte Carlo P < 0.0001) are over-represented in chromosome regions that contain CAM genes. Indeed, ‘cell adhesion molecule’ is the GO category that is most enriched in the genes that overlap with 1447 copy number variants identified using Affymetrix 500 K and whole genome TilePath (WGTP) reagents (47). There is a more modest but still significant 1.42-fold enrichment for CAM genes in chromosomal regions that contain both copy number variations and recombination hotspots (P = 0.07). By contrast, we detected no significant difference in single nucleotide polymorphism (SNP) densities between chromosomal regions that contain CAM genes and the whole genome (P > 0.5).
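
A Monte Carlo test of overrepresentation can be sketched as follows. The exact sampling scheme used for the analyses above is not described in the text, so this is one plausible formulation under stated assumptions: random gene sets of the same size as the test set are drawn from the background, and the empirical P-value is the share of draws that overlap the feature at least as often as the observed set. It works on gene labels rather than chromosomal regions, and all names and toy inputs are hypothetical.

```python
import random


def monte_carlo_p(all_genes, test_set, feature_genes, n_iter=10000, seed=0):
    """Empirical one-sided P-value for overlap between test_set and feature_genes."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    background = list(all_genes)
    feature = set(feature_genes)
    size = len(test_set)
    observed = sum(1 for g in test_set if g in feature)

    at_least = 0
    for _ in range(n_iter):
        draw = rng.sample(background, size)
        if sum(1 for g in draw if g in feature) >= observed:
            at_least += 1

    # Add-one correction keeps the estimate away from an impossible P of zero.
    return (at_least + 1) / (n_iter + 1)


# Toy background of 1000 labelled genes; the first 10 carry the feature
# (for example, overlap with a copy number variant) and form the test set.
background = [f"g{i}" for i in range(1000)]
feature = background[:10]
p_strong = monte_carlo_p(background, feature, feature, n_iter=200)
```

With this estimator the smallest reportable value is 1 / (n_iter + 1), so a result such as P < 0.0001 requires on the order of 10 000 or more random draws.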

When we tested potential overrepresentation of transcriptional regulatory modes using hypergeometric tests, we found that the potential for miRNA regulation was significantly enriched for CAM genes when compared to the whole genome (P < 0.0001). In contrast, no over-representation of transcription factor regulation for CAM genes was detected using either low-scale experimentally validated data (P = 0.37) or ChIP-chip data (P = 0.51). There was no significant over- or under-representation of CAMs among genes involved in either cis- or trans-NAT (35) regulation (P > 0.5 for each).

We can also seek overrepresentation of CAM alternative splicing by compiling the alternative splicing isoforms for each human gene mapped on CAMO and plotting the distributions of the numbers of isoforms for (i) CAMs versus (ii) all human genes (Supplementary Figure 3). The overall distributions appear similar. However, genes that utilize a wealth of alternative transcripts, those that encode ~40–50 alternatively spliced isoforms, are over-represented in the dataset that encodes CAMs. These genes provide an apparently distinct ‘peak’ in the distribution curve (Supplementary Figure 3). This analysis agrees with our previous work that has characterized multiple alternative splicing events in specific addiction-associated CAMs (13).

We integrated post-translational modification (PTM) data to identify possible contributions of this regulatory mode to CAM functions. On the basis of the experimentally validated PTM data deposited in dbPTM, the 496 CAM genes are candidates for involvement in glycosylation (334 genes), phosphorylation (114 genes), amidation (22 genes), palmitoylation (eight genes), methylation (three genes), farnesylation (two genes), myristoylation (two genes), sulfation (one gene) and acetylation (one gene). There is a highly significant enrichment for CAM N-linked glycosylation (331 genes, P < 0.0001), but not for O-linked glycosylation (10 genes). No significant over- or under-representation was detected for other modes of post-translational modification.

On the basis of the OKCAM annotations and CAMO, we identified a list of regulatory modes for cell adhesion molecules. These analyses identified both expected and unexpected CAM regulatory modes. First, the data document the overrepresentation of CNVs within CAM genes, in ways that were suggested in even some of the initial descriptions of CNVs (48). Documenting a 1.4-fold enrichment for CAM genes in chromosomal regions that contain both copy number variations and recombination hotspots both supports these initial observations and provides a possible mechanism for the abundance of CNVs in CAM genes. Secondly, although many papers have described many alternative splicing isoforms for CAMs, it was somewhat surprising to note that the largest diversity of alternative transcripts (e.g. ~40–50) was selectively over-represented among CAM genes.


    CAM EXPRESSION PATTERNS
Integration of data from human expressed sequence tags (EST) derived from brain libraries and mouse brain atlas expression profiles provided strong levels of agreement that support use of this comparative approach (Supplementary Table 6). We thus analyzed CAM expression patterns and levels in 17 mouse brain regions, based on Allen Brain Atlas profiles from murine brains. For each brain region, we used the program R to plot the density curves that illustrate the frequency distributions of expression levels for (i) CAMs and (ii) all human genes expressed in this brain region (Supplementary Figure 4). For 16 of the 17 brain regions, the expression distribution curves for the two datasets merged. In these brain regions, CAM genes taken as a group appear to be expressed in ways that are not markedly different from those of other brain-expressed genes. However, in the cerebral cortex, CAM genes with the highest expression levels appear to be over-represented. There is thus an additional peak in the CAM distribution curve that is not found when all other genes are examined (Supplementary Figure 4). While much prior data documents expression of many CAMs in cerebral cortex, the specificity of the relatively richer expression of CAMs in this brain region provides a novel observation.


    CAM DISEASE ASSOCIATIONS
We assessed potential relationships between CAM variants and disease using data from OMIM, public GWAS data and our in-house datasets. These data nominate 167 human CAMs as likely to contain variants that could contribute to individual differences in vulnerability to disorders in brain and a variety of other organs (Figure 4). CAMs were identified by association and/or linkage findings in disorders of the nervous system (91 genes), immune system (30 genes), metabolism (29 genes), cardiovascular system (28 genes), skin and connective tissues (26 genes), musculoskeletal system (25 genes) and hyperplasia and/or tumors (23 genes). When assessed in relation to specific disorders or narrower classes of disorders, there were relatively large numbers of cell adhesion molecules implicated in substance dependence (49 genes), Alzheimer's disease (42 genes), tumors (21 genes), heart disease (20 genes), bipolar disorder (18 genes), autoimmune diseases (19 genes) and diabetes mellitus (17 genes). The number of CAMs whose variants are tentatively implicated in nervous system phenotypes is larger than anticipated by chance (Figure 4). The distribution of findings in other disorders is similar to that displayed by all genes, when comparing data from either OMIM or GWA datasets.


Figure 4 Distribution of CAMs in OMIM and GWA. OMIM, GWA and/or our in-house GWA data implicate variants in at least 167 (of the 496) CAM genes in various diseases. Data from OMIM share disease distribution patterns with those from GWA studies.

 

    DISCUSSION
‘Cell adhesion molecules’ are increasingly recognized as ‘cell adhesion receptors’, since many of their functions are not just ‘cell glue’ but rather are more consistent with roles in cell–cell and cell–matrix interactions and in molecular recognition events that transduce signals. The computational approaches that we use here to define and characterize a universe of ‘cell adhesion’ molecules provide both expected and unexpected results. These results should be assessed in light of the strengths and limitations of both the approaches used and the underlying datasets employed for these analyses, which we discuss in detail in Supplementary Text 1.

We have attempted to provide as comprehensive a list of human CAM genes, annotations and ontology-based CAM knowledgebase as possible. However, it is clear that there will be rapid progress in the study of these molecules and of cell adhesion mechanisms. The OKCAM database provides means for integrating new data and updating knowledge, in ways that should facilitate a progressively better understanding of global and specific CAM properties. As CAM genomic features, regulatory modes, expression patterns and disease associations become clearer, we hope that OKCAM will become even more comprehensive and useful.


    SUPPLEMENTARY DATA
Supplementary Data are available at NAR online.


    FUNDING
National Institutes of Health Intramural Research Program (NIDA), NIH grants P50CA/DA84718; China Scholarship Council (C.Y.L.); China National High-tech 863 Programs (2006AA02A312, 2006AA02Z334); 973 Programs (2007CB946904). Funding for open access charge: NIDA/IRP grants P50CA/DA84718.

Conflict of interest statement. None declared.


    ACKNOWLEDGEMENTS
 
We thank Drs T. Drgon, A. Hishimoto, Y. Zhang and X. Yu for insightful suggestions. We are grateful to Shuqi Zhao, Xizeng Mao, Zhi-Yu Peng and Qi-Yao Li for assistance with OKCAM Web Server development.


