

Nucleic Acids Research Advance Access originally published online on May 30, 2007
Nucleic Acids Research 2007 35(Web Server issue):W132-W136; doi:10.1093/nar/gkm392

Nucleic Acids Research, 2007, Vol. 35, No. suppl_2 W132-W136
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Articles

CodonO: codon usage bias analysis within and across genomes

Michael C. Angellotti, Shafquat B. Bhuiyan, Guorong Chen and Xiu-Feng Wan*

Systems Biology Laboratory, Department of Microbiology, Miami University, Oxford, OH 45056, USA

*To whom correspondence should be addressed. Tel: +1-513-529-0426; Fax: +1-513-529-2431; Email: wanhenry{at}yahoo.com

Received January 31, 2007. Revised April 4, 2007. Accepted May 1, 2007.


    ABSTRACT
 
Synonymous codon usage biases are associated with various biological factors, such as gene expression level, gene length, gene translation initiation signal, protein amino acid composition, protein structure, tRNA abundance, mutation frequency and patterns, and GC composition. Quantification of codon usage bias helps in understanding the evolution of living organisms. A pipeline for codon usage bias analyses within and across genomes is in demand. Here we present the CodonO webserver, a user-friendly tool for codon usage bias analyses across and within genomes in real time. The webserver is available at http://www.sysbiology.org/CodonO. Contact: wanhenry{at}yahoo.com.


    INTRODUCTION
 
Within the standard genetic code, all amino acids except Met and Trp are encoded by more than one codon; these are called synonymous codons. DNA sequence data from diverse organisms clearly show that synonymous codons for any amino acid are not used with equal frequency, and these biases are a consequence of natural selection during evolution. Extensive studies have shown that synonymous codon usage biases are associated with various biological factors, such as gene expression level, gene length, gene translation initiation signal, protein amino acid composition, protein structure, tRNA abundance, mutation frequency and patterns, and GC composition (1–11). Quantification of codon usage bias, especially at the genomic scale, helps in understanding the evolution of living organisms.

Many different approaches to measuring codon usage bias have been developed in the past few decades. These methods may be grouped into two categories: (i) methods based on the statistical distribution, such as the codon-usage preference bias measure (CPS) based on χ² (12) and scaled χ² analyses (13); (ii) methods using a group of gene sequences as reference, which can be ‘optimal codons’ [e.g. codon bias index (14)], a defined set of highly expressed genes [e.g. codon preference statistics (15) and codon adaptation index (16)], a defined gene class [e.g. Codon Bias (7)], or all genes in the entire genome [e.g. the Shannon Information Method (17)]. Most existing computational approaches are only suitable for comparing codon usage bias within a single genome. To overcome this limitation, we developed a new informatics method based on Shannon information theory, referred to as synonymous codon usage order (SCUO), which enables the measurement of synonymous codon usage bias within and across genomes (3,12). A review and comparison of SCUO and currently available methods are given in Wan et al. (18). Several software packages and webservers, for instance CodonW (http://bioweb.pasteur.fr/seqanal/interfaces/codonw.html) and JCat (19), have been developed to measure the Codon Adaptation Index (CAI) for genes. JCat also integrates intrinsic terminators and enzyme digestion sites into its analyses.

Codon usage analyses within and across genomes will facilitate the understanding of the evolution and environmental adaptation of living organisms. GC composition has been shown to drive codon and amino-acid usage and thus to affect codon usage bias (20). It is therefore critical to study the correlation between GC composition and codon usage bias. Previously, we developed an analytical model to quantify synonymous codon usage bias by GC composition based on SCUO (11). However, it is still laborious to perform codon usage analyses within and across genomes and, to our knowledge, no available tool has been designed for these purposes. The CodonO webserver described here is a pipeline for codon usage bias analyses within and across genomic sequences, as well as a tool for studying the correlation between codon usage bias and GC composition, especially for microbial species. Unlike the standalone CodonO we developed earlier (10,11,18), the CodonO webserver has the following additional functions: (i) besides allowing users to compare their own submissions, it connects to a genomic database and performs analyses in real time; (ii) it can be used to study the correlation between SCUO and GC composition; (iii) it performs statistical comparison of SCUO within and across genomes; (iv) besides SCUO values, it extracts and displays the codon usage frequency table as well as the gene attributes for each gene from the genomic database; and (v) it provides a user-friendly interface.


    MATERIALS AND METHODS
 
Synonymous codon usage order measurement
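The CodonO webserver uses the synonymous codon usage order (SCUO) measurement to calculate synonymous codon usage bias. The SCUO concept and method have been described in detail previously (10,11,18). Briefly, let $x_{ij}$ be the number of occurrences of the j-th synonymous codon for the i-th amino acid in a sequence, $n_i$ the number of synonymous codons for that amino acid, and $p_{ij} = x_{ij} / \sum_{j=1}^{n_i} x_{ij}$ the normalized codon frequency. We calculate the entropy of the i-th amino acid in the sequence as

$$H_i = -\sum_{j=1}^{n_i} p_{ij} \log p_{ij}$$

where 1 ≤ i ≤ 18 and j indexes the codons for the i-th amino acid (1 ≤ j ≤ 6 for leucine, 1 ≤ j ≤ 2 for tyrosine, etc.). If the synonymous codons for the i-th amino acid were used at random, one would expect a uniform distribution of them as representatives for the i-th amino acid. Thus, the maximum entropy for the i-th amino acid in each sequence is

$$H_i^{\max} = \log n_i$$

The SCUO for the i-th amino acid in each sequence is then the normalized difference between the maximum entropy and the observed entropy:

$$O_i = \frac{H_i^{\max} - H_i}{H_i^{\max}}$$

The average SCUO for each sequence summarizes the SCUO values of the individual amino acids, each weighted by the composition ratio $F_i$ of the i-th amino acid in the sequence:

$$F_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{\sum_{i=1}^{18} \sum_{j=1}^{n_i} x_{ij}}, \qquad \mathrm{SCUO} = \sum_{i=1}^{18} F_i O_i$$

The SCUO represents the synonymous codon usage bias for the entire sequence. Thus, 0 ≤ SCUO ≤ 1, and a larger SCUO denotes a higher codon usage bias in the sequence.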
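
The SCUO calculation can be illustrated with a short Python sketch. It is not the CodonO source code: it assumes the standard genetic code, natural logarithms, and a composition-weighted average over the 18 amino acids with synonymous codons, following the description in (11,18). The names `CODON_TABLE`, `N_SYN` and `scuo` are hypothetical.

```python
import math
from collections import defaultdict

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Number of synonymous codons per amino acid; Met, Trp and stops are excluded.
N_SYN = {}
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        N_SYN[_aa] = N_SYN.get(_aa, 0) + 1
N_SYN = {aa: n for aa, n in N_SYN.items() if n > 1}


def scuo(seq):
    """Return the SCUO of an in-frame coding sequence (assumed formula)."""
    seq = seq.upper().replace("U", "T")
    counts = defaultdict(lambda: defaultdict(int))
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        aa = CODON_TABLE.get(codon)  # None for codons with ambiguous bases
        if aa in N_SYN:
            counts[aa][codon] += 1
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0
    result = 0.0
    for aa, codons in counts.items():
        n_aa = sum(codons.values())
        # Observed entropy of the codon distribution for this amino acid.
        h = -sum((x / n_aa) * math.log(x / n_aa) for x in codons.values())
        h_max = math.log(N_SYN[aa])      # entropy under uniform codon usage
        o_i = (h_max - h) / h_max        # per-amino-acid SCUO
        result += (n_aa / total) * o_i   # weight by composition ratio
    return result
```

A sequence that uses a single codon for every amino acid gives a value of 1, and a sequence that uses all synonymous codons equally gives a value close to 0.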

Statistical methods
The CodonO webserver performs codon usage bias analyses within genomes using Tukey statistical analysis (21) and across genomes using the Wilcoxon Two Sample Test (22). Tukey statistical analysis is a simple and powerful method for identifying outliers in a population, whether or not the population is normally distributed. We adapted the percentile calculation from the JMP method (SAS, Inc., Cary, NC, USA). For the q-th percentile, the rank R is

$$R = \frac{q}{100}(n + 1)$$

where n is the number of data points, IR is the integer part of R and FR is the fractional part of R. Then,

q-th percentile = IR-th observation + FR × [(IR + 1)-th observation − IR-th observation]

The Tukey outliers are genes with SCUO values less than Q1 − 1.5 × IQR or greater than Q3 + 1.5 × IQR, where Q1 and Q3 are the 25th and 75th percentiles of SCUO and IQR, the interquartile range, is the difference between them.
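
As an illustration of the outlier rule, the following Python sketch computes percentiles by interpolating between ranked observations and then applies the 1.5 × IQR fences. It assumes the rank is R = q/100 × (n + 1), which is the JMP convention; the function names `jmp_percentile` and `tukey_outliers` are hypothetical, and the clamping at the ends of the data is a choice made here rather than something stated in the text.

```python
def jmp_percentile(values, q):
    """q-th percentile using rank R = q/100 * (n + 1) (assumed convention)."""
    data = sorted(values)
    n = len(data)
    rank = q / 100 * (n + 1)
    ir = int(rank)      # integer part of R
    fr = rank - ir      # fractional part of R
    if ir < 1:          # rank falls before the first observation
        return data[0]
    if ir >= n:         # rank falls at or beyond the last observation
        return data[-1]
    # Observations are 1-indexed in the formula, 0-indexed in the list.
    return data[ir - 1] + fr * (data[ir] - data[ir - 1])


def tukey_outliers(values):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1 = jmp_percentile(values, 25)
    q3 = jmp_percentile(values, 75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]
```

In the webserver's setting, `values` would be the SCUO values of all genes in one genome, and the returned list would correspond to the genes highlighted as outliers.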

The Wilcoxon Two Sample Test (22) is used to test the null hypothesis that the distributions of SCUO from two groups of sequences (e.g. genomes) are the same. The test remains sensitive when comparing two groups even if their values are not normally distributed.
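
A minimal Python sketch of a two-sample rank-sum test is given below. It is only one possible variant: it assumes average ranks for ties and a two-sided P-value from the normal approximation, without tie or continuity corrections, and the text does not say which variant the webserver uses. The function name `wilcoxon_rank_sum` is hypothetical.

```python
import math


def wilcoxon_rank_sum(x, y):
    """Return (W, z, p): rank sum of x, z-score, two-sided P-value."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return w, z, p
```

Here `x` and `y` would be the per-gene SCUO values of two genomes. The normal approximation is reasonable for genome-sized samples but not for very small ones.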

Features
As shown in Figure 1, the CodonO server is directly connected to the GenBank genomic database and is updated from it daily. Users can select one or multiple genomes for analysis at the same time, and can also upload their own datasets. The underlying computations include synonymous codon usage order (SCUO) and GC composition measurements. The latter comprise GC, GC1s, GC2s and GC3s, where GC is the overall GC composition and GC1s, GC2s and GC3s are the GC compositions at the first, second and third sites of a codon, respectively. The results are plotted in a two-dimensional graph in which users can visualize and compare them. The webserver can display the results for multiple genomes in the same plot, so that users can analyse the two-dimensional differences (GC/GC1s/GC2s/GC3s versus SCUO) between genes within and across genomes (Figure 2A) (11). Generally, a very low or very high GC composition is associated with a large codon usage bias. It has been shown that codon usage bias in some bacteria and archaea is affected by GC composition and environmental conditions (e.g. temperature) (23). Users can thus perform these types of analyses according to their own preferences.
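
The four GC measurements can be sketched in Python as follows, using the definitions given in the text (GC fraction overall and at each of the three codon sites). The function name `gc_composition` is hypothetical, and restricting the calculation to complete codons of an in-frame sequence is an assumption.

```python
def gc_composition(seq):
    """Return GC, GC1s, GC2s and GC3s as fractions for an in-frame sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    def fraction(bases):
        bases = list(bases)
        if not bases:
            return 0.0
        return sum(b in "GC" for b in bases) / len(bases)

    return {
        "GC": fraction("".join(codons)),          # all sites
        "GC1s": fraction(c[0] for c in codons),   # first codon site
        "GC2s": fraction(c[1] for c in codons),   # second codon site
        "GC3s": fraction(c[2] for c in codons),   # third codon site
    }
```

Plotting any one of these values against the SCUO of the same gene gives one point of the two-dimensional comparison described above.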


Figure 1. Simplified CodonO webserver infrastructure.

 

Figure 2. (A) Visualization of the correlation between synonymous codon usage bias and GC compositions; (B) Visualization of synonymous codon usage bias for each gene in a specific genome; (C) Statistical analysis of synonymous codon usage bias.

 
As mentioned in the ‘Statistical methods’ section, the webserver can identify the outliers for a genome or a group of sequences based on Tukey statistical analysis (21). Users can select an ‘outlier’ from the plot and retrieve the associated information for each codon and the annotation of the specific gene (Figure 2B); in the plot, the outliers are marked in a different color from the other members of the SCUO population. To compare codon usage across genomes, the CodonO webserver applies the Wilcoxon Two Sample Test (22) to determine whether the SCUO populations of different genomes are the same. The P-values from the statistical comparison between genomes are listed in a table (Figure 2C), and a P-value less than 0.05 indicates a significant difference between the two SCUO populations compared.

Implementation
The programs in this solution package are written in C/C++ or Java. The shell scripts are written for the Korn shell in order to achieve high performance. GNUPlot is used for visualization. Cascading style sheets (CSS) are used for a consistent look across the pages, which also makes it possible to change the overall design simply by replacing the CSS definition file. PHP, which is itself written in C, is used for server-side scripting. To achieve high performance for computing at a genomic scale, we apply a hash function or a binary tree, so that the codon usage analyses have a time complexity of O(n) or O(n log n), respectively. The webserver also includes special functions that address security and concurrency issues.
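
The hash-based approach can be illustrated with a short Python sketch that builds a codon frequency table in a single pass over a sequence, giving linear time in the sequence length. The actual server code is written in C/C++ or Java and is not shown in this article; the function name `codon_counts` is hypothetical, and Python's `Counter` (a hash table) stands in for the hash function mentioned above.

```python
from collections import Counter


def codon_counts(seq):
    """Count each codon of an in-frame sequence in one pass."""
    seq = seq.upper()
    # Each lookup and update in the hash table takes constant expected time,
    # so the whole pass is linear in the number of codons.
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
```

Storing the same counts in a balanced binary tree keyed by codon would instead cost a logarithmic factor per update, which corresponds to the O(n log n) bound.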


    ACCESS
 
CodonO has been tested on Microsoft Internet Explorer, Netscape and Mozilla Firefox. JavaScript must be enabled to obtain the full functionality of the CodonO server. The webserver is available at http://www.sysbiology.org/CodonO/ and runs analyses in real time. Users can compare up to 16 genomes at the same time.


    CONCLUSIONS
 
In summary, the CodonO webserver has three major computational features for codon usage bias analyses: (i) it calculates the codon usage bias for one or more genomes; (ii) it compares and visualizes the correlation between codon usage bias and GC composition; (iii) it performs statistical analyses of codon usage bias within and across genomes. Thus, CodonO provides an efficient, user-friendly web service for codon usage bias analyses across and within genomes using SCUO in real time.


    ACKNOWLEDGEMENTS
 
We are grateful to Dr Steven Hutcheson from the University of Maryland for his critical suggestions. Funding to pay the Open Access publication charges for this article was provided by the start-up funds of Miami University.

Conflict of interest statement. None declared.


    Footnotes
 
Present address: Xiu-Feng Wan, Molecular Virology and Vaccine Branch, Influenza Division, CDC, Atlanta, GA 30333, USA


    REFERENCES
 

  1. Bains W. Codon distribution in vertebrate genes may be used to predict gene length. J. Mol. Biol (1987) 197:379–388.

  2. D’Onofrio G, Ghosh TC, Bernardi G. The base composition of the genes is correlated with the secondary structures of the encoded proteins. Gene (2002) 300:179–187.

  3. Bernardi G, Bernardi G. Compositional constraints and genome evolution. J. Mol. Evol (1986) 24:1–11.

  4. Gouy M, Gautier C. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res (1982) 10:7055–7074.

  5. Gu W, Zhou T, Ma J, Sun X, Lu Z. The relationship between synonymous codon usage and protein structure in Escherichia coli and Homo sapiens. Biosystems (2004) 73:89–97.

  6. Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol (1981) 151:389–409.

  7. Karlin S, Mrazek J. What drives codon choices in human genes? J. Mol. Biol (1996) 262:459–472.

  8. Lobry JR, Gautier C. Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes. Nucleic Acids Res (1994) 22:3174–3180.

  9. Ma J, Campbell A, Karlin S. Correlations between Shine-Dalgarno sequences and gene features such as predicted expression levels and operon structures. J. Bacteriol (2002) 184:5733–5745.

  10. Wan XF, Xu D, Zhou J. Intelligent Engineering Systems Through Artificial Neural Networks—Dagli, ed. (2003) 13. New York: ASME Press. 1101–1118.

  11. Wan XF, Xu D, Kleinhofs A, Zhou J. Quantitative relationship between synonymous codon usage bias and GC composition across unicellular genomes. BMC Evol. Biol (2004) 4:19.

  12. McLachlan AD, Staden R, Boswell DR. A method for measuring the non-random bias of a codon usage table. Nucleic Acids Res (1984) 12:9567–9575.

  13. Shields DC, Sharp PM. Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res (1987) 15:8023–8040.

  14. Bennetzen JL, Hall BD. Codon selection in yeast. J. Biol. Chem (1982) 257:3026–3031.

  15. Gribskov M, Devereux J, Burgess RR. The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression. Nucleic Acids Res (1984) 12:539–549.

  16. Sharp PM, Li WH. The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res (1987) 15:1281–1295.

  17. Zeeberg B. Shannon information theoretic computation of synonymous codon usage biases in coding regions of human and mouse genomes. Genome Res (2002) 12:944–955.

  18. Wan XF, Xu D, Zhou J. CodonO: a new informatics method measuring synonymous codon usage bias. Int. J. General Syst (2006) 35:109–125.

  19. Grote A, Hiller K, Scheer M, Munch R, Nortemann B, Hempel DC, Jahn D. JCat: a novel tool to adapt codon usage of a target gene to its potential expression host. Nucleic Acids Res (2005) 33:W526–W531.

  20. Knight RD, Freeland SJ, Landweber LF. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol (2001) 2. RESEARCH0010.

  21. Tukey JW. Exploratory Data Analysis (1977) Addison-Wesley Publishing Company, Inc.

  22. Wilcoxon F. Individual comparisons by ranking methods. Biometrics (1945) 1:80–83.

  23. Lynn DJ, Singer GA, Hickey DA. Synonymous codon usage is subject to selection in thermophilic bacteria. Nucleic Acids Res (2002) 30:4272–4277.

