Nucleic Acids Research Pages 334-334  




Codon usage tabulated from the international DNA sequence databases

Yasukazu Nakamura*, Takashi Gojobori1, Toshimichi Ikemura1

Laboratory of Gene Structure 2, Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292, Japan and 1National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411, Japan

Received October 6, 1997; Accepted October 8, 1997

ABSTRACT

CUTG (codon usage tabulated from GenBank) is a comprehensive database of codon usage. The codon usage of each full-length protein gene has been calculated from the nucleotide sequences in the GenBank sequence database. The sum of codon use for each organism has also been calculated. The data files can be obtained from the anonymous ftp sites of DDBJ, DISC and EBI. The list of codon usage of genes in each organism is searchable by organism name through a web site (http://www.dna.affrc.go.jp/~nakamura/CUTG.html). The compilation is synchronized with each major release of GenBank.

CUTG consists of lists of the codon usage of individual genes and the sum of codon use for each organism. In September 1997, CUTG contained 155 623 genes for 6048 organisms. The database has been compiled from the nucleotide sequences in the latest major release of the GenBank sequence database (1). The divisions used are pri (primate), rod (rodent), mam (other mammalian), vrt (other vertebrate), inv (invertebrate), pln (plant), bct (bacterial), vrl (viral) and phg (phage). Other divisions that do not represent a taxonomic collection (such as sts for STS or syn for synthesized sequences) have been excluded from the compilation. Protein coding sequences were selected according to the feature tables of the GenBank flat files. Codons that contain one or more letters for ambiguous bases were excluded from the count. Partially sequenced protein genes were not compiled.
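The counting rule described above can be illustrated with a minimal Python sketch. It is not the authors' program: the function names, the upper-casing step and the handling of a trailing partial codon are assumptions made for illustration. Only the list of divisions and the rule that codons containing ambiguous base letters are skipped come from the text.

```python
from collections import Counter

# GenBank divisions named in the text as included in the compilation.
COMPILED_DIVISIONS = {"pri", "rod", "mam", "vrt", "inv", "pln", "bct", "vrl", "phg"}

# Letters treated as unambiguous bases (assumption: DNA alphabet only).
UNAMBIGUOUS_BASES = set("ACGT")


def is_compiled_division(division: str) -> bool:
    """Return True if the division code is one of those used for the compilation."""
    return division.lower() in COMPILED_DIVISIONS


def count_codons(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence, skipping ambiguous codons."""
    seq = cds.upper()
    counts = Counter()
    # Ignore any trailing bases that do not form a complete codon (assumption).
    usable_length = len(seq) - len(seq) % 3
    for start in range(0, usable_length, 3):
        codon = seq[start:start + 3]
        # A codon is counted only if all three letters are unambiguous bases.
        if set(codon) <= UNAMBIGUOUS_BASES:
            counts[codon] += 1
    return counts


# Hypothetical 12-base sequence; the second codon contains the ambiguous letter n.
example_counts = count_codons("atggcngcttaa")
```

In this made-up example the codon `gcn` is dropped, so only three of the four codons contribute to the count.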

Files of the database are available by anonymous ftp. Files named gb***.codon, where '***' is a GenBank division name (e.g. bct), list the codon use of each gene registered in the GenBank flat files. An entry for a gene has two lines. The first line consists of information extracted from the feature table entry that defines each CDS (protein coding sequence), with fields separated by backslashes. If a LOCUS contains more than one gene, the symbol # followed by a number is added after the LOCUS name; the number represents the order of the CDS in the feature table. The second line consists of the codon counts for the CDS. The order of the codons is the same as in the previous compilation (2) and is described in the CODON_LABEL file.
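A reader for the two-line entries described above might look like the following sketch. Several details are assumptions rather than documented facts: that the counts on the second line are separated by whitespace, that blank lines can be ignored, and the names `CodonEntry`, `parse_codon_entries` and `split_locus_tag`. The meaning and order of the header fields are not assumed here; they are returned as a plain list.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple


class CodonEntry(NamedTuple):
    fields: List[str]   # header fields, in file order, split on backslashes
    counts: List[int]   # codon counts, in the order given by CODON_LABEL


def split_locus_tag(tag: str) -> Tuple[str, Optional[int]]:
    """Split 'NAME#3' into ('NAME', 3); a tag without '#' yields (tag, None)."""
    name, sep, number = tag.partition("#")
    if sep and number.isdigit():
        return name, int(number)
    return tag, None


def parse_codon_entries(lines: Iterable[str]) -> Iterator[CodonEntry]:
    """Yield one CodonEntry per pair of non-blank lines."""
    content = [line.rstrip("\n") for line in lines if line.strip()]
    if len(content) % 2 != 0:
        raise ValueError("expected an even number of non-blank lines")
    for header, counts_line in zip(content[0::2], content[1::2]):
        fields = header.split("\\")
        # Assumption: counts are whitespace-separated integers.
        counts = [int(value) for value in counts_line.split()]
        yield CodonEntry(fields, counts)


# Hypothetical entry, shortened to four counts for readability.
sample_lines = ["HYPO0001#2\\example field\\another field\n", "3 0 12 7\n"]
entries = list(parse_codon_entries(sample_lines))
```

For a real file, the lines would come from something like `open("gbbct.codon")`, and the CODON_LABEL file would be needed to map each position in `counts` to a codon.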

To show the characteristics of codon use across a wide range of species, as well as viruses and organelles, the codon use in each organism was summed. Files named gb***.spsum list the summed codon counts for each organism. An entry for an organism has two lines. The first line consists of the Latin name of the organism and the number of CDSs used in the sum. The second line consists of the summed codon counts for the organism, in the same order as in the gb***.codon files.
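The per-organism summation can be sketched as below. This is an illustration under assumptions, not the procedure used to build the gb***.spsum files: the function name `sum_by_organism`, the input shape (pairs of organism name and count list) and the example values are all hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def sum_by_organism(
    genes: Iterable[Tuple[str, Sequence[int]]]
) -> Dict[str, Tuple[int, List[int]]]:
    """Map each organism name to (number of CDSs, element-wise summed codon counts)."""
    totals: Dict[str, Tuple[int, List[int]]] = {}
    for organism, counts in genes:
        if organism not in totals:
            totals[organism] = (1, list(counts))
            continue
        n_cds, running = totals[organism]
        if len(running) != len(counts):
            raise ValueError("count vectors must have the same length")
        # Add the new gene's counts position by position.
        totals[organism] = (n_cds + 1, [a + b for a, b in zip(running, counts)])
    return totals


# Hypothetical genes with three counts each (real entries have one count per codon).
example_genes = [
    ("Escherichia coli", [1, 2, 3]),
    ("Homo sapiens", [4, 0, 1]),
    ("Escherichia coli", [2, 2, 2]),
]
example_sums = sum_by_organism(example_genes)
```

Each value pairs the number of CDSs with the summed vector, mirroring the two pieces of information that an spsum entry carries.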

The complete form of the database is available from the following URLs:

(i) DDBJ (DNA Data Bank of Japan, National Institute of Genetics, Mishima, Japan) ftp://ftp.nig.ac.jp/pub/db/codon/current/

(ii) DISC (DNA Information and Stock Center, National Institute of Agrobiological Resources, Tsukuba, Japan) ftp://ftp.dna.affrc.go.jp/pub/codon/current/

(iii) EBI (European Bioinformatics Institute, Cambridge, UK) ftp://ftp.ebi.ac.uk/pub/databases/cutg/

A separate file for each organism is searchable by the Latin name of the organism through a web site (http://www.dna.affrc.go.jp/~nakamura/CUTG.html). Comments on the database can be sent to cutg@lab.nig.ac.jp.

ACKNOWLEDGEMENTS

We wish to thank Dr Y.Ugawa at the DNA Information and Stock Center, National Institute of Agrobiological Resources for help in constructing and distributing the database. This work was supported in part by a grant-in-aid for databases from the Ministry of Education, Science, Sports and Culture of Japan. Y.N. is supported by the Kazusa DNA Research Institute Foundation.

REFERENCES

1. Benson,D.A., Boguski,M.S., Lipman,D.J. and Ostell,J. (1997) Nucleic Acids Res. 25, 1-6 [see also this issue (1998) Nucleic Acids Res. 26, 1-7].

2. Nakamura,Y., Gojobori,T. and Ikemura,T. (1997) Nucleic Acids Res. 25, 244-245.


*To whom correspondence should be addressed. Tel: +81 438 52 3935; Fax: +81 438 52 3934; Email: ynakamu@kazusa.or.jp


This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals Comments and feedback: www-admin{at}oup.co.uk
Last modification: 17 Dec 1997
Copyright © Oxford University Press, 1998.



