Nucleic Acids Research, 2003, Vol. 31, No. 18 5338-5348
© 2003 Oxford University Press
Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes
Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520-8114, USA
*To whom correspondence should be addressed. Tel: +1 203 432 6105; Fax: +1 360 838 7861; Email: mark.gerstein{at}yale.edu
Nucleotide substitution, insertion and deletion (indel) events are the major driving forces that have shaped genomes. Using the recently identified human ribosomal protein (RP) pseudogene sequences, we have thoroughly studied DNA mutation patterns in the human genome. We analyzed a total of 1726 processed RP pseudogene sequences, comprising more than 700 000 bases. To distinguish sequence changes that occurred in the functional genes during evolution from those that occurred in pseudogenes after they were fixed in the genome, we used only pseudogene sequences originating from parts of RP genes that are identical in human and mouse. Overall, we found that nucleotide transitions are more common than transversions, by roughly a factor of two. Moreover, the substitution rates among the 12 possible nucleotide pairs are not homogeneous: they are affected by the type of immediately neighboring nucleotides and by the overall local G+C content. Finally, our dataset is large enough to contain many indels, allowing, for the first time, a statistically robust analysis of these events. Overall, we found that deletions are about three times more common than insertions (3740 versus 1291). The frequencies of both types of event follow a characteristic power-law dependence on the size of the indel. However, unexpectedly, the frequency of 3 bp deletions (in contrast to 3 bp insertions) violates this trend, being considerably higher than that of 2 bp deletions. The possible biological implications of such a 3 bp bias are discussed.
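For readers unfamiliar with the terminology, the following minimal Python sketch illustrates how a substitution is classified as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine, or the reverse), and recomputes the deletion-to-insertion ratio from the two counts quoted above. It is an illustration only, not the pipeline used in the study: the function names `classify_substitution` and `count_substitutions` and the toy sequences are hypothetical, and the sketch assumes a pre-computed, equal-length pairwise alignment.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError("expected one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # A change within the same chemical class is a transition.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(gene: str, pseudogene: str) -> dict:
    """Count transitions and transversions in an equal-length alignment.

    Columns containing anything other than A/C/G/T (gaps, N) are skipped.
    """
    if len(gene) != len(pseudogene):
        raise ValueError("aligned sequences must have equal length")
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in zip(gene.upper(), pseudogene.upper()):
        if ref in BASES and alt in BASES and ref != alt:
            counts[classify_substitution(ref, alt)] += 1
    return counts


# Of the 12 ordered nucleotide pairs, 4 are transitions and 8 are transversions.
pair_classes = [classify_substitution(a, b) for a, b in permutations("ACGT", 2)]
n_transition_pairs = pair_classes.count("transition")
n_transversion_pairs = pair_classes.count("transversion")

# Toy alignment (hypothetical sequences, for illustration only).
toy_counts = count_substitutions("ACGTACGT", "GCGTACTT")

# Indel counts quoted in the abstract.
deletions, insertions = 3740, 1291
deletion_to_insertion_ratio = deletions / insertions
```

Because only 4 of the 12 possible ordered pairs are transitions, an observed two-fold excess of transitions over transversions implies a per-pair transition rate roughly four times the per-pair transversion rate.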