Theoretical and Applied Genetics, Volume 112, Issue 1, pp 72–84

Characteristics of the tomato nuclear genome as determined by sequencing undermethylated EcoRI digested fragments

  • Y. Wang
  • R. S. van der Hoeven
  • R. Nielsen
  • L. A. Mueller
  • S. D. Tanksley
Original Paper


A collection of 9,990 single-pass nuclear genomic sequences, corresponding to 5 Mb of tomato DNA, was obtained using a methylation filtration (MF) strategy and reduced to 7,053 unique undermethylated genomic islands (UGIs) distributed as follows: (1) 59% non-coding sequences, (2) 28% coding sequences, (3) 12% transposons, 96% of which are class I retroelements, and (4) 1% organellar sequences integrated into the nuclear genome over the past approximately 100 million years. A more detailed analysis of coding UGIs indicates that the unmethylated portion of tomato genes extends as far as 676 bp upstream and 766 bp downstream of coding regions, with averages of 174 and 171 bp, respectively. Based on the analysis of the UGI copy distribution, the undermethylated portion of the tomato genome is determined to account for the majority of the unmethylated genes in the genome and is estimated to constitute 61±15 Mb of DNA (~5% of the entire genome), which is significantly less than the 220 Mb estimated for the gene-rich euchromatic arms of the tomato genome. This result indicates that, while most genes reside in the euchromatin, a significant portion of euchromatin is methylated in the intergenic spacer regions. Implications of the results for sequencing the genome of tomato and other solanaceous species are discussed.
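The comparison between the undermethylated estimate and the euchromatin estimate can be made concrete with a few lines of arithmetic. The sketch below uses only the figures quoted in the abstract (61±15 Mb and 220 Mb). It treats the ±15 Mb as a simple lower and upper bound, which is an assumption, since the abstract does not say what kind of interval it is. The function and variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract (Mb = megabase pairs).
UNDERMETHYLATED_MB = 61.0      # central estimate of undermethylated DNA
UNDERMETHYLATED_ERR_MB = 15.0  # quoted +/- range (assumed to be a plain bound)
EUCHROMATIN_MB = 220.0         # estimate for the gene-rich euchromatic arms


def share_of_euchromatin(undermethylated_mb: float,
                         euchromatin_mb: float = EUCHROMATIN_MB) -> float:
    """Return the undermethylated estimate as a fraction of the euchromatin estimate."""
    return undermethylated_mb / euchromatin_mb


low = share_of_euchromatin(UNDERMETHYLATED_MB - UNDERMETHYLATED_ERR_MB)
mid = share_of_euchromatin(UNDERMETHYLATED_MB)
high = share_of_euchromatin(UNDERMETHYLATED_MB + UNDERMETHYLATED_ERR_MB)

print(f"{low:.0%} to {high:.0%} (central {mid:.0%})")
# prints "21% to 35% (central 28%)"
```

Under these assumptions the undermethylated fraction corresponds to roughly a quarter to a third of the estimated euchromatin, which is consistent with the abstract's conclusion that a significant portion of euchromatin is methylated.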



Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  • Y. Wang (1)
  • R. S. van der Hoeven (1)
  • R. Nielsen (2)
  • L. A. Mueller (1)
  • S. D. Tanksley (1)

  1. Department of Plant Breeding and Genetics, Department of Plant Biology, Cornell University, Ithaca, USA
  2. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA
