Discovering Patterns of DNA Methylation: Rule Mining with Rough Sets and Decision Trees, and Comethylation Analysis

  • Niu Ben
  • Qiang Yang
  • Jinyan Li
  • Shiu Chi-keung
  • Sankar Pal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4815)


DNA methylation regulates the transcription of genes without changing their coding sequences. It plays a vital role in embryogenesis and tumorigenesis. To gain more insight into how this epigenetic mechanism works in human cells, we apply two popular data mining techniques, Rough Sets and Decision Trees, to uncover the logical rules of DNA methylation. Our results show that the Rough Sets method generates and uses fewer rules to fully separate the methylation dataset, whereas the Decision Trees method relies on more rules but involves fewer decision variables to do the same task. We also find that some gene promoters are highly comethylated, which is evidence that genes interact strongly at the epigenetic level in human cells.
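To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes methylation values have already been discretized to 0/1, and every name and value in it (`PROMOTERS`, `SAMPLES`, `LABELS`, `separates`, `minimal_separating_subsets`, `pearson`) is hypothetical and invented for illustration. The brute-force subset search is only in the spirit of a rough-set reduct (the smallest sets of decision variables that still fully separate the classes), and Pearson correlation is used here as one plausible way to score comethylation between two promoters.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data, NOT taken from the paper: rows are samples, columns
# are discretized promoter methylation states (0 = unmethylated, 1 = methylated).
PROMOTERS = ["geneA", "geneB", "geneC"]
SAMPLES = [
    (1, 1, 0),
    (1, 1, 1),
    (0, 0, 1),
    (0, 0, 0),
]
LABELS = ["stem", "stem", "other", "other"]


def separates(samples, labels, columns):
    """Return True if samples that agree on `columns` always share one label."""
    seen = {}
    for row, label in zip(samples, labels):
        key = tuple(row[c] for c in columns)
        # setdefault stores the first label seen for this key and returns it
        if seen.setdefault(key, label) != label:
            return False
    return True


def minimal_separating_subsets(samples, labels):
    """Return the smallest column subsets that separate the classes.

    Brute force over subsets by increasing size; fine for a toy table,
    exponential in the number of columns in general.
    """
    n = len(samples[0])
    for size in range(1, n + 1):
        found = [cols for cols in combinations(range(n), size)
                 if separates(samples, labels, cols)]
        if found:
            return found
    return []


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for constant vectors; 0 by convention here
    return cov / (sx * sy)


def column(samples, index):
    """Extract one promoter's methylation profile across all samples."""
    return [row[index] for row in samples]
```

On this toy table, `minimal_separating_subsets(SAMPLES, LABELS)` finds that either of the first two promoters alone separates the classes, and `pearson(column(SAMPLES, 0), column(SAMPLES, 1))` shows those two promoters are perfectly correlated, which is the kind of pattern the abstract calls comethylation.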


Embryonic Stem Cell; Human Embryonic Stem Cell; Methylation Profile; Logical Rule; Decision Tree Method
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Niu Ben (1)
  • Qiang Yang (1)
  • Jinyan Li (2)
  • Shiu Chi-keung (3)
  • Sankar Pal (4)
  1. Department of Computer Science and Engineering, Hong Kong University of Science & Technology, Hong Kong, China
  2. Institute for Infocomm Research, Singapore
  3. Department of Computing, Hong Kong Polytechnic University, Hong Kong, China
  4. Indian Statistical Institute, Kolkata, India
