Oral Biology pp 347–364

Exploring Genome-Wide Expression Profiles Using Machine Learning Techniques

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1537)

Abstract

Contemporary high-throughput –omics methods produce high-dimensional data, but the resulting wealth of information is difficult to assess using traditional statistical procedures. Machine learning methods facilitate the detection of additional patterns, beyond the mere identification of lists of features that differ between groups.

Here, we demonstrate the utility of (1) supervised classification algorithms in class validation, and (2) unsupervised clustering in class discovery. Our data come from our previous work, which described the transcriptional profiles of gingival tissue samples obtained from subjects suffering from chronic or aggressive periodontitis. We use these data first to test whether the two diagnostic entities are also characterized by differences at the molecular level, and second to search for a novel, alternative classification of periodontitis based on the tissue transcriptomes.
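To make the idea of class validation concrete, the following is a minimal sketch of how a supervised classifier can be scored by cross-validation. It is not the authors' pipeline, and it is not the CMA workflow mentioned in the acknowledgments. The nearest-centroid classifier, the leave-one-out scheme, the function names, and the toy data are all assumptions chosen for illustration. High held-out accuracy would suggest that the predefined classes differ in the measured features; accuracy near chance would suggest they do not.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose training centroid is closest."""
    centroids = {
        label: centroid([x for x, y in zip(train_x, train_y) if y == label])
        for label in sorted(set(train_y))
    }
    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))


def loocv_accuracy(xs, ys):
    """Leave-one-out cross-validation.

    Each sample is predicted by a model trained on all other samples,
    so no sample is scored by a model that has already seen it.
    """
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


# Hypothetical toy data (NOT from the study): 6 samples x 3 features.
toy_x = [
    [1.0, 2.0, 0.5], [1.2, 1.8, 0.4], [0.9, 2.1, 0.6],
    [3.0, 0.5, 2.0], [3.2, 0.4, 2.1], [2.9, 0.6, 1.9],
]
toy_y = ["A", "A", "A", "B", "B", "B"]  # made-up class labels

accuracy = loocv_accuracy(toy_x, toy_y)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

With real expression data, any feature selection or parameter tuning would also have to be repeated inside each training fold; otherwise the estimated accuracy is optimistically biased.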

Using machine learning technology, we provide evidence for diagnostic imprecision in the currently accepted classification of periodontitis, and demonstrate that a novel, alternative classification based on differences in gingival tissue transcriptomes is feasible. The outlined procedures allow for the unbiased interrogation of high-dimensional datasets for characteristic underlying classes, and are applicable to a broad range of –omics data.
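As a rough illustration of class discovery, the sketch below clusters samples without using any labels and then measures how well the resulting partition agrees with a set of predefined labels. It is an assumption-laden toy example, not the procedure of the protocol: k-means with a deterministic farthest-first start stands in for whatever clustering method is actually used, the adjusted Rand index is one common chance-corrected agreement measure, and the data and the labels `dx1`/`dx2` are invented.

```python
from math import comb
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Plain k-means with a deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


def adjusted_rand_index(a, b):
    """Chance-corrected agreement between two partitions of the same samples."""
    def pairs(counts):
        return sum(comb(n, 2) for n in counts)

    joint, rows, cols = {}, {}, {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        rows[x] = rows.get(x, 0) + 1
        cols[y] = cols.get(y, 0) + 1
    index = pairs(joint.values())
    expected = pairs(rows.values()) * pairs(cols.values()) / comb(len(a), 2)
    maximum = (pairs(rows.values()) + pairs(cols.values())) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (index - expected) / (maximum - expected)


# Hypothetical toy data (NOT from the study): 6 samples x 3 features.
toy_x = [
    [1.0, 2.0, 0.5], [1.2, 1.8, 0.4], [0.9, 2.1, 0.6],
    [3.0, 0.5, 2.0], [3.2, 0.4, 2.1], [2.9, 0.6, 1.9],
]
# Made-up reference labels that deliberately cut across the data structure.
reference = ["dx1", "dx1", "dx2", "dx2", "dx2", "dx1"]

clusters = kmeans(toy_x, k=2)
ari = adjusted_rand_index(clusters, reference)
print("clusters:", clusters)
print(f"adjusted Rand index vs. reference labels: {ari:.2f}")
```

An index near 1 means the data-driven clusters reproduce the reference labels; an index near 0 or below means the labels agree with the clusters no better than chance, which is the kind of mismatch that would motivate an alternative classification.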

Acknowledgments

This work was supported by grants from the German Society for Periodontology (DG PARO) and the German Society for Oral and Maxillo-Facial Sciences (DGZMK) to M.K., and by grants from NIH/NIDCR (DE015649 and DE024735) and by an unrestricted gift from Colgate-Palmolive Inc. to P.N.P. The authors thank Prof. Anne-Laure Boulesteix (Munich, Germany) and Prof. Bettina Grün (Linz, Austria) for their support with the CMA and flexmix packages, respectively.

Author information

Corresponding author

Correspondence to Moritz Kebschull.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Kebschull, M., Papapanou, P.N. (2017). Exploring Genome-Wide Expression Profiles Using Machine Learning Techniques. In: Seymour, G., Cullinan, M., Heng, N. (eds) Oral Biology. Methods in Molecular Biology, vol 1537. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6685-1_20

  • DOI: https://doi.org/10.1007/978-1-4939-6685-1_20

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6683-7

  • Online ISBN: 978-1-4939-6685-1

  • eBook Packages: Springer Protocols
