A Scenario Implementation in R for SubtypeDiscovery Examplified on Chemoinformatics Data

  • Fabrice Colas
  • Ingrid Meulenbelt
  • Jeanine J. Houwing-Duistermaat
  • Margreet Kloppenburg
  • Iain Watt
  • Stephanie M. van Rooden
  • Martine Visser
  • Johan Marinus
  • Edward O. Cannon
  • Andreas Bender
  • Jacobus J. van Hilten
  • P. Eline Slagboom
  • Joost N. Kok
Part of the Communications in Computer and Information Science book series (CCIS, volume 17)

Abstract

We developed a methodology that both facilitates and enhances the search for homogeneous subtypes in data. We applied this methodology to medical research on Osteoarthritis and Parkinson’s Disease and to chemoinformatics research on the chemical structure of molecule profiles. We release this methodology as the RSubtypeDiscovery package to enable reproducibility of our analyses. In this paper, we present the package implementation and illustrate its output on molecular data from chemoinformatics. Our methodology includes different techniques to process the data, a computational approach that repeats data modelling to select the number of subtypes or the type of model, and additional methods to characterize, compare and evaluate the top-ranking models. Thus, the methodology does not only cluster data; it also produces the complete set of results needed to conduct a subtype discovery analysis.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Fabrice Colas (1)
  • Ingrid Meulenbelt (2)
  • Jeanine J. Houwing-Duistermaat (3)
  • Margreet Kloppenburg (4)
  • Iain Watt (5)
  • Stephanie M. van Rooden (6)
  • Martine Visser (6)
  • Johan Marinus (6)
  • Edward O. Cannon (7)
  • Andreas Bender (8)
  • Jacobus J. van Hilten (6)
  • P. Eline Slagboom (2)
  • Joost N. Kok (1, 2)

  1. LIACS, Leiden University, The Netherlands
  2. MOLEPI, LUMC, The Netherlands
  3. MEDSTATS, LUMC, The Netherlands
  4. Rheumatology dept., LUMC, The Netherlands
  5. Radiology dept., LUMC, The Netherlands
  6. Neurology dept., LUMC, The Netherlands
  7. UCMSI, University of Cambridge, United Kingdom
  8. LACDR, Leiden University, The Netherlands
