Explaining Mixture Models through Semantic Pattern Mining and Banded Matrix Visualization

  • Prem Raj Adhikari
  • Anže Vavpetič
  • Jan Kralj
  • Nada Lavrač
  • Jaakko Hollmén
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8777)

Abstract

Semi-automated data analysis is possible for the end user if data analysis processes are supported by easily accessible tools and methodologies for pattern/model construction, explanation, and exploration. The proposed three-part methodology for multiresolution 0-1 data analysis consists of data clustering with mixture models, extraction of rules from clusters, and visualization of the data, clusters, and rules using banded matrices. The outputs of the three parts (clusters, rules describing the clusters, and the banded structure of the data matrix) are finally merged into a unified visual banded matrix display. Multiresolution data are incorporated through a supporting ontology that describes the relationships between the different resolutions; this ontology serves as background knowledge in the semantic pattern mining process of descriptive rule induction. The experimental use case, complex DNA copy number amplification data studied in previous research, highlights the usefulness of the proposed methodology: we provide new insights into these data in terms of induced semantic patterns and cluster/pattern visualization.
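As a minimal sketch of the first step (clustering 0-1 data with a mixture model), the Python code below fits a mixture of multivariate Bernoulli distributions with the EM algorithm, a common choice for binary data. The abstract does not specify the component distributions, the initialisation, or the stopping rule, so the Bernoulli components, the random initialisation of the parameters in [0.25, 0.75), the fixed iteration count, and the function name `fit_bernoulli_mixture` are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def fit_bernoulli_mixture(X, n_components, n_iter=100, seed=0, eps=1e-6):
    """Fit a mixture of multivariate Bernoulli distributions to a 0-1 matrix X with EM.

    Hypothetical helper for illustration. Returns mixing proportions pi (J,),
    Bernoulli parameters theta (J, d), and posterior responsibilities resp (n, J).
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Assumed initialisation: equal mixing proportions, random theta in [0.25, 0.75).
    pi = np.full(n_components, 1.0 / n_components)
    theta = rng.uniform(0.25, 0.75, size=(n_components, d))
    for _ in range(n_iter):
        # E-step: log p(x_i | j) = sum_k [x_ik log theta_jk + (1 - x_ik) log(1 - theta_jk)]
        log_lik = X @ np.log(theta).T + (1.0 - X) @ np.log(1.0 - theta).T  # shape (n, J)
        log_post = log_lik + np.log(np.maximum(pi, 1e-300))
        log_post -= log_post.max(axis=1, keepdims=True)  # avoid overflow in exp
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted mixing proportions and item frequencies
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = (resp.T @ X) / np.maximum(nk, 1e-12)[:, None]
        # Keep theta away from 0 and 1 so the logarithms stay finite
        theta = np.clip(theta, eps, 1.0 - eps)
    return pi, theta, resp


if __name__ == "__main__":
    # Synthetic example: two groups with opposite 0-1 profiles over 10 items
    rng = np.random.default_rng(1)
    profile = np.array([0.9] * 5 + [0.1] * 5)
    X = np.vstack([rng.random((100, 10)) < profile,
                   rng.random((100, 10)) < profile[::-1]]).astype(float)
    pi, theta, resp = fit_bernoulli_mixture(X, n_components=2)
    labels = resp.argmax(axis=1)  # hard cluster assignment per row
    print(np.round(pi, 2))
    print(np.round(theta, 2))
```

Each row is assigned to the component with the highest responsibility; clusters obtained this way are what the subsequent rule-extraction and banded-matrix steps would operate on. In practice, EM is usually restarted from several initialisations and the number of components is chosen with a model-selection criterion; both are omitted here for brevity.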

Keywords

Mixture Models, Semantic Pattern Mining, Pattern Visualization

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Prem Raj Adhikari (1)
  • Anže Vavpetič (2)
  • Jan Kralj (2)
  • Nada Lavrač (2)
  • Jaakko Hollmén (1)

  1. Helsinki Institute for Information Technology HIIT and Department of Information and Computer Science, Aalto University School of Science, Aalto, Finland
  2. Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Ljubljana, Slovenia