Constrained Predictive Clustering

Chapter

Abstract

In this chapter, we extend predictive clustering by introducing constraints on the clusters and the predictive models. A domain expert is usually interested in more than the most compact clusters or the most accurate model; other factors, such as model size and prediction cost, may also be important. We will see how such factors can be controlled by means of constraints. In predictive clustering trees, constraints can be imposed from both the clustering and the prediction point of view. We present an overview of various constraint types and look into algorithms for enforcing them.

Keywords

Cluster Tree; Inductive Logic Programming; Level Constraint; Beam Search; Decision Tree Induction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
  2. Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
