Partition-Based Clustering Using Constraint Optimization

  • Valerio Grossi
  • Tias Guns
  • Anna Monreale
  • Mirco Nanni
  • Siegfried Nijssen
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10101)

Abstract

Partition-based clustering is the task of partitioning a dataset into a number of groups of examples, such that the examples in each group are similar to each other. Many criteria for what constitutes a good clustering have been identified in the literature; furthermore, the use of additional constraints to find more useful clusterings has been proposed. In this chapter, we show that most of these clustering tasks can be formalized using optimization criteria and constraints. We demonstrate how a range of clustering tasks can be modelled in generic constraint programming languages using these constraints and optimization criteria. Using the constraint-based modelling approach, we also relate the DBSCAN method for density-based clustering to the label propagation technique for community discovery.
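To make the idea of "optimization criterion plus constraints" concrete, the following is a minimal sketch and not the chapter's own model. It assumes one-dimensional points, takes the maximum cluster diameter as the optimization criterion, and uses must-link and cannot-link pairs as illustrative constraints. The function names `max_diameter` and `constrained_clustering` are hypothetical, and the exhaustive search stands in for a real constraint programming solver, so it is only practical for tiny inputs.

```python
from itertools import product


def max_diameter(points, labels):
    """Largest distance between two points that share a cluster label."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                worst = max(worst, abs(points[i] - points[j]))
    return worst


def constrained_clustering(points, k, must_link=(), cannot_link=()):
    """Exhaustively search all assignments of points to k clusters.

    Constraints (assumed for illustration):
      - every cluster is non-empty,
      - each must_link pair (a, b) shares a cluster,
      - each cannot_link pair (a, b) is split across clusters.
    Optimization criterion: minimize the maximum cluster diameter.
    """
    best_labels, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) != k:
            continue  # some cluster would be empty
        if any(labels[a] != labels[b] for a, b in must_link):
            continue  # a must-link constraint is violated
        if any(labels[a] == labels[b] for a, b in cannot_link):
            continue  # a cannot-link constraint is violated
        cost = max_diameter(points, labels)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


points = [0.0, 1.0, 2.0, 10.0, 11.0]
free_labels, free_cost = constrained_clustering(points, k=2)
split_labels, split_cost = constrained_clustering(
    points, k=2, cannot_link=[(0, 1)]
)
```

In this toy run, the unconstrained search groups the three small values together, while the cannot-link pair forces a worse clustering under the same criterion. Swapping `max_diameter` for another objective, or adding further constraint checks, changes the clustering task without changing the search.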

Keywords

Constraint Programming · Global Constraint · Label Propagation · Cluster Setting · Core Point
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Valerio Grossi
    • 1
  • Tias Guns
    • 3
  • Anna Monreale
    • 1
  • Mirco Nanni
    • 2
  • Siegfried Nijssen
    • 3
    • 4
  1. University of Pisa, Pisa, Italy
  2. ISTI - CNR, Pisa, Italy
  3. DTAI, KU Leuven, Leuven, Belgium
  4. LIACS, Universiteit Leiden, Leiden, The Netherlands