Clustering Formulation Using Constraint Optimization

  • Conference paper
Software Engineering and Formal Methods (SEFM 2015)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9509)

Abstract

Clustering a set of data is a textbook machine learning problem but, at heart, also a typical optimization problem. Given an objective function, such as minimizing the intra-cluster distances or maximizing the inter-cluster distances, the task is to find an assignment of data points to clusters that achieves this objective. In this paper, we present a constraint programming model for centroid-based clustering and one for density-based clustering. As a key contribution, we show how the expressivity of the constraint programming formulation makes the standard problem easy to extend with additional constraints that generate interesting variants of the problem. We demonstrate this in two ways: first, we show that the constraint programming formulation of density-based clustering makes it very similar to the label propagation problem; second, we propose a variant of the standard label propagation approach.
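To make the optimization view concrete, here is a minimal Python sketch of clustering as "find the assignment that minimizes an objective, subject to constraints". It is not the constraint programming model of the paper: an exhaustive search stands in for a solver, and the function names (`wcss`, `satisfies`, `best_clustering`), the must-link and cannot-link pair constraints, and the toy points are all assumptions chosen for illustration.

```python
from itertools import product


def wcss(points, labels, k):
    """Sum of squared distances from each point to its cluster's centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return total


def satisfies(labels, must_link, cannot_link):
    """Check the (hypothetical) pairwise side constraints on an assignment."""
    same = all(labels[i] == labels[j] for i, j in must_link)
    different = all(labels[i] != labels[j] for i, j in cannot_link)
    return same and different


def best_clustering(points, k, must_link=(), cannot_link=()):
    """Enumerate every assignment and keep the feasible one with lowest cost.

    Exhaustive search is only viable for tiny inputs; it stands in for
    the search a constraint solver would perform.
    """
    best_labels, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) != k:  # require every cluster to be non-empty
            continue
        if not satisfies(labels, must_link, cannot_link):
            continue
        cost = wcss(points, labels, k)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (5, 5), (5, 6)]
    print(best_clustering(points, k=2))
    print(best_clustering(points, k=2, cannot_link=[(0, 1)]))
```

Without side constraints the search groups the two nearby pairs together. Adding a single cannot-link pair changes only the feasibility check, not the objective or the search, which is the kind of extensibility the abstract attributes to a declarative formulation.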

Acknowledgements

This work was supported by the European Commission under the project Inductive Constraint Programming (ICON) contract number FP7-284715.

Author information

Corresponding author

Correspondence to Valerio Grossi.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grossi, V., Monreale, A., Nanni, M., Pedreschi, D., Turini, F. (2015). Clustering Formulation Using Constraint Optimization. In: Bianculli, D., Calinescu, R., Rumpe, B. (eds) Software Engineering and Formal Methods. SEFM 2015. Lecture Notes in Computer Science, vol 9509. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49224-6_9

  • DOI: https://doi.org/10.1007/978-3-662-49224-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49223-9

  • Online ISBN: 978-3-662-49224-6

  • eBook Packages: Computer Science, Computer Science (R0)
