Extended Box Clustering for Classification Problems

Journal of Classification

Abstract

In this work we present a technique, based on elementary convex sets called box hulls, for grouping finite point sets into non-convex objects called box clusters. The proposed clustering approach is based on homogeneity conditions rather than on a distance measure, and it is situated within the theoretical framework of supervised clustering. It extends (convex) box clustering, originally developed in the context of the logical analysis of data, to non-convex geometry. We briefly discuss the topological properties of these clusters and introduce a family of hypergraphs, called incompatibility hypergraphs; we use these hypergraphs mainly for their role in clustering algorithms, although they also have strong theoretical properties, as shown in other works in the literature. We also discuss supervised classification problems and use generalized Voronoi diagrams to define a classifier based on box clusters. Finally, computational experiments on real-world data show the efficacy of our methods in terms of both clustering and accuracy.
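The abstract gives no formal definitions, so the following Python sketch only illustrates the ideas it names and should not be read as the paper's algorithm. It makes three assumptions: a box hull is the smallest axis-parallel box containing a point set, a box is homogeneous when it contains no point of another class, and a box cluster can be modeled as a union of boxes whose nearest-cluster rule (here with Euclidean distance) induces a generalized Voronoi partition. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def box_hull(points: Sequence[Point]) -> Box:
    """Smallest axis-parallel box containing all the given points."""
    lower = tuple(min(coords) for coords in zip(*points))
    upper = tuple(max(coords) for coords in zip(*points))
    return lower, upper


def contains(box: Box, p: Point) -> bool:
    """True if p lies inside the closed box."""
    lower, upper = box
    return all(lo <= x <= hi for lo, x, hi in zip(lower, p, upper))


def is_homogeneous(box: Box, other_class_points: Sequence[Point]) -> bool:
    """Assumed homogeneity condition: no point of another class is in the box."""
    return not any(contains(box, q) for q in other_class_points)


def distance_to_box(box: Box, p: Point) -> float:
    """Euclidean distance from p to the box (0.0 if p is inside it)."""
    lower, upper = box
    return sum(
        max(lo - x, 0.0, x - hi) ** 2 for lo, x, hi in zip(lower, p, upper)
    ) ** 0.5


def classify(clusters: Dict[str, List[Box]], p: Point) -> str:
    """Label of the nearest cluster, each cluster being a union of boxes."""
    return min(
        clusters,
        key=lambda label: min(distance_to_box(b, p) for b in clusters[label]),
    )


positives = [(0, 0), (1, 2), (2, 1)]
hull = box_hull(positives)                        # ((0, 0), (2, 2))
print(is_homogeneous(hull, [(3, 3), (4, 1)]))     # True: no negative inside
print(is_homogeneous(hull, [(1, 1)]))             # False: a negative inside
clusters = {"pos": [hull], "neg": [((3, 3), (4, 4))]}
print(classify(clusters, (2.5, 2.0)))             # pos
```

Under these assumptions the clustering step itself uses only containment tests, and a distance enters only when an unseen point is classified.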

Author information

Correspondence to Vincenzo Spinelli.

About this article

Cite this article

Spinelli, V. Extended Box Clustering for Classification Problems. J Classif 35, 100–123 (2018). https://doi.org/10.1007/s00357-018-9253-2

  • DOI: https://doi.org/10.1007/s00357-018-9253-2
