Abstract
In this work we present a technique, based on elementary convex sets called box hulls, for effectively grouping finite point sets into non-convex objects called box clusters. The proposed clustering approach is based on homogeneity conditions rather than on a distance measure, and it is situated within the theoretical framework of supervised clustering. This approach extends the so-called (convex) box clustering, originally developed in the context of the logical analysis of data, to non-convex geometry. We briefly discuss the topological properties of these clusters and introduce a family of hypergraphs, called incompatibility hypergraphs; our main interest in these hypergraphs is their role in clustering algorithms, although they also have strong theoretical properties, as shown in other works in the literature. We also discuss supervised classification problems and consider generalized Voronoi diagrams to define a classifier based on box clusters. Finally, computational experiments on real-world data show the efficacy of our methods in terms of both clustering and accuracy.
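To make the basic building block concrete, the following minimal Python sketch computes a box hull, taken here to be the smallest axis-parallel box containing a finite point set, and checks a simple homogeneity condition. It illustrates only the convex case, under the assumption that a group of same-class points is homogeneous when its box hull contains no point of another class; the non-convex construction of the article is not reproduced. The function names `box_hull`, `contains`, and `is_homogeneous` are hypothetical and do not come from the article.

```python
from typing import Sequence, Tuple

Point = Sequence[float]
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]


def box_hull(points: Sequence[Point]) -> Box:
    """Smallest axis-parallel box containing all points (lower and upper corners)."""
    lower = tuple(min(coords) for coords in zip(*points))
    upper = tuple(max(coords) for coords in zip(*points))
    return lower, upper


def contains(box: Box, p: Point) -> bool:
    """True if p lies inside the closed box."""
    lower, upper = box
    return all(lo <= x <= hi for lo, x, hi in zip(lower, p, upper))


def is_homogeneous(group: Sequence[Point], others: Sequence[Point]) -> bool:
    """Assumed condition: the box hull of `group` contains no point of `others`."""
    box = box_hull(group)
    return not any(contains(box, q) for q in others)


# Illustrative data (not taken from the article).
positives = [(0.0, 0.0), (2.0, 1.0)]
far_negative = [(1.0, 3.0)]      # lies outside the box hull of the positives
inner_negative = [(1.0, 0.5)]    # lies inside the box hull of the positives

hull = box_hull(positives)
ok_far = is_homogeneous(positives, far_negative)
ok_inner = is_homogeneous(positives, inner_negative)
```

Note that the condition involves only coordinate-wise comparisons and no distance measure, which is in the spirit of the homogeneity-based grouping described above.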
Cite this article
Spinelli, V. Extended Box Clustering for Classification Problems. J Classif 35, 100–123 (2018). https://doi.org/10.1007/s00357-018-9253-2