Abstract
Clustering is a powerful machine learning technique that groups “similar” data points based on their characteristics. Many clustering algorithms work by approximately minimizing an objective function, namely the sum of within-cluster distances between points. The straightforward approach examines every possible assignment of points to clusters. This approach guarantees a global minimum; however, the number of possible assignments grows exponentially with the number of data points (K^N labelings of N points into K clusters) and becomes computationally intractable even for small datasets. To circumvent this issue, minima of the cost function are found using popular local-search heuristics such as k-means and hierarchical clustering. Because of their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal cluster assignments. Global search techniques such as simulated annealing, tabu search, and genetic algorithms may offer better-quality results but can be too time-consuming. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms, which we implement on commercially available quantum annealing hardware as well as on a purely classical solver, “qbsolv.” The first algorithm assigns N data points to K clusters; the second performs binary clustering in a hierarchical manner. We present benchmarks against the well-known k-means algorithm and discuss the advantages and disadvantages of the proposed techniques.
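The objective and its mapping to binary optimization can be made concrete with a small sketch. It assumes Euclidean pairwise distances, a one-hot encoding with one bit per (point, cluster) pair, and a hand-picked constraint penalty. It computes the within-cluster cost, finds the exact minimum by enumerating all K^N assignments, builds a QUBO, and then minimizes that QUBO by exhaustive enumeration as a stand-in for quantum annealing hardware or qbsolv. The encoding and penalty choice are illustrative assumptions, not necessarily the formulation used in this article, and all function names are hypothetical.

```python
import itertools
import math


def pairwise_distances(points):
    """Euclidean distance d[i][j] between every pair of points (assumed metric)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def clustering_cost(labels, d):
    """Objective: sum of pairwise distances between points sharing a cluster."""
    n = len(labels)
    return sum(d[i][j] for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])


def brute_force(d, k):
    """Exhaustive search over all K**N label vectors: a guaranteed global minimum."""
    best = min(itertools.product(range(k), repeat=len(d)),
               key=lambda labels: clustering_cost(labels, d))
    return best, clustering_cost(best, d)


def build_qubo(d, k, penalty):
    """Hypothetical one-hot QUBO: bit i*k + c is 1 iff point i is in cluster c.

    E(x) = sum_c sum_{i<j} d_ij x_ic x_jc + penalty * sum_i (1 - sum_c x_ic)^2,
    expanded with x^2 = x. Returns (q, offset); q maps (u, v) with u <= v to a weight.
    """
    n = len(d)
    q = {}

    def bit(i, c):
        return i * k + c

    def add(u, v, w):
        key = (min(u, v), max(u, v))
        q[key] = q.get(key, 0.0) + w

    for c in range(k):
        for i in range(n):
            for j in range(i + 1, n):
                add(bit(i, c), bit(j, c), d[i][j])  # points i and j share cluster c
    for i in range(n):
        for c in range(k):
            add(bit(i, c), bit(i, c), -penalty)  # linear part of the one-hot penalty
            for c2 in range(c + 1, k):
                add(bit(i, c), bit(i, c2), 2 * penalty)  # point i placed in two clusters
    return q, penalty * n  # constant term of the expanded penalty


def qubo_energy(x, q, offset):
    """Evaluate the QUBO on a 0/1 tuple; diagonal keys act as linear terms."""
    return offset + sum(w * x[u] * x[v] for (u, v), w in q.items())


def solve_qubo_exhaustively(q, offset, num_bits):
    """Stand-in for an annealer or qbsolv: try all 2**num_bits bit strings."""
    best = min(itertools.product((0, 1), repeat=num_bits),
               key=lambda x: qubo_energy(x, q, offset))
    return best, qubo_energy(best, q, offset)


def decode(x, n, k):
    """Turn one-hot bits back into labels; None if any point violates one-hot."""
    labels = []
    for i in range(n):
        on = [c for c in range(k) if x[i * k + c] == 1]
        if len(on) != 1:
            return None
        labels.append(on[0])
    return tuple(labels)


points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
k = 2
d = pairwise_distances(points)
labels, cost = brute_force(d, k)

# Assumed penalty: exceeds any point's total distance to the others,
# so breaking the one-hot constraint can never lower the energy.
penalty = 1.0 + sum(map(sum, d))
q, offset = build_qubo(d, k, penalty)
x, energy = solve_qubo_exhaustively(q, offset, len(points) * k)

print("brute force:", labels, round(cost, 4))
print("QUBO:", decode(x, len(points), k), round(energy, 4))
```

On this five-point example both searches reach the same minimum cost. However, the QUBO has 2^(N·K) bit strings to search versus K^N label vectors, and both counts grow exponentially with N. That growth is why heuristics such as k-means, or annealing-based solvers, are used for realistic dataset sizes.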
Acknowledgements
We acknowledge the support of the Universities Space Research Association, Quantum AI Lab Research Opportunity Program, Cycle 2.
Cite this article
Kumar, V., Bass, G., Tomlin, C. et al. Quantum annealing for combinatorial clustering. Quantum Inf Process 17, 39 (2018). https://doi.org/10.1007/s11128-017-1809-2
DOI: https://doi.org/10.1007/s11128-017-1809-2