Applying Compression to Hierarchical Clustering

  • Gilad Baruch
  • Shmuel Tomi Klein
  • Dana Shapira
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11223)


Hierarchical clustering is widely used in Machine Learning and Data Mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We suggest applying data compression to hierarchical clustering by using the XOR operation twice: it already defines the Hamming distance used in the clustering process, and we extend it to transform the vector in a node into a more compressible form, as a function of the vector in its parent node. Compression is then achieved by run-length encoding, optionally followed by Huffman coding, and we show how the compressed file may be processed directly, without decompression.
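The idea of XORing a node's vector with its parent's and then run-length encoding the result can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 16-bit example vectors, and the run-length convention (alternating runs starting with a possibly empty run of zeros) are assumptions made for illustration, and the optional Huffman step is omitted.

```python
from typing import List

def hamming(a: int, b: int) -> int:
    """Hamming distance: number of 1-bits in the XOR of the two vectors."""
    return bin(a ^ b).count("1")

def run_lengths(bits: str) -> List[int]:
    """Lengths of alternating runs, starting with a (possibly empty) run of '0'.

    Hypothetical convention chosen for this sketch, not taken from the paper.
    """
    runs = []
    current, count = "0", 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs

def decode_runs(runs: List[int]) -> str:
    """Inverse of run_lengths: rebuild the bit string from alternating runs."""
    out = []
    bit = "0"
    for r in runs:
        out.append(bit * r)
        bit = "1" if bit == "0" else "0"
    return "".join(out)

WIDTH = 16                       # assumed vector length for the example
parent = 0b1011001110001111      # vector stored in the parent node
child = 0b1011001010001111       # similar vector stored in a child node

# A child close to its parent in Hamming distance XORs to a sparse vector.
delta = child ^ parent
delta_bits = format(delta, f"0{WIDTH}b")
runs = run_lengths(delta_bits)

# Decoding: undo the run-length step, then XOR with the parent again.
restored = int(decode_runs(runs), 2) ^ parent
```

In this example the two vectors differ in a single bit, so `delta_bits` consists almost entirely of zeros and `restored` equals `child`. Under this convention, the odd-indexed entries of `runs` are the lengths of the 1-runs, so `sum(runs[1::2])` gives the Hamming distance between child and parent without expanding the runs.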



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Gilad Baruch (1)
  • Shmuel Tomi Klein (1)
  • Dana Shapira (2)
  1. Department of Computer Science, Bar Ilan University, Ramat Gan, Israel
  2. Department of Computer Science, Ariel University, Ariel, Israel
