Image Analysis in a Parameter-Free Setting

  • Yu Zhu
  • Thomas Zeugmann
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 363)


The paper proposes a new method to approximate the normalized information distance using a compressor that is particularly suited to image data, namely a video compressor. The method is used to compute the distance matrix of all images in each data set considered, and the hierarchical clustering method provided by R is then applied to this distance matrix. Two different data sets are used to demonstrate the usefulness of the method. The results are promising and indicate that a good clustering of the image data can be obtained.
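The pipeline described above (compression-based distance, distance matrix, hierarchical clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), uses zlib as a stand-in for the video compressor, and replaces R's hierarchical clustering with a naive average-linkage routine. All function names are hypothetical.

```python
import zlib
from typing import List


def compressed_size(data: bytes) -> int:
    """Byte length of the zlib-compressed input (stand-in for a video compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: List[bytes]) -> List[List[float]]:
    """Symmetric matrix of pairwise NCD values; the diagonal is set to 0 by convention."""
    n = len(items)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = ncd(items[i], items[j])
    return d


def average_linkage(d: List[List[float]], k: int) -> List[List[int]]:
    """Naive agglomerative clustering with average linkage, stopping at k clusters."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(d[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

In use, each image would first be serialized to bytes (for example, its raw pixel data), `distance_matrix` would be applied to the whole collection, and the resulting matrix would be passed to a clustering routine. A general-purpose compressor such as zlib has a limited window and ignores two-dimensional structure, which is one reason a compressor designed for visual data may be the better choice for images.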


Keywords: Distance matrix, Video compressor, Kolmogorov complexity, Short program, Average linkage cluster
These keywords were added by machine and not by the authors.



We would like to thank the program committee and the anonymous referees for their valuable comments.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Division of Computer Science, Hokkaido University, Sapporo, Japan
