Multi-modal Similarity Retrieval with a Shared Distributed Data Store

Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 139)


We propose a generic system architecture for large-scale similarity search in various types of digital data. The architecture combines contemporary, highly scalable distributed data stores with recent efficient similarity indexes and with other types of search indexes. The system is designed to provide several types of queries: distance-based similarity queries, term-based queries, attribute queries, and advanced queries that combine several search aspects (modalities). The first part of this work is devoted to the generic architecture and to a description of PPP-Codes, a similarity index that is suitable for our system. In the second part, we describe a specific instance of this architecture that manages a collection of 106 million images and provides content-based visual search, keyword search, attribute-based access, and their combinations.
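As a rough illustration of how the query types named above can share one data store, the following minimal sketch keeps all records in a single ID-keyed store and lets separate indexes return only object IDs. Everything in it is an assumption made for illustration: the in-memory `store` dictionary stands in for a distributed data store, the brute-force `knn` function stands in for a real similarity index (it is not PPP-Codes), and the record fields `vec`, `tags`, and `license` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for the shared data store: object ID -> full record.
store = {
    1: {"vec": (0.0, 0.0), "tags": {"beach", "sea"}, "license": "cc"},
    2: {"vec": (1.0, 0.0), "tags": {"beach"}, "license": "all-rights"},
    3: {"vec": (0.0, 2.0), "tags": {"forest"}, "license": "cc"},
    4: {"vec": (3.0, 4.0), "tags": {"sea"}, "license": "cc"},
}

# Term index: keyword -> set of object IDs (records stay in the store).
keyword_index = defaultdict(set)
for oid, rec in store.items():
    for tag in rec["tags"]:
        keyword_index[tag].add(oid)


def knn(query_vec, k, candidates=None):
    """Distance-based query: the k IDs closest to query_vec.

    Brute-force Euclidean ranking, used here only as a placeholder
    for an approximate similarity index.
    """
    ids = store.keys() if candidates is None else candidates
    ranked = sorted(
        ids, key=lambda oid: (math.dist(query_vec, store[oid]["vec"]), oid)
    )
    return ranked[:k]


def combined_query(query_vec, keyword, attr_filter, k):
    """Combined query: keyword match, then attribute filter, then distance ranking."""
    candidates = keyword_index.get(keyword, set())
    candidates = [oid for oid in candidates if attr_filter(store[oid])]
    return knn(query_vec, k, candidates)


def is_cc(record):
    """Attribute predicate on the hypothetical 'license' field."""
    return record["license"] == "cc"
```

For example, `combined_query((0.0, 0.0), "sea", is_cc, 2)` restricts the candidates to objects tagged "sea" with a "cc" license and then ranks them by distance to the query vector. Keeping the records in one store and passing only IDs between indexes is one way to let several search modalities operate on the same data without duplicating it.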


Similarity search, Multi-modal search, Big Data, Scalability



This work was supported by Czech Research Foundation project P103/12/G084.



Copyright information

© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2015

Authors and Affiliations

  1. Masaryk University, Brno, Czech Republic
