Mobile Networks and Applications, Volume 20, Issue 4, pp 521–532

Multi-modal Similarity Retrieval with Distributed Key-value Store

Abstract

We propose a system architecture for large-scale similarity search in various types of digital data. The architecture combines contemporary, highly scalable distributed data stores with recent efficient similarity indexes and with other types of search indexes. The system supports distance-based similarity queries, standard term and attribute queries, and advanced queries that combine several search aspects (modalities). The first part of this work describes the generic architecture and the PPP-Codes similarity index, which is suitable for our system. In the second part, we describe two specific instances of this architecture that manage two large collections of digital images and provide content-based visual search, keyword search, attribute-based access, and their combinations. The first collection is the CoPhIR benchmark, with 106 million images accessed by MPEG-7 visual descriptors; the second contains 20 million images with complex features obtained from a deep convolutional neural network.
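
To make the combination of a similarity index with a key-value store more concrete, the following is a minimal sketch, not the system described in the article. It assumes that the index holds only object IDs and returns a candidate set, that the full objects live in the store and are fetched by ID, and that candidates are then re-ranked by true distance and optionally filtered by an attribute predicate. All names (`ToyStore`, `ToyPivotIndex`, `knn_search`) are hypothetical, and the simple pivot-based lower bound is a stand-in chosen for brevity; it is not PPP-Codes.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Vector, b: Vector) -> float:
    """Metric distance; any other metric could be substituted."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyStore:
    """In-memory stand-in for a distributed key-value store (ID -> record)."""

    def __init__(self) -> None:
        self._data: Dict[int, dict] = {}

    def put(self, obj_id: int, record: dict) -> None:
        self._data[obj_id] = record

    def multi_get(self, ids: List[int]) -> Dict[int, dict]:
        return {i: self._data[i] for i in ids if i in self._data}


class ToyPivotIndex:
    """Hypothetical similarity index that keeps only IDs and pivot distances.

    Objects are ranked by max |d(o, p) - d(q, p)| over all pivots p, which is
    a lower bound on d(q, o) by the triangle inequality.
    """

    def __init__(self, pivots: List[Vector]) -> None:
        self.pivots = pivots
        self._profiles: Dict[int, Tuple[float, ...]] = {}

    def insert(self, obj_id: int, vec: Vector) -> None:
        self._profiles[obj_id] = tuple(euclidean(vec, p) for p in self.pivots)

    def candidates(self, query: Vector, size: int) -> List[int]:
        q = tuple(euclidean(query, p) for p in self.pivots)
        ranked = sorted(
            self._profiles,
            key=lambda i: max(abs(a - b) for a, b in zip(self._profiles[i], q)),
        )
        return ranked[:size]


def knn_search(
    query: Vector,
    k: int,
    index: ToyPivotIndex,
    store: ToyStore,
    cand_size: int,
    predicate: Optional[Callable[[dict], bool]] = None,
) -> List[Tuple[float, int]]:
    """Approximate k-NN: candidate IDs from the index, objects from the store,
    then attribute filtering and re-ranking by the true distance."""
    records = store.multi_get(index.candidates(query, cand_size))
    hits = [
        (euclidean(query, rec["vector"]), obj_id)
        for obj_id, rec in records.items()
        if predicate is None or predicate(rec)
    ]
    hits.sort()
    return hits[:k]


# Tiny illustrative data set (made up for this sketch).
store = ToyStore()
index = ToyPivotIndex(pivots=[(0.0, 0.0), (10.0, 0.0)])
for obj_id, vec, tag in [
    (1, (1.0, 1.0), "cat"),
    (2, (2.0, 2.0), "dog"),
    (3, (9.0, 9.0), "cat"),
    (4, (1.0, 2.0), "cat"),
]:
    store.put(obj_id, {"vector": vec, "tag": tag})
    index.insert(obj_id, vec)

visual = knn_search((1.0, 1.0), 2, index, store, cand_size=3)
combined = knn_search((1.0, 1.0), 2, index, store, cand_size=3,
                      predicate=lambda rec: rec["tag"] == "dog")
print([i for _, i in visual])    # IDs of the two nearest objects: [1, 4]
print([i for _, i in combined])  # nearest object tagged "dog": [2]
```

Because only the candidate set is fetched and re-ranked, the result is approximate: an object outside the candidate set is never returned, so `cand_size` trades accuracy against the number of store lookups.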

Keywords

Similarity search; Multi-modal search; Big Data; Scalability; Distributed hash table

Notes

Acknowledgements

This work was supported by the Czech Research Foundation project P103/12/G084.

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Masaryk University, Brno, Czech Republic
