Multi-modal Similarity Retrieval with Distributed Key-value Store

Mobile Networks and Applications

Abstract

We propose a system architecture for large-scale similarity search in various types of digital data. The architecture combines contemporary, highly scalable distributed data stores with recent efficient similarity indexes and with other types of search indexes. The system supports several kinds of data access: distance-based similarity queries, standard term and attribute queries, and advanced queries that combine several search aspects (modalities). The first part of this work describes the generic architecture and the PPP-Codes similarity index, which is suitable for our system. In the second part, we describe two specific instances of this architecture that manage two large collections of digital images and provide content-based visual search, keyword search, attribute-based access, and their combinations. The first collection is the CoPhIR benchmark with 106 million images accessed by MPEG-7 visual descriptors, and the second collection contains 20 million images with complex features obtained from a deep convolutional neural network.
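To make the idea of a query that combines modalities more concrete, the following is a minimal sketch, not the system described in the article. It assumes that a similarity index has already produced a list of candidate object IDs, that a plain Python dict can stand in for the distributed key-value store, and that Euclidean distance can stand in for the real descriptor distance. The names `ImageRecord`, `combined_query`, and all sample data are hypothetical and introduced only for illustration.

```python
from __future__ import annotations

import heapq
import math
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """One stored object (hypothetical layout): descriptor plus metadata."""
    descriptor: list[float]
    keywords: set[str] = field(default_factory=set)
    attributes: dict[str, str] = field(default_factory=dict)


def euclidean(a, b):
    """Stand-in distance function; real visual descriptors use other metrics."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_query(store, candidate_ids, query_descriptor, k,
                   keyword=None, attributes=None):
    """Refine a candidate set: fetch by ID, filter by metadata, rank by distance.

    `store` only needs a dict-like get(); here it is an assumption standing in
    for a distributed key-value store.
    """
    attributes = attributes or {}
    scored = []
    for object_id in candidate_ids:
        record = store.get(object_id)  # key-value lookup by object ID
        if record is None:
            continue  # candidate no longer present in the store
        if keyword is not None and keyword not in record.keywords:
            continue  # term modality: keyword must match
        if any(record.attributes.get(name) != value
               for name, value in attributes.items()):
            continue  # attribute modality: every requested attribute must match
        scored.append((euclidean(query_descriptor, record.descriptor), object_id))
    # Similarity modality: keep the k candidates closest to the query.
    return heapq.nsmallest(k, scored)


# Hypothetical sample data for illustration only.
store = {
    "img1": ImageRecord([0.0, 0.0], {"beach"}, {"license": "cc"}),
    "img2": ImageRecord([1.0, 0.0], {"beach", "sunset"}, {"license": "cc"}),
    "img3": ImageRecord([0.1, 0.0], {"city"}, {"license": "cc"}),
    "img4": ImageRecord([3.0, 4.0], {"beach"}, {"license": "arr"}),
}
candidates = ["img1", "img2", "img3", "img4", "missing"]

result = combined_query(store, candidates, [0.0, 0.0], k=2, keyword="beach")
print(result)
```

In this sketch the metadata filters run during candidate refinement, after the similarity index has narrowed the search; this is only one possible way to order the modalities, and the article's own query processing may differ.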

Notes

  1. http://basho.com/riak/

  2. http://redis.io

  3. http://www.project-voldemort.com

  4. http://www.jboss.org/infinispan/

  5. http://www.mongodb.org

  6. http://couchdb.apache.org

  7. http://www.flickr.com

  8. http://www.jboss.org/infinispan/

  9. http://lucene.apache.org

  10. http://caffe.berkeleyvision.org

  11. http://disa.fi.muni.cz/profiset/

  12. http://disa.fi.muni.cz/demos/profiset-decaf/

Acknowledgements

This work was supported by the Czech Research Foundation project P103/12/G084.

Author information

Corresponding author

Correspondence to David Novak.

About this article

Cite this article

Novak, D. Multi-modal Similarity Retrieval with Distributed Key-value Store. Mobile Netw Appl 20, 521–532 (2015). https://doi.org/10.1007/s11036-014-0561-4

  • DOI: https://doi.org/10.1007/s11036-014-0561-4
