Abstract
We propose a system architecture for large-scale similarity search in various types of digital data. The architecture combines contemporary, highly scalable distributed data stores with recent efficient similarity indexes and with other types of search indexes. The system supports distance-based similarity queries, standard term and attribute queries, and advanced queries that combine several search aspects (modalities). The first part of this work describes the generic architecture and the PPP-Codes similarity index, which is well suited to our system. In the second part, we describe two specific instances of this architecture that manage two large collections of digital images and provide content-based visual search, keyword search, attribute-based access, and their combinations. The first collection is the CoPhIR benchmark, with 106 million images accessed by MPEG-7 visual descriptors; the second contains 20 million images with complex features obtained from a deep convolutional neural network.
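As a rough illustration of the data flow summarized above, the following minimal Python sketch shows how a similarity index that holds only object IDs might be combined with a key-value store that holds the full records, so that a single query can mix a distance-based modality with keyword and attribute constraints. Everything here is a hypothetical stand-in: the in-memory `store` dictionary replaces a distributed key-value store, the nearest-pivot partitioning replaces a real similarity index (it is not PPP-Codes), and the names `PIVOTS`, `nearest_pivot`, `sim_index`, and `search` are assumptions introduced for this example only.

```python
import math

# Hypothetical in-memory stand-in for a distributed key-value store:
# object ID -> full record (descriptor, keywords, attributes).
store = {
    1: {"vec": (0.0, 0.0), "keywords": {"sea", "beach"}, "year": 2008},
    2: {"vec": (0.1, 0.0), "keywords": {"city"}, "year": 2009},
    3: {"vec": (0.0, 0.3), "keywords": {"sea"}, "year": 2009},
    4: {"vec": (5.0, 5.0), "keywords": {"sea"}, "year": 2009},
}

# Arbitrary reference points used to partition the space (an assumption).
PIVOTS = [(0.0, 0.0), (4.0, 4.0)]


def nearest_pivot(vec):
    """Return the index of the pivot closest to vec (a coarse partition key)."""
    return min(range(len(PIVOTS)), key=lambda i: math.dist(vec, PIVOTS[i]))


# Toy similarity index: partition key -> object IDs only, no descriptors.
sim_index = {}
for oid, rec in store.items():
    sim_index.setdefault(nearest_pivot(rec["vec"]), []).append(oid)


def search(query_vec, k, keyword=None, year=None):
    """Approximate k-NN: candidate IDs from the index, then fetch, filter, re-rank."""
    candidates = sim_index.get(nearest_pivot(query_vec), [])
    hits = []
    for oid in candidates:
        rec = store[oid]  # key-value lookup by object ID
        if keyword is not None and keyword not in rec["keywords"]:
            continue  # keyword modality
        if year is not None and rec["year"] != year:
            continue  # attribute modality
        hits.append((math.dist(query_vec, rec["vec"]), oid))
    # Re-rank the surviving candidates by their true distance to the query.
    return [oid for _, oid in sorted(hits)[:k]]
```

In this sketch the index produces a candidate set cheaply, and the exact distances and the other modalities are evaluated only on records fetched by ID. The result is approximate by construction: objects that fall into a different partition than the query are never examined.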
Acknowledgements
This work was supported by the Czech Research Foundation project P103/12/G084.
Cite this article
Novak, D. Multi-modal Similarity Retrieval with Distributed Key-value Store. Mobile Netw Appl 20, 521–532 (2015). https://doi.org/10.1007/s11036-014-0561-4
DOI: https://doi.org/10.1007/s11036-014-0561-4