Scalable Similarity Search for Big Data

Zezula, Pavel

doi:10.1007/978-3-319-16868-5_1

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 139)

Included in the following conference series:

International Conference on Scalable Information Systems

Abstract

Analysis of contemporary Big Data collections requires effective and efficient content-based access to data that is usually unstructured. First, this implies a need to uncover descriptive knowledge of complex and heterogeneous objects to make them findable. Second, multimodal search structures are needed to execute complex similarity queries efficiently, possibly in outsourced environments, while preserving privacy. Four specific research objectives for tackling these challenges are outlined and discussed. A relevant solution to these problems is believed to be necessary for scalable similarity search operating on Big Data.

Acknowledgments

This research was supported by the Czech Science Foundation project number P103/12/G084.

Author information

Authors and Affiliations

Masaryk University, Brno, Czech Republic
Pavel Zezula

Corresponding author

Correspondence to Pavel Zezula.

Editor information

Editors and Affiliations

Chung-Ang University, Seoul, Republic of Korea
Jason J. Jung
University of Craiova, Craiova, Romania
Costin Badica
Eötvös Loránd University, Budapest, Hungary
Attila Kiss

About this paper

Cite this paper

Zezula, P. (2015). Scalable Similarity Search for Big Data. In: Jung, J., Badica, C., Kiss, A. (eds) Scalable Information Systems. INFOSCALE 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 139. Springer, Cham. https://doi.org/10.1007/978-3-319-16868-5_1

DOI: https://doi.org/10.1007/978-3-319-16868-5_1
Published: 07 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16867-8
Online ISBN: 978-3-319-16868-5
eBook Packages: Computer Science, Computer Science (R0)
