Abstract
Analysis of contemporary Big Data collections requires effective and efficient content-based access to data, which is usually unstructured. This implies, first, a need to uncover descriptive knowledge of complex and heterogeneous objects so that they can be found. Second, multimodal search structures are needed to execute complex similarity queries efficiently, possibly in outsourced environments, while preserving privacy. Four specific research objectives for tackling these challenges are outlined and discussed. A relevant solution to these problems is believed to be necessary for scalable similarity search operating on Big Data.
Acknowledgments
This research was supported by the Czech Science Foundation project number P103/12/G084.
Copyright information
© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Zezula, P. (2015). Scalable Similarity Search for Big Data. In: Jung, J., Badica, C., Kiss, A. (eds) Scalable Information Systems. INFOSCALE 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 139. Springer, Cham. https://doi.org/10.1007/978-3-319-16868-5_1
DOI: https://doi.org/10.1007/978-3-319-16868-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16867-8
Online ISBN: 978-3-319-16868-5
eBook Packages: Computer Science, Computer Science (R0)