Scalable Similarity Search for Big Data

Challenges and Research Objectives

  • Conference paper
Scalable Information Systems (INFOSCALE 2014)

Abstract

Analysis of contemporary Big Data collections requires effective and efficient content-based access to data, which is usually unstructured. First, this implies a need to uncover descriptive knowledge of complex and heterogeneous objects to make them findable. Second, multimodal search structures are needed to execute complex similarity queries efficiently, possibly in outsourced environments while preserving privacy. Four specific research objectives that address these challenges are outlined and discussed. A sound solution to these problems is believed to be necessary for scalable similarity search operating on Big Data.

Acknowledgments

This research was supported by the Czech Science Foundation project number P103/12/G084.

Author information

Corresponding author

Correspondence to Pavel Zezula.

Copyright information

© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Zezula, P. (2015). Scalable Similarity Search for Big Data. In: Jung, J., Badica, C., Kiss, A. (eds) Scalable Information Systems. INFOSCALE 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 139. Springer, Cham. https://doi.org/10.1007/978-3-319-16868-5_1

  • DOI: https://doi.org/10.1007/978-3-319-16868-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16867-8

  • Online ISBN: 978-3-319-16868-5

  • eBook Packages: Computer Science, Computer Science (R0)
