
Accessing very high dimensional spaces in parallel

The Journal of Supercomputing

Abstract

Access methods are a fundamental tool in Information Retrieval. However, most of these methods suffer from the curse of dimensionality when applied to objects represented in very high-dimensional spaces, such as text documents. In this paper we introduce a new parallel access method that uses several graphs as a distributed index structure together with a kNN search algorithm. Two parallel versions of the search method are presented, one based on a master–slave scheme and the other on a pipeline. A thorough experimental analysis on different datasets shows that our method can efficiently process large flows of queries, compete with other parallel algorithms, and at the same time obtain very high quality results.
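As a rough illustration of the master–slave scheme mentioned above, the following sketch distributes a kNN query over several index partitions and merges the partial answers. It is not the authors' method: the graph-based index is not described in this abstract, so each worker here falls back to an exhaustive scan of its partition. The names `cosine_distance`, `local_knn`, and `master_knn`, the sparse `{term: weight}` document representation, and the use of threads are all assumptions made for the example.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def cosine_distance(a, b):
    """Return 1 - cosine similarity for sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def local_knn(partition, query, k):
    """Worker step (assumed): exhaustive scan of one partition.

    A graph index would replace this scan in the method the abstract describes.
    Returns up to k (distance, doc_id) pairs, nearest first.
    """
    scored = ((cosine_distance(query, vec), doc_id) for doc_id, vec in partition)
    return heapq.nsmallest(k, scored)


def master_knn(partitions, query, k):
    """Master step: send the query to every partition, then merge partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda part: local_knn(part, query, k), partitions)
        merged = heapq.nsmallest(k, (hit for part in partials for hit in part))
    return [doc_id for _, doc_id in merged]


# Hypothetical toy collection split across two workers.
docs = [
    ("d0", {"graph": 1.0, "index": 1.0}),
    ("d1", {"parallel": 1.0, "search": 1.0}),
    ("d2", {"graph": 1.0, "search": 1.0}),
    ("d3", {"text": 1.0}),
]
partitions = [docs[:2], docs[2:]]
query = {"graph": 1.0, "search": 1.0}
result = master_knn(partitions, query, k=2)
```

Because every worker returns its own k best candidates, the merged top k equals the top k of a scan over the whole collection. That guarantee holds only for the exact scan used here; an approximate index in each partition would make the merged answer approximate as well.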


Acknowledgments

This research has been supported by the CICYT project TIN2014-53495-R of the Ministerio de Economía y Competitividad.

Author information

Corresponding author

Correspondence to J. M. Badía.

About this article

Cite this article

Artigas-Fuentes, F.J., Badía, J.M. Accessing very high dimensional spaces in parallel. J Supercomput 73, 176–189 (2017). https://doi.org/10.1007/s11227-016-1673-3

  • DOI: https://doi.org/10.1007/s11227-016-1673-3
