Abstract
The similarity join finds all pairs of similar objects in a large collection. This search problem could be successfully divided into many sub-problems by an algorithm called Quickjoin recently. Besides, this algorithm could be extended to a wide range of application areas as it is based on metric spaces instead of vector spaces only. When the volume of a dataset reaches to a certain degree or the distance measure of the similarity is complex enough, however, Quickjoin still takes much time to accomplish the similarity join task, which leads us to develop a parallel version of the algorithm. In this paper, we present two parallel versions of the Quickjoin algorithm exploiting multi-core and many-core processors respectively as well as evaluate them. Experiments show that the parallelization of this algorithm in our many-core processor yields speedup to 22 at most compared with its non-parallel version.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Levenshtein, V.I.: Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady 10 (1966)
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach. Advances in Database Systems, vol. 32. Springer (2006)
Jacox, E.H., Samet, H.: Metric Space Similarity Joins. ACM Transaction on Database Systems 33(2), 1–38 (2008)
Shim, K., Srikant, R., Agrawal, R.: High-Dimensional Similarity Joins. IEEE Transactions on Knowledge and Data Engineering 14(1), 156–171 (2002)
Böhm, C., Braunmüller, B., Krebs, F., Kriegel, H.-P.: Epsilon Grid Order: An Algorithm for the Similarity Join on Massive High-Dimensional Data. In: SIGMOD, pp. 379–388 (2001)
Lieberman, M.D., Sankaranarayanan, J., Samet, H.: A Fast Similarity Join Algorithm Using Graphics Processing Units. In: ICDE, pp. 1111–1120 (2008)
Faloutsos, C., Lin, K.: FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. In: SIGMOD, pp. 163–174 (1995)
Dohnal, V., Gennaro, C., Zezula, P.: Similarity Join in Metric Spaces Using eD-Index. In: Mařík, V., Štěpánková, O., Retschitzegger, W. (eds.) DEXA 2003. LNCS, vol. 2736, pp. 484–493. Springer, Heidelberg (2003)
CUDA C Programming Guide: CUDA Toolkit Documentation, http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
OpenMP: An API for multi-platform shared-memory parallel programming in C/C++ and Fortran, http://www.openmp.org/
Ayguadé, E., Copty, N., Duran, A., Hoeflinger, J., Lin, Y., Massaioli, F., Teruel, X., Unnikrishnan, P., Zhang, G.: The Design of OpenMP Tasks. IEEE Transactions on Parallel and Distributed Systems 20(3), 404–418 (2009)
Yianilos, P.N.: Data Structures and Algorithms for Nearest Neighbour Search in General Metric Spaces. In: SODA, pp. 311–321 (1993)
Dynamic Parallelismin CUDA, http://developer.download.nvidia.com/assets/cuda/docs/TechBrief_Dynamic_Parallelism_in_CUDA_v2.pdf
CUDA C/C++ Streams and Concurrency, http://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, S., Kim, O., Feng, W. (2013). Accelerating Metric Space Similarity Joins with Multi-core and Many-core Processors. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39640-3_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-39640-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39639-7
Online ISBN: 978-3-642-39640-3
eBook Packages: Computer ScienceComputer Science (R0)