Abstract
Existing approaches to parallel bulk-loading of R-tree indexes share a drawback: the quality of the resulting spatial index degrades considerably as the degree of parallelism increases. To address this problem, a novel method for bulk-loading spatial data with the popular MapReduce framework is proposed. The method combines the Hilbert curve with random sampling to partition and sort spatial data in parallel, thereby balancing the number of spatial objects across partitions. A bottom-up method is then introduced to simplify and accelerate sub-index construction within each partition. Three area metrics are used to evaluate the quality of the generated index under different partition settings. Extensive experiments show that the generated R-trees are similar in quality to the R-tree produced by a sequential bulk-loading method, while execution time is reduced considerably by exploiting parallelism.
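The pipeline summarized in the abstract can be illustrated with a minimal, single-process sketch. This is not the authors' MapReduce implementation. It assumes 2-D data on an integer grid, uses the standard Hilbert-curve index computation, picks partition boundaries from a random sample of Hilbert keys, and packs each partition bottom-up by grouping consecutive entries. The function names (`hilbert_index`, `sample_boundaries`, `assign_partition`, `pack_bottom_up`) and parameters such as `capacity` and `sample_size` are hypothetical and chosen only for illustration.

```python
import bisect
import random


def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def sample_boundaries(keys, num_partitions, sample_size, seed=0):
    """Estimate num_partitions - 1 split keys from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def assign_partition(key, boundaries):
    """Index of the partition whose key range contains `key`."""
    return bisect.bisect_right(boundaries, key)


def mbr(boxes):
    """Minimum bounding rectangle of boxes given as (xmin, ymin, xmax, ymax)."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def pack_bottom_up(boxes, capacity):
    """Pack sorted boxes level by level; returns the MBR lists, leaves first."""
    if not boxes or capacity < 2:
        raise ValueError("need at least one box and capacity >= 2")
    levels = []
    current = boxes
    while True:
        current = [mbr(current[i:i + capacity])
                   for i in range(0, len(current), capacity)]
        levels.append(current)
        if len(current) == 1:
            return levels


# Usage: partition random points by Hilbert key, then pack each partition.
rng = random.Random(42)
points = [(rng.randrange(256), rng.randrange(256)) for _ in range(1000)]
keys = [hilbert_index(8, x, y) for x, y in points]
bounds = sample_boundaries(keys, num_partitions=4, sample_size=100)
parts = [[] for _ in range(4)]
for key, (x, y) in sorted(zip(keys, points)):
    parts[assign_partition(key, bounds)].append((x, y, x, y))
sub_trees = [pack_bottom_up(p, capacity=16) for p in parts if p]
```

In a MapReduce setting, the sampling step would presumably run first to fix the boundaries, the map phase would compute keys and route objects to partitions, and each reducer would sort and pack its own partition. That mapping is an assumption of this sketch rather than a detail stated in the abstract.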
Additional information
Foundation item: Supported by the National High Technology Research and Development Program of China (863 Program) (2011AA12A306) and the National Natural Science Foundation of China (40801160, 60902036)
Biography: LIU Yi, male, Ph.D., research interest: massive spatial data processing.
About this article
Cite this article
Liu, Y., Jing, N., Chen, L. et al. Parallel bulk-loading of spatial data with MapReduce: An R-tree case. Wuhan Univ. J. Nat. Sci. 16, 513–519 (2011). https://doi.org/10.1007/s11859-011-0790-3