Parallel bulk-loading of spatial data with MapReduce: An R-tree case

Wuhan University Journal of Natural Sciences

Abstract

Existing methods for parallel bulk-loading of R-tree indexes share a drawback: the quality of the resulting spatial index decreases considerably as parallelism increases. To solve this problem, a novel method of bulk-loading spatial data using the popular MapReduce framework is proposed. The method combines the Hilbert curve with random sampling to partition and sort spatial data in parallel, thereby balancing the number of spatial objects in each partition. A bottom-up method is then introduced to simplify and accelerate sub-index construction in each partition. Three area metrics are used to test the quality of the generated index under different partitionings. Extensive experiments show that the generated R-trees have quality similar to that of an R-tree generated by a sequential bulk-loading method, while execution time is reduced considerably by exploiting parallelism.
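The pipeline outlined in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`hilbert_index`, `sample_boundaries`, `assign_partition`, `pack_bottom_up`, `bulk_load`), the grid order, the node capacity, and the sample size are all assumptions chosen for illustration, and the map and reduce phases are simulated with plain loops rather than run on Hadoop. The sketch maps points to Hilbert keys, derives partition boundaries from a random sample of keys, and packs each sorted partition bottom-up into levels of bounding rectangles.

```python
import random
from bisect import bisect_right

def hilbert_index(order, x, y):
    """Position of grid cell (x, y) on a Hilbert curve over a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the sub-curve is oriented correctly
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d

def sample_boundaries(keys, num_partitions, sample_size, seed=0):
    """Estimate num_partitions - 1 split keys from a random sample (assumed scheme)."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[(i * len(sample)) // num_partitions] for i in range(1, num_partitions)]

def assign_partition(key, boundaries):
    """Range-partition a key: partition i holds keys in [boundaries[i-1], boundaries[i])."""
    return bisect_right(boundaries, key)

def mbr(rects):
    """Minimum bounding rectangle of (xmin, ymin, xmax, ymax) tuples."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def pack_bottom_up(rects, capacity):
    """Pack already-sorted rectangles into levels of node MBRs, leaf level first."""
    if not rects:
        return []
    levels, level = [], rects
    while True:
        level = [mbr(level[i:i + capacity]) for i in range(0, len(level), capacity)]
        levels.append(level)
        if len(level) == 1:  # a single node remains: this is the sub-tree root
            return levels

def bulk_load(points, num_partitions=4, capacity=4, order=8, sample_size=64):
    """Simulated map (key + partition) and reduce (sort + pack) phases."""
    keyed = [(hilbert_index(order, x, y), (x, y, x, y)) for x, y in points]
    boundaries = sample_boundaries([k for k, _ in keyed], num_partitions, sample_size)
    parts = [[] for _ in range(num_partitions)]
    for k, r in keyed:            # "map": route each object to its key range
        parts[assign_partition(k, boundaries)].append((k, r))
    subtrees = []
    for part in parts:            # "reduce": sort by Hilbert key, then pack bottom-up
        part.sort()
        subtrees.append(pack_bottom_up([r for _, r in part], capacity))
    return parts, subtrees

if __name__ == "__main__":
    pts = [(x, y) for x in range(16) for y in range(16)]
    parts, subtrees = bulk_load(pts, order=4)
    print([len(p) for p in parts])  # partition sizes, roughly balanced by sampling
```

Because the boundaries are estimated from a sample, partition sizes are only approximately equal; a larger sample tightens the balance at the cost of a more expensive sampling pass. How the per-partition sub-trees are combined into one R-tree is not covered by this sketch.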



Author information

Corresponding author

Correspondence to Yi Liu.

Additional information

Foundation item: Supported by the National High Technology Research and Development Program of China (863 Program) (2011AA12A306) and the National Natural Science Foundation of China (40801160, 60902036).

Biography: LIU Yi, male, Ph.D., research direction: massive spatial data processing.


About this article

Cite this article

Liu, Y., Jing, N., Chen, L. et al. Parallel bulk-loading of spatial data with MapReduce: An R-tree case. Wuhan Univ. J. Nat. Sci. 16, 513–519 (2011). https://doi.org/10.1007/s11859-011-0790-3


  • DOI: https://doi.org/10.1007/s11859-011-0790-3
