Lessons Learned with Laser Scanning Point Cloud Management in Hadoop HBase

  • Anh-Vu Vo
  • Nikita Konda
  • Neel Chauhan
  • Harith Aljumaily
  • Debra F. Laefer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10863)


Abstract

While big data technologies are growing rapidly and benefit a wide range of science and engineering domains, many barriers still prevent the remote sensing community from fully exploiting these powerful and rapidly developing technologies. To help overcome these barriers, this paper presents in-depth experience gained in adopting a distributed computing framework, Hadoop HBase, for the storage, indexing, and integration of large-scale, high-resolution laser scanning point cloud data. Four data models were conceptualized, implemented, and rigorously investigated to explore the advantageous features of distributed, key-value database systems. Comparing the four models also made it possible to reassess, in the new context of a distributed key-value database, several well-known point cloud management techniques that were originally developed for traditional computing environments. The four models were derived from two row-key designs and two column structures, thereby demonstrating the various considerations involved in developing a data solution for a high-resolution, city-scale aerial laser scanning dataset covering a portion of Dublin, Ireland. This paper presents lessons learned from the data model design and its implementation for spatial data management in a distributed computing framework. The study is a step towards full exploitation of powerful emerging computing assets for dense spatio-temporal data.
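The abstract names row-key design as one of the two axes of the data models but does not detail the designs here. Because HBase stores rows sorted by the byte order of their row keys, one common way to make spatially close points land in nearby rows is to derive the key from a space-filling curve, the technique discussed in [27–29]. The sketch below assumes a 2D Morton (Z-order) curve over a regular grid. It illustrates that general technique only and is not necessarily either of the two row-key designs studied in the paper; the function names, grid origin, cell size, and 21-bit width per axis are hypothetical choices.

```python
import struct


def interleave_bits(ix: int, iy: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of ix and iy into a 2D Morton (Z-order) code.

    Bit i of ix goes to position 2*i and bit i of iy goes to position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((ix >> i) & 1) << (2 * i)
        code |= ((iy >> i) & 1) << (2 * i + 1)
    return code


def morton_row_key(x: float, y: float, origin_x: float, origin_y: float,
                   resolution: float, bits: int = 21) -> bytes:
    """Build a hypothetical 8-byte row key from planar coordinates.

    Coordinates are snapped to a grid cell of size `resolution` measured from
    (origin_x, origin_y); the two cell indices are then bit-interleaved.
    """
    ix = int((x - origin_x) // resolution)
    iy = int((y - origin_y) // resolution)
    limit = 1 << bits
    if not (0 <= ix < limit and 0 <= iy < limit):
        raise ValueError("point lies outside the indexed extent")
    # Fixed-width, big-endian encoding: HBase compares row keys as unsigned
    # byte arrays, so lexicographic key order equals numeric Morton order.
    return struct.pack(">Q", interleave_bits(ix, iy, bits))


if __name__ == "__main__":
    # Hypothetical points (metres) in a local grid with 0.5 m cells.
    for px, py in [(10.2, 20.7), (10.4, 20.9), (250.0, 3.0)]:
        print(morton_row_key(px, py, 0.0, 0.0, 0.5).hex())
```

In this sketch the first two points fall in the same 0.5 m cell and therefore receive the same key, so a practical design would have to decide whether such points share one row (for example, stored across several columns) or get a unique suffix appended to the key. That decision is one place where row-key and column-structure choices interact.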


Keywords

LiDAR · Point cloud · Big Data · Spatial data management · Hadoop HBase · Distributed database



Acknowledgments

The Hadoop cluster used for the work presented in this paper was provided by allocation TG-CIE170036 of the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562 [30]. The authors would like to thank the staff at the Pittsburgh Supercomputing Center for the truly outstanding technical support provided while the test environment was being set up. This research also made use of data collected with funding from the European Research Council grant ERC-2012-StG 20111012 “RETURN - Rethinking Tunnelling in Urban Neighbourhoods” Project 307836.

The dataset is available from the NYU Spatial Data Repository.


References

  1. Vo, A.V., Laefer, D.F., Bertolotto, M.: Airborne laser scanning data storage and indexing: state of the art review. Int. J. Remote Sens. 37(24), 6187–6204 (2016)
  2. Kitchin, R., McArdle, G.: What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets. Big Data Soc. 3(1), 1–10 (2016)
  3. Ghemawat, S., Gobioff, H., Leung, S.T.: The Google file system. In: Proceedings of the 19th ACM Symposium on Operating Systems Principles, New York, pp. 29–43 (2003)
  4. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
  5. White, T.: Hadoop: The Definitive Guide, 4th edn. O’Reilly, Massachusetts (2015)
  6. Chang, F., et al.: Bigtable: a distributed storage system for structured data. ACM Trans. Comput. Syst. 26(2), 4 (2008)
  7. George, L.: HBase: The Definitive Guide, 1st edn. O’Reilly, Massachusetts (2011)
  8. Middleton, W., Spilhaus, A.: The measurement of atmospheric humidity. In: Meteorological Instruments, Toronto, pp. 105–111 (1953)
  9. Shepherd, E.C.: Laser to watch height. New Scientist 6(437), 33 (1965)
  10. van Oosterom, P., et al.: Massive point cloud data management: design, implementation and execution of a point cloud benchmark. Comput. Graph. 49, 92–125 (2015)
  11. Cura, R., Perret, J., Paparoditis, N.: A scalable and multi-purpose point cloud server (PCS) for easier and faster point cloud data management and processing. ISPRS J. Photogramm. Remote Sens. 127, 39–56 (2017)
  12. Krishnan, S., Baru, C., Crosby, C.: Evaluation of MapReduce for gridding LIDAR data. In: 2010 IEEE Second International Conference on Cloud Computing Technology and Science, pp. 33–40 (2010)
  13. Li, Z., Hodgson, M.E., Li, W.: A general-purpose framework for parallel processing of large-scale LiDAR data. Int. J. Digit. Earth 11(1), 26–47 (2017)
  14. Rizki, P.N.M., Eum, J., Lee, H., Oh, S.: Spark-based in-memory DEM creation from 3D LiDAR point clouds. Remote Sens. Lett. 8(4), 360–369 (2017)
  15. Hamraz, H., Contreras, M.A., Zhang, J.: A scalable approach for tree segmentation within small-footprint airborne LiDAR data. Comput. Geosci. 8(4), 360–369 (2017)
  16. Aljumaily, H., Laefer, D.F., Cuadra, D.: Urban point cloud mining based on density clustering and MapReduce. J. Comput. Civ. Eng. 31(5) (2017)
  17. Moler, C.: Matrix computation on distributed memory multiprocessors. In: Hypercube Multiprocessors 1986, pp. 181–195 (1987)
  18. Baumann, P., et al.: Big Data analytics for Earth sciences: the EarthServer approach. Int. J. Digit. Earth 9(1), 3–29 (2015)
  19. Boehm, J., Liu, K.: NoSQL for storage and retrieval of large LiDAR data collections. In: ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XL-3/W3, pp. 577–582, La Grande Motte (2015)
  20. Martinez-Rubi, O., et al.: Benchmarking and improving point cloud data management in MonetDB. SIGSPATIAL Special - Big Spatial 6(2), 11–18 (2014)
  21. Gertz, M., Renz, M., Zhou, X., Hoel, E., Ku, W.-S., Voisard, A., Zhang, C., Chen, H., Tang, L., Huang, Y., Lu, C.-T., Ravada, S. (eds.): SSTD 2017. LNCS, vol. 10411. Springer, Cham (2017)
  22. Mosa, A.S.M., Schön, B., Bertolotto, M., Laefer, D.F.: Evaluating the benefits of octree-based indexing for LiDAR data. Photogramm. Eng. Remote Sens. 78(9), 927–934 (2012)
  23. Ramsey, P.: LiDAR in PostgreSQL with PointCloud. In: FOSS4G, Nottingham (2013)
  24. Nandigam, V., Baru, C., Crosby, C.: Database design for high-resolution LIDAR topography data. In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 151–159. Springer, Heidelberg (2010)
  25. Murray, C., et al.: Oracle Spatial and Graph Developer’s Guide, 12c Release 1 (2017)
  26. Vo, A.V.: Spatial data storage and processing strategies for urban laser scanning. Ph.D. thesis, University College Dublin (2017)
  27. Haverkort, H., van Walderveen, F.: Locality and bounding-box quality of two-dimensional space-filling curves. Comput. Geom. 43(2), 131–147 (2008)
  28. Wang, J., Shan, J.: Space-filling curve based point clouds index. In: Proceedings of the 8th International Conference on GeoComputation, Michigan (2005)
  29. Psomadaki, S., van Oosterom, P.J.M., Tijssen, T.P.M., Baart, F.: Using a space filling curve approach for the management of dynamic point clouds. In: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, IV-2/W1, pp. 107–118 (2016)
  30. Towns, J., Cockerill, T., Dahan, M., Foster, I., Gaither, K., Grimshaw, A., Hazlewood, V., Lathrop, S., Lifka, D., Peterson, G.D., Roskies, R., Scott, J.R., Wilkins-Diehr, N.: XSEDE: accelerating scientific discovery. Comput. Sci. Eng. 16(5), 62–74 (2014)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. New York University, Brooklyn, USA
  2. University at Buffalo, Buffalo, USA
  3. Carlos III University of Madrid, Madrid, Spain
