
Journal of Geographical Systems, Volume 21, Issue 2, pp 211–235

HadoopTrajectory: a Hadoop spatiotemporal data processing extension

  • Mohamed Bakli
  • Mahmoud Sakr
  • Taysir Hassan A. Soliman
Original Article

Abstract

Recent advances in location tracking technologies and the widespread use of location-aware applications have resulted in big datasets of moving object trajectories. While a few research prototypes for moving object databases exist, systems that can process big spatiotemporal data are lacking. This work proposes HadoopTrajectory, a Hadoop extension for spatiotemporal data processing. The extension adds spatiotemporal types and operators to the Hadoop core. These types and operators can be used directly in MapReduce programs, which lets Hadoop users write spatiotemporal data analytics programs. The storage layer of Hadoop, the HDFS, is extended with types that represent trajectory data and with their corresponding input and output functions. It is also extended with file splitters and record readers. This enables Hadoop to read big files of moving object trajectories, such as vehicle GPS tracks, and split them over worker nodes for distributed processing. The storage layer is further extended with spatiotemporal indexes that help filter the data before it is split over the worker nodes. Several data access functions are provided so that the MapReduce layer can work with this data. The MapReduce layer is extended with trajectory processing operators, for instance to compute the length of a trajectory in meters. This paper describes the extension and evaluates it using a synthetic dataset and a real dataset. Comparisons with non-Hadoop systems and with standard Hadoop are given. The extension comprises about 11,601 lines of Java code.
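
As an illustration of the kind of trajectory operator mentioned above, the following is a minimal sketch of computing the length of a GPS track in meters. It is not taken from HadoopTrajectory: the class name `TrajectoryLengthSketch`, the `Sample` type, and the method names are hypothetical, and the sketch assumes that samples are latitude/longitude pairs in degrees, that they are already sorted by time, and that great-circle (haversine) distance on a spherical Earth is an acceptable approximation.

```java
import java.util.List;

/** Hypothetical sketch of a trajectory length operator; not HadoopTrajectory's actual API. */
final class TrajectoryLengthSketch {

    /** One GPS sample: latitude and longitude in degrees, timestamp in epoch milliseconds. */
    static final class Sample {
        final double latDeg;
        final double lonDeg;
        final long timeMillis;

        Sample(double latDeg, double lonDeg, long timeMillis) {
            this.latDeg = latDeg;
            this.lonDeg = lonDeg;
            this.timeMillis = timeMillis;
        }
    }

    /** Mean Earth radius in meters (spherical approximation, an assumption of this sketch). */
    static final double EARTH_RADIUS_M = 6_371_008.8;

    /** Great-circle distance between two samples, in meters, using the haversine formula. */
    static double haversineMeters(Sample a, Sample b) {
        double phi1 = Math.toRadians(a.latDeg);
        double phi2 = Math.toRadians(b.latDeg);
        double dLambda = Math.toRadians(b.lonDeg - a.lonDeg);
        double sinHalfPhi = Math.sin((phi2 - phi1) / 2.0);
        double sinHalfLambda = Math.sin(dLambda / 2.0);
        double h = sinHalfPhi * sinHalfPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfLambda * sinHalfLambda;
        // Clamp to 1.0 to guard against rounding error for near-antipodal points.
        return 2.0 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** Sums the distances between consecutive samples; the input is assumed time-ordered. */
    static double lengthMeters(List<Sample> samples) {
        double total = 0.0;
        for (int i = 1; i < samples.size(); i++) {
            total += haversineMeters(samples.get(i - 1), samples.get(i));
        }
        return total;
    }

    public static void main(String[] args) {
        // Three made-up samples along the equator, one degree of longitude apart.
        List<Sample> track = List.of(
                new Sample(0.0, 0.0, 0L),
                new Sample(0.0, 1.0, 60_000L),
                new Sample(0.0, 2.0, 120_000L));
        System.out.printf("Trajectory length: %.1f m%n", lengthMeters(track));
    }
}
```

In a MapReduce setting, a function like `lengthMeters` would typically be called once per trajectory inside a map task, after a record reader has assembled the samples of that trajectory. How the extension itself wires its operators into mappers is not described in the abstract.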

Keywords

Spatiotemporal Hadoop, 3DR-tree, Trajectory data management, Big data

JEL Classification

C6, C8, R4, R53, L86, O3

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Assiut University, Asyut, Egypt
  2. Ain Shams University, Cairo, Egypt
  3. Université libre de Bruxelles, Bruxelles, Belgium
