Distributed processing of big mobility data as spatio-temporal data streams

GeoInformatica

Abstract

Recent rapid developments in wireless communication, mobile computing, global navigation satellite systems (GNSS), and spatially enabled sensors are leading to an exponential growth of available mobility data, produced continuously and at high speed. These advances have brought a new class of monitoring applications into focus, including real-time intelligent transportation systems, traffic monitoring, and mobile object tracking. These new information flow processing (IFP) application domains need to process huge volumes of mobility data arriving as continuous data streams from mobile objects. IFP applications push traditional database technologies beyond their limits because of massively increasing data volumes and demands for real-time processing. Mobility data, i.e., real-time, transient, time-varying sequences of spatio-temporal data items generated by embedded positioning sensors, exhibit at least two core Big Data features: volume and velocity. Existing distributed data stream management systems (DSMS), real-time computing systems (RTCS), and their processing models are predominantly based on the relational paradigm and the continuous operator model. Consequently, they have only rudimentary spatio-temporal capabilities, provide expensive fault recovery that requires either hot replication or long recovery times, and do not handle faults and slow nodes. The framework proposed in this paper is a cornerstone towards efficient real-time management and monitoring of mobile objects through distributed spatio-temporal stream processing on large clusters. A prototype implementation is rooted in a new stream processing model that overcomes the challenges of current distributed stream processing models and enables seamless integration with batch and interactive processing such as MapReduce.
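To make the notion of a continuous spatio-temporal query over mobility streams concrete, here is a minimal sketch in Scala (the prototype's language, see Note 4). It is not the paper's implementation: it runs over a finite sequence with plain Scala collections instead of the Apache Flink DataStream API, and the record type `Observation`, the region type `Rect`, and the function `rangeQueryPerWindow` are hypothetical. It answers a typical monitoring question: which objects were inside a given rectangle during each fixed-length (tumbling) event-time window?

```scala
// Hypothetical record for one positional update of a mobile object:
// planar coordinates (x, y) and event time t in epoch milliseconds.
final case class Observation(objectId: String, x: Double, y: Double, t: Long)

// Hypothetical axis-aligned query region (closed rectangle).
final case class Rect(minX: Double, minY: Double, maxX: Double, maxY: Double) {
  def contains(o: Observation): Boolean =
    o.x >= minX && o.x <= maxX && o.y >= minY && o.y <= maxY
}

// Range query evaluated per tumbling event-time window:
// 1. keep only updates that fall inside `region`,
// 2. assign each update to the window starting at t rounded down to a multiple of `size`,
// 3. report the distinct object ids seen in each window.
def rangeQueryPerWindow(
    stream: Seq[Observation],
    region: Rect,
    size: Long
): Map[Long, Set[String]] =
  stream
    .filter(region.contains)
    .groupBy(o => o.t - Math.floorMod(o.t, size))
    .map { case (windowStart, inWindow) => windowStart -> inWindow.map(_.objectId).toSet }

// Usage: four updates, a 3 x 3 region at the origin, 1-second windows.
val updates = Seq(
  Observation("bus-1", 1.0, 1.0, 0L),
  Observation("bus-2", 5.0, 5.0, 500L), // outside the region
  Observation("bus-1", 2.0, 2.0, 1200L),
  Observation("taxi-7", 1.5, 0.5, 1900L)
)
val result = rangeQueryPerWindow(updates, Rect(0.0, 0.0, 3.0, 3.0), 1000L)
// result == Map(0L -> Set("bus-1"), 1000L -> Set("bus-1", "taxi-7"))
```

In an actual streaming engine, windows would be closed incrementally as event time advances (for example, driven by watermarks) rather than by grouping a complete collection; the sketch only illustrates the query semantics.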

Notes

  1. In some cases, real-time actually means near-real-time or low latency.

  2. The source code for the prototype, called MobyDick, is available on Bitbucket: https://bitbucket.org/DarioOsm/mobydick

  3. Our prototype, including its operations on non-temporal types, relies on operations and functions of the Apache Flink DataStream library [14] and the JTS Topology Suite [58].

  4. The Scala programming language has no separate interface construct; traits take that role and, through mixin composition, provide a form of multiple inheritance. For this reason, TemporalObject is implemented as a trait, which enables TemporalPoint to inherit from two types.

  5. A watermark is a special event, generated at stream sources, that coarsely advances event time. A watermark for time instant τ states that event time has progressed to τ in that particular stream, meaning that no more events with a time instant smaller than τ are expected to arrive.

  6. In contrast, some cluster computing frameworks (Spark, Trill, etc.) wrap data streams into mini-batches, i.e., they collect all data that arrives within a certain period of time and run a regular batch program on the collected data.

  7. Batch processing applications run efficiently as special cases of stream processing applications.

  8. This GPS dataset was collected by 182 users over a period of more than five years, from April 2007 to August 2012. It contains approximately 20 million points with a total distance of about 1.2 million kilometers and a total duration of more than 48,000 hours. The data were logged in over 30 cities in China, the USA, and Europe.
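Note 4 can be illustrated with a minimal sketch, assuming a hypothetical `Point` class with planar coordinates and a `TemporalObject` trait that only carries a time instant; the real TemporalObject and TemporalPoint types in MobyDick are presumably richer and build on JTS geometry. A Scala class may extend only one class but can mix in any number of traits, which is how TemporalPoint becomes both a `Point` and a `TemporalObject`:

```scala
// Hypothetical spatial base class; the prototype's geometry comes from JTS,
// which is left out here so the sketch stays self-contained.
class Point(val x: Double, val y: Double)

// Hypothetical temporal trait: anything carrying a time instant
// (epoch milliseconds) that can be ordered on the time line.
trait TemporalObject {
  def instant: Long
  def isBefore(other: TemporalObject): Boolean = instant < other.instant
}

// A class can extend only one class but mix in any number of traits,
// so TemporalPoint is both a Point and a TemporalObject.
class TemporalPoint(x: Double, y: Double, val instant: Long)
    extends Point(x, y) with TemporalObject

// Usage: the same object is accepted wherever either type is expected.
val p1 = new TemporalPoint(1.0, 2.0, 1000L)
val p2 = new TemporalPoint(1.5, 2.5, 2000L)
val asPoint: Point = p1
val asTemporal: TemporalObject = p1
val earlier = p1.isBefore(p2) // true
```

Strictly speaking, this is mixin composition with linearization rather than unrestricted multiple inheritance: member lookup follows a single, well-defined order, which avoids the ambiguity of C++-style diamond inheritance.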
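The watermark semantics in Note 5 can be simulated in a few lines. This is a sketch under stated assumptions, not Flink's API: the watermark is derived with the common bounded-out-of-orderness heuristic (watermark = largest event time seen so far minus a fixed bound), and `Event` and `SimpleWatermarkTracker` are hypothetical names. An event whose time instant is smaller than the current watermark arrives after the source has declared that no such events remain, so it is classified as late:

```scala
// Hypothetical event: object id and event time in epoch milliseconds.
final case class Event(id: String, t: Long)

// Hypothetical watermark tracker (not Flink's API) using the
// bounded-out-of-orderness heuristic: watermark = max event time seen - bound.
// Assumes bound >= 0.
final class SimpleWatermarkTracker(bound: Long) {
  // Chosen so that the initial watermark is Long.MinValue (nothing is late yet).
  private var maxSeen: Long = Long.MinValue + bound

  def currentWatermark: Long = maxSeen - bound

  // Classifies one event against the watermark in force when it arrives,
  // then advances the watermark. Returns true if on time, false if late,
  // i.e. its time instant is smaller than a watermark already issued.
  def onEvent(e: Event): Boolean = {
    val onTime = e.t >= currentWatermark
    maxSeen = math.max(maxSeen, e.t)
    onTime
  }
}

// Usage: a 2-second bound; "d" is older than the watermark (3000 ms) set after "b".
val wm = new SimpleWatermarkTracker(bound = 2000L)
val events = Seq(Event("a", 1000L), Event("b", 5000L), Event("c", 3500L), Event("d", 2500L))
val verdicts = events.map(e => e.id -> wm.onEvent(e))
// a, b and c are on time; d is late. wm.currentWatermark is then 3000.
```

The bound trades latency for completeness: a larger bound lets more out-of-order events count as on time, but an event-time window can only be finalized once the watermark passes its end, so results are delayed by roughly the same amount.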
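The mini-batch approach described in Note 6 can be sketched as follows, assuming a hypothetical `Arrival` record stamped with its arrival (processing) time and a hypothetical `microBatch` helper; it illustrates the idea only and is not Spark Streaming's API. The stream is cut into consecutive intervals by arrival time, and an ordinary batch function runs on each collected interval:

```scala
// Hypothetical arrival record: object id and processing (arrival) time in ms.
final case class Arrival(objectId: String, arrivalMs: Long)

// Hypothetical helper illustrating mini-batching: cut the stream into
// consecutive intervals of `intervalMs` by arrival time, then run an
// ordinary batch function once per collected interval, in interval order.
def microBatch[R](arrivals: Seq[Arrival], intervalMs: Long)(
    batchJob: Seq[Arrival] => R
): Seq[(Long, R)] =
  arrivals
    .groupBy(a => Math.floorDiv(a.arrivalMs, intervalMs)) // batch number
    .toSeq
    .sortBy(_._1)
    .map { case (batchNo, batch) => batchNo -> batchJob(batch) }

// Usage: count distinct objects per 1-second batch.
val arrivals = Seq(
  Arrival("a", 100L), Arrival("b", 400L),  // batch 0
  Arrival("a", 1100L),                     // batch 1
  Arrival("c", 2050L), Arrival("c", 2900L) // batch 2
)
val perBatch = microBatch(arrivals, 1000L)(batch => batch.map(_.objectId).distinct.size)
// perBatch == Seq((0L, 2), (1L, 1), (2L, 1))
```

The design consequence is that an item's result cannot be produced before its batch interval closes, which adds up to one full interval of latency, whereas a continuous operator can process each item as soon as it arrives.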
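A back-of-the-envelope reading of the approximate figures in Note 8 gives the dataset's average speed, average spacing between consecutive points, and average sampling interval. These are dataset-wide averages, not properties of individual trajectories, and because the duration is given as a lower bound (more than 48,000 hours), the speed is an upper estimate and the interval a lower estimate:

```latex
\begin{aligned}
\bar{v} &\approx \frac{1.2\times 10^{6}\ \mathrm{km}}{4.8\times 10^{4}\ \mathrm{h}} = 25\ \mathrm{km/h}\\
\bar{d} &\approx \frac{1.2\times 10^{6}\ \mathrm{km}}{2\times 10^{7}\ \text{points}} = 0.06\ \mathrm{km} = 60\ \mathrm{m}\\
\overline{\Delta t} &\approx \frac{4.8\times 10^{4}\cdot 3600\ \mathrm{s}}{2\times 10^{7}\ \text{points}} \approx 8.6\ \mathrm{s}
\end{aligned}
```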

References

  1. Aitchison A (2012) Pro Spatial with SQL Server 2012. Apress Media LLC, New York

  2. Akidau T, Bradshaw R, Chambers C, Chernyak S, Fernández-Moctezuma R, Lax R, McVeety S, Mills D, Perry F, Schmidt E, Whittle S (2015) The dataflow model: A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB 8(12):1792–1803. http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf

  3. Alexandrov A, Bergmann R, Ewen S, Freytag J, Hueske F, Heise A, Kao O, Leich M, Leser U, Markl V, Naumann F, Peters M, Rheinländer A, Sax MJ, Schelter S, Höger M, Tzoumas K, Warneke D (2014) The Stratosphere platform for big data analytics. VLDB J 23(6):939–964. doi:10.1007/s00778-014-0357-y

  4. Ali MH, Gerea C, Raman BS, Sezgin B, Tarnavski T, Verona T, Wang P, Zabback P, Kirilov A, Ananthanarayan A, Lu M, Raizman A, Krishnan R, Schindlauer R, Grabs T, Bjeletich S, Chandramouli B, Goldstein J, Bhat S, Li Y, Nicola V D, Wang X, Maier D, Santos I, Nano O, Grell S (2009) Microsoft CEP server and online behavioral targeting. PVLDB 2(2):1558–1561

  5. Ali MH, Chandramouli B, Raman BS, Katibah E (2010) Spatio-temporal stream processing in Microsoft StreamInsight. IEEE Data Eng Bull 33(2):69–74

  6. de Almeida VT, Güting RH, Behr T (2006) Querying moving objects in SECONDO. In: Mobile Data Management, pp 47–51

  7. Apache Foundation (2016a) Apache Flink. http://flink.apache.org

  8. Apache Foundation (2016b) Apache Hadoop. http://hadoop.apache.org

  9. Apache Foundation (2016c) Apache Hive. http://hive.apache.org

  10. Apache Foundation (2016d) Apache Samza. http://samza.apache.org

  11. Apache Foundation (2016e) Apache Spark. http://spark.apache.org/

  12. Apache Foundation (2016f) Apache Spark Streaming. http://spark.apache.org/streaming

  13. Apache Foundation (2016g) Apache Storm. http://storm.apache.org

  14. Apache Foundation (2016h) Flink DataStream API Programming Guide. https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html

  15. Babcock B, Babu S, Datar M, Motwani R, Widom J (2002) Models and issues in data stream systems. In: Popa L, Abiteboul S, Kolaitis PG (eds) PODS, ACM, pp 1–16

  16. Balazinska M, Balakrishnan H, Madden S, Stonebraker M (2008) Fault-tolerance in the Borealis distributed stream processing system. ACM Trans Database Syst 33(1):3:1–3:44

  17. Bettini C, Dyreson CE, Evans WS, Snodgrass RT, Wang XS (1997) A glossary of time granularity concepts. In: Temporal Databases, Dagstuhl, pp 406–413

  18. Biem A, Bouillet E, Feng H, Ranganathan A, Riabov A, Verscheure O, Koutsopoulos HN, Moran C (2010) IBM InfoSphere Streams for scalable, real-time, intelligent transportation services. In: Elmagarmid AK, Agrawal D (eds) Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6-10, 2010, ACM, pp 1093–1104. doi:10.1145/1807167.1807291

  19. California Center for Innovative Transportation (2015) The Mobile Millennium Project. http://traffic.berkeley.edu

  20. Chandramouli B, Goldstein J, Barnett M, DeLine R, Platt JC, Terwilliger JF, Wernsing J (2014) Trill: A high-performance incremental query processor for diverse analytics. PVLDB 8(4):401–412. http://www.vldb.org/pvldb/vol8/p401-chandramouli.pdf

  21. Chandy KM, Lamport L (1985) Distributed snapshots: Determining global states of distributed systems. ACM Trans Comput Syst 3(1):63–75. doi:10.1145/214451.214456

  22. Chen CX (2008) Spatio-temporal query languages. In: Shekhar S, Xiong H (eds) Encyclopedia of GIS. Springer, Berlin, pp 1125–1128

  23. Commonwealth Computer Research Inc (2016) GeoMesa. http://www.geomesa.org

  24. Condie T, Conway N, Alvaro P, Hellerstein JM, Elmeleegy K, Sears R (2010) MapReduce online. In: NSDI, USENIX Association, pp 313–328

  25. Dean J, Ghemawat S (2004) MapReduce: Simplified data processing on large clusters. In: OSDI, USENIX Association, pp 137–150

  26. Ebbers M, Abdel-Gayed A, Budhi V, Dolot F, Kamat V, Picone R, Trevelin J (2013) Addressing Data Volume, Velocity, and Variety with IBM InfoSphere Streams 3.0. IBM

  27. Eldawy A, Mokbel MF (2013) A demonstration of SpatialHadoop: An efficient MapReduce framework for spatial data. PVLDB 6(12):1230–1233

  28. Eldawy A, Elganainy M, Bakeer A, Abdelmotaleb A, Mokbel M (2015) Sphinx: Distributed execution of interactive SQL queries on big spatial data. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, New York, NY, USA, GIS ’15, pp 78:1–78:4. doi:10.1145/2820783.2820869

  29. Esper Tech Inc (2016) EsperTech. http://www.espertech.com/products/

  30. Fox A, Eichelberger C, Hughes J, Lyon S (2013) Spatio-temporal indexing in non-relational distributed databases. In: Proceedings of the 2013 IEEE International Conference on Big Data, 6-9 October 2013, Santa Clara, CA, USA, IEEE, pp 291–299. doi:10.1109/BigData.2013.6691586

  31. Franklin MJ, Krishnamurthy S, Conway N, Li A, Russakovsky A, Thombre N (2009) Continuous analytics: Rethinking query processing in a network-effect world. In: CIDR. www.cidrdb.org

  32. Galić Z, Mešković E, Križanović K, Baranović M (2012) OCEANUS: a spatio-temporal data stream system prototype. In: Proceedings of the Third ACM SIGSPATIAL International Workshop on GeoStreaming, ACM, New York, NY, USA, IWGS ’12, pp 109–115. doi:10.1145/2442968.2442982

  33. Galić Z, Baranović M, Križanović K, Mešković E (2014) Geospatial data streams: Formal framework and implementation. Data Knowl Eng 91:1–16

  34. Golab L, Özsu MT (2010) Data stream management. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, San Rafael, CA

  35. Güting RH (1993) Second-order signature: A tool for specifying data models, query processing, and optimization. In: Buneman P, Jajodia S (eds) Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26-28, 1993, ACM Press, pp 277–286. doi:10.1145/170035.170079

  36. Güting RH, Schneider M (2005) Moving objects databases. Morgan Kaufmann, San Francisco

  37. Güting RH, Böhlen MH, Erwig M, Jensen CS, Lorentzos NA, Schneider M, Vazirgiannis M (2000) A foundation for representing and querying moving objects. ACM Trans Database Syst 25(1):1–42

  38. Güting RH, Behr T, Düntgen C (2013) Trajectory databases. In: Mobility data – modeling, management, and understanding. Cambridge University Press, New York, pp 42–61

  39. Hortonworks (2016) Magellan: Geospatial Analytics on Spark. http://hortonworks.com/blog/magellan-geospatial-analytics-in-spark

  40. Hu X, Lin TY, Raghavan VV, Wah BW, Baeza-Yates RA, Fox G, Shahabi C, Smith M, Yang Q, Ghani R, Fan W, Lempel R, Nambiar R (eds) (2013) Proceedings of the 2013 IEEE International Conference on Big Data, 6-9 October 2013, Santa Clara, CA, USA. IEEE. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6679357

  41. Huang Y, Zhang C (2008) New data types and operations to support geo-streams. In: Cova TJ, Miller HJ, Beard K, Frank AU, Goodchild MF (eds) GIScience, Springer, Lecture Notes in Computer Science, vol 5266, pp 106–118

  42. Hunter T, Das T, Zaharia M, Abbeel P, Bayen AM (2013) Large-scale estimation in cyberphysical systems using streaming data: a case study with arterial traffic estimation. IEEE Trans Autom Sci Eng 10(4):884–898

  43. Information Management Lab – University of Piraeus (2016) HERMES. http://hermes-mod.java.net

  44. ISO 19107:2003 (2003) Geographic information – Spatial schema

  45. ISO 19108:2002 (2002) Geographic information – Temporal schema

  46. ISO 19141:2008 (2008) Geographic information – Schema for moving features

  47. ISO/IEC 13249-3:2011 (2011) Information technology – Database languages – SQL multimedia and application packages – Part 3: Spatial

  48. Jiang J, Bao H, Chang EY, Li Y (2012) MOIST: A scalable and parallel moving object indexer with school tracking. PVLDB 5(12):1838–1849. http://vldb.org/pvldb/vol5/p1838_junchenjiang_vldb2012.pdf

  49. Kazemitabar SJ, Kashani FB, McLeod D (2011) Geostreaming in cloud. In: Ali MH, Hoel EG, Kashani FB (eds) Proceedings of the 2011 ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2011, November 1, 2011, Chicago, IL, USA, ACM, pp 3–9. doi:10.1145/2064959.2064962

  50. Kornacker M, Behm A, Bittorf V, Bobrovytsky T, Ching C, Choi A, Erickson J, Grund M, Hecht D, Jacobs M, Joshi I, Kuff L, Kumar D, Leblang A, Li N, Pandis I, Robinson H, Rorke D, Rus S, Russell J, Tsirogiannis D, Wanderman-Milne S, Yoder M (2015) Impala: A modern, open-source SQL engine for Hadoop. In: CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings, http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf

  51. Koubarakis M, Sellis TK, Frank AU, Grumbach S, Güting RH, Jensen CS, Lorentzos NA, Manolopoulos Y, Nardelli E, Pernici B, Schek HJ, Scholl M, Theodoulidis B, Tryfona N (2003) Spatio-Temporal Databases: The CHOROCHRONOS Approach, Lecture Notes in Computer Science, vol 2520, Springer

  52. Krämer J, Seeger B (2009) Semantics and implementation of continuous sliding window queries over data streams. ACM Trans Database Syst 34(1)

  53. Law YN, Wang H, Zaniolo C (2011) Relational languages and data models for continuous queries on sequences and data streams. ACM Trans Database Syst 36(2):8:1–8:32

  54. Loeckx J, Ehrich HD, Wolf M (1996) Specification of Abstract Data Types. John Wiley & Sons and B. G. Teubner

  55. Lu J, Güting RH (2013) Parallel SECONDO: Practical and efficient mobility data processing in the cloud. In: Proceedings of the 2013 IEEE International Conference on Big Data, 6-9 October 2013, Santa Clara, CA, USA, IEEE, pp 17–25. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6679357

  56. Ma Q, Yang B, Qian W, Zhou A (2009) Query processing of massive trajectory data based on MapReduce. In: Meng X, Wang H, Chen Y (eds) Proceedings of the First International CIKM Workshop on Cloud Data Management, CloudDb 2009, Hong Kong, China, November 2, 2009, ACM, pp 9–16. doi:10.1145/1651263.1651266

  57. Mahmood AR, Aly AM, Qadah T, Rezig EK, Daghistani A, Madkour A, Abdelhamid AS, Hassan MS, Aref WG, Basalamah S (2015) Tornado: A distributed spatio-textual stream processing system. PVLDB 8(12):2020–2031. http://www.vldb.org/pvldb/vol8/p2020-mahmood.pdf

  58. Davis M (2016) JTS Topology Suite. http://tsusiatsoftware.net/jts/main.html

  59. Meehan J, Tatbul N, Zdonik S, Aslantas C, Çetintemel U, Du J, Kraska T, Madden S, Maier D, Pavlo A, Stonebraker M, Tufte K, Wang H (2015) S-Store: Streaming meets transaction processing. PVLDB 8(13):2134–2145. http://www.vldb.org/pvldb/vol8/p2134-meehan.pdf

  60. Miller J, Raymond M, Archer J, Adem S, Hansel L, Konda S, Luti M, Zhao Y, Teredesai A, Ali MH (2011) An extensibility approach for spatio-temporal stream processing using Microsoft StreamInsight. In: Pfoser D, Tao Y, Mouratidis K, Nascimento MA, Mokbel MF, Shekhar S, Huang Y (eds) SSTD, Springer, Lecture Notes in Computer Science, vol 6849, pp 496–501

  61. Mokbel MF, Xiong X, Hammad MA, Aref WG (2005) Continuous query processing of spatio-temporal data streams in PLACE. GeoInformatica 9(4):343–365

  62. Murray C (2014) Oracle Spatial and Graph Developer’s Guide. Oracle

  63. Murray DG, McSherry F, Isaacs R, Isard M, Barham P, Abadi M (2013) Naiad: a timely dataflow system. In: Kaminsky M, Dahlin M (eds) ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP ’13, Farmington, PA, USA, November 3-6, 2013, pp 439–455. ACM. doi:10.1145/2517349.2522738

  64. Nidzwetzki JK, Güting RH (2015) Distributed SECONDO: A highly available and scalable system for spatial data processing. In: Claramunt C, Schneider M, Wong RC, Xiong L, Loh W, Shahabi C, Li K (eds) Advances in Spatial and Temporal Databases - 14th International Symposium, SSTD 2015, Hong Kong, China, August 26-28, 2015. Proceedings, Springer, Lecture Notes in Computer Science, vol 9239, pp 491–496. doi:10.1007/978-3-319-22363-6_28

  65. Obe R, Hsu L, Ramsey P (2012) PostGIS in Action. Manning Publications, Greenwich, CT

  66. Oracle (2015) Oracle Fusion Middleware – Developing Applications for Oracle CQL Data Cartridges, 12c Release 1 (12.2.1). Oracle Corporation

  67. Patroumpas K, Sellis TK (2004) Managing trajectories of moving objects as data streams. In: Sander J, Nascimento MA (eds) STDBM, pp 41–48

  68. Patroumpas K, Sellis TK (2011) Maintaining consistent results of continuous queries under diverse window specifications. Inf Syst 36(1):42–61

  69. Patroumpas K, Sellis TK (2012) Event processing and real-time monitoring over streaming traffic data. In: Martino SD, Peron A, Tezuka T (eds) W2GIS, Springer, Lecture Notes in Computer Science, vol 7236, pp 116–133

  70. Qian Z, He Y, Su C, Wu Z, Zhu H, Zhang T, Zhou L, Yu Y, Zhang Z (2013) TimeStream: reliable stream computation in the cloud. In: Hanzálek Z, Härtig H, Castro M, Kaashoek MF (eds) EuroSys, ACM, pp 1–14

  71. SAP (2016) SAP HANA Data Streaming. http://help.sap.com/hana_options_sds

  72. Sarwat M (2015) Interactive and scalable exploration of big spatial data - A data management perspective. In: Jensen CS, Xie X, Zadorozhny V, Madria S, Pitoura E, Zheng B, Chow C (eds) 16th IEEE International Conference on Mobile Data Management, MDM 2015, Pittsburgh, PA, USA, June 15-18, 2015 - Volume 1, IEEE, pp 263–270. doi:10.1109/MDM.2015.67

  73. Schneider M (1997) Spatial data types for database systems, finite resolution geometry for geographic information systems, Lecture Notes in Computer Science, vol 1288. Springer, Berlin

  74. Schneider M (2009) Spatial and spatio-temporal data models and languages. In: Liu L, Özsu MT (eds) Encyclopedia of Database Systems, Springer US, pp 2681–2685. doi:10.1007/978-0-387-39940-9_360

  75. Shekhar S, Chawla S (2003) Spatial databases: a tour. Prentice Hall, Upper Saddle River, NJ

  76. Shekhar S, Xiong H (eds) (2008) Encyclopedia of GIS. Springer, Berlin

  77. Stonebraker M, Çetintemel U, Zdonik SB (2005) The 8 requirements of real-time stream processing. SIGMOD Record 34(4):42–47

  78. Tan H, Luo W, Ni LM (2012) CloST: a Hadoop-based storage system for big spatio-temporal data analytics. In: Chen X, Lebanon G, Wang H, Zaki MJ (eds) 21st ACM International Conference on Information and Knowledge Management, CIKM’12, Maui, HI, USA, October 29 - November 02, 2012, ACM, pp 2139–2143. doi:10.1145/2396761.2398589

  79. TIBCO (2016) TIBCO StreamBase. http://www.streambase.com

  80. Xiong X, Mokbel MF, Aref WG (2008) Spatio-temporal database. In: Shekhar S, Xiong H (eds) Encyclopedia of GIS. Springer, Berlin, pp 1114–1115

  81. Yu J, Wu J, Sarwat M (2015) GeoSpark: A cluster computing framework for processing large-scale spatial data. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, New York, NY, USA, GIS ’15, pp 70:1–70:4. doi:10.1145/2820783.2820860

  82. Zheng Y, Chen Y, Li Q, Xie X, Ma W (2010a) Understanding transportation modes based on GPS data for web applications. TWEB 4(1). doi:10.1145/1658373.1658374

  83. Zheng Y, Xie X, Ma W (2010b) GeoLife: A collaborative social networking service among user, location and trajectory. IEEE Data Eng Bull 33(2):32–39. http://sites.computer.org/debull/A10june/geolife.pdf

Acknowledgments

The authors would like to thank Mirta Baranović, Damir Kalpić and anonymous reviewers for their helpful and constructive comments that helped us to improve the paper.

Author information

Correspondence to Zdravko Galić.

About this article

Cite this article

Galić, Z., Mešković, E. & Osmanović, D. Distributed processing of big mobility data as spatio-temporal data streams. Geoinformatica 21, 263–291 (2017). https://doi.org/10.1007/s10707-016-0264-z

  • DOI: https://doi.org/10.1007/s10707-016-0264-z
