Spatio-Temporal Data Streams and Big Data Paradigm

Spatio-Temporal Data Streams

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

The recent rapid development of wireless communication, mobile computing, global navigation satellite systems (GNSS), and spatially enabled sensors has led to exponential growth in the spatio-temporal data that are produced continuously and at high speed. Spatio-temporal data streams, i.e. real-time, transient, time-varying sequences of spatio-temporal data items, demonstrate at least two core features of Big Data: volume and velocity. To handle the volumes of data and computation they involve, applications that process such streams need to be distributed over clusters. However, despite substantial work on cluster programming models for batch computation, there are few similarly high-level tools for stream processing. There is therefore a clear need for a highly scalable spatio-temporal stream computing framework that can operate at high data rates and process massive amounts of big spatio-temporal data streams. In this chapter we present our approach and framework for integrated big spatio-temporal data stream processing. The key concept is that streaming data and persistent data are not intrinsically different: persistent spatio-temporal data are simply streaming data that have been entered into persistent structures.
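
The key concept can be illustrated with a minimal sketch, which is not the chapter's framework. It assumes a hypothetical Observation record and a hypothetical within_box operator (neither name comes from the chapter) and shows one range query applied unchanged to a live source and to the same items after they have been stored.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Observation(NamedTuple):
    """Hypothetical spatio-temporal item: object id, event time, position."""
    object_id: str
    t: float
    x: float
    y: float


def within_box(items: Iterable[Observation],
               xmin: float, ymin: float,
               xmax: float, ymax: float) -> Iterator[Observation]:
    """Range query: lazily emit the items that fall inside the box."""
    for item in items:
        if xmin <= item.x <= xmax and ymin <= item.y <= ymax:
            yield item


def live_source() -> Iterator[Observation]:
    """Stand-in for an unbounded source; here it yields only three items."""
    yield Observation("a", 1.0, 0.5, 0.5)
    yield Observation("b", 2.0, 5.0, 5.0)
    yield Observation("a", 3.0, 0.9, 0.1)


# "Persisting" the stream is modelled as materialising the items it carried.
persisted: List[Observation] = list(live_source())

# The same operator serves both the live stream and the stored data.
from_stream = list(within_box(live_source(), 0, 0, 1, 1))
from_store = list(within_box(persisted, 0, 0, 1, 1))
```

Because the operator only requires an iterable of items, it does not need to know whether they arrive from a sensor feed or from a replay of stored data. A real deployment would replace the in-memory list with a persistent store.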

Notes

  1. Available at https://bitbucket.org/DarioOsm/mobydick. It relies on operations and functions of the Apache Flink DataStream library [6] and the JTS Topology Suite [43].

  2. The Scala programming language has no separate interface construct; instead, it provides a form of multiple inheritance through traits. For this reason, TemporalObject is implemented as a trait, which enables TemporalPoint to inherit from two types.

  3. A watermark is a special event generated at stream sources that coarsely advances event time. A watermark for time instant \(\tau\) states that event time has progressed to \(\tau\) in that particular stream, meaning that no events with a time instant smaller than \(\tau\) can arrive any more.

  4. In contrast, some cluster computing frameworks (Spark, Trill, etc.) wrap data streams into mini-batches, i.e., they collect all data that arrive within a certain period of time and run a regular batch program on the collected data.

  5. Batch processing applications run efficiently as special cases of stream processing applications.

  6. This GPS dataset was collected by 182 users over a period of more than five years, from April 2007 to August 2012. It contains approximately 20 million points with a total distance of about 1.2 million kilometers and a total duration of more than 48,000 h. The data were logged in over 30 cities in China, the USA, and Europe.

References

  1. Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., Fernández-Moctezuma, R., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., Whittle, S.: The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB 8(12), 1792–1803 (2015). http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf

  2. Alexandrov, A., Bergmann, R., Ewen, S., Freytag, J., Hueske, F., Heise, A., Kao, O., Leich, M., Leser, U., Markl, V., Naumann, F., Peters, M., Rheinländer, A., Sax, M.J., Schelter, S., Höger, M., Tzoumas, K., Warneke, D.: The Stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014). http://dx.doi.org/10.1007/s00778-014-0357-y

  3. Ali, M.H., Gerea, C., Raman, B.S., Sezgin, B., Tarnavski, T., Verona, T., Wang, P., Zabback, P., Kirilov, A., Ananthanarayan, A., Lu, M., Raizman, A., Krishnan, R., Schindlauer, R., Grabs, T., Bjeletich, S., Chandramouli, B., Goldstein, J., Bhat, S., Li, Y., Nicola, V.D., Wang, X., Maier, D., Santos, I., Nano, O., Grell, S.: Microsoft CEP server and online behavioral targeting. PVLDB 2(2), 1558–1561 (2009)

  4. Ali, M.H., Chandramouli, B., Raman, B.S., Katibah, E.: Spatio-temporal stream processing in Microsoft StreamInsight. IEEE Data Eng. Bull. 33(2), 69–74 (2010)

  5. Apache Software Foundation: Spark. http://spark.apache.org

  6. Apache Software Foundation: Apache Flink (2016). http://flink.apache.org/

  7. Apache Software Foundation: Apache Samza (2016). http://samza.apache.org

  8. Apache Software Foundation: Apache Storm (2016). http://storm.apache.org

  9. Apache Software Foundation: S4 (2016). http://incubator.apache.org/s4

  10. Apache Software Foundation: Spark Streaming (2016). http://spark.apache.org/streaming

  11. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Popa, L., Abiteboul, S., Kolaitis, P.G. (eds.) PODS. pp. 1–16. ACM (2002)

  12. Balazinska, M., Balakrishnan, H., Madden, S., Stonebraker, M.: Fault-tolerance in the Borealis distributed stream processing system. ACM Trans. Database Syst. 33(1), 3:1–3:44 (2008)

  13. Ballard, C., Brandt, O., Devaraju, B., Farrell, D., Foster, K., Howard, C., Nicholls, P., Pasricha, A., Rea, R., Schulz, N., Shimada, T., Thorson, J., Tucker, S., Uleman, R.: IBM InfoSphere Streams: Accelerating Deployments with Analytic Accelerators. IBM (2014)

  14. Biem, A., Bouillet, E., Feng, H., Ranganathan, A., Riabov, A., Verscheure, O., Koutsopoulos, H.N., Moran, C.: IBM InfoSphere Streams for scalable, real-time, intelligent transportation services. In: Elmagarmid, A.K., Agrawal, D. (eds.) Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6–10, 2010. pp. 1093–1104. ACM (2010). http://doi.acm.org/10.1145/1807167.1807291

  15. California Center for Innovative Transportation: The Mobile Millennium Project (2016). http://traffic.berkeley.edu

  16. Capriolo, E., Wampler, D., Rutherglen, J.: Programming Hive, 1st edn. O’Reilly Media, Inc., California (2012)

  17. Chandramouli, B., Goldstein, J., Barnett, M., DeLine, R., Platt, J.C., Terwilliger, J.F., Wernsing, J.: Trill: a high-performance incremental query processor for diverse analytics. PVLDB 8(4), 401–412 (2014). http://www.vldb.org/pvldb/vol8/p401-chandramouli.pdf

  18. Chandy, K.M., Lamport, L.: Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3(1), 63–75 (1985). http://doi.acm.org/10.1145/214451.214456

  19. Commonwealth Computer Research, Inc.: GeoMesa (2016). http://www.geomesa.org

  20. Condie, T., Conway, N., Alvaro, P., Hellerstein, J.M., Elmeleegy, K., Sears, R.: MapReduce online. In: Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2010, Apr 28–30, 2010, San Jose, CA, USA. pp. 313–328. USENIX Association (2010)

  21. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Brewer, E.A., Chen, P. (eds.) OSDI. pp. 137–150. USENIX Association (2004)

  22. Eldawy, A., Elganainy, M., Bakeer, A., Abdelmotaleb, A., Mokbel, M.: Sphinx: Distributed execution of interactive SQL queries on big spatial data. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. pp. 78:1–78:4. GIS ’15, ACM, New York, NY, USA (2015). http://doi.acm.org/10.1145/2820783.2820869

  23. Eldawy, A., Mokbel, M.F.: A demonstration of SpatialHadoop: an efficient MapReduce framework for spatial data. PVLDB 6(12), 1230–1233 (2013)

  24. Fox, A., Eichelberger, C., Hughes, J., Lyon, S.: Spatio-temporal indexing in non-relational distributed databases. In: Hu et al. [30], pp. 291–299. http://dx.doi.org/10.1109/BigData.2013.6691586

  25. Franklin, M.J., Krishnamurthy, S., Conway, N., Li, A., Russakovsky, A., Thombre, N.: Continuous analytics: Rethinking query processing in a network-effect world. In: CIDR (2009). www.crdrdb.org

  26. Galić, Z., Mešković, E., Križanović, K., Baranović, M.: OCEANUS: a spatio-temporal data stream system prototype. In: Proceedings of the Third ACM SIGSPATIAL International Workshop on GeoStreaming. pp. 109–115. IWGS ’12, ACM, New York, NY, USA (2012). http://doi.acm.org/10.1145/2442968.2442982

  27. Galić, Z., Baranović, M., Križanović, K., Mešković, E.: Geospatial data streams: formal framework and implementation. Data Knowl. Eng. 91, 1–16 (2014)

  28. Golab, L., Özsu, M.T.: Data Stream Management. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, San Rafael, CA (2010)

  29. Hortonworks: Magellan: Geospatial Analytics on Spark (2016). http://hortonworks.com/blog/magellan-geospatial-analytics-in-spark

  30. Hu, X., Lin, T.Y., Raghavan, V.V., Wah, B.W., Baeza-Yates, R.A., Fox, G., Shahabi, C., Smith, M., Yang, Q., Ghani, R., Fan, W., Lempel, R., Nambiar, R. (eds.): Proceedings of the 2013 IEEE International Conference on Big Data, 6–9 Oct 2013, Santa Clara, CA, USA. IEEE (2013). http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6679357

  31. Huang, Y., Zhang, C.: New data types and operations to support geo-streams. In: Cova, T.J., Miller, H.J., Beard, K., Frank, A.U., Goodchild, M.F. (eds.) GIScience. Lecture Notes in Computer Science, vol. 5266, pp. 106–118. Springer (2008)

  32. Hunter, T., Das, T., Zaharia, M., Abbeel, P., Bayen, A.M.: Large-scale estimation in cyberphysical systems using streaming data: a case study with arterial traffic estimation. IEEE Trans. Autom. Sci. Eng. 10(4), 884–898 (2013)

  33. ISO 19108:2002: Geographic information – Temporal schema (2002)

  34. ISO 19107:2003: Geographic information – Spatial schema (2003)

  35. ISO 19141:2008: Geographic information – Schema for moving features (2008)

  36. ISO/IEC 13249-3:2011: Information technology – Database languages – SQL multimedia and application packages – Part 3: Spatial (2011)

  37. Jiang, J., Bao, H., Chang, E.Y., Li, Y.: MOIST: a scalable and parallel moving object indexer with school tracking. PVLDB 5(12), 1838–1849 (2012)

  38. Kazemitabar, S.J., Kashani, F.B., McLeod, D.: Geostreaming in cloud. In: Ali, M.H., Hoel, E.G., Kashani, F.B. (eds.) Proceedings of the 2011 ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2011, Nov 1, 2011, Chicago, IL, USA. pp. 3–9. ACM (2011). http://doi.acm.org/10.1145/2064959.2064962

  39. Kornacker, M., Behm, A., Bittorf, V., Bobrovytsky, T., Ching, C., Choi, A., Erickson, J., Grund, M., Hecht, D., Jacobs, M., Joshi, I., Kuff, L., Kumar, D., Leblang, A., Li, N., Pandis, I., Robinson, H., Rorke, D., Rus, S., Russell, J., Tsirogiannis, D., Wanderman-Milne, S., Yoder, M.: Impala: a modern, open-source SQL engine for Hadoop. In: CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, Jan 4–7, 2015, Online Proceedings (2015). www.cidrdb.org

  40. Lu, J., Güting, R.H.: Parallel SECONDO: practical and efficient mobility data processing in the cloud. In: Hu et al. [30], pp. 17–25. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6679357

  41. Ma, Q., Yang, B., Qian, W., Zhou, A.: Query processing of massive trajectory data based on MapReduce. In: Meng, X., Wang, H., Chen, Y. (eds.) Proceedings of the First International CIKM Workshop on Cloud Data Management, CloudDb 2009, Hong Kong, China, Nov 2, 2009. pp. 9–16. ACM (2009). http://doi.acm.org/10.1145/1651263.1651266

  42. Mahmood, A.R., Aly, A.M., Qadah, T., Rezig, E.K., Daghistani, A., Madkour, A., Abdelhamid, A.S., Hassan, M.S., Aref, W.G., Basalamah, S.: Tornado: a distributed spatio-textual stream processing system. PVLDB 8(12), 2020–2031 (2015). http://www.vldb.org/pvldb/vol8/p2020-mahmood.pdf

  43. Martin, D.: JTS Topology Suite (2016). http://tsusiatsoftware.net/jts/main.html

  44. Meehan, J., Tatbul, N., Zdonik, S., Aslantas, C., Çetintemel, U., Du, J., Kraska, T., Madden, S., Maier, D., Pavlo, A., Stonebraker, M., Tufte, K., Wang, H.: S-store: streaming meets transaction processing. PVLDB 8(13), 2134–2145 (2015). http://www.vldb.org/pvldb/vol8/p2134-meehan.pdf

  45. Miller, J., Raymond, M., Archer, J., Adem, S., Hansel, L., Konda, S., Luti, M., Zhao, Y., Teredesai, A., Ali, M.H.: An extensibility approach for spatio-temporal stream processing using Microsoft StreamInsight. In: Pfoser, D., Tao, Y., Mouratidis, K., Nascimento, M.A., Mokbel, M.F., Shekhar, S., Huang, Y. (eds.) SSTD. Lecture Notes in Computer Science, vol. 6849, pp. 496–501. Springer (2011)

  46. Mokbel, M.F., Xiong, X., Hammad, M.A., Aref, W.G.: Continuous query processing of spatio-temporal data streams in PLACE. GeoInformatica 9(4), 343–365 (2005)

  47. Murray, D.G., McSherry, F., Isaacs, R., Isard, M., Barham, P., Abadi, M.: Naiad: a timely dataflow system. In: Kaminsky, M., Dahlin, M. (eds.) ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP ’13, Farmington, PA, USA, Nov 3–6, 2013. pp. 439–455. ACM (2013). http://dl.acm.org/citation.cfm?id=2517349

  48. Neumeyer, L., Robbins, B., Nair, A., Kesari, A.: S4: distributed stream computing platform. In: Fan, W., Hsu, W., Webb, G.I., Liu, B., Zhang, C., Gunopulos, D., Wu, X. (eds.) ICDMW 2010, The 10th IEEE International Conference on Data Mining Workshops, Sydney, Australia, 13 Dec 2010. pp. 170–177. IEEE Computer Society (2010)

  49. Nidzwetzki, J.K., Güting, R.H.: Distributed SECONDO: A highly available and scalable system for spatial data processing. In: Claramunt, C., Schneider, M., Wong, R.C., Xiong, L., Loh, W., Shahabi, C., Li, K. (eds.) Advances in Spatial and Temporal Databases - 14th International Symposium, SSTD 2015, Hong Kong, China, Aug 26–28, 2015. Proceedings. Lecture Notes in Computer Science, vol. 9239, pp. 491–496. Springer (2015). http://dx.doi.org/10.1007/978-3-319-22363-6

  50. Oracle: Oracle Fusion Middleware – Oracle CQL Language Reference for Oracle Event Processing, 12c Release (12.1.3.0). Oracle Corporation (2014)

  51. Patroumpas, K., Sellis, T.K.: Managing trajectories of moving objects as data streams. In: Sander, J., Nascimento, M.A. (eds.) STDBM. pp. 41–48 (2004)

  52. Patroumpas, K., Sellis, T.K.: Event processing and real-time monitoring over streaming traffic data. In: Martino, S.D., Peron, A., Tezuka, T. (eds.) W2GIS. Lecture Notes in Computer Science, vol. 7236, pp. 116–133. Springer (2012)

  53. Qian, Z., He, Y., Su, C., Wu, Z., Zhu, H., Zhang, T., Zhou, L., Yu, Y., Zhang, Z.: TimeStream: reliable stream computation in the cloud. In: Hanzálek, Z., Härtig, H., Castro, M., Kaashoek, M.F. (eds.) Eighth Eurosys Conference 2013, EuroSys ’13, Prague, Czech Republic, April 14-17, 2013. pp. 1–14. ACM (2013)

  54. Sarwat, M.: Interactive and scalable exploration of big spatial data - A data management perspective. In: Jensen, C.S., Xie, X., Zadorozhny, V., Madria, S., Pitoura, E., Zheng, B., Chow, C. (eds.) 16th IEEE International Conference on Mobile Data Management, MDM 2015, Pittsburgh, PA, USA, June 15–18, 2015, Vol. 1. pp. 263–270. IEEE (2015). http://dx.doi.org/10.1109/MDM.2015.67

  55. Schneider, M.: Spatial and spatio-temporal data models and languages. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, pp. 2681–2685. Springer, New York (2009)

  56. Stonebraker, M., Çetintemel, U., Zdonik, S.B.: The 8 requirements of real-time stream processing. SIGMOD Rec. 34(4), 42–47 (2005)

  57. Tan, H., Luo, W., Ni, L.M.: CloST: a Hadoop-based storage system for big spatio-temporal data analytics. In: Chen, X., Lebanon, G., Wang, H., Zaki, M.J. (eds.) 21st ACM International Conference on Information and Knowledge Management, CIKM’12, Maui, HI, USA, Oct 29–Nov 02, 2012. pp. 2139–2143. ACM (2012). http://doi.acm.org/10.1145/2396761.2398589

  58. White, T.: Hadoop: The Definitive Guide, 2nd edn. O’Reilly Media, Inc., California (2012)

  59. Yu, J., Wu, J., Sarwat, M.: GeoSpark: a cluster computing framework for processing large-scale spatial data. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 70:1–70:4. GIS ’15, ACM, New York, NY, USA (2015). http://doi.acm.org/10.1145/2820783.2820860

  60. Zhang, C., Huang, Y., Griffin, T.: Querying geospatial data streams in SECONDO. In: Agrawal, D., Aref, W.G., Lu, C.T., Mokbel, M.F., Scheuermann, P., Shahabi, C., Wolfson, O. (eds.) GIS. pp. 544–545. ACM (2009)

  61. Zheng, Y., Chen, Y., Li, Q., Xie, X., Ma, W.: Understanding transportation modes based on GPS data for web applications. TWEB 4(1) (2010). http://doi.acm.org/10.1145/1658373.1658374

  62. Zheng, Y., Xie, X., Ma, W.: GeoLife: a collaborative social networking service among user, location and trajectory. IEEE Data Eng. Bull. 33(2), 32–39 (2010). http://sites.computer.org/debull/A10june/geolife.pdf

Author information

Corresponding author

Correspondence to Zdravko Galić.

Copyright information

© 2016 The Author(s)

About this chapter

Cite this chapter

Galić, Z. (2016). Spatio-Temporal Data Streams and Big Data Paradigm. In: Spatio-Temporal Data Streams. SpringerBriefs in Computer Science. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-6575-5_3

  • DOI: https://doi.org/10.1007/978-1-4939-6575-5_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-6573-1

  • Online ISBN: 978-1-4939-6575-5

  • eBook Packages: Computer Science (R0)
