Abstract
Recent rapid development of wireless communication, mobile computing, global navigation satellite systems (GNSS), and spatially enabled sensors is leading to exponential growth in the spatio-temporal data that are produced continuously at high speed. Spatio-temporal data streams, i.e., real-time, transient, time-varying sequences of spatio-temporal data items, demonstrate at least two core features of Big Data: volume and velocity. To handle the volumes of data and computation they involve, applications that process such streams need to be distributed over clusters. However, despite substantial work on cluster programming models for batch computation, there are few similarly high-level tools for stream processing. There is therefore a clear need for a highly scalable spatio-temporal stream computing framework that can operate at high data rates and process massive amounts of big spatio-temporal data streams. In this chapter we present our approach and framework for integrated big spatio-temporal data stream processing. The key concept is that streaming data and persistent data are not intrinsically different: persistent spatio-temporal data is simply streaming data that has been entered into persistent structures.
Notes
1. Available at https://bitbucket.org/DarioOsm/mobydick. It relies on operations and functions of the Apache Flink DataStream library [6] and the JTS Topology Suite [43].
2. Scala has no separate interface construct; traits fill that role and allow a class to mix in several types, which gives a form of multiple inheritance. For this reason TemporalObject is implemented as a trait, which enables TemporalPoint to inherit from two types.
3. A watermark is a special event generated at stream sources that coarsely advances event time. A watermark for time instant \(\tau\) states that event time has progressed to \(\tau\) in that particular stream, meaning that no events with a time instant smaller than \(\tau\) can arrive any more.
4. In contrast, some cluster computing frameworks (Spark, Trill, etc.) wrap data streams into mini-batches, i.e., they collect all data that arrive within a certain period of time and run a regular batch program on the collected data.
5. Batch processing applications run efficiently as special cases of stream processing applications.
6. This GPS dataset was collected by 182 users over a period of more than five years, from April 2007 to August 2012. It contains approximately 20 million points with a total distance of about 1.2 million kilometers and a total duration of more than 48,000 h. The data were logged in over 30 cities in China, the USA, and Europe.
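The watermark semantics described in note 3 can be illustrated with a small, framework-independent sketch. It is not Flink's implementation. The names `Event`, `Watermark`, `with_watermarks`, and `split_late` are hypothetical, and the rule used to generate watermarks (largest timestamp seen so far minus a fixed allowance `max_delay`) is an assumption chosen only to keep the example short.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple, Union


@dataclass(frozen=True)
class Event:
    timestamp: int  # event time, in arbitrary integer units
    payload: str


@dataclass(frozen=True)
class Watermark:
    timestamp: int  # claims that no later event carries a smaller timestamp


def with_watermarks(events: Iterable[Event],
                    max_delay: int) -> Iterator[Union[Event, Watermark]]:
    """Interleave watermarks with events (hypothetical helper).

    Assumption: events are at most `max_delay` time units out of order, so
    after seeing timestamp t the source declares that event time has
    progressed to t - max_delay.
    """
    current = None  # timestamp of the last watermark emitted
    for event in events:
        yield event
        candidate = event.timestamp - max_delay
        # A watermark only ever advances event time; it never moves back.
        if current is None or candidate > current:
            current = candidate
            yield Watermark(current)


def split_late(stream: Iterable[Union[Event, Watermark]]
               ) -> Tuple[List[Event], List[Event]]:
    """Separate events that respect the latest watermark from late ones."""
    on_time: List[Event] = []
    late: List[Event] = []
    watermark = None
    for item in stream:
        if isinstance(item, Watermark):
            watermark = item.timestamp
        elif watermark is not None and item.timestamp < watermark:
            late.append(item)  # violates the watermark's claim
        else:
            on_time.append(item)
    return on_time, late
```

For events with timestamps 1, 5, 3, 9, 2 and `max_delay = 2`, the event with timestamp 2 arrives after the watermark for 7 has been emitted, so `split_late` classifies it as late; a real system would then decide whether to drop it or handle it separately.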
References
Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., Fernández-Moctezuma, R., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., Whittle, S.: The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB 8(12), 1792–1803 (2015). http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
Alexandrov, A., Bergmann, R., Ewen, S., Freytag, J., Hueske, F., Heise, A., Kao, O., Leich, M., Leser, U., Markl, V., Naumann, F., Peters, M., Rheinländer, A., Sax, M.J., Schelter, S., Höger, M., Tzoumas, K., Warneke, D.: The Stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014). http://dx.doi.org/10.1007/s00778-014-0357-y
Ali, M.H., Gerea, C., Raman, B.S., Sezgin, B., Tarnavski, T., Verona, T., Wang, P., Zabback, P., Kirilov, A., Ananthanarayan, A., Lu, M., Raizman, A., Krishnan, R., Schindlauer, R., Grabs, T., Bjeletich, S., Chandramouli, B., Goldstein, J., Bhat, S., Li, Y., Nicola, V.D., Wang, X., Maier, D., Santos, I., Nano, O., Grell, S.: Microsoft CEP server and online behavioral targeting. PVLDB 2(2), 1558–1561 (2009)
Ali, M.H., Chandramouli, B., Raman, B.S., Katibah, E.: Spatio-temporal stream processing in Microsoft StreamInsight. IEEE Data Eng. Bull. 33(2), 69–74 (2010)
Apache Software Foundation: Spark. http://spark.apache.org
Apache Software Foundation: Apache Flink (2016). http://flink.apache.org/
Apache Software Foundation: Apache Samza (2016). http://samza.apache.org
Apache Software Foundation: Apache Storm (2016). http://storm.apache.org
Apache Software Foundation: S4 (2016). http://incubator.apache.org/s4
Apache Software Foundation: Spark Streaming (2016). http://spark.apache.org/streaming
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Popa, L., Abiteboul, S., Kolaitis, P.G. (eds.) PODS. pp. 1–16. ACM (2002)
Balazinska, M., Balakrishnan, H., Madden, S., Stonebraker, M.: Fault-tolerance in the Borealis distributed stream processing system. ACM Trans. Database Syst. 33(1), 3:1–3:44 (2008)
Ballard, C., Brandt, O., Devaraju, B., Farrell, D., Foster, K., Howard, C., Nicholls, P., Pasricha, A., Rea, R., Schulz, N., Shimada, T., Thorson, J., Tucker, S., Uleman, R.: IBM InfoSphere Streams: Accelerating Deployments with Analytic Accelerators. IBM (2014)
Biem, A., Bouillet, E., Feng, H., Ranganathan, A., Riabov, A., Verscheure, O., Koutsopoulos, H.N., Moran, C.: IBM InfoSphere Streams for scalable, real-time, intelligent transportation services. In: Elmagarmid, A.K., Agrawal, D. (eds.) Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6–10, 2010. pp. 1093–1104. ACM (2010). http://doi.acm.org/10.1145/1807167.1807291
California Center for Innovative Transportation: The Mobile Millennium Project (2016). http://traffic.berkeley.edu
Capriolo, E., Wampler, D., Rutherglen, J.: Programming Hive, 1st edn. O’Reilly Media, Inc., California (2012)
Chandramouli, B., Goldstein, J., Barnett, M., DeLine, R., Platt, J.C., Terwilliger, J.F., Wernsing, J.: Trill: a high-performance incremental query processor for diverse analytics. PVLDB 8(4), 401–412 (2014). http://www.vldb.org/pvldb/vol8/p401-chandramouli.pdf
Chandy, K.M., Lamport, L.: Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3(1), 63–75 (1985). http://doi.acm.org/10.1145/214451.214456
Commonwealth Computer Research, Inc.: GeoMesa (2016). http://www.geomesa.org
Condie, T., Conway, N., Alvaro, P., Hellerstein, J.M., Elmeleegy, K., Sears, R.: MapReduce online. In: Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2010, Apr 28–30, 2010, San Jose, CA, USA. pp. 313–328. USENIX Association (2010)
Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Brewer, E.A., Chen, P. (eds.) OSDI. pp. 137–150. USENIX Association (2004)
Eldawy, A., Elganainy, M., Bakeer, A., Abdelmotaleb, A., Mokbel, M.: Sphinx: distributed execution of interactive SQL queries on big spatial data. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. pp. 78:1–78:4. GIS ’15, ACM, New York, NY, USA (2015). http://doi.acm.org/10.1145/2820783.2820869
Eldawy, A., Mokbel, M.F.: A demonstration of SpatialHadoop: an efficient MapReduce framework for spatial data. PVLDB 6(12), 1230–1233 (2013)
Fox, A., Eichelberger, C., Hughes, J., Lyon, S.: Spatio-temporal indexing in non-relational distributed databases. In: Hu et al. [30], pp. 291–299. http://dx.doi.org/10.1109/BigData.2013.6691586
Franklin, M.J., Krishnamurthy, S., Conway, N., Li, A., Russakovsky, A., Thombre, N.: Continuous analytics: rethinking query processing in a network-effect world. In: CIDR (2009). www.cidrdb.org
Galić, Z., Mešković, E., Križanović, K., Baranović, M.: OCEANUS: a spatio-temporal data stream system prototype. In: Proceedings of the Third ACM SIGSPATIAL International Workshop on GeoStreaming. pp. 109–115. IWGS ’12, ACM, New York, NY, USA (2012). http://doi.acm.org/10.1145/2442968.2442982
Galić, Z., Baranović, M., Križanović, K., Mešković, E.: Geospatial data streams: formal framework and implementation. Data Knowl. Eng. 91, 1–16 (2014)
Golab, L., Özsu, M.T.: Data Stream Management. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, San Rafael, CA (2010)
Hortonworks: Magellan: Geospatial Analytics on Spark (2016). http://hortonworks.com/blog/magellan-geospatial-analytics-in-spark
Hu, X., Lin, T.Y., Raghavan, V.V., Wah, B.W., Baeza-Yates, R.A., Fox, G., Shahabi, C., Smith, M., Yang, Q., Ghani, R., Fan, W., Lempel, R., Nambiar, R. (eds.): Proceedings of the 2013 IEEE International Conference on Big Data, 6–9 Oct 2013, Santa Clara, CA, USA. IEEE (2013). http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6679357
Huang, Y., Zhang, C.: New data types and operations to support geo-streams. In: Cova, T.J., Miller, H.J., Beard, K., Frank, A.U., Goodchild, M.F. (eds.) GIScience. Lecture Notes in Computer Science, vol. 5266, pp. 106–118. Springer (2008)
Hunter, T., Das, T., Zaharia, M., Abbeel, P., Bayen, A.M.: Large-scale estimation in cyberphysical systems using streaming data: a case study with arterial traffic estimation. IEEE Trans. Autom. Sci. Eng. 10(4), 884–898 (2013)
ISO 19108:2002: Geographic information – Temporal schema (2002)
ISO 19107:2003: Geographic information – Spatial schema (2003)
ISO 19141:2008: Geographic information – Schema for moving features (2008)
ISO/IEC 13249-3:2011: Information technology – Database languages – SQL multimedia and application packages – Part 3: Spatial (2011)
Jiang, J., Bao, H., Chang, E.Y., Li, Y.: MOIST: a scalable and parallel moving object indexer with school tracking. PVLDB 5(12), 1838–1849 (2012)
Kazemitabar, S.J., Kashani, F.B., McLeod, D.: Geostreaming in cloud. In: Ali, M.H., Hoel, E.G., Kashani, F.B. (eds.) Proceedings of the 2011 ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2011, Nov 1, 2011, Chicago, IL, USA. pp. 3–9. ACM (2011). http://doi.acm.org/10.1145/2064959.2064962
Kornacker, M., Behm, A., Bittorf, V., Bobrovytsky, T., Ching, C., Choi, A., Erickson, J., Grund, M., Hecht, D., Jacobs, M., Joshi, I., Kuff, L., Kumar, D., Leblang, A., Li, N., Pandis, I., Robinson, H., Rorke, D., Rus, S., Russell, J., Tsirogiannis, D., Wanderman-Milne, S., Yoder, M.: Impala: a modern, open-source SQL engine for Hadoop. In: CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, Jan 4–7, 2015, Online Proceedings (2015). www.cidrdb.org
Lu, J., Güting, R.H.: Parallel SECONDO: practical and efficient mobility data processing in the cloud. In: Hu et al. [30], pp. 17–25. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6679357
Ma, Q., Yang, B., Qian, W., Zhou, A.: Query processing of massive trajectory data based on MapReduce. In: Meng, X., Wang, H., Chen, Y. (eds.) Proceedings of the First International CIKM Workshop on Cloud Data Management, CloudDb 2009, Hong Kong, China, Nov 2, 2009. pp. 9–16. ACM (2009). http://doi.acm.org/10.1145/1651263.1651266
Mahmood, A.R., Aly, A.M., Qadah, T., Rezig, E.K., Daghistani, A., Madkour, A., Abdelhamid, A.S., Hassan, M.S., Aref, W.G., Basalamah, S.: Tornado: a distributed spatio-textual stream processing system. PVLDB 8(12), 2020–2031 (2015). http://www.vldb.org/pvldb/vol8/p2020-mahmood.pdf
Martin, D.: JTS Topology Suite (2016). http://tsusiatsoftware.net/jts/main.html
Meehan, J., Tatbul, N., Zdonik, S., Aslantas, C., Çetintemel, U., Du, J., Kraska, T., Madden, S., Maier, D., Pavlo, A., Stonebraker, M., Tufte, K., Wang, H.: S-store: streaming meets transaction processing. PVLDB 8(13), 2134–2145 (2015). http://www.vldb.org/pvldb/vol8/p2134-meehan.pdf
Miller, J., Raymond, M., Archer, J., Adem, S., Hansel, L., Konda, S., Luti, M., Zhao, Y., Teredesai, A., Ali, M.H.: An extensibility approach for spatio-temporal stream processing using Microsoft StreamInsight. In: Pfoser, D., Tao, Y., Mouratidis, K., Nascimento, M.A., Mokbel, M.F., Shekhar, S., Huang, Y. (eds.) SSTD. Lecture Notes in Computer Science, vol. 6849, pp. 496–501. Springer (2011)
Mokbel, M.F., Xiong, X., Hammad, M.A., Aref, W.G.: Continuous query processing of spatio-temporal data streams in PLACE. GeoInformatica 9(4), 343–365 (2005)
Murray, D.G., McSherry, F., Isaacs, R., Isard, M., Barham, P., Abadi, M.: Naiad: a timely dataflow system. In: Kaminsky, M., Dahlin, M. (eds.) ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP ’13, Farmington, PA, USA, Nov 3–6, 2013. pp. 439–455. ACM (2013). http://dl.acm.org/citation.cfm?id=2517349
Neumeyer, L., Robbins, B., Nair, A., Kesari, A.: S4: distributed stream computing platform. In: Fan, W., Hsu, W., Webb, G.I., Liu, B., Zhang, C., Gunopulos, D., Wu, X. (eds.) ICDMW 2010, The 10th IEEE International Conference on Data Mining Workshops, Sydney, Australia, 13 Dec 2010. pp. 170–177. IEEE Computer Society (2010)
Nidzwetzki, J.K., Güting, R.H.: Distributed SECONDO: A highly available and scalable system for spatial data processing. In: Claramunt, C., Schneider, M., Wong, R.C., Xiong, L., Loh, W., Shahabi, C., Li, K. (eds.) Advances in Spatial and Temporal Databases - 14th International Symposium, SSTD 2015, Hong Kong, China, Aug 26–28, 2015. Proceedings. Lecture Notes in Computer Science, vol. 9239, pp. 491–496. Springer (2015). http://dx.doi.org/10.1007/978-3-319-22363-6
Oracle: Oracle Fusion Middleware – Oracle CQL Language Reference for Oracle Event Processing, 12c Release (12.1.3.0). Oracle Corporation (2014)
Patroumpas, K., Sellis, T.K.: Managing trajectories of moving objects as data streams. In: Sander, J., Nascimento, M.A. (eds.) STDBM. pp. 41–48 (2004)
Patroumpas, K., Sellis, T.K.: Event processing and real-time monitoring over streaming traffic data. In: Martino, S.D., Peron, A., Tezuka, T. (eds.) W2GIS. Lecture Notes in Computer Science, vol. 7236, pp. 116–133. Springer (2012)
Qian, Z., He, Y., Su, C., Wu, Z., Zhu, H., Zhang, T., Zhou, L., Yu, Y., Zhang, Z.: TimeStream: reliable stream computation in the cloud. In: Hanzálek, Z., Härtig, H., Castro, M., Kaashoek, M.F. (eds.) Eighth Eurosys Conference 2013, EuroSys ’13, Prague, Czech Republic, April 14–17, 2013. pp. 1–14. ACM (2013)
Sarwat, M.: Interactive and scalable exploration of big spatial data: a data management perspective. In: Jensen, C.S., Xie, X., Zadorozhny, V., Madria, S., Pitoura, E., Zheng, B., Chow, C. (eds.) 16th IEEE International Conference on Mobile Data Management, MDM 2015, Pittsburgh, PA, USA, June 15–18, 2015, vol. 1. pp. 263–270. IEEE (2015). http://dx.doi.org/10.1109/MDM.2015.67
Schneider, M.: Spatial and spatio-temporal data models and languages. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, pp. 2681–2685. Springer, New York (2009)
Stonebraker, M., Çetintemel, U., Zdonik, S.B.: The 8 requirements of real-time stream processing. SIGMOD Rec. 34(4), 42–47 (2005)
Tan, H., Luo, W., Ni, L.M.: CloST: a Hadoop-based storage system for big spatio-temporal data analytics. In: Chen, X., Lebanon, G., Wang, H., Zaki, M.J. (eds.) 21st ACM International Conference on Information and Knowledge Management, CIKM’12, Maui, HI, USA, Oct 29–Nov 02, 2012. pp. 2139–2143. ACM (2012). http://doi.acm.org/10.1145/2396761.2398589
White, T.: Hadoop: The Definitive Guide, 2nd edn. O’Reilly Media, Inc., California (2012)
Yu, J., Wu, J., Sarwat, M.: GeoSpark: a cluster computing framework for processing large-scale spatial data. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 70:1–70:4. GIS ’15, ACM, New York, NY, USA (2015). http://doi.acm.org/10.1145/2820783.2820860
Zhang, C., Huang, Y., Griffin, T.: Querying geospatial data streams in SECONDO. In: Agrawal, D., Aref, W.G., Lu, C.T., Mokbel, M.F., Scheuermann, P., Shahabi, C., Wolfson, O. (eds.) GIS. pp. 544–545. ACM (2009)
Zheng, Y., Chen, Y., Li, Q., Xie, X., Ma, W.: Understanding transportation modes based on GPS data for web applications. TWEB 4(1) (2010). http://doi.acm.org/10.1145/1658373.1658374
Zheng, Y., Xie, X., Ma, W.: GeoLife: a collaborative social networking service among user, location and trajectory. IEEE Data Eng. Bull. 33(2), 32–39 (2010). http://sites.computer.org/debull/A10june/geolife.pdf
© 2016 The Author(s)
About this chapter
Cite this chapter
Galić, Z. (2016). Spatio-Temporal Data Streams and Big Data Paradigm. In: Spatio-Temporal Data Streams. SpringerBriefs in Computer Science. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-6575-5_3
DOI: https://doi.org/10.1007/978-1-4939-6575-5_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-6573-1
Online ISBN: 978-1-4939-6575-5
eBook Packages: Computer Science, Computer Science (R0)