Encyclopedia of GIS

2017 Edition
Editors: Shashi Shekhar, Hui Xiong, Xun Zhou

Data Stream Systems, Empowering with Spatiotemporal Capabilities

  • Mohamed Ali
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-17885-1_1589

Definition

Spatiotemporal data streaming (or geostreaming) refers to the acquisition, processing, and analysis of stream data that has geographical locations and/or spatial extents such as point coordinates, lines, or polygons.

Real-time acquisition of stream data through sensors and probes is widely used in numerous applications. Hence, integrating spatial operators into commercial data-streaming engines has gained considerable interest in recent years. In this entry, we consider Microsoft StreamInsight (StreamInsight, for brevity) as our industrial case study. We highlight the background behind its temporal model and discuss the various efforts that extend this temporal model to the spatial domain.

Historical Background

Spatial queries and operations are common and essential for a variety of location-aware applications, e.g., finding the gas stations near a driver’s location. During the last decade, accommodating spatial queries, e.g., the K Nearest Neighbor (KNN) query, the Reverse Nearest Neighbor (RNN) query, and the range query, in data stream processing engines has attracted the interest of database researchers. Supporting spatiotemporal features in a Data Stream Management System (DSMS) requires the system to be equipped with special indices, e.g., the R-tree, and operators, e.g., intersect or overlap. DSMSs consolidate streams of data from multiple sources with different formats and types (including spatial types) and evaluate the issued queries with low response times. Consequently, the data stream query processor extracts interesting patterns and trends from the incoming spatial and nonspatial data in real time (Abadi et al. 2005; Chandrasekaran et al. 2003; Cranor et al. 2003; StreamBase Inc.).

Scientific Fundamentals

Fundamentally, a Data Stream Management System gives its connected applications the ability to issue continuous queries that digest and evaluate streams of data in real time (Ali et al. 2009; Chandramouli et al. 2009; Barga et al. 2007). Moreover, a streaming engine is expected to include an extensibility mechanism that smoothly combines domain-specific rules and policies into the query pipeline. Here, we consider Microsoft StreamInsight as an example data streaming engine. StreamInsight has been designed as an extensible system that can incorporate user-defined modules and functions and execute them as part of the continuous query processing plan (Ali et al. 2011). Furthermore, streaming applications and systems require the continuous query processing engine to digest input data that arrives at high rates and with incomplete and/or inaccurate values.

To these ends, StreamInsight is engineered to handle imperfections in event delivery and to assure the consistency of the final results it returns. Consistency here can be interpreted as a set of tests that confirm the correctness of the generated answers before they are delivered to the query issuer. Consistency also means that obsolete and missed tuples should not significantly affect the validity of the output.

To guarantee efficiency and consistency when dealing with spatial data, StreamInsight is extended with the Microsoft SQL Server Spatial Library (SQL Spatial Library, for brevity), which provides a simple, easy-to-use, scalable, and highly efficient execution environment for spatial data analysis and processing. The SQL Spatial Library provides data type support for point, line, and polygon objects, together with various methods that operate on these spatial data types. The library adheres to the Open Geospatial Consortium Simple Feature Access specification (Open Geospatial Consortium) and is provided as part of the SQL Server Types Library.

The following brief explanation of terms, features, and components is crucial for understanding the event stream model in Microsoft StreamInsight. For more details, the reader is referred to Barga et al. (2007). A physical stream is a sequence of events. An event e_i = 〈p, c〉 is a notification from the outside world that contains (1) a payload p = 〈p_1, …, p_k〉 and (2) a control parameter c that provides metadata. The control parameter includes an event generation time and a duration that indicate the period of time over which an event can influence output. We capture this temporal information by defining c = 〈LE, RE〉, where the interval [LE, RE) specifies the period (or lifetime) over which the event contributes to output. The left endpoint (LE) of this interval, also called start time, is the application time of event generation. The event start time is also called the event timestamp. Assuming the event lasts for x time units, the right endpoint of an event, also called end time, is simply RE = LE + x.
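As an illustration of the event model above, the following is a minimal Python sketch, not StreamInsight code. The names Event, make_event, and is_alive_at are hypothetical and introduced only for this example; the sketch simply encodes the half-open lifetime [LE, RE) and the relation RE = LE + x.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Event:
    """An event <p, c> whose control parameter is c = <LE, RE>."""
    payload: Tuple   # p = <p_1, ..., p_k>
    le: float        # left endpoint: start time (application time)
    re: float        # right endpoint: end time, excluded from the lifetime

    def is_alive_at(self, t: float) -> bool:
        # The lifetime is the half-open interval [LE, RE).
        return self.le <= t < self.re


def make_event(payload: Tuple, start: float, duration: float) -> Event:
    # RE = LE + x for an event that lasts x time units.
    return Event(payload, start, start + duration)


e = make_event(("sensor-7", 21.5), start=1, duration=4)
print(e.le, e.re)          # 1 5
print(e.is_alive_at(1))    # True: LE is included
print(e.is_alive_at(5))    # False: RE is excluded
```

Because the interval is half-open, two events with lifetimes [1, 5) and [5, 9) never contribute to the output at the same instant.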

StreamInsight allows users to issue compensations (or corrections) for earlier reported events through retractions (Barga et al. 2007; Motwani et al. 2003; Ryvkina et al. 2006). A retraction modifies the lifetime of an earlier event. It is supported by an optional third control parameter, RE_new, that indicates the new right endpoint of the corresponding event. Event deletion (called a full retraction) is expressed by setting RE_new = LE (i.e., zero lifetime).

A Canonical History Table (CHT) is the logical representation of a stream. Each entry in a CHT consists of a lifetime (LE and RE) and the payload. All times are application times, as opposed to system times. Thus, StreamInsight models a data stream as a time-varying relation, motivated by early work on temporal databases by Jensen and Snodgrass (1992).

Table 1 shows an example CHT. This CHT can be derived from the actual physical events (either new inserts or retractions) with control parameter c = 〈LE, RE, RE_new〉. For example, Table 2 shows one possible physical stream with an associated logical CHT shown in Table 1. Note that a retraction event includes the new right endpoint of the modified event. The CHT (Table 1) is derived by matching each retraction in the physical stream (Table 2) with its corresponding insertion and adjusting RE of the event accordingly.
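The matching of retractions to insertions can be sketched in a few lines of Python. This is an illustrative sketch rather than StreamInsight's implementation: PhysicalEvent and derive_cht are hypothetical names, and the sample stream assumes that e0 is first inserted with an open-ended lifetime (RE = ∞) before two retractions shorten it.

```python
import math
from typing import Dict, List, NamedTuple, Optional, Tuple


class PhysicalEvent(NamedTuple):
    event_id: str
    kind: str                  # "insertion" or "retraction"
    le: float
    re: float
    re_new: Optional[float]    # only set for retractions
    payload: str


def derive_cht(stream: List[PhysicalEvent]) -> List[Tuple[str, float, float, str]]:
    """Fold insertions and retractions into CHT rows (ID, LE, RE, payload)."""
    rows: Dict[str, list] = {}   # dicts keep insertion order
    for ev in stream:
        if ev.kind == "insertion":
            rows[ev.event_id] = [ev.le, ev.re, ev.payload]
        elif ev.kind == "retraction":
            row = rows[ev.event_id]
            # A retraction must refer to the lifetime currently on record.
            if (row[0], row[1]) != (ev.le, ev.re):
                raise ValueError(f"retraction does not match {ev.event_id}")
            row[1] = ev.re_new
            if ev.re_new == ev.le:   # full retraction: zero lifetime
                del rows[ev.event_id]
        else:
            raise ValueError(f"unknown event type {ev.kind!r}")
    return [(eid, le, re, p) for eid, (le, re, p) in rows.items()]


# Assumed physical stream: e0 starts open-ended, then is cut to 10, then to 5.
physical_stream = [
    PhysicalEvent("e0", "insertion", 1, math.inf, None, "P1"),
    PhysicalEvent("e0", "retraction", 1, math.inf, 10, "P1"),
    PhysicalEvent("e0", "retraction", 1, 10, 5, "P1"),
    PhysicalEvent("e1", "insertion", 4, 9, None, "P2"),
]
cht = derive_cht(physical_stream)
print(cht)   # [('e0', 1, 5, 'P1'), ('e1', 4, 9, 'P2')]
```

The result has one row per surviving event, with e0 ending at 5 and e1 ending at 9. A full retraction removes the row altogether.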

Table 1  Canonical History Table

ID    LE    RE    Payload
e0    1     5     P1
e1    4     9     P2

Table 2  Physical stream corresponding to the CHT

ID    Type          LE    RE    RE_new    Payload
e0    Insertion     1     ∞     -         P1
e0    Retraction    1     ∞     10        P1
e0    Retraction    1     10    5         P1
e1    Insertion     4     9     -         P2

We need to ensure that an event is not arbitrarily out of order; this is realized using time-based punctuations (Barga et al. 2007; Srivastava and Widom 2004; Tucker et al. 2003). A time-based punctuation is a special event that is used to indicate time progress. These punctuations are called Current Time Increments (CTIs) in StreamInsight. A CTI is associated with a timestamp t and indicates that there will be no future event in the stream that modifies any part of the time axis that is earlier than t. Note that we could still see retractions for events with LE less than t, as long as both RE and RE_new are greater than or equal to t.
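The CTI rule can be expressed as a small admission check. The sketch below is one possible reading of the rule, not StreamInsight's API: PunctuationGuard and CtiViolation are hypothetical names, and the check assumes that an insertion affects [LE, RE) while a retraction affects only the span between RE and RE_new.

```python
from typing import Optional


class CtiViolation(Exception):
    """Raised when an event would modify time earlier than the latest CTI."""


class PunctuationGuard:
    def __init__(self) -> None:
        self.cti = float("-inf")   # no punctuation seen yet

    def on_cti(self, t: float) -> None:
        # Time only moves forward; an older CTI carries no new information.
        self.cti = max(self.cti, t)

    def check(self, le: float, re: float, re_new: Optional[float] = None) -> None:
        if re_new is None:
            # Insertion: it affects [LE, RE), so LE must not precede the CTI.
            ok = le >= self.cti
        else:
            # Retraction: LE may precede the CTI, but both RE and RE_new
            # must be at or after it.
            ok = re >= self.cti and re_new >= self.cti
        if not ok:
            raise CtiViolation(f"event touches time before CTI {self.cti}")


guard = PunctuationGuard()
guard.on_cti(5)
guard.check(le=6, re=9)                # accepted: insertion after the CTI
guard.check(le=1, re=10, re_new=5)     # accepted: LE < 5 but RE, RE_new >= 5
try:
    guard.check(le=1, re=10, re_new=4)
except CtiViolation as err:
    print("rejected:", err)
```

The last call is rejected because shrinking the lifetime to 4 would change the part of the time axis between 4 and 5, which the CTI at 5 has already declared final.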

There are two approaches to spatiotemporal stream processing within StreamInsight: an extensibility approach and a native support approach. The extensibility approach combines the strengths of the StreamInsight extensibility framework and the SQL Spatial Library by giving the writers of user-defined modules (UDMs) the ability to invoke the library methods within their code. Alternatively, the native support approach treats spatial attributes as first-class citizens, reasons about the spatial properties of incoming events, and, more interestingly, provides consistency guarantees over space as well as time. For details on these two approaches, the reader is referred to Ali et al. (2010) and Jeremiah et al. (2011).

Key Applications

Spatiotemporal stream engines such as Microsoft StreamInsight are beneficial in many real applications and systems. Here we give two brief examples of these applications.

Traffic Management Systems

In a traffic management scenario, the system answers queries about past, current, and future road conditions. Further, it suggests the best driving directions for newly added vehicles by taking future road conditions into consideration. Note that as long as the vehicle is on track, i.e., following the route planned by the system at the expected speed, there is no need for the vehicle to transmit any events to the system, which reduces the transmission load over the wireless network. However, if the vehicle changes its route selection policy, makes an unexpected turn, or stops for some time, the vehicle generates retraction and insertion events to adjust its path. In response to the retraction event, the system updates the results of its continuous queries (CQs) and possibly generates compensation events or new speculative output. Further, we could define a spatiotemporal algebra with new streaming operators that natively take location into consideration; for example, we may add a spatiotemporal left-semi-join operator that accepts a proximity metric and outputs events related to the left input object only when it overlaps in time as well as space (within the proximity metric) with a matching object on the right input. For a detailed discussion on this application scenario and a streaming approach to the solution, the reader is referred to Ali et al. (2010) and Kazemitabar et al. (2010).
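The proposed spatiotemporal left-semi-join can be illustrated with a batch-style Python sketch. It is not a StreamInsight operator: LocatedEvent and left_semi_join are hypothetical names, Euclidean distance stands in for the proximity metric, and the sketch returns whole left events, whereas a temporal operator would likely restrict output to the overlapping part of the lifetime and work incrementally.

```python
import math
from typing import List, NamedTuple, Sequence


class LocatedEvent(NamedTuple):
    obj_id: str
    le: float   # lifetime is the half-open interval [le, re)
    re: float
    x: float
    y: float


def overlaps_in_time(a: LocatedEvent, b: LocatedEvent) -> bool:
    # Two half-open intervals overlap when each starts before the other ends.
    return a.le < b.re and b.le < a.re


def within(a: LocatedEvent, b: LocatedEvent, radius: float) -> bool:
    # Assumed proximity metric: Euclidean distance in the plane.
    return math.hypot(a.x - b.x, a.y - b.y) <= radius


def left_semi_join(left: Sequence[LocatedEvent],
                   right: Sequence[LocatedEvent],
                   radius: float) -> List[LocatedEvent]:
    """Keep each left event that has at least one right match in time and space."""
    return [
        lhs for lhs in left
        if any(overlaps_in_time(lhs, rhs) and within(lhs, rhs, radius)
               for rhs in right)
    ]


vehicles = [
    LocatedEvent("car1", 0, 10, 0.0, 0.0),     # near the incident, same time
    LocatedEvent("car2", 0, 10, 50.0, 50.0),   # same time, too far away
    LocatedEvent("car3", 20, 30, 1.0, 1.0),    # near, but after the incident
]
incidents = [LocatedEvent("incident", 5, 15, 3.0, 3.0)]

matched = left_semi_join(vehicles, incidents, radius=5.0)
print([ev.obj_id for ev in matched])   # ['car1']
```

Only car1 satisfies both conditions; car2 fails the spatial test and car3 fails the temporal one.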

Criminal Activity Tracking and Monitoring Systems

Court orders may require supervising agencies to track and monitor a specific set of offenders using ankle bracelets. According to the decision of the criminal justice system, each offender with a tracking device is assigned a designated spatiotemporal curfew. This curfew typically consists of confinement zones in which the offender must remain and a set of restricted zones from which the offender must stay away.

For example, an offender may be required to stay home at night during a court-ordered curfew. Also, a sex offender would be restricted from visiting school zones. Offenders are free to move around without the monitoring agencies being alerted as long as they remain within the designated confinement regions and do not enter restricted zones. A spatiotemporal DSMS helps (1) detect unauthorized activities in real time and provide alerts to a community corrections officer, a law enforcement dispatcher, or a control center and (2) mine the offenders’ movements for suspicious behavior and predict probable future threats beforehand. Unauthorized activities include entering geographically defined protected regions (e.g., school zones) in which the offender is not allowed to be present. Suspicious behaviors include offenders meeting each other on a regular basis, possibly near a restricted zone. For a detailed discussion on this application scenario and a streaming approach to the solution, the reader is referred to Daubal et al. (2013).
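The real-time detection step can be illustrated with a simple geofence check applied to each location update. This is a hedged sketch and not the method of Daubal et al. (2013): the function names, the polygon coordinates, and the curfew hours are all made up for the example, and zones are assumed to be simple polygons in a planar coordinate system.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def contains(polygon: Polygon, p: Point) -> bool:
    """Ray-casting point-in-polygon test (points on the boundary are not treated specially)."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through p?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def violations(location: Point, hour: int, confinement: Polygon,
               curfew_hours: Set[int], restricted: List[Polygon]) -> List[str]:
    alerts = []
    if hour in curfew_hours and not contains(confinement, location):
        alerts.append("outside confinement zone during curfew")
    if any(contains(zone, location) for zone in restricted):
        alerts.append("inside restricted zone")
    return alerts


# Hypothetical zones and a 22:00-06:00 curfew.
home = [(0, 0), (10, 0), (10, 10), (0, 10)]
school = [(20, 20), (30, 20), (30, 30), (20, 30)]
night = set(range(22, 24)) | set(range(0, 6))

print(violations((5, 5), 23, home, night, [school]))     # []
print(violations((25, 25), 12, home, night, [school]))   # ['inside restricted zone']
print(violations((25, 25), 23, home, night, [school]))   # both alerts
```

In a streaming setting, a check of this kind would run as part of a continuous query over the offender's location updates, with alerts emitted as output events.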

Future Directions

Future directions for spatiotemporal data stream management systems are expected to focus on big spatial data processing and analysis. In this paradigm, geospatial data streaming (or geostreaming) will play a key role at the intersection of mobility and cloud computing (Shekhar et al. 2012). Geostreaming will establish the query processing pipeline between mobile devices, with their streams of location updates, and cloud storage.

References

  1. Abadi D et al (2005) The design of the Borealis stream processing engine. In: CIDR. Asilomar, CA
  2. Ali M et al (2009) Microsoft CEP server and online behavioral targeting. In: VLDB. Lyon, France
  3. Ali M, Chandramouli B, Sethu Raman B, Katibah E (2010) Spatio-temporal stream processing in Microsoft StreamInsight. IEEE Data Eng Bull 33(2):69–74
  4. Ali M, Chandramouli B, Goldstein J, Schindlauer R (2011) The extensibility framework in Microsoft StreamInsight. In: ICDE. Hannover, Germany
  5. Barga R et al (2007) Consistent streaming through time: a vision for event stream processing. In: CIDR. Asilomar, CA
  6. Chandrasekaran S et al (2003) TelegraphCQ: continuous dataflow processing for an uncertain world. In: CIDR. Asilomar, CA
  7. Chandramouli B, Goldstein J, Maier D (2009) On-the-fly progress detection in iterative stream queries. In: VLDB. Lyon, France
  8. Cranor C et al (2003) Gigascope: a stream database for network applications. In: SIGMOD. San Diego, CA
  9. Daubal M, Fajinmi O, Jangaard L, Simonson N, Yasutake B, Newell J, Ali M (2013) Safe step: a real-time GPS tracking and analysis system for criminal activities using ankle bracelets. In: The ACM SIGSPATIAL conference on advances in geographic information systems, GIS. Orlando, FL
  10. Jensen C, Snodgrass R (1992) Temporal specialization. In: ICDE. Tempe, AZ
  11. Jeremiah M, Raymond M, Archer J, Adem S, Hansel L, Konda S, Luti M, Zhao Y, Teredesai A, Ali M (2011) An extensibility approach for spatio-temporal stream processing using Microsoft StreamInsight. In: The international symposium on spatial and temporal databases, SSTD. Minneapolis, MN
  12. Kazemitabar SJ, Demiryurek U, Ali MH, Akdogan A, Shahabi C (2010) Geospatial stream query processing using Microsoft SQL Server StreamInsight. In: VLDB. Singapore
  13. Motwani R et al (2003) Query processing, approximation, and resource management in a DSMS. In: CIDR. Asilomar, CA
  14. Open Geospatial Consortium. http://www.opengeospatial.org/standards/sfa (Last Accessed March 2016)
  15. Ryvkina E et al (2006) Revision processing in a stream processing engine: a high-level design. In: ICDE. Atlanta, GA
  16. Shekhar S, Evans MR, Gunturi V, Yang K (2012) Spatial big-data challenges intersecting mobility and cloud computing. In: The NSF workshop on social networks and mobility in the cloud. Washington DC
  17. SQL Server Spatial Libraries. http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx (Last Accessed March 2016)
  18. Srivastava U, Widom J (2004) Flexible time management in data stream systems. In: PODS. Paris, France
  19. StreamBase Inc. http://www.streambase.com/ (Last Accessed March 2016)
  20. Tucker P et al (2003) Exploiting punctuation semantics in continuous data streams. In: IEEE TKDE

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Center for Data Science, Institute of Technology, University of Washington, Tacoma, USA