A non-relational stream is a continuously generated, ordered collection of data items that are not relational tuples and therefore cannot readily be processed by relational operators such as selection, projection, join, and aggregation. Each data item may carry a timestamp recording when it was produced by, or received at, a particular device or system. Because such streams often arrive at high volume, many applications require efficient, low-latency processing techniques to keep up with them in real time.
Non-relational data streams have been studied in the following forms: graph streams, spatial streams, text streams, and XML streams.
In graph streams, each data item represents an insert, update, or delete operation on a vertex or an edge of a graph. Queries on these streams estimate properties of the graph or find patterns within it. See “Graph Mining on Streams”.
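As a minimal sketch of the graph-stream model, the following Python code consumes a stream of edge insertions and deletions and maintains exact vertex degrees, one simple graph property a query might ask about. The `EdgeOp` record and `DegreeTracker` class are hypothetical names chosen for illustration. The sketch handles only edge inserts and deletes (not vertex operations or updates) and stores the entire edge set, whereas streaming graph algorithms are usually designed to use far less memory than the graph itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeOp:
    """Hypothetical stream item: insert or delete of an undirected edge."""
    op: str  # "insert" or "delete"
    u: str
    v: str


class DegreeTracker:
    """Maintains exact vertex degrees and the live edge set as operations arrive."""

    def __init__(self):
        self.degree = defaultdict(int)
        self.edges = set()

    def apply(self, item: EdgeOp) -> None:
        edge = frozenset((item.u, item.v))
        # Ignore duplicate inserts and deletes of edges that are not present.
        if item.op == "insert" and edge not in self.edges:
            self.edges.add(edge)
            self.degree[item.u] += 1
            self.degree[item.v] += 1
        elif item.op == "delete" and edge in self.edges:
            self.edges.remove(edge)
            self.degree[item.u] -= 1
            self.degree[item.v] -= 1

    def max_degree_vertex(self):
        # Continuous query: the vertex with the highest current degree.
        return max(self.degree, key=self.degree.get) if self.degree else None


# Usage: replay a short, made-up stream of edge operations.
stream = [
    EdgeOp("insert", "a", "b"),
    EdgeOp("insert", "a", "c"),
    EdgeOp("insert", "b", "c"),
    EdgeOp("delete", "b", "c"),
]
tracker = DegreeTracker()
for item in stream:
    tracker.apply(item)
```

After the deletion, vertex `a` has degree 2 and is the answer to the maximum-degree query.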
Spatial data streams represent the motion of objects (e.g., people and taxicabs). Queries on these streams often take the form of range or nearest-neighbor (NN) queries (e.g., locating the taxi nearest to a person who requests one). See “Continuous Monitoring of Spatial Queries”.
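The taxi example can be sketched in Python as follows: each stream item is a location report that overwrites the object's previous position, and NN and range queries are answered against the latest positions. The `TaxiTracker` class and its methods are hypothetical. For simplicity the sketch scans every object on each query and uses Euclidean distance in the plane; practical continuous-query systems typically rely on spatial indexes (such as grids or R-trees) and update query results incrementally instead.

```python
import math


class TaxiTracker:
    """Keeps the most recently reported position of each moving object."""

    def __init__(self):
        self.positions = {}  # object id -> (x, y)

    def update(self, obj_id, x, y):
        # A newer location report simply replaces the old one.
        self.positions[obj_id] = (x, y)

    def nearest(self, qx, qy):
        # Brute-force NN query: scan all current positions.
        if not self.positions:
            return None
        return min(self.positions, key=lambda o: math.dist(self.positions[o], (qx, qy)))

    def within(self, qx, qy, radius):
        # Range query: ids of all objects within `radius` of the query point.
        return sorted(
            o for o, p in self.positions.items() if math.dist(p, (qx, qy)) <= radius
        )


# Usage: a made-up stream of location reports; taxi t1 moves once.
tracker = TaxiTracker()
for obj_id, x, y in [("t1", 0, 0), ("t2", 5, 5), ("t3", 10, 0), ("t1", 9, 9)]:
    tracker.update(obj_id, x, y)

closest = tracker.nearest(4, 4)
nearby = tracker.within(4, 4, 7.1)
```

Here the rider at (4, 4) is matched to `t2`, and only `t1` and `t2` lie within radius 7.1.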
Text streams (e.g., messages from Twitter, LinkedIn, Facebook, WeChat, and other social media) are typically temporally ordered collections of short texts or documents. Queries on text streams include text classification, topic detection and tracking (including event discovery from a stream of news stories), bursty event detection (e.g., reports of disease outbreaks), and knowledge and opinion mining from blogs, chat transcripts, and search engine log files. See “Text Streaming Model”.
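Bursty event detection can be illustrated with a deliberately simple threshold heuristic rather than a published burst-detection model. The hypothetical `BurstDetector` below keeps two sliding time windows of equal width and flags a term as bursty when it appears in at least `min_count` messages in the latest window and at least `ratio` times as often as in the preceding window. Tokenization by lowercasing and splitting on whitespace, the window width, and both thresholds are assumptions made for illustration only.

```python
from collections import Counter, deque


class BurstDetector:
    """Flags terms whose frequency in the latest window jumps relative to the previous one."""

    def __init__(self, window, ratio=3.0, min_count=3):
        self.window = window
        self.ratio = ratio
        self.min_count = min_count
        self.recent = deque()  # (timestamp, terms) in (now - window, now]
        self.older = deque()   # (timestamp, terms) in (now - 2*window, now - window]
        self.recent_counts = Counter()
        self.older_counts = Counter()

    def add(self, timestamp, text):
        # Count each term at most once per message.
        terms = set(text.lower().split())
        self.recent.append((timestamp, terms))
        self.recent_counts.update(terms)
        # Age messages out: latest window -> preceding window -> discarded.
        while self.recent and self.recent[0][0] <= timestamp - self.window:
            item = self.recent.popleft()
            self.recent_counts.subtract(item[1])
            self.older.append(item)
            self.older_counts.update(item[1])
        while self.older and self.older[0][0] <= timestamp - 2 * self.window:
            _, old_terms = self.older.popleft()
            self.older_counts.subtract(old_terms)

    def bursty_terms(self):
        return sorted(
            t for t, c in self.recent_counts.items()
            if c >= self.min_count and c >= self.ratio * max(self.older_counts[t], 1)
        )


# Usage: made-up timestamped messages (timestamps in seconds).
detector = BurstDetector(window=10)
for ts, msg in [
    (1, "flu season begins"),
    (5, "traffic downtown"),
    (12, "flu outbreak reported"),
    (14, "flu cases rise"),
    (15, "new flu clinic"),
]:
    detector.add(ts, msg)
```

After the last message, "flu" appears in three messages of the latest window but only one of the preceding window, so it is the only term flagged.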
XML streams are continuously generated series of XML documents. Queries on these streams may select only the XML documents that match certain criteria or may transform input streams into different output streams. See “XML Stream Processing”.
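Both kinds of XML stream query can be sketched with Python's standard `xml.etree.ElementTree` module: one generator filters the stream, keeping documents whose child element has a given value, and a second transforms the matches into a different output stream of `<alert>` documents. The function names, the `<news>` document layout, and the `<alert>` output format are hypothetical. The sketch parses each document into a full tree. Dedicated XML stream processors typically evaluate queries incrementally over parsing events instead.

```python
import xml.etree.ElementTree as ET


def filter_documents(xml_stream, path, value):
    """Yield the root of each document with an element at `path` whose text equals `value`."""
    for doc in xml_stream:
        root = ET.fromstring(doc)
        if any(el.text == value for el in root.findall(path)):
            yield root


def to_alerts(roots):
    """Transform matched documents into a new stream of serialized <alert> documents."""
    for root in roots:
        alert = ET.Element("alert")
        alert.set("id", root.get("id", ""))
        ET.SubElement(alert, "title").text = root.findtext("title", default="")
        yield ET.tostring(alert, encoding="unicode")


# Usage: a made-up stream of news documents.
docs = [
    '<news id="1"><category>sports</category><title>Final score</title></news>',
    '<news id="2"><category>health</category><title>Flu outbreak</title></news>',
    '<news id="3"><category>health</category><title>New clinic</title></news>',
]
alerts = list(to_alerts(filter_documents(docs, "category", "health")))
```

Because both functions are generators, each document passes through the filter and the transformation as it arrives, without buffering the whole input stream.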