Abstract
Stream processing applications such as algorithmic trading, MPEG processing, and web content analysis are ubiquitous and essential to business and entertainment. Language designers have developed numerous domain-specific languages that are both tailored to the needs of their applications and optimized for performance on their particular target platforms. Unfortunately, the goals of generality and performance are frequently at odds, and prior work on the formal semantics of stream processing languages does not capture the details necessary for reasoning about implementations. This paper presents Brooklet, a core calculus for stream processing that allows us to reason about how to map languages to platforms and how to optimize stream programs. We translate three representative languages (CQL, StreamIt, and Sawzall) to Brooklet and show that the translations are correct. We formalize three popular and vital optimizations (data-parallel computation, operator fusion, and operator re-ordering) and show under which conditions they are correct. Language designers can use Brooklet to specify exactly how new features or languages behave. Language implementors can use Brooklet to show exactly under which circumstances new optimizations are correct. In ongoing work, we are developing an intermediate language for streaming that is based on Brooklet, and we are implementing this intermediate language on System S, IBM’s high-performance streaming middleware.
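To make the idea of an optimization being "correct" concrete, the following is a minimal sketch of operator fusion in Python. It is not Brooklet's formal semantics. The names `run_pipeline`, `fuse`, `double`, and `increment` are hypothetical, and the sketch assumes stateless, deterministic operators connected by FIFO queues; the abstract itself does not spell out the conditions under which fusion is valid.

```python
from collections import deque
from typing import Callable, Iterable, List

Operator = Callable[[int], int]  # hypothetical: one item in, one item out


def run_pipeline(ops: List[Operator], inputs: Iterable[int]) -> List[int]:
    """Run a linear chain of operators linked by explicit FIFO queues."""
    # queues[i] feeds ops[i]; queues[-1] collects the final output.
    queues = [deque(inputs)] + [deque() for _ in ops]
    progress = True
    while progress:
        progress = False
        # Fire every operator whose input queue currently holds an item.
        for i, op in enumerate(ops):
            if queues[i]:
                item = queues[i].popleft()
                queues[i + 1].append(op(item))
                progress = True
    return list(queues[-1])


def fuse(first: Operator, second: Operator) -> Operator:
    """Replace two adjacent operators by one, removing the queue between them."""
    return lambda item: second(first(item))


def double(x: int) -> int:
    return 2 * x


def increment(x: int) -> int:
    return x + 1


unfused = run_pipeline([double, increment], [1, 2, 3])
fused = run_pipeline([fuse(double, increment)], [1, 2, 3])
print(unfused, fused)
```

Under these assumptions the fused and unfused pipelines produce the same output sequence, which is the kind of equivalence a correctness argument for fusion has to establish. If the operators shared mutable state or another operator read the intermediate queue, the equivalence would need additional side conditions.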
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Soulé, R. et al. (2010). A Universal Calculus for Stream Processing Languages. In: Gordon, A.D. (eds) Programming Languages and Systems. ESOP 2010. Lecture Notes in Computer Science, vol 6012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11957-6_27
DOI: https://doi.org/10.1007/978-3-642-11957-6_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11956-9
Online ISBN: 978-3-642-11957-6
eBook Packages: Computer Science (R0)