The Commutativity Problem of the MapReduce Framework: A Transducer-Based Approach
MapReduce is a popular programming model for data parallel computation. In MapReduce, the reducer produces an output from a list of inputs. Due to the scheduling policy of the platform, the inputs may arrive at the reducers in different orders. The commutativity problem of reducers asks whether the output of a reducer is independent of the order of its inputs. Although the problem is undecidable in general, MapReduce programs used in practice are mostly written for data analytics and thus require only very simple control flow. By exploiting this simplicity, we propose a programming language for reducers in which the commutativity problem is decidable. The main idea of the reducer language is to separate the control flow and the data flow of programs and to disallow arithmetic operations in the control flow. The decision procedure for the commutativity problem is obtained through a reduction to the equivalence problem of streaming numerical transducers (SNTs), a novel automata model over infinite alphabets introduced in this paper. The design of SNTs is inspired by streaming transducers (Alur and Cerny, POPL 2011). Nevertheless, the two models are intrinsically different, since the outputs of SNTs are integers while those of streaming transducers are data words. The decidability of the equivalence of SNTs is established by an involved combinatorial analysis of how the values of the integer variables evolve during the runs of SNTs.
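To make the commutativity problem concrete, the following is a minimal sketch in Python. It is not the decision procedure of the paper, which goes through SNT equivalence. It only tests a reducer on every permutation of one fixed input list, so it can refute commutativity but never prove it. All function names (`sum_reducer`, `max_reducer`, `first_minus_rest_reducer`, `is_commutative_on`) are hypothetical and chosen for illustration.

```python
from itertools import permutations


def sum_reducer(values):
    """Data flow uses arithmetic; there is no data-dependent control flow."""
    total = 0
    for v in values:
        total += v
    return total


def max_reducer(values):
    """Control flow only compares values; it performs no arithmetic."""
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best


def first_minus_rest_reducer(values):
    """Order-dependent: the first input is treated differently from the rest."""
    result = values[0]
    for v in values[1:]:
        result -= v
    return result


def is_commutative_on(reducer, values):
    """Return True if the reducer gives the same output on every ordering
    of this one input list (a bounded test, not a proof)."""
    expected = reducer(list(values))
    return all(reducer(list(p)) == expected for p in permutations(values))


if __name__ == "__main__":
    sample = [3, 1, 2]
    for reducer in (sum_reducer, max_reducer, first_minus_rest_reducer):
        print(reducer.__name__, is_commutative_on(reducer, sample))
```

On the sample list, the first two reducers agree on all orderings while the third does not. A passing result on one list says nothing about other inputs, which is why a decision procedure over all inputs is needed.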
Keywords: Decision Procedure, Reducer Program, Transition Graph, Simple Cycle, Control Flow Graph
Yu-Fang Chen is partially supported by the MOST project No. 103-2221-E-001-019-MY3. Zhilin Wu is partially supported by the NSFC grants No. 61100062, 61272135, 61472474, and 61572478.
- 1. Alur, R., Cerny, P.: Streaming transducers for algorithmic verification of single-pass list-processing programs. In: POPL, pp. 599–610. ACM (2011)
- 2. Alur, R., D'Antoni, L., Deshmukh, J., Raghothaman, M., Yuan, Y.: Regular functions and cost register automata. In: LICS, pp. 13–22 (2013)
- 3. Chen, Y.F., Hong, C.D., Sinha, N., Wang, B.Y.: Commutativity of reducers. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 131–146. Springer, Heidelberg (2015)
- 4. Chen, Y., Lei, S., Wu, Z.: The commutativity problem of the MapReduce framework: a transducer-based approach. CoRR, abs/1605.01497 (2016)
- 7. Haase, C., Halfon, S.: Integer vector addition systems with states. In: Ouaknine, J., Potapov, I., Worrell, J. (eds.) RP 2014. LNCS, vol. 8762, pp. 112–124. Springer, Heidelberg (2014)
- 8. Hadoop. https://hadoop.apache.org
- 11. Leroux, J., Sutre, G.: Flat counter automata almost everywhere! In: Software Verification: Infinite-State Model Checking and Static Program Analysis. Dagstuhl Seminar Proceedings, vol. 6081 (2006)
- 13. Neven, F., Schweikardt, N., Servais, F., Tan, T.: Distributed streaming with finite memory. In: ICDT, pp. 324–341 (2015)
- 16. Spark. http://spark.apache.org
- 18. Xiao, T., Zhang, J., Zhou, H., Guo, Z., McDirmid, S., Lin, W., Chen, W., Zhou, L.: Nondeterminism in MapReduce considered harmful? An empirical study on non-commutative aggregators in MapReduce programs. In: ICSE, pp. 44–53 (2014)