Relational Equivalence Proofs Between Imperative and MapReduce Algorithms

  • Bernhard Beckert
  • Timo Bingmann
  • Moritz Kiefer
  • Peter Sanders
  • Mattias Ulbrich
  • Alexander WeiglEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11294)


Distributed programming frameworks like MapReduce, Spark and Thrill, are widely used for the implementation of algorithms operating on large datasets. However, implementing in these frameworks is more demanding than coming up with sequential implementations. One way to achieve correctness of an optimized implementation is by deriving it from an existing imperative sequential algorithm description through a sequence of behavior-preserving transformations.

We present a novel approach for proving equivalence between imperative and deterministic MapReduce algorithms based on partitioning the equivalence proof into a sequence of equivalence proofs between intermediate programs with smaller differences. Our approach is based on the insight that proofs are best conducted using a combination of two kinds of steps: (1) uniform context-independent rewriting transformations; and (2) context-dependent flexible transformations that can be proved using relational reasoning with coupling invariants.

We demonstrate the feasibility of our approach by evaluating it on two prototypical algorithms commonly used as examples in MapReduce frameworks: k-means and PageRank. To carry out the proofs, we use a higher-order theorem prover with partial proof automation. The results show that our approach and its prototypical implementation enable equivalence proofs of non-trivial algorithms and could be automated to a large degree.


  1. 1.
    Barthe, G., Crespo, J.M., Kunz, C.: Relational verification using product programs. In: Butler, M., Schulte, W. (eds.) FM 2011. LNCS, vol. 6664, pp. 200–214. Springer, Heidelberg (2011). Scholar
  2. 2.
    Beckert, B., Bingmann, T., Kiefer, M., Sanders, P., Ulbrich, M., Weigl, A.: Relational Equivalence Proofs Between Imperative and MapReduce Algorithms. ArXiv e-prints, January 2018. arXiv:1801.08766
  3. 3.
    Beckert, B., Bingmann, T., Kiefer, M., Sanders, P., Ulbrich, M., Weigl, A.: Proving equivalence between imperative and mapreduce implementations using program transformations. In: Third Workshop Models for Formal Analysis of Real Systems and Sixth International Workshop on Verification and Program Transformation. Electronic Proceedings in Theoretical Computer Science, vol. 268, pp. 185–199. Open Publishing Association (2018)Google Scholar
  4. 4.
    Bingmann, T., et al.: Thrill: high-performance algorithmic distributed batch data processing with C++. In: IEEE International Conference on Big Data, pp. 172–183. IEEE, December 2016. preprint arXiv:1608.05634
  5. 5.
    Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998). Scholar
  6. 6.
    Chambers, C., et al.: FlumeJava: easy, efficient data-parallel pipelines. ACM SIGPLAN Notices 45(6), 363–375 (2010)CrossRefGoogle Scholar
  7. 7.
    Chen, Y.F., Hong, C.D., Lengál, O., Mu, S.C., Sinha, N., Wang, B.Y.: An Executable Sequential Specification for Spark Aggregation (2017). arXiv:1702.02439CrossRefGoogle Scholar
  8. 8.
    Chen, Y.-F., Hong, C.-D., Sinha, N., Wang, B.-Y.: Commutativity of reducers. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 131–146. Springer, Heidelberg (2015). Scholar
  9. 9.
    Chen, Y., Song, L., Wu, Z.: The Commutativity Problem of the MapReduce Framework: A Transducer-based Approach. CoRR abs/1605.01497 (2016). arXiv:1605.01497Google Scholar
  10. 10.
    Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRefGoogle Scholar
  11. 11.
    Elenbogen, D., Katz, S., Strichman, O.: Proving mutual termination. Form. Methods Syst. Des. 47(2), 204–229 (2015). Scholar
  12. 12.
    Felsing, D., Grebing, S., Klebanov, V., Rümmer, P., Ulbrich, M.: Automating regression verification. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, pp. 349–360. ASE 2014. ACM, New York, NY, USA (2014)Google Scholar
  13. 13.
    Godlin, B., Strichman, O.: Regression verification. In: Proceedings of the 46th Annual Design Automation Conference, pp. 466–471. DAC 2009. ACM, New York, NY, USA (2009)Google Scholar
  14. 14.
    Grossman, S., Cohen, S., Itzhaky, S., Rinetzky, N., Sagiv, M.: Verifying equivalence of spark programs. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 282–300. Springer, Cham (2017). Scholar
  15. 15.
    Hawblitzel, C., Kawaguchi, M., Lahiri, S., Rebêlo, H.: Mutual summaries: unifying program comparison techniques. In: Informal proceedings of BOOGIE 2011 workshop (2011).
  16. 16.
    Kahn, G.: Natural semantics. STACS 87, 22–39 (1987)MathSciNetzbMATHGoogle Scholar
  17. 17.
    Kiefer, M., Klebanov, V., Ulbrich, M.: Relational program reasoning using compiler IR - combining static verification and dynamic analysis. J. Autom. Reason. (2017)Google Scholar
  18. 18.
    Klebanov, V., Rümmer, P., Ulbrich, M.: Automating regression verification of pointer programs by predicate abstraction. J. Formal Methods Syst. Des. 52, 229–259 (2017)CrossRefzbMATHGoogle Scholar
  19. 19.
    Lahiri, S.K., Hawblitzel, C., Kawaguchi, M., Rebêlo, H.: SYMDIFF: a language-agnostic semantic diff tool for imperative programs. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 712–717. Springer, Heidelberg (2012). Scholar
  20. 20.
    Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theor. 28(2), 129–137 (1982). Scholar
  21. 21.
    Radoi, C., Fink, S.J., Rabbah, R., Sridharan, M.: Translating Imperative Code to MapReduce. SIGPLAN Not. 49(10), 909–927 (2014)CrossRefGoogle Scholar
  22. 22.
    The Coq development team: The Coq proof assistant reference manual. LogiCal Project, version 8.6 (2004).
  23. 23.
    Verdoolaege, S., Janssens, G., Bruynooghe, M.: Equivalence checking of static affine programs using widening to handle recurrences. ACM Trans. Program. Lang. Syst. 34(3), 11:1–11:35 (2012). Scholar
  24. 24.
    White, T.: Hadoop: The Definitive Guide. O’Reilly Media Inc., Sebastopol (2012)Google Scholar
  25. 25.
    Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, pp. 10–10. HotCloud 2010, USENIX Association, Berkeley, CA, USA (2010).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Bernhard Beckert
    • 1
  • Timo Bingmann
    • 1
  • Moritz Kiefer
    • 1
  • Peter Sanders
    • 1
  • Mattias Ulbrich
    • 1
  • Alexander Weigl
    • 1
    Email author
  1. 1.Institute of Theoretical InformaticsKarlsruhe Institute of TechnologyKarlsruheGermany

Personalised recommendations