A List-Machine Benchmark for Mechanized Metatheory


We propose a benchmark to compare theorem-proving systems on their ability to express proofs of compiler correctness. In contrast to the first POPLmark, we emphasize the connection of proofs to compiler implementations, and we point out that much can be done without binders or alpha-conversion. We propose specific criteria for evaluating the utility of mechanized metatheory systems; we have constructed solutions in both Coq and Twelf metatheory, and we draw conclusions about those two systems in particular.

Author information



Corresponding author

Correspondence to Xavier Leroy.

Additional information

Andrew W. Appel and Robert Dockins were supported in part by NSF Grant CNS-0910448.

About this article

Cite this article

Appel, A.W., Dockins, R. & Leroy, X. A List-Machine Benchmark for Mechanized Metatheory. J Autom Reasoning 49, 453–491 (2012). https://doi.org/10.1007/s10817-011-9226-1

  • Theorem proving
  • Proof assistants
  • Program proof
  • Compiler verification
  • Typed machine language
  • Metatheory
  • Coq
  • Twelf