1 Introduction

A Google search for “rewrite engine” mostly returns references to Apache web servers and rewrite engines for URLs. Such engines perform string rewriting, which is a particular case of term rewriting [1, 3], a very general model of computation based on the repeated application of simplification rules. Despite its simplicity, term rewriting has proven to be a suitable paradigm for expressing fundamental concepts of logics, mathematics, and computer science (e.g., concurrency, communication, interaction, etc.).
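
As a tiny illustration (an example of ours, not drawn from the REC benchmarks), the following two rules define addition on Peano numerals; a computation is a sequence of rule applications that ends when no rule matches:

    plus(0, y)    -> y                       (rule 1)
    plus(s(x), y) -> s(plus(x, y))           (rule 2)

    plus(s(0), s(0)) -> s(plus(0, s(0)))     (by rule 2)
                     -> s(s(0))              (by rule 1)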

Beyond such theoretical aspects, the ideas of term rewriting have influenced the design of specification and programming languages, many of which incorporate algebraic terms and rewrite rules. Software implementations of term rewriting have been developed, including, of course, rewrite engines, but also a large variety of tools for compiler construction, program transformation, and formal verification by theorem proving or model checking.

In order to evaluate and compare the various rewrite engines available, a software competition named REC (Rewrite Engines Competition) was created in 2006. Organized together with WRLA (Workshop on Rewriting Logic and its Applications), REC provides a forum for sharing experiences among tool developers and potential users. Four editions of this competition have taken place so far: REC1 (2006), REC2 (2008), REC3 (2010), and REC4 (2018).

The present article, which is part of the TOOLympics project to celebrate the 25th anniversary of the TACAS conference by gathering numerous software competitions at ETAPS, provides a retrospective overview of past editions of the REC competition. Section 2 summarizes the developments of the competition; Sect. 3 lists all tools that have been assessed, and Sect. 4 presents the collection of benchmarks accumulated during the successive editions; finally, Sect. 5 draws perspectives for future editions of the REC competition.

2 Evolution of REC Competitions

In the mid-2000s, it became manifest that the term-rewriting community lacked a comparative study of the different rewrite engines available. The following excerpt, quoted from [11], articulates the motivation for such a study:

“The idea of organizing a rewrite competition arose from noticing various applications of rewriting in different areas and by different categories of researchers, many of them manifesting a genuine and explicit interest in term rewriting. We believe that many of us can benefit from such rewrite engine competitions, provided that they are fair and explicitly state what was tested in each case. For example, users of rewrite engine[s] can more informatively select the right rewrite engine for their particular application. On the other hand, for rewrite engine developers, such events give them ideas on how to improve their tools and what to prioritize, as well as a clearer idea of how their engine compares to others.”

It was not clear, however, how to conduct such a study. The abstract and general nature of term rewriting has given rise to a great diversity of software implementations. General-purpose rewrite engines differ in the forms of rewriting they support (conditional, nondeterministic, context-sensitive, etc.). Many other rewrite engines are specialized for particular problems and embedded into programming languages, theorem provers, environments for compiler construction and program transformation, etc. (see Sect. 3 for examples).

REC1 [11] faced such doubts about the right approach to follow and decided to focus on efficiency, measured in terms of CPU time and memory use. Only two tools participated in this first edition of the competition, organized together with WRLA 2006. A collection of benchmarks, namely term rewrite systems sorted into four categories (see Sect. 4), was produced. Each benchmark was translated by hand into the input language of each participating tool, then revised by the tool developers to make sure this code was optimal for their tools.

REC2 [15] expanded on the ideas of REC1, with a double goal: (i) broaden the comparison by assessing the efficiency of a larger number of rewrite engines (indeed, five tools participated in REC2); and (ii) serve as a showcase for the term-rewriting community, with a dedicated session at WRLA 2008, where all participating tools were presented by their developers, who exposed the features and strengths of each tool and discussed the outcomes of the competition. Tool developers actively participated in the whole process of REC2, not merely to adapt competition benchmarks to their tools, but also to exchange views on how to organize the competition and present its results. As a result of fruitful discussions, several changes were implemented, such as the design of a common language for expressing the benchmarks (see Sect. 4 below).

REC3 [14] followed the same approach as REC2, with a greater emphasis on automation and a larger set of term-rewriting benchmarks—including problems related to program transformation, a key application area of term rewriting. The developers of all the participating tools were involved in this competition, organized together with WRLA 2010. The reported results indicate the computation time spent by each tool on each benchmark.

REC4 [17] was the result of a long-term effort undertaken in 2015 and presented at WRLA 2018. The competition’s scope was broadened beyond traditional rewrite engines to include functional and object-oriented languages. As a consequence, REC4 did not consider particular features implemented only in some tools, but focused instead on basic features common to all tools, namely term rewrite systems that are confluent and terminating, with free constructors and conditional rules. Tool execution and comparison of results were fully automated, making it unnecessary to involve tool developers directly in the competition; they were nevertheless contacted by email, in case of problems, before the presentation of the results. A Top-5 podium was produced to indicate which tools could tackle the most problems within a given amount of time and memory.

3 Tools Assessed

So far, no fewer than 18 tools have been assessed during the REC competitions, as shown in Table 1. This table lists which tools participated in which editions of the competition. Not all tools have been assessed in all editions, as happened, e.g., with prominent tools such as ELAN [4] and ASF+SDF [6], whose development halted before or just after REC1.

Table 1. Languages and tools considered in the Rewrite Engines Competitions

It is worth pointing out the versatility of term rewriting and the diversity of its implementations. Term rewriting is used in both specification and programming languages. These languages can be algebraic (e.g., CafeOBJ, LOTOS, Maude, mCRL2, Stratego/XT, etc.), functional (e.g., Clean, Haskell, LNT, OCaml, SML, etc.), or object-oriented (e.g., Rascal, Scala, Tom, etc.), and certain languages combine several of these traits, such as Opal, which is both algebraic and functional, or OCaml, which is both functional and object-oriented. Some languages also support higher-order programming (e.g., Haskell, OCaml), while others have built-in support for concurrency (e.g., LOTOS, LNT, Maude, mCRL2, etc.). Implementations encompass compilers and interpreters; certain languages (e.g., OCaml or Rascal) offer both, while other approaches (e.g., Tom) enable term rewrite systems to be embedded in general-purpose languages such as C or Java. Finally, some implementations (ASF+SDF, Stratego/XT, etc.) provide rich environments for language design, including support for lexical/syntactic analysis, construction and traversal of abstract syntax trees, and program transformations.

4 REC Benchmarks

As a byproduct of the efforts made in organizing the four REC competitions, a collection of benchmarks has been progressively accumulated.

REC1 [11] set up the foundations of this collection by gathering 41 term rewrite systems, split into four distinct categories: unconditional term rewrite systems (in which no rewrite rule has Boolean premises), conditional term rewrite systems (in which some rewrite rules have Boolean premises), rewriting modulo axioms (in which rewriting relies on certain axioms, such as commutativity and/or associativity), and rewriting modulo strategies (in which rewriting is context-sensitive, guided by local strategies). Several REC1 benchmarks were derived from generic benchmarks parameterized by variables (e.g., the argument of a function computing the factorial of a natural number, the length of a list to be sorted, etc.) by giving particular values to these variables. Following the terminology used for the Model Checking Contest [20], we distinguish between models, which are generic benchmarks, and instances, which are benchmarks derived from generic benchmarks by giving actual values to parameters; the remaining benchmarks, which are not parameterized, are counted both as models and instances.
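
To make the first three categories concrete, here are illustrative rules in generic term-rewriting notation (examples of ours, not taken verbatim from the REC1 benchmarks):

    plus(0, y) -> y                          (unconditional rule)
    max(x, y)  -> y  if leq(x, y) = true     (conditional rule: Boolean premise)
    plus(x, y) =  plus(y, x)                 (commutativity axiom: matching is
                                              performed modulo this equation)

In the fourth category, a local strategy may, for instance, restrict which argument positions of an operator are evaluated, making rewriting context-sensitive.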

REC2 [15] brought a significant evolution. In REC1, each benchmark had been specified in the input language of each tool, which was only feasible because the number of tools was small. To express its benchmarks, REC2 introduced a common language, which we name REC-2008 and which was inspired by the TPDB language used at that time by the Termination Competition (the Confluence Competition uses a similar language). Several tools were adapted to accept this new language REC-2008 as input; for the other tools, translation was done manually.
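
For readers unfamiliar with this family of formats: in the classic TPDB syntax for first-order term rewrite systems, from which REC-2008 drew inspiration, a small benchmark looks roughly as follows (a sketch only; REC-2008 itself differs in its details):

    (VAR x y)
    (RULES
      plus(0, y) -> y
      plus(s(x), y) -> s(plus(x, y))
    )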

REC3 [14] continued in the same vein as REC2, while increasing the number of instances. REC3 also tried to expand the scope of the competition by introducing a separate collection of benchmarks meant for program transformation and expressed in an imperative language named TIL; however, this initiative had no follow-up.

REC4 [17], in order to address a larger set of specification and programming languages, introduced a new language REC-2017, derived from REC-2008 with additional restrictions ensuring that benchmarks are deterministic (hence confluent), terminating, and free from equations between constructors. Consequently, the 3rd and 4th categories (rewriting modulo axioms and rewriting modulo strategies) were removed, and the 1st and 2nd categories (unconditional and conditional rewriting) were merged into a single one, as most languages do not make such a distinction. The remaining REC-2008 benchmarks were upgraded to the REC-2017 language, and many new, significantly more complex benchmarks were added to the collection. To enable an objective comparison, scripts were developed to translate REC-2017 specifications into the input languages of all tools under assessment.
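
As a minimal sketch of what such a translation amounts to (our own illustration, with names chosen for the example; the actual scripts and the code they generate may differ), a conditional rewrite system with free constructors maps almost directly to guarded function definitions in a functional language such as Haskell:

    -- Peano naturals as free constructors
    data Nat = Zero | Succ Nat

    -- leq(0, n) -> true ; leq(s(m), 0) -> false ; leq(s(m), s(n)) -> leq(m, n)
    leq :: Nat -> Nat -> Bool
    leq Zero     _        = True
    leq (Succ _) Zero     = False
    leq (Succ m) (Succ n) = leq m n

    -- The conditional rules
    --   max(x, y) -> y  if leq(x, y) = true
    --   max(x, y) -> x  if leq(x, y) = false
    -- become a single guarded definition:
    maxNat :: Nat -> Nat -> Nat
    maxNat x y | leq x y   = y
               | otherwise = x

Because the source system is deterministic and terminating, evaluating such a function computes the unique normal form of the corresponding term.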

Table 2 gives a quantitative overview of the evolution of the REC benchmark collection; each cell has the form “\((m)\;n\)”, denoting m models and n instances.

Table 2. Benchmarks considered in the Rewrite Engines Competitions

5 Conclusion

Term rewriting is a fundamental topic with many applications, as illustrated by the multiplicity of term-rewriting implementations in compilers and interpreters.

The Rewrite Engines Competitions (REC), whose evolution has been reviewed in the present article, stimulate research interest in this field. One main lesson to be retained from these competitions is that the performance of term rewriting differs significantly across implementations: there is room for improvement and, following the latest REC competition (2018), three developer teams have already reported plans to improve their tools to take the REC results into account.

Future REC competitions should address at least two points: (i) more languages should be assessed, by inviting recent tools into the competition and keeping in mind that some tools may disappear if they are no longer maintained; (ii) more benchmarks should be considered, which will require a dedicated effort to develop new benchmarks, given the lack of large, computationally intensive term rewrite systems freely available on the Web, and the subtle semantic differences that exist between the various flavours of term rewrite systems.