CoCo 2019: report on the eighth confluence competition

We report on the 2019 edition of the Confluence Competition, a competition of software tools that aim to prove or disprove confluence and related (undecidable) properties of rewrite systems automatically.


Introduction
Term rewriting is a Turing-complete model of computation, which underlies much of declarative programming and automated theorem proving. Confluence provides a general notion of determinism and is regarded as one of the central properties of rewriting. A rewrite system R is a set of directed equations, so-called rewrite rules, which induces a rewrite relation $\to_R$ on terms. It is called confluent if for all terms s, t and u such that $s \to_R^* t$ and $s \to_R^* u$ there exists a term v such that $t \to_R^* v$ and $u \to_R^* v$. Confluence is equivalent to the Church-Rosser property, introduced in 1936 by Church and Rosser [8] to show the consistency of the λI-calculus, and guarantees that normal forms (which are terms t such that $t \to_R u$ for no term u) are unique.
We provide two examples and refer to standard textbooks for comprehensive surveys [7,31,44]. The first rewrite system describes the Coffee Bean Game, a variant of the Grecian Urn described in [11].

Example 1 The game is played with a sequence of black and white beans. In a move, a player must take two adjacent beans and put back one bean, according to the following set of rules $R_1$: The player who puts the last white bean wins. For instance, the following is a valid game: In this case the player who started won, since the last white bean was put in the 13th move. It turns out that the moves of the players do not affect the outcome of the game, because the rules constitute a confluent system; the outcome depends solely on the initial configuration.
The second example is attributed to Henk Barendregt in [17].
Example 2 Consider the rewrite system R consisting of the following three rewrite rules:

$$c \to g(c) \qquad f(x, x) \to a \qquad g(x) \to f(x, g(x))$$

The constant c rewrites in four steps to a:

$$c \to_R g(c) \to_R f(c, g(c)) \to_R f(g(c), g(c)) \to_R a$$

Hence $c \to_R^* a$ and thus also $c \to_R g(c) \to_R^* g(a)$. Therefore, c rewrites to both a and g(a). The constant a is a normal form as none of the rewrite rules applies. The term g(a) admits exactly one (infinite) rewrite sequence:

$$g(a) \to_R f(a, g(a)) \to_R f(a, f(a, g(a))) \to_R f(a, f(a, f(a, g(a)))) \to_R \cdots$$

Since the term a is not reached, R is not confluent. The weaker property of unique normal forms (both with respect to conversion and reduction) is satisfied.
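The rewrite sequences of Example 2 can be replayed mechanically. The following is a minimal sketch, not the implementation of any CoCo tool; the term encoding (variables as strings, function applications as tuples) and all function names are assumptions made for this illustration.

```python
# Terms: a variable is a str; a function application is a tuple
# (symbol, arg1, ..., argn). This encoding is an assumption of the sketch.
C = ("c",)
A = ("a",)

def g(t):
    return ("g", t)

def f(s, t):
    return ("f", s, t)

RULES = [
    (C, g(C)),                 # c -> g(c)
    (f("x", "x"), A),          # f(x, x) -> a
    (g("x"), f("x", g("x"))),  # g(x) -> f(x, g(x))
]

def match(pattern, term, subst):
    """Extend subst so that pattern instantiated by subst equals term, else None."""
    if isinstance(pattern, str):  # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def apply(term, subst):
    """Instantiate the variables of term according to subst."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(apply(t, subst) for t in term[1:])

def successors(term):
    """All terms obtained from term by exactly one rewrite step."""
    result = set()
    for lhs, rhs in RULES:  # rewrite at the root
        s = match(lhs, term, {})
        if s is not None:
            result.add(apply(rhs, s))
    if not isinstance(term, str):  # rewrite inside an argument
        for i in range(1, len(term)):
            for new_arg in successors(term[i]):
                result.add(term[:i] + (new_arg,) + term[i + 1:])
    return result

def reachable(term, steps):
    """All terms reachable from term in at most the given number of steps."""
    seen, frontier = {term}, {term}
    for _ in range(steps):
        frontier = {u for t in frontier for u in successors(t)} - seen
        seen |= frontier
    return seen
```

With these definitions, `A in reachable(C, 4)` holds, `successors(A)` is empty, and `successors(g(A))` contains only `f(A, g(A))`. A bounded search of this kind can confirm that a is reachable from c, but it cannot by itself prove that g(a) never reaches a; that requires the argument given in the example.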
Another property of rewrite systems that has received much attention, including a dedicated competition (termCOMP), is termination. A rewrite system R is terminating if its rewrite relation $\to_R$ is well-founded. The rewrite system in Example 1 is terminating because in each step the number of beans decreases.
For terminating rewrite systems, confluence is decidable. The decision procedure (Knuth and Bendix [18]) is a landmark result in rewriting and is implemented in all confluence tools. It amounts to checking whether all critical pairs are joinable. Critical pairs are formed by overlapping left-hand sides of rewrite rules to create (for finite rewrite systems) a finite number of local peaks $t \mathrel{{}_R{\leftarrow}} s \to_R u$. In Example 1, we have the following critical peaks: One easily checks that the resulting critical pairs (the end points of the peaks) are joinable, meaning that they can be rewritten to the same bean configuration. Hence, confluence is established.
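The critical pair check for terminating systems can be sketched for the special case of string rewriting. The rules in `BEANS` below are an assumption chosen for illustration (b and w stand for black and white beans); they are not claimed to be the rules of Example 1. The procedure is only meaningful for terminating systems, since `reducts` enumerates all reducts of a string.

```python
def rewrite_once(s, rules):
    """All strings obtained from s by one rewrite step."""
    out = set()
    for lhs, rhs in rules:
        start = s.find(lhs)
        while start != -1:
            out.add(s[:start] + rhs + s[start + len(lhs):])
            start = s.find(lhs, start + 1)
    return out

def reducts(s, rules):
    """All strings reachable from s (finite if the system is terminating)."""
    seen, todo = {s}, [s]
    while todo:
        for t in rewrite_once(todo.pop(), rules):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def critical_pairs(rules):
    """Critical pairs arising from overlaps of left-hand sides."""
    pairs = []
    for l1, r1 in rules:
        for l2, r2 in rules:
            # A proper suffix of l1 coincides with a proper prefix of l2.
            for k in range(1, min(len(l1), len(l2))):
                if l1[-k:] == l2[:k]:
                    pairs.append((r1 + l2[k:], l1[:-k] + r2))
            # l2 occurs inside l1 (skipping the overlap of a rule with itself).
            if (l1, r1) != (l2, r2):
                for i in range(len(l1) - len(l2) + 1):
                    if l1[i:i + len(l2)] == l2:
                        pairs.append((r1, l1[:i] + r2 + l1[i + len(l2):]))
    return pairs

def joinable(s, t, rules):
    """Two strings are joinable if they have a common reduct."""
    return bool(reducts(s, rules) & reducts(t, rules))

def confluent_terminating(rules):
    """Confluence test that is valid only for terminating systems."""
    return all(joinable(s, t, rules) for s, t in critical_pairs(rules))

# Hypothetical bean rules: every step removes one bean, so the system terminates.
BEANS = [("bb", "w"), ("ww", "w"), ("bw", "b"), ("wb", "b")]
```

For `BEANS` every critical pair is joinable, so `confluent_terminating(BEANS)` returns `True`. For the terminating system with rules ab → a and ab → b the pair (a, b) consists of two distinct normal forms, and the check fails.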
In general, confluence and termination are undecidable properties of rewrite systems. As a consequence, no single automatable technique is sufficient to determine the status of every possible input problem. Tools implement a number of different techniques that are suitably combined to determine the status of a problem. Even so, tools often fail to determine the status, in part because of the time limits imposed in competitions.
The remainder of this competition report is organized as follows. In the next section, we present a short overview of the organization of CoCo, including a description of the supporting infrastructure. The competition categories of CoCo 2019 are described in Sect. 3, and Sect. 4 briefly describes the participating tools. Section 5 presents the results of CoCo 2019, and we conclude in Sect. 6 with ideas for future editions of CoCo.

Competition
The focus of confluence research has shifted toward automation in the past decade. To stimulate these developments, the Confluence Competition (CoCo) was set up in 2012. Since its creation with 4 tools competing in 2 categories, CoCo has grown steadily and featured 12 categories in 2019, ranging from confluence of various rewrite formalisms to commutation and infeasibility. These are described in the next section. Since 2012 a total of 21 tools have participated in CoCo. Many of the tools participated in multiple categories. Tools operate on problems from the online database of confluence problems (COPS) and a number of secret problems submitted shortly before the competition, in a format suitable for the category in which the tools participate. For each category, 100 problems consisting of all secret problems and a random selection from COPS are collected.
CoCo is executed on the cross-community competition platform StarExec [43]. Tool authors upload their tools to StarExec two weeks before the competition, after which a test run is conducted involving a few selected problems for each category. This allows tool authors to fix last-minute bugs before the live competition. The steering committee of CoCo is responsible for running the competition on StarExec and exporting the results. Each tool has access to a single node and is given 60 s per problem. For a given problem, tools must answer YES (proved) or NO (disproved), followed by a justification that is understandable by a human expert; any other output signals that the tool could not determine the status of the problem. As human expertise is insufficient to guarantee correctness, CoCo supports certification categories, in which tool output is checked by an independent and formally verified certifier.

The possibility in StarExec to reserve a large number of computing nodes makes it possible to complete CoCo within a single slot of a workshop or conference. This live event of CoCo is shared with the audience via the online service LiveView [16], which continuously polls new results from StarExec while the competition is running. Part of the LiveView of CoCo 2019 upon completion is shown in Fig. 1. Since all categories deal with undecidable problems, and developing software tools is error-prone, YES/NO conflicts (which are situations where tools produce contradictory answers) appear once in a while. The real-time display of conflicts allows the CoCo steering committee to take action before winners are announced.

Soon after each competition, the results are made available from the results page.⁴ A few weeks after each live competition, there is a full run of tools on all eligible problems in the COPS database. Authors of tools with incorrect results have the possibility to submit a corrected version for the full run.
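The answer convention and the notion of a YES/NO conflict described in this section can be sketched as follows. This is an illustration only and not the code of StarExec or LiveView; the assumption that the answer appears on the first line of the output, the label `MAYBE` for undetermined results, and all names are hypothetical.

```python
def status(output):
    """Classify tool output; anything other than YES or NO counts as undetermined."""
    lines = output.strip().splitlines()
    first = lines[0].strip() if lines else ""
    return first if first in ("YES", "NO") else "MAYBE"

def conflicts(outputs):
    """Return the problems on which some tool answered YES and another NO.

    outputs maps a problem identifier to a dict from tool name to raw output.
    """
    result = []
    for problem, by_tool in outputs.items():
        answers = {status(text) for text in by_tool.values()}
        if {"YES", "NO"} <= answers:
            result.append(problem)
    return sorted(result)
```

A timeout or crash typically produces no valid first line and is therefore classified as undetermined, which matches the rule that any other output means the status could not be determined.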

COPS
All problems in CoCo are selected from COPS, an online database for confluence and related properties in term rewriting. At the time of writing, COPS contains 1155 problems, including 471 collected from the literature. The problems are numbered consecutively starting from COPS #1. COPS supports several formats, to cater for the various CoCo categories. Via its web interface, everyone can retrieve and download problems and also upload new problems. The interface is designed in a way that novice users can easily learn problem formats. At the same time experts and tool builders can conveniently retrieve problem sets for their research and experiments. The former is achieved by syntax highlighting; for the latter a tagging mechanism is used. Tags are combined into queries for selecting problem sets. Different kinds of tags are supported. First, properties of rewrite systems like left-linearity, groundness, and termination are useful to filter the database for those problems that are supported by a particular tool or technique. These include tags to distinguish the different input formats, which are automatically assigned when problems are submitted. For example, "trs !confluent !non_confluent" is the query to select all first-order rewrite systems whose confluence status is unknown, meaning that no tool produced a YES or NO answer. (At CoCo 2019 this query returned 292 problems. If we include the secret problems, the number is 299.) A second category of tags refers to problems that were used in full runs of CoCo. The literature tag is assigned to problems that appear in the literature, which includes papers presented at informal workshops like the International Workshop on Confluence and Ph.D. theses.

⁴ http://project-coco.uibk.ac.at/results/
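The semantics of a tag query such as "trs !confluent !non_confluent" can be illustrated with a small sketch. It assumes that a query is a whitespace-separated list of tags in which a leading `!` negates a tag; the problem identifiers and tag assignments in `DATABASE` are invented for the example and do not reflect actual COPS data or the implementation of the COPS web interface.

```python
def matches(tags, query):
    """Check whether a set of tags satisfies a query.

    A plain token requires the tag to be present; a token starting
    with '!' requires the tag to be absent.
    """
    for token in query.split():
        if token.startswith("!"):
            if token[1:] in tags:
                return False
        elif token not in tags:
            return False
    return True

def select(database, query):
    """Return the identifiers of all problems whose tags satisfy the query."""
    return sorted(pid for pid, tags in database.items() if matches(tags, query))

# Hypothetical problems and tags, for illustration only.
DATABASE = {
    "p1": {"trs", "confluent", "left_linear"},
    "p2": {"trs", "non_confluent"},
    "p3": {"trs", "left_linear"},
    "p4": {"ctrs"},
}
```

On this toy database, `select(DATABASE, "trs !confluent !non_confluent")` returns only `"p3"`, the single first-order system without a known confluence status.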
The data in COPS consist of problems and tags. Most of the tag files are generated automatically or updated by a collection of scripts that call external tools. To prevent duplicate problems in COPS, a duplicate checker is used, which is based on a program that transforms problems into a canonical form that is invariant under permutation of rules and renaming of function symbols. Currently, only problems in the basic TRS format (first-order, no conditions, no sorts) are supported.
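The idea of a canonical form that is invariant under permutation of rules and renaming of function symbols can be sketched by brute force. The method actually used by the COPS duplicate checker is not described here, so the following is only an assumed, exponential-time illustration for small systems; it additionally normalizes variable names, and the term encoding (variables as strings, applications as tuples) is an assumption of the sketch.

```python
from itertools import permutations

def symbols(term):
    """Function symbols occurring in a term (variables are plain strings)."""
    if isinstance(term, str):
        return set()
    return {term[0]}.union(*(symbols(t) for t in term[1:]))

def rename(term, fmap, vmap):
    """Rename function symbols via fmap and variables in order of first occurrence."""
    if isinstance(term, str):
        return vmap.setdefault(term, f"x{len(vmap)}")
    return (fmap[term[0]],) + tuple(rename(t, fmap, vmap) for t in term[1:])

def canonical(rules):
    """Smallest renamed and sorted rule list over all symbol renamings."""
    syms = sorted(set().union(*(symbols(l) | symbols(r) for l, r in rules)))
    best = None
    for perm in permutations(range(len(syms))):
        fmap = {s: f"f{i}" for s, i in zip(syms, perm)}
        renamed = []
        for lhs, rhs in rules:
            vmap = {}  # variables are local to each rule
            renamed.append((rename(lhs, fmap, vmap), rename(rhs, fmap, vmap)))
        # Sorting by repr avoids comparing strings with tuples directly.
        candidate = tuple(sorted(renamed, key=repr))
        if best is None or repr(candidate) < repr(best):
            best = candidate
    return best
```

Two rule lists are duplicates in this sense exactly when `canonical` returns the same value for both. Trying all permutations of the signature is only feasible for a handful of symbols; a practical checker would need a more refined strategy.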

CoCoWeb
Most of the tools that participate in CoCo can be downloaded, installed, and run on one's local machine, but this can be a painful process. Only a few confluence tools (we are aware of CO3 [29], ConCon [41], and CSI [27,48]) provide a convenient web interface to easily test the status of a confluence problem that is provided by the user. In [16] CoCoWeb is presented, a web interface to execute confluence tools on confluence problems. This provides a single entry point to all tools that participate in CoCo. The typical use of CoCoWeb is to test whether a given confluence problem is known to be confluent or not. This is useful when preparing or reviewing an article, preparing or correcting exams about term rewriting, and when contemplating submitting a challenging problem to COPS. In particular, CoCoWeb is useful when crafting or looking for examples to illustrate a new technique. Using CoCoWeb on the rewrite system from Example 2 (COPS #47), we learn that (automatically) disproving confluence is much harder than showing unique normal forms (UNC); only a single confluence tool (CSI) answers NO on this problem (and only since 2018). This answer is certified by CeTA (see the description under CPF-TRS in Sect. 3).

Categories
In this section, we briefly describe the 12 categories of CoCo 2019. For each category, we list the participating tools, and for most we provide one or two example problems.

TRS
The category TRS is about confluence of first-order term rewriting and has been part of CoCo from the very beginning. We give two examples. The first one is Combinatory Logic, which is confluent because it satisfies the orthogonality criterion. In 2019, three tools contested the TRS category: ACP, CoLL-Saigawa, and CSI.

CPF-TRS
CPF-TRS is a category for certified confluence proofs. CPF stands for Certification Problem Format,⁷ an extendable format to express not only confluence but also termination and complexity proofs of first-order rewrite systems [37]. The purpose of the certification categories (CPF-TRS and CPF-CTRS) is to ensure that tools produce correct answers. In these categories, tools have to produce certified proofs with their answers. The predominant approach to achieve this uses a combination of a confluence prover and an independent certifier. First, the confluence prover analyzes confluence as usual, restricting itself to criteria supported by the certifier. If it is successful, the prover outputs its proof in CPF, which is then checked by the certifier. In our case, this is CeTA [45], a state-of-the-art certifier for rewriting techniques generated from IsaFoR,⁸ a formalization of first-order term rewriting in the Isabelle/HOL proof assistant [28]. Consequently, certificates must be expressed in CPF. This category has also been part of CoCo since 2012. For CoCo 2019 the tools ACP and CSI teamed up with CeTA.

⁷ http://cl-informatik.uibk.ac.at/software/cpf/
⁸ http://cl-informatik.uibk.ac.at/software/ceta/

CTRS and CPF-CTRS
The categories CTRS and CPF-CTRS, introduced in 2014 and 2015, respectively, are concerned with (certified) confluence of conditional term rewriting, a formalism in which rewrite rules come equipped with conditions that are evaluated recursively using the rewrite relation.
The declaration (CONDITIONTYPE ORIENTED) in the above example problem specifies that the conditions (x == true and x == false) of the rules are interpreted as reachability ($\to^*$); a term not(t) can be rewritten to false using the first rule provided the argument term t rewrites to true. The competition is restricted to this kind of conditional rewriting since the participating tools are. In 2019, three tools contested the CTRS category: ACP, CO3, and ConCon. The combination of ConCon and CeTA was the only participant in the CPF-CTRS category.

HRS
The HRS category, introduced in 2015, deals with confluence of higher-order rewriting, i.e., rewriting with binders and functional variables, like in the following example: Here, Z is a higher-order variable, which is apparent from the variable declaration Z : o -> o. The example is not confluent because the term f (mu (\x.s x)) (mu (\x.s x)) can be rewritten to both a and b. The format supported by CoCo goes back to the higher-order rewrite systems of Mayr and Nipkow [21], with small modifications for increased readability. In 2019, the tool CSI^ho was the only participant of the HRS category.

GCR
This category is about ground-confluence of many-sorted term rewrite systems and was also introduced in 2015. The signature declaration (f 0 0 -> 1) in the example below (COPS #558) ensures that the binary function symbol f can only appear at the root of terms. Note that the (c -> 0) declaration specifies the constant symbol c, which does not appear in the rewrite rules, but is used to build the set of ground terms.
If (c -> 0) is omitted, then the system is ground confluent because the unjoinable peak f(c, b) ← f(c, a) → f(a, a) does not exist. In 2019, the tools AGCP and FORT participated in the GCR category.

NFP, UNC, and UNR
The three categories NFP, UNC, and UNR were introduced in 2016 and are about properties of first-order term rewrite systems related to unique normal forms. A rewrite system R has the normal form property (NFP) if every term that is convertible to a normal form rewrites to that normal form (for all terms t and u, if $t \leftrightarrow_R^* u$ and u is a normal form then $t \to_R^* u$). We say that R has unique normal forms with respect to conversion (UNC) if different normal forms are not convertible (for all normal forms t and u, if $t \leftrightarrow_R^* u$ then $t = u$). Finally, R has unique normal forms with respect to reduction (UNR) if no term rewrites to different normal forms. These three properties are weaker than confluence (CR): […] is not confluent but satisfies the three weaker properties. In 2019 CSI and FORT participated in all three categories whereas ACP joined the UNC category.
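For a finite abstract rewrite relation, the four properties CR, NFP, UNC, and UNR can be checked directly from their definitions. The sketch below is only meant to make the definitions concrete; it does not apply to term rewrite systems in general (where the properties are undecidable), and the representation of the relation as a dictionary of successor lists is an assumption.

```python
def closure(steps, start):
    """All elements reachable from start by zero or more steps."""
    seen, todo = {start}, [start]
    while todo:
        for y in steps.get(todo.pop(), ()):
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen

def analyse(steps):
    """Decide CR, NFP, UNC and UNR for a finite rewrite relation."""
    elems = set(steps) | {y for ys in steps.values() for y in ys}
    sym = {x: set() for x in elems}  # symmetric closure of the step relation
    for x, ys in steps.items():
        for y in ys:
            sym[x].add(y)
            sym[y].add(x)
    down = {x: closure(steps, x) for x in elems}  # reducts of x
    conv = {x: closure(sym, x) for x in elems}    # elements convertible to x
    nf = {x for x in elems if not steps.get(x)}   # normal forms
    return {
        # any two reducts of the same element have a common reduct
        "CR": all(down[t] & down[u]
                  for s in elems for t in down[s] for u in down[s]),
        # everything convertible to a normal form rewrites to it
        "NFP": all(u in down[t] for u in nf for t in conv[u]),
        # convertible normal forms are equal
        "UNC": all(t == u for t in nf for u in conv[t] & nf),
        # no element rewrites to two different normal forms
        "UNR": all(len(down[s] & nf) <= 1 for s in elems),
    }
```

For the relation with steps b → a, b → c and c → c, the function reports that UNC and UNR hold while CR and NFP fail: a is the only normal form, but c is convertible to a without rewriting to it. Adding the steps d → c and d → e yields a relation that satisfies UNR but not UNC.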

COM
The category COM is about commutation of first-order rewrite systems and was introduced in 2019. Two rewrite systems R and S commute if the inclusion

$${}_R^*{\leftarrow} \cdot \to_S^* \;\subseteq\; \to_S^* \cdot {}_R^*{\leftarrow}$$

holds. Here, $\cdot$ denotes relation composition. Commutation is an important generalization of confluence. Apart from direct applications in rewriting, e.g., for confluence, standardization, normalization, and relative termination, commutation is the basis of many results in computer science, like correctness of program transformations [17], and bisimulation up-to [33].
To ensure compatibility of the signatures of the rewrite systems R and S, function symbols and variables in S are renamed on demand. We give an example of a commutation problem that illustrates the renaming. Consider COPS #82 (consisting of the rewrite rules f(a) → f(f(a)) and f(x) → f(a)) and COPS #80 (consisting of a → f(a, b) and f(a, b) → f(b, a)). Since function symbol f is unary in the first and binary in the second rewrite system, it is renamed to f′ in COPS #80: The correct answer of this commutation problem is YES since the critical peak of R and S can be closed by a decreasing diagram [1]. To reuse existing systems and avoid duplication, in COPS this problem is given with a (COPS 82 80) declaration, and an inlining tool generates the earlier problem (by replacing the (COPS 82 80) declaration with the content of COPS #82 and COPS #80, with f in the latter renamed into f′ as described above) before it is passed to tools participating in the commutation category. The COM category was contested by ACP, CoLL, and FORT.
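The renaming step for clashing function symbols can be sketched as follows. This is not the inlining tool used by CoCo; it assumes that a clashing symbol of the second system is renamed by appending a prime, that the primed name is not already in use, and that terms are encoded with variables as strings and applications as tuples. Renaming of variables is omitted.

```python
def arities(rules):
    """Map each function symbol to the set of arities with which it occurs."""
    table = {}

    def visit(term):
        if isinstance(term, tuple):
            table.setdefault(term[0], set()).add(len(term) - 1)
            for arg in term[1:]:
                visit(arg)

    for lhs, rhs in rules:
        visit(lhs)
        visit(rhs)
    return table

def rename_clashes(rules_r, rules_s, mark="'"):
    """Rename symbols of S whose arity differs from their arity in R."""
    ar_r, ar_s = arities(rules_r), arities(rules_s)
    clash = {f for f in ar_s if f in ar_r and ar_r[f] != ar_s[f]}

    def ren(term):
        if isinstance(term, str):  # variables are left untouched
            return term
        head = term[0] + mark if term[0] in clash else term[0]
        return (head,) + tuple(ren(arg) for arg in term[1:])

    return [(ren(lhs), ren(rhs)) for lhs, rhs in rules_s]

a, b = ("a",), ("b",)
# COPS #82: f(a) -> f(f(a)), f(x) -> f(a)
COPS_82 = [(("f", a), ("f", ("f", a))), (("f", "x"), ("f", a))]
# COPS #80: a -> f(a, b), f(a, b) -> f(b, a)
COPS_80 = [(a, ("f", a, b)), (("f", a, b), ("f", b, a))]
```

Calling `rename_clashes(COPS_82, COPS_80)` leaves the constants a and b unchanged and replaces the binary f of COPS #80 by the primed symbol, so that both systems can be combined over one signature.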

INF
The INF category is about infeasibility problems. It was also introduced in 2019. Infeasibility problems originate from different sources. Critical pairs in a conditional rewrite system are equipped with conditions. If no satisfying substitution for the variables in the conditions exists, the critical pair is harmless and can be ignored when analyzing confluence of the rewrite system in question. In this case, the critical pair is said to be infeasible [31, Definition 7.1.8]. Sufficient conditions for infeasibility of conditional critical pairs are reported in [19,42].
Another source of infeasibility problems is the dependency graph in termination analysis of rewrite systems [6]. An edge from dependency pair $\ell_1 \to r_1$ to dependency pair $\ell_2 \to r_2$ exists in the dependency graph if two substitutions σ and τ can be found such that $r_1\sigma$ rewrites to $\ell_2\tau$. (By renaming the variables in the dependency pairs apart, a single substitution suffices.) If no such substitutions exist, there is no edge, which may ease the task of proving termination of the underlying rewrite system [13,24].
We provide two example problems. The first one stems from the conditional critical pair between the two conditional rewrite rules in COPS #547: The correct answer of this infeasibility problem is YES since no term in the underlying conditional rewrite system rewrites to both a and b. In COPS, this problem is given with a (COPS n) declaration that references the underlying rewrite system, and an inlining tool generates the earlier problem before it is passed to tools participating in the infeasibility category. The == sign in the condition of infeasibility problems is interpreted as reachability ($\to^*$) if the rewrite system referenced in the (COPS n) declaration is a TRS or an oriented CTRS. If it is a semi-equational CTRS, then == is interpreted as convertibility ($\leftrightarrow^*$).
The second example is related to Example 2 from the introduction and is a special case since the condition in the infeasibility problem contains no variables: It has YES as correct answer since the term G(A) does not rewrite to A. This answer can be used to conclude that the underlying rewrite system is not confluent.

SRS
The category SRS is about confluence of string rewriting. String rewrite systems are term rewrite systems in which terms are strings. To ensure that the infrastructure developed for TRSs can be reused, the TRS format is used with the restriction that all function symbols are unary. So a string rewrite rule ab → ba is rendered as a(b(x)) → b(a(x)) where x is a variable. A concrete example is given below: The correct answer of this problem is YES since the addition of the redundant rules [26] f(x) -> f(f(f(x))) and f(x) -> x makes the critical pairs of the SRS development closed [32].
The SRS category was created to foster research on confluence techniques for string rewriting. In the Termination Competition, there is an active community developing powerful techniques for (relative) termination of string rewrite systems. We anticipate that these are beneficial when applied to confluence analysis.
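The rendering of string rewrite rules as term rewrite rules over unary function symbols can be sketched in a few lines. The function names are hypothetical, each letter is assumed to be a single character, and the output imitates the rule syntax quoted above rather than a complete problem file.

```python
def string_to_term(word, var="x"):
    """Render a string as nested unary function applications ending in a variable."""
    term = var
    for letter in reversed(word):
        term = f"{letter}({term})"
    return term

def encode_rule(lhs, rhs):
    """Render the string rewrite rule lhs -> rhs as a rule over unary symbols."""
    return f"{string_to_term(lhs)} -> {string_to_term(rhs)}"
```

For instance, `encode_rule("ab", "ba")` produces `a(b(x)) -> b(a(x))`, and the empty string is rendered as the variable alone, so `encode_rule("f", "")` produces `f(x) -> x`.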
The tools ACP, CSI, CoLL-Saigawa, and noko-leipzig participated in the SRS category.

Tools
In this section, we briefly present the tools that participated in CoCo 2019. More detailed descriptions are available online.⁹ All tools are available for testing via CoCoWeb.

ACP
The tool ACP¹⁰ has been participating in CoCo from the beginning [5]. In 2019, it participated in the COM, CPF-TRS, CTRS, SRS, TRS and UNC categories, winning three of them. New techniques for the UNC category are described in [3]. For the TRS category, ACP supports ordered rewriting [20]. ACP is written in SML/NJ.

AGCP
The tool AGCP participated in the GCR category. It uses rewriting induction to (dis)prove ground confluence of many-sorted rewrite systems [2,4]. AGCP is written in SML/NJ.

CeTA
CeTA is a certifier for (non-)confluence (and other properties) of rewrite systems with and without conditions [45]. It is used by ACP, CSI and ConCon to certify their generated (non-)confluence proofs. The combinations CSI+CeTA and ConCon+CeTA won the CPF-TRS and CPF-CTRS categories, respectively. New in 2019 is the support for ordered completion proofs for infeasibility of conditional rules and critical pairs.

⁹ http://project-coco.uibk.ac.at/2019/participants/
¹⁰ http://www.nue.ie.niigata-u.ac.jp/tools/acp/

CO3
The tool CO3 participated in the CTRS and INF categories. CO3 is written in OCaml. It incorporates the new technique of narrowing trees [30]. An early description can be found in [29].

CoLL
The tool CoLL participated in the new COM category. It is written in OCaml and implements various commutation criteria for left-linear rewrite systems [36].

CoLL-Saigawa
The tool CoLL-Saigawa participated in the SRS and TRS categories. It is a combination of CoLL, described above, and the earlier tool Saigawa [15] that participated in CoCo from the very start. CoLL-Saigawa is written in OCaml.

ConCon
The tool ConCon participated in the CTRS, CPF-CTRS and INF categories. ConCon implements several techniques for oriented conditional rewrite systems [40] and employs MaedMax [46] for infeasibility. ConCon is written in Scala.

CSI
The tool CSI has been participating in CoCo from the beginning [27,48]. In 2019, it participated in the CPF-TRS, NFP, SRS, TRS, UNC and UNR categories, winning four of them (the CPF-TRS category in combination with CeTA). CSI is written in OCaml.

CSI^ho
The tool CSI^ho was the only participant of the HRS category. It implements several techniques for (dis)proving confluence of higher-order rewrite systems [25]. CSI^ho is based on CSI and written in OCaml.

FORT
The tool FORT is a decision and synthesis tool [34,35] for the first-order theory of rewriting for finite left-linear, right-ground rewrite systems. It implements the decision procedure for this theory [10], which uses tree automata techniques. In 2019 it participated in the COM, GCR, NFP, UNC and UNR categories, surprisingly winning the COM category. FORT is written in Java.

infChecker
The tool infChecker is a new participant of CoCo. It uses the theorem prover Prover9 [22] and the model finding tools AGES [14] and Mace4 [22]. Due to the latter, it is the only tool in the INF category that supports NO answers. The tool infChecker is written in Haskell.

MaedMax
The new tool MaedMax participated in the INF category. It implements maximal ordered completion [46] and can output certificates [38] that can be checked by CeTA. The tool was developed as a completion tool and also works as a first-order theorem prover. Given an infeasibility problem, MaedMax translates it into an equivalent satisfiability problem. MaedMax is written in OCaml.

Moca
The tool Moca is a first-order theorem prover and another new participant of CoCo, joining the INF category. It implements maximal ordered completion [46] and the split-if encoding of [9]. Moca is written in Haskell.

noko-leipzig
The new tool noko-leipzig participated in the SRS category. It uses arctically weighted automata [12] for disproving confluence and is written in Haskell.

Results
In this section, we present the results of CoCo 2019. For each category, we mention problem selection and summarize the competition data. For every category, a problem set consists of 100 problems, including all secret problems and a certain number of unresolved problems in the last full run. These problems were randomly selected from the COPS database, with the seed number 273 controlling the selection. The number was composed of the three seed digits 2 (Hubert Garavel), 7 (Geoff Sutcliffe), 3 (Akihisa Yamada) provided by the panel members. For each category, tools are ranked based on the total number of YES and NO answers. The time tools spent on the problems has no effect on the score.
Full details are available online.

TRS
The column ? lists the number of erroneous answers, and the column ! lists the number of unique answers, which are the answers that no other tool produced. Moreover, the column ∅ gives the average time spent on each problem (including timeouts). ACP was ahead by 4 problems, breaking the 3-year hegemony of CSI. Due to a wrong answer for COPS #538, CoLL-Saigawa is not ranked.
In total, 82 problems were solved and 18 problems, including 12 non-left-linear systems, were unsolved. One of the oldest unsolved problems is COPS #126, consisting of the single rule f(f(x, y), z) → f(f(x, z), f(y, z)).

CPF-TRS
For the CPF-TRS category, the same problems as in the TRS category were selected. The results are summarized below:

Outlook
In the near future, we plan to merge CoCo with COPS and CoCoWeb, to achieve a single entry point for confluence problems, tools, and competitions. Moreover, the COPS submission interface will be extended with functionality to support submitters of new problems as well as the CoCo steering committee. We plan to reimplement the LiveView software for real-time visualization of CoCo runs, taking into account current limitations, future developments and demands. We will implement flexible scoring schemes and support joint categories based on ordered lists of properties. We will also investigate what additional features are needed to support our sister competition termCOMP.
We anticipate that in the years ahead new categories will be added to CoCo. Natural candidates are rewriting modulo AC, nominal rewriting, and constraint rewriting. Also, we will consider measures to increase the number of tools participating in the HRS category, which is the only CoCo category devoted to higher-order rewriting. Given the large research activity in this area, we are keen to keep the HRS category alive. One possibility is to allow a dependently typed higher-order formalism for expressing problems.
Apart from the improvements mentioned in the preceding paragraphs, the competition serves to highlight progress and challenges in confluence research. On the one hand, the gap between the certified categories and their uncertified counterparts is steadily diminishing, showcasing the progress on the verification front as well as suggesting which techniques are suitable candidates for formal verification to close the gap. On the other hand, problems whose status (YES or NO) is unknown or whose status is known from the literature but out of reach of tools lead to further research into (automatable) techniques for (dis)proving confluence and related properties. Examples include [26,30,47].

Fig. 1 Part of the LiveView of CoCo 2019 upon completion

The TRS category had one secret problem, COPS #1133. String rewrite systems were excluded from the selection due to the creation of the SRS category. The results of the TRS category are summarized in the following table:

The win of CSI+CeTA is no surprise since many of the techniques implemented in CSI have been certified. The numbers for ACP+CeTA are explained by a change in the CPF format that was missed by the ACP developers. (From the last column, we infer that ACP spent an average of about 6 s to produce a proof, which then could not be certified by CeTA.) For the full run of CoCo 2019, this was corrected, resulting in the following numbers (out of 501 problems):

Of the 100 selected problems, 32 are left-linear and right-ground, and hence in the scope of FORT:

The outcome of the new COM category was a surprise. CoLL is a designated tool for commutation of left-linear rewrite systems and ACP has support for arbitrary rewrite systems. Due to erroneous answers by these tools, FORT came out on top:

The new INF category had the highest number of contestants, including four new tools, and infChecker won by a large margin. It was the only tool capable of producing NO answers:

A total of six secret problems (COPS #1125-#1137) were

CTRS
The CTRS category had a surprise winner in 2019. Due to wrong answers by ConCon and CO3, the first and second ranked tools of every earlier CoCo, the relative newcomer ACP (participating in the CTRS category since 2018) won.

²⁶ CoCo 2020 featured two tools in the HRS category.

SRS
In the SRS category, two secret problems (COPS #1131 and COPS #1132) were submitted. The new tool noko-leipzig produced the most NO answers, but the YES answers by CSI made the difference: