1 Introduction

Arguments marked by the phrase “without loss of generality” (w.l.o.g.) are very common in mathematical proofs written by humans. An argument of this kind is most often based on a symmetry or an invariance in the problem [9].

Suppose that we are going to prove, by an algebraic method, that the three median lines of a triangle meet at a point (Fig. 1). Six real variables are needed to represent three points on a plane. Since the concepts of ‘median lines’ and ‘meeting at a point’ are translation-invariant, we may fix one of the corners at the origin. Furthermore, because these concepts are also invariant under any invertible linear map, we may fix the other two corners at, e.g., (1, 0) and (0, 1). Thus, all six variables are eliminated and the proof task becomes much easier.
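For concreteness (a quick check we add here; it is not part of the original argument), with the corners fixed at (0, 0), (1, 0), and (0, 1), the midpoints of the three sides are \((\tfrac{1}{2}, \tfrac{1}{2})\), \((0, \tfrac{1}{2})\), and \((\tfrac{1}{2}, 0)\), and the point \((\tfrac{1}{3}, \tfrac{1}{3})\) lies two thirds of the way along each median from the corresponding corner:

$$ \left( \tfrac{1}{3}, \tfrac{1}{3}\right) = \tfrac{2}{3}\left( \tfrac{1}{2}, \tfrac{1}{2}\right) = (1, 0) + \tfrac{2}{3}\left( \left( 0, \tfrac{1}{2}\right) - (1, 0)\right) = (0, 1) + \tfrac{2}{3}\left( \left( \tfrac{1}{2}, 0\right) - (0, 1)\right) . $$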

W.l.o.g. arguments may thus have a strong impact on the efficiency of inference. They have drawn attention in several research areas, including the relative strength of proof systems (e.g., [2, 3, 12, 20]), propositional SAT (e.g., [1, 6, 8, 17, 19]), proof assistants [9], and algebraic methods for geometry problem solving [7, 10].

Among others, Iwane and Anai [10] share exactly the same objective as ours: both aim at solving geometry problems stated in natural language, using an algebraic method as the backend. Logical formulas resulting from the mechanical translation of problem text tend to be huge and highly redundant, while the computational cost of algebraic methods is generally quite sensitive to the size of the input, measured by, e.g., the number of variables. Simplification of the input formula is hence a mandatory part of such a problem-solving system.

Fig. 1. Variable Elimination w.l.o.g. by Invariance

Iwane and Anai’s method operates on first-order formulas of real-closed fields (RCFs), i.e., quantified boolean combinations of equalities and inequalities between polynomials. They proposed to detect the invariance of a problem by testing the invariance of the polynomials under translation, scaling, and rotation. While conceptually simple, this amounts to discovering the geometric properties of the problem solely from its algebraic representation. The detection of rotational invariance is especially problematic because, to test it on a system of polynomials, one needs to identify all the pairs (or triples) of variables that originate from the x and y (and z) coordinates of the same points. Thus their algorithm for 2D rotational invariance already incurs a search over a large number of possibilities, and they left the detection of 3D rotational invariance untouched. Davenport [7] also suggests essentially the same method.

In this paper, we propose to detect the invariance in a higher-level language than that of RCF. We use the algebraically indexed types (AITs) proposed by Atkey et al. [4] as the representation language. In AIT, each symbol in a formula has a type with indices. The indexed type of a function indicates that its output undergoes the same transformation as its inputs, or one related to it. The invariances of the individual functions are combined via type reconstruction, and an invariance of the whole problem is thereby detected.

The contributions of the current paper are summarized as follows:

  1. A type reconstruction algorithm for AIT is derived. Atkey et al. [4] laid out the formalism of AIT but did not provide a type inference/reconstruction algorithm. We devised, for a version of AIT, a type reconstruction algorithm that is based on semantic unification in the theory of transformation groups.

  2. A set of variable elimination rules is worked out. Type reconstruction in AIT discerns a more fine-grained notion of invariance than previous approaches. We derived a set of elimination rules that covers all the cases.

  3. The practicality of the proposed method is verified; it significantly enhanced the performance of a problem solver based on quantifier elimination for RCF, especially on problems from past International Mathematical Olympiads.

In the rest of the paper, we first introduce a math problem solver, on which the proposed method was implemented, and summarize the formalism of AIT. We then detail the type reconstruction procedure and the variable elimination rules. We finally present the experimental results and conclude the paper.

2 Todai Robot Math Solver and Problem Library

This work is a part of the development of the Todai Robot Math Problem Solver (henceforth ToroboMath) [13,14,15,16]. Figure 2 presents an overview of the system. ToroboMath is targeted at solving pre-university math problems. Our long-term goal is to develop a system that solves problems stated in natural language.

Fig. 2. Overview of Todai Robot Math Problem Solver

The natural language processing (NLP) module of the system accepts a problem text and derives its logical representation through syntactic analysis. Currently, it produces a correct logical form for around 50% of sentences [13], which is not high enough to cover a wide variety of problems. Although the motivation behind the current work is to cope with the huge formulas produced by the NLP module, we instead used a library of manually formalized problems for the evaluation of the formula simplification procedure.

Fig. 3. Example of Manually Formalized Problem (IMO 2012, Problem 5)

The problem library has been developed along with the ToroboMath system. It contains approximately one thousand math problems collected from several sources including the International Mathematical Olympiads (IMOs). Figure 3 presents a problem that was taken from IMO 2012.

Table 1. Example of Primitive Types
Fig. 4. Example of Axiom

The problems in the library are manually encoded in a polymorphic higher-order language, which is the same language as the output of the NLP module. Table 1 lists some of its primitive types. The language includes a large set of predicate and function symbols that are tailored for formalizing pre-university math problems. Currently, 1387 symbols are defined using 2808 axioms. Figure 4 provides an example: the axiom that defines the predicate \(\mathtt{maximum}\).

The problem solving module of the ToroboMath accepts a formalized problem and iteratively rewrites it using: (1) basic transformations such as \(\forall x.(x = \alpha \rightarrow \phi (x)) \Leftrightarrow \phi (\alpha )\) and beta-reduction, (2) simplification of expressions such as polynomial division and integration by computer algebra systems (CASs), and (3) the axioms that define the predicate and function symbols.
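As a toy illustration of our own (not taken from the system's trace), a rewrite of the first kind turns the subformula \(\forall x.(x = 2 \rightarrow x^2 > 3)\) into \(2^2 > 3\), which a CAS then simplifies to true:

$$ \forall x.(x = 2 \rightarrow x^2 > 3) \;\Leftrightarrow \; 2^2 > 3 \;\Leftrightarrow \; \top . $$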

Once the rewritten formula is in the language of real-closed fields (RCFs) or Peano arithmetic, it is handed to a solver for the theory. For RCF formulas, we use an implementation of the quantifier-elimination (QE) procedure for RCF based on cylindrical algebraic decomposition. Finally, we solve the resulting quantifier-free formula with CASs and obtain the answer. The time complexity of RCF-QE is quite high; it is doubly exponential in the number of variables [5]. Hence, the simplification of the formula before RCF-QE is a crucial step.

3 Algebraically Indexed Types

This section summarizes the framework of AIT. We refrain from presenting it in full generality and describe its application to geometry ([4, \(\S \)2]), together with the restrictions we imposed when incorporating it into the type system of ToroboMath.

In AIT, some of the primitive types have associated indices. An index represents a transformation on the object of that type. For instance, in \(\texttt {Vec} \langle B,t \rangle \), the index B stands for an invertible linear transformation and t stands for a translation. The index variables bound by universal quantifiers signify that a function of that type is invariant under any transformations indicated by the indices, e.g.,

$$\begin{aligned}\begin{gathered} \texttt {midpoint} : \forall B\mathord :\mathsf {GL}_2.\forall t\mathord :\mathsf {T}_2.\ \texttt {Vec} \langle B,t \rangle \rightarrow \texttt {Vec} \langle B,t \rangle \rightarrow \texttt {Vec} \langle B,t \rangle . \end{gathered}\end{aligned}$$

The type of \(\texttt {midpoint} \) certifies that, when two points P and Q undergo an arbitrary affine transformation, the midpoint of P and Q moves accordingly.

3.1 Sort and Index Expression

The sort of an index signifies the kind of transformations represented by the index. We assume the set Sort of index sorts includes \(\mathsf {GL}_k\;(k = 1,2,3)\) (general linear transformations), \(\mathsf {O}_k\;(k=2,3)\) (orthogonal transformations), and \(\mathsf {T}_k\;(k=2,3)\) (translations). In the type of \(\texttt {midpoint} \), B is of sort \(\mathsf {GL}_2\) and t is of sort \(\mathsf {T}_2\).

An index expression is composed of index variables and index operators. In the current paper, we use the following operators: \(\langle +, -, 0 \rangle \) are addition, negation, and unit of \(\mathsf {T}_k\) \((k = 2, 3)\); \(\langle \;\cdot \;, {}^{-1}, 1 \rangle \) are multiplication, inverse, and unit of \(\mathsf {GL}_k\) and \(\mathsf {O}_k\); \(\det \) is the determinant; \(|\cdot |\) is the absolute value. An index context \(\varDelta \) is a list of index variables paired with their sorts: \(\varDelta = i_1\mathord : S_1, i_2\mathord : S_2, \dots , i_n\mathord : S_n\). The well-sortedness of an index expression e of sort S, written \(\varDelta \vdash e : S\), is defined analogously to the well-typedness in simple type theory.

3.2 Type, Term, and Typing Judgement

The set of primitive types, \(\textsc {PrimType} = \{\texttt {Bool} , \texttt {R} , \texttt {2d\!.Vec} , \texttt {3d\!.Vec} , \texttt {2d\!.Shape} , \dots \}\), is the same as that in the language of ToroboMath. A function tyArity: \(\textsc {PrimType} \rightarrow \textsc {Sort}^{*}\) specifies the number and sorts of indices appropriate for the primitive types: e.g., tyArity\((\texttt {2d.Vec} ) = (\mathsf {GL}_2, \mathsf {T}_2)\).

A judgement \(\varDelta \vdash A \textsf { type} \) means that type A is well-formed and well-indexed with respect to an index context \(\varDelta \). Here are the derivation rules:

$$\begin{aligned}\begin{gathered} \frac{ \texttt {X} \in \textsc {PrimType} \;\;\;\; \text {tyArity}(\texttt {X} ) = (S_1, \dots , S_m) \;\;\;\; \{\varDelta \vdash e_j:S_j\}_{1 \le j \le m} }{ \varDelta \vdash \texttt {X} \langle e_1, \dots , e_m \rangle \textsf { type} }\;\textsc {TyPrim} \\[1ex] \frac{\varDelta \vdash A \textsf { type} \;\;\; \varDelta \vdash B \textsf { type} }{\varDelta \vdash A \rightarrow B \textsf { type} }\;\textsc {TyArr} \;\;\;\;\;\;\; \frac{\varDelta , i\mathord :S \vdash A \textsf { type} }{\varDelta \vdash \forall i\mathord :S.A \textsf { type} }\;\textsc {TyForall} \end{gathered}\end{aligned}$$

While Atkey et al.’s system is formulated in the style of System F, we allow the quantifiers only at the outermost (prenex) position. The restriction permits an efficient type reconstruction algorithm analogous to Hindley-Milner’s, while being expressive enough to capture the invariance of the pre-defined functions in ToroboMath and the invariance in the majority of math problems.

The well-typedness of a term M, written \(\varDelta ; \varGamma \vdash M : A\), is judged with respect to an index context \(\varDelta \) and a typing context \(\varGamma = x_1:A_1, \dots , x_n:A_n\). A typing context is a list of variables with their types. A special context \(\varGamma _{\mathrm {ops}}\) consists of the pre-defined symbols and their types, e.g., \(+:\forall s\mathord :\mathsf {GL}_1.\ \texttt {R} \langle s \rangle \rightarrow \texttt {R} \langle s \rangle \rightarrow \texttt {R} \langle s \rangle \in \varGamma _{\mathrm {ops}}\). We assume \(\varGamma _{\mathrm {ops}}\) is always available in the typing derivation and suppress it in a judgement. The typing rules are analogous to those for lambda calculus with rank-1 polymorphism except for TyEQ:

$$\begin{aligned}\begin{gathered} \frac{ x : A \in \varGamma }{ \varDelta ; \varGamma \vdash x : A } \textsc {Var} \;\;\; \frac{\varDelta ; \varGamma \vdash M: \forall i\mathord :S.A \;\;\; \varDelta \vdash e\mathord :S}{\varDelta ; \varGamma \vdash M: A\{i \mapsto e\}} \;\textsc {UnivInst} \;\;\; \frac{\varDelta ; \varGamma , x: A \vdash M: B}{\varDelta ; \varGamma \vdash \lambda x.M: A \rightarrow B} \textsc {Abs} \\[1ex] \frac{\varDelta ; \varGamma \vdash M: A \rightarrow B \;\;\; \varDelta ; \varGamma \vdash N: A}{\varDelta ; \varGamma \vdash M N: B} \textsc {App} \;\;\; \frac{\varDelta ; \varGamma \vdash M: A \;\;\; \varDelta \vdash A \equiv B}{\varDelta ; \varGamma \vdash M: B} \;\textsc {TyEQ} \end{gathered}\end{aligned}$$

In the Abs and App rules, the meta-variables A and B only designate a type without quantifiers. In the UnivInst rule, \(A\{i \mapsto e\}\) is the result of substituting e for i in A. The ‘polymorphism’ of the types with quantifiers hence takes place only when a pre-defined symbol (e.g., \(\texttt {midpoint} \)) enters a derivation via the Var rule and then the bound index variable is instantiated via the UnivInst rule.

The type equivalence judgement \(\varDelta \vdash A \equiv B\) in the TyEQ rule equates two types involving semantically equivalent index expressions; thus, e.g., \(s\mathord :\mathsf {GL}_1 \vdash \texttt {R} \langle s\cdot s^{-1} \rangle \equiv \texttt {R} \langle 1 \rangle \) and \(O\mathord :\mathsf {O}_2 \vdash \texttt {R} \langle |\det O| \rangle \equiv \texttt {R} \langle 1 \rangle \).

3.3 Index Erasure Semantics and Transformational Interpretation

The abstraction theorem for AIT [4] enables us to know the invariance of a term by its type. The theorem relates two kinds of interpretations of types and terms: index erasure semantics and relational interpretations. We will restate the theorem with what we here call transformational interpretations (t-interpretations hereafter), instead of the relational interpretations. It suffices for the purpose of justifying our algorithm and makes it easier to grasp the idea of the theorem.

The index-erasure semantics of a primitive type \(\texttt {X} \langle e_1, \dots , e_n \rangle \) is determined only by \(\texttt {X} \). We thus write \(\left\lfloor \texttt {X} \langle e_1, \dots , e_n \rangle \right\rfloor = \left\lfloor \texttt {X} \right\rfloor \). The interpretation \(\left\lfloor \texttt {X} \right\rfloor \) is the set of mathematical objects intended for the type: e.g., \(\left\lfloor \texttt {2d.Vec} \langle B, t \rangle \right\rfloor = \left\lfloor \texttt {2d.Vec} \right\rfloor = \mathbb R^2\) and \(\left\lfloor \texttt {R} \langle s \rangle \right\rfloor = \left\lfloor \texttt {R} \right\rfloor = \mathbb R\). The index-erasure semantics of a non-primitive type is determined by the type structure: \(\left\lfloor A \rightarrow B \right\rfloor = \left\lfloor A \right\rfloor \rightarrow \left\lfloor B \right\rfloor \) and \(\left\lfloor \forall i\mathord :S.\ T \right\rfloor = \left\lfloor T \right\rfloor \).

The index-erasure semantics of a typing context \(\varGamma = x_1\mathord :\texttt {T} _1, \dots , x_n\mathord :\texttt {T} _n\) is the direct product of the domains of the variables: \(\left\lfloor \varGamma \right\rfloor = \left\lfloor \texttt {T} _1 \right\rfloor \times \cdots \times \left\lfloor \texttt {T} _n \right\rfloor \). The erasure semantics of a term \(\varDelta ; \varGamma \vdash M : A\) is a function of the values assigned to its free variables, \(\left\lfloor M \right\rfloor : \left\lfloor \varGamma \right\rfloor \rightarrow \left\lfloor A \right\rfloor \), and is defined as usual (see, e.g., [18, 21]).

The t-interpretation of a type \(\texttt {T} \), denoted by \(\llbracket \texttt {T} \rrbracket \), is a function from the assignments to the index variables to a transformation on \(\left\lfloor \texttt {T} \right\rfloor \). To be precise, we first define the semantics of index context \(\varDelta = i_1\mathord :S_1, \dots , i_n\mathord :S_n\) as the direct product of the interpretation of the sorts: \(\llbracket \varDelta \rrbracket = \llbracket S_1 \rrbracket \times \cdots \times \llbracket S_n \rrbracket \), where \(\llbracket S_1 \rrbracket , \dots , \llbracket S_n \rrbracket \) are the intended sets of transformations: e.g., \(\llbracket \mathsf {GL}_2 \rrbracket = \mathrm {GL}_2\) and \(\llbracket \mathsf {T}_2 \rrbracket = \mathrm {T}_2\). The interpretation of an index expression e of sort S is a function \(\llbracket e \rrbracket : \llbracket \varDelta \rrbracket \rightarrow \llbracket S \rrbracket \) that is determined by the structure of the expression; for \(\rho \in \llbracket \varDelta \rrbracket \),

$$ \llbracket \texttt {f} (e_1, \dots , e_n) \rrbracket (\rho ) = \llbracket \texttt {f} \rrbracket (\llbracket e_1 \rrbracket (\rho ), \dots , \llbracket e_n \rrbracket (\rho )), \;\;\; \llbracket i_k \rrbracket (\rho ) = \rho (i_k), $$

where, in the last equation, we regard \(\rho \in \llbracket \varDelta \rrbracket \) as a function from index variables to their values. The index operations \(\det \) and \(|\cdot |\) are interpreted as intended.

The t-interpretation of a primitive type \(\texttt {X} \langle e_1, \dots , e_n \rangle \) is then determined by \(\texttt {X} \) and the structures of the index expressions \(e_1, \dots , e_n\). The t-interpretation of \(\texttt {Vec} \) and \(\texttt {Shape} \) is the affine transformation of vectors and geometric objects parametrized by \(\rho \in \llbracket \varDelta \rrbracket \); for index expressions \(\beta \mathord :\mathsf {GL}_2\) and \(\tau \mathord :\mathsf {T}_2\),

$$\begin{aligned} \llbracket \texttt {Vec} \langle \beta , \tau \rangle \rrbracket (\rho )&: \mathbb R^2 \ni x \mapsto M_{\llbracket \beta \rrbracket (\rho )} x + v_{\llbracket \tau \rrbracket (\rho )} \in \mathbb R^2 \\ \llbracket \texttt {Shape} \langle \beta , \tau \rangle \rrbracket (\rho )&: \mathcal {P}(\mathbb R^2) \ni S \mapsto \{M_{\llbracket \beta \rrbracket (\rho )} x + v_{\llbracket \tau \rrbracket (\rho )} \mid x \in S\}\in \mathcal {P}(\mathbb R^2), \end{aligned}$$

where \(M_{\llbracket \beta \rrbracket (\rho )}\) and \(v_{\llbracket \tau \rrbracket (\rho )}\) are the representation matrix and vector of \(\llbracket \beta \rrbracket (\rho )\) and \(\llbracket \tau \rrbracket (\rho )\), and \(\mathcal {P}(\mathbb R^2)\) denotes the power set of \(\mathbb R^2\). Similarly, for the real numbers,

$$ \llbracket \texttt {R} \langle \sigma \rangle \rrbracket (\rho ): \mathbb R\ni x \mapsto \llbracket \sigma \rrbracket (\rho )x \in \mathbb R. $$

That is, \(\llbracket \texttt {R} \langle \sigma \rangle \rrbracket (\rho )\) is a change of scale with the scaling factor determined by the expression \(\sigma \mathord :\mathsf {GL}_1\) and the assignment \(\rho \). For a primitive type \(\texttt {X} \) with no indices, its t-interpretation is the identity map on \(\left\lfloor \texttt {X} \right\rfloor \): i.e., \(\llbracket \texttt {X} \rrbracket (\rho ) = \mathrm {id}_{\left\lfloor X \right\rfloor }\).

The t-interpretation of a function type \(A \rightarrow B\) is a higher-order function that maps a (mathematical) function \(f: \left\lfloor A \right\rfloor \rightarrow \left\lfloor B \right\rfloor \) to another function on the same domain and codomain such that: \( \llbracket A \rightarrow B \rrbracket (\rho )(f) = \llbracket B \rrbracket (\rho ) \circ f \circ (\llbracket A \rrbracket (\rho ))^{-1} \). It is easy to check that this interpretation is compatible with currying. Equivalently, we may say that if \(g = \llbracket A \rightarrow B \rrbracket (\rho )(f)\), then f and g are in the commutative relation \(g \circ \llbracket A \rrbracket (\rho ) = \llbracket B \rrbracket (\rho ) \circ f\). The typing derivation in AIT is a way to ‘pull out’ the effect of transformation \(\llbracket A \rrbracket (\rho )\) on a free variable deep inside a term by combining such commutative relations.

The t-interpretation of a fully-quantified type is the identity map on its erasure semantics: \(\llbracket \forall i_1\mathord :S_1.\dots \forall i_n\mathord :S_n.\ T \rrbracket = \mathrm {id}_{\left\lfloor T \right\rfloor }\). We do not define it for partially-quantified types because it is not needed to state the abstraction theorem.

3.4 Abstraction Theorem

The abstraction theorem for AIT enables us to detect the invariance of (the erasure-semantics of) a term under a certain set of transformations on its free variables. We first define the t-interpretation of the typing context \(\varGamma = x_1:T_1, \dots , x_n:T_n\) as a simultaneous transformation of \(\eta = (v_1, \dots , v_n) \in \left\lfloor \varGamma \right\rfloor \):

$$ \llbracket \varGamma \rrbracket (\rho ): \left\lfloor \varGamma \right\rfloor \ni \eta \mapsto \llbracket \varGamma \rrbracket (\rho ) \circ \eta = (\llbracket T_1 \rrbracket (\rho )\circ v_1, \dots , \llbracket T_n \rrbracket (\rho )\circ v_n) \in \left\lfloor \varGamma \right\rfloor . $$

We now present a version of the abstraction theorem, restricted to the case of a term of quantifier-free type and restated with the t-interpretation:

Theorem 1

(Abstraction [4], restated using transformational interpretation).

If A is a quantifier-free type and \(\varDelta ; \varGamma \vdash M : A\), then for all \(\rho \in \llbracket \varDelta \rrbracket \) and all \(\eta \in \left\lfloor \varGamma \right\rfloor \), we have \(\llbracket A \rrbracket (\rho ) \circ \left\lfloor M \right\rfloor (\eta ) = \left\lfloor M \right\rfloor (\llbracket \varGamma \rrbracket (\rho )\circ \eta )\).

Here we provide two easy corollaries of the theorem. The first one is utilized to eliminate variables from a formula while preserving the equivalence.

Corollary 1

If \(\varDelta ; x_1: \texttt {T} _1, \dots , x_n: \texttt {T} _n \vdash \phi (x_1, \dots , x_n): \texttt {Bool} \), then for all \(\rho \in \llbracket \varDelta \rrbracket \), we have \(\phi (x_1, \dots , x_n) \Leftrightarrow \phi (\llbracket \texttt {T} _1 \rrbracket (\rho )\circ x_1, \dots , \llbracket \texttt {T} _n \rrbracket (\rho )\circ x_n)\).

This is by the abstraction theorem and the fact \(\llbracket \texttt {Bool} \rrbracket (\rho ) = \mathrm {id}_{\left\lfloor \texttt {Bool} \right\rfloor }\) for any \(\rho \). It indicates that, without loss of generality, we may ‘fix’ some of the variables to, e.g., zeros by appropriately choosing \(\rho \).

The second corollary is for providing more intuition about the theorem.

Corollary 2

If \(\epsilon ; \epsilon \vdash \lambda x_1.\ \dots . \lambda x_n.\ f(x_1, \dots , x_n): \forall \varDelta .\ \texttt {T} _1 \rightarrow \cdots \rightarrow \texttt {T} _n \rightarrow \texttt {T} _0\) then, for all \(\rho \in \llbracket \varDelta \rrbracket \) and all \(v_i \in \left\lfloor \texttt {T} _i \right\rfloor \;(i = 1, \dots , n)\),

$$ \llbracket \texttt {T} _0 \rrbracket (\rho ) \circ \left\lfloor f \right\rfloor (v_1, \dots , v_n) = \left\lfloor f \right\rfloor (\llbracket \texttt {T} _1 \rrbracket (\rho )\circ v_1, \dots , \llbracket \texttt {T} _n \rrbracket (\rho )\circ v_n). $$

In the statement, \(\forall \varDelta \) signifies the universal quantification over all index variables in \(\varDelta \). By this corollary, for instance, we can tell from the type of \(\texttt {midpoint} \) that, for all \(x_1, x_2 \in \mathbb R^2\) and for all \(g \in \mathrm {GL}_2\) and \(t \in \mathrm {T}_2\),

$$ \left\lfloor \texttt {midpoint} \right\rfloor (M_g x_1 + v_t, M_g x_2 + v_t) = M_g \left\lfloor \texttt {midpoint} \right\rfloor (x_1, x_2) + v_t. $$

3.5 Restriction on the Index Expressions of Sort \(\mathsf {GL}_k/\mathsf {O}_k \; (k \ge 2)\)

We found that type reconstruction in AIT is far more straightforward when we assume that an index expression of sort \(\mathsf {GL}_k\) or \(\mathsf {O}_k\) \((k \ge 2)\) includes at most one index variable of sort \(\mathsf {GL}_k\) or \(\mathsf {O}_k\) that does not occur inside the determinant operator. Under this assumption, any expression e of sort \(\mathsf {GL}_k\) or \(\mathsf {O}_k\) can be written in the form

$$ e = \prod _{i\in I} s_i^{w_i} \cdot \prod _{i\in I}|s_i|^{x_i} \cdot \prod _{j\in J} \det (B_j)^{y_j} \cdot \prod _{j\in J} |\det (B_j)|^{z_j} \cdot B_0^{\delta }, $$

where \(\{s_i\}_{i\in I}\) are of sort \(\mathsf {GL}_1\), \(\{B_0\}\cup \{B_j\}_{j\in J}\) are of sort \(\mathsf {GL}_k\) or \(\mathsf {O}_k\), \(w_i, x_i, y_j, z_j \in \mathbb {Z}\), and \(\delta \in \{0, 1\}\). We henceforth say an expression e in the above form satisfies the head variable property and call \(B_0\) the head variable of e.
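For instance (an example of our own), \(s_1^{2}\cdot |s_2|^{-1}\cdot \det (B_1)\cdot B_0\) satisfies the head variable property with head variable \(B_0\), whereas a product of two head variables such as \(B_0 B_1\) does not.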

Empirically, this restriction is not too restrictive; as far as we are aware, the invariances of all the pre-defined functions and predicates in ToroboMath are expressible with indexed types satisfying this property.

4 Invariance Detection Through Type Reconstruction

We need type reconstruction in AIT for two purposes: to infer the invariance of the pre-defined symbols in ToroboMath and to infer the invariance in a math problem. To this end, we only have to derive the judgement \( \varDelta ; \varGamma \vdash \phi : \texttt {Bool} \) where \( \phi \) is either a defining axiom of a symbol or a formula of a problem. For a pre-defined symbol s, by a judgement \( \varDelta ; s : T, \dots \vdash \phi : \texttt {Bool} \), we know s is of type T and it has the invariance signified by T. For a problem \( \phi \), by the judgement \( \varDelta ; x_1 : T_1, \dots , x_n : T_n \vdash \phi : \texttt {Bool} \), we know the invariance of \( \phi \) under the transformation on the free variables \( x_1, \dots , x_n \) according to \( \llbracket T_1 \rrbracket , \dots , \llbracket T_n \rrbracket \).

Since all types are in prenex form, we can find the typing derivation by a procedure analogous to the Hindley-Milner (H-M) algorithm. It consists of two steps: deriving equations among index expressions, and solving them. The procedure for solving the equations in \(\mathsf {T}_2/\mathsf {T}_3\) is essentially the same as in the type inference for Kennedy’s unit-of-measure types [11], which is a precursor of AIT. Further development is required to solve the equations in \(\mathsf {GL}_2/\mathsf {GL}_3\), even under the restriction on the form of index expressions mentioned in Sect. 3.5, due to the existence of the index operations \(|\cdot |\) and \(\det \).

4.1 Equation Derivation

We first assign a type variable \( \alpha _i \) for each subterm \( t_i \) in \( \phi \). Then, for a subterm \( t_i \) in the form \( t_j t_k \) (i.e., application of \( t_j \) to \( t_k \)), we have the equation \( \alpha _j = \alpha _k \rightarrow \alpha _i \). The case for a subterm \( t_i \) in the form of \( \lambda x.t_j \) is also analogous to H-M and we omit it here. For a leaf term (i.e., a variable) \( t_i \), if it is one of the pre-defined symbols and \( t_i : \forall i_1 \mathord : S_1. \dots \forall i_n \mathord : S_n. T \in \varGamma _{\mathrm {ops}} \), we set \( \alpha _i = T\{ i_1 \mapsto \beta _1, \dots , i_n \mapsto \beta _n \} \), where \( \{ i_1 \mapsto \beta _1, \dots , i_n \mapsto \beta _n \} \) stands for the substitution of fresh variables \( \beta _1, \dots , \beta _n \) for \( i_1, \dots , i_n \). By solving the equations for the type and index variables \( \{ \alpha _i \} \) and \( \{ \beta _j \} \), we reconstruct the most general indexed-types of all the subterms.
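The following is a minimal sketch of this equation-derivation pass in Python; the term and type representations (nested tuples), the treatment of index expressions as opaque atoms, and all function names are simplifications of our own, not the actual ToroboMath implementation.

```python
# Sketch of the equation derivation of Sect. 4.1 (simplified, not the actual
# implementation).  Terms: ("var", x), ("app", f, a), ("lam", x, body).
# Types: ("prim", name, [index expressions]) or ("arrow", A, B).
import itertools

_fresh = itertools.count()

def fresh(kind):
    return (kind, next(_fresh))

def subst_indices(ty, sub):
    """Replace index variables occurring directly in a type (compound index
    expressions are left untouched in this sketch)."""
    if ty[0] == "prim":
        return ("prim", ty[1], [sub.get(e, e) for e in ty[2]])
    if ty[0] == "arrow":
        return ("arrow", subst_indices(ty[1], sub), subst_indices(ty[2], sub))
    return ty

def instantiate(scheme):
    """scheme = (bound index variables, quantifier-free body)."""
    qvars, body = scheme
    return subst_indices(body, {i: fresh("idx") for i in qvars})

def derive(term, env, ops, eqs):
    """Return a type for `term`, appending type equations to `eqs`."""
    if term[0] == "var":                      # pre-defined symbol or variable
        x = term[1]
        return instantiate(ops[x]) if x in ops else env[x]
    if term[0] == "app":                      # alpha_j = alpha_k -> alpha_i
        t_fun = derive(term[1], env, ops, eqs)
        t_arg = derive(term[2], env, ops, eqs)
        t_res = fresh("ty")
        eqs.append((t_fun, ("arrow", t_arg, t_res)))
        return t_res
    if term[0] == "lam":                      # analogous to H-M; omitted above
        t_x = fresh("ty")
        t_body = derive(term[2], {**env, term[1]: t_x}, ops, eqs)
        return ("arrow", t_x, t_body)
    raise ValueError(term)
```

Unifying the recorded pairs of types then reduces, as described below, to a system of equations between index expressions.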

For example, consider the following axiom defining \(\texttt {perpendicular} \):

$$ \forall v_1. \forall v_2. (\texttt {perpendicular} (v_1, v_2) \longleftrightarrow \texttt {inner-prod} (v_1, v_2) = 0 ), $$

and suppose that \( \texttt {inner-prod} \) is in \( \varGamma _{\mathrm {ops}} \). We are going to reconstruct the type of \( \texttt {perpendicular} \). The type of \( \texttt {inner-prod} \) is

$$ \texttt {inner-prod} : \forall s_1,s_2\mathord :\mathsf {GL}_1.\ \forall O\mathord :\mathsf {O}_2.\ \texttt {Vec} \langle s_1 O, 0 \rangle \rightarrow \texttt {Vec} \langle s_2 O, 0 \rangle \rightarrow \texttt {R} \langle s_1 \cdot s_2 \rangle $$

and it is instantiated as \( \texttt {inner-prod} : \texttt {Vec} \langle s_1 O, 0 \rangle \rightarrow \texttt {Vec} \langle s_2 O, 0 \rangle \rightarrow \texttt {R} \langle s_1 \cdot s_2 \rangle \) where \( s_1, s_2 \), and O are fresh variables. Since the type of \( \texttt {perpendicular} \) in the non-AIT version of our language is \( \texttt {Vec} \rightarrow \texttt {Vec} \rightarrow \texttt {Bool} \), we set fresh variables to all indices in the primitive types and have:

$$ \texttt {perpendicular} : \texttt {Vec} \langle \beta _1, \tau _1 \rangle \rightarrow \texttt {Vec} \langle \beta _2, \tau _2 \rangle \rightarrow \texttt {Bool} . $$

Since \( \texttt {perpendicular} \) is applied to \( v_1 \) and \( v_2 \), the types of \( v_1 \) and \( v_2 \) are equated to \( \texttt {Vec} \langle \beta _1, \tau _1 \rangle \) and \( \texttt {Vec} \langle \beta _2, \tau _2 \rangle \). Additionally, since \( \texttt {inner-prod} \) is also applied to \( v_1 \) and \( v_2 \), we have the following equations:

$$\begin{aligned} \texttt {Vec} \langle s_1 O, 0 \rangle = \texttt {Vec} \langle \beta _1, \tau _1 \rangle , \;\;\; \texttt {Vec} \langle s_2 O, 0 \rangle = \texttt {Vec} \langle \beta _2, \tau _2 \rangle \end{aligned}$$
(4.1)

When both sides of an equation are built from the same primitive type, unifying them yields one or more equations between index expressions: if we have \( \texttt {X} \langle e_1, \dots , e_m \rangle = \texttt {X} \langle e_1', \dots , e_m' \rangle \), then we have \( e_1 = e_1', \dots , e_m = e_m' \). For Eq. (4.1), we hence have \( s_1 O = \beta _1, s_2 O = \beta _2, 0 = \tau _1 \), and \( 0 = \tau _2 \). Thus, by recursively unifying all the equated types, we are left with a system of equations between index expressions.
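Solving these four equations is immediate in this example (\(\beta _1 \mapsto s_1 O\), \(\beta _2 \mapsto s_2 O\), \(\tau _1 \mapsto 0\), \(\tau _2 \mapsto 0\)), so the reconstruction plausibly yields

$$ \texttt {perpendicular} : \forall s_1, s_2\mathord :\mathsf {GL}_1.\ \forall O\mathord :\mathsf {O}_2.\ \texttt {Vec} \langle s_1 O, 0 \rangle \rightarrow \texttt {Vec} \langle s_2 O, 0 \rangle \rightarrow \texttt {Bool} ; $$

we spell this final step out for concreteness, though the general solving procedure is the subject of the next subsection.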

4.2 Equation Solving

To solve the derived equations between index expressions, we need to depart from the analogy with the H-M algorithm. Namely, instead of applying syntactic unification, we need semantic unification, i.e., we solve the equations as simultaneous equations in the transformation groups.

We first order the equations with respect to the sort of the equated expressions. We then process them in the order \( \mathsf {T}_2 / \mathsf {T}_3 \rightarrow \mathsf {GL}_2 / \mathsf {GL}_3 \rightarrow \mathsf {GL}_1 \) as follows.

First, since equations of sort \( \mathsf {T}_2 / \mathsf {T}_3 \) are always of the form \( \sum _{i} a_i t_i = 0 \ (a_i \in \mathbb {Z}) \), where \( \{ t_i \} \) are variables of sort \( \mathsf {T}_k \; (k \in \{2, 3\}) \), we can solve them as a homogeneous linear system. Although the solution may involve rational coefficients, as in \( t_i = \sum _j \frac{n_{ij}}{m_{ij}} t_j \; (n_{ij}, m_{ij} \in \mathbb {Z}), \) we can clear the denominators by introducing new variables \( t_j' \) such that \( t_j = \mathrm {lcm} \{ m_{ij} \}_i \cdot t_j' \).
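A minimal sketch of this step is given below; it assumes each \(\mathsf {T}_k\) equation is presented as a list of integer coefficients over the translation variables, and all names are ours.

```python
# Sketch (ours): solve the homogeneous system sum_i a_i * t_i = 0 over Q and
# clear the denominators by rescaling the free variables (t_j = m * t_j').
from fractions import Fraction
from math import lcm

def solve_translations(rows, nvars):
    """rows: one integer coefficient list per equation.  Returns sol with
    sol[i] = {j: c}, read as t_i = sum_j c * t_j over the free variables;
    after the final loop the free variables stand for the rescaled t_j'."""
    rows = [[Fraction(a) for a in row] for row in rows]
    pivots, r = {}, 0
    for c in range(nvars):                      # Gaussian elimination over Q
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pv = rows[r][c]
        rows[r] = [a / pv for a in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                rows[i] = [a - rows[i][c] * b for a, b in zip(rows[i], rows[r])]
        pivots[c], r = r, r + 1
    free = [c for c in range(nvars) if c not in pivots]
    sol = {c: {f: -rows[i][f] for f in free if rows[i][f] != 0}
           for c, i in pivots.items()}
    sol.update({f: {f: Fraction(1)} for f in free})
    for f in free:                              # clear denominators
        m = lcm(*(v[f].denominator for v in sol.values() if f in v))
        for v in sol.values():
            if f in v:
                v[f] *= m
    return sol
```

For the single equation \(2t_1 - t_2 + t_3 = 0\), for example, `solve_translations([[2, -1, 1]], 3)` expresses \(t_1 = t_2' - t_3'\) after introducing \(t_2 = 2t_2'\) and \(t_3 = 2t_3'\).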

Next, by the head variable property, equations of sort \( \mathsf {GL}_2 / \mathsf {GL}_3 \) (henceforth \(\mathsf {GL}_{\ge 2}\)) are always of the form \( \sigma _1 B_1 = \sigma _2 B_2 \), where \( \sigma _1 \) and \( \sigma _2 \) are index expressions of sort \( \mathsf {GL}_1 \), and \( B_1 \) and \( B_2 \) are the head variables of sort \( \mathsf {GL}_{\ge 2}\). We decompose these equations according to Table 2, which summarizes the following argument: Let E denote the identity transformation. Since \( \sigma _1 B_1 = \sigma _2 B_2 \iff \sigma _1^{-1} \sigma _2 E = B_1 B_2^{-1} \), there must be some s of sort \( \mathsf {GL}_1 \) such that \( B_1 B_2^{-1} = s E \) and \( \sigma _1^{-1} \sigma _2 = s \). Furthermore, by the superset-subset relation between the sorts of \( B_1 \) and \( B_2 \) (e.g., \( \mathrm {O}_2 \subset \mathrm {GL}_2 \) for \( B_1 : \mathsf {O}_2 \) and \( B_2 : \mathsf {GL}_2 \)), we can express the variable of the broader sort in terms of the other, with the latter acting as a parameter.

The algorithm for \(\mathsf {GL}_{\ge 2}\) equations works as follows. First, we initialize the solution with the empty substitution: \(S \leftarrow \{\}\). For each \(\mathsf {GL}_{\ge 2}\) equation \(\sigma _1 B_1 = \sigma _2 B_2\), we look up Table 2 and obtain the \(\mathsf {GL}_{\ge 2}\) solution \(B_i \mapsto sB_j\) together with one or more new \(\mathsf {GL}_1\) equations. We add the new \(\mathsf {GL}_1\) equations to the current set, apply the solution \( B_i \mapsto sB_j \) to all the remaining \(\mathsf {GL}_1\) and \(\mathsf {GL}_{\ge 2}\) equations, and compose it with the current solution: \(S \leftarrow S \circ \{B_i \mapsto sB_j\}\).
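The loop can be sketched as follows; the data representation and names are ours, and only the generic row of Table 2 (both head variables of the same sort) is modelled, the sort-dependent rows being abstracted away.

```python
# Sketch (ours) of the GL_{>=2} loop.  A GL_1 index expression is kept as a
# multiplicative monomial, i.e. a dict from atoms such as "s1" or "det(B1)"
# to integer exponents; a GL_{>=2} equation sigma1*B1 = sigma2*B2 is the
# tuple (sigma1, "B1", sigma2, "B2").
import itertools

_fresh = itertools.count()

def mul(a, b, invert_b=False):
    """Multiply two monomials, optionally inverting the second one."""
    out = dict(a)
    for atom, e in b.items():
        out[atom] = out.get(atom, 0) + (-e if invert_b else e)
        if out[atom] == 0:
            del out[atom]
    return out

def resolve(b, solution):
    """Chase the substitution built so far: (scalar monomial, remaining head)."""
    scal = {}
    while b in solution:
        s, b = solution[b]
        scal = mul(scal, s)
    return scal, b

def solve_gl_ge2(equations):
    """Return the head-variable substitution S and the residual GL_1
    equations (each a monomial required to equal 1)."""
    solution, gl1_eqs = {}, []
    for sigma1, b1, sigma2, b2 in equations:
        c1, b1 = resolve(b1, solution)
        c2, b2 = resolve(b2, solution)
        sigma1, sigma2 = mul(sigma1, c1), mul(sigma2, c2)
        if b1 == b2:                          # heads already identified
            gl1_eqs.append(mul(sigma1, sigma2, invert_b=True))
            continue
        s = {"s_fresh%d" % next(_fresh): 1}   # fresh GL_1 variable s
        solution[b1] = (s, b2)                # generic Table 2 row: B1 -> s*B2
        gl1_eqs.append(mul(mul(sigma1, s), sigma2, invert_b=True))
    return solution, gl1_eqs
```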

By processing all \(\mathsf {GL}_{\ge 2}\) equations as above, we are left with a partial solution S and a system of \(\mathsf {GL}_1\) equations, each of which is in the following form:

$$ \prod _{i\in I} s_i^{w_i} \cdot \prod _{i\in I} |s_i|^{x_i} \cdot \prod _{j\in J} \mathrm {det}(B_j)^{y_j} \cdot \prod _{j\in J} |\mathrm {det}(B_j)|^{z_j} = 1 \;\;\; ( w_i, x_i, y_j, z_j \in \mathbb {Z} ), $$

where we assume about I and J that \(\{s_i\}_{i\in I}\) are all the \(\mathsf {GL}_1\) variables, \(\{B_j\}_{j\in J}\) are all the remaining \(\mathsf {GL}_{\ge 2}\) variables, and \(I \cap J = \emptyset \). Letting \( u_i = s_i\cdot |s_i|^{-1} \), \( v_i = |s_i| \), \( u_j = \mathrm {det}(B_j)\cdot |\mathrm {det}(B_j)|^{-1} \), and \( v_j = |\mathrm {det}(B_j)| \), we have \( s_i = u_i v_i \) and \( \mathrm {det}(B_j) = u_j v_j \) for all \( i \in I \) and \( j \in J \). By using them, we have

$$ \prod _i u_i^{w_i} \cdot \prod _i v_i^{w_i + x_i} \cdot \prod _j u_j^{y_j} \cdot \prod _j v_j^{y_j + z_j} = 1. $$

Since \( u_i, u_j \in \{+1, -1\} \) and \(v_i, v_j > 0\) for all i and j, we know the above equation is equivalent to the following two equations:

$$ \prod _i u_i^{w_i} \cdot \prod _j u_j^{y_j} = 1, \;\;\;\;\; \prod _i v_i^{w_i + x_i} \cdot \prod _j v_j^{y_j + z_j} = 1. $$

We thus have two systems of equations, one in \(\{+1, -1\}\) and the other in \(\mathbb {R}_{>0}\). Now we temporarily rewrite the solution with \(u_i\) and \(v_i\): \(S \leftarrow S \circ \{s_i \mapsto u_i v_i\}_{i\in I}\).

First consider the system in \(\mathbb {R}_{>0}\). As long as there remains an equation involving a variable \(v_i\), which originates from a \(\mathsf {GL}_1\) variable, we solve it for \(v_i\) and compose the solution \(v_i \mapsto \prod _{i'\ne i} v_{i'}^{p_{i'}} \cdot \prod _{j} v_j^{q_{j}}\) with S while applying it to the remaining equations. The denominators of fractional exponents (i.e., \(p_{i'}, q_j \in \mathbb {Q}{\setminus }\mathbb {Z}\)) can be cleared similarly to the case of the \(\mathsf {T}_k\) equations. If all the equations in \(\mathbb {R}_{>0}\) are solved this way, then S is the most general solution. Otherwise, there remain one or more equations of the form \(\prod _{j\in J'} |\det B_j|^{d_j} = 1\) for some \(J' \subset J\) and \(\{d_j\}_{j\in J'}\). This is the only case where we may miss some invariance of a formula; in general, we cannot express the most general solution of such an equation using only the index variables of sort \(\mathsf {GL}_k\) and \(\mathsf {O}_k\). We make a compromise here and settle for the less general solution \(S \circ \{B_j \mapsto E\}_{j\in J'}\). Fortunately, this does not happen frequently in practice; we made this compromise on only three of the 533 problems used in the experiment. We expect that having more sorts, e.g., \(\mathrm {SL}_k^{\pm } = \{M\in \mathrm {GL}_k \mid |\det M| = 1\}\), in the language of index expressions might help here, but we leave this as future work.

The system in \(\{+1, -1\}\) is processed analogously to that in \(\mathbb {R}_{>0}\). Finally, by restoring \(\{u_i, v_i\}_{i\in I}\) and \(\{u_j, v_j\}_{j\in J}\) in the solution S to their original forms, e.g., \(u_i \mapsto s_i\cdot |s_i|^{-1}\), we have a solution to the initial set of equations in terms of the variables of sort \(\mathsf {GL}_k\) and \(\mathsf {O}_k\).

Table 2. Decomposition of \( \mathsf {GL}_2/\mathsf {GL}_3 \) equation \( \sigma _i B_i = \sigma _j B_j \) (s: a fresh variable)

4.3 Type Reconstruction for Pre-defined Symbols with Axioms

We incrementally determined the indexed-types of the pre-defined symbols according to the hierarchy of their definitions. We first constructed a directed acyclic graph wherein the nodes are the pre-defined symbols and the edges represent the dependency between their definitions. We manually assigned an indexed-type to the symbols without defining axioms (e.g., \( + : \texttt {R} \rightarrow \texttt {R} \rightarrow \texttt {R} \)) and initialized \( \varGamma _{\mathrm {ops}} \) with them. We then reconstructed the indexed-types of other symbols in a topological order of the graph. After the reconstruction of the type of each symbol, we added the symbol with its inferred type to \( \varGamma _{\mathrm {ops}} \).
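A small sketch of this bookkeeping is shown below, using the standard-library topological sorter; `reconstruct_type` is a stand-in for the procedure of Sect. 4.1 and 4.2, and all names are ours.

```python
# Sketch (ours) of the incremental typing of the pre-defined symbols.
from graphlib import TopologicalSorter

def assign_indexed_types(axioms, deps, manual_types, reconstruct_type):
    """axioms: symbol -> defining axiom; deps: symbol -> symbols its definition
    depends on; manual_types: hand-assigned types of the symbols without
    defining axioms; reconstruct_type(axiom, gamma_ops) stands in for the
    reconstruction of Sect. 4.1-4.2."""
    gamma_ops = dict(manual_types)                  # initial Gamma_ops
    for sym in TopologicalSorter(deps).static_order():
        if sym not in gamma_ops:                    # not manually typed
            gamma_ops[sym] = reconstruct_type(axioms[sym], gamma_ops)
    return gamma_ops
```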

For some of the symbols, type reconstruction does not go as well as we hope. For example, the following axiom defines the symbol \( \texttt {midpoint} \):

$$ \forall p_1, p_2. (\texttt {midpoint} (p_1, p_2) = \frac{1}{2} \cdot (p_1 \;\texttt {+} \; p_2)). $$

At the beginning of the type reconstruction of \( \texttt {midpoint} \), the types of the symbols in the axiom are instantiated as follows:

$$\begin{aligned} \texttt {midpoint}&: \texttt {Vec} \langle \beta _1, \tau _1 \rangle \rightarrow \texttt {Vec} \langle \beta _2, \tau _2 \rangle \rightarrow \texttt {Vec} \langle \beta _3, \tau _3 \rangle \\ \cdot&: \texttt {R} \langle s_1 \rangle \rightarrow \texttt {Vec} \langle B_1, 0 \rangle \rightarrow \texttt {Vec} \langle s_1 B_1, 0 \rangle \\ \texttt {+}&: \texttt {Vec} \langle B_2, t_1 \rangle \rightarrow \texttt {Vec} \langle B_2, t_2 \rangle \rightarrow \texttt {Vec} \langle B_2, t_1 + t_2 \rangle . \end{aligned}$$

The derived equations between the index expressions are as follows:

$$ \{ B_2 = \beta _1, B_2 = \beta _2, B_1 = B_2, \beta _3 = s_1 B_1, s_1 = 1, t_1 = \tau _1, t_2 = \tau _2, 0 = t_1 + t_2, \tau _3 = 0 \}. $$

By solving these equations, we obtain the indexed-type of \( \texttt {midpoint} \) as follows:

$$ \texttt {midpoint} : \forall B_1 \mathord : \mathsf {GL}_2.\ \forall t_1 \mathord : \mathsf {T}_2.\ \texttt {Vec} \langle B_1, t_1 \rangle \rightarrow \texttt {Vec} \langle B_1, - t_1 \rangle \rightarrow \texttt {Vec} \langle B_1, 0 \rangle . $$

This type indicates that the midpoint of any two points P and Q remains the same when we move P and Q respectively to \( P + t_1 \) and \( Q - t_1 \) for any \( t_1 \in \mathbb {R}^2 \). While it is not wrong, the following type is more useful for our purpose:

$$\begin{aligned} \texttt {midpoint} : \forall B \mathord : \mathsf {GL}_2.\ \forall t \mathord : \mathsf {T}_2.\ \texttt {Vec} \langle B, t \rangle \rightarrow \texttt {Vec} \langle B, t \rangle \rightarrow \texttt {Vec} \langle B, t \rangle . \end{aligned}$$
(1)

To such symbols, we manually assigned a more appropriate type.

In the current system, 945 symbols have a type that includes indices. We manually assigned the types to 255 symbols that have no defining axioms. For 203 symbols we manually overwrote the inferred type as in the case of \(\texttt {midpoint} \). The types of the remaining 487 symbols were derived through the type reconstruction.

5 Variable Elimination Based on Invariance

In this section, we first provide an example of the variable elimination procedure based on invariance. We then describe the top-level algorithm of the variable elimination, which takes a formula as input and eliminates some of the quantified variables in it by utilizing the invariance indicated by an index variable. We finally list the elimination rule for each sort of index variable.

5.1 Example of Variable Elimination Based on Invariance

Let us consider again the proof of the existence of the centroid of a triangle. For triangle ABC, the configuration of the midpoints PQR of the three sides and the centroid G is described by the following formula:

$$ \psi (A, B, C, P, Q, R, G) := \left( \begin{array}{l} P = \texttt {midpoint} (B, C) \wedge \mathtt{on}(G, \mathtt{segment}(A, P)) \; \wedge \\ Q = \texttt {midpoint} (C, A) \wedge \mathtt{on}(G, \mathtt{segment}(B, Q)) \; \wedge \\ R = \texttt {midpoint} (A, B) \wedge \mathtt{on}(G, \mathtt{segment}(C, R)) \end{array} \right) $$

where \(\mathtt{on}(X, Y)\) stands for the inclusion of point X in a geometric object Y, and \(\mathtt{segment}(X, Y)\) stands for the line segment between points X and Y. Let \(\phi \) denote the existence of the centroid (and the three midpoints):

$$ \phi (A, B, C) := \exists G.\ \exists P.\ \exists Q.\ \exists R.\ \psi (A, B, C, P, Q, R, G). $$

Our goal is to prove \(\forall A.\ \forall B.\ \forall C.\ \phi (A, B, C)\).

The functions \(\texttt {midpoint} \), \(\texttt {on} \), and \(\texttt {segment} \) are invariant under translations and general linear transformations. The reconstruction algorithm hence derives

$$ \beta : \mathsf {GL}_2, \tau : \mathsf {T}_2 \; ; \; A: \texttt {Vec} \langle \beta , \tau \rangle , B: \texttt {Vec} \langle \beta , \tau \rangle , C: \texttt {Vec} \langle \beta , \tau \rangle \vdash \phi (A, B, C) : \texttt {Bool}. $$

By the abstraction theorem, this judgement implies the invariance of the proposition \(\phi (A, B, C)\) under arbitrary affine transformations:

$$ \forall g \in \mathrm {GL}_2.\ \forall t \in \mathrm {T}_2.\ \forall A, B, C.\ \phi (A, B, C) \Leftrightarrow \phi (t \circ g \circ A, t \circ g \circ B, t \circ g \circ C). $$

First, by considering the case of g being identity, we have

$$\begin{aligned} \forall t \in \mathrm {T}_2.\ \forall A, B, C.\ \phi (A, B, C) \Leftrightarrow \phi (t \circ A, t \circ B, t \circ C). \end{aligned}$$
(2)

By using this, we are going to verify \(\forall B, C.\ \phi (\mathbf{0}, B, C) \Leftrightarrow \forall A, B, C.\ \phi (A, B, C)\), by which we know that we only have to prove \(\forall B, C.\ \phi (\mathbf{0}, B, C)\).

Suppose that \(\forall B, C.\ \phi (\mathbf{0}, B, C)\) holds. Since \(\mathrm {T}_2\) acts transitively on \(\mathbb R^2\), for any \(A\in \mathbb R^2\), there exists \(t\in \mathrm {T}_2\) such that \(t\circ \mathbf{0}= A\). Furthermore, for any \(B, C \in \mathbb R^2\), by instantiating \(\forall B, C.\ \phi (\mathbf{0}, B, C)\) with \(B \mapsto t^{-1}\circ B\) and \(C \mapsto t^{-1}\circ C\), we have \(\phi (\mathbf{0}, t^{-1}\circ B, t^{-1}\circ C)\). By Eq. (2), we obtain \(\phi (t \circ \mathbf{0}, t\circ t^{-1}\circ B, t\circ t^{-1}\circ C)\), which is equivalent to \(\phi (A, B, C)\). Since A, B, and C were arbitrary, we have proved

$$ \forall B, C.\ \phi (\mathbf{0}, B, C) \Rightarrow \forall A, B, C.\ \phi (A, B, C). $$

The converse is trivial. We thus proved \(\forall B, C.\ \phi (\mathbf{0}, B, C) \Leftrightarrow \forall A, B, C.\ \phi (A, B, C)\).

The simplified formula, \(\forall B, C.\ \phi (\mathbf{0},B,C)\), is still invariant under the simultaneous action of \(\mathrm {GL}_2\) on B and C. Hence, by applying the type reconstruction again, we have \(\beta : \mathsf {GL}_2 \; ; \; B: \texttt {Vec} \langle \beta , 0 \rangle , C: \texttt {Vec} \langle \beta , 0 \rangle \vdash \phi (\mathbf{0}, B, C) : \texttt {Bool}\). It implies the following invariance: \(\forall g \in \mathrm {GL}_2.\ \forall B, C.\ \phi (\mathbf{0}, B, C) \Leftrightarrow \phi (\mathbf{0}, g \circ B, g \circ C)\).

We now utilize it to eliminate the remaining variables B and C. Although it is tempting to ‘fix’ B and C respectively at, e.g., \(\mathbf{e}_1 := (1, 0)\) and \(\mathbf{e}_2 := (0, 1)\), it incurs some loss of generality. For instance, when B is at the origin, there is no way to move B to \(\mathbf{e}_1\) by any \(g \in \mathrm {GL}_2\). We consider four cases:

  1. B and C are linearly independent,
  2. \(B \ne \mathbf{0}\), and B and C are linearly dependent,
  3. \(C \ne \mathbf{0}\), and B and C are linearly dependent, and
  4. B and C are both at the origin.

For each of these cases, we can find a suitable transformation in \(\mathrm {GL}_2\) as follows:

  1. There exists \(g_1 \in \mathrm {GL}_2\) s.t. \(g_1 \circ B = \mathbf{e}_1\) and \(g_1 \circ C = \mathbf{e}_2\),
  2. There exist \(g_2 \in \mathrm {GL}_2\) and \(r \in \mathbb R\) s.t. \(g_2 \circ B = \mathbf{e}_1\) and \(g_2 \circ C = r\mathbf{e}_1\),
  3. There exist \(g_3 \in \mathrm {GL}_2\) and \(r' \in \mathbb R\) s.t. \(g_3 \circ C = \mathbf{e}_1\) and \(g_3 \circ B = r'\mathbf{e}_1\), and
  4. We only have to know whether or not \(\phi (\mathbf{0}, \mathbf{0}, \mathbf{0})\) holds.

By a similar argument to the one for the translation-invariance, we have

$$ \forall B, C.\ \phi (\mathbf{0}, B, C) \Leftrightarrow \phi (\mathbf{0}, \mathbf{e}_1, \mathbf{e}_2) \wedge \forall r.\ \phi (\mathbf{0}, \mathbf{e}_1, r\mathbf{e}_1) \wedge \forall r'.\ \phi (\mathbf{0}, r'\mathbf{e}_1, \mathbf{e}_1) \wedge \phi (\mathbf{0}, \mathbf{0}, \mathbf{0}). $$

Thus, we eliminated all four coordinate values (i.e., x and y coordinates for B and C) in the first and the last case and three of them in the other two cases.

5.2 Variable Elimination Algorithm

The variable elimination algorithm works as follows. We traverse the formula of a problem in a top-down order and, for each subformula in the form of

$$ Qx_1.Qx_2.\cdots Qx_n.\ \phi (x_1, x_2, \dots , x_n, {\mathbf{y}}) \;\;\; (Q \in \{\forall , \exists \}) $$

where \({\mathbf{y}} = y_1, \dots , y_m\) are the free variables, we apply the type reconstruction procedure to \(\phi (x_1, x_2, \dots , x_n, {\mathbf{y}})\) and derive a judgement \(\varDelta ; \varGamma , x_1\mathord : \texttt {T} _1, \dots , x_n\mathord : \texttt {T} _n \vdash \phi (x_1, \dots , x_n, {\mathbf{y}}): \texttt {Bool} \). We then choose an index variable i that appears at least once in \(\texttt {T} _1, \dots , \texttt {T} _n\) but in none of the types of \({\mathbf{y}}\). This means that the transformation signified by i acts on some of \(\{x_1, \dots , x_n\}\) but on none of \({\mathbf{y}}\). We select from \(\{x_1, \dots , x_n\}\) one or more variables whose types include i and are of the form \(\texttt {R} \langle \sigma \rangle \) or \(\texttt {Vec} \langle \beta , \tau \rangle \). Suppose that we select \(x_1, \dots , x_l\). Then the judgement \(\varDelta ; \varGamma , x_1\mathord : \texttt {T} _1, \dots , x_l\mathord : \texttt {T} _l \vdash Qx_{l+1}.\cdots Qx_{n}.\ \phi (x_1, \dots , x_n, {\mathbf{y}}) : \texttt {Bool}\) also holds. We then eliminate (or add restrictions on) the bound variables \(x_1, \dots , x_l\) by one of the lemmas in Sect. 5.3, according to the sort of i. After the elimination, the procedure is recursively applied to the resulting formula and its subformulas.
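A sketch of how this traversal might be organized is given below; the formula representation and the three callbacks are simplifications of our own, not the actual implementation.

```python
# Sketch (ours) of the top-level elimination traversal of Sect. 5.2.
# A formula is a nested tuple; a maximal block of quantified variables is
# ("Q", [x1, ..., xn], body).  The callbacks stand for: type reconstruction
# of the body, the choice of an index variable i acting only on the bound
# variables (None if there is none), and the sort-specific rules of Sect. 5.3.
def eliminate(formula, reconstruct, pick_index, apply_rule):
    if not isinstance(formula, tuple):
        return formula                            # atoms are left as they are
    if formula[0] == "Q":
        bound, body = formula[1], formula[2]
        typing = reconstruct(bound, body)         # Delta; Gamma, x1:T1,... |- body : Bool
        i = pick_index(typing)
        if i is not None:                         # eliminate or restrict x1..xl
            bound, body = apply_rule(i, typing, bound, body)
        return ("Q", bound, eliminate(body, reconstruct, pick_index, apply_rule))
    # recurse into the subformulas of connectives and other operators
    return (formula[0],) + tuple(
        eliminate(sub, reconstruct, pick_index, apply_rule) for sub in formula[1:])
```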

5.3 Variable Elimination Rules

We now present how to eliminate variables based on a judgement of the form

$$ \varDelta ; \varGamma ,\ x_1: \texttt {T} _1, \dots , x_n: \texttt {T} _n \vdash \psi (x_1, \dots , x_n, {\mathbf{y}}): \texttt {Bool} $$

where \(\texttt {T} _1, \dots , \texttt {T} _n\) contain no index variables other than i; \(\varGamma = y_1\mathord : \texttt {U} _1, \dots , y_m\mathord : \texttt {U} _m\) is a typing context for \({\mathbf{y}} = y_1, \dots , y_m\); and \(\texttt {U} _1 \dots , \texttt {U} _m\) do not include i. Note that we can obtain a judgement of this form by the procedure in Sect. 5.2 and by substituting the unit of the appropriate sort for every index variable other than i in \(\texttt {T} _1, \dots , \texttt {T} _n\).

We provide the variable elimination rules as lemmas, one for each sort of i. They state the rules for variables bound by \(\forall \); the rules for \(\exists \) are analogous. In stating the lemmas, we suppress \(\varDelta \) and \(\varGamma \) in the judgement and \({\mathbf{y}}\) in \(\psi \) for brevity, but we still assume that the above-mentioned conditions hold.

Some complication arises from the fact that, if \(k \ne l\), then \(\texttt {T} _k\) and \(\texttt {T} _l\) may be indexed with different expressions in i. We thus need to consider potentially different transformations \(\llbracket \texttt {T} _1 \rrbracket (i), \dots , \llbracket \texttt {T} _n \rrbracket (i)\) applied simultaneously to \(x_1, \dots , x_n\). Please refer to the supplementary material on the first author's web page (https://researchmap.jp/mtzk/?lang=en) for a general argument behind the rules and for the proofs of the lemmas.

\(\mathsf {T}_k\): The following lemma states that, as we saw in Sect. 5.1, we have only to consider the truth of a formula \(\psi (x)\) at \(x = \mathbf{0}\) if \(\psi (x)\) is translation-invariant.

Lemma 1

If \(x: \texttt {Vec} \langle 1, \tau (t) \rangle \vdash \psi (x): \texttt {Bool} \) holds for \(t: \mathsf {T}_k \; (k \in \{2, 3\})\), then \(\forall x.\ \psi (x) \Leftrightarrow \psi (\mathbf{0})\).

\(\mathsf {O}_2\): The following lemma means that we may assume x is on the x-axis if \(\psi (x)\) is invariant under rotation and reflection.

Lemma 2

If \(x: \texttt {Vec} \langle \beta (O), 0 \rangle \vdash \psi (x): \texttt {Bool} \) holds for \(O: \mathsf {O}_2\), then \(\forall x.\ \psi (x) \Leftrightarrow \forall r.\ \psi (r \mathbf{e}_1)\).

\(\mathsf {O}_3\): A judgement in the following form implies different kinds of invariance according to \(\beta _1\) and \(\beta _2\):

$$\begin{aligned} x_1: \texttt {Vec} \langle \beta _1(O), 0 \rangle , x_2: \texttt {Vec} \langle \beta _2(O), 0 \rangle \vdash \psi (x_1, x_2): \texttt {Bool} . \end{aligned}$$
(3)

In any case, we may assume \(x_1\) is on the x-axis and \(x_2\) is on the xy-plane for proving \(\forall x_1, x_2.\ \psi (x_1, x_2)\), as stated in the following lemma.

Lemma 3

If judgement (3) holds for \(O: \mathsf {O}_3\), then

$$ \forall x_1.\ \forall x_2.\ \psi (x_1, x_2) \Leftrightarrow \forall p, q, r\in \mathbb R.\ \psi (p\mathbf{e}_1, q\mathbf{e}_1 + r\mathbf{e}_2). $$

\(\mathsf {GL}_1\): For \(s: \mathsf {GL}_1\), a judgement \(x: \texttt {R} \langle \sigma (s) \rangle \vdash \psi (x): \texttt {Bool} \) implies that either

  • \(\psi (x)\) is invariant under change of sign, i.e., \(\psi (x) \Leftrightarrow \psi (-x)\),

  • \(\psi (x)\) is invariant under positive scaling, i.e., \(\psi (x) \Leftrightarrow \psi (fx)\) for all \(f > 0\), or

  • \(\psi (x)\) is invariant under arbitrary scaling, i.e., \(\psi (x) \Leftrightarrow \psi (fx)\) for all \(f \ne 0\).

The form of \(\sigma \) determines the type of invariance. The following lemma summarizes how we can eliminate or restrict a variable for these cases.

Lemma 4

Let \(\sigma (s) = s^e \cdot |s|^f \;\; (e\ne 0 \text { or } f \ne 0)\) and suppose that a judgement \(x: \texttt {R} \langle \sigma (s) \rangle \vdash \psi (x): \texttt {Bool} \) holds for \(s: \mathsf {GL}_1\). We have three cases:

  1. if \(e + f = 0\), then \(\forall x.\ \psi (x) \Leftrightarrow \forall x \ge 0.\ \psi (x)\); otherwise,
  2. if e is an even number, then \(\forall x.\ \psi (x) \Leftrightarrow \psi (1) \wedge \psi (0) \wedge \psi (-1)\), and
  3. if e is an odd number, then \(\forall x.\ \psi (x) \Leftrightarrow \psi (1) \wedge \psi (0)\).

\(\mathsf {GL}_2\): For \(B: \mathsf {GL}_2\), a judgement in the following form implies different kinds of invariance of \(\psi (x_1, x_2)\) depending on the form of \(\beta _1\) and \(\beta _2\):

$$\begin{aligned} x_1: \texttt {Vec} \langle \beta _1(B), 0 \rangle , x_2: \texttt {Vec} \langle \beta _2(B), 0 \rangle \vdash \psi (x_1, x_2). \end{aligned}$$
(4)

The following lemma summarizes how we eliminate the variables in each case.

Lemma 5

Let \(\beta _j(B) = \det (B)^{e_j}\cdot |\det (B)|^{f_j} \cdot B\) and \(g_j = e_j + f_j \; (j \in \{1, 2\})\). If judgement (4) holds, then, letting \(\psi _0 := \psi (\mathbf{0}, \mathbf{0}) \wedge \forall r.\ \psi (r\mathbf{e}_1, \mathbf{e}_1) \wedge \forall r.\ \psi (\mathbf{e}_1, r\mathbf{e}_1)\) and \(\varPsi := \forall x_1.\ \forall x_2.\ \psi (x_1, x_2)\), the following equivalences hold:

  1. If \(g_1 + g_2 + 1 = 0\) and
    • if \(e_1 + e_2\) is an even number, then \(\varPsi \Leftrightarrow \psi _0 \wedge \psi (\mathbf{e}_1, \mathbf{e}_2)\),
    • if \(e_1 + e_2\) is an odd number, then \(\varPsi \Leftrightarrow \psi _0 \wedge \psi (\mathbf{e}_1, \mathbf{e}_2) \wedge \psi (\mathbf{e}_1, -\mathbf{e}_2)\).
  2. If \(g_1 + g_2 + 1 \ne 0\), then \(\varPsi \Leftrightarrow \psi _0 \wedge \forall r.\ \psi (r\mathbf{e}_1, \mathbf{e}_2)\).

A similar lemma holds for the invariances indicated by an index variable of sort \(\mathsf {GL}_3\). We refrain from presenting it for space reasons.

Table 3. Results on All RCF Problems in ToroboMath Benchmark
Table 4. Results on RCF Problems with Invariance Detected and Variable Eliminated
Fig. 5. Comparison of Elapsed Time with and without the Invariance Detection based on AITs (Left: All Problems; Right: Problems Solved within 60 s)

6 Experiment

We evaluated the effectiveness of the proposed method on the pre-university math problems in the ToroboMath benchmark. We used the subset of problems that are naturally expressible (by a human) in the language of RCF. Most of them are in either geometry or algebra. Note that the formalization was done in the language introduced in Sect. 2, not directly in the language of RCF. The problems are divided according to their source: IMO problems were taken from past International Mathematical Olympiads, Univ problems from entrance exams of Japanese universities, and Chart problems from a popular math practice book series. Please refer to another paper [16] on the ToroboMath benchmark for the details of the problems.

The type reconstruction and formula simplification procedures presented in Sect. 4 and Sect. 5 were implemented as a pre-processor of the formalized problems. The time spent for the preprocessing was almost negligible (0.76 s per problem on average) compared to that for solving the problems.

We compared the ToroboMath system with and without the pre-processor (called AlgIdx and Baseline below, respectively). The Baseline system is equipped with Iwane and Anai's invariance detection and simplification algorithm [10], which operates on the language of RCF, whereas AlgIdx is not. Our evaluation thus reveals the advantage of detecting and exploiting the invariance of a problem in a language that directly encodes its geometric meaning.

Table 3 presents the results on all problems. The solver was run on each problem with a time limit of 600 s. The table lists the number of problems, the percentage of problems solved within the time limit, and the average wall-clock time spent on the solved problems. The number of solved problems increases significantly in the IMO division, and a modest improvement is observed in the other two divisions. Table 4 presents the results only on the problems in which at least one variable was eliminated by AlgIdx. The effect of the proposed method is quite clearly observed across all problem divisions, and especially on IMO. On IMO, the average elapsed time on the problems solved by AlgIdx is longer than that by Baseline; this is because AlgIdx solved more difficult problems within the time limit. In fact, the average speed-up by AlgIdx (last column in Table 4) is around 500% on Univ and Chart; i.e., on the problems solved by both systems, AlgIdx output the answer five times faster than Baseline.

A curious fact is that both AlgIdx and Baseline tended to need more time on the problems for which an invariance was detected and a variable eliminated by AlgIdx (i.e., Time in Table 4) than on the average over all solved problems (Time in Table 3). This suggests that problems having an invariance, or equivalently a symmetry, are harder for automatic solvers than those without one.

Figure 5 shows a comparison of the elapsed time for each problem. Each point represents a problem, and the x and y coordinates respectively indicate the elapsed time to solve (or to time out) by Baseline and AlgIdx. Many problems that were not solved by Baseline within 600 s were solved within 300 s by AlgIdx. The speed-up is also observed on easier problems (those solved within 60 s), as shown in the right panel of Fig. 5.

Table 5. Percentage of Problems from which one or more Variables are Eliminated by the Rule for each Sort
Table 6. Most Frequent Invariance Types Detected and Eliminated

Table 5 lists the fraction of problems on which one or more variables are eliminated based on the invariance indicated by an index variable of each sort. Table 6 provides the distribution of the combination of the sorts of invariances detected and eliminated by AlgIdx.

7 Conclusion

A method for automating w.l.o.g. arguments on geometry problems has been presented. It detects an invariance in a problem through type reconstruction in AIT and simplifies the problem by utilizing the invariance. It was especially effective on harder problems, including past IMO problems. Our future work includes the exploration of a more elaborate language of index expressions that captures various kinds of invariance while keeping type inference tractable.