Towards a geometry deductive database prover

Geometry automated theorem provers (GATPs) based on the deductive database method use a data-based search strategy to improve the efficiency of forward chaining. An implementation of such a method is expected to prove a large set of geometric conjectures efficiently, producing readable proofs. The number of conjectures a given implementation can prove depends on the chosen set of inference rules, since the deductive database method is not a decision procedure. Using an approach based on an SQL database library with an in-memory database, the implementation described in this paper aims at the following goals: efficiency in the management of the inference rules, of the set of already known facts and of the newly discovered facts, through the efficient data manipulation techniques of the SQL library; flexibility, since transforming the inference rules into SQL data manipulation language queries opens the possibility of the meta-development of GATPs from a provided set of rules; natural language and visual renderings, made possible by the use of a synthetic forward chaining method; and an implementation as an open source library, allowing its use by third-party programs, e.g. dynamic geometry systems.


Introduction
Geometry automated theorem proving began in the late 1950s by adapting the traditional geometric proof methods to the general-purpose reasoning approaches developed in artificial intelligence [8]. Subsequent work, e.g. [5,11], mostly followed the same synthetic approach of automating the traditional proof methods. Despite their initial success, and even though they produced readable proofs, these synthetic methods did not make much progress, as they proved to be narrow in scope and inefficient.
In recent years, synthetic methods have seen a resurgence, with systems such as the ArgoCLP theorem prover [17], based on coherent logic, and the deductive database approach [4], achieving mixed results on different classes of geometric problems.
Since the early implementations of automated theorem provers for geometry, synthetic provers based on inference rules and using forward chaining reasoning have been considered more suitable for educational purposes. Unlike the algebraic methods and the semi-synthetic methods [3], they can produce readable synthetic proofs that, depending on the chosen set of rules, can be adapted to a given school audience, e.g. secondary school students [19].
We now present our recent efforts in writing a prover using the geometry deductive database method. The goal of this endeavour is to produce a GATP that is efficient, flexible, capable of natural language and visual renderings, and implemented as an open source library.

Overview of the paper
The paper is organised as follows: in Section 2, a brief description of the geometry deductive database method is presented. In Section 3 the prover is discussed. Conclusions are drawn and future work is outlined in Section 4.

The geometry deductive database method
We now present a brief description of the geometry deductive database method (GDDM), and make some remarks regarding the inference rules and the deductive database. A thorough explanation of this method is provided in [4].

Brief description
The geometry deductive database method is a synthetic method that uses forward chaining to prove non-trivial geometry theorems efficiently.
Given a geometric configuration (the hypotheses of the conjecture), the prover finds its fixed point with respect to a predefined set of inference rules (axioms). In other words, it finds all the properties that can be deduced from that configuration using those axioms. A conjecture is proved if it is among the deduced facts. Otherwise, the conjecture is not proved, and a different approach must be used to prove or disprove it.
The algorithm is a data-based search strategy: a list of new facts is kept and, for each new fact, the system applies all possible inference rules, possibly generating further new facts (see Fig. 1). The facts discovered along the process are kept in an old facts list, so that they can be used later in other inferences. In the original implementation [4], the process could be "restarted" by introducing auxiliary points. The rules that can introduce new points are applied with the restriction that no new point may be introduced using a previously introduced point, which ensures that only a finite number of new points can be created.

Inference rules
Fig. 1 GDDM algorithm
In [4] a set of inference rules is proposed, along with some guidelines on how to select a "better" set of geometric (inference) rules. However, due to syntactic inconsistencies found in that list, some adaptations and corrections were made. We now present two examples where corrections were necessary:
- Rule D42 states in its assumptions that points P, Q, A and B are collinear, i.e., coll(P, Q, A, B), but coll (collinear) is a 3-ary predicate. The solution is simple: assume coll(P, Q, A) & coll(P, Q, B).
- Rule 61 uses the expression AB = PQ, but the equality of line segments is not defined anywhere. However, the congruent segments predicate is defined. Therefore, we used cong(A, B, P, Q) instead.
The set of rules presented in [4] is consistent, but it is far from adequate. The Java Geometry Expert (JGEx) [20], written by the authors of [4] and, as far as we are aware, the only GATP that implements the GDDM, uses a different set of rules from the one in [4]. Indeed, there are some JGEx examples that cannot be proven with the set of rules presented in [4].
For example, for problem 01.gex of the set of problems in JGEx related to the GDD prover (see Fig. 2), the following rules are used: steps 3 and 4 (r36) and step 2 (a rule without an explicit identifier) correspond to rules D52 and D25 of the rule set presented in [4], but the final step, rule r24, has no counterpart in that rule set. A new rule (B1) must be added in order to prove this example; in that rule, the condition ¬col(F,G,D) is a non-degenerate (ndg) condition.
In example 02.gex, the very first step of the proof again needs this rule. Building an adequate set of rules, starting from this one, is among our ongoing projects.

The OGP-GDDM prover
The goal of the OGP-GDDM prover is to produce a GATP that is efficient, flexible, capable of natural language and visual renderings, and implemented as an open source library.
- Efficient: by using an in-memory SQL database library, a data-based search strategy can be efficiently implemented. This claim is still unproven; we expect to reach a more definitive conclusion as soon as we improve our current implementation.
- Flexible: by transforming the inference rules into SQL data manipulation language (DML) queries, we claim that a generic prover can be built. The prover would begin by loading the inference rules, transformed into DML queries, and would then attempt to prove the conjecture based on that set of rules.
- Rendering: since the GDDM is a synthetic method that uses forward chaining reasoning, natural language and visual renderings of the proofs will be possible (see, e.g., the implementation of the method in JGEx [20]).
- Open source: implementing the prover as a library will allow the GATP to be used by third-party programs (e.g. GeoGebra [10]). Its integration in the Open Geometry Prover Community Project (OGPCP) [1] and the use of the First Order Format (FOF), as specified in [18], will allow an easy implementation of filters from/to the prover to/from other programs.
The geometric predicates used are those described in [4].

Efficiency considerations
The geometry deductive database method, as described in [4], uses ideas from deductive database theory, e.g. fixed points and the treatment of negative clauses, proposing a structured deductive database and a data-based search strategy to improve search efficiency. Our idea is to avoid the difficult task of implementing such a structured deductive database by using an off-the-shelf SQL database engine. The ease of use and expressive power of the Data Definition Language (DDL) and the Data Manipulation Language (DML) should yield an efficient (still to be proven) and flexible (see Section 3.2) solution.
Efficiency considerations strongly suggested the use of an in-memory database provided as a library: a library allows an easy integration in our prover and, given the expected size of the database, an in-memory solution is appropriate. The choice fell on SQLite. It provides an efficient and reliable management and query system, it is a library easily integrated in C/C++ programs, and it offers the needed option of an in-memory database.

Flexibility considerations
A rule-based theorem prover, such as one based on the deductive database method, reaches its maximum usefulness (in terms of a wider user base) if it can accept different rule sets. By transforming the inference rules into SQL data manipulation language queries, we aim to achieve that degree of flexibility.
For now, the implementation has the rules in [4] (see Section 2.2) hard-coded.A modular approach was used with a clear separation between the parsing of the FOF input text, the application of the different inference rules, the overall prover mechanism, etc.
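As an example, let us consider rule D1 about collinear points, coll(A, B, C) ⇒ coll(A, C, B). The steps to add this rule to the prover are the following:
1. If needed, add a rule to the scanner (scanner.ll):
    "coll"  return yy::parser::make_COLL(loc);
and the corresponding lines of code to the parser (parser.yy):
    COLL "coll"
    (...)
    | coll { };
    (...)
    coll: "coll" "(" "identifier" "," "identifier" "," "identifier" ")"
        { (...) drv.point2[drv.numGeoCmd] = $5; drv.point3[drv.numGeoCmd] = $7; };
2. If needed, in the file dbRAM.cpp, create a new table in the database for the collinear predicate (there is one table for each predicate).
3. If needed, in the file foftodb.cpp, modify the function readFileLoadDB() so that collinear facts may be added to the database, in this case the fact coll(A, B, C).
4. Add rule D1 as a function in prover.cpp (see Listing 1).
5. In the file prover.cpp, modify the function fixedPoint(), where the fixed point is calculated, to use rule D1.
6. If needed, in the file prover.cpp, modify the function proved(), where the fixed point is searched to check whether the method proved a conjecture, in this case to search for coll(A, C, B).
7. If needed, in the file prover.cpp, modify the function showFixedPoint(), where the calculated fixed point is displayed, so that collinear facts are shown.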
Our ultimate goal is to write a generic prover, capable of accepting different sets of rules and using them as an inference base.
The planned work goes in the direction of building a meta-prover-builder, that is, a program that, given a set of inference rules previously checked for consistency, synthesises another program, a GDDM prover based on the rules just provided. If successful, we will then work on the next step: a generic rule-based geometric automated theorem prover.

Rendering considerations
In a parallel project devoted to the use of automated deduction tools and methods in secondary schools, in which the authors also participate, it is claimed that one of the bottlenecks faced by the introduction of automated deduction systems in secondary schools is the difficulty of tackling the task by automatic means [14]. There is a need for efficient provers (e.g. Wu's (algebraic) method), but there is also a need for readable proof scripts (e.g. the Area Method, semi-synthetic), and these two objectives are difficult to attain together [12,13]. There is also a need for an automated deduction method adapted to secondary school learning, i.e. a method that uses the set of rules usually taught in those schools and that is capable of rendering the proofs in the usual (in)formalism of secondary schools. As argued in [19], synthetic provers based on inference rules and using forward chaining reasoning are more suitable for educational purposes.
Listing 1 Rule D1
In [19], two problems suited to a 7th-year class (≈12-year-old students) are presented. An appropriate set of rules, based on the rule sets of [6,7], is proposed, and the proofs that an appropriate implementation of the GDDM could produce are shown.
The GDDM is not a decision procedure, but this is not an important issue where secondary school education is concerned. Nor is the GDDM the most efficient GATP (an unproven, but probably valid, statement), which again is not the most important issue in that context. The important question is the possibility of having a synthetic proof in a language that students and teachers can understand [9].
Looking again at Fig. 2, where JGEx's GDD prover was used to prove a given geometric conjecture, we can see the construction and the conjecture (left window) and the proof developed by the GDD prover (right window). Looking now only at the right window, a synthetic proof is shown in its left section and the construction in its right section. The conjecture (on the left) and its visual rendering (on the right) are both highlighted, and this connection between the proof and the construction can be explored for each step of the proof. What is missing? A set of rules close to the needs of secondary school students and teachers, and a tool that can be easily used by them [20].
The development of the OGP-GDDM is an ongoing project. Its goals include rendering the synthetic proof produced by the GDDM as an (in)formal proof, hiding unnecessary details, translating it into natural language, etc., and establishing the connection with the construction, linking the proof and its visualisation. The first step is easy, as it is a natural side effect of the GDDM prover; the others are more challenging. The idea is to use the Web Geometry Laboratory environment, where an integrated DGS (GeoGebra) allows students to develop the construction and write the conjecture; the GDDM prover would then be called in the background and, once the proof is completed, the proof script and its visualisation on the construction (in GeoGebra) could be explored [15,16].

Open source library
If the target audience is not only the ATP community but the general "public" (e.g. students and teachers in secondary schools), a closed-box prover with esoteric input and output languages will be close to completely useless. To reach a larger audience, the GATP should be a "book" in a "library", ready to be "read" by other programs, i.e. it should be developed as a library with a clearly documented API, ready to be incorporated in third-party programs (e.g. GeoGebra).
The GDDM prover is being developed as an open source library, available at GitHub [1]. It uses the TPTP First Order Format (FOF), as specified in [18], as its input language, to allow an easy implementation of filters from/to the prover to/from other programs. It is part of the Open Geometry Prover Community Project (OGPCP) [1]. The OGPCP aims to integrate different efforts in the development of GATPs, namely: to provide a common open access repository for the development of GATPs; to provide an API to the different GATPs so that they can be easily used; to develop a portfolio strategy allowing the best GATP to be chosen for any given geometric conjecture; to interface with repositories of geometric knowledge, e.g. TGTP and TPTP; and to develop a GATP System Competition (GASC) to allow rating GATPs [2].

Source repository
The OGP-GDDM is hosted at GitHub. Its code is made available under the GNU General Public Licence, version 3 or later, and the documentation under the GNU Free Documentation Licence, version 1 or later. It is available only as source code. Provided that you use a Unix-like environment and have the usual tools (make, flex, bison, a C++ compiler) and the SQLite library installed, compiling and installing it is a straightforward process. Just change to the source code directory, provers/ogpgddm, and type the following commands:
    $ make
    $ sudo make install
Again, this is an ongoing project. The goals are: to build a consistent set of rules that mimics those used in secondary schools; to implement those rules in a (reasonably) efficient GDDM prover (see Sections 3.1 and 3.2); to be capable of producing natural language and visual renderings (see Section 3.3); and to integrate all this in a tool that can be easily used in secondary schools (see Section 3.4).

Usage and examples
Since OGP-GDDM is part of the OGPCP, it must adhere to its API, that is, it must use the FOF format and, in the stand-alone version, behave consistently with the other OGPCP provers.

Usage as a library
The static library libogpgddm.a contains the necessary "books" to build a prover based on this library. It is enough to include the different "books" (e.g. #include "prover.hpp") and then to add the option -logpgddm at compilation time. All the files, including the documentation (ogppm.pdf), are available in the GitHub repository.

Usage as a stand-alone program
A stand-alone version of the prover has already been built using the library. Its command-line syntax is:
    ogpgddm <option> | <conjecture>
where option is one of:
- "-h" or "--help": prints a help message and exits;
- "-V" or "--version": prints the OGP-GDDM version and exits.
Considerations about the conjecture are made below. The result of a proof, including any errors, is sent to the standard output. At the current stage no file is created, but this will change in a future version: a file with the proof will be created.

Conjecture format
Once more, because OGP-GDDM is an OGPCP GATP, the conjectures are written in FOF. Since the inference rules are hard-coded in the current version, any axioms included in the input are ignored.
In its current version, 0.6.0, OGP-GDDM is already capable of parsing any problem in the FOF format. Even with the shortcomings of the set of inference rules (see Section 2.2), OGP-GDDM is already able to prove some problems, for example the Midpoint Theorem, problem GEO0007 of the TGTP repository of geometric problems.
Theorem 1 (Midpoint Theorem) Let ABC be a triangle, and let D and E be the midpoints of AC and BC, respectively. Then the line DE is parallel to the base AB.
Given that the JGEx implementation (using a modified set of rules) found the fixed point in 0.002 s (see the lower bar of the right window of Fig. 3), it is clear that the current implementation of the GDDM still needs to be improved where efficiency is concerned. This is not a surprise: the current implementation of OGP-GDDM does not implement any efficiency-oriented procedure.
In a new version still in development, 0.7.0, rule B1 (see Section 2.2) was added. With that extra rule, OGP-GDDM was able to prove JGEx problem 01. This clearly indicates that the current set of rules must be revised.

Conclusions and future work
In its current state, version 0.6.0, our implementation of the GDD method is already able to prove some simple geometric conjectures, but it is still far from our overall goals:

Efficiency
Using [4] as a reference, some procedures should be optimised, and some changes should be made to the structure of the database. After those improvements have been made, it will be necessary to validate the approach used, i.e., the use of an SQLite in-memory database.

Flexibility
In its current state, the set of rules is hard-coded in the prover's overall code.
The implementation is very modular, so it is easy to add, delete or modify rules, but a more generic approach is sought. There are two lines of research. The first is a "prover-generator", i.e. a meta-program that, given a set of rules (transformed into DDL/DML queries), can synthesise another program, the prover for that set of rules. The second is a generic GATP that can accept a set of rules in a given format and proceed with that set of rules. In the immediate future we will work on the first of these two approaches. Again, it will be necessary to validate the approach used.

Listing 5 OGP-GDDM, JGEx problem 01
Natural language and visual renderings
The development of a crude proof script will be easy to implement, and it will be included in a next version of the OGP-GDDM prover. The natural language rendering, the next "natural" step, should, we anticipate, also not be difficult. Linking the natural language rendering and the construction, with visual highlights, will require a third-party DGS; as already said, we are planning to use the Web Geometry Laboratory and its embedded GeoGebra.

Open source library
The OGP-GDDM is being developed under the OGPCP as an open source GATP that can be used as a stand-alone program or as a library included in a C++ program. We hope that, "with a little help from our friends", OGP-GDDM can become a useful tool for many practitioners in the automated deduction area.
A parallel line of research, with an important impact on the final outcome of the prover, concerns the set of rules used by the prover. This is an ongoing project, carried out in collaboration with other researchers, whose goal is to find a good set of rules to be used by rule-based GATPs.
The usefulness (or not) of the approach used, i.e., the use of an SQLite in-memory database, must be evaluated under two somewhat conflicting criteria: efficiency and flexibility. This should be evaluated alongside the use of the GATP by different types of users, e.g. researchers, developers of programs incorporating automated deduction tools, students and teachers. We will try to follow those users and give support whenever necessary.
Funding
Open access funding provided by FCT-FCCN (b-on). The authors were partially supported by FCT - Foundation for Science and Technology, I.P., within the scope of the project CISUC - UID/CEC/00326/2020, and by the European Social Fund, through the Regional Operational Program Centro 2020.

Fig. 3 JGEx - TGTP, GEO0007 problem