1 Introduction and Context

Mathematical Software, i.e. tools for effectively computing mathematical objects, is a broad discipline: the objects in question may be expressions such as polynomials or logical formulae, algebraic structures such as groups, or even mathematical theorems and their proofs. In recent years there have been examples of software that acts on such objects being improved through the use of artificial intelligence techniques. For example, [21] uses a Monte-Carlo tree search to find the representation of polynomials that is most efficient to evaluate; [22] uses a machine learnt branching heuristic in a SAT-solver for formulae in Boolean logic; [18] uses pattern matching to determine whether a pair of elements from a specified group are conjugate; and [1] uses deep neural networks for premise selection in an automated theorem prover. See the survey article [12] in the proceedings of ICMS 2018 for more examples.

Machine Learning (ML), that is, statistical techniques to give computer systems the ability to learn rules from data, may seem unsuitable for use in mathematical software, since ML tools can only offer probabilistic guidance while such software prizes exactness. However, none of the examples above risked the correctness of the end-result of their software. They all used ML techniques to make non-critical choices or guide searches: the decisions of the ML carried no risk to correctness, but did offer substantial increases in computational efficiency. All mathematical software, no matter the mathematical domain, is likely to involve such choices, and our thesis is that in many cases an ML technique could make a better choice than a human user (via so-called magic constants [6]) or a traditional human-designed heuristic.

Contribution and Outline

In Sect. 2 we briefly survey our recent work applying ML to improve an algorithm in a computer algebra system which acts on sets of polynomials. We describe how we proposed a more appropriate definition of model accuracy and used this to improve the selection of hyper-parameters for ML models; and a new technique for identifying features of the input polynomials suitable for ML.

These advances can be applied beyond our immediate application: the feature identification to any situation where the input is a set of polynomials, and the hyper-parameter selection to any situation where we are seeking to take a choice that minimises a computation time. Hence we saw value in packaging our techniques into a software pipeline so that they may be used more widely. Here, by pipeline we refer to a succession of computing tasks that can be run as one task. The software is freely available as a Zenodo repository here: https://doi.org/10.5281/zenodo.3731703

We describe the software pipeline and its functionality in Sect. 3. Then in Sect. 4 we describe its application on a dataset we had not previously studied.

2 Brief Survey of Our Recent Work

Our recent work has been using ML to select the variable ordering to use for calculating a cylindrical algebraic decomposition relative to a set of polynomials.

2.1 Cylindrical Algebraic Decomposition

A Cylindrical Algebraic Decomposition (CAD) is a decomposition of ordered \(\mathbb {R}^n\) space into cells arranged cylindrically, meaning the projections of cells all lie within cylinders over a CAD of a lower-dimensional space. All these cells are semi-algebraic, meaning each can be described with a finite sequence of polynomial constraints. A CAD is produced for either a set of polynomials or a logical formula whose atoms are polynomial constraints. It may be used to analyse these objects by finding a finite sample of points to query and thus understand the behaviour over all of \(\mathbb {R}^n\). The most important application of CAD is to perform Quantifier Elimination (QE) over the reals: given a quantified formula, a CAD may be used to find an equivalent quantifier-free formula.
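
As a standard illustrative example (not drawn from the papers cited here): applying QE to the statement that a monic quadratic takes a negative value somewhere yields a condition on its coefficients,

$$\begin{aligned} \exists x \,.\, x^2 + bx + c < 0 \quad \Longleftrightarrow \quad b^2 - 4c > 0. \end{aligned}$$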

CAD was introduced in 1975 [10] and is still an active area of research. The collection [7] summarises the work up to the mid-90s while the background section of [13], for example, includes a summary of progress since. QE has numerous applications in science [2], engineering [25], and even the social sciences [23].

CAD requires an ordering of the variables. QE imposes that the ordering matches the quantification of the variables, but variables in blocks of the same quantifier, and the free variables, may be swapped. The ordering can have a great effect on the time and memory use of CAD, the number of cells, and even the underlying complexity [5]. Human-designed heuristics have been developed to make the choice [3, 4, 11, 14] and are used in most implementations.

The first application of ML to the problem was in 2014, when a support vector machine was trained to choose which of these heuristics to follow [19, 20]. The machine-learned choice did significantly better overall than any one heuristic.

2.2 Recent Work on ML for CAD Variable Ordering

The present authors revisited these experiments in [15], but this time using ML to predict the ordering directly: there were many problems where none of the human-made heuristics made good choices, and although the number of orderings increases exponentially, the current scope of CAD application means this is not restrictive. We also explored a more diverse selection of ML methods available in the Python library scikit-learn (sklearn) [24]. All the models tested outperformed the human-made heuristics.

The ML models learn not from the polynomials directly, but from features: properties which evaluate to a floating point number for a specific polynomial set. In [20] and [15] only a handful of features were used (measures of degree and frequency of occurrence for variables). In [16] we developed a new feature generation procedure which used combinations of basic functions (average, sign, maximum) evaluated on the degrees of the variables in either one polynomial or the whole system. This allowed for substantially more features and improved the performance of all ML models. The new features could be used for any ML application where the input is a set of polynomials.

The natural metric for judging a CAD variable ordering is the corresponding CAD runtime: in the work above, models were trained to pick the ordering which minimises this for a given input. However, this meant the training did not distinguish between non-optimal orderings, even though the difference between these could be huge. This led us to a new definition of accuracy in [17]: picking an ordering which leads to a runtime within \(x\%\) of the minimum possible.
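
This relaxed accuracy is straightforward to compute; the following is a minimal sketch of the definition in [17] (the function and variable names are our own):

```python
# Relaxed accuracy following [17]: a prediction counts as correct when the
# runtime of its chosen ordering is within x% of the minimum possible.
def relaxed_accuracy(chosen_times, minimal_times, x=20.0):
    hits = sum(t_chosen <= (1 + x / 100.0) * t_min
               for t_chosen, t_min in zip(chosen_times, minimal_times))
    return hits / len(chosen_times)
```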

We then wrote a new version of the sklearn procedure which uses cross-validation to select model hyper-parameters to minimise the total CAD runtime of its choices, rather than maximise the number of times the minimal ordering is chosen. This also improved the performance of all ML models in the experiments of [17]. The new definition and procedure are suitable for any situation where we are seeking to take a choice that minimises a computation time.

3 Software Pipeline

The input to our pipeline is given by two distinct datasets, used for training and testing respectively. An individual entry in a dataset is a set of polynomials representing an input to a symbolic computation algorithm, in our case CAD. The output is a corresponding sequence of variable ordering suggestions, one for each set of polynomials in the testing dataset.

The pipeline is fully automated: it generates and uses the CAD runtimes for each set of polynomials under each admissible variable ordering; uses the runtimes from the training dataset to select the hyper-parameters with cross-validation and tune the parameters of the model; and evaluates the performance of those classifiers (along with some other heuristics for the problem) for the sets of polynomials in the testing dataset.

We describe these key steps in the pipeline below. Each of the numbered stages can be individually marked for execution or not in a given run of the pipeline, avoiding duplication of existing computation. The code for this pipeline, written entirely in Python, is freely available at: https://doi.org/10.5281/zenodo.3731703.

I. Generating a Model Using the Training Dataset

(a) Measuring the CAD Runtimes: The CAD routine is run for each set of polynomials in the training dataset. The runtimes for all possible variable orderings are stored in a different file for each set of polynomials. If the runtime exceeds a pre-defined timeout, the value of the timeout is stored instead.
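
A minimal sketch of this stage is given below. Here run_cad is a hypothetical wrapper around the underlying CAD call (the pipeline targets Maple's Regular Chains implementation, see the end of Sect. 3), assumed to raise TimeoutError when the timeout is exceeded:

```python
import itertools
import time

def measure_runtimes(polys, variables, timeout):
    """Time the CAD computation under every admissible variable ordering."""
    times = {}
    for ordering in itertools.permutations(variables):
        start = time.time()
        try:
            run_cad(polys, ordering, timeout=timeout)  # hypothetical CAD call
            times[ordering] = time.time() - start
        except TimeoutError:
            times[ordering] = timeout  # store the timeout value instead
    return times
```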

(b) Polynomial Data Parsing: The training dataset is first converted to a format that is easier to process into features. For this purpose we chose the format given by the terms() method of the Poly class in the sympy package for symbolic computation in Python.

Here, each monomial is defined by a tuple, containing another tuple with the degrees of each variable, and a value defining the monomial coefficient. The polynomials are then defined by lists of monomials given in this format, and a point in the training dataset consists of a list of polynomials. For example, one entry in the dataset is the set \(\{235 x_1+42 x_2^2, 2 x_1^2 x_3-1\}\) which is represented as

$$\begin{aligned} \left[ \left[ \left( (1,0,0),235\right) ,\left( (0,2,0),42\right) \right] ,\left[ \left( (2,0,1),2\right) ,\left( (0,0,0),-1\right) \right] \right] . \end{aligned}$$
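
This representation can be reproduced directly with sympy; the following minimal sketch (the variable names are illustrative) prints exactly the nested list above:

```python
from sympy import symbols, Poly

x1, x2, x3 = symbols('x1 x2 x3')
system = [235*x1 + 42*x2**2, 2*x1**2*x3 - 1]

# Poly(...).terms() returns one ((deg_x1, deg_x2, deg_x3), coefficient)
# pair per monomial.
parsed = [Poly(p, x1, x2, x3).terms() for p in system]
print(parsed)
# [[((1, 0, 0), 235), ((0, 2, 0), 42)], [((2, 0, 1), 2), ((0, 0, 0), -1)]]
```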

All the data points in the training dataset are collected into a single file called terms_train.txt after being placed into this format. Subsequently, the file y_train.txt is created, storing for each set of polynomials the index of the variable ordering with the minimum computing time, using the runtimes measured in Step I(a).

(c) Feature Generation: Here each set of polynomials in the training dataset is processed into a fixed length sequence of floating point numbers, called features, which are the actual data used to train the ML models in sklearn. This is done with the following steps:

Fig. 1. Generating the feature \(\text {av}_p\left( \text {max}_m\left( d_1^{m,p}\right) \right) \) from data stored in the format of Step I(b). Here \(d_1^{m,p}\) denotes the degree of variable \(x_1\) in polynomial number p and monomial number m, and \(\text {av}_p\) denotes the average function computed over all polynomials [16].

i. Raw feature generation

We systematically consider applying all meaningful combinations of the functions average, sign, maximum, and sum to polynomials with a given number of variables. This generates a large set of feature descriptions, as proposed in [16]. The format used to store the data described above allows for an easy evaluation of these features; an example of computing one such feature is given in Fig. 1. In [16] we describe how the method provides, for example, 1728 possible features for polynomials in three variables. This step generates the full set of feature descriptions, saved in a file called features_descriptions.txt, and the corresponding values of the features on the training dataset, saved in a file called features_train_raw.txt.
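
The sketch below illustrates the enumeration on a reduced scale. It is our own simplification: it combines only average, maximum and sum, omitting sign and the further combinations of [16] that give the full set of descriptions. The Fig. 1 feature appears as the entry 'av_p(max_m(d1))':

```python
import itertools

REDUCERS = {
    'av':  lambda xs: sum(xs) / len(xs),
    'max': max,
    'sum': sum,
}

def raw_features(parsed_system, n_vars):
    # parsed_system: list of polynomials in the Step I(b) format, i.e.
    # lists of ((deg_x1, ..., deg_xn), coefficient) pairs.
    feats = {}
    for v, (po, f_p), (mo, f_m) in itertools.product(
            range(n_vars), REDUCERS.items(), REDUCERS.items()):
        per_poly = [f_m([degs[v] for degs, _c in poly])
                    for poly in parsed_system]
        feats[f'{po}_p({mo}_m(d{v + 1}))'] = f_p(per_poly)
    return feats

system = [[((1, 0, 0), 235), ((0, 2, 0), 42)],
          [((2, 0, 1), 2), ((0, 0, 0), -1)]]
print(raw_features(system, 3)['av_p(max_m(d1))'])  # (1 + 2) / 2 = 1.5
```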

ii. Feature simplification

After computing the numerical values of the features in Step I(c)i, this step removes those features that are constant or repetitive across the dataset in question, as described in [16]. The descriptions of the remaining features are saved in a new file called features_descriptions_final.txt.
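
A minimal sketch of this filtering (our own illustration; the pipeline's exact criteria follow [16]):

```python
import numpy as np

def simplify_features(matrix, descriptions):
    """Drop constant columns, then exact duplicates (matrix: problems x features)."""
    keep = ~np.all(matrix == matrix[0, :], axis=0)  # discard constant features
    cols, descs, seen = [], [], set()
    for col, desc in zip(matrix.T[keep], np.array(descriptions)[keep]):
        key = col.tobytes()  # byte signature of the column's values
        if key not in seen:  # keep only the first of each group of duplicates
            seen.add(key)
            cols.append(col)
            descs.append(str(desc))
    return np.array(cols).T, descs
```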

iii. Final feature generation

The final set of features is computed by evaluating the descriptions in features_descriptions_final.txt for the training dataset. Even though these were already evaluated in Step I(c)i, we repeat the evaluation for the final set of feature descriptions: this allows users to enter alternative features manually and skip steps i and ii. As noted above, any of the named steps in the pipeline can be selected or skipped in a given run. The final values of the features are saved in a new file called features_train.txt.

(d) Machine Learning Classifier Training:

i. Fitting the model hyperparameters by cross-validation

    The pipeline can apply four of the most commonly used deterministic ML models (see [15] for details), using the implementations in sklearn [24].

    • The K-Nearest Neighbors (KNN) classifier

    • The Multi-Layer Perceptron (MLP) classifier

    • The Decision Tree (DT) classifier

    • The Support Vector Machine (SVM) classifier

    Of course, additional models in sklearn and its extensions could be included with relative ease. The pipeline can use two different methods for fitting the hyperparameters via a cross-validation procedure on the training set, as described in [17]:

    • Standard cross-validation: maximizing the prediction accuracy (i.e. the number of times the model picks the optimum variable ordering).

    • Time-based cross-validation: minimizing the CAD runtime (i.e. the time taken to compute CADs with the model’s choices).

Both methods tune the hyperparameters with cross-validation using the routine RandomizedSearchCV from the sklearn package in Python (for the latter, an adapted version that we wrote). The cross-validation results (i.e. the choice of hyperparameters) are saved in a file whose name records the date and time at which it was generated.
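
The sketch below conveys the time-based variant with a hand-rolled search loop rather than our adapted RandomizedSearchCV; the use of a decision tree and all names are illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold, ParameterSampler
from sklearn.tree import DecisionTreeClassifier

def time_based_search(X, y, runtimes, param_distributions, n_iter=20, seed=0):
    # X: feature matrix; y: index of the optimal ordering per problem;
    # runtimes[i, j]: runtime of problem i under ordering j (from Step I(a)).
    best_params, best_total = None, np.inf
    for params in ParameterSampler(param_distributions, n_iter=n_iter,
                                   random_state=seed):
        total = 0.0
        for train, test in KFold(5, shuffle=True, random_state=seed).split(X):
            model = DecisionTreeClassifier(**params).fit(X[train], y[train])
            picks = model.predict(X[test])
            # Total CAD time of the orderings this candidate model picks.
            total += runtimes[test, picks].sum()
        if total < best_total:
            best_params, best_total = params, total
    return best_params, best_total
```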

ii. Fitting the parameters

The parameters of each model are subsequently fitted using the standard sklearn algorithms for each chosen set of hyperparameters. The fitted models are saved in a file, again stamped with the date and time of generation.

II. Predicting the CAD Variable Orderings Using the Testing Dataset

The models from Step I are then evaluated according to their choices of variable orderings for the sets of polynomials in the testing dataset. The steps below are listed without detailed description, as they mirror the corresponding parts of Step I applied to the testing dataset.

(a) Polynomial Data Parsing: The values generated are saved in a new file called terms_test.txt.

(b) Feature Generation: The final set of features is computed by evaluating the descriptions from Step I(c)ii for the testing dataset. These values are saved in a new file called features_test.txt.

(c) Predictions Using ML: Predictions on the testing dataset are generated using the models trained in Step I(d). Each model is run on the features from Step II(b), and its predictions are stored in a timestamped file.

(d) Predictions Using Human-Made Heuristics: In our prior papers [15,16,17] we compared the performance of the ML models with the human-designed heuristics in [4] and [11]. For details on how these are applied see [15]. Their choices are saved in two files entitled y_brown_test.txt and y_sotd_test.txt, respectively.

(e) Comparative Results: Finally, in order to compare the performance of the proposed pipeline, we must measure the actual CAD runtimes on the testing dataset. The results of the comparison are saved in a further file named with the same date-and-time template.
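
The headline numbers in this comparison reduce to sums of measured runtimes indexed by each method's choices; a minimal sketch under an assumed array layout:

```python
import numpy as np

def total_cad_time(choices, runtimes):
    # choices[i]: index of the ordering picked for test problem i;
    # runtimes[i, j]: measured runtime of problem i under ordering j.
    return runtimes[np.arange(len(choices)), choices].sum()
```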

Adapting the Pipeline to Other Algorithms

The pipeline above was developed for choosing the variable ordering for the CAD implementation in Maple’s Regular Chains Library [8, 9]. But it could be used to pick the variable ordering for other procedures which take sets of polynomials as input by changing the calls to CAD in Steps I(a) and II(e) to that of another implementation/algorithm. Step II(d) would also have to be edited to call an appropriate competing heuristic.

4 Application of Pipeline to New Dataset

The pipeline described in the previous section makes it easy to repeat our past experiments (described in Sect. 2) for a new dataset: all that is needed is to replace the files storing the polynomials and run the pipeline.

To demonstrate this we test the proposed pipeline on a new dataset of randomly generated polynomials. We are not suggesting that it is appropriate to test classifiers on random data: we simply mean to demonstrate the ease with which the experiments in [15,16,17] that originally took many man-hours can be repeated with just a single code execution.

The randomly generated parameters are: the degrees of the three variables in each polynomial term, the coefficient of each term, the number of terms in a polynomial, and the number of polynomials in a set. The means and standard deviations of these parameters were extracted from the problems in the nlsat dataset, which was used in our previous work [15], so that the new dataset is of a comparable scale. We generated 7500 sets of random polynomials, of which 5000 were used for training and the remaining 2500 for testing.
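
A minimal sketch of the generation step is given below; the distribution parameters here are placeholders, not the statistics extracted from the nlsat dataset:

```python
import random

def random_system(n_vars=3, mean_polys=3, mean_terms=4, max_deg=4, max_coef=50):
    """Generate one random polynomial set in the Step I(b) format."""
    def random_poly():
        n_terms = max(1, round(random.gauss(mean_terms, 1)))
        return [(tuple(random.randint(0, max_deg) for _ in range(n_vars)),
                 random.randint(-max_coef, max_coef) or 1)  # avoid zero coefficients
                for _ in range(n_terms)]
    n_polys = max(1, round(random.gauss(mean_polys, 1)))
    return [random_poly() for _ in range(n_polys)]
```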

The results of the proposed processing pipeline, including the comparison with the existing human-made heuristics, are given in Table 1. The prediction time is the time taken for the classifier or heuristic to make its predictions for the problems in the testing set. The total time adds to this the time for the actual CAD computations using the suggested orderings. We do not report the training time of the ML as this is a cost paid only once in advance. The virtual solvers are those which always make the best/worst choice for a problem (in zero prediction time) and are useful to show the range of possible outcomes. We note that further details on our experimental methodology are given in [15,16,17].
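
Concretely, the virtual solvers are the per-problem minima and maxima of the runtime table (a sketch under the same assumed array layout as the sketches above):

```python
import numpy as np

def virtual_solvers(runtimes):
    # runtimes: problems x orderings array of measured testing-set CAD times.
    vb = runtimes.min(axis=1).sum()  # virtual best: optimal choice every time
    vw = runtimes.max(axis=1).sum()  # virtual worst: worst choice every time
    return vb, vw
```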

As with the tests on the original dataset [15, 16], the ML classifiers outperformed the human-made heuristics, but on this dataset the difference compared to the Brown heuristic was marginal. We used a lower CAD timeout, which may benefit the Brown heuristic: past analysis shows that when it makes sub-optimal choices these tend to be much worse. We also note that the relative performance of the Brown heuristic fell significantly on problems with more than three variables in [17]. The results for the sotd heuristic are poor because it had a particularly long prediction time on this random dataset; there is scope to parallelize sotd, which may make it more competitive.

Table 1. The comparative performance of DT, KNN, MLP, SVM, the Brown and sotd heuristics on the testing data for our randomly generated dataset. A random prediction, and the virtual best (VB) and virtual worst (VW) predictions are also included.

5 Conclusions

We presented our software pipeline for training and testing ML classifiers that select the variable ordering to use for CAD, and described the results of an experiment applying it to a new dataset.

The purpose of the experiment in Sect. 4 was to demonstrate that the pipeline can easily train classifiers that are competitive on a new dataset with almost no additional human effort, at least for a dataset of a similar scale (we note that the code is designed to work on higher degree polynomials but has only been tested on datasets of 3 and 4 variables so far). The pipeline makes it possible for a user to easily tune the CAD variable ordering choice classifiers to their particular application area.

Further, with only a little modification, as noted at the end of Sect. 3, the pipeline could be used to select the variable ordering for alternative algorithms that act on sets of polynomials and require a variable ordering. We thus expect the pipeline to be a useful basis for future research and plan to experiment with its use on such alternative algorithms in the near future.