1 Introduction

The robust estimation of camera geometry, one of the most important tasks in computer vision, is usually based on solving so-called minimal problems [26, 27, 43], i.e., problems that are solved from minimal samples of input data, inside a RANSAC framework [13, 18, 45]. Since the camera geometry estimation has to be performed many times inside RANSAC [18], fast solvers to minimal problems are of high importance. Minimal problems often result in polynomial systems with the following characteristics:

  1.

    They are usually non-square, with more polynomials than the number of variables.

  2.

    The coefficients of the constituent polynomials are usually non-generic, i.e., algebraically constrained [15, p. 109].

  3.

    The constituent polynomials have the same structure, i.e. the same monomials, but the coefficient values vary with the input measurements. This property has enabled an offline + online strategy, which moves as many computations as possible from the “online” stage of solving equations to an earlier pre-processing “offline” stage.

Most of the state-of-the-art specific minimal solvers are based on Gröbner bases and the action matrix method [15, 47]. The Gröbner basis method was popularized in computer vision by Stewenius [50]. The first efficient Gröbner basis-based solvers were mostly hand-crafted [48, 49] and sometimes very unstable [51]. However, in the last 15 years much effort has been put into making the process of constructing the solvers more automatic [27, 32, 33], more stable [9, 10], and more efficient [5, 31,32,33, 35, 38]. Powerful tools are now available for the automatic generation of efficient solvers based on Gröbner bases [27, 32].

Note that while all these methods are known in the computer vision community as Gröbner basis methods, most of them do not generate solvers that directly compute Gröbner bases. They usually compute polynomials from a border basis [40] or go “beyond” Gröbner and border bases [35]. However, all these methods are based on computing an action matrix (Footnote 1). Therefore, for the sake of simplicity, in this paper we will use “action matrix method” and “Gröbner basis method” interchangeably to refer to the same class of methods [5, 27, 31,32,33, 35, 38, 47].

The first step in such action matrix methods is to compute a linear basis B of the quotient ring \(A = \mathbb {C}[X]/I\), where I denotes the ideal generated by the input polynomial system. An action matrix method based on Gröbner bases computes the basis of A by defining the division w.r.t. some Gröbner basis of the ideal I. In general, computing a Gröbner basis requires fixing a monomial ordering. Alternatively, a heuristic basis sampling method [35] that goes beyond Gröbner bases and monomial orderings can be used to compute B. The basis sampling method is related to the border basis methods [39, 40], which generalize the concept of monomial ordering and even propose a non-monomial basis of A. The second step in these action matrix methods is to compute a linear map \(T_f\), representing multiplication in A by some polynomial f. The representative matrix \(\texttt{M}_f\) of the map \(T_f\) is what we call the action matrix. Recently, [54, 55] generalized the method used for generating the basis B. The proposed methods, known as normal form methods, were developed for general systems of polynomial equations and are not tailored to the problems that appear in computer vision.

While the action matrix method for generating efficient minimal solvers has been thoroughly studied in computer vision, and all recently generated action-matrix-based solvers are highly optimized in terms of efficiency and stability, little attention has been paid to an alternative algebraic method for solving systems of polynomial equations, namely the resultant method [15, Chapter 3]. The resultant method was manually applied to several computer vision problems [20, 23, 26, 28]. However, in contrast to the action matrix method, there is no method for automatically generating sparse resultant-based minimal solvers. The most relevant works in this direction are the methods using the subdivision of the Minkowski sum of all the Newton polytopes [11, 12, 16] and the iterative method based on the Minkowski sum of a subset of Newton polytopes [21]. However, the subdivision methods are not applicable to polynomial systems with non-generic coefficients, whereas the method in [21] leads (due to linearizations) to larger and less efficient solvers than the action matrix solvers.

Our first objective in this paper is to study the theory of sparse resultants and to propose an approach for generating efficient solvers for camera geometry problems by using an extra polynomial with a special form. Our approach, based on the Newton polytopes and their Minkowski sums, generates sparse resultant matrices, followed by a Schur complement computation and a conversion of the resultant constraint into an eigenvalue problem.

Our approach differs from the subdivision methods [11, 16] in how the Newton polytopes are used for generating minimal solvers. The subdivision methods compute the Minkowski sum of the Newton polytopes of all the polynomials and divide its lattice interior into cells, which are used to construct the sparse resultant matrices. In this paper, by contrast, we make this process iterative: we compute the Minkowski sum for each subset of Newton polytopes (of the input polynomials) and avoid computing a subdivision, using the lattice interior directly to generate a candidate sparse resultant matrix. For systems with non-generic coefficients, such an iterative approach has been crucial for generating sparse resultant matrices for the minimal problems in computer vision. The sparse resultant matrix and the eigenvalue problem together represent our minimal solver. The solvers based on our proposed approach achieve significant improvements over the state-of-the-art sparse resultant-based solvers [11, 16, 21] and comparable or better efficiency and/or accuracy than state-of-the-art Gröbner basis-based solvers [32, 35, 38]. Moreover, our proposed approach has been fully automated [2] and can be incorporated in the existing tools for the automatic generation of efficient minimal solvers [27, 32, 35].

There is a similarity between the solvers obtained using our sparse resultant-based approach and the solvers based on the action matrix methods [8, 27, 32, 35, 37, 38, 47]. Therefore, the second objective of this paper is to investigate this similarity. In a step-by-step manner, we demonstrate that for a minimal solver generated using the action matrix method, we can change the steps performed by the sparse resultant method such that it leads to an equivalent solver. Similarly, we demonstrate that for a given minimal solver generated using the sparse resultant method, we can change the steps performed by the action matrix method such that it leads to an equivalent solver. Specifically, our contributions are:

  1.

    A novel approach (Sect. 3) of adding an extra polynomial with a special form, for generating a sparse resultant matrix whose resultant constraint can be decomposed (Sect. 3.3) into an eigenvalue problem.

    • A scheme (Sect. 3.2) to iteratively test the Minkowski sum of the Newton polytopes of each subset of polynomials, searching for the most efficient minimal solver, in the presence of algebraically constrained coefficients.

    • Two procedures (Sect. 3.4) to reduce the sparse resultant matrix size, leading to comparable or better solver speeds than those generated by many state-of-the-art Gröbner basis-based methods.

    • A general method for automatic generation of efficient minimal solvers. The automatic generator is publicly available at [2].

  2.

    A study of the constraints (Sect. 4) to be satisfied so that the solvers based on our sparse resultant method as well as the action matrix method are equivalent, i.e. they involve eigenvalue decomposition of exactly the same matrix, and the steps performed for constructing that matrix can be interchanged.

This paper is an extension of our work [3], where we proposed an extra polynomial sparse resultant method for generating efficient minimal solvers.

2 Theoretical Background and Related Work

In this section, we summarize the basic concepts and notation used in the paper. This notation is based on the book [15], to which we refer the reader for more details.

Our goal is to compute the solutions to a system of m polynomial equations,

$$\begin{aligned} f_1(x_1,\dots ,x_n) = 0,\dots , f_m(x_1,\dots ,x_n)=0, \end{aligned}$$
(1)

in n variables, \(X = \lbrace x_1,\dots ,x_n \rbrace \). Let us denote the set of polynomials from (1) as \(\mathcal {F}=\{f_1,\dots ,f_m\}\). The variables in X can be ordered and formed into a vector. W.l.o.g. let us assume that this vector has the form, \(\textbf{x} = \begin{bmatrix} x_1&\dots&x_n \end{bmatrix}^\top \). Then for a vector \(\mathbf {\alpha } =\begin{bmatrix} \alpha _1&\dots&\alpha _n \end{bmatrix}^\top \in \mathbb {N}^{n}\), a monomial \(\textbf{x}^\mathbf {\alpha }\) is a product \(\textbf{x}^\mathbf {\alpha } = \prod _{i=1}^{n} x_i^{\alpha _i}\). Each polynomial \(f_i \in \mathcal {F}\) is expressed as a linear combination of a finite set of monomials

$$\begin{aligned} f_{i}(\textbf{x}) = \underset{\mathbf {\alpha } \in \mathbb {N}^{n}}{\Sigma } c_{i,\mathbf {\alpha }} \textbf{x}^{\mathbf {\alpha }}. \end{aligned}$$
(2)

We collect all monomials present in \(\mathcal {F}\) and denote the set as \(\text {mon}(\mathcal {F})\). Let \(\mathbb {C}[X]\) denote the set of all polynomials in unknowns X with coefficients in \(\mathbb {C}\). The ideal \(I = \langle f_1,\dots ,f_m \rangle \subset \mathbb {C}[X]\) is the set of all polynomial combinations of the generators \(f_1,\dots ,f_m\). The set V of all solutions of the system (1) is called an affine variety. Each polynomial \(f \in I\) vanishes on the solutions of (1). Here we assume that the ideal I generates a zero-dimensional variety, i.e., the system (1) has a finite number of solutions (say r). Using the ideal I, we can define the quotient ring \(A = \mathbb {C}[X]/I\) as the set of equivalence classes in \(\mathbb {C}[X]\), where the equivalence is defined by the relation \(a \sim b \iff (a - b) \in I\). If I has a zero-dimensional variety, then the quotient ring A is a finite-dimensional vector space over \(\mathbb {C}\) [15]. For an ideal I, there exist special sets of generators called Gröbner bases, which have the nice property that the remainder after division is unique. Using a Gröbner basis we can define a linear basis for the quotient ring A.
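As a simple illustration, consider the single polynomial \(f = x^2 - 5x + 6\): the ideal \(I = \langle f \rangle \subset \mathbb {C}[x]\) has the variety \(V = \lbrace 2, 3 \rbrace \), and the quotient ring \(A = \mathbb {C}[x]/I\) is a two-dimensional vector space with the monomial basis \(\lbrace 1, x \rbrace \). In general, the dimension of A equals the number of solutions r, counted with multiplicity.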

Non-generic polynomial system In geometric computer vision, each coefficient, say \( c_{i,\mathbf {\alpha }}\) in (2), can be expressed as a parametric form, \( c_{i,\mathbf {\alpha }} = \phi (\mathcal {D})\) of the data measurements \(\mathcal {D}\). One of the important properties of the set of the coefficients of such a polynomial system is that of non-genericity:

Definition 1

The coefficients \(\lbrace c_{i,\mathbf {\alpha }} \rbrace \) of \(\mathcal {F}\) in (2), are defined to be non-generic, if and only if we can obtain a polynomial constraint on them by eliminating the parameters \(\mathcal {D}\).

A rigorous definition of the genericity of a property of a system is given in [15, p. 109].

Vector representation of a set Given a set B, one of the steps in this paper is to express it as a column vector by arranging its elements w.r.t. a given ordering. Let us denote this operation as

$$\begin{aligned} \text {vec}(B). \end{aligned}$$
(3)

Matrix form In this paper, we will need to express a given set of polynomials \(\mathcal {F}\) via matrix multiplication. To achieve this, we fix an order for the polynomials in \(\mathcal {F}\), and also do the same for the monomials in \(B = \text {mon}(\mathcal {F})\). The matrix representation of \(\mathcal {F}\) has the form

$$\begin{aligned} \texttt{C}([c_{i, \mathbf {\alpha }}]) \ \textbf{b}, \end{aligned}$$
(4)

where \(\textbf{b}\) is a column vector of the monomials in B ordered w.r.t. the given monomial ordering and \(\texttt{C}([c_{i, \mathbf {\alpha }}])\) is the matrix of coefficients \(c_{i, \mathbf {\alpha }}\) of \(\mathcal {F}\). The entries of \( \texttt{C}([c_{i, \mathbf {\alpha }}])\) are row-indexed by the ordered set of polynomials in \(\mathcal {F}\) and column-indexed by the monomials in \(\textbf{b}\).

Monomial multiples of a polynomial set We begin with a polynomial system \(\mathcal {F}\) and the set of monomials in \(\mathcal {F}\), i.e., \(B = \text {mon}(\mathcal {F})\). Let \(B^\prime \supset B\) be another set of monomials.

Then an important step in our proposed method is to extend \(\mathcal {F}\) to the largest possible set, say \(\mathcal {F}^\prime \), such that \(\text {mon}(\mathcal {F}^\prime ) \subseteq B^\prime \). This extension is done by multiplying each \(f_i \in \mathcal {F}\) with monomials, i.e., we generate polynomials from \(I = \langle f_1,\dots , f_m \rangle \). For this, we compute the set \(T_i\) of all possible monomials \(T_i = \lbrace \textbf{x}^{\mathbf {\alpha }} \rbrace \) for each \(f_i \in \mathcal {F}\), such that \(\text {mon}(\textbf{x}^{\mathbf {\alpha }} f_i) \subset B^\prime , \ \textbf{x}^{\mathbf {\alpha }} \in T_i \). We will use the following shorthand notation to express this operation

$$\begin{aligned} \mathcal {F}\overset{B^\prime }{\rightarrow }\ (\mathcal {F}^\prime , T), \end{aligned}$$
(5)

where \(T = \lbrace T_1,\dots ,T_m \rbrace \). Subsequently, in this operation we assume to have removed all monomials in \(B^\prime \) which are not present in the extended set \(\mathcal {F}^\prime \) and denote the modified set as \(B^\prime \) by abuse of notation. In other words, we will assume that \(B^\prime = \text {mon}(\mathcal {F}^\prime )\).

2.1 The Action Matrix Method

One of the well-known SOTA methods for polynomial solving is the Gröbner basis-based action matrix method [15, 47, 52]. It has recently been used to efficiently solve many minimal problems in computer vision [8, 9, 26, 27, 32, 35, 47]. It transforms the problem of finding the solutions to (1) into a problem of eigendecomposition of a special matrix. We list the steps performed by an action matrix method in “Appendix A”, and note here that the algorithm can essentially be distilled into a sequence of three important steps: construct the set \(T_j\) of monomial multiples for each of the input polynomials \(f_j \in \mathcal {F}\), extend \(\mathcal {F}\) via monomial multiplication to \(\mathcal {F}^{\prime }\), and linearize the resulting system as a matrix product \(\texttt{C} \textbf{b}\). A G–J elimination of \(\texttt{C}\) is then used to extract the required action matrix, whose eigenvalues give us the roots of \(\mathcal {F}\).
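The simplest instance of this pipeline is the univariate case, where the action matrix of multiplication by x in \(\mathbb {C}[x]/\langle f \rangle \) is the classical companion matrix and its eigenvalues are the roots of f. The following sketch (Python with numpy, an illustration of the principle rather than of any generated solver) makes this concrete.

```python
import numpy as np

# f(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3); basis of C[x]/<f> is {1, x, x^2}.
c0, c1, c2 = -6.0, 11.0, -6.0          # f = x^3 + c2*x^2 + c1*x + c0

# Action (companion) matrix of multiplication by x w.r.t. the basis {1, x, x^2}:
# x*1 = x,  x*x = x^2,  x*x^2 = x^3 = -c2*x^2 - c1*x - c0  (mod f).
M = np.array([[0.0, 0.0, -c0],
              [1.0, 0.0, -c1],
              [0.0, 1.0, -c2]])

print(np.sort(np.linalg.eigvals(M).real))   # -> [1. 2. 3.], the roots of f
```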

2.2 Resultants

Alternative to the action matrix method, we have the method of resultants for polynomial solving. Originally, resultants were used to determine whether a system of \(n+1\) polynomial equations in n unknowns has a common root. Let us have a system of polynomial equations as defined in (1) and (2), and assume that \(m=n+1\).

Definition 2

The resultant \(Res([\lbrace c_{i,\mathbf {\alpha }} \rbrace ])\) of \(\mathcal {F}\) is defined to be an irreducible polynomial in the coefficients \(\lbrace c_{i,\mathbf {\alpha }} \rbrace \) which vanishes only if \(\lbrace c_{i,\mathbf {\alpha }} \rbrace \) are such that \(\mathcal {F}=0\) has non-trivial roots.

A more formal treatment of the theory of resultants can be obtained from [15, Chapter 3].

2.2.1 Polynomial Solving Using Resultants

A step in a sparse resultant method is to expand the given system of polynomials \(\mathcal {F}\) to a set of linearly independent polynomials. This is usually done by adding some monomial multiples of the original polynomials, i.e., using the operation (5). The expanded set of polynomials, say \(\mathcal {F}^\prime \), can be expressed in a matrix form as

$$\begin{aligned} \texttt{C}([c_{i, \mathbf {\alpha }}]) \ \textbf{b}. \end{aligned}$$
(6)

Usually, \(\texttt{C}([c_{i, \mathbf {\alpha }}])\) in (6) is a tall matrix.

The resultant-based method here requires the coefficient matrix \(\texttt{C}([c_{i, \mathbf {\alpha }}])\) to be a square invertible matrix for randomly assigned values of the coefficients \(c_{i, \mathbf {\alpha }} \in \mathbb {C}_{\ne 0}\), i.e., \(\det \texttt{C}([c_{i, \mathbf {\alpha }}]) \ne 0\). A matrix with such properties is called the resultant matrix, and in this paper we will denote it as \(\texttt{M}([c_{i, \mathbf {\alpha }}])\) (Footnote 2). If \(\texttt{C}([c_{i, \mathbf {\alpha }}])\) is a square invertible matrix, we can rewrite (6) as

$$\begin{aligned} \texttt{M}([c_{i, \mathbf {\alpha }}]) \ \textbf{b}. \end{aligned}$$
(7)

Now, \(\mathcal {F}= 0\) implies \(\mathcal {F}^\prime = 0\), which leads to

$$\begin{aligned} \texttt{M}([c_{i, \mathbf {\alpha }}]) \ \textbf{b} = \textbf{0}. \end{aligned}$$
(8)

Thus, the requirement for \(\mathcal {F}=0\) to have common solutions is the following condition on the coefficients of \(\mathcal {F}\),

$$\begin{aligned} \det \texttt{M}([c_{i, \mathbf {\alpha }}]) = 0. \end{aligned}$$
(9)

We call this the resultant constraint. It is a polynomial with \(c_{i, \mathbf {\alpha }}\) as variables. From the definition of a resultant \(Res([c_{i, \mathbf {\alpha }}])\) [15, Theorem 2.3], we have that \(Res([c_{i, \mathbf {\alpha }}])\) is a polynomial with \(c_{i, \mathbf {\alpha }}\) as variables and that it vanishes iff the system of equations \(\mathcal {F}=0\) has common roots. Since \(\det \texttt{M}([c_{i, \mathbf {\alpha }}]) = 0\) is a necessary condition for the existence of roots of the system \(\mathcal {F}=0\), the determinant must vanish whenever the resultant vanishes, i.e.,

$$\begin{aligned} Res([c_{i, \mathbf {\alpha }}]) = 0 \implies \det \texttt{M}([c_{i, \mathbf {\alpha }}]) = 0. \end{aligned}$$
(10)

This implies that given a polynomial system \(\mathcal {F}\), the resultant constraint (9) is a non-trivial multiple of its resultant \(Res([c_{i, \mathbf {\alpha }}])\).
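A classical example of a resultant matrix is the Sylvester matrix of two univariate polynomials (the case \(n=1\), \(m=2\)); its determinant is the resultant and vanishes exactly when the two polynomials share a root. The sketch below (Python/numpy, with example polynomials chosen by us) illustrates the resultant constraint (9) in this simplest setting.

```python
import numpy as np

def sylvester_resultant(f, g):
    """Resultant of two univariate quadratics f = f2*x^2 + f1*x + f0 and
    g = g2*x^2 + g1*x + g0 (coefficients given highest degree first),
    computed as the determinant of the Sylvester matrix."""
    f2, f1, f0 = f
    g2, g1, g0 = g
    M = np.array([[f2, f1, f0, 0.0],
                  [0.0, f2, f1, f0],
                  [g2, g1, g0, 0.0],
                  [0.0, g2, g1, g0]])
    return np.linalg.det(M)

f = [1.0, -3.0,  2.0]    # (x - 1)(x - 2)
g = [1.0,  4.0, -5.0]    # (x - 1)(x + 5): shares the root x = 1 with f
h = [1.0,  4.0, -4.0]    # no common root with f
print(sylvester_resultant(f, g))   # ~ 0  -> a common root exists
print(sylvester_resultant(f, h))   # != 0 -> no common root
```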

While resultants are defined for a system of one more polynomial than the number of variables, we can employ them for solving a system of n polynomials in n variables. One way to do this is to hide one of the variables to the coefficient field (in other words, to consider it a constant); another way is to add an extra polynomial by introducing a new variable, and then hide this variable to the coefficient field. In both approaches, we end up with a system that has one more polynomial than the number of variables.

2.2.2 Hiding a Variable

By hiding one variable, say \(x_{n}\), to the coefficient field, we are left with n polynomials \(\mathcal {F}\) in \(n - 1\) variables. This gives us a way to use the concept of resultants and compute \(Res([c_{i, \mathbf {\alpha }}], x_{n})\) which now becomes a function of \(c_{i, \mathbf {\alpha }}\) and \(x_{n}\). In this case, (7) becomes

$$\begin{aligned} \texttt{M}([c_{i, \mathbf {\alpha }}],x_{n}) \textbf{b}, \end{aligned}$$
(11)

where the symbols have their usual meaning. For simplicity, we will denote \(\texttt{M}([c_{i, \mathbf {\alpha }}],x_{n})\) as \(\texttt{M}(x_{n})\) in the rest of this paper. Its determinant \(\det \texttt{M}(x_{n})\) is a multiple of the resultant \(Res(x_n)\). This is known as a hidden variable resultant and it is a polynomial in \(x_n\) whose roots are the \(x_{n}\)-coordinates of the solutions of the system of polynomial equations. For theoretical details and proofs we refer to [15, Chapter 7]. Such a hidden variable approach has been used in the past to solve various minimal problems [20, 23, 26, 28].

This approach leads to computing the roots of the polynomial equation \(\det \texttt{M}(x_{n}) = 0 \). However, computing the determinant of a polynomial matrix \(\texttt{M}(x_n)\) and then its roots may be slow and unstable. Therefore, the most common way to solve the original system of polynomial equations is to first transform the following matrix equation

$$\begin{aligned} \texttt{M}(x_{n}) \textbf{b} = \textbf{0}, \end{aligned}$$
(12)

to a polynomial eigenvalue problem (PEP) [14], which is then expressed as,

$$\begin{aligned} (\texttt{M}_{0} + \texttt{M}_{1} \ x_{n}+\cdots + \texttt{M}_{l} \ x^{l}_{n})\textbf{b} = \textbf{0}, \end{aligned}$$
(13)

where l is the degree of the matrix \(\texttt{M}(x_{n})\) in the hidden variable \(x_n\) and \(\texttt{M}_{0},\dots ,\texttt{M}_{l}\) are matrices that depend only on the coefficients \(c_{i, \mathbf {\alpha }}\) of the original system of polynomials. The PEP (13) can be easily converted to a generalized eigenvalue problem (GEP), written as,

$$\begin{aligned} \texttt{A} \textbf{y} = x_{n} \texttt{B} \textbf{y}, \end{aligned}$$
(14)

and solved using standard efficient eigenvalue algorithms [28]. Basically, the eigenvalues give us the solutions to \(x_{n}\), and the rest of the variables can be extracted from the corresponding eigenvectors \(\textbf{y}\). Such a transformation to a GEP relaxes the original problem of finding the solutions to the input polynomial system, since the eigenvectors in general do not satisfy the monomial dependencies induced by the monomial vector \(\textbf{b}\) as well as the monomial vector \(\textbf{y}\). Moreover, this relaxation may also introduce extra parasitic (zero) eigenvalues, leading to slower polynomial solvers.
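For concreteness, the sketch below (Python with numpy/scipy, using random coefficient matrices rather than a real minimal problem) performs the standard companion-style linearization of a degree-2 PEP (13) into a GEP (14) with \(\textbf{y} = \begin{bmatrix} \textbf{b}^\top&x_n \textbf{b}^\top \end{bmatrix}^\top \), and checks that every generalized eigenvalue indeed makes the matrix polynomial singular.

```python
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(0)
k = 4                                   # size of the monomial vector b
M0, M1, M2 = (rng.standard_normal((k, k)) for _ in range(3))

# Linearize (M0 + M1*x + M2*x^2) b = 0 using y = [b; x*b]:
#   [ 0    I ] y  =  x [ I   0  ] y
#   [-M0  -M1]         [ 0   M2 ]
I, Z = np.eye(k), np.zeros((k, k))
A = np.block([[Z, I], [-M0, -M1]])
B = np.block([[I, Z], [Z, M2]])

x_vals, Y = eig(A, B)                   # GEP:  A y = x B y
for x in x_vals[np.isfinite(x_vals)]:
    s = np.linalg.svd(M0 + M1 * x + M2 * x**2, compute_uv=False)
    print(complex(x), s[-1])            # the smallest singular value is ~ 0 for every eigenvalue
```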

2.2.3 Adding an Extra Polynomial

Alternatively, we can add a new polynomial

$$\begin{aligned} f_{m+1} = u_0 + u_1 x_1 + \dots + u_n x_n \end{aligned}$$
(15)

to \(\mathcal {F}\). We thus have the augmented polynomial system

$$\begin{aligned} \mathcal {F}_a = \mathcal {F}\cup \lbrace f_{m+1} \rbrace . \end{aligned}$$
(16)

We then compute the so-called u-resultant (see [56] and [15, Chapter 3]) by hiding \(u_0,\dots ,u_n\). In general, random values are assigned to \(u_1,\dots ,u_n\) (and \(u_0\) is a new unknown). Just like in the hidden variable method, this method hides the variable \(u_0\) to the coefficient field and generates the resultant matrix \(\texttt{M}(u_0)\). In this case, the extended system of polynomials (7) can be expressed as a matrix product as

$$\begin{aligned} \texttt{M}([c_{i, \mathbf {\alpha }}], u_0) \ \textbf{b}, \end{aligned}$$
(17)

where \(u_0\) is a new unknown variable and \(\textbf{b}\) is a vector of all monomials in this extended system. We will denote \(\texttt{M}([c_{i, \mathbf {\alpha }}], u_0)\) as \(\texttt{M}(u_0)\) in order to simplify the notation. Again, it holds that \(\text {det} \ \texttt{M}(u_0)\) is a multiple of the resultant \(Res(u_0)\). Thus, in order to compute the common zeros of \(\mathcal {F}=0\), we need to solve \(\text {det} \texttt{M}(u_0)=0\), similar to the case of the hidden variable resultant. Instead of computing the determinant of a polynomial matrix, here we also solve it as a GEP, as described in Sect. 2.2.2. However, as for the hidden variable method, such an approach introduces spurious solutions, arising from the linearization of the original system in (17), i.e., from not considering the monomial dependencies in \(\textbf{b}\), as well as from transforming the PEP to a GEP and not considering the monomial dependencies in \(\textbf{y}\) in (14). Some of these spurious solutions (eigenvalues) can be removed by using the Schur complement, described in the section below, or by applying a method for removing zero eigenvalues, similar to [4, 26].

2.2.4 Schur Complement

One way to remove the spurious solutions introduced by the linearization of a hidden variable resultant matrix \(\texttt{M}(u_0)\) is to use the Schur complement of one of its submatrices. Here, we briefly review this method. Let us first consider some partition of the set of monomials B as

$$\begin{aligned} B = B_1 \cup B_2. \end{aligned}$$
(18)

Note that \(\textbf{b} = \text {vec}(B) = \begin{bmatrix} \textbf{b}_1&\textbf{b}_2 \end{bmatrix}^\top \), where \(\textbf{b}_1=\text {vec}(B_1)\) and \(\textbf{b}_2=\text {vec}(B_2)\). The vectorization operation is defined in Eq. (3). This imposes a column partition on \(\texttt{M}(u_0)\) in (17). Moreover, we can order the rows of \(\texttt{M}(u_0)\) such that its upper block is independent of \(u_0\). Together, we obtain the following block partition of \(\texttt{M}(u_0)\):

$$\begin{aligned} \texttt{M}(u_0)= \begin{bmatrix} \texttt{M}_{11} &{} \texttt{M}_{12} \\ \texttt{M}_{21}(u_0) &{} \texttt{M}_{22}(u_0) \end{bmatrix}. \end{aligned}$$
(19)

Here, the upper block \(\begin{bmatrix} \texttt{M}_{11}&\texttt{M}_{12} \end{bmatrix}\) is independent of \(u_0\). Thus we can write (17) as

$$\begin{aligned} \begin{bmatrix} \texttt{M}_{11} &{} \texttt{M}_{12} \\ \texttt{M}_{21}(u_0) &{} \texttt{M}_{22}(u_0) \end{bmatrix} \ \begin{bmatrix} \textbf{b}_1 \\ \textbf{b}_2 \end{bmatrix}. \end{aligned}$$
(20)

The requirement for existence of solutions of \(\mathcal {F}= 0\) is that the vector in (20) should vanish. We thus obtain the following two vector equations

$$\begin{aligned}{} & {} \texttt{M}_{11} \textbf{b}_{1} + \texttt{M}_{12} \textbf{b}_{2} = \textbf{0} \end{aligned}$$
(21)
$$\begin{aligned}{} & {} \texttt{M}_{21}(u_0) \textbf{b}_{1} + \texttt{M}_{22}(u_0) \textbf{b}_{2} = \textbf{0}. \end{aligned}$$
(22)

If we can partition \(\texttt{M}(u_0)\) such that \(\texttt{M}_{12}\) is a square invertible matrix, we can eliminate \(\textbf{b}_2\) from these two equations to obtain

$$\begin{aligned} (\underbrace{\texttt{M}_{21}(u_0) - \texttt{M}_{22}(u_0) \texttt{M}^{-1}_{12} \texttt{M}_{11}}_{\texttt{X}(u_0)}) \textbf{b}_{1} = \textbf{0}. \end{aligned}$$
(23)

The matrix \(\texttt{X}(u_0)\) is the Schur complement of the block \(\texttt{M}_{12}\) of \(\texttt{M}(u_0)\), which has the following property:

$$\begin{aligned} \text {det} (\texttt{M}(u_0)) = \text {det}(\texttt{M}_{12}) \ \text {det}(\texttt{X}), \end{aligned}$$
(24)

where \(\text {det}(\texttt{M}_{12}) \ne 0\) by our assumption. Therefore, for a generic value of \(u_0\), \(\text {det}(\texttt{M}(u_0)) \ne 0 \Leftrightarrow \text {det}(\texttt{X}(u_0)) \ne 0\), which means that \(\texttt{X}(u_0)\) is a square invertible matrix and its determinant is also a multiple of the resultant. However, \(\texttt{X}(u_0)\) corresponds to a smaller eigenvalue problem than the one obtained from the sparse resultant matrix \(\texttt{M}(u_0)\).

Both the resultant-based methods described in Sects. 2.2.2 and 2.2.3 were proposed for square generic systems [16], via a mixed subdivision of polytopes. However, the polynomial systems studied here are usually sparse, overdetermined, and consist of polynomials with non-generic coefficients (see Definition 1). The sparsity of the systems can be exploited to obtain more compact resultants using specialized algorithms. Such resultants are commonly referred to as sparse resultants.

Fig. 1 An example of the Newton polytopes of two polynomials as well as their Minkowski sum: (a) \(P_1 = \text {NP}(f_1)\), (b) \(P_2 = \text {NP}(f_2)\), and (c) \(Q = P_1 + P_2\)

2.2.5 Polyhedral Geometry

Sparse resultants are studied via the theory of polytopes [15, Chapter 7]. Therefore, we define the important terms and notations related to polytopes, which we will later use in the text.

The Newton polytope \(\text {NP}(f)\) of a polynomial f is defined as the convex hull of the exponent vectors of all the monomials occurring in the polynomial (this set of exponent vectors is known as the support of the polynomial). Therefore, we have \(\text {NP}(f_{i}) = \text {Conv}(A_{i})\), where \(A_{i}\) is the set of all integer vectors that are exponents of monomials with nonzero coefficients in \(f_{i}\). The Minkowski sum of two convex polytopes \(P_1, P_2\) is defined as \(P_1+P_2 = \lbrace p_1 + p_2 \mid p_1 \in P_1, p_2 \in P_2 \rbrace \). We demonstrate the concept of a Newton polytope using a simple example.

Example 1

Let us consider a system of two polynomials \(\mathcal {F}= \lbrace f_1(\textbf{x}), f_2(\textbf{x}) \rbrace \), in two variables \(X = \lbrace x_1, x_2 \rbrace \)

$$\begin{aligned} f_1 = {c_{1,1}} {x_1}^{3} {x_{2}}^{3} + {c_{1,2}} {x_{1}}^{2} {x_{2}}^{3} + {c_{1,3}} {x_{1}}^{3} {x_{2}}^{2} + {c_{1,4}} {x_{1}}^{2} {x_{2}}^{2} + {c_{1,5}} {x_{2}}^{3} + {c_{1,6}} {x_{1}}^{2} {x_{2}} + {c_{1,7}} {x_{2}}^{2} + {c_{1,8}} {x_{1}} {x_{2}} + {c_{1,9}} {x_{1}}^{2} + {c_{1,10}} {x_{2}} \end{aligned}$$
(25)
$$\begin{aligned} f_2 = {c_{2,1}} {x_{1}}^{2} + {c_{2,2}} {x_{2}} + {c_{2,3}} {x_{1}} + {c_{2,4}}. \end{aligned}$$
(26)

Here, \(\lbrace c_{1,1},\dots , c_{1,10},c_{2,1},\dots ,c_{2,4} \rbrace \subset \mathbb {C} \) is the set of coefficients of \(\mathcal {F}\). The supports of \(\mathcal {F}\) are

$$\begin{aligned} A_{1} = \lbrace [3,3],[2,3],[3,2],[2,2],[0,3],[2,1],[0,2],[1,1],[2,0],[0,1] \rbrace \subset \mathbb {Z}^2 \end{aligned}$$
(27)
$$\begin{aligned} A_{2} = \lbrace [2,0],[0,1],[1,0],[0,0] \rbrace \subset \mathbb {Z}^2. \end{aligned}$$
(28)

The Newton polytopes \(P_1 = \text {NP}(f_1)\), \(P_2 = \text {NP}(f_2)\) as well as the Minkowski sum \(Q = P_1 + P_2\) are depicted in Fig. 1.

2.2.6 Sparse Resultants

Consider the SOTA methods [11, 16, 17] for computing the sparse resultant of a well-determined generic system \(\mathcal {F}\). The main idea in all such methods is to compute the Minkowski sum of the Newton polytopes of all the polynomials \(f_i \in \mathcal {F}\) and divide its interior into cells. Each cell determines a multiple of one of the input polynomials \(f_i \in \mathcal {F}\) with a monomial. All such monomial multiples collectively lead to an extended polynomial system \(\mathcal {F}^\prime \). The extended polynomial system, via linearization into a matrix product, then leads to a sparse resultant matrix \(\texttt{M}(x_n)\), if using the hidden-variable technique in Sect. 2.2.2, or \(\texttt{M}(u_0)\), if using the u-resultant of the general form in Sect. 2.2.3. As a result, the generated solvers are usually quite large and not very efficient.

However, if \(\mathcal {F}\) has non-generic coefficients (see Definition 1), these methods may fail, as the algebraic constraints on the coefficients lead to algebraic constraints on the elements of the sparse resultant matrices \(\texttt{M}(x_n)\) or \(\texttt{M}(u_0)\), and hence they may not satisfy the necessary rank conditions. For such systems, [21] recently proposed an iterative approach based on the hidden variable resultant to test and extract \(\texttt{M}(x_n)\) in (7). Thereafter, it transforms (7) to a GEP (14) and solves for eigenvalues and eigenvectors to compute the roots. Our proposed approach extends this iterative scheme for generating a sparse resultant matrix \(\texttt{M}(u_0)\), specifically for the extra polynomial approach in Sect. 2.2.3.

3 Proposed Extra Polynomial Resultant Method

In this paper, we propose an iterative approach based on [21] and apply it to a modified version of the u-resultant method described in Sect. 2.2.3. Specifically, we propose the following modifications to the u-resultant method.

  1.

    Instead of the general form (15) of the extra polynomial in the u-resultant-based method, in this paper, we propose to use the following special form of the extra polynomial

    $$\begin{aligned} f_{m+1} = x_{k} - u_0, \end{aligned}$$
    (29)

    where \(u_0\) is a new variable and \(x_k \in X\) is one of the input variables. In general, we can select any \(x_k \in X\). However, since in practice, selecting different \(x_k\)’s leads to solvers of different sizes and with different numerical stability, we test all \(x_k \in X\) when generating the final solver.

  2.

    For non-generic and overdetermined polynomial systems \(\mathcal {F}\), we avoid computing the Minkowski sum of all the Newton polytopes \(NP(f), \forall f \in \mathcal {F}_a\), as proposed in the methods in [11, 16, 17]. Here, \(\mathcal {F}_a\) denotes the augmented polynomial system (see Eq. (16)). Instead, we iterate through each subset, \(\mathcal {F}_{\text {sub}} \subset \mathcal {F}_a\), and compute the Minkowski sum of \(NP(f), \forall f \in \mathcal {F}_{\text {sub}}\). Instead of dividing the Minkowski sum into cells, we simply use its lattice interior to determine the monomials B in the extended polynomial system \(\mathcal {F}^\prime \), i.e. \(B=\text {mon}(\mathcal {F}^\prime )\).

  3.

    We exploit the form of the extra polynomial (29) and propose a block partition of \(\texttt{M}(u_0)\) (17), which facilitates its decomposition using Schur complement directly into a regular eigenvalue problem. This regular eigenvalue problem, compared to GEP that arises for general u-resultant polynomial (15), leads to fewer spurious eigenvalues and hence a faster solver.

Note that our method of iteratively testing various polynomial subsets has been quite effective in generating efficient solvers, as demonstrated on many minimal problems (see Sect. 6). The generated solvers are comparable in efficiency and speed to those generated by the SOTA action matrix-based methods [32, 35, 38].

3.1 Method Outline

In the following, we first go through the important steps performed by our method.

  1.

    Let us consider a system of m polynomial equations, \(\mathcal {F}=0\) (1), in n variables, \(X = \lbrace x_1,\dots ,x_n \rbrace \). For all variables \(x_k \in X\), we perform the steps \(2-5\) in the offline phase.

  2.

    [Offline] We fix the form of the extra polynomial to \(f_{m+1} = x_k - u_0\) (29) and augment \(\mathcal {F}\) with \(f_{m+1}\) to obtain the augmented system (see Eq. (16)). We hide the new variable \(u_0\) to the coefficient field, which means that \(\mathcal {F}_a \subset \mathbb {C}[X \cup \lbrace u_0 \rbrace ]\).

  3.

    [Offline] We execute steps 3(a)-3(c), for each subset of polynomials \(\mathcal {F}_{\text {sub}} \subset \mathcal {F}_a\) and for every variable \(x_k \in X\).

    (a)

      From the set \(\mathcal {F}_{\text {sub}}\), we attempt to construct a favourable monomial set B, using a polytope-based method, described in Sect. 3.2.

    (b)

      We extend the polynomial system, \(\mathcal {F}_a\), using the computed monomial set B, represented as

      $$\begin{aligned} \mathcal {F}_a \overset{B}{\rightarrow }\ (\mathcal {F}_a^\prime , T), \end{aligned}$$
      (30)

      where \(T=\lbrace T_1,\dots ,T_{m+1} \rbrace \) (5).

    (c)

      The set of equations \(\mathcal {F}_a^\prime = 0\) can be expressed in the form of a matrix equation \(\texttt{C}(u_0) \textbf{b} = \textbf{0}\), where \(\textbf{b} = \text {vec}(B) \) and the vectorization operation is defined in Eq. (3).

    The output of this step, described in detail in Sect. 3.2, is all favourable monomial sets B and the corresponding coefficient matrices \(\texttt{C}(u_0)\).

  4.

    [Offline] For each favourable monomial set B and the corresponding coefficient matrix \(\texttt{C}(u_0)\) we perform the following:

    (a)

      We partition \(B=B_1 \cup B_2\) in two different ways,

      $$\begin{aligned} B_{1} = B \cap T_{m+1} \text { or} \end{aligned}$$
      (31)
      $$\begin{aligned} B_{1} = \lbrace \textbf{x}^{\mathbf {\alpha }} \in B \mid \dfrac{\textbf{x}^{\mathbf {\alpha }}}{x_k} \in T_{m+1} \rbrace . \end{aligned}$$
      (32)

      Note that \(B_2 = B \setminus B_1\).

    (b)

      Based on the two set-partitions of B, we attempt to partition the matrix \(\texttt{C}(u_0)\) in the following two ways,

      $$\begin{aligned} \texttt{C}(u_0) \textbf{b} = \begin{bmatrix} \texttt{A}_{11} &{} \texttt{A}_{12} \\ \texttt{A}_{21}-u_0 \texttt{I} &{} \texttt{A}_{22} \end{bmatrix} \begin{bmatrix} \textbf{b}_1 \\ \textbf{b}_2 \end{bmatrix} = \textbf{0}, \end{aligned}$$
      (33)
      $$\begin{aligned} \text {or} \quad \texttt{C}(u_0) \textbf{b} = \begin{bmatrix} \texttt{A}_{11} &{} \texttt{A}_{12} \\ \texttt{I} + u_0 \texttt{B}_{21} &{} u_0 \texttt{B}_{22} \end{bmatrix} \begin{bmatrix} \textbf{b}_1 \\ \textbf{b}_2 \end{bmatrix} = \textbf{0}, \end{aligned}$$
      (34)

      such that the matrix \(\texttt{A}_{12}\) has full column rank.

    The output of this step is the coefficient matrix \(\texttt{C}(u_0)\) for which a partitioning as in (33) or (34), with a full column rank matrix \(\texttt{A}_{12}\), is possible, and which corresponds to the smallest set \(B_1\). If we have more than one such choice of \(\texttt{C}(u_0)\), we select the smallest matrix \(\texttt{C}(u_0)\). This step is described in more detail in Sect. 3.3.

  5.

    [Offline] In this step, we aim to reduce the size of the matrix \(\texttt{C}(u_0)\), selected in the previous step.

    (a)

      We first try to remove a combination of rows and columns from \(\texttt{C}(u_0)\) and the corresponding monomials from the favourable monomial set B, such that the resulting monomial set is still a favourable monomial set (Sect. 3.2) and that the coefficient matrix \(\texttt{C}(u_0)\) can still be partitioned as in (33) or in (34).

    (b)

      We next remove the extra rows from \(\texttt{C}(u_0)\) to obtain a sparse resultant matrix \(\texttt{M}(u_0)\) (7), while still respecting its block partition, as in (33) or (34).

    This step is described in Sect. 3.4.

  6.

    [Online] In the final online solver, we fill the precomputed sparse resultant matrix \(\texttt{M}(u_0)\) with the coefficients coming from the input data/measurements. Then we compute the Schur complement of a block of \(\texttt{M}(u_0)\), as described in Sect. 2.2.4, which is then formulated as an eigenvalue problem of the matrix \(\texttt{X}\) (50) (or (52)). Finally, the eigendecomposition of the matrix \(\texttt{X}\) gives us the solutions to the input system of polynomial equations \(\mathcal {F}\), i.e. to the given instance of the minimal problem.
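The online stage itself is only a few lines of linear algebra. The sketch below (Python/numpy; the function and argument names are ours, and for simplicity it assumes the partition (33) and that \(B_1\) contains the constant monomial and each variable, which need not hold for every minimal problem) shows how the solutions are read off once the blocks of \(\texttt{M}(u_0)\) have been filled with the coefficients of a given instance.

```python
import numpy as np

def online_stage(A11_hat, A12_hat, A21, A22, B1, variables):
    """Sketch of the online stage (step 6 above) for the partition (33).
    The blocks come from the precomputed sparse resultant matrix M(u0), already
    filled with the coefficients of the current instance; B1 is the list of
    exponent tuples indexing b1 and `variables` are the unit exponent tuples of
    x_1, ..., x_n (all assumed, for this sketch, to be contained in B1)."""
    n = len(B1[0])
    X = A21 - A22 @ np.linalg.solve(A12_hat, A11_hat)   # Schur complement, Eq. (50)
    eigvals, V = np.linalg.eig(X)                       # eigenvalues are the values of x_k
    solutions = []
    for j in range(V.shape[1]):
        b1 = V[:, j] / V[B1.index((0,) * n), j]         # scale so that the entry of monomial 1 is 1
        solutions.append([b1[B1.index(v)] for v in variables])
    return eigvals, solutions
```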

3.2 Computing a Favourable Monomial Set

Let the extra polynomial \(f_{m+1}\) have the form \(f_{m+1} = x_k - u_0\) (29) for \(x_k \in X\), and let us assume a subset \(\mathcal {F}_{\text {sub}} \subset \mathcal {F}_a\) of the augmented system \(\mathcal {F}_a = \mathcal {F}\cup \lbrace f_{m+1} \rbrace \) (16).

In step 3(a) of our algorithm (Sect. 3.1), we attempt to generate so-called favourable monomial sets. Our method for constructing these sets is based on the polytope-based algorithm proposed in [21], where such sets were generated for the original system \(\mathcal {F}\) and its subsets. Here we describe our method that constructs such favourable sets for subsets \(\mathcal {F}_{\text {sub}}\) of the augmented system \(\mathcal {F}_a\) (16). In this subsection, we refer to the basic terminology and notation described in Sect. 2.2.5.

In our method, we begin by computing the support \(A_{j}=\text {supp}(f_{j})\) and the Newton polytope \(\text {NP}(f_{j}) = \text {conv}(A_{j})\) for each polynomial \(f_{j} \in \mathcal {F}_a\). We also compute the unit simplex \(\text {NP}_{0} \subset \mathbb {Z}^{n}\) and the Minkowski sum, \(Q = \text {NP}_{0} + \Sigma _{f \in \mathcal {F}_{\text {sub}}} \text {NP}(f)\). Let us construct a small displacement vector \(\mathbf {\delta } \in \mathbb {R}^n\), such that each of its elements is assigned one of the three values, \(\lbrace -10^{-1}, 0, 10^{-1} \rbrace \). In other words,

$$\begin{aligned} \mathbf {\delta } \in \lbrace \begin{bmatrix} \delta _{1}&\dots&\delta _{n} \end{bmatrix} \mid \delta _{i} \in \lbrace -10^{-1}, 0, 10^{-1} \rbrace ; i = 1,\dots ,n \rbrace . \end{aligned}$$
(35)

Then using this value of \(\mathbf {\delta }\), we compute the set of integer points inside \(Q+\mathbf {\delta }\), from which we compute a set of monomials, \(B = \lbrace \textbf{x}^{\mathbf {\alpha }} \mid \mathbf {\alpha } \in \mathbb {Z}^n \cap (Q+\mathbf {\delta }) \rbrace \), i.e. a potential candidate for a favourable monomial set. We demonstrate this step on a system of two polynomials in Example 2.

Example 2

Let us continue with Example 1, where we computed the Newton polytopes \(P_1 = \text {NP}(f_1)\) and \(P_2 = \text {NP}(f_2)\) as well as their Minkowski sum \(Q = P_1 + P_2\). Let us assume the displacement vector to be \(\mathbf {\delta } = [-0.1,-0.1]\). In Fig. 2, we depict the integer points in the interior of \(Q+\mathbf {\delta }\), i.e. the points in the set \(\mathbb {Z}^2 \cap (Q+\mathbf {\delta })\). This set of points gives us a favourable set of monomials B

$$\begin{aligned} B = \lbrace x_2, x_2^{2}, x_2^{3}, x_1^{2}, x_1^{3}, x_1 x_2, x_1 x_2^{2}, x_1 x_2^{3}, x_1^{2} x_2, x_1^{2} x_2^{2}, x_1^{2} x_2^{3}, x_1^{3} x_2, x_1^{3} x_2^{2}, x_1^{3} x_2^{3}, x_1^{4} x_2, x_1^{4} x_2^{2}, x_1^{4} x_2^{3} \rbrace . \end{aligned}$$
(36)
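The construction of Example 2 can be reproduced with a few lines of code. The sketch below (Python with numpy/scipy, an implementation choice of ours) computes \(Q = P_1 + P_2\) as the convex hull of all pairwise sums of the supports (27)-(28), shifts it by \(\mathbf {\delta } = [-0.1,-0.1]\), and collects the interior integer points, recovering the 17 monomials of (36).

```python
import itertools
import numpy as np
from scipy.spatial import ConvexHull

# Supports A1, A2 of f1 and f2 from Eqs. (27)-(28)
A1 = np.array([[3,3],[2,3],[3,2],[2,2],[0,3],[2,1],[0,2],[1,1],[2,0],[0,1]])
A2 = np.array([[2,0],[0,1],[1,0],[0,0]])

# Minkowski sum Q = P1 + P2 = convex hull of all pairwise sums of the support points
sums = np.array([p + q for p, q in itertools.product(A1, A2)])
Q = ConvexHull(sums)
F, g = Q.equations[:, :-1], Q.equations[:, -1]       # facets of Q:  F x + g <= 0

# Integer points inside the shifted polytope Q + delta
delta = np.array([-0.1, -0.1])
lo, hi = sums.min(axis=0), sums.max(axis=0)
grid = np.array([[i, j] for i in range(lo[0], hi[0] + 1)
                        for j in range(lo[1], hi[1] + 1)])
inside = np.all(F @ (grid - delta).T + g[:, None] <= 1e-9, axis=0)
print(grid[inside])    # 17 exponent vectors: the favourable monomial set B of Eq. (36)
```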

Selecting the displacement vector \(\delta \)

In Eq. (35), our choice of the displacement \(10^{-1}\) in each direction is not completely random, although to the best of our knowledge there is no theory which gives us a \(\delta \) leading to the smallest solver. All we know is that the displacement in each direction should be in the range \((-1.0, 1.0)\), and in this paper we have chosen a reasonably small value, i.e. \(10^{-1}\). A different choice will lead to a different favourable monomial set. To this end, we have observed that as the number of vertices of the Minkowski sum Q increases, the number of different solvers we can generate by choosing different displacements also increases.

Fig. 2 The integer points (shown in blue) in the interior of the Minkowski sum of the two Newton polytopes, \(Q = P_{1} + P_{2}\), after shifting it by \(\mathbf {\delta } = [-0.1, -0.1]\)

In the next step (see step 3(b) in Sect. 3.1), we extend the input polynomial system, \(\mathcal {F}_a\), using the set of monomials B as

$$\begin{aligned} \mathcal {F}_a \overset{B}{\rightarrow }\ (\mathcal {F}_a^{\prime }, T), \end{aligned}$$
(37)

where \(T=\lbrace T_1,\dots ,T_{m+1} \rbrace \) (see (5)). Each \(T_i\) denotes the set of monomials to be multiplied with the polynomial \(f_i \in \mathcal {F}_a\). The extended set of polynomials \(\mathcal {F}_a^{\prime }\) can be written in a matrix form as

$$\begin{aligned} \texttt{C}([c_{i,\alpha }],u_0) \ \textbf{b}, \end{aligned}$$
(38)

where \(\textbf{b}\) is a vector form of B w.r.t. some ordering of monomials. In the rest of the paper, we will denote the coefficient matrix \(\texttt{C}([c_{i,\alpha }], u_0)\) as \(\texttt{C}(u_0)\) for the sake of simplicity.

Our approach then evaluates the following three conditions:

$$\begin{aligned}{} & {} \Sigma _{j=1}^{m+1}|T_{j}| \ge |B|, \end{aligned}$$
(39)
$$\begin{aligned}{} & {} \min \limits _{j}|T_{j}| > 0, \end{aligned}$$
(40)
$$\begin{aligned}{} & {} \text {column rank of } \texttt{C}(u_0) = |B|, \end{aligned}$$
(41)

for a random value of \(u_0 \in \mathbb {C}\). The first condition (39) ensures that we have at least as many rows as the columns in the matrix \(\texttt{C}(u_0)\). The second condition (40) ensures that there is at least one row corresponding to every polynomial \(f_i \in \mathcal {F}_a\). The third condition (41) ensures that \(\texttt{C}(u_0)\) has full column rank and that we can extract a sparse resultant matrix from it. If these conditions are satisfied, we consider the set of monomials B to be a favourable set of monomials.
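The three tests are straightforward to implement; a possible sketch (Python/numpy, with hypothetical argument names and a numerical rank test standing in for a symbolic one) is given below.

```python
import numpy as np

def is_favourable(T, C_u0, B_size, tol=1e-9):
    """Check conditions (39)-(41) for a candidate monomial set B.  T is the list
    of monomial-multiple sets T_1, ..., T_{m+1}; C_u0 is the coefficient matrix
    C(u0) already evaluated at a random u0 (a dense numpy array); B_size = |B|."""
    if sum(len(Tj) for Tj in T) < B_size:                    # condition (39)
        return False
    if min(len(Tj) for Tj in T) == 0:                        # condition (40)
        return False
    return np.linalg.matrix_rank(C_u0, tol=tol) == B_size    # condition (41)
```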

Note that we repeat this procedure for all variables \(x_k \in X\), for each subset \(\mathcal {F}_{\text {sub}} \subset \mathcal {F}_a\), and for each value of the displacement vector \(\mathbf {\delta }\) (35). The output of this step are all generated favourable monomial sets B and the corresponding coefficient matrices \(\texttt{C}(u_0)\).

3.3 Block Partition of \(\texttt{C}(u_0)\)

In step 4 of our method (Sect. 3.1), we iterate through all favourable monomial sets B and the corresponding coefficient matrices \(\texttt{C}(u_0)\), which were the output of the previous step.

For each favourable monomial set B, we know the corresponding sets of monomial multiples \(T_i\) for each polynomial \(f_i \in \mathcal {F}_a\) and the set \(T = \lbrace T_{1},\dots ,T_{m},T_{m+1} \rbrace \). We have also computed the corresponding coefficient matrix \(\texttt{C}(u_0)\). Assume the extra polynomial (29) to be fixed as \(f_{m+1} = x_k - u_0\), for \(x_k \in X\), while computing this favourable monomial set.

Since B was constructed such that it satisfies (41), \(\texttt{C}(u_0)\) has full column rank. Let us denote \(\varepsilon = \mid B \mid \) and \(p = \Sigma _{j=1}^{m+1}|T_{j}|\). As \(\texttt{C}(u_0)\) satisfies the condition (39), we have \(p \ge \varepsilon \). Therefore, we can remove the extra \(p-\varepsilon \) rows from \(\texttt{C}(u_0)\), leading to a square invertible matrix. The algorithm for this row removal is described in Appendix B.3. The square matrix so obtained is a sparse resultant matrix \(\texttt{M}(u_0)\), and it satisfies the resultant matrix constraint (8) if the set of equations \(\mathcal {F}_a = 0\) has solutions.

Instead of directly solving the resultant constraint (9) or converting (8) to a GEP, we exploit the structure of the extra polynomial \(f_{m+1}\) (29) and propose a special ordering of the rows and columns of \(\texttt{M}(u_0)\). This facilitates a block partition of \(\texttt{M}(u_0)\), such that the Schur complement of one of its block submatrices then helps us to reduce its matrix constraint (8) to a regular eigenvalue problem (Footnote 3).

Rather than block partitioning the sparse resultant matrix \(\texttt{M}(u_0)\), we first fix a block partition of the coefficient matrix \(\texttt{C}(u_0)\) and then remove its rows, while respecting this partition, to obtain the sparse resultant matrix \(\texttt{M}(u_0)\). Note that we obtain two different block partitions on \(\textbf{b}\) and \(\texttt{C}(u_0)\), in (38), based on our selected set-partition of the favourable monomial set \(B = B_{1} \cup B_{2}\):

$$\begin{aligned} B_{1} = B \cap T_{m+1} \text { or} \end{aligned}$$
(42)
$$\begin{aligned} B_{1} = \left\{ \textbf{x}^{\mathbf {\alpha }} \in B \mid \dfrac{\textbf{x}^{\mathbf {\alpha }}}{x_k} \in T_{m+1} \right\} \end{aligned}$$
(43)

Note that \(B_{2} = B \setminus B_{1}\). We consider both these partitions in Proposition 1. Thereafter, we will describe our method of removing the rows from \(\texttt{C}(u_0)\) such that we can compute a Schur complement of a block of \(\texttt{M}(u_0)\).

Proposition 1

Consider a block partition of \(\texttt{C}(u_0)\) and \(\textbf{b}\) in (38) as

$$\begin{aligned} \texttt{C}(u_0)\ \textbf{b} = \begin{bmatrix} \texttt{C}_{11} &{} \texttt{C}_{12} \\ \texttt{C}_{21} &{} \texttt{C}_{22} \end{bmatrix} \begin{bmatrix} \textbf{b}_{1} \\ \textbf{b}_{2} \end{bmatrix}. \end{aligned}$$
(44)

We can achieve this in two different ways, based on our choice of the set-partition of B in (42) or (43). In either case, if \(\texttt{C}_{12}\) has full column rank, then the resultant matrix constraint (8) can be converted to an eigenvalue problem, of the form \(\texttt{X} \ \textbf{b}_{1} = u_0\textbf{b}_{1}\) if we set-partition B as in (42), and of the form \(\texttt{X} \ \textbf{b}_{1} = -\dfrac{1}{u_0} \textbf{b}_{1}\) if we set-partition B as in (43). The matrix \(\texttt{X}\) is square and its entries are functions of the coefficients of \(\mathcal {F}\).

Proof

Let us first consider the set-partition of B as in (42). Then, the special form of the extra polynomial (29) implies that

$$\begin{aligned} B_{1} \! =\! \lbrace \textbf{x}^{\mathbf {\alpha }} \in B | \ x_k \textbf{x}^{\mathbf {\alpha }} \in B \rbrace . \end{aligned}$$
(45)

We next order the monomials in B such that \(\textbf{b}\) can be partitioned as \(\textbf{b}\! = \begin{bmatrix} \text {vec}(B_{1})\; \text {vec}(B_{2}) \end{bmatrix}^{T}\! =\! \begin{bmatrix} \textbf{b}_1\; \textbf{b}_2 \end{bmatrix}^{T}\). This induces a column partition of \(\texttt{C}(u_0)\). Moreover, we can row partition \(\texttt{C}(u_0)\) by indexing the rows in its lower block by monomial multiples of \(f_{m+1}\). All such multiples are linear in \(u_0\), i.e., these multiples have the form \(\textbf{x}^{\mathbf {\alpha _j}} (x_{k} - u_0), \forall \textbf{x}^{\mathbf {\alpha _j}}\in T_{m+1}\). The upper block is row-indexed by monomial multiples of \(f_{1},\dots ,f_{m}\). Such a row and column partition of \(\texttt{C}(u_0)\) gives us a block partition as in (44). As \(\begin{bmatrix} \texttt{C}_{11}\!&\! \texttt{C}_{12} \end{bmatrix}\) contains polynomials independent of \(u_0\) and \(\begin{bmatrix} \texttt{C}_{21}\!&\! \texttt{C}_{22}\! \end{bmatrix}\) contains polynomials of the form \(\textbf{x}^{\mathbf {\alpha _j}} ( x_{k} - u_0)\), we obtain

$$\begin{aligned} \texttt{C}_{11} = \texttt{A}_{11}, \quad \texttt{C}_{12} = \texttt{A}_{12}, \quad \texttt{C}_{21} = \texttt{A}_{21} + u_0 \texttt{B}_{21}, \quad \texttt{C}_{22} = \texttt{A}_{22} + u_0 \texttt{B}_{22}, \end{aligned}$$
(46)

where \(\texttt{A}_{11}, \texttt{A}_{12}, \texttt{A}_{21}\), \(\texttt{A}_{22}\), \(\texttt{B}_{21}\) and \(\texttt{B}_{22}\) are matrices dependent only on the coefficients of input polynomials in \(\mathcal {F}_a\) (16). Based on our assumption in the statement of this proposition, \(\texttt{C}_{12} \) and hence \(\texttt{A}_{12}\), have full column rank. Substituting (46) in (44) gives

$$\begin{aligned} \texttt{C}(u_0) = \underbrace{\begin{bmatrix} \texttt{A}_{11} &{} \texttt{A}_{12} \\ \texttt{A}_{21} &{} \texttt{A}_{22} \end{bmatrix}}_{\texttt{C}_0} + u_0 \underbrace{\begin{bmatrix} \texttt{0} &{} \texttt{0} \\ \mathtt {B_{21}} &{} \mathtt {B_{22}} \end{bmatrix}}_{\texttt{C}_1}. \end{aligned}$$
(47)

We can order monomials so that \(\textbf{T}_{m+1} = \textbf{b}_1\). Now, the block partition of \(\texttt{C}(u_0)\) implies that \(\texttt{C}_{21}\) is column-indexed by \(\textbf{b}_1\) and row-indexed by \(\textbf{T}_{m+1}\). As \(\begin{bmatrix} \texttt{C}_{21}\!&\! \texttt{C}_{22} \end{bmatrix}\) has rows of form \(\textbf{x}^{\mathbf {\alpha _j}}(x_{k}\! -\! u_0)\), \(\textbf{x}^{\mathbf {\alpha _j}}\! \in \! T_{m+1}\! \implies \! \textbf{x}^{\mathbf {\alpha _j}}\! \in \! B_{1}\). This gives us \(\texttt{B}_{21} = \mathtt {-I}\), where \(\texttt{I}\) is an identity matrix of size \(\mid B_{1} \mid \times \mid B_{1} \mid \) and \(\texttt{B}_{22}\) is a zero matrix of size \(\mid B_{1}\mid \times \mid B_{2} \mid \). This also means that \(\texttt{A}_{21}\) is a square matrix of the same size as \(\texttt{B}_{21}\). Thus, we have

$$\begin{aligned} \texttt{C}(u_0) = \begin{bmatrix} \texttt{A}_{11} &{} \texttt{A}_{12} \\ \texttt{A}_{21} &{} \texttt{A}_{22} \\ \end{bmatrix} + u_0 \begin{bmatrix} \texttt{0} &{} \texttt{0} \\ - \texttt{I} &{} \texttt{0} \end{bmatrix}, \end{aligned}$$
(48)

where \(\texttt{C}(u_0)\) is a \(p \times \varepsilon \) matrix. If \(\texttt{C}(u_0)\) is a tall matrix, so is \(\texttt{A}_{12}\). Therefore, we can eliminate extra rows from the upper block \(\begin{bmatrix} \texttt{A}_{11}&\texttt{A}_{12} \end{bmatrix}\) so that we obtain a square invertible matrix \(\hat{\texttt{A}}_{12}\) while preserving the above-mentioned structure. Such a row-removal will also give us the sparse resultant matrix \(\texttt{M}(u_0)\), which satisfies the resultant matrix constraint (8). We will describe our method for removing the extra rows in Sect. 3.4. Let us now use the Schur complement technique of Sect. 2.2.4 to reduce the size of the eigenvalue problem. From (48), we have

$$\begin{aligned} \underbrace{\begin{bmatrix} \hat{\texttt{A}}_{11} &{} \hat{\texttt{A}}_{12} \\ \texttt{A}_{21} &{} \texttt{A}_{22} \end{bmatrix}}_{\hat{\texttt{C}}_{0}} \begin{bmatrix} \textbf{b}_1 \\ \textbf{b}_2 \end{bmatrix} + u_0 \begin{bmatrix} \texttt{0} &{} \texttt{0} \\ - \texttt{I} &{} \texttt{0} \end{bmatrix} \begin{bmatrix} \textbf{b}_1 \\ \textbf{b}_2 \end{bmatrix} = \textbf{0} \implies \hat{\texttt{A}}_{11} \textbf{b}_1 + \hat{\texttt{A}}_{12} \textbf{b}_2 = \textbf{0}, \quad \texttt{A}_{21} \textbf{b}_1 + \texttt{A}_{22} \textbf{b}_2 - u_0 \textbf{b}_1 = \textbf{0}. \end{aligned}$$
(49)

Eliminating \(\textbf{b}_2\) from the above pair of equations, we obtain

$$\begin{aligned} \underbrace{(\texttt{A}_{21} - \texttt{A}_{22} \hat{\texttt{A}}_{12}^{-1} \hat{\texttt{A}}_{11})}_{\texttt{X}} \textbf{b}_1 = u_0 \textbf{b}_1. \end{aligned}$$
(50)

The matrix \(\texttt{X}\) is the Schur complement of \(\hat{\texttt{A}}_{12}\) w.r.t. \(\hat{\texttt{C}}_{0}\).

If we select the alternative set-partition of B, as in (43), we obtain a different block partition of \(\textbf{b}\) and \(\texttt{C}(u_0)\). Specifically, in (47), we have \(\texttt{A}_{21} = \texttt{I}\) and \(\texttt{A}_{22} = \texttt{0}\). By assumption, \(\texttt{A}_{12}\) has full column rank. Therefore, we can remove the extra rows from \(\texttt{C}(u_0)\) in (48) and obtain

$$\begin{aligned} \texttt{M}(u_0)= \begin{bmatrix} \hat{\texttt{A}}_{11} &{} \hat{\texttt{A}}_{12} \\ \texttt{I} &{} \texttt{0} \\ \end{bmatrix} + u_0 \begin{bmatrix} \texttt{0} &{} \texttt{0} \\ \texttt{B}_{21} &{} \texttt{B}_{22} \end{bmatrix}. \end{aligned}$$
(51)

Substituting this form of \(\texttt{M}(u_0)\) into (8), we get \(\hat{\texttt{A}}_{11} \textbf{b}_1 + \hat{\texttt{A}}_{12} \textbf{b}_2 = \textbf{0}\) and \(u_0(\texttt{B}_{21} \textbf{b}_1 + \texttt{B}_{22} \textbf{b}_2) + \textbf{b}_1 = \textbf{0}\). Eliminating \(\textbf{b}_2\) from these equations, we obtain an alternative eigenvalue formulation

$$\begin{aligned} \underbrace{(\texttt{B}_{21} - \texttt{B}_{22} \hat{\texttt{A}}_{12}^{-1} \hat{\texttt{A}}_{11})}_{\texttt{X}} \textbf{b}_1 = -(1/u_0) \textbf{b}_1. \;\; \end{aligned}$$
(52)

\(\square \)

Here, the matrix \(\texttt{X}\) represents an alternative form of the Schur complement to the one in (50).

Note that this proposition allows us to test the existence of the Schur complement \(\texttt{X}\) (50) (or (52)) for a given favourable monomial set B and the corresponding coefficient matrix \(\texttt{C}(u_0)\). This condition is tested for each B and corresponding \(\texttt{C}(u_0)\), for both choices of the set-partition, (42) or (43). Out of those that succeed, we select the coefficient matrix \(\texttt{C}(u_0)\) that leads to the smallest possible Schur complement \(\texttt{X}\). If we have more than one such choice, we choose the smallest coefficient matrix \(\texttt{C}(u_0)\). Note that our iterative approach is crucial in increasing the chances of obtaining a minimal solver, even when the polynomials in \(\mathcal {F}\) have non-generic coefficients.
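The decomposition in Proposition 1 is easy to verify numerically. The sketch below (Python/numpy, with random blocks standing in for an actual minimal problem) builds a matrix of the form (48) and checks that every eigenvalue \(u_0\) of the Schur complement \(\texttt{X}\) in (50) makes \(\texttt{M}(u_0)\) singular, i.e., satisfies the resultant matrix constraint (8).

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 3, 5                                    # |B1| and |B2|
A11 = rng.standard_normal((n2, n1))              # hat A_11
A12 = rng.standard_normal((n2, n2))              # hat A_12, square and invertible
A21 = rng.standard_normal((n1, n1))
A22 = rng.standard_normal((n1, n2))

X = A21 - A22 @ np.linalg.solve(A12, A11)        # Schur complement, Eq. (50)
eigvals, V = np.linalg.eig(X)

for u0, b1 in zip(eigvals, V.T):
    b2 = -np.linalg.solve(A12, A11 @ b1)         # from  A11 b1 + A12 b2 = 0
    M = np.block([[A11, A12],
                  [A21 - u0 * np.eye(n1), A22]])
    b = np.concatenate([b1, b2])
    print(np.linalg.norm(M @ b),                             # ~ 0: b is a null vector of M(u0)
          np.linalg.svd(M, compute_uv=False)[-1])            # ~ 0: M(u0) is singular
```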

3.4 Row-Column Removal

The next step in our method is to attempt to reduce the size of the matrix \(\texttt{C}(u_0)\) selected in the previous step. For this, we employ a method inspired by [26]. Here we select columns of \(\texttt{C}(u_0)\), one by one in a random order, to test for their removal. For each such column, we select the rows (say \(r_{1},\dots ,r_{s}\)) that contain nonzero entries in the column and also consider all columns (say \(c_{1},\dots ,c_{l}\)) that have nonzero entries in \(r_{1},\dots ,r_{s}\). Then, we can remove these s rows and l columns from \(\texttt{C}(u_0)\) only if the following conditions hold true for the resulting reduced matrix \(\texttt{C}_{\text {red}}(u_0)\).

  1.

    After eliminating the monomials from T, we require that there is at least one monomial left in each \(T_{i}\).

  2.

    If \(\texttt{C}(u_0)\) is of size \(p \times \varepsilon \), the reduced matrix \(\texttt{C}_{\text {red}}(u_0)\) would be of size \((p-s) \times (\varepsilon -l)\). Then, we require \(p-s \ge \varepsilon -l\) and \(\text {rank}(\texttt{C}_{\text {red}}(u_0)) = \varepsilon -l\).

  3.

    \(\texttt{C}_{\text {red}}(u_0)\) must be block partitioned and decomposed as in Proposition 1.

We repeat the above process until there are no more columns that can be removed. Note that the last condition is important as it ensures that at each stage, the reduced matrix can still be partitioned and decomposed into an eigenvalue formulation (50) (or alternately (52)). Now by abuse of notation, let us denote \(\texttt{C}(u_0)\) to be the reduced matrix and denote B and T to be the reduced set of monomials and the set of monomial multiples, respectively.

If \(\texttt{C}(u_0)\) still has more rows than columns, we transform it into a square matrix by removing extra rows (say \(q_{1},\dots ,q_{j}\)) and the monomials from T indexing these rows, chosen in a way such that the three conditions mentioned above are still satisfied. Note that it is always possible to choose rows such that these three conditions are satisfied. Moreover, our proposed approach first tries to remove as many rows as possible from the lower block, indexed by \(T_{m+1}\). This is to reduce \(|T_{m+1}|\) (\(=|B_{1}|\)) as much as possible and to ensure that the matrix \(\texttt{A}_{21}\), and hence the matrix \(\texttt{X}\) (50) (or (52)) of the eigenvalue problem, is as small as possible. Then, if there are more rows still to be removed, the rest are randomly chosen from the upper block indexed by \(\lbrace T_{1},\dots ,T_{m} \rbrace \). Algorithms for this step of matrix reduction are provided in “Appendix B”.

The sparse resultant matrix \(\texttt{M}(u_0)\) is constructed offline through these three steps. In the online stage, we fill in the coefficients of \(\texttt{M}(u_0)\) from the input measurements and compute the Schur complement \(\texttt{X}\) (50) (or (52)). Then, we extract the solutions to \(x_{1},\dots ,x_{n}\) from the eigenvectors of \(\texttt{X}\). The speed of the solver depends on the size of \(\textbf{b}_1\) (\(=|B_{1}|\)) as well as on the size of \(\hat{\texttt{A}}_{12}\), while the accuracy of the solver largely depends on the condition number and the size of the matrix to be inverted, i.e. \(\hat{\texttt{A}}_{12}\).
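The online stage can be summarized by the following sketch, assuming the filled-in matrix \(\texttt{M}(u_0)\) is available as the four numeric blocks of (55) (the names A11h, A12h, A21, A22 are ours), that the Schur complement has the standard form \(\texttt{X} = \texttt{A}_{21} - \texttt{A}_{22}\hat{\texttt{A}}_{12}^{-1}\hat{\texttt{A}}_{11}\) as in (50), and that \(B_1\) contains the monomials \(1, x_1, \dots, x_n\) in that order; this last assumption is made only for the read-off step and does not hold for every problem.

```python
import numpy as np

def online_stage(A11h, A12h, A21, A22, n):
    """Sketch of the online stage: Schur complement followed by eigendecomposition.

    A11h, A12h : numeric upper blocks of M(u_0), with A12h square and invertible
    A21, A22   : numeric lower blocks coming from the extra polynomial f_{m+1}
    n          : number of variables x_1, ..., x_n
    """
    # Schur complement of A12h in M(u_0):  X = A21 - A22 * A12h^{-1} * A11h
    X = A21 - A22 @ np.linalg.solve(A12h, A11h)

    # The eigenvectors of X are, up to scale, the vectors b_1 evaluated at the roots.
    _, V = np.linalg.eig(X)

    solutions = []
    for j in range(V.shape[1]):
        b1 = V[:, j]
        b1 = b1 / b1[0]  # assumes the first monomial of B_1 is the constant 1
        # assumes B_1 starts with [1, x_1, ..., x_n]; otherwise the solutions are
        # read off from ratios of suitable monomial entries of b_1
        solutions.append(np.real_if_close(b1[1:n + 1]))
    return X, solutions
```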

4 Action Matrix vs. Sparse Resultant

In this section, we study the connection between a solver generated using an action matrix method [8, 9, 26, 27, 32, 35, 47] and a solver based on the sparse resultant method proposed in Sect. 3. Observe that our sparse resultant method exploits only the structure of the input polynomials, via the geometry of their Newton polytopes, whereas the SOTA action matrix methods investigate the input polynomial system algebraically, via a Gröbner basis. Although the two approaches seem different, the generated solvers exhibit the same properties, and hence our objective in this section is to shed some light on the similarities. Some of the symbols used in the two methods coincide; to distinguish them, in this section we use the prefixes \({}^{a}\) and \({}^{r}\) to denote, respectively, the symbols used in the action matrix method in Sect. 2.1 and in the sparse resultant method in Sect. 3.1. Let us assume that both methods are used to generate a solver for a system \(\mathcal {F}=0\) (1) of m polynomial equations in n variables with r solutions. An action matrix method results in a solver that performs G–J elimination (or, alternatively, LU/QR decomposition) of an elimination template \({}^{a}\!\texttt{C}\) (A8) and the eigendecomposition of an action matrix \(\texttt{M}_f\) (A3) of size \(r \times r\).Footnote 4 The proposed sparse resultant method, on the other hand, leads to a solver that computes a Schur complement \(\texttt{X}\) (50) (or (52)) of size \(N \times N\) from the sparse resultant matrix \(\texttt{M}(u_0)\), followed by its eigendecomposition. In general, \(N \ge r\).

While the final solvers generated by these two methods are in general different, i.e., they perform different operations, they nevertheless demonstrate some similar properties: both extract solutions from eigenvectors of some matrix, and both involve computing a matrix inverse (G–J elimination can be replaced by a matrix inverse). This motivates us to study the similarities of these solvers. Let us first define the equivalence of a solver generated using an action matrix method (AM) and a solver generated using the sparse resultant method from Sect. 3 (SRes).

Definition 3

For a given system of polynomial equations \(\mathcal {F}=0\) (1), an action matrix-based solver and a sparse resultant-based solver are defined to be equivalent iff the following two conditions are satisfied:

  1. 1.

    The action matrix \(\texttt{M}_f\) (A3) is the same as the Schur complement \(\texttt{X}\) (50) (or (52)), i.e. \(\texttt{M}_f = \texttt{X}\).

  2. 2.

    The size of the matrix \({}^{a}\!\hat{\texttt{C}}\) is the same as the size of the upper block \(\begin{bmatrix} \hat{\texttt{A}}_{11}&\hat{\texttt{A}}_{12}\end{bmatrix}\) of the sparse resultant matrix \(\texttt{M}(u_0)\) (49) (or (51)) and the step of G–J elimination of the template matrix \({}^{a}\!\hat{\texttt{C}}\) in the online stage (step 7) of the action matrix method (Sect. 2.1) can be replaced with the matrix product \(\hat{\texttt{A}}_{12}^{-1} \hat{\texttt{A}}_{11}\), used to compute the Schur complement in the online stage (step 6) of our sparse resultant method (Sect. 3.1), or vice-versa.

Now, our goal is to define conditions under which the final online solvers generated by these two methods are equivalent.
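For one numeric instance, the two conditions of Definition 3 could be checked along the following lines; this is only a sketch, with all inputs assumed to be the matrices produced by the two solvers, and the comparison of the eliminated template with \(\hat{\texttt{A}}_{12}^{-1}\hat{\texttt{A}}_{11}\) left to the caller, since it depends on the solver-specific column ordering.

```python
import numpy as np

def check_definition_3(M_f, X, C_hat, A11h, A12h, tol=1e-8):
    """Numerically check Definition 3 on one instance (a sketch).

    M_f        : action matrix of the AM solver
    X          : Schur complement of the SRes solver
    C_hat      : elimination template of the AM solver
    A11h, A12h : upper blocks of M(u_0), i.e. [A11h | A12h] is its upper block
    """
    # Condition 1: the action matrix equals the Schur complement.
    cond1 = (M_f.shape == X.shape) and np.allclose(M_f, X, atol=tol)

    # Condition 2 (size part): the template and the upper block have equal size.
    cond2 = C_hat.shape == (A12h.shape[0], A11h.shape[1] + A12h.shape[1])

    # Condition 2 (replaceability part): the product A12h^{-1} A11h that the SRes
    # solver uses in place of G-J elimination; comparing it against the eliminated
    # part of C_hat is left to the caller.
    gj_replacement = np.linalg.solve(A12h, A11h)
    return cond1 and cond2, gj_replacement
```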

We next demonstrate that if we have an action matrix-based solver (AM) for a given system \(\mathcal {F}=0\) (1), we can modify the steps performed by the sparse resultant method proposed in Sect. 3.1 such that the two solvers are equivalent according to Definition 3.

4.1 \(\texttt{X}\) (SRes) from \(\texttt{M}_f\) (AM)

Let us assume that we have generated a solver for a system of polynomial equations \(\mathcal {F}=0\) (1), using the action matrix method described in Sect. 2.1 and detailed in “Appendix A”. The action polynomial is assumed to be \(f=x_1\), where \( x_1 \in X\).

We first propose the following changes to the steps performed in the offline stage of the sparse resultant method in Sect. 3.1. In what follows, step i refers to step i of Sect. 3.1.

C1R:

Step 1 is skipped, as there is no need to test each variable \(x_k \in X\).

C2R:

In step 2, the extra polynomial is assumed to be \(f_{m+1} = x_1 - u_0\).

C3R:

In step 3(a), the favourable set of monomials \({}^{r}\!B\) is constructed directly from \({}^{a}\!B\), as \({}^{r}\!B = {}^{a}\!B\). Moreover, the sets of monomial multiples are constructed as \({}^{r}T_i = {}^{a}T_i, \ i=1,\dots ,m\), and \({}^{r}T_{m+1} = B_a\) (see Eq. (A5)). Step 3(b) is skipped, as the coefficient matrix \({}^{r}\texttt{C}(u_0)\) will be obtained directly from the action matrix-based solver in subsequent steps. The output of this step is then the favourable monomial set \({}^{r}\! B\).

C4R:

Replace step 4(a) by directly determining the set partition of \({}^{r}\!B=B_1\cup B_2\) (31)–(32), as \(B_1 = B_a \) and \( B_2 = \hat{B}_e \cup B_r\), where \(B_a, B_r\) and \(\hat{B}_e\) are defined in “Appendix A”. Moreover, the monomial ordering in \({}^{a}\textbf{b}\) determines the vectors \(\textbf{b}_1 = \textbf{b}_a\) and \(\textbf{b}_2 = \begin{bmatrix} {\hat{\textbf{b}}}_e \\ \textbf{b}_r \end{bmatrix} \). The action matrix-based solver corresponds to the first version of the set partition of \({}^{r}\!B\) considered in our sparse resultant method, i.e. the set partition in (31).

C5R:

In step 4(b), the upper block \(\begin{bmatrix} \texttt{A}_{11}&\texttt{A}_{12} \end{bmatrix}\) of \({}^{r}\texttt{C}(u_0)\) (33) is obtained directly from \({}^{a}\hat{\texttt{C}}\), as

$$\begin{aligned} \texttt{A}_{11} = \begin{bmatrix} \texttt{C}_{13} \\ \texttt{C}_{23} \end{bmatrix}, \quad \texttt{A}_{12} = \begin{bmatrix} \hat{\texttt{C}}_{11} & \texttt{C}_{12} \\ \hat{\texttt{C}}_{21} & \texttt{C}_{22} \end{bmatrix}. \end{aligned}$$
(53)

Multiplying \(f_{m+1}\) by the monomials in \({}^{r}T_{m+1}\), we obtain the expanded set of polynomials \(\lbrace \textbf{x}^{\mathbf {\alpha }} \left( x_1 - u_0 \right) \mid \textbf{x}^{\mathbf {\alpha }} \in {}^{r}T_{m+1} \rbrace \). It is possible to order the polynomials in this set such that its matrix form is

$$\begin{aligned} \begin{bmatrix} \texttt{A}_{21} - u_0 \texttt{I}&\texttt{A}_{22} \end{bmatrix} \begin{bmatrix} \textbf{b}_1 \\ \textbf{b}_2 \end{bmatrix}. \end{aligned}$$
(54)

This determines the lower block of \({}^{r}\texttt{C}(u_0)\). Here, \(\texttt{A}_{21}\) and \(\texttt{A}_{22}\) are binary matrices, with entries either 0 or 1. The upper and the lower blocks together give us

$$\begin{aligned} {}^{r}\texttt{C}(u_0) \ {}^{r}\textbf{b} = \begin{bmatrix} \texttt{A}_{11} & \texttt{A}_{12} \\ \texttt{A}_{21} - u_0 \texttt{I} & \texttt{A}_{22} \end{bmatrix} \begin{bmatrix} \textbf{b}_1 \\ \textbf{b}_2 \end{bmatrix}. \end{aligned}$$
(55)

The output of the modified step 4 is the coefficient matrix \({}^{r}\texttt{C}(u_0)\); a sketch of the construction of its lower block is given after this list.

C6R:

By construction, \({}^{r}\texttt{C}(u_0)\) is already a square invertible matrix and has no extra rows to be removed. As no row-column removal needs to be attempted on \({}^{r}\texttt{C}(u_0)\), step 5 is skipped, and the sparse resultant matrix is constructed directly as \(\texttt{M}(u_0)= {}^{r}\texttt{C}(u_0)\). In this case, the submatrices \(\hat{\texttt{A}}_{11}\) and \(\hat{\texttt{A}}_{12}\) of \(\texttt{M}(u_0)\) are equal to the submatrices \( \texttt{A}_{11}\) and \( \texttt{A}_{12}\) of \({}^{r}\texttt{C}(u_0)\), respectively.

Applying these changes to steps 1–5 in Sect. 3.1, i.e. the offline stage, we obtain the sparse resultant matrix \(\texttt{M}(u_0)\). After that, the online solver computes the Schur complement \(\texttt{X}\) of the block submatrix \(\hat{\texttt{A}}_{12}\) of \(\texttt{M}(u_0)\), followed by an eigenvalue decomposition (50) of \(\texttt{X}\), as described in step 6 in Sect. 3.1. In Proposition C.1, we prove that the sparse resultant-based solver so obtained is equivalent to the action matrix-based solver.
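The lower block of \({}^{r}\texttt{C}(u_0)\) described in C5R admits a simple mechanical construction. The following sketch assumes monomials are represented as exponent tuples and that, as noted above via (45), \(x_1\textbf{x}^{\alpha}\) always lies in \(B_1 \cup B_2\); the function name mul_by_x1 is ours.

```python
import numpy as np

def lower_block(B1, B2, mul_by_x1, u0):
    """Sketch: build the lower block [A21 - u0*I | A22] of rC(u_0), cf. (54)-(55).

    B1, B2    : lists of monomials (exponent tuples) in the order of b_1 and b_2
    mul_by_x1 : maps a monomial to x_1 times that monomial
    u0        : numeric value of the hidden constant u_0
    """
    n1, n2 = len(B1), len(B2)
    col1 = {m: j for j, m in enumerate(B1)}
    col2 = {m: j for j, m in enumerate(B2)}

    A21 = np.zeros((n1, n1))
    A22 = np.zeros((n1, n2))
    for i, mono in enumerate(B1):      # rows indexed by T_{m+1} = B_1
        shifted = mul_by_x1(mono)      # x_1 * x^alpha, lies in B_1 or B_2
        if shifted in col1:
            A21[i, col1[shifted]] = 1.0
        else:
            A22[i, col2[shifted]] = 1.0
    return np.hstack([A21 - u0 * np.eye(n1), A22])
```

For exponent tuples in n variables, mul_by_x1 can be taken, for instance, as `lambda a: (a[0] + 1,) + tuple(a[1:])`.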

4.2 \(\texttt{M}_f\) (AM) from \(\texttt{X}\) (SRes)

Let us assume that for a system of polynomial equations \(\mathcal {F}=0\) (1), we have a sparse resultant-based solver generated by following the steps in Sect. 3.1. In this solver, the coefficient matrix \({}^{r}\texttt{C}(u_0)\) is assumed to be partitioned as in (33); the alternative partition (34) is considered in Sect. 4.3. Moreover, the Schur complement \(\texttt{X}\) is assumed to have as many eigenvalues as the number of solutions to \(\mathcal {F}=0\), i.e., \(N = r\).Footnote 5 The extra polynomial is assumed to be of the form \(f_{m+1} = x_1 - u_0\), for \(x_1 \in X\).

We first propose the following changes to the steps performed in the offline stage of the action matrix method in “Appendix A”. In the following list, step i refers to step i in “Appendix A”.

C1A:

There is no change to step 1.

C2A:

In step 2, set the basis of the quotient ring A to be \(\mathcal {B}_A = \lbrace [\textbf{x}^{\mathbf {\alpha }}] \mid \textbf{x}^{\mathbf {\alpha }} \in B_1\rbrace \subset A\). By assumption, A is an r-dimensional vector space over \(\mathbb {C}\) spanned by the elements of \(\mathcal {B}_A\).Footnote 6 Construct the monomial set \(B_a = B_1\).

C3A:

In step 3, the action polynomial is assumed to be \(f = x_1\).

C4A:

The monomial multiples required in step 4 are obtained as \({}^{a}T_i = {}^{r}T_i, \ i=1,\dots ,m\). This also gives us the extended polynomial set \(\mathcal {F}^\prime \). Note that \({}^{a}\!B = {}^{r}\!B = \text {mon}(\mathcal {F}^\prime )\).

C5A:

In step 5, the required sets of reducible and excess monomials are determined as

$$\begin{aligned} B_r&= \lbrace x_1 \textbf{x}^{\mathbf {\alpha }} \mid \textbf{x}^{\mathbf {\alpha }} \in B_1 \rbrace \setminus B_1, \\ B_e&= B_2 \setminus B_r. \end{aligned}$$
(56)

Note that by (45), \( \textbf{x}^{\mathbf {\alpha }} \in B_1 \implies x_1 \textbf{x}^{\mathbf {\alpha }} \in {}^{r}\!B \); a small sketch of this construction is given after this list.

C6A:

In step 6, the vector of monomials is determined as \({}^{a}\hat{\textbf{b}}={}^{r}\textbf{b}\). This also determines the vectors \(\textbf{b}_a, \textbf{b}_r\) and \(\hat{\textbf{b}}_e\). Thus, the monomials in the action matrix method are ordered in the same way as the monomials in the sparse resultant method. Moreover, in step 6, the elimination template \({}^{a}\hat{\texttt{C}}\) and its block partition are determined as

$$\begin{aligned} \begin{bmatrix} \hat{\texttt{C}}_{11} & \texttt{C}_{12} \\ \hat{\texttt{C}}_{21} & \texttt{C}_{22} \end{bmatrix} = \hat{\texttt{A}}_{12}, \quad \begin{bmatrix} \texttt{C}_{13} \\ \texttt{C}_{23} \end{bmatrix} = \hat{\texttt{A}}_{11}. \end{aligned}$$
(57)

The column partition of \({}^{a}\hat{\texttt{C}}\) is determined by the partition of the vector \({}^{a}\textbf{b}\). By the construction of \(\texttt{M}(u_0)\), \(\hat{\texttt{A}}_{12}\) is a square invertible matrix. Therefore, we can always find a row partition such that \(\texttt{C}_{22}\) is a square invertible matrix. As \(\texttt{C}_{22}\) and \(\hat{\texttt{A}}_{12}\) are square matrices, \(\hat{\texttt{C}}_{11}\) is also a square matrix. Therefore, there are no extra columns to be removed, and the elimination template \({}^{a}\hat{\texttt{C}}\) is such that its G–J elimination leads to the required form, as in (A11).

Applying these changes to steps 1–6 in Sect. 2.1, i.e. the offline stage, we obtain the elimination template \({}^{a}\hat{\texttt{C}}\). Subsequently, the online action matrix-based solver (step 7 in Sect. 2.1) computes the G–J elimination of \({}^{a}\hat{\texttt{C}}\), from which the entries of the action matrix \(\texttt{M}_f\) are read off. This implies that the action matrix-based solver is equivalent to the sparse resultant-based solver, which is proved in Proposition C.2.
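The monomial bookkeeping of C5A, i.e. (56), amounts to simple set operations on exponent tuples, as in the following sketch; the example sets are hypothetical.

```python
def reducible_and_excess(B1, B2, mul_by_x1):
    """Sketch of (56): B_r = {x_1 * m | m in B_1} \\ B_1 and B_e = B_2 \\ B_r."""
    B1, B2 = set(B1), set(B2)
    Br = {mul_by_x1(m) for m in B1} - B1
    Be = B2 - Br
    return Br, Be

# Hypothetical example with exponent tuples in two variables (x_1, x_2):
mul_by_x1 = lambda a: (a[0] + 1, a[1])
Br, Be = reducible_and_excess({(0, 0), (1, 0), (0, 1)},   # B_1 = {1, x_1, x_2}
                              {(2, 0), (1, 1), (0, 2)},   # B_2 = {x_1^2, x_1 x_2, x_2^2}
                              mul_by_x1)
# Br = {(2, 0), (1, 1)}, i.e. {x_1^2, x_1 x_2};  Be = {(0, 2)}, i.e. {x_2^2}
```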

4.3 \(\texttt{M}_f\) (AM) from \(\texttt{X}\) (SRes) with Alternative Form

Let us assume that for a system of polynomial equations \(\mathcal {F}=0\) (1), we have used the alternative partitions of \({}^{r}\!B\) (32) and \({}^{r}\texttt{C}(u_0)\) (34), and generated a sparse resultant-based solver by following the steps in Sect. 3.1. This means that we need to assume that no solution of \(\mathcal {F}=0\) has \(x_1=0\). The other assumptions remain the same as in Sect. 4.2.

The alternative partition of the favourable monomial set \({}^{r}\!B\) (32) implies that the Schur complement \(\texttt{X}\) (52) gives us a representation of each coset \(\left[ \dfrac{\textbf{x}^{\mathbf {\alpha _j}}}{x_1} \right], \textbf{x}^{\mathbf {\alpha _j}} \in B_1\), as a linear combination of the cosets \([\textbf{x}^{\mathbf {\alpha _j}}], \forall \textbf{x}^{\mathbf {\alpha _j}} \in B_1\). However, the approach from Sect. 4.2 does not work here, because in this case we would need to set the action polynomial f to be \(1/x_1\), which is not a polynomial.

Therefore, we use the so-called Rabinowitsch trick [15] and propose the following changes to the offline stage of the action matrix method in Sect. 2.1. In the following list, step i refers to step i in Sect. 2.1.

C1A\(^\prime \):

In step 1, the polynomial system \(\mathcal {F}\) is augmented with an extra polynomial \({}^{a}\!f_{m+1}=x_1 \lambda - 1\), where \(\lambda \) is a new variable. The augmented polynomial system is denoted as \({}^{a}\!\mathcal {F}_a\), and the augmented set of variables as \({}^{a}\!X_a = X \cup \lbrace \lambda \rbrace \). Consider the ideal \(I_a = \langle {}^{a}\!\mathcal {F}_a \rangle \) and the quotient ring \(A_a = \mathbb {C}[{}^{a}\!X_a]/I_a\).

C2A\(^\prime \):

Note that the number of solutions to \(\mathcal {F}=0\) is the same as that of \({}^{a}\!\mathcal {F}_a=0\), because we have assumed \(x_1 \ne 0\). Therefore, \(A_a\) is an r-dimensional vector space over \(\mathbb {C}\). Its basis is constructed as \(\mathcal {B}_A = \lbrace [\textbf{x}^{\mathbf {\alpha _j}}] \mid \textbf{x}^{\mathbf {\alpha _j}} \in B_1\rbrace \). Construct the monomial set \(B_a = B_1\).

C3A\(^\prime \):

In step 3, the action polynomial is assumed to be \(f = \lambda \).

C4A\(^\prime \):

The sets of monomial multiples required in step 4 are constructed as \({}^{a}T_i = {}^{r}T_i, \ i=1,\dots ,m+1\). From (32), \(\textbf{x}^{\mathbf {\alpha _j}} \in T_{m+1}\) implies \(x_1 \textbf{x}^{\mathbf {\alpha _j}} \in B_1\), which in turn implies that \(\lambda x_1 \textbf{x}^{\mathbf {\alpha _j}} \in B_r\). The extended polynomial set \({}^{a}\!\mathcal {F}_a^\prime \) is obtained by multiplying each \(f_i \in {}^{a}\!\mathcal {F}_a\) by the monomials in \({}^{a}\!T_i\). In step 4, the set of monomials is obtained as \({}^{a}\!B = \text {mon}({}^{a}\!\mathcal {F}_a^\prime )\).

C5A\(^\prime \):

In step 5, the required monomial sets are

$$\begin{aligned} B_r&= \lbrace \lambda \textbf{x}^{\mathbf {\alpha }} \mid \textbf{x}^{\mathbf {\alpha }} \in B_1 \rbrace , \\ B_e&= B_2. \end{aligned}$$
(58)

Note that the set of monomials \({}^{a}\!B = B_e \cup B_r \cup B_a\), and \(B_a \cap B_r = \emptyset \).

C6A\(^\prime \):

In step 6, the monomial vectors are set as \(\textbf{b}_a=\textbf{b}_1\), \(\textbf{b}_r=\text {vec}(B_r)\) and \(\textbf{b}_e=\textbf{b}_2\). This also fixes the vector of monomials \({}^{a}\textbf{b} = \begin{bmatrix} \textbf{b}_e \\ \textbf{b}_r \\ \textbf{b}_a \end{bmatrix}\). The monomials in the vector \({}^{a}\textbf{b}\) in the action matrix method are thus ordered in the same way as the monomials in the sparse resultant method in Sect. 3.1. Moreover, as the monomials in \(B_r\) are in one-to-one correspondence with the monomials in \(B_1\), the monomial ordering in \(\textbf{b}_1\) also fixes the order of the monomials in \(\textbf{b}_r\).

Applying these changes to steps 1–6 in Sect. 2.1, i.e. the offline stage, we obtain the elimination template \({}^{a}\hat{\texttt{C}}\). Then, the action matrix-based solver (step 7 in Sect. 2.1) computes the G–J elimination of \({}^{a}\hat{\texttt{C}}\), from which the entries of the action matrix \(\texttt{M}_f\) are read off. This implies that the action matrix-based solver is equivalent to the sparse resultant-based solver, which is proved in Proposition C.3.

5 Comparison w.r.t. SOTA Sparse Resultant Methods

Our extra polynomial algorithm is based on the hidden variable method [21]. It also extends the method in [11] for square systems by dispensing with the computation of a mixed subdivision of the Minkowski sum of all the Newton polytopes. We consider these two methods to be the current SOTA for generating sparse resultant-based minimal solvers.

We select a subset of interesting minimal problems and demonstrate the importance of our algorithm in improving solver speed compared to the solvers generated by the SOTA resultant methods. Moreover, our algorithm has two important properties, viz. the special form of the extra polynomial and the row-column removal procedure. In this section, we demonstrate via an ablation study how these two properties are critical for achieving substantially improved solver sizes. Refer to Table 1, where we list the sizes of the solvers generated using the SOTA sparse resultant methods [11, 21] (cols 3 & 4), the sizes of the solvers generated with an extra polynomial of a general form (see Sect. 2.2.3) (col 5), and the sizes of the solvers generated without the row-column removal step (see Sect. B.2) (col 6). Finally, we list the sizes of the solvers generated by our approach (col 7).

Note that the size of a solver based on the hidden variable method [21] is reported as the size of the underlying generalized eigenvalue problem (GEP) (14). If the matrix \(\texttt{A}\) (and \(\texttt{B}\)) in the GEP is of dimensions \(r \times r\), the solver size can be considered equivalent to inverting a matrix of dimensions \(r \times r\) and computing the eigenvalues of a matrix of size \(r \times r\). On the other hand, the size of a solver based on any one of the other methods is reported as a matrix of size \(s \times t\), which is equivalent to inverting a matrix of dimensions \(s \times s\) and computing the eigenvalues of a matrix of size \( (t-s) \times (t-s)\). The difference in the time taken to invert matrices of comparable sizes is minute, as the sparsity of the matrices enables highly optimized implementations of the matrix inverse. Therefore, the speed of the solver is primarily governed by the size of the eigenvalue computation. With this discussion in mind, we observe from Table 1 that the solvers based on the GEP (col 3) have worse or comparable speed to that of the solvers based on our approach (col 7). Observe that the approach of [11] failed to generate any solver for the five problems whose systems are non-square, as the subdivision methods cannot be applied to such systems. For all the problems, the u-resultant method using an extra polynomial of a general form leads to larger elimination templates, whereas using an extra polynomial with a special form improves the template size. Additionally, the row-column removal step ultimately leads to the best solver sizes among all sparse resultant-based methods.
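Under the rough assumption that both a matrix inversion and an eigendecomposition of a \(k \times k\) matrix cost on the order of \(k^3\) with comparable constants, and ignoring sparsity, the sizes in Table 1 can be turned into a coarse cost estimate as in the sketch below; this is only an illustrative model, not the timing methodology used in the paper, and the example sizes are hypothetical.

```python
def approx_cost(size, kind="template"):
    """Very rough cubic cost model for comparing the solver sizes in Table 1.

    kind == "gep":      size = r, an r x r generalized eigenvalue problem, modeled
                        as one r x r inversion plus one r x r eigendecomposition
    kind == "template": size = (s, t), an s x s inversion plus a
                        (t - s) x (t - s) eigendecomposition
    Both operations are modeled as k**3 with unit constants, so this only orders
    solvers very coarsely.
    """
    if kind == "gep":
        r = size
        return r**3 + r**3
    s, t = size
    return s**3 + (t - s)**3

# Hypothetical comparison of an 11 x 20 template solver against a 14 x 14 GEP solver:
print(approx_cost((11, 20)), approx_cost(14, kind="gep"))
```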

Table 1 A comparison of the sizes of the solvers for some minimal problems, generated by the iterative algorithm [21] (col 3), the polytope subdivision approach for square systems in [11] (col 4), the u-resultant method using an extra polynomial with a generic form (col 5), our proposed algorithm (see Sect. 3) without (col 6) and with row-column removal (col 7)

6 Experiments

In this section, we evaluate the performance of the minimal solvers generated using our sparse resultant method proposed in Sect. 3. We compare the stability as well as the computational complexity of these solvers with the state-of-the-art Gröbner basis-based solvers for many interesting minimal problems. The minimal problems selected for comparison represent a wide variety of relative and absolute pose problems and correspond to those studied in [35]. Note that for the two minimal problems Rel. pose f+E+f 6pt and Abs. pose refractive P5P, we generated solvers from a simplified polynomial system as described in [38], instead of the original formulation.

Table 2 Size comparison of solvers based on [32] (col 6), Gröbner fan [35] (col 7), heuristic sampling [35] (col 9) and greedy parameter search [38] (col 10)

6.1 Computational Complexity

The biggest contributor to the computational complexity of a minimal solver is the size of the matrices that undergo the crucial numerical operations. The solvers based on our sparse resultant method in Sect. 3, the state-of-the-art Gröbner basis solvers, as well as the original solvers proposed by the respective authors (see column 4), involve two crucial operations: a matrix inverseFootnote 7 and an eigenvalue decomposition. This is indicated by the size of the solver in the table; e.g. a sparse resultant-based solver of size \(11 \times 20\) means computing a matrix inverse \(\hat{\texttt{A}}_{12}^{-1}\) of size \(11 \times 11\), ultimately leading to a Schur complement \(\texttt{X}\) of size \(20-11 =9\), and an eigenvalue decomposition of this matrix \(\texttt{X}\). Similarly, an action matrix-based solver of size \(11 \times 20\) means performing a G–J elimination of an \(11 \times 20\) matrix \(\texttt{C}\) and then an eigenvalue decomposition of an action matrix \(\texttt{M}_f\) of size \(20-11=9\). So in Table 2, we compare the size of the upper block \(\begin{bmatrix} \hat{\texttt{A}}_{11}&\hat{\texttt{A}}_{12} \end{bmatrix}\) of the sparse resultant matrix \(\texttt{M}(u_0)\) in our extra polynomial sparse resultant-based solvers with the size of the elimination template \(\texttt{C}\) used in the state-of-the-art Gröbner basis solvers as well as in the original solvers proposed by the respective authors (column 5).

The Gröbner basis-based solvers used for comparison include the solvers generated using the approach in [32] (column 6), the Gröbner fan-based and the heuristic-based approaches in [35] (columns 7 and 9, respectively), and the template optimization approach using a greedy parameter search in [38] (column 10). Out of the two methods for generating minimal solvers, proposed in [38], we consider the smallest solver for each minimal problem.

As we can see from Table 2, for most minimal problems our sparse resultant method leads to smaller solvers than the Gröbner basis solvers based on [32, 35] and to solvers of exactly the same size as those based on [38]. The smallest solvers, where the size of the eigenvalue decomposition problem equals the number of solutions, are written in bold in Table 2.

Moreover, for some minimal problems, our sparse resultant method leads to a larger eigenvalue problem than the number of solutions; these are written in italics in Table 2. For three such minimal problems, i.e. Rel. pose E+\(f\lambda \) 7pt, Unsynch. Rel. pose, and Optimal pose 2pt v2, our sparse resultant method leads to smaller solvers than those based on the Gröbner basis methods [32, 35], whereas for the problem Abs. pose quivers, our sparse resultant method leads to a smaller solver than those based on [32, 35] and of comparable size to the solver based on [38].

Note that for the two minimal problems Optimal pose 2pt v2 and Rel. pose E angle+4pt, the elimination template \(\texttt{C}\) in the solvers based on the approach in [38] underwent an extra Schur complement-based step in the offline stage. Therefore, the template sizes reported for these two solvers are smaller than those of the solvers based on the other Gröbner basis methods in [32, 35]. For the problem Rel. pose E angle+4pt, our sparse resultant method failed to generate a solver. This implies that none of the subsets \(\mathcal {F}_{\text {sub}}\) for this problem leads to a favourable monomial set (see Sect. 3.2). The primary reason for this failure is the non-genericity of the coefficients (see Definition 1). This points to a future research direction for such minimal problems: we need to look beyond the Newton polytopes of the subsets \(\mathcal {F}_{\text {sub}}\) and test other convex objects.

6.2 Stability Comparison

We evaluate and compare the stability of our solvers from Table 2 with that of the Gröbner basis-based solvers. As it is not feasible to generate scene setups for all considered problems, we instead evaluated the stability of the minimal solvers using 5K instances of random data points. The stability measures include the mean and median of the \(Log_{10}\) of the normalized equation residuals for the computed solutions, as well as the solver failures. The solver failures are an important measure of stability for real-world applications and are computed as the percentage of the 5K instances for which at least one solution has a normalized equation residual \(>10^{-3}\). These measures on randomly generated inputs have been shown to be sufficiently good indicators of solver stability [32]. Table 3 shows the stability of the solvers for all the minimal problems from Table 2. We observe that for most of the minimal problems, our proposed sparse resultant-based solvers have no failures. Among those with some failures, for all except two minimal problems, our sparse resultant-based solvers have fewer failures than the Gröbner basis-based solvers.
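The stability measures described above can be computed as in the following sketch, assuming the normalized equation residuals of all computed solutions are already available for each random instance.

```python
import numpy as np

def stability_measures(residuals_per_instance, fail_thresh=1e-3):
    """Sketch of the stability measures reported in Table 3.

    residuals_per_instance : one entry per random instance, each an array of the
                             normalized equation residuals of all computed solutions
    Returns the mean and median of log10 residuals over all solutions, and the
    failure rate in percent (instances with at least one residual > fail_thresh).
    """
    all_res = np.concatenate([np.ravel(r) for r in residuals_per_instance])
    log_res = np.log10(np.maximum(all_res, np.finfo(float).tiny))  # guard against 0
    failures = sum(np.any(np.asarray(r) > fail_thresh) for r in residuals_per_instance)
    fail_pct = 100.0 * failures / len(residuals_per_instance)
    return log_res.mean(), np.median(log_res), fail_pct
```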

In general, our new method generates solvers that are stable, with only very few failures. For a selected subset of nine minimal problems from Table 2, we computed the \(Log_{10}\) of the normalized equation residuals and depict their histograms in Fig. 4. The histograms agree with our observation from Table 3 that our proposed sparse resultant method leads to solvers with only a few failures.

Table 3 Stability comparison for solvers for the minimal problems from Table 2, generated by our new method, solvers generated using [32] (col 4), heuristic-based solvers [35] (col 5) and greedy parameter search [38] (col 6)

Note that since our new solvers solve the same formulations of the problems as the existing state-of-the-art solvers, their performance on noisy measurements and real data is the same as the performance of the state-of-the-art solvers. The only difference in performance comes from numerical instabilities that already appear in the noise-less case and are detailed in Table 3 (fail\(\%\)). For the performance of the solvers in real applications, we refer the reader to the papers where the original formulations of the studied problems were presented (see Table 2, column 2). Here we select two interesting problems, i.e., one relative and one absolute pose problem, and perform experiments on synthetically generated scenes and on real images, respectively.

E+\(f\lambda \) solver on synthetic scenes We study the numerical stability of our sparse resultant-based solver for the problem of estimating the relative pose of one calibrated camera and one camera with unknown focal length and radial distortion from 7-point correspondences, i.e. the Rel. pose E+\(f\lambda \) 7pt problem from Table 2. We consider the formulation “elim. \(\lambda \)” proposed in [35], which leads to the smallest solvers. We study the performance on noise-free data and compare it to the results of the Gröbner basis solvers from Table 2.

We generated 10K scenes with 3D points drawn uniformly from a \(\left[ -10,10\right] ^3\) cube. Each 3D point was projected by two cameras with random feasible orientations and positions. The focal length of the first camera was randomly drawn from the interval \(f_{gt} \in \left[ 0.5, 2.5\right] \), and the focal length of the second camera was set to 1, i.e. the second camera was calibrated. The image points in the first camera were corrupted by radial distortion following the one-parameter division model. The radial distortion parameter \(\lambda _{gt}\) was drawn at random from the interval \([-0.7, 0]\), representing distortions ranging from cameras with small distortion up to slightly more than GoPro-style cameras.
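For reference, under the one-parameter division model a distorted, centered image point \((x, y)\) corresponds to the undistorted point \((x, y)/(1 + \lambda(x^2 + y^2))\); the following sketch applies this undistortion, while the exact centering and normalization conventions used in our experiments are not spelled out here and are therefore an assumption.

```python
import numpy as np

def undistort_division_model(pts, lam):
    """Undistort centered 2D image points under the one-parameter division model.

    pts : (N, 2) array of distorted, centered image points
    lam : radial distortion parameter lambda, e.g. drawn from [-0.7, 0]
    """
    r2 = np.sum(pts**2, axis=1, keepdims=True)   # squared radius of each point
    return pts / (1.0 + lam * r2)                # (x, y) -> (x, y) / (1 + lam * r^2)
```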

Figure 3 shows the \(Log_{10}\) of the relative error of the distortion parameter \(\lambda \) (left) and the focal length f (right), obtained by selecting the real root closest to the ground truth. All tested solvers provide stable results with only a small number of runs with larger errors. The solver based on our sparse resultant method (green) is not only smaller, but also slightly more stable than the heuristic-based solver [35] (red) and the solver based on [32] (black). For most minimal problems, the solver based on our proposed sparse resultant method has fewer failures than the solver based on the greedy parameter search approach in [38] (blue) (Fig. 4).

P4Pfr solver on real images

We evaluated our sparse resultant-based solver for the practical problem of estimating the absolute pose of a camera with unknown focal length and radial distortion from four 2D-to-3D point correspondences, i.e. the P4Pfr solver, on real data. We consider the Rotunda dataset, which was proposed in [29] and used in [34] for evaluating P4Pfr solvers. This dataset consists of 62 images captured by a GoPro Hero4 camera. An example of an input image from this dataset (left), as well as the undistorted (middle) and registered (right) images obtained using our new solver, is shown in Fig. 5 (top). The Reality Capture software [44] was used to build a 3D reconstruction of this scene. We used the 3D model to estimate the pose of each image using the new P4Pfr resultant-based solver (\(28\times 40\)) inside a RANSAC framework. Similarly to [34], we used the camera and distortion parameters obtained from [44] as the ground truth for the experiment. Figure 5 (bottom) shows the errors in the focal length, the radial distortion, and the camera pose. We have summarized these results in Table 4, where we present the errors obtained using our proposed solver and, for comparison, the errors reported in [34], where the P4Pfr (\(40\times 50\)) solver was tested on the same dataset. Overall, the errors are quite small, e.g. most of the focal lengths are within \(0.1\%\) of the ground truth and almost all rotation errors are less than 0.1 degrees, which shows that our new solver, as well as the original solver, works well for real data. The results of both solvers are very similar; the slightly different numbers reported in [34] are due to RANSAC’s random nature and the slightly different P4Pfr formulation (\(40\times 50\)) used in [34].
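A RANSAC loop of the kind used here can be sketched as follows; solve_p4pfr and reproj_error are hypothetical callables standing in for the generated minimal solver and for a reprojection-error routine, and the threshold and iteration count are placeholders.

```python
import numpy as np

def ransac_p4pfr(pts_2d, pts_3d, solve_p4pfr, reproj_error,
                 thresh=2.0, iters=1000, rng=None):
    """Sketch of a RANSAC loop around a P4Pfr minimal solver.

    solve_p4pfr  : hypothetical solver; takes four 2D-3D correspondences and returns
                   a list of candidate models (pose, focal length, distortion)
    reproj_error : hypothetical function returning per-point reprojection errors
                   of a candidate model on all correspondences
    """
    rng = rng or np.random.default_rng()
    best_model, best_inliers = None, 0
    n = pts_2d.shape[0]
    for _ in range(iters):
        idx = rng.choice(n, size=4, replace=False)       # minimal 4-point sample
        for model in solve_p4pfr(pts_2d[idx], pts_3d[idx]):
            inliers = int(np.sum(reproj_error(model, pts_2d, pts_3d) < thresh))
            if inliers > best_inliers:
                best_model, best_inliers = model, inliers
    return best_model, best_inliers
```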

Fig. 3

Histograms of \(Log_{10}\) relative error in radial distortion (left) and focal length (right) for Rel. pose E+\(f\lambda \) 7pt (elim \(\lambda \)) problem for 10K randomly generated synthetic scenes. These scenes represent cameras with different radial distortions, poses and focal lengths. For comparison, we generated solvers using our sparse resultant method (green), Gröbner basis method [32] (black), heuristic method [35] (red) and greedy parameter search method [38] (blue) (Color figure online)

Fig. 4

Histograms of \(Log_{10}\) of the normalized equation residual error for nine selected minimal problems. The methods tested for comparison are based on the Gröbner basis method [32], the heuristic method [35], the greedy parameter search [38] and our sparse resultant method

7 Conclusion

In this paper, we have proposed a sparse resultant method for generating efficient solvers for minimal problems in computer vision. It uses a polynomial with a special form to augment the initial polynomial system \(\mathcal {F}\) in (1) and constructs a sparse resultant matrix \(\texttt{M}(u_0)\) (7) using the theory of polytopes [15]. The special form enables us to decompose the resultant matrix constraint (8) into a regular eigenvalue problem (50) (or (52)) using the Schur complement. As demonstrated in Sect. 6, our sparse resultant method leads to minimal solvers with comparable speed and stability w.r.t. the SOTA solvers based on the action matrix method [26, 32, 35, 38]. Note that, by design, for some minimal problems our sparse resultant method leads to solvers with larger eigenvalue problems but a smaller matrix inverse, and with comparable or better stability than the Gröbner basis-based solvers.

While the action matrix method and the sparse resultant method are based on different mathematical theories, we have observed that the resulting solvers involve similar numerical operations, such as eigenvalue computation. This raises the question: under what conditions, for a given minimal problem (a given system of polynomial equations), are the two solvers the same? In Sect. 4, we have attempted to answer this question. Specifically, if we begin with an action matrix-based solver for a given minimal problem, we propose a list of changes to the steps of our sparse resultant method so that it leads to an equivalent sparse resultant-based solver. In the opposite direction, we also study the case when we begin with a sparse resultant-based solver and establish a list of changes to the steps performed by an action matrix method such that it leads to an equivalent solver. In other words, if we begin with an action matrix-based solver, it determines the extra polynomial \(f_{m+1}\) and the favourable monomial set to be used in the sparse resultant method; if we begin with a sparse resultant-based solver, it determines the monomial ordering, the extended polynomial set \(\mathcal {F}^{\prime }\), and the basis of the quotient ring to be used for computing an action matrix-based solver.

We hope that this discussion paves a path towards unifying the two approaches for generating minimal solvers, i.e., the action matrix methods and the sparse resultant method based on an extra polynomial. Both approaches are examples of normal form methods for a given polynomial system \(\mathcal {F}\), recently studied in [40, 54, 55]. To the best of our knowledge, normal forms have not yet been incorporated into the automatic generators of minimal solvers in computer vision. This may be an interesting topic for future investigation.