The Sylvester and Bézout Resultant Matrices for Blind Image Deconvolution

Blind image deconvolution (BID) is one of the most important problems in image processing, and it requires the determination of an exact image F from a degraded form of it, G, when little or no information about F and the point spread function (PSF) H is known. Several methods have been developed for the solution of this problem, and one class of methods considers F, G and H to be bivariate polynomials in which the polynomial computations are implemented by the Sylvester or Bézout resultant matrices. This paper compares these matrices for the solution of the problem of BID, and it is shown that the comparison reduces to a comparison of their effectiveness for greatest common divisor (GCD) computations. This is a difficult problem because the determination of the degree of the GCD of two polynomials requires the calculation of the rank of a matrix, and this rank determines the size of the PSF. It is shown that although the Bézout matrix is symmetric (unlike the Sylvester matrix) and smaller than the Sylvester matrix, which has computational advantages, it yields consistently worse results than the Sylvester matrix for the size and coefficients of the PSF. Computational examples of blurred and deblurred images obtained with the Sylvester and Bézout matrices are shown, and the superior results obtained with the Sylvester matrix are evident.


Introduction
The removal of blur and other degradations from an image is important because it makes its subsequent processing for, for example, feature detection and feature extraction, significantly easier and more reliable. Images that arise in practical applications may suffer from degradations, which must be removed before the images are interrogated. This practical necessity has been a major motivation for the continued research into methods for the improvement of degraded images.
If the point spread function (PSF) is spatially invariant, then the blurred image G is formed by the convolution of the PSF H and the exact image F. If, in addition, the PSF is known, then the determination of a deblurred form of a blurred image reduces to linear deconvolution, which can be solved by, for example, methods of computational linear algebra [16]. The more difficult problem arises when there is no information, or only partial information, about the PSF, in which case the problem reduces to blind image deconvolution (BID). In this circumstance, additional information about the exact image and/or the PSF must be provided in order to obtain a deblurred image. An example of this prior information is the knowledge that the PSF is separable, that is, if h(x, y) is the bivariate polynomial whose coefficients are the pixel values of the PSF, then it can be written as the product of its column component h_c(x) and its row component h_r(y), h(x, y) = h_c(x) h_r(y).
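The separability property can be checked numerically: blurring with a separable PSF h(x, y) = h_c(x) h_r(y) is equivalent to convolving the columns with h_c and then the rows with h_r. A minimal sketch in Python with NumPy; the kernel values and the stand-in image are illustrative:

```python
import numpy as np

def conv2_full(F, H):
    """Full 2D convolution of the pixel arrays F and H."""
    m1, n1 = F.shape
    m2, n2 = H.shape
    G = np.zeros((m1 + m2 - 1, n1 + n2 - 1))
    for i in range(m1):
        for j in range(n1):
            G[i:i + m2, j:j + n2] += F[i, j] * H
    return G

h_c = np.array([0.25, 0.5, 0.25])   # column component h_c(x), illustrative
h_r = np.array([0.1, 0.8, 0.1])     # row component h_r(y), illustrative
H = np.outer(h_c, h_r)              # separable PSF h(x, y) = h_c(x) h_r(y)

F = np.random.default_rng(0).random((8, 8))   # stand-in "exact image"

G_2d = conv2_full(F, H)                                  # blur with the 2D PSF
G_sep = np.apply_along_axis(np.convolve, 0, F, h_c)      # columns with h_c
G_sep = np.apply_along_axis(np.convolve, 1, G_sep, h_r)  # then rows with h_r

print(np.allclose(G_2d, G_sep))   # True: the two blurring routes agree
```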
The first set of methods that were used for the solution of the problem of BID are reviewed in [17], but more recent methods have used Bayes' theorem, for which priors are placed on the PSF and the exact image [3,23,26,37]. These methods require that the maximum a posteriori (MAP) estimates F̂ and Ĥ of, respectively, F and H,

F̂, Ĥ = arg max_{F,H} p(G|F, H) p(F) p(H),   (1)

minimise the error associated with the convolution operation and have sparse derivatives, which arises because many natural images are composed of regions in which the gradients are small, with relatively few regions of high gradients.
The distribution of the gradients x of F is often modelled as a generalised Laplacian, p(x) ∝ exp(−|x/σ|^α), where σ is the standard deviation of the distribution. This distribution is frequently called a natural image prior because it is satisfied by many natural images, and its imposition as a prior concentrates large gradients at a small number of pixels.
The condition α = 2 yields the Gaussian distribution, but this prior does not yield a good deblurred image.
The simultaneous MAP estimate of F and H is ill-posed, and it is better to calculate the MAP estimate of H by marginalisation over all possible images F because this yields a better conditioned problem, and then calculate the MAP estimate of F [18,19]. The calculation of the MAP estimate of H leads, however, to an intractable integral, but it is shown that an approximation can be made. This approximation yields a solution of (1) that uses the expectation-maximisation (EM) algorithm, which consists of two steps, the E-step and the M-step, that alternate until convergence is achieved. The E-step requires that a prior be placed on H and a non-blind deconvolution problem be solved, which allows an estimate F̂ of F to be calculated. The M-step involves the calculation of an improved estimate of H, using the estimate F̂ of the exact image. The E- and M-steps are then invoked again, using the most recent estimates of H and F, and the process is repeated until convergence occurs. The first step in the EM algorithm requires that a prior be placed on H, and a uniform distribution is usually assumed [18,26], but this is not realistic [19].
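The structure of this alternation can be caricatured by a much simpler scheme than the EM algorithm of [18,19]: alternating least squares on a 1D signal, with no priors and no marginalisation. The sketch below is only a structural analogue of the E- and M-steps under those simplifying assumptions; all signals and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
f_true = rng.random(12)              # "exact image" (1D stand-in)
h_true = np.array([0.2, 0.6, 0.2])   # "PSF"
g = np.convolve(f_true, h_true)      # "blurred image"

def conv_mat(v, n):
    """(len(v)+n-1) x n Toeplitz convolution matrix of v."""
    A = np.zeros((len(v) + n - 1, n))
    for j in range(n):
        A[j:j + len(v), j] = v
    return A

f_est = rng.random(12)
h_est = np.array([0.0, 1.0, 0.0])    # crude initial PSF estimate
res = []
for _ in range(30):
    # "E-step" analogue: estimate the image with the current PSF
    f_est, *_ = np.linalg.lstsq(conv_mat(h_est, len(f_est)), g, rcond=None)
    # "M-step" analogue: improve the PSF with the current image estimate
    h_est, *_ = np.linalg.lstsq(conv_mat(f_est, len(h_est)), g, rcond=None)
    res.append(np.linalg.norm(np.convolve(f_est, h_est) - g))

print(res[0] >= res[-1])   # True: exact least-squares steps never increase the residual
```

Because each half-step solves its least-squares problem exactly, the residual is non-increasing, which mirrors the monotonicity property of the EM iteration.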
The image processing toolbox in Matlab has four functions that deblur an image. The functions deconvreg.m, deconvwnr.m and deconvlucy.m solve the problem of semi-blind image deconvolution because the PSF is specified as an input argument to these functions. The edges of the deblurred image obtained with these three functions may show ringing, which can be reduced by calling the function edgetaper.m before the deblurring function is called. The function edgetaper.m requires the PSF as one of its arguments, and it can therefore only be used for solving the problem of semi-blind image deconvolution.
The function deconvblind.m is different because only an estimate of the PSF, rather than the exact PSF, need be specified as an input argument, but the computed PSF is very dependent on the estimate of its size, and less dependent on the entries of the matrix that defines it. A comparison of these four functions and the Sylvester resultant matrix for the computation of a deblurred image is in [30,31], and it is noted in [15] that obtaining the best deblurred image from the four functions requires visual inspection of several deblurred images that differ in the values of the input arguments, for example, the number of iterations and the noise power.
A different problem is solved in this paper because prior information on the PSF is not assumed (apart from the property of spatial invariance), and thus both the size and coefficients of the PSF are calculated. The method of image deblurring described in this paper considers F, G and H to be bivariate polynomials, and deblurring is achieved by performing polynomial computations on G and H. This approach is justified by the convolution operation, which defines both the multiplication of two polynomials and the formation of a blurred image by a spatially invariant PSF. It therefore follows that if the pixel values of F, G and H are the coefficients of their polynomial forms, f(x, y), g(x, y) and h(x, y), respectively, then g(x, y) = f(x, y) h(x, y). The polynomial computations are implemented by resultant matrices, of which there are several types, but only the Bézout matrix [12,20,21] and the Sylvester matrix [11,22,24,30,31] have been used for image deblurring. Section 2 contains a comparison of these matrices for greatest common divisor (GCD) computations, which are required for the determination of the size of the PSF and the solution of the problem of BID. The application of these matrices to image deblurring is considered in Sect. 3, and it is shown that, as for GCD computations, the Sylvester matrix yields better results than the Bézout matrix. Section 4 contains a summary of the paper.
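The equivalence of polynomial multiplication and convolution that justifies this formulation is easy to verify numerically; the polynomials and coefficient matrices below are illustrative:

```python
import numpy as np

# 1D: multiplying polynomials convolves their coefficient sequences.
f = np.array([1.0, 3.0, 2.0])     # f(y) = y^2 + 3y + 2
h = np.array([1.0, 4.0])          # h(y) = y + 4
g = np.convolve(f, h)             # g(y) = f(y) h(y): coefficients 1, 7, 14, 8

# 2D: convolving pixel arrays multiplies the corresponding bivariate polynomials.
def conv2_full(F, H):
    """Full 2D convolution of the coefficient (pixel) arrays F and H."""
    m1, n1 = F.shape
    m2, n2 = H.shape
    G = np.zeros((m1 + m2 - 1, n1 + n2 - 1))
    for i in range(m1):
        for j in range(n1):
            G[i:i + m2, j:j + n2] += F[i, j] * H
    return G

def poly2_val(C, x, y):
    """Value of the bivariate polynomial with coefficient matrix C."""
    return (x ** np.arange(C.shape[0])) @ C @ (y ** np.arange(C.shape[1]))

F = np.array([[1.0, 2.0], [0.0, 3.0]])   # illustrative coefficient matrices
H = np.array([[2.0, 1.0], [1.0, 1.0]])
G = conv2_full(F, H)                     # coefficients of f(x, y) h(x, y)
x0, y0 = 0.7, -1.3
print(np.isclose(poly2_val(G, x0, y0),
                 poly2_val(F, x0, y0) * poly2_val(H, x0, y0)))   # True
```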
The computations in all the examples were performed using 64-bit Matlab.

Resultant Matrices and GCD Computations
It was stated in Sect. 1 that the convolution operation is common to polynomial multiplication and the formation of a blurred image by a spatially invariant PSF. The equivalence of these operations is now summarised, and more details are in [30]. In particular, if the PSF is separable, then only one blurred image G is required for its determination, and this allows a deblurred form of G to be computed. The calculation of the PSF requires two GCD computations, one computation for the degree and coefficients of the column component h_c(x) of the PSF and one computation for the degree and coefficients of the row component h_r(y) of the PSF, and the PSF is equal to h_c(x) h_r(y). A deblurred image is then obtained by two deconvolution computations, where the first computation yields a partially deblurred image F̃ by deconvolving h_c(x) (or h_r(y)) from the given blurred image G, and the second computation yields a fully deblurred image by deconvolving h_r(y) (or h_c(x)) from F̃. The GCD computations can be implemented by resultant matrices, of which the Sylvester and Bézout are the most popular, and the deconvolution computation reduces to the solution of a linear algebraic equation whose coefficient matrix is Toeplitz. Both these computations are ill-posed, and thus noise, which is present in all images, makes the computation of their improved forms difficult. In particular, the presence of noise implies it is necessary to consider an approximate greatest common divisor (AGCD), rather than the GCD, of two polynomials [36]. It is, however, convenient to consider the Sylvester and Bézout resultant matrices for the computation of the GCD of two exact polynomials, and to defer a discussion of their use for the computation of an AGCD of two noisy polynomials to Sect. 3, which considers the application of resultant matrices to image deblurring.
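The deconvolution step can be sketched as the solution of a linear system with a Toeplitz coefficient matrix; the blur component h and the row f below are illustrative, and no noise is added:

```python
import numpy as np

# Deconvolution of a known 1D blur component h from a blurred row g,
# posed as the Toeplitz least-squares system C(h) f = g.
h = np.array([0.25, 0.5, 0.25])                   # blur component, e.g. h_c
f_true = np.array([0.0, 1.0, 4.0, 2.0, 0.0, 3.0]) # illustrative exact row
g = np.convolve(f_true, h)                        # blurred data

n = len(f_true)
C = np.zeros((len(g), n))                         # Toeplitz convolution matrix C(h)
for j in range(n):
    C[j:j + len(h), j] = h                        # column j is h shifted down by j

f_rec, *_ = np.linalg.lstsq(C, g, rcond=None)
print(np.allclose(f_rec, f_true))                 # True for noise-free data
```

With noisy data the same system is solved in the least-squares sense, and the ill-posedness noted above appears as a large condition number of C.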
The Bézout matrix B(f, g) of two polynomials f(y) and g(y), which are of degrees m and n, m ≥ n, respectively, is used for GCD computations [4] and image deblurring [12,20,21]. The matrix B(f, g) is defined by its entries b_{i,j} in the expansion

(f(x) g(y) − f(y) g(x)) / (x − y) = Σ_{i=1}^{m} Σ_{j=1}^{m} b_{i,j} x^{i−1} y^{j−1},

from which it follows that it is square and symmetric, and of order m. A related identity holds when f(y) and g(y) are redefined with the order of their coefficients reversed, where J is the reverse unit matrix of order m. The Sylvester matrix S(f, g) of f(y) and g(y) is square and of order m + n,

S(f, g) = [ C(f)  D(g) ],

where C(f) and D(g) are Toeplitz matrices whose entries are the coefficients of f(y) and g(y), respectively [4]. Theorem 1 shows that the degree and coefficients of the GCD of f(y) and g(y) can be computed from the QR decomposition of their Sylvester matrix. For the Bézout matrix, the coefficients of the GCD are obtained from computations on the solution x of the equation B_{m−d} x = d, where the entries of d are functions of the coefficients of f(y) and g(y).
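Both matrices can be constructed in a few lines, and for exact polynomials their rank loss equals the degree d of the GCD. The sketch below uses illustrative cubic polynomials with d = 2; the Bézout entries are obtained by dividing f(x)g(y) − f(y)g(x) by x − y with a coefficient recurrence:

```python
import numpy as np

f = np.array([1.0, -6.0, 11.0, -6.0])   # (y-1)(y-2)(y-3), descending coeffs
g = np.array([1.0, -7.0, 14.0, -8.0])   # (y-1)(y-2)(y-4), GCD degree d = 2
m = n = 3

# Sylvester matrix S(f, g) = [C(f) D(g)], order m + n.
S = np.zeros((m + n, m + n))
for j in range(n):
    S[j:j + m + 1, j] = f               # Toeplitz block C(f)
for j in range(m):
    S[j:j + n + 1, n + j] = g           # Toeplitz block D(g)

# Bezout matrix B(f, g), order m, from the Bezoutian
# (f(x)g(y) - f(y)g(x)) / (x - y), with 0-based coefficient indices.
fa, ga = f[::-1], g[::-1]               # ascending coefficients
M = np.outer(fa, ga) - np.outer(ga, fa) # coefficients of f(x)g(y) - f(y)g(x)
B = np.zeros((m, m))
for j in range(m):
    for i in range(m, 0, -1):
        below = B[i, j - 1] if (i < m and j > 0) else 0.0
        B[i - 1, j] = M[i, j] + below   # division by (x - y) as a recurrence

print(np.linalg.matrix_rank(S))         # 4 = (m + n) - d
print(np.linalg.matrix_rank(B))         # 1 = m - d
print(np.allclose(B, B.T))              # True: B is symmetric
```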
Theorem 3 shows that the GCD of f(y) and g(y) can also be computed from their Sylvester matrix and its subresultant matrices S_k(f, g), k = 1, …, min(m, n), with S_1(f, g) = S(f, g), where the subresultant matrices are obtained by the deletion of rows and columns from S(f, g). Examples of this computation when f(y), g(y) and their GCD contain multiple roots of high degree are in [32,33]. The coefficients of the coprime polynomials lie in the null space of S_d(f, g).
Proof Since the degree of the GCD of f(y) and g(y) is d, there exist quotient polynomials u_k(y) and v_k(y), and a common divisor polynomial t_k(y) of degree k, such that f(y) = u_k(y) t_k(y) and g(y) = v_k(y) t_k(y) for k = 1, …, d. It follows from (6) that v_k(y) f(y) − u_k(y) g(y) = 0, and if u_k(y) and v_k(y) are taken to be the zero polynomial for k > d, then (7) and (8) can be written in matrix form as

S_k(f, g) [ v_k ; −u_k ] = 0,   (9)

where C_k(f) ∈ R^{(m+n−k+1)×(n−k+1)} and D_k(g) ∈ R^{(m+n−k+1)×(m−k+1)} are Toeplitz matrices of the form (3), and S_k(f, g) = [ C_k(f)  D_k(g) ] is the kth Sylvester subresultant matrix. The vectors u_k and v_k in (9) satisfy this equation because the polynomials f(y) and g(y) possess common divisors of degrees 1, …, d, but they do not possess a common divisor of degree k > d. The result (5) follows from (9).
Theorem 2 does not assign a value to, or limits on the values of, det B_k(f, g) for k = 1, …, m − d − 1. This must be compared with Theorem 3, which distinguishes between the singular and non-singular subresultant matrices S_k(f, g) for all values of k, and therefore allows the value of k at which the change from singularity to non-singularity occurs, as k increases, to be calculated. This property of the subresultant matrices is used in Sect. 3 to determine the size of the PSF that is used to blur an image.
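The transition from singularity to non-singularity can be observed directly; the cubic polynomials below are illustrative, with a GCD of degree d = 2, so S_k(f, g) is rank-deficient for k ≤ d and has full column rank for k > d:

```python
import numpy as np

def sylvester_sub(f, g, k):
    """k-th Sylvester subresultant matrix of f and g (descending coeffs)."""
    m, n = len(f) - 1, len(g) - 1
    S = np.zeros((m + n - k + 1, (n - k + 1) + (m - k + 1)))
    for j in range(n - k + 1):
        S[j:j + m + 1, j] = f
    for j in range(m - k + 1):
        S[j:j + n + 1, n - k + 1 + j] = g
    return S

f = np.array([1.0, -6.0, 11.0, -6.0])   # (y-1)(y-2)(y-3)
g = np.array([1.0, -7.0, 14.0, -8.0])   # (y-1)(y-2)(y-4), d = 2

deficient = []
for k in range(1, 4):
    Sk = sylvester_sub(f, g, k)
    deficient.append(np.linalg.matrix_rank(Sk) < Sk.shape[1])

print(deficient)                                       # [True, True, False]
print(max(k for k in range(1, 4) if deficient[k - 1])) # d = 2
```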
Table 1 lists some differences between the Sylvester matrix S(f, g) and the Bézout matrix B(f, g) of the polynomials f(y) and g(y).

Table 1 The properties of the Sylvester matrix S(f, g) and Bézout matrix B(f, g), where f(y) and g(y), which are of degrees m and n, respectively, are defined in (11):

Property — Sylvester matrix S(f, g) — Bézout matrix B(f, g)
Order — m + n — m
Symmetry — not symmetric — symmetric
Entries — every non-zero entry is a coefficient a_i or b_j — every entry is a bilinear term in the coefficients
(The polynomials f_1(y) and f_2(y) are of the same degree, and similarly, the polynomials g_1(y) and g_2(y) are of the same degree.)

The properties of the Sylvester and Bézout matrices in Table 1 suggest that the Bézout matrix is preferred to the Sylvester matrix [28]. This preference is clear for the GCD computations in image deblurring, which require that m = n; the Bézout matrix is then half the size of the Sylvester matrix, and it is also symmetric. Also, Boito [8, page 84] notes that the computation of the GCD of f(y) and g(y) using the QR decompositions of S(f, g) and B(f, g) may miss large common roots, and that this problem affects the Sylvester matrix more than the Bézout matrix. It is shown, however, in [33] that significantly improved results are obtained with the Sylvester matrix when it and its subresultant matrices are processed by three operations before computations are performed on these matrices:

1. Normalisation. Equation (10) shows that the coefficients of f(y) and g(y) are decoupled in S_k(f, g) because they occupy, respectively, its first n − k + 1 columns and last m − k + 1 columns. If the coefficients of g(y) are much larger than the coefficients of f(y), then the left partition C_k(f) is approximately equal to the zero matrix with respect to the right partition D_k(g). This imbalance between the partitions of S_k(f, g) can lead to incorrect results for GCD computations, and thus the first preprocessing operation is the normalisation of the coefficients of f(y) and g(y) in order to balance the left and right partitions of S_k(f, g).

2. Scale g(y) by a Constant. The GCD of two polynomials is defined up to an arbitrary non-zero constant α, and thus GCD(f, g) ∼ GCD(f, αg), where ∼ denotes equivalence to within an arbitrary non-zero constant. The matrices S_k(f, g) are therefore replaced by S_k(f, αg), where α is a constant whose optimal value is determined from the solution of a linear programming problem. This issue is discussed below.

3. Scale the Independent Variable. Computations on matrices whose entries vary widely in magnitude may cause numerical problems, and it is therefore desirable to minimise the ratio r of the maximum entry to the minimum entry of S_k(f, αg), for every value of k. This is accomplished by the substitution y = θw in (2), where w is the new independent variable and θ is a parameter. It is shown in [33] that α_0 and θ_0, the optimal values of α and θ, are chosen to minimise r, and that they are the solution of a linear programming problem. Furthermore, these optimal values are valid for all values of k = 1, …, min(m, n), and thus the linear programming problem need only be solved once.
The application of these three preprocessing operations to the polynomials f(y) and g(y) in (2) yields the polynomials f̄(w) and α_0 ḡ(w) whose coefficients are ā_i θ_0^{m−i}, i = 0, …, m, and α_0 b̄_j θ_0^{n−j}, j = 0, …, n, respectively, where ā_i and b̄_j are the normalised coefficients of f(y) and g(y), respectively. All computations are performed on the Sylvester matrix and its subresultant matrices S_k(f̄, α_0 ḡ), k = 1, …, min(m, n), and the importance of these operations for GCD computations with the Sylvester matrix is shown in [32,33]. They are also required for the computation of multiple roots of a polynomial, where the multiplicities of its distinct roots are obtained by a series of GCD computations and polynomial deconvolutions [29,34]. Examples 1 and 2 consider Theorems 1, 2 and 3 for the calculation of the degree of the GCD of two polynomials.
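The three operations can be sketched numerically. In [33] the optimal α_0 and θ_0 are found by linear programming; the sketch below substitutes a coarse log-spaced grid search for the linear programme, and the coefficients are illustrative, only to show the effect of the operations on the spread of the matrix entries:

```python
import numpy as np

a = np.array([1.0, 2e4, 3.0, 5e-3])      # descending coefficients of f(y)
b = np.array([4e3, 1.0, 2e-2, 6.0])      # descending coefficients of g(y)
m, n = len(a) - 1, len(b) - 1

# 1. Normalisation by the geometric mean of the coefficient magnitudes.
a = a / np.exp(np.mean(np.log(np.abs(a))))
b = b / np.exp(np.mean(np.log(np.abs(b))))

# 2. and 3. Scale g(y) by alpha and substitute y = theta * w; the entries
# of S_k(f, alpha*g) are then the scaled coefficients below, for every k.
def entry_ratio(alpha, theta):
    fa = np.abs(a) * theta ** (m - np.arange(m + 1))
    gb = alpha * np.abs(b) * theta ** (n - np.arange(n + 1))
    e = np.concatenate([fa, gb])
    return e.max() / e.min()             # ratio r of max to min entry

grid = np.logspace(-3, 3, 121)           # coarse stand-in for the LP of [33]
best_r = min(entry_ratio(al, th) for al in grid for th in grid)
print(best_r <= entry_ratio(1.0, 1.0))   # True: never worse than no scaling
```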

Example 1 Consider the polynomials f(y) and g(y) whose GCD is of degree d = 9, and for which m = n = 12. Noise δa_i and δb_j was added to the coefficients a_i and b_j of, respectively, f(y) and g(y), δa_i = ε r_i a_i, i = 0, …, 12, and δb_j = ε r_j b_j, j = 0, …, 12, where ε = 10^−5, and r_i and r_j are uniformly distributed random variables in the interval [−1, 1]. Figure 1 shows the normalised singular values of B(f, g), with and without noise, and it is seen that, in the absence of noise, the rank of B(f, g) is equal to 3, as required from (4) because m = 12 and d = 9. The figure also shows that the addition of noise causes a significant deterioration in the results because the maximum change (about four orders of magnitude) in the normalised singular values occurs between i = 3 and i = 4, but this change is significantly smaller than in the absence of noise.
Theorem 2 requires that the rank of the principal submatrices of B(f, g) be considered, and Fig. 2 shows the variation of the condition number κ(B_k(f, g)) of B_k(f, g) with the order k of the submatrix, with and without noise. It is seen that, in the absence of noise, it decreases slowly for k = 12, 11, …, 5, 4, and that there is a significant change in κ(B_k(f, g)) between k = 4 and k = 3. Theorem 2 suggests, therefore, that the value of d is given by m − d = 3, that is, d = 9, which is correct. The condition number of B_3(f, g) is approximately equal to 10^5, which is large, but many orders of magnitude smaller than the values of κ(B_k(f, g)), k = 4, …, 12. Figure 2 shows that the degree of the GCD of f(y) and g(y) cannot be determined in the presence of noise.
Figure 3 shows the normalised singular values of S(f̄, α_0 ḡ), with and without noise. In both cases, the maximum change in the normalised singular values occurs from i = 15 to i = 16, and thus the computed degree of the GCD is d = 9, which is correct. Figure 4 shows the condition number of each Sylvester subresultant matrix S_k(f̄, α_0 ḡ), and it is seen that, in the presence of noise, the maximum change in the condition number, about four orders of magnitude, occurs from k = 9 to k = 10, and thus the computed degree of the GCD is correct.

Example 2
The Sylvester matrix and subresultant matrices, and the Bézout matrix and principal submatrices, of the polynomials f(y) = 6y^13 + ⋯ and g(y) were formed, and the procedure described in Example 1 was followed, except that noise was not added to the polynomials. Figure 5 shows the normalised singular values of B(f, g), and it is seen that the degree of the GCD of f(y) and g(y) is four, and that this rank deficiency is clearly defined. Figure 6 shows the variation of the condition number of the principal submatrices B_k(f, g) with k, but a conclusion cannot be drawn from the graph.
Figure 7 shows the normalised singular values of the Sylvester matrix S(f̄, α_0 ḡ), and Fig. 8 shows the condition numbers of the Sylvester matrix and its subresultant matrices S_k(f̄, α_0 ḡ), k = 1, …, 11. These figures are consistent with Fig. 5 because they show that the degree of the GCD of f(y) and g(y) is four. Furthermore, the rank deficiency is clearly defined in Figs. 5, 7 and 8, which must be compared with the unsatisfactory result in Fig. 6. The polynomials f(y) and g(y) are, however, coprime, and Table 2 shows they have four very close roots. It therefore follows that B(f, g) and S(f, g) are, from theoretical considerations, non-singular, and the proximity of four roots of f(y) to four roots of g(y) manifests itself in the near-singularity of these matrices, with a rank loss of four, that is, these matrices are numerically singular. This distinction between the theoretical result (B(f, g) and S(f, g) are non-singular) and the computational result (B(f, g) and S(f, g) are numerically singular with a rank loss of four) can be quantified by calculating the relative separation of the roots in Table 2. In particular, if α_1 and α_2 are the roots of f(y) with positive imaginary parts, and β_1 and β_2 are the roots of g(y) with positive imaginary parts, then the relative separation of α_1 and β_1, and their complex conjugates, is very small, and the relative separation of α_2 and β_2, and their complex conjugates, is equal to 2.96 × 10^−7.
The singular value decomposition (SVD) was used in Examples 1 and 2 to calculate the degree of the GCD of two polynomials, but it is computationally expensive. Furthermore, it will be shown in Example 4 that the calculation of the size of a separable PSF requires 25 GCD computations in each direction, which implies that a large number of SVDs must be computed. It is therefore desirable to consider computationally cheaper methods for the calculation of the size of the PSF. Sections 2.1 and 2.2 consider, respectively, methods for the computation of the degree and coefficients of the GCD of two polynomials.

Table 2 Four roots of f(y) and four roots of g(y)

The Degree of the GCD of Two Polynomials
The Sylvester and Bézout matrices require different methods for the calculation of the degree of the GCD of two polynomials, and they are therefore considered separately. Theorem 1 shows that the degree d of the GCD of f(y) and g(y) can be calculated from the rank loss of the Sylvester matrix S(f, g), and Theorem 3 shows that d can also be calculated from the change from singularity to non-singularity of the Sylvester subresultant matrices. These methods require the SVD, but the presence of noise may lead to a significant deterioration in the results, as shown in Examples 1 and 2, because the small singular values of a matrix A are sensitive to perturbations in A. Also, the SVD is cubic in complexity, and its application to each subresultant matrix in Theorem 3 is therefore expensive. This theorem can, however, be implemented efficiently by the QR decomposition of S(f, g), which is cubic in complexity; the QR decomposition of each subsequent subresultant matrix is then computed using its update formula, which is quadratic in complexity [14], because S_{k+1}(f, g) is formed by the deletion of two columns and one row from S_k(f, g). The calculation of the degree of the GCD therefore reduces to the determination of the singular or non-singular nature of each subresultant matrix S_k(f, g) from the upper triangular matrix R_k, where S_k(f, g) = Q_k R_k. Care must be exercised when the QR decomposition is used to compute the rank r of a matrix A ∈ R^{p×q}, p ≥ q, because if the QR decomposition of AΠ, where Π is a permutation matrix, is

AΠ = Q [ R_11  R_12 ; 0  R_22 ],

where R_22 ∈ R^{r×r}, then σ_{q−r+1} ≤ ‖R_22‖_2, where σ_i is the ith singular value of A and the singular values are arranged in non-increasing order. It follows that if ‖R_22‖_2 is small, then A has r small singular values, but the converse is not true: even with column pivoting, the existence of r small singular values does not imply that ‖R_22‖_2 is small. Although ‖R_22‖_2 cannot be used to calculate the rank of AΠ, Example 3 shows that the rank of S_k(f̄, α_0 ḡ) can be calculated from the row sums and diagonal entries of R_k. The polynomials f̄(w) and α_0 ḡ(w) were obtained by applying the preprocessing operations to f(y) and g(y), as described in Sect. 2. Figures 9 and 10 show, respectively, the diagonal entries of R_k and the row sums of R_k for k = 1, …, 220. The points that define the subresultant matrix indexed by k = 70 are marked in the figures, and it is seen that they can, in principle, be used to determine the degree of the GCD of f(y) and g(y). The figures also show that the diagonal entries and row sums of R_k have structure.
Example 3 shows that the calculation of the degree d of the GCD of f(y) and g(y) (or equivalently, the degree of the GCD of f̄(w) and ḡ(w)) requires λ_k, the difference between the maximum diagonal entry and the minimum diagonal entry of R_k, and μ_k, the difference between the maximum row sum and the minimum row sum of R_k, for k = 1, …, min(m, n), where R_{k,i,j} is entry (i, j) of R_k.
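The diagonal-entry criterion can be sketched with illustrative cubic polynomials and a simple threshold in place of λ_k: for k ≤ d the rank-deficient matrix S_k(f, g) forces at least one negligible diagonal entry in R_k, so the ratio of the largest to smallest diagonal entry is very large:

```python
import numpy as np

def sylvester_sub(f, g, k):
    """k-th Sylvester subresultant matrix of f and g (descending coeffs)."""
    m, n = len(f) - 1, len(g) - 1
    S = np.zeros((m + n - k + 1, (n - k + 1) + (m - k + 1)))
    for j in range(n - k + 1):
        S[j:j + m + 1, j] = f
    for j in range(m - k + 1):
        S[j:j + n + 1, n - k + 1 + j] = g
    return S

f = np.polymul([1.0, -5.0, 6.0], [1.0, 2.0])   # (y-2)(y-3)(y+2)
g = np.polymul([1.0, -5.0, 6.0], [1.0, 1.0])   # (y-2)(y-3)(y+1), d = 2

ratios = []
for k in range(1, 4):
    R = np.linalg.qr(sylvester_sub(f, g, k), mode='r')
    dk = np.abs(np.diag(R))
    ratios.append(dk.max() / max(dk.min(), 1e-300))  # huge when R_k is singular

d_est = max(k for k in range(1, 4) if ratios[k - 1] > 1e8)
print(d_est)   # 2: the largest k whose R_k has a negligible diagonal entry
```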
These methods of calculating d from the Sylvester resultant matrix and its subresultant matrices are heuristic and they are not mathematically rigorous.They are, however, justified by computational experiments, which show that they lead to good results for GCD computations.Also, Example 4 shows that good results are obtained when they are used to calculate the size of the PSF from a blurred image in the presence of added noise and uncertainty in the PSF.
Example 1 shows that the SVD cannot be used to calculate the degree of the GCD of two polynomials from their Bézout matrix B(f, g) and its leading principal submatrices B_k(f, g) because it yields unsatisfactory results when the polynomials are perturbed by noise. Furthermore, Example 2 shows that, even in the absence of added noise, the matrices B_k(f, g) may return unsatisfactory results. These issues are addressed in Sect. 2.3.

The Coefficients of the GCD of Two Polynomials
The computation of the coefficients of the GCD of f (y) and g(y) from their Sylvester matrix is considered initially, after which the Bézout matrix is considered for this computation.
It is assumed that the degree d of the GCD has been calculated, and that the polynomials have been preprocessed, as discussed in Sect. 2, thereby yielding the polynomials f̄(w) and α_0 ḡ(w). The calculation of the coefficients of t̄(w), the GCD of f̄(w) and α_0 ḡ(w), is addressed for Bernstein basis polynomials in [9], and the same method can be used for power basis polynomials. In particular, Theorem 3 shows that S_d(f̄, α_0 ḡ), which is of order (m + n − d + 1) × (m + n − 2d + 2), has unit rank loss, and (9) shows that the coprime polynomials v̄_d(w) and ū_d(w) lie in the null space of S_d(f̄, α_0 ḡ). There is therefore one equation that defines the linear dependence of the columns of S_d(f̄, α_0 ḡ), and if the pth column of S_d(f̄, α_0 ḡ) is one of these linearly dependent columns and it is denoted by b, then the homogeneous equation (9) can be written as

Ā_d x ≈ b,   (14)

where it follows from (10) that the coefficient matrix Ā_d is formed by the concatenation of two Toeplitz matrices and the subsequent removal of the pth column. The equality is replaced by an approximation because it is assumed that f(y) and g(y) (and therefore f̄(w) and ḡ(w)) are corrupted by added noise, and they therefore have an AGCD, not a GCD; v̄_d and ū_d are vectors that contain the coefficients of the coprime polynomials v̄_d(w) and ū_d(w), f̄(w) ≈ ū_d(w) t̄(w) and α_0 ḡ(w) ≈ v̄_d(w) t̄(w).
The structure of Ā_d allows (14) to be replaced by the exact equation

(Ā_d + Ē_d) x = b + ē,   (15)

where Ē_d has the same structure as Ā_d, and ē has the same structure as b, from which it follows that a structure-preserving matrix method is used to compute its solution, that is, the matrix Ē_d and the vectors ē and x [25]. The solution is under-determined, but it is shown in [9] that the addition of a constraint allows a unique solution to be computed. The approximation (14) and the exact equation (15) can be interpreted in terms of polynomial computations. In particular, it follows from (7) that (14) is equivalent to the approximate equality of two deconvolutions, and (15) is obtained by the addition of the polynomials δf̄(w), α_0 δḡ(w), δū_d(w) and δv̄_d(w) to f̄(w), α_0 ḡ(w), ū_d(w) and v̄_d(w), respectively, such that the approximate deconvolutions are replaced by exact deconvolutions, where the entries of Ē_d and ē in (15) are the coefficients of δf̄(w) and α_0 δḡ(w), and the entries of x are the coefficients of the coprime polynomials ū_d(w) + δū_d(w) and v̄_d(w) + δv̄_d(w).
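The null-space computation can be sketched for exact (noise-free) polynomials, in which case the approximation becomes an equality; the polynomials below are illustrative, and the structured corrections Ē_d and ē of the structure-preserving method are omitted:

```python
import numpy as np

def sylvester_sub(f, g, k):
    """k-th Sylvester subresultant matrix of f and g (descending coeffs)."""
    m, n = len(f) - 1, len(g) - 1
    S = np.zeros((m + n - k + 1, (n - k + 1) + (m - k + 1)))
    for j in range(n - k + 1):
        S[j:j + m + 1, j] = f
    for j in range(m - k + 1):
        S[j:j + n + 1, n - k + 1 + j] = g
    return S

t = np.array([1.0, -3.0, 2.0])     # GCD t(y) = (y-1)(y-2), degree d = 2
u = np.array([1.0, 5.0])           # coprime quotient of f
v = np.array([1.0, -4.0])          # coprime quotient of g
f, g, d = np.convolve(u, t), np.convolve(v, t), 2

Sd = sylvester_sub(f, g, d)        # unit rank loss; null vector is [v; -u]
z = np.linalg.svd(Sd)[2][-1]       # right singular vector of the smallest
v_d, u_d = z[:len(v)], -z[len(v):]  # singular value spans the null space

# Deconvolve: solve the Toeplitz system C(u_d) t = f for t.
C = np.zeros((len(f), d + 1))
for j in range(d + 1):
    C[j:j + len(u_d), j] = u_d
t_rec, *_ = np.linalg.lstsq(C, f, rcond=None)
t_rec = t_rec / t_rec[0]           # remove the arbitrary scale factor
print(np.allclose(t_rec, t))       # True
```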
The solution of (15) yields the corrected polynomials f̃(w) and g̃(w), and the corrected coprime polynomials ũ_d(w) and ṽ_d(w). These corrected polynomials allow an AGCD t̃(w) to be computed from the least-squares solution of T(ũ_d) t̃ ≈ f̃ and T(ṽ_d) t̃ ≈ g̃, where T(p) is a Toeplitz matrix whose entries are the coefficients of the polynomial p(w), and t̃, f̃ and g̃ are vectors of the coefficients of t̃(w), f̃(w) and g̃(w), respectively. Consider now the computation of the coefficients of the GCD t(y) of f(y) and g(y) from their Bézout matrix B(f, g). The preprocessing operations that are necessary for the Sylvester matrix are not required for the Bézout matrix, and computations can therefore be performed directly on B(f, g). The simplest method requires the QR decomposition of B(f, g) because the coefficients of t(y) are in the last non-zero row of the upper triangular matrix R [8, Theorem 2.6.11]. Another method requires the solution of the equation Ax = b, where A is a square symmetric matrix of order m − d [5, Algorithm 9.1]. This method is used in [20,21] to calculate a separable PSF from a blurred image, and these two methods (the QR decomposition and the solution of Ax = b) are compared in Example 5 for the computation of a deblurred image. Finally, it is noted that the Bézout matrix does not satisfy an equation that is similar to (15) because, as stated in Table 1, its entries are bilinear, not linear.

Comparison of the Sylvester and Bézout Matrices for AGCD Computations
Some properties of the Sylvester and Bézout matrices are stated in Table 1, and the smaller size and symmetry of the Bézout matrix suggest it is preferred to the Sylvester matrix for AGCD computations. Numerical experiments show, however, that the Sylvester matrix yields better results because it is less sensitive to noise than the Bézout matrix, that is, the Sylvester matrix returns the correct degree of the GCD, and coefficients with a small backward error, for much higher noise levels than the Bézout matrix. The Bézout matrix B(f, g) may yield poor results because each of its entries requires one or more computations of the form a_i b_j − a_j b_i, which results in a large error, due to floating point cancellation, when |a_i b_j − a_j b_i| is much smaller than |a_i b_j| [7]. Furthermore, this error is significant even in the presence of additive noise, which is much larger than the error due to floating point cancellation. By contrast, each non-zero entry of the Sylvester matrix is either a_i or b_j, and thus this error does not occur. This problem also arises when computations are performed on the Bézout matrix of polynomials expressed in the Bernstein basis [35], and the results in this paper show that if f(y) and g(y) are preprocessed before their Sylvester matrix is formed, as discussed in Sect. 2, then the Sylvester matrix yields significantly better results than the Bézout matrix.
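The cancellation in the terms a_i b_j − a_j b_i can be demonstrated directly; the coefficient values below are illustrative, chosen so that the two products nearly cancel:

```python
from fractions import Fraction

# Each Bezout entry requires differences of products, a_i*b_j - a_j*b_i.
# When the products nearly cancel, most significant digits are lost.
a_i, a_j = 1.0 + 1e-13, 1.0      # nearly proportional coefficient pairs
b_i, b_j = 1.0, 1.0

computed = a_i * b_j - a_j * b_i           # double-precision result
exact = Fraction(10**13 + 1, 10**13) - 1   # intended value, exactly 10^-13
rel_err = abs(Fraction(computed) - exact) / exact
print(float(rel_err))   # ~1e-3: only a few significant digits survive
```

A Sylvester entry is a single coefficient a_i or b_j, so no comparable subtraction of nearly equal products occurs when the matrix is formed.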
Figures 2 and 6 are consistent with the results of other examples considered by the authors, and with the result of Bini and Gemignani [6], who obtain very large condition numbers of the leading principal submatrices B k ( f , g).Their results also show that the ratio of the smallest singular values of two successive leading principal submatrices exhibits dramatic changes, such that computations based on these singular values are unreliable.These results imply that GCD computations that require the leading principal submatrices of B( f , g) are prone to numerical instability and they are therefore not recommended.

Resultant Matrices and Image Deblurring
This section considers the Sylvester and Bézout matrices for the solution of the problem of BID in order to determine if the Sylvester matrix yields better deblurred images than the Bézout matrix.
The authors of [20,21] claim they successfully solved the problem of BID by computing a deblurred image using the Bézout matrix. It is assumed in these references that the same image F is blurred by two different PSFs, H_1 and H_2, thereby yielding the blurred images G_1 = F * H_1 + N_1 and G_2 = F * H_2 + N_2, where N_1 and N_2 are the noise samples added to the first and second blurred images, respectively, and * denotes convolution. These equations can be expressed in polynomial form, from which it follows that f(x, y) (the exact image, which is to be computed) is an AGCD of g_1(x, y) and g_2(x, y) (the given blurred images) if h_1(x, y) and h_2(x, y) (the PSFs) are coprime polynomials. It is shown in [20,21] that these bivariate polynomial computations can be reduced to a series of univariate polynomial AGCD computations, from which a deblurred image can be calculated, using the leading principal submatrices of the Bézout matrix. The authors claim good results for these computations, despite the numerical problems of the Bézout matrix, which are discussed in Sect. 2. The Matlab code (which is referenced in [20,21]) was therefore examined, and it is clear that the authors hardcode the size of the PSF as arguments of the function that deblurs the images, that is, the size of the PSF is defined, not calculated. It therefore follows that these authors solve the problem of semi-blind deconvolution because they calculate only the coefficients of the PSF, and they do not calculate its size. This semi-blind deconvolution problem is significantly easier than the totally blind deconvolution problem because the calculation of the size of the PSF is not trivial: it reduces to the calculation of the (numerical) rank of a matrix whose entries are corrupted by noise.
A separable Gaussian PSF is used in [20,21], and comparison of the Sylvester and Bézout matrices for BID therefore requires that a PSF that satisfies this property be used. This property is not realistic for practical problems, but it is the simplest form of a PSF because a deblurred image can be calculated from one blurred image [30,31], and the computations are simplified because the Fourier transform is not required. By contrast, two blurred images are required for the computation of a non-separable PSF, and the Fourier transform is used to reduce the two-dimensional BID problem to two sets of GCD computations on univariate polynomials. This approach is used in [22,24], but the signal-to-noise ratios (SNRs) of the images in the examples in these references are 50 dB and 45 dB, respectively, which are very large values of the SNR of an image. Experimental results by the authors of this paper showed that the solution of the BID problem for a non-separable PSF degrades rapidly in the presence of noise because of the use of the Fourier transform in the computations, and it is suggested that this deterioration is the reason for the very high SNRs in the examples in [22,24]. Since the objective of this paper is a comparison of the Sylvester and Bézout matrices for BID, it is necessary to remove all sources of error that could make the analysis and interpretation of the results difficult. It therefore follows that, for the purposes of this paper, it is necessary to consider a separable PSF. Furthermore, this comparison is included in Example 4, for which the SNR of the blurred and noisy image is 12 dB, and Example 7 shows that a weakly non-separable PSF can be approximated by a separable PSF in the absence of noise and uncertainty in the PSF.
A blurred image $G$ is formed by the convolution of the exact image $F$ and the PSF $H$, as shown in (16) for two blurred images, and it necessarily follows from this model that $G$ is larger than $F$. This difference in the sizes of $G$ and $F$ is not satisfied in practical problems because the given blurred image and computed deblurred image are the same size. Deconvolution of $H$ from $G$ in computational experiments requires, therefore, that $G$ be cropped to the same size as $F$ after it has been formed by the convolution operation, and the borders of this cropped image must then be restored by extrapolation, thereby yielding a modified blurred image $G^*$. The best extrapolation function would return $G^* = G$, in which case deconvolution of $H$ from $G^*$ would return $F$. It is, however, now shown that the extrapolation of the border pixels of the cropped form of $G$ introduces several complications.
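The size relationship between $F$, $H$ and $G$ can be sketched with a separable convolution in NumPy: an $m \times n$ image convolved with an $r \times s$ PSF yields an $(m+r-1) \times (n+s-1)$ blurred image. The helper `conv2_separable` and the small test arrays are illustrative assumptions, not the authors' code.

```python
import numpy as np

def conv2_separable(F, hc, hr):
    """Full 2-D convolution of F with the separable PSF H = outer(hc, hr):
    convolve every column with hc, then every row with hr."""
    tmp = np.apply_along_axis(np.convolve, 0, F, hc)     # columns, 'full' mode
    return np.apply_along_axis(np.convolve, 1, tmp, hr)  # rows, 'full' mode

F = np.arange(1.0, 17.0).reshape(4, 4)   # hypothetical 4x4 exact image
hc = hr = np.array([0.25, 0.5, 0.25])    # hypothetical 3x3 separable PSF
G = conv2_separable(F, hc, hr)
# G is (4+3-1) x (4+3-1) = 6x6, i.e. larger than F, so in a computational
# experiment G must be cropped to 4x4 and its border pixels re-extrapolated.
```

Because both 1-D kernels sum to one, the blurred image preserves the total intensity of the exact image.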
The number of pixels that are extrapolated is a function of the size of the PSF: if the size of the PSF is $r \times s$ pixels, where $r$ and $s$ are odd and, by definition of the problem of BID, unknown, then $(s - 1)/2$ pixels must be extrapolated on the left and right borders of the cropped form of $G$, and $(r - 1)/2$ pixels must be extrapolated at the top and bottom borders of the cropped form of $G$, thereby yielding the image $G^*$, which is the same size as $G$. Different extrapolation functions give rise to different images $G^*$ and, in the absence of prior information, it is necessary to compute several deblurred images that differ in the values of $r$ and $s$, and it may also be necessary to consider different extrapolation functions. If a quantitative measure of the quality of a deblurred image cannot be developed, then visual inspection of the deblurred images is required to determine the best deblurred image. This visual test is simplified if, for a given extrapolation function, the difference between the deblurred images obtained with the correct values of $r$ and $s$ and those obtained with incorrect values of $r$ and $s$ is significant. The effect on the deblurred image of a difference in the size of the PSF that is used to blur an image, and then deblur its blurred form, is considered in Example 6.
The discussion above shows that the problem of BID requires careful consideration of the boundary conditions. Exponential extrapolation functions are a natural choice because the extrapolated pixel values are positive and they decay smoothly as the distance from the centre of the PSF increases. These boundary conditions for the solution of the problem of BID are significantly different from the boundary conditions that are imposed when the PSF is known and the deblurred image is obtained from the solution of a linear algebraic equation $Ax = b$. In particular, the entries of $A$ are functions of the PSF, and they are therefore not known when the problem of BID is to be solved. If, however, the PSF (and therefore $A$) is known, the simplest boundary condition is the zero boundary condition, which yields satisfactory results when the image outside the field of view is black, but artefacts, for example, ringing at the boundaries, may be obtained for images that do not satisfy this property. This has led to the development of anti-reflective boundary conditions [2,27], synthetic boundary conditions [13], and periodic and reflexive boundary conditions [16]. Anti-reflective boundary conditions are the preferred boundary conditions in the absence of noise because discontinuities in the image and its derivative at its edges are not imposed, thereby minimising the effects of ringing. None of these boundary conditions is physically realistic, but they are imposed in order to preserve structure in the coefficient matrix $A$ and thus allow fast algorithms to be used [1]. They are imposed for mathematical convenience and must be compared with the more realistic boundary conditions defined by extrapolation functions, which allow the extrapolated pixels to be estimated from the pixels that lie in the border region of the cropped image.
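A minimal sketch of such an extrapolation, assuming a simple geometric decay anchored at the nearest edge pixel (`pad_exponential` and its `decay` rate are illustrative assumptions, not a function from the paper):

```python
import numpy as np

def pad_exponential(img, pad, decay=0.5):
    """Extrapolate `pad` pixels on every border of `img` with positive values
    that decay exponentially with distance from the nearest edge pixel."""
    m, n = img.shape
    out = np.zeros((m + 2 * pad, n + 2 * pad))
    out[pad:pad + m, pad:pad + n] = img
    for k in range(1, pad + 1):
        w = decay ** k
        out[pad - k, pad:pad + n] = w * img[0, :]           # top border
        out[pad + m - 1 + k, pad:pad + n] = w * img[-1, :]  # bottom border
        out[pad:pad + m, pad - k] = w * img[:, 0]           # left border
        out[pad:pad + m, pad + n - 1 + k] = w * img[:, -1]  # right border
        for l in range(1, pad + 1):                         # corner blocks
            out[pad - k, pad - l] = decay ** (k + l) * img[0, 0]
            out[pad - k, pad + n - 1 + l] = decay ** (k + l) * img[0, -1]
            out[pad + m - 1 + k, pad - l] = decay ** (k + l) * img[-1, 0]
            out[pad + m - 1 + k, pad + n - 1 + l] = decay ** (k + l) * img[-1, -1]
    return out

Gs = pad_exponential(np.ones((3, 3)), 2)  # restore a (3+4) x (3+4) image
```

For an $r \times s$ PSF, the vertical and horizontal pads would be $(r-1)/2$ and $(s-1)/2$; a single `pad` value is used here for brevity.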
The cropping and extrapolation procedures required for the solution of the problem of BID are not implemented in [20,21], and comparison of the Sylvester and Bézout matrices therefore requires that the blurred image $G$, rather than the cropped and extrapolated image $G^*$, be used. Although this is not realistic in practical problems (as noted above), it is appropriate for the work considered in this paper because the extrapolation procedure introduces errors in $G^*$. Specifically, if $G^*$ is used, it may not be possible to identify the cause of the differences between the deblurred images obtained from the Sylvester and Bézout matrices, that is, to separate differences that arise from the cropping and extrapolation procedures from differences that arise from the properties of the Sylvester and Bézout matrices.
The coefficients of the GCD of $f(y)$ and $g(y)$ are contained in the last non-zero row of the upper triangular matrix $R$ of the QR decomposition of $B(f, g)$, and similarly for $S(f, g)$, and Chang and Paige [10] show that an error in a matrix $A = QR$ introduces an error in $R$ that is much smaller than $\kappa(A)$ would suggest, where $\kappa(A)$ is the condition number of $A$. Examples 4 and 5 show, however, that superior results for the coefficients of the GCD are obtained by using a structure-preserving matrix method to solve (15), which is derived from the Sylvester subresultant matrix $S_d(f, \alpha_0 \bar{g})$, rather than by using the QR decomposition of the Bézout matrix.
Example 4 Figure 11 shows a separable Gaussian PSF, which is $29 \times 29$ pixels, and Fig. 12 shows an exact image and the blurred image formed by convolving the exact image with the PSF. Noise was not included in the formation of the blurred image.
The calculation of the size of the PSF using the Sylvester and Bézout matrices is described in [30,31], and it requires that the same procedure be used for both matrices. Consider initially the calculation of the horizontal extent of the PSF, which requires the selection of 25 random pairs of rows of the blurred image, thus forming 25 pairs of polynomials whose coefficients are the pixel values of the chosen rows. This enables 25 GCD computations to be performed, and the length of the row component of the PSF is therefore equal to one plus the mode of the degree of the GCD from these 25 computations. This procedure is repeated for the column component of the PSF by selecting 25 random pairs of columns of the blurred image. Although this method of selecting 25 pairs of rows and 25 pairs of columns was used for the Sylvester and Bézout matrices, the computation of the degrees of the GCDs (one GCD for the row component of the PSF and one GCD for the column component of the PSF) from these 25 pairs of rows and 25 pairs of columns was performed differently for these two matrices. In particular, Theorems 1 and 3, respectively, were used to compute the rank of the Bézout matrix, and the rank of the Sylvester matrix and each of its subresultant matrices.
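For one pair of rows, the degree computation can be sketched via the Sylvester matrix, using $\deg \mathrm{GCD}(f, g) = m + n - \mathrm{rank}\, S(f, g)$. The rank is estimated here by an SVD with a hypothetical tolerance; the computation reported in the paper uses the QR-based criteria of Sect. 2.1 instead.

```python
import numpy as np

def sylvester(f, g):
    """Sylvester matrix S(f, g) of polynomials of degrees m and n, given as
    coefficient vectors (a row of pixel values plays this role for images)."""
    m, n = len(f) - 1, len(g) - 1
    S = np.zeros((m + n, m + n))
    for i in range(n):                      # n shifted copies of f
        S[i, i:i + m + 1] = f
    for i in range(m):                      # m shifted copies of g
        S[n + i, i:i + n + 1] = g
    return S

def gcd_degree(f, g, tol=1e-8):
    """deg GCD(f, g) = m + n - rank S(f, g); rank estimated by the SVD."""
    s = np.linalg.svd(sylvester(f, g), compute_uv=False)
    rank = int(np.sum(s > tol * s[0]))
    return (len(f) - 1) + (len(g) - 1) - rank

# Two polynomials with the common factor (y + 1)^2, so deg GCD = 2.
d = np.array([1.0, 2.0, 1.0])
f = np.convolve(d, np.array([1.0, -3.0]))
g = np.convolve(d, np.array([1.0, 5.0]))
```

The length of the row component of the PSF would then be one plus the mode of `gcd_degree` over the 25 random pairs of rows.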
Consider initially the Bézout matrix, whose rank loss was computed using the SVD. This matrix was formed for each of the 25 pairs of rows, and each of the 25 pairs of columns, of the blurred image $G$, and the SVD was used to calculate the rank (and therefore the rank loss) of each of these 50 matrices. By contrast, the rank loss of the Sylvester matrix and its subresultant matrices was computed using the QR decomposition, as described in Sect. 2.1 and Example 3. In particular, the degree of the GCD was computed using (12), and identical results were obtained using (13).
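A sketch of the Bézout route, assuming coefficients in increasing-degree order: each entry of $B(f, g)$ is a sum of differences $u_a v_b - v_a u_b$, which is where the floating point cancellation discussed in Sect. 2 can occur, and the rank loss of $B(f, g)$, estimated here by the SVD, equals the degree of the GCD.

```python
import numpy as np

def bezout(u, v):
    """Bezout matrix B(f, g) for coefficient vectors u, v in INCREASING
    degree order (the shorter vector is zero-padded to degree n)."""
    n = max(len(u), len(v)) - 1
    u = np.pad(np.asarray(u, float), (0, n + 1 - len(u)))
    v = np.pad(np.asarray(v, float), (0, n + 1 - len(v)))
    B = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # sums of differences: a potential source of cancellation
            for k in range(min(i, n - 1 - j) + 1):
                B[i, j] += u[j + k + 1] * v[i - k] - v[j + k + 1] * u[i - k]
    return B

def gcd_degree_bezout(u, v, tol=1e-8):
    """deg GCD(f, g) = rank loss of B(f, g), rank estimated by the SVD."""
    B = bezout(u, v)
    s = np.linalg.svd(B, compute_uv=False)
    return B.shape[0] - int(np.sum(s > tol * s[0]))

# The same common factor (1 + y)^2 as before, written in increasing order.
d = np.array([1.0, 2.0, 1.0])
f = np.convolve(d, np.array([-3.0, 1.0]))
g = np.convolve(d, np.array([5.0, 1.0]))
B = bezout(f, g)
```

Note that $B(f, g)$ is symmetric and of order $n$, roughly half the order of $S(f, g)$, which is the computational advantage mentioned in the abstract.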
Figures 13 and 14 show the histograms of the results for the computation of the size of the PSF obtained from the Bézout matrix; the correct degree, 28, is achieved in 24 of the 25 trials for the row component of the PSF, and in all the trials for the column component of the PSF. Slightly better results were obtained when the Sylvester matrix was used because the correct degree was achieved in all the trials for the row and column components of the PSF.
Fig. 13 The computation of the degree of the row component of the PSF, in the absence of noise, using the Bézout matrix, for Example 4

Fig. 14 The computation of the degree of the column component of the PSF, in the absence of noise, using the Bézout matrix, for Example 4

These results from the histograms for the degrees of the GCD in the horizontal and vertical directions (and therefore the size of the PSF) allow the coefficients of the PSF to be calculated, as described in Sect. 2.2 for the Sylvester matrix, or, as shown in Theorem 1, from the upper triangular matrix $R$ of the QR decomposition of the Bézout matrix. For both methods, the deblurred image is then obtained by two deconvolution operations: one operation for the deconvolution of the row (column) component of the PSF from the given blurred image, thereby yielding a partially deblurred image, and one operation for the deconvolution of the column (row) component of the PSF from this partially deblurred image, thereby yielding a fully deblurred image.
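For a separable PSF, the two deconvolution operations reduce to 1-D polynomial divisions, one along each axis. The sketch below assumes the uncropped (full-convolution) blurred image and no noise; `blur_separable` and `deconvolve_separable` are illustrative names, not the authors' code.

```python
import numpy as np

def blur_separable(F, hc, hr):
    """Full separable convolution: columns with hc, then rows with hr."""
    tmp = np.apply_along_axis(np.convolve, 0, F, hc)
    return np.apply_along_axis(np.convolve, 1, tmp, hr)

def deconvolve_separable(G, hc, hr):
    """Recover F from the uncropped blurred image G by two 1-D deconvolutions:
    divide each row by hr, then each column by hc (np.polydiv returns the
    quotient and remainder of a polynomial division)."""
    rows = np.array([np.polydiv(row, hr)[0] for row in G])
    return np.array([np.polydiv(col, hc)[0] for col in rows.T]).T

rng = np.random.default_rng(0)
F = rng.uniform(1.0, 2.0, (6, 6))        # hypothetical exact image
hc = hr = np.array([0.25, 0.5, 0.25])    # hypothetical separable PSF
F_rec = deconvolve_separable(blur_separable(F, hc, hr), hc, hr)
```

With noise present, the exact divisions would have to be replaced by least-squares deconvolutions.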
The experiment was repeated, but uncertainty $E$ was included in the PSF and noise $N$ was added, such that the blurred image is defined by $G = F * (H + E) + N$, which can be expressed in polynomial form as $g(x, y) = f(x, y)\,(h(x, y) + e(x, y)) + n(x, y)$.
If $g_{i,j}$, $f_{i,j}$, $h_{i,j}$, $e_{i,j}$ and $n_{i,j}$ are the coefficients of the polynomials $g(x, y)$, $f(x, y)$, $h(x, y)$, $e(x, y)$ and $n(x, y)$, respectively, then the relative uncertainty in each coefficient of the PSF and the relative error due to the noise were defined by
$$0 < \frac{e_{i,j}}{h_{i,j}} \le \varepsilon_{i,j} \quad \text{and} \quad 0 < \frac{n_{i,j}}{\bigl( f(x, y)\,(h(x, y) + e(x, y)) \bigr)_{i,j}} \le \varepsilon_{i,j},$$
respectively, where $\varepsilon_{i,j}$ is a uniformly distributed random variable in the range $[10^{-6}, 10^{-5}]$. This range of values of $\varepsilon_{i,j}$ yields a normwise relative error of the blurred image with respect to the exact image of 0.25, which corresponds to a signal-to-noise ratio of 12 dB. The dependence of the uncertainty $E$ in the PSF and the noise $N$ on the pixel coordinates $i$ and $j$ is included for two reasons:

1. It cannot be assumed in real images that $\varepsilon_{i,j}$ is approximately constant for all values of $i$ and $j$, that is, across the entire image, and thus a variation of one order of magnitude $[10^{-6}, 10^{-5}]$ in the relative errors defines a more realistic scenario.
2. It makes the inclusion of a tolerance, which is dependent on the relative error, for the determination of the size of the PSF difficult, and it therefore provides a stringent test for the calculation of the size of the PSF from the numerical rank of a matrix.
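This error model can be sketched as follows, assuming positive perturbations as in the inequalities above; `perturb` is a hypothetical helper, and every coefficient of the PSF and of the blurred image receives its own relative error level drawn uniformly from $[10^{-6}, 10^{-5}]$.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(H, G, lo=1e-6, hi=1e-5):
    """Componentwise relative errors: H -> H + E with 0 < e_ij/h_ij < hi,
    and G -> G + N with the noise relative to the blurred coefficients."""
    eps_h = rng.uniform(lo, hi, H.shape)  # one error level per PSF coefficient
    eps_n = rng.uniform(lo, hi, G.shape)  # one error level per image pixel
    return H * (1.0 + eps_h), G * (1.0 + eps_n)

H = np.full((3, 3), 1.0 / 9.0)   # hypothetical positive PSF
G = np.full((8, 8), 2.0)         # hypothetical positive blurred image
H_pert, G_pert = perturb(H, G)
```

Because the error level varies by an order of magnitude across the image, no single tolerance tied to "the" relative error is available when the numerical rank is computed, which is the stringent test described in reason 2 above.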
Figures 15 and 16 show the histograms of the results obtained when the rank deficiency of the Bézout matrix was used to determine the size of the PSF. It is clear that bad results were obtained because the correct degrees (28 for both components) were not obtained in any of the 50 trials. Much better results were obtained when the Sylvester matrix and its subresultant matrices were used because, as shown in Figs. 17 and 18, the correct degrees of the row and column components of the PSF were achieved in 14 and 13 trials, respectively, and the mode of each set of results therefore returned the correct degree of each GCD, and therefore the correct size of the PSF.
The discussion in Sect. 2.3 and Example 4 show that the Bézout matrix cannot be used for GCD computations and image deblurring because even a modest level of noise causes unsatisfactory results to be obtained: it does not yield the correct degree of the GCD, and therefore the correct size of the PSF. Example 5 extends these examples by considering the application of this matrix to BID when neither additive noise nor uncertainty in the PSF is included in the formation of a blurred image. Two methods were compared:

1. The coefficients of the column component $h_c(x)$ of the PSF were calculated using the QR decomposition of the Bézout matrix. This procedure was repeated in order to calculate the coefficients of the row component $h_r(y)$ of the PSF. The deblurred image was then obtained by performing two deconvolutions, one deconvolution for $h_c(x)$ and one deconvolution for $h_r(y)$.
2. The Matlab code in [20,21] was executed, for which the semi-blind deconvolution problem was solved because the size of the PSF was defined (hardcoded) and not calculated. The coefficients of the PSF in each direction were then calculated from a leading principal submatrix of $B(f, g)$, as stated in Theorem 2 and [5, Algorithm 9.1].
The deblurred images are shown in Fig. 19, and they are of poor quality because they show spurious artefacts. The relative errors in the images obtained using methods (1) and (2) are $5.64 \times 10^{-2}$ and $1.04 \times 10^{-1}$, respectively. The deblurred image obtained from the Sylvester matrix is shown in Fig. 20, and it is clear that it is superior to the deblurred images in Fig. 19. Its relative error is $2.14 \times 10^{-9}$, which is many orders of magnitude smaller.
The solution of the problem of BID requires that the size and the coefficients of the PSF be computed, and the former computation is significantly harder than the latter because it reduces to the determination of the (numerical) rank of a matrix, which is a difficult problem. The effect on a deblurred image of a difference between the size of the PSF used for blurring an image and the size used for deblurring the blurred image, which occurs when the computed rank of the resultant matrix is incorrect, is considered in Example 6.

Example 6
The image in Fig. 21 was blurred by the separable Gaussian PSF shown in Fig. 22. The size of the PSF is $17 \times 17$ pixels, and the blurred image was deblurred by square separable Gaussian PSFs of widths $9, 11, \ldots, 25$ pixels using the Sylvester matrix, as described in Example 4. The variation of the relative error of each deblurred image with the error in the width of the PSF is shown in Fig. 23, where the error in the width of the PSF is defined as
$$\text{error in width of PSF} = (\text{width of PSF for blurring}) - (\text{width of PSF for deblurring}).$$
The graph has a sharp minimum when the error in the width of the PSF used to deblur the blurred image is zero.If, however, the widths of the PSF used to blur the exact image and then deblur its blurred form differ, then the relative error in the deblurred image is very large.
The result of Example 6 is consistent with the function deconvblind.m in Matlab, for which the computed PSF is affected strongly by the initial estimate of its size, and less strongly by the values of the entries of its matrix form, or equivalently, by the coefficients of its polynomial representation [15]. The PSFs in Examples 4, 5 and 6 are separable, but, as noted above, although this restriction is appropriate for some theoretical investigations, it is not realistic because the PSF in most imaging systems is not separable. If the PSF is weakly non-separable, that is, $h(x, y) \approx h_c(x) h_r(y)$, then it may be possible to approximate a non-separable PSF by a separable PSF, which would make the computation of a deblurred image simpler and faster. Example 7 considers the error in the deblurred image that results from this approximation.

Example 7
The exact image in Fig. 21 was blurred with a non-separable PSF ($\sigma_2 > 0$), and all these blurred images were deblurred assuming a separable PSF ($\sigma_2 = 0$). The method described in Example 4 was used to deblur each image, and thus the size of the PSF was calculated using the QR decomposition of the Sylvester matrix, and histograms of the results were drawn, as shown in Figs. 17 and 18.
The effect of $\sigma_2 > 0$ is, as expected, a decrease in the number of occurrences of the mode of each histogram, that is, the correct degrees of the PSF (14 in the horizontal and vertical directions) are achieved less frequently. This causes an increase in the error in the computed PSF, and therefore an increase in the relative error in the deblurred image. This error was calculated for each deblurred image, and its variation with $\sigma_2$ is shown in Fig. 24. The minimum error occurs, as expected, at $\sigma_2 = 0$, and the error then increases, initially rapidly and then more slowly, as $\sigma_2$ increases. The relative error in the deblurred image when $\sigma_2 = 0.25$ is $2.54 \times 10^{-2}$, but this image is of high quality, as shown in Fig. 25.
Example 7 shows that a weakly non-separable PSF can be approximated by a separable PSF in the absence of noise and uncertainty in the PSF, which makes the computation of a deblurred image simpler and faster. Further investigation is, however, required to determine the validity of this approximation in the presence of noise and uncertainty in the PSF.

Summary
This paper has considered the Sylvester and Bézout matrices for AGCD computations and the solution of the problem of BID. It has been shown that the Bézout matrix does not yield good results because floating point cancellation may occur when its entries are computed, since it is necessary to evaluate terms of the form $a_i b_j - a_j b_i$, where $a_i$ and $b_j$ are the coefficients of the polynomials $f(y)$ and $g(y)$, respectively. This problem does not arise when the entries of the Sylvester matrix are formed because each of its non-zero entries is either a coefficient $a_i$ or a coefficient $b_j$. Also, computations on the leading principal submatrices of the Bézout matrix are not recommended because they yield unstable results. Computations on the Sylvester subresultant matrices are, however, much more reliable, and they can be used for AGCD computations.
Examples showed that the Bézout matrix cannot be used to compute the size of the PSF in the presence of noise and uncertainty in the PSF, which is a disadvantage of this matrix because these degradations are present in images obtained in practical problems.The Sylvester matrix yields good results because it allows high quality deblurred images to be computed, even when these degradations are present.
The improved results obtained with the Sylvester matrix justify its use, even though it is not symmetric (unlike the Bézout matrix, which is symmetric), and it is larger than the Bézout matrix.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Example 3
Let the degrees of the polynomials $p(y)$ and $q(y)$ be 150 and 200, respectively, and let their coefficients be uniformly distributed random variables in the ranges $[-10, 10]$ and $[-10, 20]$, respectively. The polynomial $d(y)$ is of degree 70, its coefficients are uniformly distributed random variables in the range $[-30, 30]$, and it satisfies $d(y) = \mathrm{GCD}(f, g)$, where $f(y)$ and $g(y)$ are defined by $f(y) = p(y)d(y)$ and $g(y) = q(y)d(y)$.
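The construction in Example 3 can be sketched as follows; since polynomial multiplication is convolution of coefficient vectors, $f = pd$ and $g = qd$ are formed with `np.convolve` (random $p$ and $q$ of these degrees are coprime with probability one, so $d$ is the GCD up to a scalar).

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.uniform(-10, 10, 151)   # deg p = 150
q = rng.uniform(-10, 20, 201)   # deg q = 200
d = rng.uniform(-30, 30, 71)    # deg d = 70, the intended GCD
f = np.convolve(p, d)           # f = p*d, deg 220
g = np.convolve(q, d)           # g = q*d, deg 270
```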

Figures 9 and 10 show that the degree of the GCD is defined by the maximum change in $\lambda_k$ and $\mu_k$ between successive values of $k$.

Fig. 9 The magnitudes of the diagonal entries $R_k(i, i)$, for Example 3

Fig. 15 The computation of the degree of the row component of the PSF, in the presence of noise, using the Bézout matrix, for Example 4

Fig. 19 The deblurred images using the QR decomposition of the Bézout matrix (left) and the Matlab code in [20,21] (right), for Example 5

Fig. 20 The deblurred image using the Sylvester matrix, for Example 5

Fig. 21 The exact image for Examples 6 and 7

Table 2 Four of the roots of f(y) and four of the roots of g(y), for Example 2