An empirical approach for probing the definiteness of kernels

Abstract

Models like support vector machines or Gaussian process regression often require positive semi-definite kernels. These kernels may be based on distance functions. While definiteness has been proven for common distances and kernels, proving it for a new kernel may require more time and effort than users who simply aim for practical use are willing to spend. Furthermore, designing definite distances or kernels may be equally intricate. Finally, models can be enabled to use indefinite kernels, but this may degrade the accuracy or increase the computational cost of the model. Hence, an efficient method to determine definiteness is required. We propose an empirical approach. We show that sampling as well as optimization with an evolutionary algorithm may be employed to determine definiteness. We provide a proof of concept with 16 different distance measures for permutations. Our approach can disprove definiteness if a respective counterexample is found. It can also estimate how likely it is to obtain indefinite kernel matrices. This provides a simple, efficient tool to decide whether additional effort should be spent on designing or selecting a more suitable kernel or algorithm.

Figures 1–4 (images not included).

Notes

  1. The package CEGO is available on CRAN at http://cran.r-project.org/package=CEGO.

References

  • Bader DA, Moret BM, Warnow T, Wyman SK, Yan M, Tang J, Siepel AC, Caprara A (2004) Genome rearrangements analysis under parsimony and other phylogenetic algorithms (GRAPPA) 2.0. https://www.cs.unm.edu/~moret/GRAPPA/. Accessed 16 Nov 2016

  • Bartz-Beielstein T, Zaefferer M (2017) Model-based methods for continuous and discrete global optimization. Appl Soft Comput 55:154–167

  • Berg C, Christensen JPR, Ressel P (1984) Harmonic analysis on semigroups, volume 100 of graduate texts in mathematics. Springer, New York

  • Beume N, Naujoks B, Emmerich M (2007) SMS-EMOA: multiobjective selection based on dominated hypervolume. Eur J Oper Res 181(3):1653–1669

  • Boytsov L (2011) Indexing methods for approximate dictionary searching: comparative analysis. J Exp Algorithmics 16:1–91

  • Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(2):121–167

  • Camastra F, Vinciarelli A (2008) Machine learning for audio, image and video analysis: theory and applications. Advanced information and knowledge processing. Springer, London

  • Campos V, Laguna M, Martí R (2005) Context-independent scatter and tabu search for permutation problems. INFORMS J Comput 17(1):111–122

  • Camps-Valls G, Martín-Guerrero JD, Rojo-Álvarez JL, Soria-Olivas E (2004) Fuzzy sigmoid kernel for support vector classifiers. Neurocomputing 62:501–506

  • Chen Y, Gupta MR, Recht B (2009) Learning kernels from indefinite similarities. In: Proceedings of the 26th annual international conference on machine learning (ICML ’09), New York, NY, USA. ACM, pp 145–152

  • Constantine G (1985) Lower bounds on the spectra of symmetric matrices with nonnegative entries. Linear Algebra Appl 65:171–178

  • Cortes C, Haffner P, Mohri M (2004) Rational kernels: theory and algorithms. J Mach Learn Res 5:1035–1062

  • Curriero F (2006) On the use of non-Euclidean distance measures in geostatistics. Math Geol 38(8):907–926

  • Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197

  • Deza M, Huang T (1998) Metrics on permutations, a survey. J Comb Inf Syst Sci 23(1–4):173–185

  • Eiben AE, Smith JE (2003) Introduction to evolutionary computing. Springer, Berlin

  • Feller W (1971) An introduction to probability theory and its applications, vol 2. Wiley, Hoboken

  • Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling. Wiley, Hoboken

  • Gablonsky J, Kelley C (2001) A locally-biased form of the DIRECT algorithm. J Glob Optim 21(1):27–37

  • Gärtner T, Lloyd J, Flach P (2003) Kernels for structured data. In: Matwin S, Sammut C (eds) Inductive logic programming, vol 2583. Lecture Notes in Computer Science. Springer, Berlin, pp 66–83

  • Gärtner T, Lloyd J, Flach P (2004) Kernels and distances for structured data. Mach Learn 57(3):205–232

  • Haussler D (1999) Convolution kernels on discrete structures. Technical report UCSC-CRL-99-10, Department of computer science, University of California at Santa Cruz

  • Hirschberg DS (1975) A linear space algorithm for computing maximal common subsequences. Commun ACM 18(6):341–343

  • Hutter F, Hoos HH, Leyton-Brown K (2011) Sequential model-based optimization for general algorithm configuration. In: Proceedings of LION-5, pp 507–523

  • Ikramov K, Savel’eva N (2000) Conditionally definite matrices. J Math Sci 98(1):1–50

  • Jiao Y, Vert J-P (2015) The Kendall and Mallows kernels for permutations. In: Proceedings of the 32nd international conference on machine learning (ICML-15), pp 1935–1944

  • Kendall M, Gibbons J (1990) Rank correlation methods. Oxford University Press, Oxford

  • Lee C (1958) Some properties of nonbinary error-correcting codes. IRE Trans Inf Theory 4(2):77–82

  • Li H, Jiang T (2004) A class of edit kernels for SVMs to predict translation initiation sites in eukaryotic mRNAs. In: Proceedings of the eighth annual international conference on research in computational molecular biology (RECOMB ’04), New York, NY, USA. ACM, pp 262–271

  • Loosli G, Canu S, Ong C (2015) Learning SVM in Krein spaces. IEEE Trans Pattern Anal Mach Intell 38(6):1204–1216

  • Marteau P-F, Gibet S (2014) On recursive edit distance kernels with application to time series classification. IEEE Trans Neural Netw Learn Syst PP(99):1–1

  • Moraglio A, Kattan A (2011) Geometric generalisation of surrogate model based optimisation to combinatorial spaces. In: Proceedings of the 11th European conference on evolutionary computation in combinatorial optimization (EvoCOP’11), Berlin, Heidelberg, Germany. Springer, pp 142–154

  • Motwani R, Raghavan P (1995) Randomized algorithms. Cambridge University Press, Cambridge

  • Murphy KP (2012) Machine learning. MIT Press Ltd., Cambridge

  • Ong CS, Mary X, Canu S, Smola AJ (2004) Learning with non-positive kernels. In: Proceedings of the twenty-first international conference on machine learning (ICML ’04), New York, NY, USA. ACM, pp 81–88

  • Pawlik M, Augsten N (2015) Efficient computation of the tree edit distance. ACM Trans Database Syst 40(1):1–40

  • Pawlik M, Augsten N (2016) Tree edit distance: robust and memory-efficient. Inf Syst 56:157–173

  • Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. The MIT Press, Cambridge

  • Reeves CR (1999) Landscapes, operators and heuristic search. Ann Oper Res 86:473–490

  • Schiavinotto T, Stützle T (2007) A review of metrics on permutations for search landscape analysis. Comput Oper Res 34(10):3143–3153

  • Schleif F-M, Tino P (2015) Indefinite proximity learning: a review. Neural Comput 27(10):2039–2096

  • Schleif F-M, Tino P (2017) Indefinite core vector machine. Pattern Recognit 71:187–195

  • Schölkopf B (2001) The kernel trick for distances. In: Leen TK, Dietterich TG, Tresp V (eds) Advances in neural information processing systems, vol 13. MIT Press, Cambridge, pp 301–307

  • Sevaux M, Sörensen K (2005) Permutation distance measures for memetic algorithms with population management. In: Proceedings of 6th metaheuristics international conference (MIC’05), University of Vienna, pp 832–838

  • Singhal A (2001) Modern information retrieval: a brief overview. IEEE Bull Data Eng 24(4):35–43

  • Smola AJ, Ovári ZL, Williamson RC (2000) Regularization with dot-product kernels. In: Advances in neural information processing systems vol 13, Proceedings. MIT Press, pp 308–314

  • van der Loo MP (2014) The stringdist package for approximate string matching. R J 6(1):111–122

  • Vapnik VN (1998) Statistical learning theory, vol 1. Wiley, New York

  • Voutchkov I, Keane A, Bhaskar A, Olsen TM (2005) Weld sequence optimization: the use of surrogate models for solving sequential combinatorial problems. Comput Methods Appl Mech Eng 194(30–33):3535–3551

  • Wagner RA, Fischer MJ (1974) The string-to-string correction problem. J ACM 21(1):168–173

  • Wu G, Chang EY, Zhang Z (2005) An analysis of transformation on non-positive semidefinite similarity matrix for kernel machines. In: Proceedings of the 22nd international conference on machine learning

  • Zaefferer M, Bartz-Beielstein T (2016) Efficient global optimization with indefinite kernels. In: Parallel problem solving from nature-PPSN XIV. Springer, pp 69–79

  • Zaefferer M, Stork J, Bartz-Beielstein T (2014a) Distance measures for permutations in combinatorial efficient global optimization. In: Bartz-Beielstein T, Branke J, Filipič B, Smith J (eds) Parallel problem solving from nature-PPSN XIII. Springer, Cham, pp 373–383

  • Zaefferer M, Stork J, Friese M, Fischbach A, Naujoks B, Bartz-Beielstein T (2014b) Efficient global optimization for combinatorial problems. In: Proceedings of the 2014 conference on genetic and evolutionary computation (GECCO ’14), New York, NY, USA. ACM, pp 871–878

  • Zhan X (2006) Extremal eigenvalues of real symmetric matrices with entries in an interval. SIAM J Matrix Anal Appl 27(3):851–860

Author information

Corresponding author

Correspondence to Martin Zaefferer.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Distance measures for permutations

In the following, we describe the distance measures employed in the experiments.

  • The Levenshtein distance is an edit distance measure:

    \({d} _{Lev}(\pi ,\pi ') = edits_{\pi \rightarrow \pi '}\)

    Here, \(edits_{\pi \rightarrow \pi '}\) is the minimal number of deletions, insertions, or substitutions required to transform one string (or here: permutation) \(\pi \) into another string \(\pi '\). The implementation is based on Wagner and Fischer (1974).
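The Wagner–Fischer dynamic program can be sketched as follows (a minimal Python illustration; the paper's experiments rely on R implementations, so this sketch is not the authors' code):

```python
def levenshtein(p, q):
    """Wagner-Fischer dynamic program: minimal number of deletions,
    insertions, or substitutions transforming sequence p into q."""
    m, n = len(p), len(q)
    prev = list(range(n + 1))  # distances from the empty prefix of p
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if p[i - 1] == q[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n]
```

Keeping only the previous row limits memory to O(n), which matters when distance matrices over many permutation pairs are sampled.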

  • Swaps are transpositions of two adjacent elements. The Swap distance [also: Kendall’s Tau (Kendall and Gibbons 1990; Sevaux and Sörensen 2005) or Precedence distance (Schiavinotto and Stützle 2007)] counts the minimum number of swaps required to transform one permutation into another. For permutations, it is (Sevaux and Sörensen 2005):

    $$\begin{aligned} {d} _{Swa}(\pi ,\pi ')&= \sum _{i=1}^{m} \sum _{j=1}^{m} z_{ij} ~~ \text {with}\\ z_{ij}&= \left\{ \begin{array}{l l} 1 &{} \quad \text {if } \pi _i < \pi _j ~\text {and}~ \pi '_i > \pi '_j ,\\ 0 &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$
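Following the formula above, the Swap distance can be computed with a direct double loop (a quadratic-time Python sketch; inversion counting in O(m log m) is also possible but omitted here for clarity):

```python
def swap_distance(p, q):
    """Kendall's tau distance: counts index pairs (i, j) whose
    elements are ordered differently in p and q."""
    m = len(p)
    return sum(1 for i in range(m) for j in range(m)
               if p[i] < p[j] and q[i] > q[j])
```

Each discordant pair is counted exactly once, since the condition is asymmetric in i and j.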
  • An interchange operation is the transposition of two arbitrary elements. Respectively, the Interchange (also: Cayley) distance counts the minimum number of interchanges (\(interchanges_{\pi \rightarrow \pi '}\)) required to transform one permutation into another (Schiavinotto and Stützle 2007):

    \({d} _{Int}(\pi ,\pi ') = interchanges_{\pi \rightarrow \pi '}\)
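A well-known way to compute the Interchange (Cayley) distance uses the fact that the minimum number of transpositions equals m minus the number of cycles of the relative permutation. A Python sketch (illustrative helper names, not the paper's code):

```python
def interchange_distance(p, q):
    """Cayley distance: m minus the number of cycles of the
    permutation mapping positions in p to positions in q."""
    m = len(p)
    pos_in_q = {v: i for i, v in enumerate(q)}
    sigma = [pos_in_q[v] for v in p]  # relative permutation of indices
    seen = [False] * m
    cycles = 0
    for i in range(m):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:  # walk one cycle, marking its members
                seen[j] = True
                j = sigma[j]
    return m - cycles
```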

  • The Insert distance is based on the longest common subsequence \(LCSeq(\pi ,\pi ')\). The longest common subsequence is the largest number of elements that appear in the same order in both permutations, possibly with interruptions. The corresponding distance is

    \({d} _{Ins}(\pi ,\pi ') = m-LCSeq(\pi ,\pi ').\)

    We use the algorithm described by Hirschberg (1975). The name derives from its interpretation as an edit distance measure: the corresponding edit operation is a combination of insertion and deletion, where a single element is moved from one position (delete) to a new position (insert). It is also called Ulam’s distance (Schiavinotto and Stützle 2007).
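The longest common subsequence, and with it the Insert distance, can be obtained by the standard dynamic program (a Python sketch; Hirschberg's algorithm computes the same quantity in linear space):

```python
def lcseq_length(p, q):
    """Length of the longest common subsequence of p and q."""
    m, n = len(p), len(q)
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if p[i - 1] == q[j - 1]:
                cur[j] = prev[j - 1] + 1        # extend a common subsequence
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[n]

def insert_distance(p, q):
    # d_Ins = m - LCSeq(p, q)
    return len(p) - lcseq_length(p, q)
```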

  • The Longest Common Substring distance is based on the largest number of elements that follow each other in both permutations, without interruption. Unlike the longest common subsequence, all elements have to be adjacent. If \(LCStr(\pi ,\pi ')\) is the length of the longest common substring, the distance is

    $$\begin{aligned} {d} _{LCStr}(\pi ,\pi ')= m-LCStr(\pi ,\pi '). \end{aligned}$$
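A minimal sketch of the corresponding computation (the substring dynamic program differs from the subsequence one only in that the match counter resets to zero on any mismatch):

```python
def lcstr_length(p, q):
    """Length of the longest common contiguous substring of p and q."""
    m, n = len(p), len(q)
    best = 0
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if p[i - 1] == q[j - 1]:
                cur[j] = prev[j - 1] + 1  # run of matches ending at (i, j)
                best = max(best, cur[j])
        prev = cur
    return best

def lcstr_distance(p, q):
    # d_LCStr = m - LCStr(p, q)
    return len(p) - lcstr_length(p, q)
```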
  • The R-distance (Campos et al. 2005; Sevaux and Sörensen 2005) counts the number of times that one element follows another in one permutation, but not in the other. It is identical with the uni-directional adjacency distance (Reeves 1999). It is computed by

    $$\begin{aligned} {d} _{R}(\pi ,\pi ')&= \sum _{i=1}^{m-1} y_i ~~ \text {with}\\ y_i&= \left\{ \begin{array}{ll} 0 &{} \quad \text {if }\exists j : \pi _i=\pi '_j ~\text {and}~ \pi _{i+1}=\pi '_{j+1} ,\\ 1 &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$
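Since each element occurs exactly once in a permutation, the existence test in the formula above reduces to a set lookup of ordered adjacent pairs (a Python sketch):

```python
def r_distance(p, q):
    """R-distance: counts ordered adjacent pairs (p_i, p_{i+1})
    that do not occur as adjacent pairs in q."""
    pairs_q = {(q[j], q[j + 1]) for j in range(len(q) - 1)}
    return sum(1 for i in range(len(p) - 1)
               if (p[i], p[i + 1]) not in pairs_q)
```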
  • The (bi-directional) Adjacency distance (Reeves 1999; Schiavinotto and Stützle 2007) counts the number of times two elements are neighbors in one, but not in the other permutation. Unlike R-distance (uni-directional), the order of the two elements does not matter. It is computed by

    $$\begin{aligned} {d} _{Adj}(\pi ,\pi ')&= \sum _{i=1}^{m-1} y_i ~~ \text {with}\\ y_i&= \left\{ \begin{array}{l l} 0 &{} \quad \text {if }\exists j : \pi _i=\pi '_j ~\text {and}~ \pi _{i+1} \in \{\pi '_{j+1}, \pi '_{j-1} \},\\ 1 &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$
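For the bi-directional variant, the same lookup only has to ignore the order within each pair, e.g., by storing unordered pairs (a Python sketch):

```python
def adjacency_distance(p, q):
    """Adjacency distance: counts unordered adjacent pairs of p
    that are not adjacent (in either order) in q."""
    pairs_q = {frozenset((q[j], q[j + 1])) for j in range(len(q) - 1)}
    return sum(1 for i in range(len(p) - 1)
               if frozenset((p[i], p[i + 1])) not in pairs_q)
```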
  • The Position distance (Schiavinotto and Stützle 2007) is identical with the Deviation distance or Spearman’s footrule (Sevaux and Sörensen 2005), \({d} _{\text {Pos}}(\pi ,\pi ') = \sum _{k=1}^{m} |i-j | ~~\text {where}~~\pi _i = \pi '_j = k\) .

  • The non-metric Squared Position distance is Spearman’s rank correlation coefficient (Sevaux and Sörensen 2005). In contrast to the Position distance, the term \(|i-j|\) is replaced by \((i-j)^2\).

  • The Hamming distance or Exact Match distance simply counts the number of unequal elements in two permutations, i.e., \({d} _{Ham}(\pi ,\pi ') = \sum _{i=1}^{m} a_i, ~~\text {where}~~ a_i = \left\{ \begin{array}{l l} 0 &{} \quad \text {if } \pi _i = \pi '_i,\\ 1 &{} \quad \text {otherwise.} \end{array} \right. \)

  • The Euclidean distance is \({d} _{Euc}(\pi ,\pi ') = \sqrt{\sum _{i=1}^{m} (\pi _i-\pi '_i)^2}\) .

  • The Manhattan distance (A-Distance, cf. (Sevaux and Sörensen 2005; Campos et al. 2005)) is \({d} _{Man}(\pi ,\pi ') = \sum _{i=1}^{m} |\pi _i-\pi '_i|\) .

  • The Chebyshev distance is \({d} _{Che}(\pi ,\pi ') = \underset{1 \le i \le m}{\max }(|\pi _i-\pi '_i|)\) .

  • For permutations, the Lee distance (Lee 1958; Deza and Huang 1998) is \({d} _{Lee}(\pi ,\pi ') = \sum _{i=1}^{m} \min (|\pi _i-\pi '_i|,m-|\pi _i-\pi '_i|)\) .

  • The non-metric Cosine distance is based on the dot product of two permutations. It is derived from the cosine similarity (Singhal 2001) of two vectors:

    $$\begin{aligned} {d} _{Cos}(\pi ,\pi ') = 1 - \frac{\pi \cdot \pi '}{||\pi ||~||\pi '||}. \end{aligned}$$
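The vector-based measures above (Hamming, Euclidean, Manhattan, Chebyshev, Lee, and Cosine) all operate element-wise on the two permutations; a compact Python sketch, treating permutations as integer tuples:

```python
import math

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def lee(p, q):
    m = len(p)  # differences wrap around modulo m
    return sum(min(abs(a - b), m - abs(a - b)) for a, b in zip(p, q))

def cosine_distance(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1 - dot / norm
```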
  • The Lexicographic distance regards the lexicographic ordering of permutations. If the position of a permutation \(\pi \) in the lexicographic ordering of all permutations with fixed m is \(L(\pi )\), then the Lexicographic distance metric is

    $$\begin{aligned} {d} _{Lex}(\pi ,\pi ') =| L(\pi ) - L(\pi ')|. \end{aligned}$$
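The rank \(L(\pi)\) can be computed without enumerating all \(m!\) permutations via the Lehmer code (factorial number system); a Python sketch using zero-based ranks, which leaves the distance unchanged:

```python
from math import factorial

def lex_rank(p):
    """Zero-based position of p in the lexicographic ordering of all
    permutations of its elements (Lehmer code)."""
    m = len(p)
    rank = 0
    for i in range(m):
        # elements to the right of position i that are smaller than p[i]
        smaller = sum(1 for j in range(i + 1, m) if p[j] < p[i])
        rank += smaller * factorial(m - 1 - i)
    return rank

def lex_distance(p, q):
    return abs(lex_rank(p) - lex_rank(q))
```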
Table 3 Minimal examples for indefinite distance matrices. The matrix in the table is the actual distance matrix, while the eigenvalue refers to the transformed matrix \(\hat{D}\) derived from Eq. (3). The lower triangular matrix is omitted due to symmetry

Appendix B: Minimal examples for indefinite sets

To showcase the usefulness of the proposed methods, this section lists small example datasets and the respective indefinite distance matrices. Besides the standard permutation distances, we also tested:

  • Signed permutations, reversal distance Permutations where each element has a sign are referred to as signed permutations. One application of signed permutations is weld path optimization (Voutchkov et al. 2005). The reversal distance counts the number of reversals required to transform one permutation into another. We used the non-cyclic reversal distance provided in the GRAPPA library version 2.0 (Bader et al. 2004).

  • Labeled trees, tree edit distance Trees in general are widely applied as a solution representation, e.g., in genetic programming. In this study, we considered labeled trees. The tree edit distance counts the number of node insertions, deletions, or relabelings. We used the efficient implementation in the APTED 0.1.1 library (Pawlik and Augsten 2015, 2016). The labeled trees are denoted with the bracket notation: curly brackets indicate the tree structure, letters indicate labels (internal and terminal nodes).

  • Strings, Optimal String Alignment distance (OSA) The OSA is a non-metric edit distance that counts insertions, deletions, substitutions, and transpositions of characters. Each substring can be edited no more than once. It is also called the restricted Damerau–Levenshtein distance (Boytsov 2011). We used the implementation in the stringdist R-package (van der Loo 2014).

  • Strings, Jaro–Winkler distance The Jaro–Winkler distance is based on the number of matching characters in two strings, as well as the number of transpositions required to bring all matches into the same order. We used the implementation in the stringdist R-package (van der Loo 2014).

The respective results are listed in Table 3. All of the listed distance measures are shown to be non-CNSD.
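The empirical check behind these results can be sketched as follows. The paper transforms each distance matrix via its Eq. (3), which is not reproduced in this excerpt; the sketch below instead uses the standard centering \(P D P\) (with \(P = I - \mathbf{1}\mathbf{1}^T/n\)), whose largest eigenvalue is non-positive exactly when D is CNSD. Function names and the plain random-sampling scheme are illustrative, not the paper's CEGO implementation:

```python
import itertools
import random
import numpy as np

def is_cnsd(D, tol=1e-9):
    """Test conditional negative semi-definiteness of a symmetric
    distance matrix D: x^T D x <= 0 for all x with sum(x) = 0,
    i.e., the centered matrix P D P is negative semi-definite."""
    n = D.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(P @ D @ P).max() <= tol

def indefiniteness_rate(distance, m=4, n=6, trials=100, seed=1):
    """Sample random sets of n permutations of 1..m and report the
    fraction whose distance matrix is not CNSD, i.e., would yield
    an indefinite kernel matrix."""
    rng = random.Random(seed)
    perms = list(itertools.permutations(range(1, m + 1)))
    fails = 0
    for _ in range(trials):
        sample = rng.sample(perms, n)
        D = np.array([[distance(p, q) for q in sample] for p in sample],
                     dtype=float)
        if not is_cnsd(D):
            fails += 1
    return fails / trials
```

A single sample with a positive eigenvalue is a counterexample that disproves definiteness; the observed failure rate estimates how likely indefinite kernel matrices are in practice.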

About this article

Cite this article

Zaefferer, M., Bartz-Beielstein, T. & Rudolph, G. An empirical approach for probing the definiteness of kernels. Soft Comput 23, 10939–10952 (2019). https://doi.org/10.1007/s00500-018-3648-1

Keywords

  • Definiteness
  • Kernel
  • Distance
  • Sampling
  • Optimization
  • Evolutionary algorithm