Abstract
Many high-dimensional practical data sets have hierarchical structures induced by graphs or time series. Such data sets are hard to process in Euclidean spaces, and one often seeks low-dimensional embeddings in other space forms to perform the required learning tasks. For hierarchical data, the space of choice is a hyperbolic space because it guarantees low-distortion embeddings for tree-like structures. The geometry of hyperbolic spaces has properties not encountered in Euclidean spaces that pose challenges when trying to rigorously analyze algorithmic solutions. We propose a unified framework for learning scalable and simple hyperbolic linear classifiers with provable performance guarantees. The gist of our approach is to focus on Poincaré ball models and formulate the classification problems using tangent space formalisms. Our results include a new hyperbolic perceptron algorithm as well as an efficient and highly accurate convex optimization setup for hyperbolic support vector machine classifiers. Furthermore, we adapt our approach to accommodate second-order perceptrons, where data are preprocessed based on second-order information (correlation) to accelerate convergence, and strategic perceptrons, where potentially manipulated data arrive in an online manner and decisions are made sequentially. The excellent performance of the Poincaré second-order and strategic perceptrons shows that the proposed framework can be extended to general machine learning problems in hyperbolic spaces. Our experimental results, pertaining to synthetic data, single-cell RNA-seq expression measurements, CIFAR10, Fashion-MNIST and mini-ImageNet, establish that all algorithms provably converge and have complexity comparable to those of their Euclidean counterparts.
References
Chien E, Pan C, Tabaghi P, Milenkovic O (2021) Highly scalable and provably accurate classification in Poincaré balls, In: 2021 IEEE international conference on data mining (ICDM). IEEE, pp 61–70
Krioukov D, Papadopoulos F, Kitsak M, Vahdat A, Boguná M (2010) Hyperbolic geometry of complex networks. Phys Rev E 82(3):036106
Sarkar R (2011) Low distortion Delaunay embedding of trees in hyperbolic plane, In: international symposium on graph drawing. Springer, pp 355–366
Sala F, De Sa C, Gu A, Re C (2018) Representation tradeoffs for hyperbolic embeddings, In: international conference on machine learning, vol. 80. PMLR, pp 4460–4469
Nickel M, Kiela D (2017) Poincaré embeddings for learning hierarchical representations, In: Advances in Neural Information Processing Systems, pp 6338–6347
Papadopoulos F, Aldecoa R, Krioukov D (2015) Network geometry inference using common neighbors. Phys Rev E 92(2):022807
Tifrea A, Becigneul G, Ganea O-E (2019) Poincaré GloVe: hyperbolic word embeddings, In: international conference on learning representations, [Online]. Available: https://openreview.net/forum?id=Ske5r3AqK7
Linial N, London E, Rabinovich Y (1995) The geometry of graphs and some of its algorithmic applications. Combinatorica 15(2):215–245
Cho H, DeMeo B, Peng J, Berger B (2019) Large-margin classification in hyperbolic space, In: international conference on artificial intelligence and statistics. PMLR, pp 1832–1840
Monath N, Zaheer M, Silva D, McCallum A, Ahmed A (2019) Gradient-based hierarchical clustering using continuous representations of trees in hyperbolic space, In: ACM SIGKDD international conference on knowledge discovery & data mining, pp 714–722
Weber M, Zaheer M, Rawat AS, Menon A, Kumar S (2020) Robust large-margin learning in hyperbolic space, In: Advances in Neural Information Processing Systems
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Ganea O, Bécigneul G, Hofmann T (2018) Hyperbolic neural networks, In: Advances in Neural Information Processing Systems, pp 5345–5355
Shimizu R, Mukuta Y, Harada T (2021) Hyperbolic neural networks++, In: international conference on learning representations, [Online]. Available: https://openreview.net/forum?id=Ec85b0tUwbA
Lee K, Maji S, Ravichandran A, Soatto S (2019) Meta-learning with differentiable convex optimization, In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10657–10665
Cesa-Bianchi N, Conconi A, Gentile C (2005) A second-order perceptron algorithm. SIAM J Comput 34(3):640–668
Ahmadi S, Beyhaghi H, Blum A, Naggita K (2021) The strategic perceptron, In: proceedings of the 22nd ACM conference on economics and computation, pp 6–25
Cesa-Bianchi N, Conconi A, Gentile C (2004) On the generalization ability of online learning algorithms. IEEE Trans Inf Theory 50(9):2050–2057
Olsson A, Venkatasubramanian M, Chaudhri VK, Aronow BJ, Salomonis N, Singh H, Grimes HL (2016) Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537(7622):698–702
Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images
Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, arXiv preprint arXiv:1708.07747
Ravi S, Larochelle H (2017) Optimization as a model for few-shot learning, In: international conference on learning representations, [Online]. Available: https://openreview.net/forum?id=rJY0-Kcll
Brückner M, Scheffer T (2011) Stackelberg games for adversarial prediction problems, In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 547–555
Hardt M, Megiddo N, Papadimitriou C, Wootters M (2016) Strategic classification, In: proceedings of the 2016 ACM conference on innovations in theoretical computer science, pp 111–122
Liu Q, Nickel M, Kiela D (2019) Hyperbolic graph neural networks, In: Advances in Neural Information Processing Systems, pp 8230–8241
Nagano Y, Yamaguchi S, Fujita Y, Koyama M (2019) A wrapped normal distribution on hyperbolic space for gradient-based learning, In: international conference on machine learning. PMLR, pp 4693–4702
Mathieu E, Lan CL, Maddison CJ, Tomioka R, Teh YW (2019) Continuous hierarchical representations with Poincaré variational auto-encoders, In: Advances in Neural Information Processing Systems
Skopek O, Ganea O-E, Bécigneul G (2020) Mixed-curvature variational autoencoders, In: international conference on learning representations, [Online]. Available: https://openreview.net/forum?id=S1g6xeSKDS
Ungar AA (2008) Analytic hyperbolic geometry and Albert Einstein’s special theory of relativity. World Scientific
Vermeer J (2005) A geometric interpretation of Ungar's addition and of gyration in the hyperbolic plane. Topol Appl 152(3):226–242
Ratcliffe JG, Axler S, Ribet K (2006) Foundations of hyperbolic manifolds, vol 149. Springer, Berlin
Graham RL (1972) An efficient algorithm for determining the convex hull of a finite planar set. Inf Process Lett 1:132–133
Barber CB, Dobkin DP, Huhdanpaa H (1996) The quickhull algorithm for convex hulls. ACM Trans Math Softw (TOMS) 22(4):469–483
Tabaghi P, Pan C, Chien E, Peng J, Milenković O (2021) Linear classifiers in product space forms, arXiv preprint arXiv:2102.10204
Platt J et al (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 10(3):61–74
Klimovskaia A, Lopez-Paz D, Bottou L, Nickel M (2020) Poincaré maps for analyzing complex hierarchies in single-cell data. Nat Commun 11(1):1–9
Khrulkov V, Mirvakhabova L, Ustinova E, Oseledets I, Lempitsky V (2020) Hyperbolic image embeddings, In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6418–6428
Cannon JW, Floyd WJ, Kenyon R, Parry WR et al (1997) Hyperbolic geometry. Flavors Geom 31:59–115
Sherman J, Morrison WJ (1950) Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Ann Math Stat 21(1):124–127
Acknowledgements
We thank anonymous reviewers for their very useful comments and suggestions. This work was done while Puoya Tabaghi and Jianhao Peng were at University of Illinois Urbana-Champaign. This work was supported by the NSF grant 1956384.
Accompanying codes can be found at: https://github.com/thupchnsky/PoincareLinearClassification. A shorter version of this work [1] was presented as a regular paper at the International Conference on Data Mining (ICDM), 2021.
Appendices
Proof of Lemma 4.2
By the definition of Möbius addition, we have
Thus,
Next, use \(\Vert b\Vert =r\) and \(a^Tb = r\Vert a\Vert \cos (\theta )\) in the above expression:
The function in (34) attains its maximum at \(\theta = 0\) and \(r=R\). We also observe that (33) is symmetric in a, b. Thus, the same argument holds for \(\Vert b\oplus a\Vert \).
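For concreteness, the Möbius addition used throughout can be sketched in code. The closed form below is the standard one for curvature \(-1\) and is stated here as an assumption, since the displayed definition is not reproduced above; the numerical check mirrors the lemma: for fixed \(\Vert a\Vert , \Vert b\Vert \), the norm \(\Vert a\oplus b\Vert \) is largest at \(\theta =0\), and \(\Vert a\oplus b\Vert =\Vert b\oplus a\Vert \) even though \(\oplus \) is not commutative.

```python
import numpy as np

def mobius_add(a, b):
    """Mobius addition on the Poincare ball (curvature -1), assuming the
    standard closed form:
    a (+) b = ((1 + 2<a,b> + |b|^2) a + (1 - |a|^2) b) / (1 + 2<a,b> + |a|^2 |b|^2)."""
    ab, na2, nb2 = a @ b, a @ a, b @ b
    return ((1 + 2 * ab + nb2) * a + (1 - na2) * b) / (1 + 2 * ab + na2 * nb2)

# aligned (theta = 0) vs. anti-aligned (theta = pi) with the same norms
a = np.array([0.3, 0.0])
n_aligned = np.linalg.norm(mobius_add(a, np.array([0.5, 0.0])))
n_opposed = np.linalg.norm(mobius_add(a, np.array([-0.5, 0.0])))
```

Evaluating both norms confirms that the aligned configuration attains the larger value, in line with the maximization argument above.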
Convex hull algorithms in the Poincaré ball model
We introduce next a generalization of the Graham scan and Quickhull algorithms for the Poincaré ball model. In a nutshell, we replace lines with geodesics and vectors \(\overrightarrow{AB}\) with tangent vectors \(\log _A(B)\) or equivalently \((-A)\oplus B\). The pseudo-code for the Poincaré version of the Graham scan is listed in Algorithm 2, while Quickhull is listed in Algorithm 6 (both for the two-dimensional case). The Graham scan has worst-case time complexity \(O(N\log N)\), while Quickhull has complexity \(O(N\log N)\) in expectation and \(O(N^2)\) in the worst case. The Graham scan only works for two-dimensional points while Quickhull can be generalized for higher dimensions [33].
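A minimal two-dimensional sketch of the idea, assuming the substitution described above (orientation tests on gyro-vectors \((-A)\oplus B\), which point along \(\log _A(B)\)). The pivot choice and tie-breaking below are our own simplifications and need not match Algorithm 2 exactly.

```python
import math
import numpy as np

def mobius_add(a, b):
    # standard Mobius addition on the Poincare disk (curvature -1)
    ab, na2, nb2 = a @ b, a @ a, b @ b
    return ((1 + 2 * ab + nb2) * a + (1 - na2) * b) / (1 + 2 * ab + na2 * nb2)

def turn(p, q, r):
    """Orientation of (p, q, r): 2D cross product of the gyro-vectors
    (-p) (+) q and (-p) (+) r, which point along log_p(q) and log_p(r)."""
    u, v = mobius_add(-p, q), mobius_add(-p, r)
    return u[0] * v[1] - u[1] * v[0]

def poincare_graham_scan(points):
    """Hedged sketch of a Graham scan in the Poincare disk: pivot on the
    lowest point, sort the rest by the angle of the tangent vector at the
    pivot, then scan with the hyperbolic orientation test."""
    pts = [np.asarray(p, dtype=float) for p in points]
    i0 = min(range(len(pts)), key=lambda i: (pts[i][1], pts[i][0]))
    anchor = pts[i0]
    def key(p):
        v = mobius_add(-anchor, p)
        return (math.atan2(v[1], v[0]), v @ v)   # angle, then distance
    rest = sorted((p for j, p in enumerate(pts) if j != i0), key=key)
    hull = [anchor]
    for p in rest:
        while len(hull) >= 2 and turn(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
```

As a sanity check, a point lying on a geodesic through two hull vertices is pruned by the \(\le 0\) test: the origin sits on the diameter (a geodesic) between \((-0.5, 0)\) and \((0.5, 0)\), so it is not a hull vertex.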
Hyperboloid perceptron
The hyperboloid model and the definition of hyperplanes. The hyperboloid model \({\mathbb {L}}^n_c\) is another model for representing points in an n-dimensional hyperbolic space with curvature \(-c\;(c>0)\). Specifically, it is a Riemannian manifold \(({\mathbb {L}}^n_c, g^{{\mathbb {L}}})\) for which
Throughout the remainder of this section, we restrict our attention to \(c=1\); all results can be easily generalized to arbitrary values of c.
We make use of the following bijection between the Poincaré ball model and hyperboloid model, given by
Additional properties of the hyperboloid model can be found in [38].
The recent work [34] introduced the notion of a hyperboloid hyperplane of the form
where \(w\in {\mathbb {R}}^{n+1}\) and \([w,w]=1\). The second equation is a consequence of the fact that \(\text {asinh}(\cdot )\) is an increasing function and does not change the sign of its argument. Thus, the classification result based on the hyperplane is given by \(\text {sgn}\left( [w,x]\right) \) for some weight vector w, as shown in Fig. 9.
The hyperboloid perceptron. The definition of a linear classifier in the hyperboloid model is inherently different from that in the Poincaré ball model, as the former is independent of the choice of reference point p. Using the decision hyperplane defined in (36), the hyperboloid perceptron of [34], described in Algorithm 8, has an easily established performance guarantee.
Theorem C.1
Let \((x_i , y_i )_{i=1}^N\) be a labeled data set from a bounded subset of \({\mathbb {L}}^n\) such that \(\Vert x_i\Vert \le R\;\forall i\in [N]\). Assume that there exists an optimal linear classifier with weight vector \(w^\star \) such that \(y_i\text {asinh}([w^\star , x_i])\ge \varepsilon \) (\(\varepsilon \)-margin). Then, the hyperboloid perceptron in Algorithm 8 will correctly classify all points with at most \(O\left( \frac{1}{\sinh ^2(\varepsilon )}\right) \) updates.
Proof
According to the assumption, the optimal normal vector \(w^\star \) satisfies \(y_t\text {asinh}([w^\star , x_t]) \ge \varepsilon \) and \([w^\star , w^\star ]=1\). So we have
where the first inequality holds due to the \(\varepsilon \)-margin assumption and because \(y_t\in \{-1,+1\}\). We can also upper bound \(\Vert w_{k+1}\Vert \) as
where the first inequality follows from \(y_{t}\left[ w_{k}, x_{t}\right] \le 0,\) corresponding to the case when the classifier makes a mistake. Combining (37) and (38), we have
which completes the proof. In practice, we can always preprocess the data to control the norm of \(w^\star \). Also, for small classification margins \(\varepsilon \), we have \(\sinh (\varepsilon )\sim \varepsilon \). As a result, for data points that are very close to the decision boundary (\(\varepsilon \) is small), Theorem C.1 shows that the hyperboloid perceptron has roughly the same convergence rate as its Euclidean counterpart, \(\left( \frac{R}{\varepsilon }\right) ^2\).
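A sketch consistent with the bounds (37)–(38): we assume an additive update of the form \(w \leftarrow w + y_t G x_t\) with \(G = \text {diag}(-1, I)\) flipping the time coordinate, which gives \(\Vert w_{k+1}\Vert ^2 = \Vert w_k\Vert ^2 + 2y_t[w_k, x_t] + \Vert x_t\Vert ^2\), matching the derivation above. Algorithm 8 may differ in details.

```python
import numpy as np

def mink(u, v):
    """Minkowski bilinear form [u, v] = -u0*v0 + <u', v'> (one sign convention)."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def hyperboloid_perceptron(X, y, max_epochs=100):
    """Hedged sketch of a hyperboloid perceptron: classify by sgn([w, x]) and,
    on a mistake, update w <- w + y * G x with G = diag(-1, I), so the decision
    value [w, x] moves toward the correct sign."""
    G = np.diag([-1.0] + [1.0] * (X.shape[1] - 1))
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_t, y_t in zip(X, y):
            if y_t * mink(w, x_t) <= 0:
                w += y_t * (G @ x_t)
                mistakes += 1
        if mistakes == 0:
            break
    return w
```

For example, points \(x = (\cosh t, \sinh t) \in {\mathbb {L}}^1\) labeled by \(w^\star = (0, 1)\) satisfy \([w^\star , x] = \sinh t\), so the labels are \(\text {sgn}(t)\), and the sketch converges to a separator after a few epochs.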
To experimentally confirm the convergence of Algorithm 8, we run synthetic data experiments similar to those described in Sect. 4. More precisely, we first randomly generate a \(w^\star \) such that \([w^\star , w^\star ] = 1\). Then, we generate a random set of \(N = 5,000\) points \(\{x_i\}_{i=1}^N\) in \({\mathbb {L}}^2\). For margin values \(\varepsilon \in [0.1,1]\), we remove points that violate the required constraint on the distance to the classifier (parameterized by \(w^\star \)). Then, we assign binary labels to each data point according to the optimal classifier so that \(y_i = \text {sgn}([w^\star , x_i])\). We repeat this process for 100 different values of \(\varepsilon \) and compare the classification results with those of Algorithm 1 of [11]. Since the theoretical upper bound \(O\left( \frac{1}{\sinh (\varepsilon )}\right) \) claimed in [11] is smaller than the \(O\left( \frac{1}{\sinh ^2(\varepsilon )}\right) \) bound of Theorem C.1, we also plot the former for comparison. From Fig. 9, one can conclude that (1) Algorithm 8 always converges within the theoretical upper bound provided in Theorem C.1, and (2) both methods disagree with the theoretical convergence rate results of [11].
Proofs of Theorems 4.2 and 4.3
Let \(x_i\in {\mathbb {B}}^n\) and let \(v_i=\log _{p}(x_i)\) be its logarithmic map value. The distance between the point and the hyperplane defined by \(w\in T_p{\mathbb {B}}^n\) and \(p\in {\mathbb {B}}^n\) can be written as (see also (14))
For support vectors, \(|\langle v_i, w\rangle |=1\) and \(\Vert v_i\Vert \ge 1/\Vert w\Vert \). Note that \(f(x)=\frac{2\tanh (x)}{x(1-\tanh ^2(x))}\) is an increasing function in x for \(x>0\) and \(g(y)=\sinh ^{-1}(y)\) is an increasing function in y for \(y\in {\mathbb {R}}\). Thus, the distance in (40) can be lower bounded by
The goal is to maximize the distance in (41). To this end, observe that \(h(x)=\frac{2x}{1-x^2}\) is an increasing function in x for \(x\in (0,1)\) and \(\tanh (\sigma _p/2\Vert w\Vert )\in (0,1)\). So maximizing the distance is equivalent to minimizing \(\Vert w\Vert \) (or \(\Vert w\Vert ^2\)), provided \(\sigma _p\) is fixed. Thus, the Poincaré SVM problem can be converted into the convex problem of Theorem 4.2; the constraints are added to force the hyperplane to correctly classify all points in the hard-margin setting. The formulation in Theorem 4.3 can also be seen as arising from a relaxation of the constraints and consideration of the trade-off between margin values and classification accuracy.
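In tangent-space coordinates \(v_i = \log _p(x_i)\), the soft-margin relaxation amounts to a standard hinge-loss objective with an \(\ell _2\) penalty on w. A minimal subgradient sketch (not the solver used in the paper) is:

```python
import numpy as np

def tangent_svm(V, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM on tangent-space features v_i = log_p(x_i):
    minimize lam * |w|^2 + mean(max(0, 1 - y_i <v_i, w>)) by subgradient
    descent. A hedged sketch of the relaxed problem, assuming the features
    V (the log-map values) are precomputed."""
    V, y = np.asarray(V, dtype=float), np.asarray(y, dtype=float)
    w = np.zeros(V.shape[1])
    for t in range(1, epochs + 1):
        active = y * (V @ w) < 1                      # margin violators
        grad = 2 * lam * w - (y[active, None] * V[active]).sum(axis=0) / len(V)
        w -= (lr / np.sqrt(t)) * grad                 # decaying step size
    return w
```

On separable tangent features, the returned w classifies all points correctly while the \(\ell _2\) term keeps \(\Vert w\Vert \) (and hence the inverse margin) small, mirroring the trade-off described above.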
Proof of Theorem 5.1
We generalize the arguments in [16] to hyperbolic spaces. Let \(A_0 = aI\). The matrix \(A_k\) can be recursively computed from \(A_k = A_{k-1} + z_tz_t^T\), or equivalently \(A_k = aI + X_kX_k^T\). Without loss of generality, let \(t_k\) be the time index of the \(k^{th}\) error.
where \(\mathrm {(a)}\) is due to the Sherman–Morrison formula [39] below.
Lemma E.1
( [39]) Let A be an arbitrary \(n\times n\) positive-definite matrix. Let \(x\in {\mathbb {R}}^n\). Then, \(B = A+xx^T\) is also a positive-definite matrix and
Note that the inequality holds since \(A_{k-1}\) is a positive-definite matrix, and thus, so is its inverse. Therefore, we have
where \(\lambda _i\) are the eigenvalues of \(X_kX_k^T\). Claim \(\mathrm {(b)}\) follows from Lemma E.2 while \(\mathrm {(c)}\) is due to the fact \(1-x\le -\log (x),\forall x>0\).
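The Sherman–Morrison identity of Lemma E.1, which underlies the recursion for \(A_k^{-1}\), is easy to verify numerically; a sketch with a random positive-definite matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4.0 * np.eye(4)        # positive-definite A
x = rng.standard_normal(4)

A_inv = np.linalg.inv(A)
# rank-one inverse update:
# (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
B_inv = A_inv - np.outer(A_inv @ x, A_inv @ x) / (1 + x @ A_inv @ x)
```

Since A is symmetric, \(A^{-1}x x^T A^{-1}\) equals the outer product of \(A^{-1}x\) with itself, which is what the code exploits.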
Lemma E.2
( [16]) Let A be an arbitrary \(n\times n\) positive-semidefinite matrix. Let \(x\in {\mathbb {R}}^n\) and \(B = A-xx^T\). Then,
where \(\text {det}_{\ne 0}(B)\) is the product of non-zero eigenvalues of B.
This leads to the upper bound for \(\xi _k^TA_k^{-1}\xi _k\). For the lower bound, we have
Also, recall \(\xi _k = \sum _{j\in [k]}y_{\sigma (j)}z_{\sigma (j)}\). Combining the bounds, we get
This leads to the bound \(k \le \frac{\Vert A_k^{1/2}w^\star \Vert }{\varepsilon '}\sqrt{\sum _{i\in [n]}\log (1+\frac{\lambda _i}{a})}\). Finally, since \(\Vert w^\star \Vert = 1\), we have
which follows from the definition of \(\lambda _{w^\star }\). Hence,
which completes the proof.
Proof of Theorem 5.2
To prove Theorem 5.2, we need the following Lemmas F.1 and F.2.
Lemma F.1
For any \({\tilde{v}}_t\) in the update rule, we have \(\eta _t y_t \left\langle {\tilde{v}}_t, w^\star \right\rangle \ge \sinh (\varepsilon )\), where \(w^\star \) stands for the optimal classifier in Assumption 4.1. Also, for any \(w_k\), we have \(\left\langle w_k, w^\star \right\rangle \ge 0\).
Proof
The proof is by induction. Initially, \(w_1={\textbf{0}}\) and all arriving points get classified as positive. The first mistake occurs when the first negative point \(z_t\) arrives, which gets classified as positive. In this case, \(w_2=-\eta _t v_t\), where \(v_t=\log _p(z_t)\) and \(\eta _t = \frac{2\tanh \left( \frac{\sigma _p\Vert v_t\Vert }{2}\right) }{\left( 1-\tanh \left( \frac{\sigma _p\Vert v_t\Vert }{2}\right) ^2\right) \left\| v_t\right\| }\). Also, \(v_t\) must be unmanipulated (i.e., \(v_t=u_t\)) since every point is classified as positive at this stage, leaving no incentive to manipulate. Therefore, based on Assumption 4.1, we have
Next, suppose that \(w_{t-1}\) denotes the weight vector at the end of step \(t-1\) and \(\left\langle w_{t-1}, w^\star \right\rangle \ge 0\). We need to show that \(\eta _t y_t \left\langle {\tilde{v}}_t, w^\star \right\rangle \ge \sinh (\varepsilon )\). By definition, for any point such that \(\frac{\left\langle v_t, w_{t-1} \right\rangle }{\Vert w_{t-1}\Vert }\ne \frac{\alpha }{\sigma _p}\), \({\tilde{v}}_t=v_t\). According to Observation 1, those points are also not manipulated, i.e., \({\tilde{v}}_t=v_t=u_t\). Therefore, the claim holds. For data points such that \(\frac{\left\langle v_t, w_{t-1} \right\rangle }{\Vert w_{t-1}\Vert }=\frac{\alpha }{\sigma _p}\), if they are positive, we have \({\tilde{v}}_t=v_t=u_t+\beta \frac{w_{t-1}}{\Vert w_{t-1}\Vert }\), where \(0\le \beta \le \frac{\alpha }{\sigma _p}\). The reason \(\beta \) is always nonnegative is that all rational agents want to be classified as positive, so the only possible direction of change is \(w_{t-1}\). Hence,
For data points with negative labels such that \(\frac{\left\langle v_t, w_{t-1} \right\rangle }{\Vert w_{t-1}\Vert }=\frac{\alpha }{\sigma _p}\), \({\tilde{v}}_t=v_t-\frac{\alpha w_{t-1}}{\sigma _p\Vert w_{t-1}\Vert }\) and \(v_t=u_t+\beta \frac{w_{t-1}}{\Vert w_{t-1}\Vert }\). This implies that \({\tilde{v}}_t=u_t+\left( \beta -\frac{\alpha }{\sigma _p}\right) \frac{w_{t-1}}{\Vert w_{t-1}\Vert }\). Therefore,
Combining the above two claims, we get \(\eta _t y_t \left\langle {\tilde{v}}_t, w^\star \right\rangle \ge \sinh (\varepsilon )\).
The last step is to assume \(\left\langle w_{t-1}, w^\star \right\rangle \ge 0\) and \(\eta _t y_t \left\langle {\tilde{v}}_t, w^\star \right\rangle \ge \sinh (\varepsilon )\). In this case, we need to show \(\left\langle w_t, w^\star \right\rangle \ge 0\). If the classifier does not make a mistake at step t, the claim is obviously true since \(w_{t-1}=w_t\). If the classifier makes a mistake, we have
This completes the proof.
Lemma F.2
If Algorithm 5 makes a mistake on an observed data point \(v_t\), then \(y_t\left\langle {\tilde{v}}_t, w_{t-1} \right\rangle \le 0\).
Proof
If the algorithm makes a mistake on a positive example, we have \(\frac{\left\langle v_t, w_{t-1} \right\rangle }{\Vert w_{t-1}\Vert }< \frac{\alpha }{\sigma _p}\). By Observation 2, no point will fall within the region \(0<\frac{\left\langle v_t, w_{t-1} \right\rangle }{\Vert w_{t-1}\Vert }< \frac{\alpha }{\sigma _p}\). Thus, one must have \(\frac{\left\langle v_t, w_{t-1} \right\rangle }{\Vert w_{t-1}\Vert }\le 0\). Since \(y_t=+1\), \({\tilde{v}}_t=v_t\). Therefore, \(\left\langle {\tilde{v}}_t, w_{t-1} \right\rangle \le 0\). If the algorithm makes a mistake on a negative point, we have \(\frac{\left\langle v_t, w_{t-1} \right\rangle }{\Vert w_{t-1}\Vert }\ge \frac{\alpha }{\sigma _p}\). For the case \(\frac{\left\langle v_t, w_{t-1} \right\rangle }{\Vert w_{t-1}\Vert } > \frac{\alpha }{\sigma _p}\), we have \({\tilde{v}}_t=v_t\). In this case, \(\left\langle {\tilde{v}}_t, w_{t-1} \right\rangle \ge 0\) obviously holds. For \(\frac{\left\langle v_t, w_{t-1} \right\rangle }{\Vert w_{t-1}\Vert } = \frac{\alpha }{\sigma _p}\), we have
The above equality implies that for a negative sample we have \(\left\langle {\tilde{v}}_t, w_{t-1} \right\rangle \ge 0\). Therefore, for any mistaken data point, \(y_t\left\langle {\tilde{v}}_t, w_{t-1} \right\rangle \le 0\).
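The manipulation model underlying Lemmas F.1 and F.2 — an agent moves only along the direction of w, only to obtain a positive label, and only when its budget suffices, landing exactly on the decision boundary — can be sketched as follows. The Euclidean simplification and the name best_response are ours, and the \(\sigma _p\) scaling is absorbed into the budget argument.

```python
import numpy as np

def best_response(u, w, budget):
    """Hedged sketch of a strategic agent's best response: if the true point u
    is classified negative but its signed margin deficit is within the budget,
    the agent moves along w just far enough to reach the boundary; otherwise
    it reports u unchanged."""
    w_hat = w / np.linalg.norm(w)
    m = u @ w_hat                        # signed distance to the hyperplane
    if -budget <= m < 0:                 # negative, but crossing is affordable
        return u - m * w_hat             # land exactly on the boundary
    return u
```

This matches the case analysis above: unmanipulable points (deficit exceeding the budget) and already-positive points report truthfully, while manipulated negatives sit exactly on the boundary, which is why mistaken points satisfy \(y_t\left\langle {\tilde{v}}_t, w_{t-1} \right\rangle \le 0\).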
We are now ready to prove Theorem 5.2.
Proof
The analysis follows along the same lines as that for the standard Poincaré perceptron algorithm described in Sect. 4. We first lower bound \(\Vert w_{k+1}\Vert \) as
where the first bound follows from the Cauchy–Schwarz inequality, while the second inequality was established in Lemma F.1. Next, we upper bound \(\Vert w_{k+1}\Vert \) as
where the first inequality was established in Lemma F.2, while the second inequality follows from the fact that the manipulation budget is \(\alpha \).
Combining (50) and (51), we obtain
which completes the proof.
Detailed experimental setting
For the first set of experiments, we use the following hyperparameters. For the Poincaré perceptron, there are no hyperparameters to choose. For the Poincaré second-order perceptron, we adopt the strategy proposed in [16]: instead of tuning the parameter a, we set it to 0 and replace the matrix inverse with the pseudo-inverse. For the Poincaré SVM and the Euclidean SVM, we set \(C=1000\) for all data sets, which theoretically forces the SVM to have a hard decision boundary. For the hyperboloid SVM, we find, surprisingly, that choosing \(C=1000\) makes the algorithm unstable. Empirically, \(C=10\) in general produces better results, despite leading to softer decision boundaries; the method still breaks down when the point dimensions are large. As the hyperboloid SVM works in the hyperboloid model of a hyperbolic space, we map points from the Poincaré ball to points in the hyperboloid model as follows. Let \(x\in {\mathbb {B}}^n\) and let \(z\in {\mathbb {L}}^n\) be its corresponding point in the hyperboloid model (Table 4). Then,
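The displayed map is, we believe, the standard bijection between the two models for curvature \(-1\); a sketch:

```python
import numpy as np

def ball_to_hyperboloid(x):
    """Standard lift from the Poincare ball B^n to the hyperboloid L^n
    (curvature -1): z = ((1 + |x|^2), 2 x) / (1 - |x|^2), so [z, z] = -1."""
    s = x @ x
    return np.concatenate(([1.0 + s], 2.0 * x)) / (1.0 - s)

def hyperboloid_to_ball(z):
    """Inverse map: x = z' / (1 + z0)."""
    return z[1:] / (1.0 + z[0])
```

The lifted point satisfies the hyperboloid constraint \(-z_0^2 + \Vert z'\Vert ^2 = -1\), and the two maps are mutual inverses.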
On the other hand, Olsson's scRNA-seq data contain 319 points from 8 classes, and we perform a \(70\%/30\%\) random split to obtain training (231) and test (88) point sets. CIFAR10 contains 50,000 training points and 10,000 test points from 10 classes. Fashion-MNIST contains 60,000 training points and 10,000 test points from 10 classes. Mini-ImageNet contains 8,000 data points from 20 classes, and we perform a \(70\%/30\%\) random split to obtain training (5,600) and test (2,400) point sets. For all data sets, we choose the trade-off coefficient \(C=5\) and use it with all three SVM algorithms to ensure a fair comparison. We also find that in practice the performance of all three algorithms remains stable for \(C\in [1,10]\).
Additional experimental results
Here we report the performance of two other commonly used linear classifiers in Euclidean space (linear regression and logistic regression) and compare them with the Poincaré SVM. Note that no established linear regression or logistic regression methods currently exist for hyperbolic geometries.
Cite this article
Pan, C., Chien, E., Tabaghi, P. et al. Provably accurate and scalable linear classifiers in hyperbolic spaces. Knowl Inf Syst 65, 1817–1850 (2023). https://doi.org/10.1007/s10115-022-01820-3