Online binary classification from similar and dissimilar data

Abstract

Similar-dissimilar (SD) classification aims to train a binary classifier from only similar and dissimilar data pairs, which indicate whether two instances belong to the same class (similar) or not (dissimilar). Although effective learning methods have been proposed for SD classification, they cannot deal with online learning scenarios involving sequential data, which are frequently encountered in real-world applications. In this paper, we provide the first attempt to investigate the online SD classification problem. Specifically, we first adapt the unbiased risk estimator of SD classification to online learning scenarios with a conservative regularization term, which can serve as a naive method to solve the online SD classification problem. Then, by further introducing a margin criterion that determines whether to update the classifier with the received cost, we propose two improvements (one with linearly scaled cost and the other with quadratically scaled cost) that result in two online SD classification methods. Theoretically, we derive regret, mistake, and relative loss bounds for our proposed methods, which guarantee their performance on sequential data. Extensive experiments on various datasets validate the effectiveness of our proposed methods.
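
To make the margin-gated, cost-scaled update concrete, here is a minimal illustrative sketch, not the authors' released implementation. It assumes a linear scorer \(f(x)=\varvec{w}^{\top }x\) and a pairwise hinge surrogate; the names `pa_sd_step`, `C`, and `margin` are our own illustrative choices. It shows a linearly scaled (capped) step, with the quadratically scaled variant noted in a comment.

```python
import numpy as np

def pa_sd_step(w, x, x_prime, s, C=1.0, margin=1.0):
    """One illustrative online step on a pair (x, x') with similarity
    label s (+1 = similar, -1 = dissimilar). Schematic only."""
    f, f_p = w @ x, w @ x_prime
    # Pairwise hinge surrogate: the two scores should agree (s = +1)
    # or disagree (s = -1) by at least `margin`.
    loss = max(0.0, margin - s * f * f_p)
    if loss == 0.0:
        return w  # margin criterion satisfied: be conservative, skip the update
    grad = -s * (f_p * x + f * x_prime)         # subgradient of the loss in w
    tau = min(C, loss / (grad @ grad + 1e-12))  # linearly scaled (capped) cost
    # Quadratically scaled variant: tau = loss / (grad @ grad + 1 / (2 * C))
    return w - tau * grad

# Toy usage on synthetic pairs whose similarity depends on one coordinate.
rng = np.random.default_rng(0)
w = np.zeros(5)
for _ in range(1000):
    x, x_p = rng.normal(size=5), rng.normal(size=5)
    s = 1.0 if x[0] * x_p[0] > 0 else -1.0
    w = pa_sd_step(w, x, x_p, s)
```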

Data availability

Not applicable.

Code availability

Not applicable.

References

  • Bao, H., Niu, G., & Sugiyama, M. (2018). Classification from pairwise similarity and unlabeled data. In ICML, pp. 452–461.

  • Bao, H., Shimada, T., Xu, L., Sato, I., & Sugiyama, M. (2020). Similarity-based classification: Connecting similarity learning to binary classification. arXiv preprint arXiv:2006.06207.

  • Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. http://archive.ics.uci.edu/ml/index.php.

  • Cao, Y., Feng, L., Xu, Y., An, B., Niu, G., & Sugiyama, M. (2021). Learning from similarity-confidence data. In ICML, pp. 1272–1282.

  • Cao, Y., Wan, Z., Ren, D., Yan, Z., & Zuo, W. (2022). Incorporating semi-supervised and positive-unlabeled learning for boosting full reference image quality assessment. In CVPR, pp. 5851–5861.

  • Chen, R., Tang, Y., Zhang, W., & Feng, W. (2022). Deep multi-view semi-supervised clustering with sample pairwise constraints. Neurocomputing, 500, 832–845.

  • Crammer, K., Kulesza, A., & Dredze, M. (2009). Adaptive regularization of weight vectors. In NeurIPS, pp. 414–422.

  • Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3(Jan), 951–991.

  • Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research, 7(Mar), 551–585.

  • Dekel, O., Gilad-Bachrach, R., Shamir, O., & Xiao, L. (2012). Optimal distributed online prediction using mini-batches. Journal of Machine Learning Research, 13(1).

  • Er, M. J., Venkatesan, R., & Wang, N. (2016). An online universal classifier for binary, multi-class and multi-label classification. In ICSMC, pp. 003701–003706. IEEE.

  • Feng, L., Lv, J.-Q., Han, B., Xu, M., Niu, G., Geng, X., An, B., & Sugiyama, M. (2020). Provably consistent partial-label learning. In NeurIPS.

  • Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296.

  • Hoi, S. C., Sahoo, D., Lu, J., & Zhao, P. (2021). Online learning: A comprehensive survey. Neurocomputing, 459, 249–289.

  • Ishida, T., Niu, G., & Sugiyama, M. (2018). Binary classification for positive-confidence data. In NeurIPS, pp. 5917–5928.

  • Jian, L., Gao, F., Ren, P., Song, Y., & Luo, S. (2018). A noise-resilient online learning algorithm for scene classification. Remote Sensing, 10(11), 1836.

  • Kaneko, T., Sato, I., & Sugiyama, M. (2019). Online multiclass classification based on prediction margin for partial feedback. arXiv preprint arXiv:1902.01056.

  • Kiryo, R., Niu, G., Du Plessis, M. C., & Sugiyama, M. (2017). Positive-unlabeled learning with non-negative risk estimator. In NeurIPS, pp. 1675–1685.

  • Kivinen, J., Smola, A. J., & Williamson, R. C. (2004). Online learning with kernels. IEEE Transactions on Signal Processing, 52(8), 2165–2176.

  • Koçak, M. A., Shasha, D. E., & Erkip, E. (2016). Conjugate conformal prediction for online binary classification. In UAI.

  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  • Li, Z., & Liu, J. (2009). Constrained clustering by spectral kernel learning. In ICCV, pp. 421–427.

  • Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. In KDD, pp. 661–670.

  • Liu, D., Zhang, P., & Zheng, Q. (2015). An efficient online active learning algorithm for binary classification. Pattern Recognition Letters, 68, 22–26.

  • Lu, N., Niu, G., Menon, A. K., & Sugiyama, M. (2019). On the minimal supervision for training any binary classifier from only unlabeled data. In ICLR.

  • Lu, N., Zhang, T., Niu, G., & Sugiyama, M. (2020). Mitigating overfitting in supervised classification from two unlabeled datasets: A consistent risk correction approach. In AISTATS, pp. 1115–1125.

  • Lu, D., & Weng, Q. (2007). A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing, 28(5), 823–870.

  • MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Berkeley symposium on mathematical statistics and probability, pp. 281–297.

  • Maheshwara, S. S., & Manwani, N. (2023). RoLNiP: Robust learning using noisy pairwise comparisons. In ACML, pp. 706–721.

  • Natarajan, N., Dhillon, I. S., Ravikumar, P. K., & Tewari, A. (2013). Learning with noisy labels. In NeurIPS, pp. 1196–1204.

  • du Plessis, M. C., Niu, G., & Sugiyama, M. (2015). Convex formulation for learning from positive and unlabeled data. In ICML, pp. 1386–1394.

  • Shalev-Shwartz, S. (2011). Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2), 107–194.

  • Shimada, T., Bao, H., Sato, I., & Sugiyama, M. (2020). Classification from pairwise similarities/dissimilarities and unlabeled data via empirical risk minimization. Neural Computation.

  • Shinoda, K., Kaji, H., & Sugiyama, M. (2020). Binary classification from positive data with skewed confidence. In IJCAI, pp. 3328–3334.

  • Tao, Q., Scott, S., Vinodchandran, N., & Osugi, T. T. (2004). SVM-based generalized multiple-instance learning via approximate box counting. In ICML, p. 101.

  • Wagstaff, K., Cardie, C., Rogers, S., & Schrödl, S. (2001). Constrained k-means clustering with background knowledge. In ICML, pp. 577–584.

  • Wang, H., Qiang, Y., Chen, C., Liu, W., Hu, T., Li, Z., & Chen, G. (2020). Online partial label learning. In ECML PKDD.

  • Wu, D.-D., Wang, D.-B., & Zhang, M.-L. (2022). Revisiting consistency regularization for deep partial label learning. In ICML, pp. 24212–24225.

  • Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.

  • Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of Data Science, 2(2), 165–193.

  • Zhang, C., Gong, C., Liu, T., Lu, X., Wang, W., & Yang, J. (2020). Online positive and unlabeled learning. In IJCAI, pp. 2248–2254.

Funding

This research is supported by the National Natural Science Foundation of China (No. 62106028), the Chongqing Overseas Chinese Entrepreneurship and Innovation Support Program, and the CAAI-Huawei MindSpore Open Fund.

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization: S-S; Methodology: S-S; Theoretical analysis: F-L, H-W; Writing - original draft preparation: S-S, F-L; Writing - review and editing: S-S, Z-W; Funding acquisition: B-H, T-X, B-A, F-L.

Corresponding author

Correspondence to Lei Feng.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Editors: Vu Nguyen and Dana Yogatama.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Proof of Theorem 1

As we have shown, our proposed OSD-OGD algorithm employs the following update rule:

$$\begin{aligned} \varvec{w}_{t+1} = \mathop {\textrm{arg}\,\textrm{min}}\limits _{\varvec{w} \in {\mathcal {W}}}\sum \nolimits _{i=1}^t R_{i}^{\textrm{SD}}(\varvec{w}) + \frac{1}{2\gamma }\left\| \varvec{w}\right\| _2^2, \end{aligned}$$

which is exactly the follow-the-regularized-leader procedure with Euclidean regularization (Shalev-Shwartz, 2011). As can be easily verified, the Euclidean regularizer \(\frac{1}{2\gamma } \left\| \varvec{w}\right\| _2^2\) is \(\frac{1}{\gamma }\)-strongly convex with respect to \(\left\| \cdot \right\| _2\). Recall the assumptions that \({\mathcal {L}}_{t}\) is \(\rho _t\)-Lipschitz with respect to \(\left\| \cdot \right\| _2\) and that \(\frac{1}{T}\sum \nolimits _{t=1}^T\rho _t^2\le \rho ^2\). Then, by Theorem 2.11 in Shalev-Shwartz (2011), we have, for any \(\varvec{w}_{\star }\in {\mathcal {W}}\),

$$\begin{aligned} \sum \nolimits _{t=1}^T {\mathcal {L}}_{t}(\varvec{w}_t) -\sum \nolimits _{t=1}^T {\mathcal {L}}_{t}(\varvec{w}_{\star })&\le \frac{1}{2\gamma }(\Vert \varvec{w}_{\star }\Vert _{2}^{2} -\min _{\varvec{v}\in {\mathcal {W}}}\Vert \varvec{v}\Vert _{2}^{2}) + \gamma T \rho ^{2} \le \frac{1}{2\gamma }\Vert \varvec{w}_{\star } \Vert _{2}^{2} + \gamma T \rho ^{2}, \end{aligned}$$

because \(\min _{\varvec{v}\in {\mathcal {W}}}\Vert \varvec{v}\Vert _{2}^{2}\ge 0\) always holds. In particular, if every hypothesis \(\varvec{w}\in {\mathcal {W}}\) satisfies \(\left\| \varvec{w}\right\| _2\le B\) and we set \(\gamma =B/(\rho \sqrt{2T})\), which balances the two terms of the bound, we have

$$\begin{aligned} \sum \nolimits _{t=1}^T {\mathcal {L}}_{t}(\varvec{w}_t) -\sum \nolimits _{t=1}^T {\mathcal {L}}_{t}(\varvec{w}_{\star }) \le B\rho \sqrt{2T}, \end{aligned}$$

which completes the proof of Theorem 1.
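
As a quick sanity check of the final step (ours, not part of the original proof), substituting \(\gamma =B/(\rho \sqrt{2T})\) makes the two terms of the bound equal:

$$\begin{aligned} \frac{1}{2\gamma }\Vert \varvec{w}_{\star }\Vert _{2}^{2} + \gamma T \rho ^{2} \le \frac{\rho \sqrt{2T}}{2B}\,B^{2} + \frac{B}{\rho \sqrt{2T}}\,T\rho ^{2} = \frac{B\rho \sqrt{2T}}{2} + \frac{B\rho \sqrt{2T}}{2} = B\rho \sqrt{2T}. \end{aligned}$$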

Appendix 2: Proof of Theorem 2

Following Crammer et al. (2006), for any \(\varvec{v}\in {\mathcal {W}}\), we define

$$\begin{aligned} \Delta _{t} = \Vert \varvec{w}_{t} - \varvec{v}\Vert ^{2}_{2} - \Vert \varvec{w}_{t+1} - \varvec{v}\Vert ^{2}_{2} \end{aligned}$$

and consider upper and lower bounds of \(\sum \nolimits _{t=1}^{T}\Delta _{t}\). By initializing \(\varvec{w}_{1}\) to the zero vector and using a telescoping sum, we can obtain

$$\begin{aligned} \sum \nolimits _{t=1}^{T}\Delta _{t}&= \sum \nolimits _{t=1}^{T}(\Vert \varvec{w}_{t} - \varvec{v}\Vert ^{2}_{2} - \Vert \varvec{w}_{t+1} - \varvec{v}\Vert ^{2}_{2}) \\&= \Vert \varvec{w}_{1} - \varvec{v}\Vert ^{2}_{2} - \Vert \varvec{w}_{T+1} - \varvec{v}\Vert ^{2}_{2}\\&\le \Vert \varvec{v}\Vert ^{2}_{2}. \end{aligned}$$

Since \(\varvec{w}_{t+1} =\varvec{w}_{t} - \tau _{t}\nabla R_{t}^{\textrm{SD}}(\varvec{w}_{t})\), we can obtain

$$\begin{aligned} \Delta _{t}&= \Vert \varvec{w}_{t} - \varvec{v}\Vert ^{2}_{2} - \Vert \varvec{w}_{t} - \tau _{t}\nabla R_{t}^{\textrm{SD}}(\varvec{w}_{t}) - \varvec{v}\Vert ^{2}_{2} \\&= 2\tau _{t}(\varvec{w}_{t} - \varvec{v})^{\top }\nabla R_{t}^{\textrm{SD}}(\varvec{w}_{t}) - \tau _{t}^{2}\Vert \nabla R_{t}^{\textrm{SD}}(\varvec{w}_{t})\Vert ^{2}_{2}. \end{aligned}$$

Since \(R_{t}^{\textrm{SD}}(\varvec{w})\) is \(\lambda\)-strongly convex, we have

$$\begin{aligned} R_{t}^{\textrm{SD}}(\varvec{v})- R_{t}^{\textrm{SD}}(\varvec{w}_{t}) \ge (\varvec{v} - \varvec{w}_{t})^{\top }\nabla R_{t}^{\textrm{SD}}(\varvec{w}_{t}) + \frac{\lambda }{2} \Vert \varvec{v} - \varvec{w}_{t}\Vert ^{2}_{2}. \end{aligned}$$

Combining the above inequalities, we have

$$\begin{aligned} \Vert \varvec{v}\Vert ^{2}_{2} \ge \sum \nolimits _{t=1}^{T}\tau _{t}\left( 2R_{t}^{\textrm{SD}} (\varvec{w}_{t})-\tau _{t}\Vert \nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t})\Vert ^{2}_{2} -2R_{t}^{\textrm{SD}} (\varvec{v})\right) . \end{aligned}$$
(14)

For OSD-LSPA, if a prediction mistake occurs, then \(R_{t}^{\textrm{SD}} (\varvec{w}_{t}) \ge G\) and \(R_{t}^{\textrm{SD}} (\varvec{w}_{t})-\tau _{t}\Vert \nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t})\Vert ^{2}_{2}\ge 0\). Therefore, we can obtain

$$\begin{aligned} \sum \nolimits _{t=1}^{T}\tau _{t}R_{t}^{\textrm{SD}} (\varvec{w}_{t})\le \Vert \varvec{v}\Vert ^{2}_{2} +2C\sum \nolimits _{t=1}^{T}R_{t}^{\textrm{SD}}(\varvec{v}). \end{aligned}$$
(15)

Using our assumption that \(\Vert \nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t})\Vert ^{2}_{2}\le r^2\) and the definitions \(\tau _{t} = \min \left( C, \max \left( 0,\frac{A+ \varvec{w}_{t}^{\top }\nabla R_{t}^{\textrm{SD}}(\varvec{w}_{t})}{\Vert \nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t})\Vert ^{2}_{2}}\right) \right)\) and \(R_{t}^{\textrm{SD}} (\varvec{w}_{t}) = A + \varvec{w}_{t}^{\top }\nabla R_{t}^{\textrm{SD}}(\varvec{w}_{t})\), we conclude that if a prediction mistake occurs, then it holds that

$$\begin{aligned} \min \left( CG, \frac{G^2}{r^2}\right) \le \tau _{t}R_{t}^{\textrm{SD}} (\varvec{w}_{t}). \end{aligned}$$

Since \(\sum \nolimits _{t=1}^{T}E_{t}(\varvec{w}_{t})\) denotes the number of prediction mistakes made on the entire sequence, it holds that

$$\begin{aligned} \min \left( CG, \frac{G^2}{r^2}\right) \sum \nolimits _{t=1}^{T}E_{t} (\varvec{w}_{t}) \le \sum \nolimits _{t=1}^{T} \tau _{t}R_{t}^{\textrm{SD}}(\varvec{w}_{t}). \end{aligned}$$
(16)

Combining Eq. (15) with Eq. (16), we conclude that

$$\begin{aligned} \sum \nolimits _{t=1}^{T}E_{t}(\varvec{w}_{t}) \le \max \left( \frac{1}{CG}, \frac{r^2}{G^2}\right) \left( \Vert \varvec{v}\Vert ^{2}_{2} + 2C\sum \nolimits _{t=1}^{T}R_{t}^{\textrm{SD}}(\varvec{v})\right) , \end{aligned}$$

which completes the proof of Theorem 2.
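
The pivotal step above, \(\min (CG, G^2/r^2)\le \tau _{t}R_{t}^{\textrm{SD}}(\varvec{w}_{t})\) on mistake rounds, follows because on such rounds \(\tau _{t}R_{t}^{\textrm{SD}}(\varvec{w}_{t})=\min \bigl (C R_{t}^{\textrm{SD}}(\varvec{w}_{t}),\, (R_{t}^{\textrm{SD}}(\varvec{w}_{t}))^2/\Vert \nabla R_{t}^{\textrm{SD}}(\varvec{w}_{t})\Vert _2^2\bigr )\). A small numerical sanity check (our own script with arbitrary constants, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
C, G, r = 0.5, 1.0, 3.0            # arbitrary constants for the check
lower = min(C * G, G**2 / r**2)
for _ in range(100_000):
    R = G + rng.exponential()      # a mistake round: R_t^SD(w_t) >= G
    g2 = rng.uniform(1e-3, r**2)   # squared gradient norm, at most r^2
    tau = min(C, R / g2)           # linearly scaled (capped) step size
    assert lower <= tau * R + 1e-12
print("min(CG, G^2/r^2) <= tau_t * R_t^SD(w_t) held on all sampled rounds")
```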

Appendix 3: Proof of Theorem 3

Recalling Eq. (14), we have

$$\begin{aligned} \Vert \varvec{v}\Vert ^{2}_{2}&\ge \sum \nolimits _{t=1}^{T}\tau _{t}\left( 2R_{t}^{\textrm{SD}} (\varvec{w}_{t})-\tau _{t}\Vert \nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t})\Vert ^{2}_{2} -2R_{t}^{\textrm{SD}} (\varvec{v})\right) . \end{aligned}$$

Defining \(\alpha =1/\sqrt{2C}\), we subtract the non-negative term \((\alpha \tau _{t} - R_{t}^{\textrm{SD}}(\varvec{v})/\alpha )^{2}\) from each summand on the right-hand side of the above inequality to obtain

$$\begin{aligned} \Vert \varvec{v}\Vert ^{2}_{2}&\ge \sum \nolimits _{t=1}^{T}\left( 2\tau _{t}R_{t}^{\textrm{SD}} (\varvec{w}_{t})-\tau _{t}^{2}\Vert \nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t})\Vert ^{2}_{2} -2\tau _{t}R_{t}^{\textrm{SD}} (\varvec{v}) - (\alpha \tau _{t} - R_{t}^{\textrm{SD}} (\varvec{v})/\alpha )^{2}\right) \\&= \sum \nolimits _{t=1}^{T}\left( 2\tau _{t}R_{t}^{\textrm{SD}} (\varvec{w}_{t})-\tau _{t}^{2}\Vert \nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t})\Vert ^{2}_{2}-2\tau _{t}R_{t}^{\textrm{SD}} (\varvec{v})- (\alpha \tau _{t})^{2}\right. \\&\qquad \qquad \qquad \left. - \left( \frac{R_{t}^{\textrm{SD}} (\varvec{v})}{\alpha }\right) ^{2} + 2\tau _{t}R_{t}^{\textrm{SD}}(\varvec{v})\right) \\&= \sum \nolimits _{t=1}^{T}\left( 2\tau _{t}R_{t}^{\textrm{SD}} (\varvec{w}_{t})-\tau _{t}^{2}\left( \Vert \nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t})\Vert ^{2}_{2}+ \frac{1}{2C}\right) - 2C (R_{t}^{\textrm{SD}}(\varvec{v}))^{2}\right) . \end{aligned}$$

Now we use the definitions \(\tau _{t}=\max \left( 0, \frac{2C(A+\varvec{w}_{t}^{\top }\nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t}))}{2C\Vert \nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t})\Vert ^{2}_{2} +1}\right)\) and \(R_{t}^{\textrm{SD}} (\varvec{w}_{t}) = \varvec{w}_{t}^{\top }\nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t}) +A\). When \(R_{t}^{\textrm{SD}} (\varvec{w}_{t})\le 0\), the classifier is not updated, so we consider the case where \(R_{t}^{\textrm{SD}}(\varvec{w}_{t})\ge 0\). Together with the assumption \(\Vert \nabla R_{t}^{\textrm{SD}} (\varvec{w}_{t})\Vert ^{2}_{2}\le r^2\), we obtain

$$\begin{aligned} \Vert \varvec{v}\Vert ^{2}_{2} \ge \sum \nolimits _{t=1}^{T} \left( \frac{(R_{t}^{\textrm{SD}}(\varvec{w}_{t}))^2}{r^2 +\frac{1}{2C}} -2C (R_{t}^{\textrm{SD}}(\varvec{v}))^{2}\right) . \end{aligned}$$

Rearranging the terms above, we obtain

$$\begin{aligned} \sum \nolimits _{t=1}^{T}(R_{t}^{\textrm{SD}}(\varvec{w}_{t}))^2 \le \left( r^2 +\frac{1}{2C}\right) \left( \Vert \varvec{v}\Vert ^{2}_{2} + 2C \sum \nolimits _{t=1}^{T}(R_{t}^{\textrm{SD}}(\varvec{v}))^{2}\right) , \end{aligned}$$

which completes the proof of Theorem 3.
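
The completing-the-square step in the first display above can be verified symbolically. A short check (ours, not part of the proof), writing \(L_w=R_{t}^{\textrm{SD}}(\varvec{w}_{t})\), \(L_v=R_{t}^{\textrm{SD}}(\varvec{v})\), and \(g^2=\Vert \nabla R_{t}^{\textrm{SD}}(\varvec{w}_{t})\Vert _2^2\):

```python
import sympy as sp

tau, Lw, Lv, g2, C = sp.symbols('tau L_w L_v g2 C', positive=True)
alpha = 1 / sp.sqrt(2 * C)
# Summand after subtracting the non-negative completed square ...
lhs = 2*tau*Lw - tau**2*g2 - 2*tau*Lv - (alpha*tau - Lv/alpha)**2
# ... equals the final form used in the proof.
rhs = 2*tau*Lw - tau**2*(g2 + 1/(2*C)) - 2*C*Lv**2
print(sp.simplify(lhs - rhs))  # prints 0
```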

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shu, S., Wang, H., Wang, Z. et al. Online binary classification from similar and dissimilar data. Mach Learn (2023). https://doi.org/10.1007/s10994-023-06434-6
