### Abstract

This work studies the problem of binary classification with the F-score as the performance measure. We propose a post-processing algorithm that fits a threshold for any score-based classifier so as to yield a high F-score. The post-processing step involves only unlabeled data and can be performed in logarithmic time. We derive a general finite-sample post-processing bound for the proposed procedure and show that the procedure is minimax rate optimal when the underlying distribution satisfies classical nonparametric assumptions. This result improves upon previously known rates for F-score classification and bridges the gap between the standard classification risk and the F-score. Finally, we discuss the generalization of this approach to set-valued classification.
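For concreteness, the population \(\mathrm{F}_b\)-score that is being optimized can be recalled in its standard form (this is the usual textbook definition for a classifier \(g\); the paper's exact notation may differ):

```latex
\mathrm{F}_b(g) \;=\; \frac{(1+b^{2})\,\mathbb{P}\bigl(Y=1,\, g(\mathbf{X})=1\bigr)}
                           {b^{2}\,\mathbb{P}(Y=1) \;+\; \mathbb{P}\bigl(g(\mathbf{X})=1\bigr)},
```

with the convention (see the Notes below) that the ratio equals zero whenever the denominator vanishes.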


## Notes

The (weighted) harmonic mean of any real number and zero is defined as zero.
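As a minimal illustration of this convention, an \(\mathrm{F}_b\)-score computed from precision and recall (a standard formulation, written here for illustration rather than taken verbatim from the paper) can guard the degenerate case explicitly:

```python
def f_beta(precision: float, recall: float, b: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (the F_b-score).

    Convention from the text: the (weighted) harmonic mean of any
    real number and zero is defined as zero, so a vanishing
    denominator yields an F_b-score of zero.
    """
    num = (1 + b**2) * precision * recall
    den = b**2 * precision + recall
    return 0.0 if den == 0 else num / den
```

For instance, `f_beta(0.5, 0.5)` returns `0.5`, while `f_beta(0.0, 0.0)` returns `0.0` by the convention above.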

It is assumed that \(\mathbf{X}_{n+1},\ldots,\mathbf{X}_{n+N}\) are independent of \((\mathbf{X}_{1},Y_{1}),\ldots,(\mathbf{X}_{n},Y_{n}),(\mathbf{X},Y)\).

For this discussion let us assume that \(n\) is even.

We only assumed that \(\mathbb{P}(Y=1)>0\), which is hardly an assumption in this context.

Again, it is assumed that the (weighted) harmonic mean of any real number and zero is zero.

Indeed, note that \(R(\theta)=H_{1}(\theta)+H_{2}(\theta)\), where \(H_{1}(\theta)=b^{2}\theta\) is strictly increasing and \(H_{2}(\theta)=-\sum_{k=1}^{K}\mathbb{E}(\eta_{k}(x)-\theta)_{+}\) is non-decreasing.
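Since \(R\) is strictly increasing, its root can be located by bisection, which is one way to read the abstract's logarithmic-time claim. The sketch below is only an illustration under stated assumptions (a hypothetical empirical analogue of \(R\) built from scores \(\hat\eta(x_i)\) on unlabeled points, the binary case \(K=1\), scores lying in \([0,1]\)); it is not the paper's exact estimator:

```python
import numpy as np

def fit_threshold(scores: np.ndarray, b: float = 1.0, tol: float = 1e-8) -> float:
    """Bisection for the root of R(t) = b^2 * t - mean((scores - t)_+).

    R is strictly increasing in t (a strictly increasing term plus a
    non-decreasing term, mirroring the decomposition in the note),
    with R(0) <= 0 and R(1) = b^2 > 0 for scores in [0, 1], so
    bisection on [0, 1] converges in logarithmically many steps.
    """
    def R(t: float) -> float:
        return b**2 * t - np.mean(np.maximum(scores - t, 0.0))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R(mid) < 0:
            lo = mid  # root lies to the right
        else:
            hi = mid  # root lies to the left
    return 0.5 * (lo + hi)
```

For example, with all scores equal to \(0.5\) and \(b=1\), the root of \(t-(0.5-t)_{+}\) is \(t=0.25\), which the bisection recovers to within the tolerance.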


## About this article

### Cite this article

Chzhen, E. Optimal Rates for Nonparametric F-Score Binary Classification via Post-Processing.
*Math. Meth. Stat.* **29**, 87–105 (2020). https://doi.org/10.3103/S1066530720020027


### Keywords:

- F-score
- universal consistency
- minimax rate
- margin assumption
- nonparametric classification