Machine Learning, Volume 98, Issue 3, pp 435–454

Efficient \(F\) measure maximization via weighted maximum likelihood

  • Georgi Dimitroff
  • Georgi Georgiev
  • Laura Toloşi
  • Borislav Popov


Classification models trained by maximum likelihood do not necessarily attain the best \(F_\beta \)-measure that is achievable with the chosen parametrization for a user's choice of \(\beta \). In this work we link weighted maximum entropy training and the optimization of the expected \(F_\beta \)-measure by viewing both within a common multi-criteria optimization problem. As a result, each solution of the expected \(F_\beta \)-measure maximization can be realized as a weighted maximum likelihood solution within the maximum entropy model, a well-understood and well-behaved problem for which standard (off-the-shelf) gradient methods can be used. Based on this insight, we present an efficient algorithm that optimizes the expected \(F_\beta \)-measure using weighted maximum likelihood with dynamically adaptive weights.
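To make the link between class weights and the expected \(F_\beta \)-measure concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a binary logistic (maximum entropy) model, a small L2 penalty, and a common ratio-of-expected-counts approximation to the expected \(F_\beta \)-measure. A fixed grid over the positive-class weight stands in for the dynamically adaptive weights, whose update rule the abstract does not specify. All function names, the toy data, and the hyperparameters are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(theta, x):
    # Model probability p(y = 1 | x) under a binary logistic model.
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def fit_weighted_ml(X, y, w_pos, w_neg=1.0, lr=0.1, steps=500, l2=0.01):
    # Gradient ascent on the class-weighted log-likelihood.
    # The L2 penalty is an assumption added here to keep parameters bounded.
    n = len(X)
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [-l2 * t for t in theta]
        for xi, yi in zip(X, y):
            w = w_pos if yi == 1 else w_neg
            err = w * (yi - predict(theta, xi))
            for j, xij in enumerate(xi):
                grad[j] += err * xij / n
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


def expected_f_beta(theta, X, y, beta=1.0):
    # Approximation: replace TP and the predicted-positive count by their
    # expectations under the model (an assumed surrogate, not the paper's text).
    b2 = beta * beta
    probs = [predict(theta, xi) for xi in X]
    expected_tp = sum(p for p, yi in zip(probs, y) if yi == 1)
    return (1.0 + b2) * expected_tp / (b2 * sum(y) + sum(probs))


def search_weight(X, y, beta, grid=(0.5, 1.0, 2.0, 4.0, 8.0)):
    # Train one weighted model per candidate weight and keep the best scorer.
    scored = []
    for w in grid:
        theta = fit_weighted_ml(X, y, w_pos=w)
        scored.append((expected_f_beta(theta, X, y, beta), w, theta))
    return max(scored, key=lambda s: s[0])


# Toy data: a constant bias feature plus one real-valued feature.
X = [[1.0, v] for v in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 0.0, 1.0, 2.0)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
best_f, best_w, best_theta = search_weight(X, y, beta=2.0)
```

Each candidate weight requires only an ordinary weighted maximum likelihood fit, which is the property the abstract highlights. An adaptive scheme would replace the fixed grid with weights updated during training.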



Copyright information

© The Author(s) 2014

Authors and Affiliations

  • Georgi Dimitroff (1)
  • Georgi Georgiev (1)
  • Laura Toloşi (1)
  • Borislav Popov (1)

  1. Ontotext AD, Sofia, Bulgaria
