v-soft margin multi-task learning logistic regression

  • Chengquan Huang (corresponding author)
  • Shitong Wang
  • Xingguang Pan
  • Anqi Bi
Original Article


Coordinate descent (CD) is an effective method for large-scale classification problems, offering simple per-step operations and fast convergence. In this paper, inspired by the v-soft margin support vector machine and the multi-task learning support vector machine for classification, a novel v-soft margin multi-task learning logistic regression (v-SMMTL-LR) is proposed for pattern classification to improve the generalization performance of logistic regression (LR). The dual of v-SMMTL-LR can be viewed as a dual coordinate descent (CDdual) problem with an equality constraint, and on this basis a large-scale classification method named v-SMMTL-LR-CDdual is developed. The proposed v-SMMTL-LR-CDdual maximizes the between-class margin and effectively improves the generalization performance of LR in large-scale multi-task learning scenarios. Experimental results show that v-SMMTL-LR-CDdual is effective on large-scale or comparatively high-dimensional multi-task datasets and is competitive with related single-task and multi-task learning algorithms.
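To illustrate the coordinate-descent idea the abstract builds on, the sketch below applies cyclic coordinate descent to plain L2-regularized logistic regression in the primal. This is only a minimal single-task illustration under assumed settings, not the paper's v-SMMTL-LR-CDdual algorithm (which operates on a dual problem with an equality constraint); the function name `cd_logreg` and all parameters are hypothetical.

```python
import math

def cd_logreg(X, y, lam=1.0, n_epochs=300):
    """Cyclic coordinate descent for L2-regularized logistic regression.

    Minimizes  0.5*lam*||w||^2 + sum_i log(1 + exp(-y_i * <x_i, w>)).
    Each coordinate j takes the step g_j / H_j, where
    H_j = lam + 0.25 * sum_i x_ij^2 upper-bounds the coordinate-wise
    curvature (sigma'(z) <= 1/4), which guarantees monotone descent.
    The margins z_i = y_i * <x_i, w> are maintained incrementally,
    as in large-scale CD solvers, so one sweep costs O(n*d).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    z = [0.0] * n                                    # z[i] = y[i] * <x_i, w>
    H = [lam + 0.25 * sum(X[i][j] ** 2 for i in range(n)) for j in range(d)]
    for _ in range(n_epochs):
        for j in range(d):
            # gradient of the objective with respect to w_j
            g = lam * w[j]
            for i in range(n):
                g -= y[i] * X[i][j] / (1.0 + math.exp(z[i]))
            step = g / H[j]
            w[j] -= step
            for i in range(n):                       # incremental margin update
                z[i] -= step * y[i] * X[i][j]
    return w
```

The incremental update of the margins `z` is the detail that makes coordinate descent cheap at scale: each coordinate step touches only one feature column instead of recomputing all inner products.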


Keywords: Logistic regression · Multi-task learning · Coordinate descent · Dual coordinate descent



This work was supported in part by the National Natural Science Foundation of China under Grants (61272210, 61572236), the Fundamental Research Funds for the Central Universities (JUDCF13030, JUSRP51614A), 2013 Postgraduate Student’s Creative Research Fund of Jiangsu Province under Grant CXZZ13_0760, and the Natural Science Foundation of Guizhou Province under Grant [2013]2136.



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Chengquan Huang (1, 2) (corresponding author)
  • Shitong Wang (1)
  • Xingguang Pan (1)
  • Anqi Bi (1)
  1. School of Digital Media, Jiangnan University, Wuxi, People's Republic of China
  2. Engineering Training Center, Guizhou Minzu University, Guiyang, People's Republic of China
