Neural Processing Letters, Volume 50, Issue 2, pp 1341–1360

Multi-step Training of a Generalized Linear Classifier

  • Kanishka Tyagi
  • Michael Manry


We propose a multi-step training method for designing generalized linear classifiers. First, an initial multi-class linear classifier is found through regression. Validation error is then minimized by pruning unnecessary inputs. Simultaneously, the desired outputs are improved via a method similar to the Ho-Kashyap rule. Next, the output discriminants are scaled so that they serve as the net functions of sigmoidal output units in a generalized linear classifier, which is trained via Newton’s algorithm. Performance gains are demonstrated at each step. On widely available datasets, the final network’s tenfold testing error is shown to be lower than that of several other linear and generalized linear classifiers reported in the literature.
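The first step described above, finding an initial multi-class linear classifier through regression, can be illustrated with a minimal sketch. It assumes one-hot desired outputs and an ordinary least-squares solve with a small ridge term added for numerical stability; the function names, the ridge value, and the synthetic three-class data are all hypothetical and are not taken from the paper. The later steps (input pruning, the Ho-Kashyap-like update of desired outputs, and Newton training of the sigmoidal output units) are not shown.

```python
import numpy as np

def fit_linear_classifier(X, labels, n_classes, ridge=1e-8):
    """Least-squares fit of one linear discriminant per class.

    Assumption: desired outputs are one-hot vectors, and a small ridge
    term keeps the normal equations well conditioned.
    """
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # append a constant bias input
    T = np.eye(n_classes)[labels]                   # one-hot desired outputs
    R = Xa.T @ Xa + ridge * np.eye(Xa.shape[1])     # (regularized) input autocorrelation
    C = Xa.T @ T                                    # input/target cross-correlation
    return np.linalg.solve(R, C)                    # weights, shape (n_inputs + 1, n_classes)

def predict(W, X):
    """Assign each pattern to the class with the largest discriminant."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xa @ W, axis=1)

# Hypothetical synthetic data: three well-separated Gaussian clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
labels = np.repeat(np.arange(3), 50)
X = centers[labels] + 0.5 * rng.standard_normal((150, 2))

W = fit_linear_classifier(X, labels, n_classes=3)
accuracy = np.mean(predict(W, X) == labels)
print(f"training accuracy: {accuracy:.3f}")
```

Solving the normal equations directly keeps the sketch short. Orthogonal least squares, which appears among the paper's keywords, is one alternative way to solve the same regression.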


Linear classifiers · Nonlinear functions · Pruning · Orthogonal least squares · Newton’s algorithm




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electrical Engineering, The University of Texas at Arlington, Arlington, USA
