Multi-step Training of a Generalized Linear Classifier

Published in: Neural Processing Letters

Abstract

We propose a multi-step training method for designing generalized linear classifiers. First, an initial multi-class linear classifier is found through regression. Then validation error is minimized by pruning unnecessary inputs. Simultaneously, desired outputs are improved via a method similar to the Ho-Kashyap rule. Next, the output discriminants are scaled to be the net functions of sigmoidal output units in a generalized linear classifier, which is trained via Newton's algorithm. Performance gains are demonstrated at each step. Using widely available datasets, the final network's tenfold testing error is shown to be less than that of several other linear and generalized linear classifiers reported in the literature.
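The first step above, finding an initial multi-class linear classifier through regression, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the one-hot target coding and the ridge constant `lam` are assumptions.

```python
import numpy as np

def fit_linear_classifier(X, labels, n_classes, lam=1e-3):
    """Least-squares fit of a multi-class linear classifier.

    X: (N, n_features) input patterns; labels: (N,) integer class ids.
    Returns weights W of shape (n_features + 1, n_classes) so that
    discriminants = Xa @ W, where Xa is X with a bias column appended.
    """
    N = X.shape[0]
    Xa = np.hstack([X, np.ones((N, 1))])   # augment inputs with a bias column
    T = np.eye(n_classes)[labels]          # one-hot desired outputs
    # Ridge-regularized normal equations: (Xa'Xa + lam*I) W = Xa'T
    A = Xa.T @ Xa + lam * np.eye(Xa.shape[1])
    W = np.linalg.solve(A, Xa.T @ T)
    return W

def predict(W, X):
    """Assign each pattern to the class with the largest discriminant."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xa @ W, axis=1)
```

The later steps (input pruning, Ho-Kashyap-style target improvement, and Newton training of the sigmoidal output layer) then refine this initial solution.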

References

  1. Deng L, Li X (2013) Machine learning paradigms for speech recognition: an overview. IEEE Trans Audio Speech Lang Process 21(5):1060–1089

  2. Olsson F (2009) A literature survey of active machine learning in the context of natural language processing. Swedish Institute of Computer Science

  3. Rao A, Noushath S (2010) Subspace methods for face recognition. Comput Sci Rev 4(1):1–17

  4. Benenson R, Omran M, Hosang J, Schiele B (2014) Ten years of pedestrian detection, what have we learned? In: European conference on computer vision. Springer, Berlin, pp 613–627

  5. Zhang D, Zuo W, Yue F (2012) A comparative study of palmprint recognition algorithms. ACM Comput Surv (CSUR) 44(1):2

  6. KB Mujitha B, Ajil Jalal VV, Nishad K (2015) Analytics, machine learning & NLP: use in biosurveillance and public health practice. Online J Public Health Inform 7(1)

  7. Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Adv Artif Intell 2009:4

  8. Nguyen-Tuong D, Peters J (2011) Model learning for robot control: a survey. Cogn Process 12(4):319–340

  9. Kononenko I (2001) Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med 23(1):89–109

  10. Chen Y, Tsai FS, Chan KL (2008) Machine learning techniques for business blog search and mining. Expert Syst Appl 35(3):581–590

  11. Schwabacher M, Goebel K (2007) A survey of artificial intelligence for prognostics. In: AAAI fall symposium, pp 107–114

  12. Grimmer J (2015) We are all social scientists now: how big data, machine learning, and causal inference work together. PS Polit Sci Polit 48(01):80–83

  13. Tarca AL, Carey VJ, Chen X-W, Romero R, Drăghici S (2007) Machine learning and its applications to biology. PLoS Comput Biol 3(6):e116

  14. Schölkopf B, Tsuda K, Vert J-P (2004) Kernel methods in computational biology. MIT Press, Cambridge

  15. Chen H, Chung W, Xu JJ, Wang G, Qin Y, Chau M (2004) Crime data mining: a general framework and some examples. Computer 37(4):50–56

  16. Song X, Fan G, Rao M (2005) Automatic CRP mapping using nonparametric machine learning approaches. IEEE Trans Geosci Remote Sens 43(4):888–897

  17. Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley, New York

  18. Rumelhart DE, McClelland JL, PDP Research Group (1988) Parallel distributed processing, vol 1. MIT Press, Cambridge

  19. Gore R, Li J, Manry MT, Liu L-M, Yu C, Wei J (2005) Iterative design of neural network classifiers through regression. Int J Artif Intell Tools 14(01n02):281–301

  20. Ruck DW, Rogers SK, Kabrisky M, Oxley ME, Suter BW (1990) The multilayer perceptron as an approximation to a bayes optimal discriminant function. IEEE Trans Neural Netw 1(4):296–298

  21. Rousseeuw PJ, Leroy AM (2005) Robust regression and outlier detection, vol 589. Wiley, New York

  22. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York

  23. Fu K, Cheng D, Tu Y, Zhang L (2016) Credit card fraud detection using convolutional neural networks. In: International conference on neural information processing. Springer, Berlin, pp 483–490

  24. Murli D, Jami S, Jog D, Nath S (2015) Credit card fraud detection using neural networks. Int J Stud Res Technol Manag 2(2):84–88

  25. Kuruvilla J, Gunavathi K (2014) Lung cancer classification using neural networks for CT images. Comput Methods Programs Biomed 113(1):202–209

  26. Sonar D, Kulkarni U (2016) Lung cancer classification. Int J Comput Sci Eng 8:51

  27. Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C et al (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6):673

  28. Abbass HA (2002) An evolutionary artificial neural networks approach for breast cancer diagnosis. Artif Intell Med 25(3):265–281

  29. Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874

  30. Hsieh C-J, Chang K-W, Lin C-J, Keerthi SS, Sundararajan S (2008) A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th international conference on machine learning. ACM, pp 408–415

  31. Keerthi SS, Sundararajan S, Chang K-W, Hsieh C-J, Lin C-J (2008) A sequential dual method for large scale multi-class linear SVMs. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 408–416

  32. Rubin DB (1983) Iterative recursive least squares. In: Encyclopaedia of statistical sciences. Wiley, New York, pp 272–275

  33. Abu-Mostafa YS, Magdon-Ismail M, Lin H-T (2012) Learning from data, vol 4. AMLBook Singapore

  34. Maldonado F, Manry M, Kim T-H (2003) Finding optimal neural network basis function subsets using the schmidt procedure. In: Proceedings of the international joint conference on neural networks, vol 1. IEEE, pp 444–449

  35. Chen S, Cowan CF, Grant PM (1991) Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans Neural Netw 2(2):302–309

  36. Manry M, Chandrasekaran H, Hsieh C (2001) Signal processing applications of the multilayer perceptron. In: Hu YH, Hwang J-N (eds) Handbook on neural network signal processing. CRC Press

  37. Robinson MD, Manry MT (2013) Two-stage second order training in feedforward neural networks. In: FLAIRS conference

  38. Hagiwara M (1990) Novel backpropagation algorithm for reduction of hidden units and acceleration of convergence using artificial selection. In: IJCNN international joint conference on neural networks. IEEE, pp 625–630

  39. Levenberg K (1944) A method for the solution of certain non-linear problems in least squares. Q Appl Math 2(2):164–168

  40. Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. J Soc Ind Appl Math 11(2):431–441

  41. Wille J (1997) On the structure of the hessian matrix in feedforward networks and second derivative methods. In: International conference on neural networks, vol 3. IEEE, pp 1851–1855

  42. Ma C, Tang J (2008) The quadratic convergence of a smoothing Levenberg–Marquardt method for nonlinear complementarity problem. Appl Math Comput 197(2):566–581

  43. Ahookhosh M, Aragon FJ, Fleming RM, Vuong PT (2017) Local convergence of Levenberg–Marquardt methods under holder metric subregularity. arXiv preprint arXiv:1703.07461

  44. Platt J et al (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Class 10(3):61–74

  45. Gill PE, Murray W, Wright MH (1981) Practical optimization. Academic press

  46. Liano K (1996) Robust error measure for supervised neural network learning with outliers. IEEE Trans Neural Netw 7(1):246–250

  47. Hughes G (1968) On the mean accuracy of statistical pattern recognizers. IEEE Trans Inf Theory 14(1):55–63

  48. Vapnik V, Izmailov R (2015) V-matrix method of solving statistical inference problems. J Mach Learn Res 16:1683–1730

  49. Li J, Manry MT, Liu L-M, Yu C, Wei J (2004) Iterative improvement of neural classifiers. In: FLAIRS conference, pp 700–705

  50. Golub GH, Van Loan CF (2012) Matrix computations, vol 3. JHU Press, Baltimore

  51. Fukunaga K (2013) Introduction to statistical pattern recognition. Academic Press, London

  52. Liu L, Manry M, Amar F, Dawson M, Fung A (1994) Image classification in remote sensing using functional link neural networks. In: Proceedings of the IEEE southwest symposium on image analysis and interpretation. IEEE, pp 54–58

  53. Ho Y-C, Kashyap R (1965) An algorithm for linear inequalities and its applications. IEEE Trans Electron Comput 5:683–688

  54. Ho Y-C, Kashyap R (1966) A class of iterative procedures for linear inequalities. SIAM J Control 4(1):112–115

  55. Narasimha PL, Delashmit WH, Manry MT, Li J, Maldonado F (2008) An integrated growing–pruning method for feedforward network training. Neurocomputing 71(13):2831–2847

  56. Riedmiller M, Braun H (1992) Rprop: a fast adaptive learning algorithm. In: Proceedings of ISCIS VII. Universitat, Citeseer

  57. Yau H-C, Manry MT (1991) Iterative improvement of a nearest neighbor classifier. Neural Netw 4(4):517–524

  58. Bailey RR, Pettit EJ, Borochoff RT, Manry MT, Jiang X (1993) Automatic recognition of usgs land use/cover categories using statistical and neural network classifiers. In: Optical engineering and photonics in aerospace sensing, international society for optics and photonics, pp 185–195

  59. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  60. Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng AY (2011) Reading digits in natural images with unsupervised feature learning. In: NIPS workshop on deep learning and unsupervised feature learning, vol 2011, p 5

  61. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report

  62. Blackard JA, Dean DJ (1999) Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Comput Electron Agric 24(3):131–151

  63. Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397

  64. Mitchell TM (1997) Machine learning. McGraw-Hill, New York

  65. Pourreza-Shahri R, Saki F, Kehtarnavaz N, Leboulluec P, Liu H (2013) Classification of ex-vivo breast cancer positive margins measured by hyperspectral imaging. In: 20th IEEE International Conference on Image Processing (ICIP). IEEE, pp 1408–1412

Corresponding author

Correspondence to Kanishka Tyagi.

A Appendix

A.1 Datasets

To evaluate the performance of each improvement and of the proposed algorithm, we used several publicly available datasets. Table 5 lists their specifications. Note that all of the datasets used in our experiments have balanced classes.

Table 5 Specification of datasets

A.1.1 Gongtrn Dataset

The raw data consists of images of hand-printed numerals [57] collected from 3000 people by the Internal Revenue Service. We randomly chose 300 characters from each class to generate a 3000-character training dataset. Images are 32 by 24 binary matrices. An image scaling algorithm is used to remove size variation in the characters. The feature set contains 16 elements, and the 10 classes correspond to the 10 Arabic numerals.

A.1.2 Comf18 Dataset

The training data file is generated from segmented images [58]. Each segmented region is separately histogram-equalized to 20 levels. Then the joint probability density of pairs of pixels separated by a given distance in a given direction is estimated. We use 0, 90, 180, and 270 degrees for the directions and 1, 3, and 5 pixels for the separations. The density estimates are computed for each classification window. For each separation, the co-occurrences for the four directions are folded together to form a triangular matrix. From each of the resulting three matrices, six features are computed: angular second moment, contrast, entropy, correlation, and the sums of the main diagonal and the first off-diagonal. This results in 18 features for each classification window.
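A co-occurrence feature computation of this kind can be sketched as follows, for a single separation `d` in the horizontal direction. This is a simplified illustration: the folding of the four directions into a triangular matrix and the exact normalization are not reproduced here.

```python
import numpy as np

def cooccurrence_features(img, d=1, levels=20):
    """Co-occurrence features for horizontal pixel pairs at separation d.

    img: 2-D integer array with values in [0, levels).
    Returns (angular second moment, contrast, entropy).
    """
    C = np.zeros((levels, levels))
    left, right = img[:, :-d], img[:, d:]      # pixel pairs d columns apart
    for a, b in zip(left.ravel(), right.ravel()):
        C[a, b] += 1
    C /= C.sum()                               # joint probability estimate
    asm = np.sum(C ** 2)                       # angular second moment
    i, j = np.indices(C.shape)
    contrast = np.sum((i - j) ** 2 * C)
    nz = C[C > 0]
    entropy = -np.sum(nz * np.log(nz))
    return asm, contrast, entropy
```

Repeating this for each direction and separation, and adding the correlation and diagonal-sum features, yields the 18-feature vector described above.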

A.1.3 MNIST Dataset

The digits data used in this paper is taken from the MNIST dataset [59], which was constructed by modifying a subset of the much larger dataset produced by NIST (the National Institute of Standards and Technology). It comprises a training set of 60,000 examples and a test set of 10,000 examples. The original NIST data had binary (black or white) pixels. To create MNIST, these images were size-normalized to fit in a 20 × 20 pixel box while preserving their aspect ratio. As a consequence of the anti-aliasing used to change the resolution of the images, the resulting MNIST digits are grayscale. These images were then centered in a 28 × 28 box. This dataset is a classic within the machine learning community and has been studied extensively.
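The centering step of this preprocessing can be sketched as below. Note this is geometric centering only; the actual MNIST pipeline centers digits by their pixel center of mass, and the 20 × 20 size normalization with anti-aliasing is omitted.

```python
import numpy as np

def center_in_box(digit, box=28):
    """Place a small grayscale digit image at the center of a box x box
    canvas (geometric centering sketch; MNIST itself centers by the
    digit's center of mass)."""
    h, w = digit.shape
    canvas = np.zeros((box, box), dtype=digit.dtype)
    top, left = (box - h) // 2, (box - w) // 2
    canvas[top:top + h, left:left + w] = digit
    return canvas
```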

A.1.4 Google Street View Dataset

The Google Street View House Numbers (SVHN) dataset [60] is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirements for data preprocessing and formatting. It is similar in flavor to MNIST (e.g., the images are of small cropped digits), but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real-world problem: recognizing digits and numbers in natural scene images. SVHN is obtained from house numbers in Google Street View images.

A.1.5 CIFAR Dataset

The CIFAR-10 dataset [61] consists of 60,000 \(32\times 32\) colour images in 10 classes, with 6000 images per class. There are 50,000 training images and 10,000 test images. The dataset is divided into five training batches and one test batch, each with 10,000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
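Each CIFAR-10 batch stores every image as a flat row of 3072 values: the 1024 red pixel values in row-major order, then the 1024 green values, then the 1024 blue values. Unpacking one such row can be sketched as:

```python
import numpy as np

def batch_row_to_image(row):
    """Convert one 3072-value CIFAR-10 batch row into a 32x32x3 image.

    The row stores the red, green, and blue channel planes consecutively,
    each in row-major order, so we reshape to (channel, row, col) and
    move the channel axis last.
    """
    return row.reshape(3, 32, 32).transpose(1, 2, 0)
```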

A.1.6 COVER

This dataset [62] contains the forest cover type for each observation (a \(30\times 30\) meter cell), as determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from the US Geological Survey (USGS) and USFS. The data are in raw (unscaled) form and contain binary (0 or 1) columns for the qualitative independent variables (wilderness areas and soil types).

A.1.7 RCV1

Reuters Corpus Volume I (RCV1) [63] is an archive of over 800,000 manually categorized newswire stories made available by Reuters, Ltd. for research purposes. The dataset is described in detail in [63].

A.1.8 NEWS-20

The 20 Newsgroups dataset [64] is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.

A.1.9 Breast Cancer

The breast cancer dataset [65] consists of samples with 989 features each, reduced to 42 features using principal component analysis.
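The dimensionality reduction described here can be sketched via the SVD. This is an illustration only; the centering and scaling choices used by the authors are assumptions.

```python
import numpy as np

def pca_reduce(X, k=42):
    """Project X (N samples x D features) onto its first k principal
    components, as in the appendix's 989 -> 42 feature reduction."""
    Xc = X - X.mean(axis=0)                  # center each feature
    # Rows of Vt are the principal directions, ordered by singular value
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # scores on the top k components
```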

Cite this article

Tyagi, K., Manry, M. Multi-step Training of a Generalized Linear Classifier. Neural Process Lett 50, 1341–1360 (2019). https://doi.org/10.1007/s11063-018-9915-4
