Second Order Training of a Smoothed Piecewise Linear Network

Article

Abstract

In this paper, we introduce a smoothed piecewise linear network (SPLN) and develop second order training algorithms for it. First, we develop an embedded feature selection algorithm that minimizes training error with respect to distance measure weights. Second, we present a method that adjusts the SPLN's center vector locations. Finally, we present a gradient method for optimizing the SPLN output weights. Results on several data sets show that distance measure optimization, center vector optimization, and output weight optimization, both individually and in combination, reduce testing errors in the final network.
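
To make the structure of these three optimizations concrete, the following is a minimal sketch of one common SPLN-style formulation, assuming the network output is a distance-gated combination of local linear models. The soft-max gate, the smoothing parameter `beta`, and all function and variable names are illustrative assumptions rather than the paper's exact definitions, and the output-weight step shown here is a plain linear least-squares solve, not the paper's second order method.

```python
# Illustrative sketch of a smoothed piecewise linear network (SPLN):
# a soft, distance-based gate blends K local linear models.
# The Gaussian-like gate and all names are assumptions for illustration.
import numpy as np

def spln_forward(X, centers, dist_weights, W, beta=1.0):
    """Evaluate the sketch SPLN.

    X            : (N, n) input patterns
    centers      : (K, n) cluster center vectors
    dist_weights : (n,)   per-feature distance-measure weights
    W            : (K, n + 1) local linear model coefficients (bias last)
    beta         : assumed gate-sharpness (smoothing) parameter
    """
    # Weighted squared distances d_k(x) = sum_i w_i (x_i - m_{k,i})^2
    diff = X[:, None, :] - centers[None, :, :]          # (N, K, n)
    d = np.einsum('nki,i->nk', diff**2, dist_weights)   # (N, K)
    # Smooth gating: soft-max over negative weighted distances
    g = np.exp(-beta * (d - d.min(axis=1, keepdims=True)))
    g /= g.sum(axis=1, keepdims=True)                   # (N, K)
    # Local linear model outputs, blended by the gate
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])       # augment with bias
    local = Xa @ W.T                                    # (N, K)
    return np.sum(g * local, axis=1)                    # (N,)

def fit_output_weights(X, y, centers, dist_weights, beta=1.0):
    """Fit the local (output) weights by linear least squares,
    holding the centers and distance-measure weights fixed."""
    N, n = X.shape
    K = centers.shape[0]
    diff = X[:, None, :] - centers[None, :, :]
    d = np.einsum('nki,i->nk', diff**2, dist_weights)
    g = np.exp(-beta * (d - d.min(axis=1, keepdims=True)))
    g /= g.sum(axis=1, keepdims=True)
    Xa = np.hstack([X, np.ones((N, 1))])
    # Each pattern contributes gate-weighted copies of its augmented
    # input to every local model's columns of the design matrix.
    A = (g[:, :, None] * Xa[:, None, :]).reshape(N, K * (n + 1))
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w.reshape(K, n + 1)
```

In this picture, the three optimizations described above act on `dist_weights` (embedded feature selection), `centers` (center vector adjustment), and `W` (output weights), respectively; the paper applies second order and gradient methods to these quantities rather than the closed-form solve sketched here.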

Keywords

Smoothed PLN · Embedded feature selection · Optimizing center vectors


Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. The University of Texas at Arlington, Arlington, USA