Second Order Training of a Smoothed Piecewise Linear Network

Article

Abstract

In this paper, we introduce a smoothed piecewise linear network (SPLN) and develop second order training algorithms for it. First, we develop an embedded feature selection algorithm that minimizes training error with respect to distance measure weights. Second, we present a method that adjusts the SPLN's center vector locations. Finally, we present a gradient method for optimizing the SPLN output weights. Results on several data sets show that distance measure optimization, center vector optimization, and output weight optimization, both individually and in combination, reduce testing errors in the final network.
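
To make the structure of these three optimizations concrete, the following is a minimal sketch of one common SPLN-style formulation, assuming the network output is a distance-gated combination of local linear models. The soft-max gate, the smoothing parameter `beta`, and all function and variable names are illustrative assumptions rather than the paper's exact definitions, and the output-weight step shown here is a plain linear least-squares solve, not the paper's second order method.

```python
# Illustrative sketch of a smoothed piecewise linear network (SPLN):
# a soft, distance-based gate blends K local linear models.
# The Gaussian-like gate and all names are assumptions for illustration.
import numpy as np

def spln_forward(X, centers, dist_weights, W, beta=1.0):
    """Evaluate the sketch SPLN.

    X            : (N, n) input patterns
    centers      : (K, n) cluster center vectors
    dist_weights : (n,)   per-feature distance-measure weights
    W            : (K, n + 1) local linear model coefficients (bias last)
    beta         : assumed gate-sharpness (smoothing) parameter
    """
    # Weighted squared distances d_k(x) = sum_i w_i (x_i - m_{k,i})^2
    diff = X[:, None, :] - centers[None, :, :]          # (N, K, n)
    d = np.einsum('nki,i->nk', diff**2, dist_weights)   # (N, K)
    # Smooth gating: soft-max over negative weighted distances
    g = np.exp(-beta * (d - d.min(axis=1, keepdims=True)))
    g /= g.sum(axis=1, keepdims=True)                   # (N, K)
    # Local linear model outputs, blended by the gate
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])       # augment with bias
    local = Xa @ W.T                                    # (N, K)
    return np.sum(g * local, axis=1)                    # (N,)

def fit_output_weights(X, y, centers, dist_weights, beta=1.0):
    """Fit the local (output) weights by linear least squares,
    holding the centers and distance-measure weights fixed."""
    N, n = X.shape
    K = centers.shape[0]
    diff = X[:, None, :] - centers[None, :, :]
    d = np.einsum('nki,i->nk', diff**2, dist_weights)
    g = np.exp(-beta * (d - d.min(axis=1, keepdims=True)))
    g /= g.sum(axis=1, keepdims=True)
    Xa = np.hstack([X, np.ones((N, 1))])
    # Each pattern contributes gate-weighted copies of its augmented
    # input to every local model's columns of the design matrix.
    A = (g[:, :, None] * Xa[:, None, :]).reshape(N, K * (n + 1))
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w.reshape(K, n + 1)
```

In this picture, the three optimizations described above act on `dist_weights` (embedded feature selection), `centers` (center vector adjustment), and `W` (output weights), respectively; the paper applies second order and gradient methods to these quantities rather than the closed-form solve sketched here.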

Keywords

Smoothed PLN · Embedded feature selection · Optimizing center vectors


Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. The University of Texas at Arlington, Arlington, USA