
Neural Computing & Applications, Volume 6, Issue 1, pp 19–41

Combining linear discriminant functions with neural networks for supervised learning

  • Ke Chen
  • Xiang Yu
  • Huisheng Chi

Abstract

A novel supervised learning method is proposed that combines linear discriminant functions with neural networks, resulting in a tree-structured hybrid architecture. Through constructive learning, a binary tree hierarchy is generated automatically by a controlled growing process for a specific supervised learning task. Unlike the classic decision tree, the proposed method employs linear discriminant functions only at the intermediate levels of the tree, to heuristically partition a large, complicated task into several smaller, simpler subtasks. These subtasks are then handled by component neural networks at the leaves of the tree. Growing and credit-assignment algorithms are developed to support constructive learning in the hybrid architecture. The proposed architecture provides an efficient way to apply existing neural networks (e.g. the multi-layered perceptron) to large-scale problems. We have applied the proposed method to a universal approximation problem and to several benchmark classification problems in order to evaluate its performance. Simulation results show that the proposed method yields better results and faster training than the multi-layered perceptron.
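The divide-and-conquer idea in the abstract can be sketched as a toy program: internal nodes apply a linear discriminant to route samples, and each leaf trains a small component network on its subtask only. This is a minimal illustrative sketch, not the authors' algorithm; the function names (`grow`, `train_leaf`, `predict`), the axis-aligned median split standing in for the paper's learned discriminants, and the single-sigmoid-unit "component network" are all assumptions made here for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_leaf(X, y, epochs=500, lr=0.5):
    """Fit a single sigmoid unit (a minimal stand-in for a component network)
    on one subtask by gradient descent on the cross-entropy loss."""
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        g = p - y                          # gradient of the cross-entropy loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return ("leaf", (w, b))

def grow(X, y, depth=0, max_depth=2, min_size=8):
    """Recursively partition the task with linear splits; small or pure
    subsets become leaves with their own trained component network."""
    if depth == max_depth or len(y) < min_size or len(set(y)) == 1:
        return train_leaf(X, y)
    # Crude linear discriminant: split the highest-variance feature at its
    # median (an axis-aligned hyperplane w.x > t).
    j = int(np.argmax(X.var(axis=0)))
    w = np.zeros(X.shape[1]); w[j] = 1.0
    t = float(np.median(X[:, j]))
    mask = X @ w > t
    if mask.all() or (~mask).all():        # degenerate split: fall back to a leaf
        return train_leaf(X, y)
    return ("node", w, t,
            grow(X[~mask], y[~mask], depth + 1, max_depth, min_size),
            grow(X[mask],  y[mask],  depth + 1, max_depth, min_size))

def predict(tree, x):
    """Route a sample down the tree, then threshold the leaf network's output."""
    while tree[0] == "node":
        _, w, t, left, right = tree
        tree = right if x @ w > t else left
    _, (w, b) = tree
    return int(sigmoid(x @ w + b) > 0.5)

# Toy XOR-of-blobs task: no single hyperplane separates the classes, but each
# half-space subtask produced by one split is linearly separable.
rng = np.random.default_rng(0)
centers = [((0, 0), 0), ((4, 4), 0), ((4, 0), 1), ((0, 4), 1)]
X = np.vstack([rng.normal(c, 0.7, size=(50, 2)) for c, _ in centers])
y = np.repeat([lab for _, lab in centers], 50)
tree = grow(X, y)
acc = np.mean([predict(tree, xi) == ti for xi, ti in zip(X, y)])
```

The point of the sketch is the division of labour: the splits need not classify anything themselves, they only have to carve the task into pieces that the simple leaf networks can handle.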

Keywords

Constructive learning · Divide-and-conquer · Linear discriminant function · Modular and hierarchical architecture · Multi-layered perceptron · Supervised learning



Copyright information

© Springer-Verlag London Limited 1997

Authors and Affiliations

  1. National Laboratory of Machine Perception and Center for Information Science, Peking University, Beijing, China
  2. Department of Computer and Information Science and The Center for Cognitive Science, The Ohio State University, Columbus, USA
