
Machine Learning, Volume 27, Issue 2, pp 173–200

Representing Probabilistic Rules with Networks of Gaussian Basis Functions

  • Volker Tresp
  • Jürgen Hollatz
  • Subutai Ahmad

Abstract

There is great interest in understanding the intrinsic knowledge neural networks have acquired during training. Most work in this direction has focused on the multi-layer perceptron architecture. The topic of this paper is networks of Gaussian basis functions, which are used extensively as learning systems in neural computation. We show that networks of Gaussian basis functions can be generated from simple probabilistic rules. Also, if appropriate learning rules are used, probabilistic rules can be extracted from trained networks. We present methods for reducing network complexity with the goal of obtaining concise and meaningful rules. We show how prior knowledge can be refined or supplemented using data, either by a Bayesian approach, by a weighted combination of knowledge bases, or by generating artificial training data that represents the prior knowledge. We validate our approach on a standard statistical data set.

Keywords: neural networks, theory refinement, knowledge-based neural networks, probability density estimation, knowledge extraction, mixture densities, combining knowledge bases, Bayesian learning
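The abstract's central construction, reading each component of a Gaussian mixture as a probabilistic rule and the conditional mean as a normalized network of Gaussian basis functions, can be illustrated in a few lines of code. The sketch below is not the authors' implementation: it fits an isotropic mixture with a plain EM loop on toy data, and all names (fit_gmm_em, conditional_mean) and modeling choices (isotropic widths, a single output column) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed, not the paper's code): fit a Gaussian mixture to
# joint (x, y) data with EM, read each component as a probabilistic rule,
# and evaluate E[y | x] as a normalized Gaussian basis function network.
rng = np.random.default_rng(0)

def fit_gmm_em(data, n_components, n_iter=100):
    """Fit an isotropic Gaussian mixture to `data` (N x D) with EM."""
    n, d = data.shape
    # Initialize: centers on random data points, equal priors, common width.
    centers = data[rng.choice(n, n_components, replace=False)]
    widths = np.full(n_components, data.std())
    priors = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(component k | data point i).
        sq_dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        log_p = (np.log(priors)
                 - 0.5 * d * np.log(2 * np.pi * widths ** 2)
                 - sq_dist / (2 * widths ** 2))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors, centers, and (isotropic) widths.
        nk = r.sum(axis=0) + 1e-12
        priors = nk / n
        centers = (r.T @ data) / nk[:, None]
        sq_dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        widths = np.sqrt((r * sq_dist).sum(axis=0) / (d * nk)) + 1e-6
    return priors, centers, widths

def conditional_mean(x, priors, centers, widths):
    """E[y | x]: a normalized Gaussian basis function network whose output
    weights are the components' y-centers (last column of `centers`)."""
    mu_x, mu_y = centers[:, :-1], centers[:, -1]
    d_x = mu_x.shape[1]
    act = (priors * (2 * np.pi * widths ** 2) ** (-d_x / 2)
           * np.exp(-((x - mu_x) ** 2).sum(-1) / (2 * widths ** 2)))
    return (act * mu_y).sum() / act.sum()

# Toy joint data: y = sin(x) + noise, stacked as columns (x, y).
x = np.linspace(-3, 3, 400)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
priors, centers, widths = fit_gmm_em(np.column_stack([x, y]), n_components=5)

# Each fitted component reads as a probabilistic rule; its prior is the weight.
for p, c, s in zip(priors, centers, widths):
    print(f"IF x ~ {c[0]:+.2f} (width {s:.2f}) THEN y ~ {c[1]:+.2f}   [weight {p:.2f}]")
print("E[y | x = 1.0] =", conditional_mean(np.array([1.0]), priors, centers, widths))
```

Printing the fitted components as IF-THEN statements is the simplest form of the rule extraction described above; the paper additionally reduces network complexity so that the extracted rules remain concise and meaningful.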


Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Volker Tresp (1)
  • Jürgen Hollatz (1)
  • Subutai Ahmad (2)

  1. Siemens AG, Central Research, München, Germany
  2. Interval Research Corporation, Palo Alto, CA, USA
