Representing Probabilistic Rules with Networks of Gaussian Basis Functions

Abstract

There is great interest in understanding the intrinsic knowledge neural networks acquire during training. Most work in this direction has focused on the multi-layer perceptron architecture. The topic of this paper is networks of Gaussian basis functions, which are used extensively as learning systems in neural computation. We show that networks of Gaussian basis functions can be generated from simple probabilistic rules and, conversely, that if appropriate learning rules are used, probabilistic rules can be extracted from trained networks. We present methods for reducing network complexity with the goal of obtaining concise and meaningful rules. We show how prior knowledge can be refined or supplemented with data by employing a Bayesian approach, a weighted combination of knowledge bases, or artificial training data representing the prior knowledge. We validate our approach on a standard statistical data set.
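To make the rule interpretation concrete, the following Python sketch (illustrative only; the function names and parameter values are hypothetical and not taken from the paper) shows a normalized network of Gaussian basis functions in which each unit can be read as a simple probabilistic rule.

```python
import numpy as np

# Illustrative sketch only (hypothetical names and values, not the authors' code):
# a normalized network of Gaussian basis functions. Each unit i has a center c_i,
# a width sigma_i, and an output value w_i, and can be read as the rule
# "if x is close to c_i, then y is approximately w_i".

def unit_activations(x, centers, sigmas):
    """Gaussian density values N(x; c_i, sigma_i^2 I) for each unit."""
    d = centers.shape[1]
    sq_dist = np.sum((x - centers) ** 2, axis=1)
    norm = (2.0 * np.pi * sigmas ** 2) ** (d / 2.0)
    return np.exp(-0.5 * sq_dist / sigmas ** 2) / norm

def predict(x, centers, sigmas, weights):
    """Normalized output: a convex combination of the unit outputs w_i.
    With equal mixing weights this equals the conditional mean E[y | x] of a
    Gaussian mixture whose components correspond to the individual rules."""
    act = unit_activations(x, centers, sigmas)
    return float(np.dot(weights, act) / np.sum(act))

# Three hypothetical "rules" on a one-dimensional input.
centers = np.array([[0.0], [1.0], [2.0]])
sigmas  = np.array([0.3, 0.4, 0.3])
weights = np.array([-1.0, 0.5, 2.0])
print(predict(np.array([0.9]), centers, sigmas, weights))  # close to 0.5
```

Since the input lies near the second center, the normalized activations concentrate on that unit and the prediction is dominated by its output value, which is exactly the behavior one would state as a rule.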

Cite this article

Tresp, V., Hollatz, J. & Ahmad, S. Representing Probabilistic Rules with Networks of Gaussian Basis Functions. Machine Learning 27, 173–200 (1997). https://doi.org/10.1023/A:1007381408604

Keywords

  • Neural networks
  • theory refinement
  • knowledge-based neural networks
  • probability density estimation
  • knowledge extraction
  • mixture densities
  • combining knowledge bases
  • Bayesian learning