Representing Probabilistic Rules with Networks of Gaussian Basis Functions

Abstract

There is great interest in understanding the intrinsic knowledge neural networks acquire during training. Most work in this direction has focused on the multi-layer perceptron architecture. The topic of this paper is networks of Gaussian basis functions, which are used extensively as learning systems in neural computation. We show that networks of Gaussian basis functions can be generated from simple probabilistic rules and, conversely, that if appropriate learning rules are used, probabilistic rules can be extracted from trained networks. We present methods for reducing network complexity with the goal of obtaining concise and meaningful rules. We show how prior knowledge can be refined or supplemented with data by employing a Bayesian approach, a weighted combination of knowledge bases, or artificial training data representing the prior knowledge. We validate our approach on a standard statistical data set.
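To make the rule interpretation concrete, the following Python sketch (illustrative only; the function names and parameter values are hypothetical and not taken from the paper) shows a normalized network of Gaussian basis functions in which each unit can be read as a simple probabilistic rule.

```python
import numpy as np

# Illustrative sketch only (hypothetical names and values, not the authors' code):
# a normalized network of Gaussian basis functions. Each unit i has a center c_i,
# a width sigma_i, and an output value w_i, and can be read as the rule
# "if x is close to c_i, then y is approximately w_i".

def unit_activations(x, centers, sigmas):
    """Gaussian density values N(x; c_i, sigma_i^2 I) for each unit."""
    d = centers.shape[1]
    sq_dist = np.sum((x - centers) ** 2, axis=1)
    norm = (2.0 * np.pi * sigmas ** 2) ** (d / 2.0)
    return np.exp(-0.5 * sq_dist / sigmas ** 2) / norm

def predict(x, centers, sigmas, weights):
    """Normalized output: a convex combination of the unit outputs w_i.
    With equal mixing weights this equals the conditional mean E[y | x] of a
    Gaussian mixture whose components correspond to the individual rules."""
    act = unit_activations(x, centers, sigmas)
    return float(np.dot(weights, act) / np.sum(act))

# Three hypothetical "rules" on a one-dimensional input.
centers = np.array([[0.0], [1.0], [2.0]])
sigmas  = np.array([0.3, 0.4, 0.3])
weights = np.array([-1.0, 0.5, 2.0])
print(predict(np.array([0.9]), centers, sigmas, weights))  # close to 0.5
```

Since the input lies near the second center, the normalized activations concentrate on that unit and the prediction is dominated by its output value, which is exactly the behavior one would state as a rule.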

Cite this article

Tresp, V., Hollatz, J. & Ahmad, S. Representing Probabilistic Rules with Networks of Gaussian Basis Functions. Machine Learning 27, 173–200 (1997). https://doi.org/10.1023/A:1007381408604

Keywords

  • Neural networks
  • theory refinement
  • knowledge-based neural networks
  • probability density estimation
  • knowledge extraction
  • mixture densities
  • combining knowledge bases
  • Bayesian learning