There is great interest in understanding the intrinsic knowledge neural networks acquire during training. Most work in this direction has focused on the multi-layer perceptron architecture. This paper addresses networks of Gaussian basis functions, which are used extensively as learning systems in neural computation. We show that networks of Gaussian basis functions can be generated from simple probabilistic rules and that, conversely, probabilistic rules can be extracted from trained networks if appropriate learning rules are used. We present methods for reducing network complexity with the goal of obtaining concise and meaningful rules. We show how prior knowledge can be refined or supplemented with data by employing a Bayesian approach, by forming a weighted combination of knowledge bases, or by generating artificial training data that represent the prior knowledge. We validate our approach on a standard statistical data set.
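The link between probabilistic rules and Gaussian basis function networks can be illustrated with a minimal sketch: a Gaussian mixture over an input-output pair, where each component is read as a rule ("if x is near mu_x, then y is near mu_y"), yields a normalized Gaussian basis function network as the conditional mean E[y | x]. The 1-D setting and all component parameters below are illustrative assumptions, not values from the paper.

```python
import math

# Hypothetical mixture of three Gaussian components over (x, y).
# Each row (pi, mu_x, sigma, mu_y) can be read as a probabilistic rule:
# "if x is near mu_x (width sigma), then y is near mu_y", with prior pi.
components = [
    (0.3, -2.0, 1.0, 0.5),
    (0.4,  0.0, 1.0, 1.5),
    (0.3,  2.0, 1.0, -1.0),
]

def gaussian(x, mu, sigma):
    """1-D Gaussian density in x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def predict(x):
    """E[y | x] under the mixture: a normalized Gaussian basis function network."""
    # Activation of each basis function, weighted by the component prior.
    acts = [pi * gaussian(x, mu_x, sigma) for (pi, mu_x, sigma, _) in components]
    total = sum(acts)
    # The normalized activations weight each rule's conclusion mu_y.
    return sum(a * mu_y for a, (_, _, _, mu_y) in zip(acts, components)) / total

# Near x = 0 the prediction is pulled toward that rule's conclusion mu_y = 1.5.
print(predict(0.0))
```

In this reading, rule extraction amounts to inspecting the fitted component parameters, and rule insertion amounts to initializing components from a knowledge base before training.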
Cite this article
Tresp, V., Hollatz, J. & Ahmad, S. Representing Probabilistic Rules with Networks of Gaussian Basis Functions. Machine Learning 27, 173–200 (1997). https://doi.org/10.1023/A:1007381408604
- Neural networks
- Theory refinement
- Knowledge-based neural networks
- Probability density estimation
- Knowledge extraction
- Mixture densities
- Combining knowledge bases
- Bayesian learning