Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition
We are concerned with feed-forward non-linear networks (multi-layer perceptrons, or MLPs) with multiple outputs. We wish to treat the outputs of the network as probabilities of alternatives (e.g. pattern classes), conditioned on the inputs. We look for appropriate output non-linearities and for appropriate criteria for adapting the parameters of the network (e.g. weights). We explain two modifications: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non-linearity. Together, the two modifications result in quite simple arithmetic, and hardware implementation is not difficult either. Using radial units (squared distance instead of dot product) immediately before the softmax output stage produces a network that computes posterior distributions over class labels under the assumption of Gaussian within-class distributions. However, the training, which uses cross-class information, can result in better class discrimination than the usual within-class training method, unless the within-class distribution assumptions are actually correct.
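The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`softmax`, `log_score`, `score_gradient`, `radial_activations`) are hypothetical, and the radial stage assumes a single variance shared by all classes and all input dimensions, which is the simplest Gaussian case. Under those assumptions, the softmax of "log prior minus scaled squared distance" equals the Bayes posterior over classes, and the gradient of the log score with respect to the softmax inputs reduces to "output minus target", which is one way to read the remark about simple arithmetic.

```python
import math


def softmax(activations):
    """Normalised exponential: p_j = exp(a_j) / sum_k exp(a_k)."""
    peak = max(activations)  # subtracting the max avoids overflow; result is unchanged
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def log_score(probs, target):
    """Probability score: negative log of the probability given to the true class."""
    return -math.log(probs[target])


def score_gradient(probs, target):
    """Gradient of log_score w.r.t. the softmax inputs: p_j - t_j (t is one-hot)."""
    return [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]


def radial_activations(x, centres, log_priors, variance=1.0):
    """Radial units: log prior minus squared distance / (2 * variance), per class.

    Assumption (not from the source): one shared isotropic variance, so the
    Gaussian normalising constants cancel inside the softmax.
    """
    return [
        lp - sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * variance)
        for c, lp in zip(centres, log_priors)
    ]


if __name__ == "__main__":
    # Two classes with equal priors; the input sits on the first class centre.
    centres = [[0.0, 0.0], [2.0, 0.0]]
    log_priors = [math.log(0.5), math.log(0.5)]
    posterior = softmax(radial_activations([0.0, 0.0], centres, log_priors))
    print("posterior:", posterior)
    print("score if class 0 is correct:", log_score(posterior, 0))
    print("gradient w.r.t. activations:", score_gradient(posterior, 0))
```

Because the gradient depends on all outputs through the shared normalisation, every class centre is adjusted by every training example; this is the sense in which such training uses cross-class information, in contrast to fitting each class density separately.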