Abstract
We use partial likelihood (PL) theory to introduce a general probabilistic framework for the design and analysis of neural classifiers. The formulation allows the training samples to be correlated in time and admits a wide range of neural network probability models, including recurrent structures. Using PL theory, we establish a fundamental information-theoretic connection: the equivalence of likelihood maximization and relative entropy minimization, without the common assumptions of independent training samples and knowledge of the true distribution. We use this result to construct the information geometry of partial likelihood and to derive the information-geometric e- and m-projection (em) algorithm for class-conditional density modeling by finite normal mixtures. We demonstrate the algorithm on a channel equalization example and present simulation results showing the efficiency of the scheme.
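For finite normal mixtures, the alternating e- and m-projections of the em algorithm reduce to familiar EM-style steps: an e-step that computes posterior responsibilities and an m-step that re-estimates the mixture parameters by weighted maximum likelihood. The following is a minimal illustrative sketch in plain NumPy for 1-D data on a synthetic example — not the paper's channel-equalization setting, and the function name and initialization scheme are our own assumptions:

```python
import numpy as np

def em_gaussian_mixture(x, k=2, iters=50):
    """Alternating e-/m-steps for a 1-D Gaussian mixture.

    e-step: project onto the data manifold by computing posterior
    responsibilities under the current model.
    m-step: project onto the model manifold by weighted
    maximum-likelihood re-estimation of the parameters.
    """
    n = len(x)
    w = np.full(k, 1.0 / k)                              # mixing weights
    mu = np.quantile(x, np.linspace(0.25, 0.75, k))      # spread initial means over the data
    var = np.full(k, x.var() + 1e-6)                     # component variances
    ll_trace = []
    for _ in range(iters):
        # e-step: responsibilities r[i, j] = P(component j | x_i)
        dens = (np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                / np.sqrt(2 * np.pi * var)) * w
        ll_trace.append(np.log(dens.sum(axis=1)).sum())  # track log-likelihood
        r = dens / dens.sum(axis=1, keepdims=True)
        # m-step: weighted maximum-likelihood parameter updates
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return w, mu, var, ll_trace

# Synthetic bimodal data: two well-separated Gaussian classes.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 300)])
w, mu, var, ll = em_gaussian_mixture(x)
```

Each iteration is one e-projection followed by one m-projection, and the log-likelihood trace is non-decreasing — the property that underlies the convergence of such alternating-minimization schemes.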
Ni, H., Adali, T., Wang, B. et al. A General Probabilistic Formulation for Supervised Neural Classifiers. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 26, 141–153 (2000). https://doi.org/10.1023/A:1008107819882