Abstract
We describe a Bayesian approach to the unsupervised discovery of classes in a set of cases, sometimes called finite mixture separation or clustering. The main difference between clustering and our approach is that we search for the “best” set of class descriptions rather than grouping the cases themselves. We describe our classes in terms of probability distribution or density functions and their locally maximal posterior probability parameters. We rate our classifications with an approximate posterior probability of the distribution function with respect to the data, obtained by marginalizing over all the parameters. Approximation is necessitated by the computational complexity of the joint probability, and our marginalization is with respect to a local maximum in the parameter space. This posterior probability rating allows direct comparison of alternative density functions that differ in the number of classes, in the individual class density functions, or in both.
We discuss the rationale behind our approach to classification. We give the mathematical development for the basic mixture model, describe the approximations needed for computational tractability, give some specifics of models for several common attribute types, and describe some of the results achieved by the AutoClass program.
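The workflow summarized in the abstract (fit class descriptions at a local optimum, then rate classifications with different numbers of classes against each other) can be illustrated with a minimal sketch. This is not the AutoClass implementation. It assumes a one-dimensional Gaussian mixture, fits it with plain maximum-likelihood EM (no priors, so it reaches a local maximum of the likelihood rather than of the posterior), and substitutes the BIC, a crude large-sample stand-in, for the approximate marginal posterior described above. All function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership(x, weights, means, sds):
    """Posterior probability that case x belongs to each class."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint) or 1e-300  # guard against underflow to zero
    return [j / total for j in joint]


def fit_mixture(data, n_classes, n_iter=200):
    """EM for a 1-D Gaussian mixture (assumption: ML fit, no priors).

    Returns (weights, means, sds, log_likelihood) at a local maximum.
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: spread the class means across data quantiles.
    means = [ordered[int((k + 0.5) * n / n_classes)] for k in range(n_classes)]
    spread = (ordered[-1] - ordered[0]) or 1.0
    sds = [spread / (2.0 * n_classes)] * n_classes
    weights = [1.0 / n_classes] * n_classes
    for _ in range(n_iter):
        # E-step: soft class memberships for every case.
        resp = [membership(x, weights, means, sds) for x in data]
        # M-step: re-estimate each class description from weighted cases.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids collapse
    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, s)
                     for w, m, s in zip(weights, means, sds)) or 1e-300)
        for x in data
    )
    return weights, means, sds, log_lik


def bic_score(log_lik, n_classes, n_cases):
    """BIC-style rating (higher is better); a stand-in, not the paper's rating."""
    n_params = 3 * n_classes - 1  # means, sds, and free mixing weights
    return log_lik - 0.5 * n_params * math.log(n_cases)


def demo_data(seed=1):
    """Synthetic cases drawn from two well-separated Gaussians."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(150)]
            + [rng.gauss(8.0, 1.0) for _ in range(150)])
```

Calling `fit_mixture(demo_data(), k)` for several values of `k` and comparing `bic_score` values mirrors, in a much simplified form, the idea of rating classifications that differ in the number of classes. The output is a set of class descriptions plus soft memberships from `membership`, not a hard partition of the cases.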
Copyright information
© 1996 Kluwer Academic Publishers
About this paper
Cite this paper
Stutz, J., Cheeseman, P. (1996). Autoclass — A Bayesian Approach to Classification. In: Skilling, J., Sibisi, S. (eds) Maximum Entropy and Bayesian Methods. Fundamental Theories of Physics, vol 70. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-0107-0_13
DOI: https://doi.org/10.1007/978-94-009-0107-0_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-6534-4
Online ISBN: 978-94-009-0107-0
eBook Packages: Springer Book Archive