Bayesian predictive identification and cumulative classification of bacteria

Bulletin of Mathematical Biology

Abstract

In this paper we give a mathematically precise formulation of an old idea in bacterial taxonomy, namely cumulative classification, where the taxonomy is continuously updated and possibly augmented as new strains are identified. Our formulation is based on Bayesian predictive probability distributions. The criterion for founding a new taxon is given a firm theoretical foundation based on prediction, along with a clear-cut interpretation. We formulate an algorithm for cumulative classification and apply it to a large database of bacteria belonging to the family Enterobacteriaceae. The resulting taxonomy makes microbiological sense.
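The general idea of cumulative classification with predictive probabilities can be illustrated with a small sketch. This is not the algorithm of the paper, which is not reproduced on this page. It assumes strains are binary test vectors, that tests are independent within a taxon, and that each test has a Beta(a, b) prior, so the predictive probability of a positive result is (count + a) / (n + a + b). The rule used here for founding a new taxon (the strain is better predicted by the prior alone than by any existing taxon) and the names `log_predictive` and `cumulative_classify` are hypothetical choices made for illustration. Prior weights on the taxa themselves are ignored.

```python
from math import log


def log_predictive(x, counts, n, a=1.0, b=1.0):
    """Log predictive probability of binary vector x for one taxon.

    counts[j] is the number of the taxon's n members that were positive
    on test j. Independent Beta(a, b) priors per test are an assumption
    of this sketch, not something stated in the abstract.
    """
    total = 0.0
    for xj, cj in zip(x, counts):
        p_pos = (cj + a) / (n + a + b)
        total += log(p_pos if xj else 1.0 - p_pos)
    return total


def cumulative_classify(strains, a=1.0, b=1.0):
    """Assign strains one at a time, updating the taxonomy as we go.

    Hypothetical new-taxon rule: open a new taxon when no existing taxon
    predicts the strain strictly better than an empty taxon (n = 0) does.
    """
    taxa = []    # each taxon: {"n": member count, "counts": positives per test}
    labels = []
    for x in strains:
        d = len(x)
        best = None
        best_score = log_predictive(x, [0] * d, 0, a, b)  # score of an empty taxon
        for k, t in enumerate(taxa):
            score = log_predictive(x, t["counts"], t["n"], a, b)
            if score > best_score:
                best, best_score = k, score
        if best is None:
            taxa.append({"n": 0, "counts": [0] * d})
            best = len(taxa) - 1
        # Cumulative step: the identified strain updates its taxon.
        t = taxa[best]
        t["n"] += 1
        t["counts"] = [c + xj for c, xj in zip(t["counts"], x)]
        labels.append(best)
    return labels, taxa
```

In this sketch the result depends on the order in which strains arrive, since each assignment changes the counts used to predict the next strain.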

Author information

Correspondence to Mats Gyllenberg.

About this article

Cite this article

Gyllenberg, M., Koski, T., Lund, T. et al. Bayesian predictive identification and cumulative classification of bacteria. Bull. Math. Biol. 61, 85–111 (1999). https://doi.org/10.1006/bulm.1998.0076

  • DOI: https://doi.org/10.1006/bulm.1998.0076
