Bayesian predictive identification and cumulative classification of bacteria

Gyllenberg, Mats; Koski, Timo; Lund, Tatu; Gyllenberg, Helge G.

doi:10.1006/bulm.1998.0076

Published: January 1999

Volume 61, pages 85–111 (1999)

Bulletin of Mathematical Biology

Abstract

In this paper we give a mathematically precise formulation of an old idea in bacterial taxonomy, namely cumulative classification, where the taxonomy is continuously updated and possibly augmented as new strains are identified. Our formulation is based on Bayesian predictive probability distributions. The criterion for founding a new taxon is given a firm theoretical foundation based on prediction and it is given a clear-cut interpretation. We formulate an algorithm for cumulative classification and apply it to a large database of bacteria belonging to the family Enterobacteriaceae. The resulting taxonomy makes microbiological sense.
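
To make the idea of cumulative classification concrete, here is a minimal sketch. It is not the authors' algorithm: the abstract does not state the model or the criterion for founding a new taxon, so everything below is an assumption chosen for illustration. The sketch assumes that strains are binary test-result vectors, that features are independent within a taxon with a symmetric Beta(alpha, alpha) prior, and that a new taxon is founded whenever the predictive probability under an empty taxon exceeds that of every existing taxon. The names `predictive_log_prob` and `cumulative_classify` are hypothetical.

```python
from math import log


def predictive_log_prob(x, counts, n, alpha=1.0):
    """Log predictive probability of binary vector x under one taxon.

    counts[j] is the number of the taxon's n strains that are positive
    for feature j. A Beta(alpha, alpha) prior per feature is ASSUMED.
    """
    total = 0.0
    for xi, ci in zip(x, counts):
        p_one = (ci + alpha) / (n + 2 * alpha)  # predictive P(feature = 1)
        total += log(p_one if xi == 1 else 1.0 - p_one)
    return total


def cumulative_classify(strains, alpha=1.0):
    """Process strains one at a time, updating or augmenting the taxonomy."""
    taxa = []    # each taxon: {"n": strain count, "counts": positives per feature}
    labels = []  # index of the taxon assigned to each strain
    for x in strains:
        d = len(x)
        # ASSUMED new-taxon score: predictive probability under an empty taxon.
        best_idx = None
        best_score = predictive_log_prob(x, [0] * d, 0, alpha)
        for i, t in enumerate(taxa):
            score = predictive_log_prob(x, t["counts"], t["n"], alpha)
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is None:
            # No existing taxon predicts x better than an empty one: found a new taxon.
            taxa.append({"n": 0, "counts": [0] * d})
            best_idx = len(taxa) - 1
        # Update the chosen taxon with the newly identified strain.
        t = taxa[best_idx]
        t["n"] += 1
        t["counts"] = [c + xi for c, xi in zip(t["counts"], x)]
        labels.append(best_idx)
    return labels, taxa


strains = [(1, 1, 1, 0), (1, 1, 1, 0), (0, 0, 0, 1), (1, 1, 0, 0), (0, 0, 0, 1)]
labels, taxa = cumulative_classify(strains)
print(labels)  # [0, 0, 1, 0, 1]
```

In this toy run the third strain founds a second taxon because the first taxon predicts it poorly, while the fourth strain, which differs from the first taxon in one feature only, is still absorbed by it. The outcome depends on the order in which strains arrive, which is a property of this sequential sketch rather than a claim about the paper's results.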

Author information

Authors and Affiliations

Department of Mathematics, University of Turku, 20014, Turku, Finland
Mats Gyllenberg, Timo Koski & Tatu Lund
Institute of Biotechnology, University of Helsinki, 00014, Helsinki, Finland
Helge G. Gyllenberg
Department of Mathematics, Royal Institute of Technology, 10044, Stockholm, Sweden
Timo Koski

Corresponding author

Correspondence to Mats Gyllenberg.

About this article

Cite this article

Gyllenberg, M., Koski, T., Lund, T. et al. Bayesian predictive identification and cumulative classification of bacteria. Bull. Math. Biol. 61, 85–111 (1999). https://doi.org/10.1006/bulm.1998.0076

Received: 09 September 1997
Accepted: 13 September 1998
Issue Date: January 1999
DOI: https://doi.org/10.1006/bulm.1998.0076
