Probability Model Type Sufficiency

  • Leigh J. Fitzgibbon
  • Lloyd Allison
  • Joshua W. Comley
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2690)


We investigate the role of sufficient statistics in generalized probabilistic data mining and machine learning software frameworks. We discuss some issues involved in specifying a statistical model type and show that it is beneficial to explicitly include a sufficient statistic, together with functions for its manipulation, in the model type’s specification. Instances of such types can then be used by generalized learning algorithms while maintaining optimal learning time complexity. Examples are given for incremental learning and for data partitioning problems (e.g. change-point problems, decision trees and mixture models).
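As a rough illustration of the idea in the abstract, the sketch below attaches a sufficient statistic and its manipulation functions to a univariate Gaussian model type. It is not the paper's interface: the names `GaussianStat`, `of`, `estimate` and `prefix_stats` are hypothetical, and a plain maximum-likelihood estimate is used as a stand-in for whatever estimator (for example a minimum message length one) a framework would actually apply. The point is only that adding and subtracting statistics lets a generic algorithm update a model incrementally, or score any contiguous segment of the data, without revisiting the raw observations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianStat:
    """Sufficient statistic for a univariate Gaussian: count, sum, sum of squares.

    Hypothetical type for illustration; not taken from the paper.
    """
    n: int = 0
    s: float = 0.0
    ss: float = 0.0

    @staticmethod
    def of(xs):
        """Build the statistic from raw observations."""
        xs = list(xs)
        return GaussianStat(len(xs), sum(xs), sum(x * x for x in xs))

    def __add__(self, other):
        """Combine statistics of two data sets (incremental learning, merging)."""
        return GaussianStat(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def __sub__(self, other):
        """Remove the statistic of a subset (data partitioning)."""
        return GaussianStat(self.n - other.n, self.s - other.s, self.ss - other.ss)

    def estimate(self):
        """Maximum-likelihood (mean, variance) from the statistic alone.

        Assumption: ML is used here only to keep the sketch short.
        """
        mean = self.s / self.n
        var = max(self.ss / self.n - mean * mean, 0.0)
        return mean, var


def prefix_stats(xs):
    """Cumulative statistics: prefix[i] summarises xs[:i]."""
    acc = [GaussianStat()]
    for x in xs:
        acc.append(acc[-1] + GaussianStat.of([x]))
    return acc


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
prefix = prefix_stats(data)

# Statistic for the segment data[3:6], obtained in constant time by subtraction.
segment = prefix[6] - prefix[3]
print(segment.estimate())
```

With the cumulative statistics computed once, a change-point search can evaluate any candidate segment in constant time, which is the kind of saving in learning time that the abstract refers to.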


Model Type; Machine Learning Algorithm; Neural Information Processing System; Incremental Learning; Minimum Message Length
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Leigh J. Fitzgibbon (1)
  • Lloyd Allison (1)
  • Joshua W. Comley (1)
  1. School of Computer Science and Software Engineering, Monash University, Australia
