Probability Model Type Sufficiency
We investigate the role of sufficient statistics in generalized probabilistic data mining and machine learning software frameworks. We discuss some issues involved in specifying a statistical model type and show that it is beneficial to explicitly include a sufficient statistic, together with functions for manipulating it, in the model type's specification. Instances of such types can then be used by generalized learning algorithms while maintaining optimal learning time complexity. Examples are given for incremental learning and for data partitioning problems (e.g. change-point problems, decision trees and mixture models).
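As a rough illustration of the idea, the Python sketch below attaches a sufficient statistic and its manipulation functions to a one-dimensional Gaussian model type. It is not code from the paper: the names `GaussStat`, `stat_of`, `fit`, `neg_log_lik` and `best_change_point`, the variance floor, and the minimum segment size are all assumptions made for this example. Because statistics can be added and subtracted in constant time, the single change-point search scans every split in one linear pass instead of refitting each segment from the raw data.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

VAR_FLOOR = 1e-9  # assumed floor that keeps log(variance) finite


@dataclass(frozen=True)
class GaussStat:
    """Sufficient statistic of a 1-D Gaussian: count, sum, sum of squares."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0

    def add(self, other: "GaussStat") -> "GaussStat":
        # Statistic of the union of two disjoint data sets.
        return GaussStat(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def sub(self, other: "GaussStat") -> "GaussStat":
        # Statistic of the data left after removing a subset.
        return GaussStat(self.n - other.n, self.s - other.s, self.ss - other.ss)


def stat_of(xs: Sequence[float]) -> GaussStat:
    """Compute the statistic from raw data in one pass."""
    return GaussStat(len(xs), sum(xs), sum(x * x for x in xs))


def fit(stat: GaussStat) -> Tuple[float, float]:
    """Maximum-likelihood (mean, variance) from the statistic alone."""
    mean = stat.s / stat.n
    var = max(stat.ss / stat.n - mean * mean, VAR_FLOOR)
    return mean, var


def neg_log_lik(stat: GaussStat) -> float:
    """Negative log-likelihood of the data at its own ML estimate."""
    _, var = fit(stat)
    return 0.5 * stat.n * (math.log(2.0 * math.pi * var) + 1.0)


def best_change_point(xs: Sequence[float], min_size: int = 2) -> int:
    """Return i so that xs[:i] and xs[i:] are the best two segments.

    Each candidate split costs O(1) because the left statistic is
    updated incrementally and the right one is obtained by subtraction.
    """
    total = stat_of(xs)
    left = GaussStat()
    best_i, best_cost = -1, math.inf
    for i in range(1, len(xs)):
        left = left.add(stat_of(xs[i - 1:i]))  # incremental update
        right = total.sub(left)
        if left.n < min_size or right.n < min_size:
            continue
        cost = neg_log_lik(left) + neg_log_lik(right)
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
    print(best_change_point(data))
```

The same `add` operation covers incremental learning: the statistic of newly arrived data is combined with the stored one and `fit` is re-run, without revisiting earlier observations.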
Keywords: Model Type, Machine Learning Algorithm, Neural Information Processing System, Incremental Learning, Minimum Message Length