International Journal of Computer Vision, Volume 69, Issue 2, pp 181–201

Model Selection for Unsupervised Learning of Visual Context

  • Tao Xiang
  • Shaogang Gong


This study addresses the problem of choosing the most suitable probabilistic model selection criterion for unsupervised learning of visual context of a dynamic scene using mixture models. A rectified Bayesian Information Criterion (BICr) and a Completed Likelihood Akaike’s Information Criterion (CL-AIC) are formulated to estimate the optimal model order (complexity) for a given visual scene. Both criteria are designed to overcome poor model selection by existing popular criteria when the data sample size varies from small to large and the true mixture distribution kernel functions differ from the assumed ones. Extensive experiments on learning visual context for dynamic scene modelling are carried out to demonstrate the effectiveness of BICr and CL-AIC, compared to that of existing popular model selection criteria including BIC, AIC and Integrated Completed Likelihood (ICL). Our study suggests that for learning visual context using a mixture model, BICr is the most appropriate criterion given sparse data, while CL-AIC should be chosen given moderate or large data sample sizes.
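As a point of reference for the baseline criteria named above, the following is a minimal sketch of how the standard AIC and BIC scores can be used to pick the number of components in a Gaussian mixture. It uses only the textbook definitions (AIC = -2 log L + 2k, BIC = -2 log L + k log n). The BICr and CL-AIC criteria are not defined in this abstract and are not reproduced here. The function names, the full-covariance parameter count, and the log-likelihood values in the usage note are illustrative assumptions, not taken from the paper.

```python
import math


def gmm_num_params(n_components, dim):
    """Free parameters of a full-covariance Gaussian mixture (assumed model form)."""
    weights = n_components - 1                    # mixing weights sum to one
    means = n_components * dim                    # one mean vector per component
    covs = n_components * dim * (dim + 1) // 2    # symmetric covariance matrices
    return weights + means + covs


def aic(log_lik, n_params):
    """Akaike's Information Criterion; lower is better."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_samples):
    """Bayesian Information Criterion; lower is better."""
    return -2.0 * log_lik + n_params * math.log(n_samples)


def select_order(log_liks, dim, n_samples, criterion="bic"):
    """Pick the model order with the lowest score.

    log_liks maps each candidate number of components K to the maximised
    log-likelihood of a K-component mixture fitted to the same data.
    """
    scores = {}
    for k, log_lik in log_liks.items():
        n_params = gmm_num_params(k, dim)
        if criterion == "bic":
            scores[k] = bic(log_lik, n_params, n_samples)
        else:
            scores[k] = aic(log_lik, n_params)
    best = min(scores, key=scores.get)
    return best, scores
```

With made-up log-likelihoods such as `{1: -520.0, 2: -480.0, 3: -470.0}` for 100 two-dimensional samples, `select_order(..., criterion="bic")` picks two components while `criterion="aic"` picks three, because the BIC penalty grows with log n and so penalises the extra parameters more heavily.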


Keywords: learning for vision, visual context, model selection, dynamic scene modelling, clustering, Bayesian methods, mixture models





Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. Department of Computer Science, Queen Mary, University of London, London, UK
