This article presents a unified theory for analysis of components in discrete data, and compares the methods with techniques such as independent component analysis, non-negative matrix factorisation and latent Dirichlet allocation. The main families of algorithms discussed are a variational approximation, Gibbs sampling, and Rao-Blackwellised Gibbs sampling. Applications are presented for voting records from the United States Senate for 2003, and for the Reuters-21578 newswire collection.


Keywords: Independent Component Analysis, Gibbs Sampling, Latent Dirichlet Allocation, Latent Semantic Analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wray Buntine (1)
  • Aleks Jakulin (2)
  1. Helsinki Institute for Information Technology (HIIT), Dept. of Computer Science, University of Helsinki, Finland
  2. Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia
