Machine Learning, Volume 93, Issue 1, pp 5–29

Probabilistic topic models for sequence data

  • Nicola Barbieri
  • Giuseppe Manco
  • Ettore Ritacco
  • Marco Carnuccio
  • Antonio Bevacqua

Abstract

Probabilistic topic models are widely used in different contexts to uncover the hidden structure in large text corpora. One of the main (and perhaps strongest) assumptions of these models is that the generative process follows a bag-of-words assumption, i.e. each token is independent of the previous one. We extend the popular Latent Dirichlet Allocation model by exploiting three different conditional Markovian assumptions: (i) the token generation depends on the current topic and on the previous token; (ii) the topic associated with each observation depends on the topic associated with the previous one; (iii) the token generation depends on the current and the previous topic. For each of these modeling assumptions we present a Gibbs sampling procedure for parameter estimation. Experimental evaluation over real-world data shows the performance advantages, in terms of recall and precision, of the sequence-modeling approaches.
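
To give a concrete flavor of the samplers involved, here is a minimal sketch of one collapsed Gibbs sweep under assumption (i), where token generation is conditioned on the current topic and the previous token. It assumes symmetric scalar priors alpha and beta and count arrays mirroring the notation table below; the function and variable names are illustrative, not the authors' implementation, and the first token of each trace (which has no predecessor) is left to a start symbol or a separate unigram treatment.

```python
import numpy as np

def gibbs_sweep_token_bigram(traces, z, n_dk, n_k_rs, n_k_r, K, N, alpha, beta):
    """One collapsed Gibbs sweep for the token-bigram assumption:
    p(w_j | z_j = k, w_{j-1} = r) is a per-(k, r) multinomial over tokens.

    traces : list of token-id sequences (one per trace d)
    z      : list of topic-id sequences, same shape as traces
    n_dk   : (M, K) count of topic k in trace d            -> n^k_{d,(.)}
    n_k_rs : (K, N, N) count of token pair r.s, topic k    -> n^k_{(.),r.s}
    n_k_r  : (K, N) count of pairs starting with r, topic k (marginal of n_k_rs)
    """
    for d, w in enumerate(traces):
        for j in range(1, len(w)):  # j = 0 needs a start symbol; omitted here
            r, s, k_old = w[j - 1], w[j], z[d][j]
            # remove the current assignment from all counts
            n_dk[d, k_old] -= 1
            n_k_rs[k_old, r, s] -= 1
            n_k_r[k_old, r] -= 1
            # full conditional: (trace-topic term) x (topic-bigram emission term)
            p = (n_dk[d] + alpha) * (n_k_rs[:, r, s] + beta) / (n_k_r[:, r] + N * beta)
            p /= p.sum()
            k_new = np.random.choice(K, p=p)
            # restore counts with the freshly sampled topic
            z[d][j] = k_new
            n_dk[d, k_new] += 1
            n_k_rs[k_new, r, s] += 1
            n_k_r[k_new, r] += 1
```

The same loop structure carries over to assumptions (ii) and (iii); only the emission and transition terms of the full conditional change.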

Keywords

Recommender systems · Collaborative filtering · Probabilistic topic models · Performance

Notations

M : Number of traces

N : Number of distinct tokens

K : Number of topics

W : Collection of traces, W = {w_1, …, w_M}

N_d : Number of tokens in trace d

\(\mathbf{w}_{d}\) : Token trace d, \(\mathbf{w}_{d} = \{w_{d,1}.w_{d,2}.\cdots.w_{d,N_{d}-1}.w_{d,N_{d}}\}\)

w_{d,j} : j-th token in trace d

Z : Collection of topic traces, Z = {z_1, …, z_M}

\(\mathbf{z}_{d}\) : Topics for trace d, \(\mathbf{z}_{d} = \{z_{d,1}.z_{d,2}.\cdots.z_{d,N_{d}-1}.z_{d,N_{d}}\}\)

z_{d,j} : j-th topic in trace d

\(n^{k}_{d,s}\) : Number of times token s has been associated with topic k in trace d

\(\mathbf{n}_{d,(\cdot)}\) : Vector \(\mathbf{n}_{d,(\cdot)} = \{n^{1}_{d,(\cdot)}, \ldots, n^{K}_{d,(\cdot)}\}\)

\(n^{k}_{d,(\cdot)}\) : Number of times topic k has been associated with trace d in the whole data

\(\mathbf{n}^{k}_{(\cdot),r}\) : Vector \(\mathbf{n}^{k}_{(\cdot),r} = \{n^{k}_{(\cdot),r.1}, \ldots, n^{k}_{(\cdot),r.N}\}\)

\(n^{k}_{(\cdot),r.s}\) : Number of times topic k has been associated with the token pair r.s in the whole data

\(\mathbf{n}^{k}_{(\cdot)}\) : Vector \(\mathbf{n}^{k}_{(\cdot)} = \{n^{k}_{(\cdot),1}, \ldots, n^{k}_{(\cdot),N}\}\)

\(n^{k}_{(\cdot),s}\) : Number of times token s has been associated with topic k in the whole data

\(\mathbf{n}^{k}_{d,(\cdot)}\) : Vector \(\mathbf{n}^{k}_{d,(\cdot)} = \{n^{k.1}_{d,(\cdot)}, \ldots, n^{k.K}_{d,(\cdot)}\}\)

\(n^{h.k}_{d,(\cdot)}\) : Number of times the topic pair h.k has been associated with trace d

\(n^{h.(\cdot)}_{d,(\cdot)}\) : Number of times a topic pair beginning with topic h has been associated with trace d

\(\mathbf{n}^{h.k}_{(\cdot)}\) : Vector \(\mathbf{n}^{h.k}_{(\cdot)} = \{n^{h.k}_{(\cdot),1}, \ldots, n^{h.k}_{(\cdot),N}\}\)

\(n^{h.k}_{(\cdot),s}\) : Number of times the topic pair h.k has been associated with token s in the whole data

α : (LDA, TokenBigram and TokenBitopic models) hyperparameters of the topic Dirichlet distribution, α = {α_1, …, α_K}; (TopicBigram model) set of hyperparameter vectors for the topic Dirichlet distributions, α = {α_0, …, α_K}

α_h : Hyperparameters of the topic Dirichlet distribution, α_h = {α_{h.1}, …, α_{h.K}}

β : (LDA and TopicBigram models) set of hyperparameter vectors for the token Dirichlet distributions, β = {β_1, …, β_K}; (TokenBigram model) β = {β_{k,s} : k = 1, …, K; s = 1, …, N}; (TokenBitopic model) β = {β_{h.k} : h, k = 1, …, K}

β_k : Hyperparameters of the token Dirichlet distribution, β_k = {β_{k,1}, …, β_{k,N}}

β_{k,s} : Hyperparameters of the token Dirichlet distribution, β_{k,s} = {β_{k,s.1}, …, β_{k,s.N}}

β_{h.k} : Hyperparameters of the token Dirichlet distribution, β_{h.k} = {β_{h.k,1}, …, β_{h.k,N}}

Θ : Matrix of parameters θ_d

θ_d : Mixing proportions of topics for trace d

ϑ_{d,k} : Mixing coefficient of topic k for trace d

ϑ_{d,h.k} : Mixing coefficient of the topic sequence h.k for trace d

Φ : (LDA and TopicBigram models) matrix of parameters φ_k = {φ_{k,s}}; (TokenBigram model) matrix of parameters φ_k = {φ_{k,r.s}}; (TokenBitopic model) matrix of parameters φ_{h.k} = {φ_{h.k,s}}

φ_{k,s} : Mixing coefficient of topic k for token s

φ_{k,r.s} : Mixing coefficient of topic k for the token sequence r.s

φ_{h.k,s} : Mixing coefficient of the topic sequence h.k for token s

Z_{−(d,j)} : Z ∖ {z_{d,j}}, i.e. all topic assignments except z_{d,j}

Δ(q) : Dirichlet delta function, \(\Delta(\boldsymbol{q}) = \frac{\prod_{p=1}^{P} \Gamma(q_{p})}{\Gamma(\sum_{p=1}^{P} q_{p})}\)
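
The delta function above is the normalizing constant of the Dirichlet density, which is why collapsing (integrating out) Θ and Φ in the Gibbs samplers leaves ratios of deltas over count vectors. As a sketch in the notation above (the LDA case is the standard one; the sequence variants substitute the corresponding pair counts):

\[
\Delta(\boldsymbol{q}) = \int_{\Sigma_P} \prod_{p=1}^{P} x_{p}^{q_{p}-1}\, d\boldsymbol{x}
= \frac{\prod_{p=1}^{P} \Gamma(q_{p})}{\Gamma\bigl(\sum_{p=1}^{P} q_{p}\bigr)},
\qquad
p(\mathbf{W} \mid \mathbf{Z}, \beta) = \prod_{k=1}^{K} \frac{\Delta(\mathbf{n}^{k}_{(\cdot)} + \beta_{k})}{\Delta(\beta_{k})} \quad \text{(LDA)},
\]

where \(\Sigma_P\) denotes the \((P-1)\)-dimensional probability simplex.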


Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Nicola Barbieri (1)
  • Giuseppe Manco (2)
  • Ettore Ritacco (2)
  • Marco Carnuccio (3)
  • Antonio Bevacqua (3)

  1. Yahoo Research, Barcelona, Spain
  2. Institute for High Performance Computing and Networks (ICAR), Italian National Research Council, Rende, Italy
  3. Department of Electronics, Informatics and Systems, University of Calabria, Rende, Italy
