Machine Learning, Volume 93, Issue 1, pp 5–29

Probabilistic topic models for sequence data

  • Nicola Barbieri
  • Giuseppe Manco
  • Ettore Ritacco
  • Marco Carnuccio
  • Antonio Bevacqua

DOI: 10.1007/s10994-013-5391-2

Cite this article as:
Barbieri, N., Manco, G., Ritacco, E. et al. Mach Learn (2013) 93: 5. doi:10.1007/s10994-013-5391-2

Abstract

Probabilistic topic models are widely used in different contexts to uncover the hidden structure in large text corpora. One of the main (and perhaps strongest) assumptions of these models is that the generative process follows a bag-of-words assumption, i.e. each token is generated independently of the previous one. We extend the popular Latent Dirichlet Allocation model by exploiting three different conditional Markovian assumptions: (i) the token generation depends on the current topic and on the previous token; (ii) the topic associated with each observation depends on the topic associated with the previous one; (iii) the token generation depends on the current and previous topics. For each of these modeling assumptions we present a Gibbs sampling procedure for parameter estimation. Experimental evaluation over real-world data shows the performance advantages, in terms of recall and precision, of the sequence-modeling approaches.
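As a rough illustration of the estimation machinery described above (not the paper's exact derivation), a collapsed Gibbs sampler for plain LDA can be sketched as follows; the three sequence-aware variants keep the same count-based structure and change only the full conditional, conditioning the token term on the previous token (TokenBigram), the topic term on the previous topic (TopicBigram), or the token term on the topic pair (TokenBitopic). All function and variable names here are illustrative.

```python
import random


def gibbs_lda(traces, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA over token traces.

    Illustrative sketch only. Returns the topic assignments z and the
    count statistics n_dk (topics per trace) and n_kw (tokens per topic).
    """
    rng = random.Random(seed)
    vocab = sorted({w for t in traces for w in t})
    N = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}

    # Count statistics and random initialization of topic assignments.
    n_dk = [[0] * K for _ in traces]           # topic counts per trace
    n_kw = [[0] * N for _ in range(K)]          # token counts per topic
    n_k = [0] * K                               # total tokens per topic
    z = []
    for d, trace in enumerate(traces):
        zd = []
        for w in trace:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][widx[w]] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, trace in enumerate(traces):
            for j, w in enumerate(trace):
                k, s = z[d][j], widx[w]
                # Remove the current assignment from the counts.
                n_dk[d][k] -= 1; n_kw[k][s] -= 1; n_k[k] -= 1
                # Full conditional p(z_{d,j} = t | Z^-(d,j), W); the
                # sequence variants modify one of these two factors.
                weights = [(n_dk[d][t] + alpha) *
                           (n_kw[t][s] + beta) / (n_k[t] + N * beta)
                           for t in range(K)]
                r = rng.random() * sum(weights)
                for t, wgt in enumerate(weights):
                    r -= wgt
                    if r <= 0:
                        k = t
                        break
                else:  # guard against floating-point leftovers
                    k = K - 1
                z[d][j] = k
                n_dk[d][k] += 1; n_kw[k][s] += 1; n_k[k] += 1
    return z, n_dk, n_kw
```

A usage example: `gibbs_lda([["a", "b", "a"], ["c", "c", "b"]], K=2)` returns per-trace topic assignments together with the count matrices that the sampling formulas are built from.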

Keywords

Recommender systems · Collaborative filtering · Probabilistic topic models · Performance

Notations

M: # Traces
N: # Distinct tokens
K: # Topics
W: Collection of traces, W = {w1, …, wM}
Nd: # Tokens in trace d
wd: Token trace d, \(\mathbf{w}_{d} = \{w_{d,1}, w_{d,2}, \ldots, w_{d,N_{d}-1}, w_{d,N_{d}}\}\)
wd,j: j-th token in trace d
Z: Collection of topic traces, Z = {z1, …, zM}
zd: Topics for trace d, \(\mathbf{z}_{d} = \{z_{d,1}, z_{d,2}, \ldots, z_{d,N_{d}-1}, z_{d,N_{d}}\}\)
zd,j: j-th topic in trace d

\(n^{k}_{d,s}\): Number of times token s has been associated with topic k in trace d
\(\mathbf{n}_{d,(\cdot)}\): Vector \(\mathbf{n}_{d,(\cdot)} = \{ n^{1}_{d,(\cdot)}, \ldots, n^{K}_{d,(\cdot)}\}\)
\(n^{k}_{d,(\cdot)}\): Number of times topic k has been associated with trace d in the whole data
\(\mathbf{n}^{k}_{(\cdot),r}\): Vector \(\mathbf{n}^{k}_{(\cdot),r} = \{ n^{k}_{(\cdot),r.1}, \ldots, n^{k}_{(\cdot),r.N}\}\)
\(n^{k}_{(\cdot),r.s}\): Number of times topic k has been associated with the token pair r.s in the whole data
\(\mathbf{n}^{k}_{(\cdot)}\): Vector \(\mathbf{n}^{k}_{(\cdot)} = \{ n^{k}_{(\cdot),1}, \ldots, n^{k}_{(\cdot),N}\}\)
\(n^{k}_{(\cdot),s}\): Number of times token s has been associated with topic k in the whole data
\(\mathbf{n}^{k}_{d,(\cdot)}\): Vector \(\mathbf{n}^{k}_{d,(\cdot)} = \{ n^{k.1}_{d,(\cdot)}, \ldots, n^{k.K}_{d,(\cdot)}\}\)
\(n^{h.k}_{d,(\cdot)}\): Number of times the topic pair h.k has been associated with trace d
\(n^{h.(\cdot)}_{d,(\cdot)}\): Number of times a topic pair beginning with topic h has been associated with trace d
\(\mathbf{n}^{h.k}_{(\cdot)}\): Vector \(\mathbf{n}^{h.k}_{(\cdot)} = \{n^{h.k}_{(\cdot),1}, \ldots, n^{h.k}_{(\cdot),N}\}\)
\(n^{h.k}_{(\cdot),s}\): Number of times the topic pair h.k has been associated with token s in the whole data

α: (LDA, TokenBigram and TokenBitopic models) hyperparameters of the topic Dirichlet distribution, α = {α1, …, αK}; (TopicBigram model) set of hyperparameter vectors for the topic Dirichlet distributions, α = {α0, …, αK}
αh: Hyperparameters of the topic Dirichlet distribution, αh = {αh.1, …, αh.K}
β: (LDA and TopicBigram models) set of hyperparameter vectors for the token Dirichlet distributions, β = {β1, …, βK}; (TokenBigram model) β = {β1,1, …, βK,1, …, β1,2, …, βK,2, …, βK,N}; (TokenBitopic model) β = {β1.1, …, βK.1, …, β1.2, …, βK.2, …, βK.K}
βk: Hyperparameters of the token Dirichlet distribution, βk = {βk,1, …, βk,N}
βk,s: Hyperparameters of the token Dirichlet distribution, βk,s = {βk,s.1, …, βk,s.N}
βh.k: Hyperparameters of the token Dirichlet distribution, βh.k = {βh.k,1, …, βh.k,N}

Θ: Matrix of parameters θd
θd: Mixing proportion of topics for trace d
ϑd,k: Mixing coefficient of topic k for trace d
ϑd,h.k: Mixing coefficient of the topic sequence h.k for trace d
Φ: (LDA and TopicBigram models) matrix of parameters φk = {φk,s}; (TokenBigram model) matrix of parameters φk = {φk,r.s}; (TokenBitopic model) matrix of parameters φh.k = {φh.k,s}
φk,s: Mixing coefficient of topic k for token s
φk,r.s: Mixing coefficient of topic k for the token sequence r.s
φh.k,s: Mixing coefficient of the topic sequence h.k for token s
Z−(d,j): Z∖{zd,j}

Δ(q): Dirichlet delta function, \(\Delta(\boldsymbol{q}) = \frac{\prod_{p=1}^{P} \varGamma(q_{p})}{\varGamma ( \sum_{p=1}^{P} q_{p} )}\)
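The Dirichlet delta is the normalizing constant of the Dirichlet distribution, and in Gibbs-sampling implementations it is typically evaluated in log space via the log-gamma function to avoid overflow for large counts. A minimal sketch (function name illustrative):

```python
import math


def log_dirichlet_delta(q):
    """Log of the Dirichlet delta, log Δ(q) = Σ_p log Γ(q_p) − log Γ(Σ_p q_p),
    computed with lgamma for numerical stability."""
    return sum(math.lgamma(qp) for qp in q) - math.lgamma(sum(q))
```

For example, Δ({1, 1}) = Γ(1)Γ(1)/Γ(2) = 1, so its log is 0, and Δ({2, 2}) = Γ(2)Γ(2)/Γ(4) = 1/6.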

Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Nicola Barbieri
    • 1
  • Giuseppe Manco
    • 2
  • Ettore Ritacco
    • 2
  • Marco Carnuccio
    • 3
  • Antonio Bevacqua
    • 3
  1. Yahoo Research, Barcelona, Spain
  2. Institute for High Performance Computing and Networks (ICAR), Italian National Research Council, Rende, Italy
  3. Department of Electronics, Informatics and Systems, University of Calabria, Rende, Italy
