
Nonparametric Topic Modeling Using Chinese Restaurant Franchise with Buddy Customers

  • Conference paper
Advances in Information Retrieval (ECIR 2015)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9022)

Abstract

Many popular latent topic models for text documents make two assumptions. The first is a finite-dimensional parameter space. The second is the bag-of-words assumption, which prevents such models from capturing the interdependence between words. While existing nonparametric admixture models relax the first assumption, they still impose the second. We investigate a nonparametric admixture model that relaxes both assumptions in one unified model. One challenge is that state-of-the-art posterior inference cannot be applied directly. To tackle this problem, we propose a new metaphor in Bayesian nonparametrics, the “Chinese Restaurant Franchise with Buddy Customers”. We conduct experiments on different datasets and show an improvement over existing comparative models.

The work described in this paper is substantially supported by grants from the Research Grant Council of the Hong Kong Special Administrative Region, China (Project Code: CUHK413510) and the Microsoft Research Asia Urban Informatics Grant FY14-RES-Sponsor-057. This work is also affiliated with the CUHK MoE-Microsoft Key Laboratory of Human-centric Computing and Interface Technologies.
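The franchise metaphor in the title builds on the standard Chinese Restaurant Process (CRP), the seating scheme commonly used to describe Dirichlet process mixtures: each customer joins an occupied table with probability proportional to its occupancy, or opens a new table with probability proportional to a concentration parameter α. The sketch below illustrates only this standard CRP seating rule, not the paper's buddy-customer variant (which additionally constrains certain customers to sit together); the function name and interface are illustrative, not from the paper.

```python
import random

def crp_table_assignments(n_customers, alpha, seed=0):
    """Seat customers one by one under the standard Chinese Restaurant
    Process: customer i joins existing table t with probability
    proportional to counts[t], or opens a new table with probability
    proportional to alpha."""
    rng = random.Random(seed)
    counts = []       # counts[t] = number of customers at table t
    assignments = []  # assignments[i] = table index of customer i
    for _ in range(n_customers):
        weights = counts + [alpha]        # existing tables, then a new table
        r = rng.random() * sum(weights)
        table = 0
        while r >= weights[table]:        # inverse-CDF sampling over tables
            r -= weights[table]
            table += 1
        if table == len(counts):
            counts.append(1)              # open a new table
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts
```

With larger α the process opens new tables more readily, which in topic-modeling terms corresponds to positing more topics; the number of tables grows with the data rather than being fixed in advance, which is the nonparametric aspect the abstract refers to.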





Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jameel, S., Lam, W., Bing, L. (2015). Nonparametric Topic Modeling Using Chinese Restaurant Franchise with Buddy Customers. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds) Advances in Information Retrieval. ECIR 2015. Lecture Notes in Computer Science, vol 9022. Springer, Cham. https://doi.org/10.1007/978-3-319-16354-3_71


  • DOI: https://doi.org/10.1007/978-3-319-16354-3_71

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16353-6

  • Online ISBN: 978-3-319-16354-3

  • eBook Packages: Computer Science (R0)
