Abstract
Topic models, such as Latent Dirichlet Allocation (LDA), have recently been used to automatically generate topics for text corpora and to assign the corpus words to those topics. However, not all of the estimated topics are equally important or correspond to genuine themes of the domain: some may be collections of irrelevant words, or may represent insignificant themes. Current approaches to topic modeling rely on manual examination to identify meaningful topics. This paper presents the first automated, unsupervised analysis of LDA models that distinguishes junk topics from legitimate ones and ranks topics by significance. Specifically, the distance between a topic distribution and each of three definitions of a "junk distribution" is computed using a variety of measures, from which an expressive figure of topic significance is derived using a four-phase weighted combination approach. Our experiments on synthetic and benchmark datasets show the effectiveness of the proposed approach in ranking topic significance.
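The core distance computation described above can be sketched as follows. This is a minimal illustration only, assuming the uniform distribution over the vocabulary as one of the junk definitions and KL divergence as one of the distance measures; the paper itself combines several junk definitions and measures in a weighted scheme, and the function names here are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def uniform_junk(vocab_size):
    # One common "junk" definition: a uniform distribution over all words,
    # i.e. a topic that prefers no word in particular.
    return [1.0 / vocab_size] * vocab_size

def rank_topics(topic_word_dists):
    # A larger distance from the junk distribution suggests a more
    # concentrated, and hence potentially more significant, topic.
    junk = uniform_junk(len(topic_word_dists[0]))
    scores = [kl_divergence(t, junk) for t in topic_word_dists]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy example: topic 0 is peaked on one word (significant-looking),
# topic 1 is nearly uniform (junk-like).
topics = [
    [0.7, 0.1, 0.1, 0.1],
    [0.26, 0.24, 0.25, 0.25],
]
print(rank_topics(topics))  # topic 0 ranks above topic 1
```

In the full method, analogous distances to the other junk definitions and under other measures would be combined, with per-measure weights, into a single significance score per topic.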
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
AlSumait, L., Barbará, D., Gentle, J., Domeniconi, C. (2009). Topic Significance Ranking of LDA Generative Models. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol. 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8