General Topic Annotation in Social Networks: A Latent Dirichlet Allocation Approach

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7884)

Abstract

In this article, we present a novel document annotation method applicable to corpora of short documents, such as social media texts. The method first applies Latent Dirichlet Allocation (LDA) to the corpus to infer topical word clusters, and each document is automatically assigned one or more of these topic clusters. Further document annotation is done by projecting the topics extracted and assigned by LDA onto a small set of generic categories. The mapping from topical clusters to generic categories is done manually; the categories are then used to annotate the general topics of the documents automatically. Notably, the number of topical clusters that must be manually mapped to general topics is far smaller than the number of corpus postings that would normally have to be annotated by hand to build training and test sets. We show that the accuracy of the annotation produced by this method is about 80%, which is comparable to inter-human agreement on similar tasks. Additionally, with the LDA method, corpus entries are represented by low-dimensional vectors, which leads to good classification results. This lower-dimensional representation can be fed into many machine learning algorithms that cannot handle conventional high-dimensional text representations.
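The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy corpus, the number of topics, and the topic-to-category mapping are all assumptions made for the example, and scikit-learn's LDA stands in for whichever LDA implementation the paper used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical corpus of short, social-media-like postings.
docs = [
    "the team won the match in overtime",
    "coach praises players after the game",
    "new phone released with a faster chip",
    "tech company announces software update",
]

# Step 1: infer topical word clusters with LDA; fit_transform returns a
# low-dimensional topic-distribution vector per document.
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)

# Step 2: manual projection of topic ids onto a small set of generic
# categories (in the paper this mapping is built by inspecting each
# topic's top words; the labels here are made up for illustration).
topic_to_category = {0: "Sports", 1: "Technology"}

# Step 3: annotate each document with the category of its dominant topic.
labels = [topic_to_category[row.argmax()] for row in doc_topics]
print(labels)
```

The `doc_topics` matrix is also the low-dimensional representation the abstract mentions: each document becomes a short probability vector over topics, which can be fed directly to classifiers that would struggle with a sparse bag-of-words matrix.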




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Razavi, A.H., Inkpen, D., Brusilovsky, D., Bogouslavski, L. (2013). General Topic Annotation in Social Networks: A Latent Dirichlet Allocation Approach. In: Zaïane, O.R., Zilles, S. (eds) Advances in Artificial Intelligence. Canadian AI 2013. Lecture Notes in Computer Science, vol 7884. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38457-8_29

  • DOI: https://doi.org/10.1007/978-3-642-38457-8_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38456-1

  • Online ISBN: 978-3-642-38457-8

  • eBook Packages: Computer Science, Computer Science (R0)
