Abstract
This work extracts topics from a data set of newspaper articles by applying latent Dirichlet allocation (LDA). New topics emerge as the topics reported in the articles change from day to day, and such small changes are difficult to notice by simply reading the articles. To relate these changes to changes in society, it is important to extract how the structure of the topic group changes week by week (or month by month). To illuminate these relationships, we use LDA to create a network of topics (a topic network) that tracks changes in topics throughout the year. We also create a topic network focused on specific vocabulary items. The proposed method extracts networks of relationships among topics, including networks focused on specific vocabulary items that have not appeared in previous articles. This information retrieval method, applied to topics related to the economy and society, can therefore determine the frequency of occurrence of new words.
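The idea of linking topics across time windows can be illustrated with a minimal sketch. It assumes that LDA has already been run separately on each week (or month) and that each topic is available as a word-probability dictionary. The abstract does not state how topics are linked, so the cosine similarity measure, the `threshold` value, the `focus_word` filter, and the function names below are assumptions made for illustration, not the authors' procedure.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word-probability dicts."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def build_topic_network(windows, threshold=0.5, focus_word=None):
    """Link similar topics in consecutive time windows.

    windows: one entry per week or month; each entry is a list of topics,
             and each topic is a dict mapping word -> probability.
    threshold: assumed similarity cut-off for drawing an edge.
    focus_word: if given, only consider topic pairs in which at least
                one topic contains this word (hypothetical filter).
    Returns a list of edges ((t, i), (t + 1, j), similarity).
    """
    edges = []
    for t in range(len(windows) - 1):
        for i, topic_a in enumerate(windows[t]):
            for j, topic_b in enumerate(windows[t + 1]):
                if (focus_word is not None
                        and focus_word not in topic_a
                        and focus_word not in topic_b):
                    continue
                sim = cosine(topic_a, topic_b)
                if sim >= threshold:
                    edges.append(((t, i), (t + 1, j), sim))
    return edges


# Toy topics for two consecutive windows (made-up numbers, not real data).
windows = [
    [{"yen": 0.6, "rate": 0.4}, {"election": 0.7, "vote": 0.3}],
    [{"yen": 0.5, "rate": 0.3, "export": 0.2},
     {"earthquake": 0.8, "relief": 0.2}],
]
edges = build_topic_network(windows)
```

In this toy input only the two currency-related topics share vocabulary, so they are the only pair joined by an edge. Topics with no sufficiently similar predecessor, such as the second topic of the later window, would be candidates for newly generated topics.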
Acknowledgements
This work is partially supported by the Nihon Keizai Shimbun. We wish to thank Yoon, from whom we received useful advice. The authors thank the Yukawa Institute for Theoretical Physics at Kyoto University. Discussions during the YITP workshop YITP-W-15-15 on “Econophysics 2015” were useful in completing this work.
Cite this article
Kawata, S., Fujiwara, Y. Constructing of network from topics and their temporal change in the Nikkei newspaper articles. Evolut Inst Econ Rev 13, 423–436 (2016). https://doi.org/10.1007/s40844-016-0061-2