Abstract
The vast amount and rapid growth of data on the Web and in document repositories make knowledge extraction and trend analysis challenging tasks. A well-proven approach for the unsupervised analysis of large text corpora is dynamic topic modeling. While there is a solid body of research on the fundamentals and applications of this technique, visual-interactive analysis systems that allow end-users to perform analysis tasks with topic models are still rare. In this paper, we present D-VITA, an interactive text analysis system that exploits dynamic topic modeling to detect the latent topic structure and its dynamics in a collection of documents. D-VITA helps end-users understand and exploit the topic modeling results by providing interactive visualizations of topic evolution in document collections and by letting them browse documents based on keyword search and on the similarity of their topic distributions. The system was evaluated by a scientific community that used D-VITA for trend analysis in their data sources. The results indicate that D-VITA is highly usable and useful for productive analysis tasks.
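Browsing "based on the similarity of their topic distributions" can be illustrated with a small sketch. It assumes that each document is represented by its per-document topic proportions from the topic model, and it assumes the Jensen-Shannon divergence (the subject of one entry in the reference list) as the dissimilarity measure. This is an illustration under those assumptions, not D-VITA's actual implementation; the names `rank_similar`, `doc_topics`, and the document IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A topic distribution is a list of per-topic proportions that sums to 1.
TopicDist = Sequence[float]


def kl_divergence(p: TopicDist, q: TopicDist) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: TopicDist, q: TopicDist) -> float:
    """Jensen-Shannon divergence; symmetric and bounded in [0, 1] with log base 2."""
    # The mixture m is positive wherever p or q is, so the KL terms stay finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_similar(
    query_id: str, doc_topics: Dict[str, TopicDist], k: int = 3
) -> List[Tuple[str, float]]:
    """Hypothetical helper: the k documents whose topic mix is closest to the query's."""
    query = doc_topics[query_id]
    scored = [
        (doc_id, js_divergence(query, theta))
        for doc_id, theta in doc_topics.items()
        if doc_id != query_id
    ]
    scored.sort(key=lambda item: item[1])  # smaller divergence = more similar
    return scored[:k]


# Hypothetical document-topic proportions over three topics.
doc_topics = {
    "paper_a": [0.70, 0.20, 0.10],
    "paper_b": [0.65, 0.25, 0.10],
    "paper_c": [0.10, 0.10, 0.80],
    "paper_d": [0.30, 0.60, 0.10],
}

print([doc_id for doc_id, _ in rank_similar("paper_a", doc_topics, k=2)])
# ['paper_b', 'paper_d']
```

Because the divergence is symmetric and bounded in [0, 1], a browsing interface could also present it as a similarity score such as 1 minus the divergence; the abstract does not say which form D-VITA uses.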
References
Ahmed A, Xing EP (2010) Timeline: a dynamic hierarchical Dirichlet process model for recovering birth/death and evolution of topics in text stream. In: Conference on uncertainty in artificial intelligence (UAI), pp 20–29
Blei DM (2012) Probabilistic topic models. Commun ACM 55(4):77–84
Blei DM, Lafferty JD (2006) Dynamic topic models. In: International conference on machine learning (ICML), pp 113–120
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Chaney AJ-B, Blei DM (2012) Visualizing topic models. In: International conference on weblogs and social media (ICWSM)
Cohen E (2009) Decay models. In: Encyclopedia of database systems, pp 757–761
Cormode G, Korn F, Tirthapura S (2008) Exponentially decayed aggregates on data streams. In: International conference on data engineering (ICDE), pp 1379–1381
Derntl M, Cooper A, Pham MC, Klamma R, Renzel D (2011) Mediabase ready and first analysis report. TEL-Map deliverable D4.3
Derntl M, Klamma R (2012) A mediabase for technology enhanced learning in Europe. IEEE Learn Technol Newslett 14(3):2–5
Fuglede B, Topsoe F (2004) Jensen-Shannon divergence and Hilbert space embedding. In: International symposium on information theory. IEEE Press, New York, p 31
Günnemann N (2013) D-VITA: a visual interactive text analysis system using dynamic topic mining. In: BTW workshops, pp 237–246
Havre S, Hetzler EG, Nowell LT (2000) ThemeRiver: visualizing theme changes over time. In: IEEE symposium on information visualization (INFOVIS), pp 115–123
He Q, Chen B, Pei J, Qiu B, Mitra P, Giles CL (2009) Detecting topic evolution in scientific literature: how can citations help? In: ACM international conference on information and knowledge management (CIKM), pp 957–966
Hong L, Yin D, Guo J, Davison BD (2011) Tracking trends: incorporating term volume into temporal topic models. In: ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 484–492
Jo Y, Hopcroft JE, Lagoze C (2011) The web of topics: discovering the topology of topic evolution in a corpus. In: World wide web conference (WWW), pp 257–266
Keim D, Mansmann F, Schneidewind J, Thomas J, Ziegler H (2008) Visual analytics: scope and challenges. In: Visual data mining, pp 76–90
Leskovec J, Backstrom L, Kleinberg JM (2009) Meme-tracking and the dynamics of the news cycle. In: ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 497–506
Liu S, Zhou MX, Pan S, Song Y, Qian W, Cai W, Lian X (2012) TIARA: interactive, topic-based visual text summarization and analysis. ACM Trans Intell Syst Technol 3(2):25
Mei Q, Zhai C (2005) Discovering evolutionary theme patterns from text: an exploration of temporal text mining. In: ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 198–207
Mendonça S, Cardoso G, Caraça J (2012) The strategic strength of weak signal analysis. Futures 44(3):218–228
Taibi D, Dietze S (2013) Fostering analytics on learning analytics research: the LAK dataset. Technical report, http://resources.linkededucation.org/2013/03/lak-dataset-taibi.pdf
Wang C, Blei DM, Heckerman D (2008) Continuous time dynamic topic models. In: Conference on uncertainty in artificial intelligence (UAI), pp 579–586
Wang X, McCallum A (2006) Topics over time: a non-Markov continuous-time model of topical trends. In: ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 424–433
Acknowledgements
This research was supported by the European Commission through the TEL-Map Support Action (FP7-257822).
Additional information
This is an extended version of the paper [11], which was selected for the DASP special issue "Best Workshop Papers of BTW 2013".
About this article
Cite this article
Günnemann, N., Derntl, M., Klamma, R. et al. An Interactive System for Visual Analytics of Dynamic Topic Models. Datenbank Spektrum 13, 213–223 (2013). https://doi.org/10.1007/s13222-013-0134-x
DOI: https://doi.org/10.1007/s13222-013-0134-x