Abstract
Topic Modeling (TM) is a rapidly growing area at the interface of text mining, artificial intelligence and statistical modeling, and is increasingly deployed to address the ‘information overload’ associated with extensive text repositories. The goal in TM is typically to infer a rich yet intuitive summary model of a large document collection. Such a model comprises a set of topics that characterizes the collection, each topic being a probability distribution over words, along with the degree to which each individual document is concerned with each topic. The model then supports segmentation, clustering, profiling, browsing, and many other tasks. Current approaches to TM, dominated by Latent Dirichlet Allocation (LDA), assume a topic-driven document generation process and find a model that maximizes the likelihood of the data with respect to this process. This approach is sensitive to any mismatch between the ‘true’ generating process and the statistical model. Moreover, likelihood alone does not capture the quality of a topic model, which is multi-faceted: individual topics should be intuitively meaningful, sensibly distinct, and free of noise. Here we investigate multi-objective approaches to TM, which attempt to infer coherent topic models by navigating the trade-offs between objectives oriented towards coherence and objectives oriented towards coverage of the corpus at hand. Comparisons with LDA show that multi-objective evolutionary algorithm (MOEA) approaches yield significantly more coherent topics than LDA, consequently enhancing the use and interpretability of these models in a range of applications, without significant degradation in generalization ability.
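To make the coherence/coverage trade-off concrete, the following is a minimal illustrative sketch, not the chapter's method. The abstract does not define its objective functions, so everything here is an assumption: a topic is reduced to its list of top words, coherence is taken to be mean pairwise PMI over document co-occurrence, coverage is taken to be the fraction of corpus tokens that appear in some topic, and the names `pmi_coherence`, `coverage`, `evaluate`, `dominates` and `pareto_front` are hypothetical.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, docs, eps=1e-12):
    """Mean pairwise PMI of a topic's top words (assumed coherence measure)."""
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in s for w in words) for s in doc_sets) / n

    scores = [
        math.log((prob(a, b) + eps) / (prob(a) * prob(b) + eps))
        for a, b in combinations(topic_words, 2)
    ]
    return sum(scores) / len(scores)


def coverage(topics, docs):
    """Fraction of corpus tokens found in any topic (assumed coverage measure)."""
    topic_vocab = {w for t in topics for w in t}
    tokens = [w for d in docs for w in d]
    return sum(w in topic_vocab for w in tokens) / len(tokens)


def evaluate(topics, docs):
    """Objective vector for one candidate model; both objectives are maximized."""
    mean_coherence = sum(pmi_coherence(t, docs) for t in topics) / len(topics)
    return (mean_coherence, coverage(topics, docs))


def dominates(u, v):
    """True if u is at least as good as v everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(points):
    """Keep only the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A multi-objective search would call something like `evaluate` on each candidate set of topics and retain the non-dominated candidates via `pareto_front`, leaving the final choice among trade-off solutions to the analyst.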
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Khalifa, O., Corne, D.W., Chantler, M., Halley, F. (2013). Multi-objective Topic Modeling. In: Purshouse, R.C., Fleming, P.J., Fonseca, C.M., Greco, S., Shaw, J. (eds) Evolutionary Multi-Criterion Optimization. EMO 2013. Lecture Notes in Computer Science, vol 7811. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37140-0_8
DOI: https://doi.org/10.1007/978-3-642-37140-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37139-4
Online ISBN: 978-3-642-37140-0
eBook Packages: Computer Science, Computer Science (R0)