Does Topic Modelling Reflect Semantic Prototypes?

Korzycki, Michał; Korczyński, Wojciech

doi:10.1007/978-3-319-10383-9_11

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 314)

Abstract

The chapter introduces a representation of a textual event as a mixture of semantic stereotypes and factual information. We also present a method for distinguishing semantic prototypes that are specific to a given event from generic elements that may provide cause and result information. Moreover, the chapter discusses the results of unsupervised topic extraction experiments performed on documents from a large-scale corpus with an additional temporal structure. These experiments compare the nature of the information provided by Latent Dirichlet Allocation based on Log-Entropy weights and Vector Space modelling, and we discuss the impact of different corpus time windows on this information. Finally, we address whether unsupervised topic modelling can reflect deeper semantic information, such as the elements describing a given event or its causes and results, and distinguish it from factual data.
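The abstract mentions Log-Entropy weights and Vector Space modelling without spelling them out. The following is a minimal sketch that assumes the textbook log-entropy scheme: the local weight is log(1 + tf), and the global weight is 1 plus the term's normalised entropy across documents. It also computes cosine similarity between the resulting sparse document vectors. The chapter's exact weighting variant, preprocessing and corpus are not given here. The names `log_entropy_weights` and `cosine` and the toy documents are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter

# Hypothetical helper names; the textbook log-entropy variant is assumed,
# not necessarily the one used in the chapter.


def log_entropy_weights(docs):
    """Weight each tokenized document with the textbook log-entropy scheme.

    local(i, j) = log(1 + tf_ij)
    global(i)   = 1 + sum_j p_ij * log(p_ij) / log(n),  p_ij = tf_ij / gf_i
    where n is the number of documents and gf_i the corpus frequency of term i.
    Returns one {term: weight} dict per document.
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]

    # Corpus-wide frequency gf_i of every term.
    gf = Counter()
    for c in counts:
        gf.update(c)

    global_weight = {}
    for term, total in gf.items():
        if n < 2:
            # log(n) would be 0, so treat every term as neutral.
            global_weight[term] = 1.0
            continue
        # Sum of p_ij * log(p_ij) over the documents containing the term (<= 0).
        plogp = sum((c[term] / total) * math.log(c[term] / total)
                    for c in counts if term in c)
        global_weight[term] = 1.0 + plogp / math.log(n)

    return [{t: math.log1p(tf) * global_weight[t] for t, tf in c.items()}
            for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (hypothetical): two flood reports and one unrelated item.
docs = [
    ["flood", "river", "rain", "rescue"],
    ["flood", "river", "damage", "rain"],
    ["election", "vote", "rain"],
]
weights = log_entropy_weights(docs)
print("flood reports closer to each other:",
      cosine(weights[0], weights[1]) > cosine(weights[0], weights[2]))
```

A term that occurs evenly across every document, such as "rain" in the toy corpus, receives a global weight of 0 and contributes nothing. A term confined to a single document keeps its full local weight. For this reason, entropy-based global weights tend to suppress generic vocabulary before any topic model is fitted.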

Author information

Authors and Affiliations

AGH University of Science and Technology, Al. Mickiewicza 30, 30-962, Krakow, Poland
Michał Korzycki & Wojciech Korczyński

Corresponding author

Correspondence to Michał Korzycki.

Editor information

Editors and Affiliations

Division of Information Systems, Institute of Informatics, Wroclaw University of Technology, Wrocław, Poland
Aleksander Zgrzywa, Kazimierz Choroś & Andrzej Siemiński

About this paper

Cite this paper

Korzycki, M., Korczyński, W. (2015). Does Topic Modelling Reflect Semantic Prototypes? In: Zgrzywa, A., Choroś, K., Siemiński, A. (eds) New Research in Multimedia and Internet Systems. Advances in Intelligent Systems and Computing, vol 314. Springer, Cham. https://doi.org/10.1007/978-3-319-10383-9_11

DOI: https://doi.org/10.1007/978-3-319-10383-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10382-2
Online ISBN: 978-3-319-10383-9
eBook Packages: Engineering, Engineering (R0)
