The Semantic Level of Shannon Information: Are Highly Informative Words Good Keywords? A Study on German

  • Chapter in: Natural Language Processing in Artificial Intelligence—NLPinAI 2020

Abstract

This paper reports the results of a study on automatic keyword extraction in German. We employed two types of methods: (A) unsupervised methods based on information theory, namely (i) a bigram model, (ii) a probabilistic parser model, and (iii) a novel model that considers the topics within the discourse of a target word when calculating its information content, and (B) a supervised method employing a recurrent neural network (RNN). As baselines, we used TextRank and the TF-IDF ranking function. The topic model (A)(iii) clearly outperformed all other models, including the TextRank and TF-IDF baselines, whereas the RNN performed poorly. We take these results as first evidence that (i) information content can be employed for keyword extraction tasks and thus corresponds clearly to the semantics of natural language, and (ii) that, as a cognitive principle, the information content of words is determined by extra-sentential contexts, i.e., by the discourse of words.
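The chapter itself contains no code; as a rough illustration of the information-theoretic idea behind method (A)(i), a word's Shannon information content under a bigram model is its surprisal, the negative log conditional probability given the preceding word, and word types can then be ranked by average surprisal. A minimal sketch, assuming maximum-likelihood estimates from a toy corpus (the function name and corpus are illustrative, not taken from the study):

```python
import math
from collections import Counter, defaultdict

def bigram_surprisal_ranking(tokens):
    """Rank word types by average bigram surprisal -log2 P(w | w_prev)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    surprisals = defaultdict(list)
    for (prev, w), n in bigrams.items():
        # MLE conditional probability, with the raw unigram count
        # of the preceding word as denominator
        p = n / unigrams[prev]
        surprisals[w].append(-math.log2(p))
    avg = {w: sum(s) / len(s) for w, s in surprisals.items()}
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: rare, unpredictable continuations rank highest
ranked = bigram_surprisal_ranking("a b a c a b".split())
```

In this toy example the rarest continuation "c" receives the highest average surprisal, mirroring the intuition that highly informative words are keyword candidates; the study's actual models additionally use a probabilistic parser and discourse-level topic contexts.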


Notes

  1. A k-truss in a graph is a subset of the graph such that every edge in the subset is supported by at least \( k - 2 \) other edges that form triangles with that particular edge. In other words, every edge in the truss must be part of at least \( k - 2 \) triangles made up of nodes that are part of the truss. https://louridas.github.io/rwa/assignments/finding-trusses/.
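The footnote's definition can be checked directly in code: an edge set is a k-truss if every edge has at least \( k - 2 \) common neighbours closing triangles with it. A minimal sketch (the function name and toy graphs are illustrative):

```python
def is_k_truss(edges, k):
    """Check whether an undirected edge set forms a k-truss:
    every edge must participate in at least k - 2 triangles."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for u, v in edges:
        # each common neighbour of u and v closes a triangle with edge (u, v)
        if len(adj[u] & adj[v]) < k - 2:
            return False
    return True

triangle = [(1, 2), (2, 3), (1, 3)]   # each edge lies in exactly one triangle
```

A single triangle is a 3-truss but not a 4-truss, while the complete graph on four nodes, where every edge lies in two triangles, is a 4-truss.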

  2. https://heise.de.

  3. https://clarin.informatik.uni-leipzig.de/de?corpusId=deu_news_2012_3M.


Acknowledgements

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 357550571. The neural network was trained on the High Performance Computing (HPC) cluster of the Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) of the Technische Universität Dresden. Thanks to Caitlin Hazelwood for proofreading this chapter. This chapter is an extended version of the initial paper 'Keyword extraction in German: Information-theory vs. deep learning', published in Proceedings of the 12th International Conference on Agents and Artificial Intelligence (Vol. 1), pp. 459–464, ICAART 2020.

Author information


Correspondence to J. Nathanael Philipp.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kölbl, M., Kyogoku, Y., Philipp, J.N., Richter, M., Rietdorf, C., Yousef, T. (2021). The Semantic Level of Shannon Information: Are Highly Informative Words Good Keywords? A Study on German. In: Loukanova, R. (ed.) Natural Language Processing in Artificial Intelligence—NLPinAI 2020. Studies in Computational Intelligence, vol 939. Springer, Cham. https://doi.org/10.1007/978-3-030-63787-3_5
