The Semantic Level of Shannon Information: Are Highly Informative Words Good Keywords? A Study on German

  • Chapter in: Natural Language Processing in Artificial Intelligence—NLPinAI 2020

Abstract

This paper reports the results of a study on automatic keyword extraction in German. We employed two types of methods: (A) unsupervised methods based on information theory, namely (i) a bigram model, (ii) a probabilistic parser model, and (iii) a novel model that considers the topics within the discourse of a target word when calculating its information content, and (B) a supervised method employing a recurrent neural network (RNN). As baselines, we used TextRank and the TF-IDF ranking function. The topic model (A)(iii) clearly outperformed all other models, including the TextRank and TF-IDF baselines, whereas the RNN performed poorly. We take these results as first evidence that (i) information content can be employed for keyword extraction tasks and thus corresponds clearly to the semantics of natural language, and (ii) that, as a cognitive principle, the information content of words is determined by extra-sentential contexts, i.e., by the discourse of words.
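The chapter itself contains no code; as a rough illustration of the information-theoretic idea behind method (A)(i), a word's Shannon information content under a bigram model is its surprisal, the negative log conditional probability given the preceding word, and word types can then be ranked by average surprisal. A minimal sketch, assuming maximum-likelihood estimates from a toy corpus (the function name and corpus are illustrative, not taken from the study):

```python
import math
from collections import Counter, defaultdict

def bigram_surprisal_ranking(tokens):
    """Rank word types by average bigram surprisal -log2 P(w | w_prev)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    surprisals = defaultdict(list)
    for (prev, w), n in bigrams.items():
        # MLE conditional probability, with the raw unigram count
        # of the preceding word as denominator
        p = n / unigrams[prev]
        surprisals[w].append(-math.log2(p))
    avg = {w: sum(s) / len(s) for w, s in surprisals.items()}
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: rare, unpredictable continuations rank highest
ranked = bigram_surprisal_ranking("a b a c a b".split())
```

In this toy example the rarest continuation "c" receives the highest average surprisal, mirroring the intuition that highly informative words are keyword candidates; the study's actual models additionally use a probabilistic parser and discourse-level topic contexts.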


Notes

  1. A k-truss in a graph is a subset of the graph such that every edge in the subset is supported by at least \( k - 2 \) other edges that form triangles with that particular edge. In other words, every edge in the truss must be part of at least \( k - 2 \) triangles made up of nodes that are part of the truss. https://louridas.github.io/rwa/assignments/finding-trusses/.
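The footnote's definition can be checked directly in code: an edge set is a k-truss if every edge has at least \( k - 2 \) common neighbours closing triangles with it. A minimal sketch (the function name and toy graphs are illustrative):

```python
def is_k_truss(edges, k):
    """Check whether an undirected edge set forms a k-truss:
    every edge must participate in at least k - 2 triangles."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for u, v in edges:
        # each common neighbour of u and v closes a triangle with edge (u, v)
        if len(adj[u] & adj[v]) < k - 2:
            return False
    return True

triangle = [(1, 2), (2, 3), (1, 3)]   # each edge lies in exactly one triangle
```

A single triangle is a 3-truss but not a 4-truss, while the complete graph on four nodes, where every edge lies in two triangles, is a 4-truss.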

  2. https://heise.de.

  3. https://clarin.informatik.uni-leipzig.de/de?corpusId=deu_news_2012_3M.


Acknowledgements

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 357550571. The neural network was trained on the High Performance Computing (HPC) cluster of the Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) of the Technische Universität Dresden. Thanks to Caitlin Hazelwood for proofreading this chapter. This chapter is an extended version of the initial paper 'Keyword extraction in German: Information-theory vs. deep learning', published in Proceedings of the 12th International Conference on Agents and Artificial Intelligence (Vol. 1), pp. 459–464, ICAART 2020.

Author information


Correspondence to J. Nathanael Philipp.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kölbl, M., Kyogoku, Y., Philipp, J.N., Richter, M., Rietdorf, C., Yousef, T. (2021). The Semantic Level of Shannon Information: Are Highly Informative Words Good Keywords? A Study on German. In: Loukanova, R. (ed.) Natural Language Processing in Artificial Intelligence—NLPinAI 2020. Studies in Computational Intelligence, vol 939. Springer, Cham. https://doi.org/10.1007/978-3-030-63787-3_5
