Semantic Measures for Keywords Extraction

  • Conference paper
AI*IA 2017 Advances in Artificial Intelligence (AI*IA 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10640)

Abstract

In this paper we introduce a minimalist hypothesis for keywords extraction: keywords can be extracted from text documents by considering the concepts underlying document terms. Furthermore, central concepts are identified as those most closely related to the concepts in the title. Specifically, we propose five metrics, diverse in nature, to compute the centrality of concepts in the document body with respect to those in the title. Finally, we report on an experiment over a popular data set of human-annotated news articles; the results support our hypothesis.
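The hypothesis above can be illustrated with a minimal sketch. It is not one of the five metrics proposed in the paper: it assumes that each concept is already represented as a numeric vector and it scores every body concept by its mean cosine similarity to the title concepts. The function names and the toy vectors are hypothetical and serve only as an illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_title_centrality(title_vecs, body_vecs, top_k=3):
    """Rank body concepts by mean cosine similarity to the title concepts.

    title_vecs and body_vecs map a concept label to its vector.
    Both the representation and the averaging are illustrative assumptions.
    """
    scores = {}
    for concept, vec in body_vecs.items():
        sims = [cosine(vec, t) for t in title_vecs.values()]
        scores[concept] = sum(sims) / len(sims) if sims else 0.0
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:top_k]


# Hypothetical toy vectors, not taken from any real resource.
title = {"election": [1.0, 0.0, 0.0]}
body = {
    "vote": [0.9, 0.1, 0.0],
    "ballot": [0.8, 0.2, 0.1],
    "weather": [0.0, 0.0, 1.0],
}
keywords = rank_by_title_centrality(title, body, top_k=2)
```

In this toy setting the concepts closest to the title concept are ranked first, while an unrelated concept such as "weather" receives a score of zero.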

Notes

  1. http://babelfy.org.

  2. There is a subtle yet clear-cut difference between semantic relatedness and similarity: consider, e.g., that ‘eraser’ and ‘pencil’ are related but not similar, whilst ‘pencil’ and ‘pen’ are similar.

  3. For the sake of clarity, in this example we consider the lexical rather than the unified vector, i.e., a vector with terms in place of the conceptual IDs that are actually used by the system.

  4. In order to compute such measures we used the Palmetto library [16].

  5. Specifically, in the Palmetto implementation, the pointwise mutual information (PMI) and word co-occurrence counts were computed using Wikipedia as the reference corpus [16].

  6. The TTCS\(^{\mathcal{E}}\) resource is available for download at the URL http://ttcs.di.unito.it.

  7. Available at the URLs http://www.alchemyapi.com/api/keyword-extraction/, http://developer.zemanta.com/, http://www.opencalais.com/, http://TagMe.di.unipi.it/ and http://www.textrazor.com/, respectively.
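As a rough illustration of the pointwise mutual information mentioned in note 5, the sketch below estimates PMI(a, b) = log(P(a, b) / (P(a) P(b))) from document-level co-occurrence counts. It does not reproduce the Palmetto implementation: the function name, the document-level counting and the toy corpus are assumptions made for the example, and pairs that never co-occur are simply left out of the table.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Estimate PMI for every word pair that co-occurs in at least one document.

    documents is a list of token lists. Probabilities are estimated as the
    fraction of documents containing a word (or both words of a pair).
    """
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in documents:
        words = set(doc)
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))

    table = {}
    for pair, joint in pair_df.items():
        a, b = tuple(pair)
        p_joint = joint / n_docs
        p_a = word_df[a] / n_docs
        p_b = word_df[b] / n_docs
        table[pair] = math.log(p_joint / (p_a * p_b))
    return table


# Hypothetical toy corpus of four tiny "documents".
corpus = [
    ["pencil", "eraser"],
    ["pencil", "eraser"],
    ["pen", "ink"],
    ["pencil", "pen"],
]
pmi = pmi_table(corpus)
pencil_eraser = pmi[frozenset({"pencil", "eraser"})]
pencil_pen = pmi[frozenset({"pencil", "pen"})]
```

On this toy corpus the pair that co-occurs more often than chance would predict gets a positive PMI, and the pair that co-occurs less often gets a negative one. A real reference corpus such as Wikipedia would of course yield different values.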

References

  1. Camacho-Collados, J., Pilehvar, M.T., Navigli, R.: NASARI: a novel approach to a semantically-aware representation of items. In: Proceedings of NAACL, pp. 567–577 (2015)

  2. Edmundson, H.P.: New methods in automatic extracting. J. ACM 16(2), 264–285 (1969)

  3. El-Beltagy, S.R., Rafea, A.: KP-Miner: a keyphrase extraction system for English and Arabic documents. Inf. Syst. 34(1), 132–144 (2009)

  4. Haggag, M.H.: Keyword extraction using semantic analysis. Int. J. Comput. Appl. 61(1), 1–6 (2013)

  5. Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of EMNLP 2003, pp. 216–223 (2003)

  6. Jean-Louis, L., Zouaq, A., Gagnon, M., Ensan, F.: An assessment of online semantic annotators for the keyword extraction task. In: Pham, D.-N., Park, S.-B. (eds.) PRICAI 2014. LNCS (LNAI), vol. 8862, pp. 548–560. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13560-1_44

  7. Lieto, A., Mensa, E., Radicioni, D.P.: A resource-driven approach for anchoring linguistic resources to conceptual spaces. In: Adorni, G., Cagnoni, S., Gori, M., Maratea, M. (eds.) AI*IA 2016. LNCS (LNAI), vol. 10037, pp. 435–449. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49130-1_32

  8. Marujo, L., Gershman, A., Carbonell, J.G., Frederking, R.E., Neto, J.P.: Supervised topical key phrase extraction of news stories using crowdsourcing and co-reference normalization. In: Proceedings of LREC, pp. 399–403. ELRA (2012)

  9. Mensa, E., Radicioni, D.P., Lieto, A.: MERALI at SemEval-2017 task 2 subtask 1: a cognitively inspired approach. In: Proceedings of SemEval-2017, pp. 236–240. ACL (2017). http://www.aclweb.org/anthology/S17-2038

  10. Mensa, E., Radicioni, D.P., Lieto, A.: TTCS\(^{\mathcal{E}}\): a vectorial resource for computing conceptual similarity. In: EACL 2017 Workshop on Sense, Concept and Entity Representations and their Applications, pp. 96–101. ACL (2017). http://www.aclweb.org/anthology/W17-1912

  11. Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. Association for Computational Linguistics (2004)

  12. Mimno, D.M., Wallach, H.M., Talley, E.M., Leenders, M., McCallum, A.: Optimizing semantic coherence in topic models. In: EMNLP, pp. 262–272. ACL (2011)

  13. Navigli, R., Ponzetto, S.P.: BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif. Intell. 193, 217–250 (2012)

  14. Newman, D., Noh, Y., Talley, E., Karimi, S., Baldwin, T.: Evaluating topic models for digital libraries. In: Proceedings of the ACM/IEEE JCDL 2010. ACM (2010)

  15. Pedersen, T., Patwardhan, S., Michelizzi, J.: WordNet::Similarity: measuring the relatedness of concepts. In: HLT-NAACL, pp. 38–41. ACL (2004)

  16. Röder, M., Both, A., Hinneburg, A.: Exploring the space of topic coherence measures. In: Proceedings of WSDM 2015, pp. 399–408. ACM (2015)

  17. Rose, S., Engel, D., Cramer, N., Cowley, W.: Automatic keyword extraction from individual documents. In: Text Mining, pp. 1–20 (2010)

  18. Stevens, K., Kegelmeyer, P., Andrzejewski, D., Buttler, D.: Exploring topic coherence over many models and many topics. In: Proceedings of EMNLP-CoNLL, pp. 952–961. ACL (2012)

  19. Tsatsaronis, G., Varlamis, I., Nørvåg, K.: SemanticRank: ranking keywords and sentences using semantic graphs. In: Proceedings of the 23rd International Conference on Computational Linguistics, pp. 1074–1082. ACL (2010)

  20. Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C., Nevill-Manning, C.G.: KEA: practical automatic keyphrase extraction. In: Proceedings of JCDL, pp. 254–255. ACM (1999)

Acknowledgements

We wish to thank Simone Donetti and the Technical Staff of the Computer Science Department of the University of Turin for their support in the setup and administration of the computer system used in the experimentation.

The authors are also grateful to the anonymous reviewers for their valuable comments and suggestions.

Author information

Corresponding author

Correspondence to Daniele P. Radicioni.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Colla, D., Mensa, E., Radicioni, D.P. (2017). Semantic Measures for Keywords Extraction. In: Esposito, F., Basili, R., Ferilli, S., Lisi, F. (eds) AI*IA 2017 Advances in Artificial Intelligence. AI*IA 2017. Lecture Notes in Computer Science, vol 10640. Springer, Cham. https://doi.org/10.1007/978-3-319-70169-1_10

  • DOI: https://doi.org/10.1007/978-3-319-70169-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70168-4

  • Online ISBN: 978-3-319-70169-1

  • eBook Packages: Computer Science, Computer Science (R0)
