Semantic Measures for Keywords Extraction

Colla, Davide; Mensa, Enrico; Radicioni, Daniele P.

doi:10.1007/978-3-319-70169-1_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10640)

Included in the following conference series:

Conference of the Italian Association for Artificial Intelligence

Abstract

In this paper we introduce a minimalist hypothesis for keywords extraction: keywords can be extracted from text documents by considering the concepts underlying document terms. Central concepts are identified as those most closely related to the concepts in the title. Specifically, we propose five metrics, diverse in nature, to compute the centrality of concepts in the document body with respect to those in the title. Finally, we report on an experiment over a popular data set of human-annotated news articles; the results confirm the soundness of our hypothesis.
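
The general idea of scoring body concepts by their relatedness to title concepts can be illustrated with a minimal sketch. It is not one of the five metrics proposed in the paper, which are not specified on this page: the cosine measure, the function names and the toy three-dimensional vectors below are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_by_title_centrality(body_concepts, title_concepts, top_k=3):
    """Rank body concepts by their mean similarity to the title concepts.

    Both arguments map a concept label to a vector. Hypothetical helper:
    the averaging scheme is an assumption, not the authors' method.
    """
    scores = {}
    for concept, vector in body_concepts.items():
        sims = [cosine(vector, t) for t in title_concepts.values()]
        scores[concept] = sum(sims) / len(sims) if sims else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy, made-up vectors: one title concept and three body concepts.
title = {"election": [1.0, 0.0, 0.0]}
body = {
    "vote": [0.9, 0.1, 0.0],
    "weather": [0.0, 0.0, 1.0],
    "candidate": [0.7, 0.3, 0.0],
}
ranking = rank_by_title_centrality(body, title, top_k=2)
print(ranking)
```

In this toy setting the concepts closest to the title vector are returned as keyword candidates, while the unrelated concept is left out.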

Notes

1. http://babelfy.org.
2. There is a subtle though neat difference between semantic relatedness and similarity: consider, e.g., that ‘eraser’ and ‘pencil’ are related but not similar, whilst ‘pencil’ and ‘pen’ are similar.
3. For the sake of clarity in this example we consider the lexical rather than the unified vector, i.e. having terms in place of conceptual IDs that are actually used by the system.
4. In order to compute such measures we used the Palmetto library [16].
5. Specifically, in the Palmetto implementation, the pointwise mutual information (PMI) and word co-occurrence counts were computed by using Wikipedia as reference corpus [16].
6. The TTCS\(^{\mathcal{E}}\) resource is available for download at the URL http://ttcs.di.unito.it.
7. Available at the URLs http://www.alchemyapi.com/api/keyword-extraction/, http://developer.zemanta.com/, http://www.opencalais.com/, http://TagMe.di.unipi.it/ and http://www.textrazor.com/, respectively.
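
As a rough illustration of the pointwise mutual information mentioned in note 5, the sketch below computes the textbook quantity PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ) from co-occurrence counts. It does not reproduce the Palmetto implementation: the function name, the base-2 logarithm, the handling of zero counts and the numbers used are assumptions chosen for the example.

```python
import math

def pmi(joint_count, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts.

    joint_count: number of contexts containing both x and y
    count_x, count_y: number of contexts containing x, respectively y
    total: total number of contexts in the reference corpus
    Returns -inf when the two words never co-occur (an assumed convention).
    """
    if joint_count == 0:
        return float("-inf")
    # p(x,y) / (p(x) * p(y)) simplifies to joint_count * total / (count_x * count_y).
    return math.log2(joint_count * total / (count_x * count_y))

# Made-up counts: x occurs in 100 contexts, y in 50, both together in 20, out of 1000.
score = pmi(20, 100, 50, 1000)
print(score)
```

A positive score means the two words co-occur more often than independence would predict; a score of zero corresponds to independence.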

References

Camacho-Collados, J., Pilehvar, M.T., Navigli, R.: NASARI: a novel approach to a semantically-aware representation of items. In: Proceedings of NAACL, pp. 567–577 (2015)
Edmundson, H.P.: New methods in automatic extracting. J. ACM 16(2), 264–285 (1969)
El-Beltagy, S.R., Rafea, A.: KP-Miner: a keyphrase extraction system for English and Arabic documents. Inf. Syst. 34(1), 132–144 (2009)
Haggag, M.H.: Keyword extraction using semantic analysis. Int. J. Comput. Appl. 61(1), 1–6 (2013)
Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of EMNLP 2003, pp. 216–223 (2003)
Jean-Louis, L., Zouaq, A., Gagnon, M., Ensan, F.: An assessment of online semantic annotators for the keyword extraction task. In: Pham, D.-N., Park, S.-B. (eds.) PRICAI 2014. LNCS (LNAI), vol. 8862, pp. 548–560. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13560-1_44
Lieto, A., Mensa, E., Radicioni, D.P.: A resource-driven approach for anchoring linguistic resources to conceptual spaces. In: Adorni, G., Cagnoni, S., Gori, M., Maratea, M. (eds.) AI*IA 2016. LNCS (LNAI), vol. 10037, pp. 435–449. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49130-1_32
Marujo, L., Gershman, A., Carbonell, J.G., Frederking, R.E., Neto, J.P.: Supervised topical key phrase extraction of news stories using crowdsourcing and co-reference normalization. In: Proceedings of LREC, pp. 399–403. ELRA (2012)
Mensa, E., Radicioni, D.P., Lieto, A.: MERALI at SemEval-2017 task 2 subtask 1: a cognitively inspired approach. In: Proceedings of SemEval-2017, pp. 236–240. ACL (2017). http://www.aclweb.org/anthology/S17-2038
Mensa, E., Radicioni, D.P., Lieto, A.: TTCS\(^{\mathcal{E}}\): a vectorial resource for computing conceptual similarity. In: EACL 2017 Workshop on Sense, Concept and Entity Representations and their Applications, pp. 96–101. ACL (2017). http://www.aclweb.org/anthology/W17-1912
Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. Association for Computational Linguistics (2004)
Mimno, D.M., Wallach, H.M., Talley, E.M., Leenders, M., McCallum, A.: Optimizing semantic coherence in topic models. In: EMNLP, pp. 262–272. ACL (2011)
Navigli, R., Ponzetto, S.P.: BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif. Intell. 193, 217–250 (2012)
Newman, D., Noh, Y., Talley, E., Karimi, S., Baldwin, T.: Evaluating topic models for digital libraries. In: Proceedings of the ACM/IEEE JCDL2010. ACM (2010)
Pedersen, T., Patwardhan, S., Michelizzi, J.: WordNet::Similarity: measuring the relatedness of concepts. In: HLT-NAACL, pp. 38–41. ACL (2004)
Röder, M., Both, A., Hinneburg, A.: Exploring the space of topic coherence measures. In: Proceedings of WSDM 2015, pp. 399–408. ACM (2015)
Rose, S., Engel, D., Cramer, N., Cowley, W.: Automatic keyword extraction from individual documents. In: Text Mining, pp. 1–20 (2010)
Stevens, K., Kegelmeyer, P., Andrzejewski, D., Buttler, D.: Exploring topic coherence over many models and many topics. In: Proceedings of EMNLP-CoNLL, pp. 952–961. ACL (2012)
Tsatsaronis, G., Varlamis, I., Nørvåg, K.: SemanticRank: ranking keywords and sentences using semantic graphs. In: Proceedings of the 23rd International Conference on Computational Linguistics, pp. 1074–1082. ACL (2010)
Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C., Nevill-Manning, C.G.: KEA: practical automatic keyphrase extraction. In: Proceedings of JCDL, pp. 254–255. ACM (1999)

Acknowledgements

We wish to thank Simone Donetti and the Technical Staff of the Computer Science Department of the University of Turin for their support in the setup and administration of the computer system used in the experimentation.

The authors are also grateful to the anonymous reviewers for their valuable comments and suggestions.

Author information

Authors and Affiliations

Dipartimento di Informatica, Università di Torino, Turin, Italy
Davide Colla, Enrico Mensa & Daniele P. Radicioni

Corresponding author

Correspondence to Daniele P. Radicioni.

Editor information

Editors and Affiliations

University of Bari, Bari, Italy
Floriana Esposito
University of Rome Tor Vergata, Rome, Italy
Roberto Basili
University of Bari, Bari, Italy
Stefano Ferilli
University of Bari, Bari, Italy
Francesca A. Lisi

About this paper

Cite this paper

Colla, D., Mensa, E., Radicioni, D.P. (2017). Semantic Measures for Keywords Extraction. In: Esposito, F., Basili, R., Ferilli, S., Lisi, F. (eds) AI*IA 2017 Advances in Artificial Intelligence. AI*IA 2017. Lecture Notes in Computer Science, vol 10640. Springer, Cham. https://doi.org/10.1007/978-3-319-70169-1_10

DOI: https://doi.org/10.1007/978-3-319-70169-1_10
Published: 07 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70168-4
Online ISBN: 978-3-319-70169-1
eBook Packages: Computer Science, Computer Science (R0)
