Towards Keyphrase Assignment for Texts in Portuguese Language

Silveira, Raquel; Furtado, Vasco; Pinheiro, Vládia

doi:10.1007/978-3-319-41552-9_17

Conference paper
First Online: 21 June 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Abstract

Keyphrase assignment has often been conflated with keyphrase extraction, on the assumption that a keyphrase of a text must occur in that text. Keyphrase extraction approaches typically use a training set restricted to textual terms, which limits what any inductive algorithm can learn. Our research investigates ways to improve the accuracy of keyphrase assignment systems for texts in Portuguese by allowing classification algorithms to learn from non-textual terms as well. Our basic assumption is that non-textual terms can be added to the training set by inferring a possible semantic relationship with textual terms. To discover the latent relationships between non-textual and textual terms, we apply deductive strategies to Portuguese common-sense bases such as Wikipedia and InferenceNet. We show that algorithms that follow our approach outperform those that do not.
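The idea of admitting non-textual terms as candidates can be illustrated with a minimal sketch. This is not the authors' implementation: the `RELATED` dictionary is a hypothetical toy stand-in for relations that might be drawn from a common-sense base, and the `min_support` threshold is an assumption introduced here only to show how a term absent from the text could be inferred from several textual terms that point to it.

```python
from typing import Dict, List, Set

# Hypothetical toy knowledge base: term -> semantically related terms.
# It stands in for relations that a real system might obtain from
# resources such as Wikipedia or InferenceNet (assumption, not the paper's data).
RELATED: Dict[str, Set[str]] = {
    "goleiro": {"futebol", "esporte"},
    "zagueiro": {"futebol"},
    "atacante": {"futebol", "ataque"},
}


def candidate_terms(text_terms: List[str], min_support: int = 2) -> Dict[str, List[str]]:
    """Split keyphrase candidates into textual and inferred non-textual terms.

    A non-textual term is kept only if at least `min_support` distinct
    textual terms are related to it (an illustrative heuristic).
    """
    present = set(text_terms)
    support: Dict[str, int] = {}
    for term in present:
        for related in RELATED.get(term, set()):
            if related not in present:  # only terms absent from the text
                support[related] = support.get(related, 0) + 1
    non_textual = sorted(t for t, n in support.items() if n >= min_support)
    return {"textual": sorted(present), "non_textual": non_textual}


result = candidate_terms(["goleiro", "zagueiro", "atacante"])
print(result["non_textual"])
```

In this toy run, "futebol" never appears in the text but is related to all three textual terms, so it becomes a candidate that a classifier could then accept or reject as a keyphrase.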

Author information

Authors and Affiliations

Programa de Pós-Graduação em Informática Aplicada, Universidade de Fortaleza, Av. Washington Soares, 1321, Fortaleza, Ceará, Brazil
Raquel Silveira, Vasco Furtado & Vládia Pinheiro

Corresponding author

Correspondence to Raquel Silveira.

Editor information

Editors and Affiliations

Universidade de Lisboa, Portugal
João Silva
ISCTE-IUL, Lisbon, Portugal
Ricardo Ribeiro
Universidade de Évora, Évora, Portugal
Paulo Quaresma
Universidade de Caxias do Sul, Caxias do Sul, Brazil
André Adami
Universidade de Lisboa, Lisboa, Portugal
António Branco

About this paper

Cite this paper

Silveira, R., Furtado, V., Pinheiro, V. (2016). Towards Keyphrase Assignment for Texts in Portuguese Language. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_17

DOI: https://doi.org/10.1007/978-3-319-41552-9_17
Published: 21 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41551-2
Online ISBN: 978-3-319-41552-9
eBook Packages: Computer Science, Computer Science (R0)