Abstract
Keyphrase assignment has often been conflated with keyphrase extraction, on the assumption that a keyphrase of a text must appear in that text. Keyphrase extraction approaches typically restrict the training set to textual terms (terms that occur in the text), which limits what any inductive algorithm can learn. Our research investigates ways to improve the accuracy of keyphrase assignment systems for texts in Portuguese by allowing classification algorithms to learn from non-textual terms as well. Our basic assumption is that non-textual terms can be added to the training set by inferring a possible semantic relationship with textual terms. To discover the latent relationships between non-textual and textual terms, we apply deductive strategies to Portuguese common-sense bases such as Wikipedia and InferenceNet. We show that algorithms following our approach outperform those that do not use the methods introduced here.
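The idea of enlarging the candidate set with non-textual terms can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `RELATED` dictionary is a hypothetical stand-in for a knowledge base such as Wikipedia or InferenceNet, its entries are invented for illustration, and the function names `candidate_terms` and `training_rows` are assumptions made here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge base: maps a term to semantically related terms.
# The entries are invented for illustration only.
RELATED: Dict[str, Set[str]] = {
    "goleiro": {"futebol", "esporte"},
    "pênalti": {"futebol"},
    "eleição": {"política"},
}


def candidate_terms(text_terms: List[str], kb: Dict[str, Set[str]]) -> Dict[str, bool]:
    """Map each candidate term to True if it occurs in the text, else False."""
    candidates = {term: True for term in text_terms}
    for term in text_terms:
        for related in kb.get(term, set()):
            # Non-textual terms enter the candidate set through a relation
            # with a textual term; textual terms keep their True flag.
            candidates.setdefault(related, False)
    return candidates


def training_rows(
    text_terms: List[str], gold_keyphrases: Set[str], kb: Dict[str, Set[str]]
) -> List[Tuple[str, bool, bool]]:
    """Build (term, occurs_in_text, is_keyphrase) rows for a classifier."""
    rows = []
    for term, in_text in sorted(candidate_terms(text_terms, kb).items()):
        rows.append((term, in_text, term in gold_keyphrases))
    return rows


# A document mentioning "goleiro" and "pênalti" whose assigned keyphrase,
# "futebol", never appears in the text itself.
rows = training_rows(["goleiro", "pênalti"], {"futebol"}, RELATED)
for row in rows:
    print(row)
```

With extraction-only candidates, the positive example "futebol" could never reach the training set in this toy document; after the expansion it appears as a non-textual candidate labeled as a keyphrase.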
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Silveira, R., Furtado, V., Pinheiro, V. (2016). Towards Keyphrase Assignment for Texts in Portuguese Language. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_17
DOI: https://doi.org/10.1007/978-3-319-41552-9_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41551-2
Online ISBN: 978-3-319-41552-9
eBook Packages: Computer Science, Computer Science (R0)