Towards Keyphrase Assignment for Texts in Portuguese Language

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Abstract

Keyphrase assignment is often conflated with keyphrase extraction, because the usual working hypothesis is that a keyphrase of a text must be extracted from that text. Keyphrase extraction approaches typically use a training set restricted to textual terms (terms that occur in the text), which limits what any inductive algorithm can learn. Our research investigates ways to improve the accuracy of keyphrase assignment systems for Portuguese-language texts by allowing classification algorithms to learn from non-textual terms as well. Our basic assumption is that non-textual terms can be added to the training set by inferring a possible semantic relationship between them and textual terms. To discover this latent relationship between non-textual and textual terms, we apply deductive strategies to Portuguese common-sense knowledge bases such as Wikipedia and InferenceNet. We show that algorithms that follow our approach outperform those that do not.
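
The idea of inferring non-textual candidates from textual ones can be illustrated with a minimal sketch. It is not the authors' implementation: the toy knowledge base, the function names, and the `min_support` threshold are all assumptions made for illustration, standing in for the relations that the abstract says are drawn from Wikipedia and InferenceNet.

```python
from typing import Dict, List, Set

# Hypothetical toy knowledge base mapping a term to related concepts.
# The entries are invented for illustration only; they stand in for
# relations that would come from a real common-sense base.
TOY_KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "goleiro": {"futebol", "esporte"},
    "pênalti": {"futebol"},
    "estádio": {"futebol", "arquitetura"},
}


def textual_candidates(text: str, vocabulary: Set[str]) -> List[str]:
    """Return vocabulary terms that literally occur in the text, in order."""
    found: List[str] = []
    for token in text.lower().split():
        word = token.strip(".,;:!?")
        if word in vocabulary and word not in found:
            found.append(word)
    return found


def non_textual_candidates(
    textual: List[str],
    knowledge_base: Dict[str, Set[str]],
    min_support: int = 2,  # assumed threshold, not taken from the paper
) -> List[str]:
    """Infer terms absent from the text but related to several textual terms."""
    support: Dict[str, int] = {}
    for term in textual:
        for related in knowledge_base.get(term, set()):
            if related not in textual:
                support[related] = support.get(related, 0) + 1
    return sorted(t for t, count in support.items() if count >= min_support)


text = "O goleiro defendeu o pênalti no estádio."
textual = textual_candidates(text, set(TOY_KNOWLEDGE_BASE))
inferred = non_textual_candidates(textual, TOY_KNOWLEDGE_BASE)
print(textual)   # terms found in the text
print(inferred)  # related terms that never appear in the text
```

In this toy example "futebol" never appears in the sentence, yet it is related to all three textual terms, so it becomes a candidate that a classifier could then learn from alongside the textual ones.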

Author information

Corresponding author

Correspondence to Raquel Silveira.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Silveira, R., Furtado, V., Pinheiro, V. (2016). Towards Keyphrase Assignment for Texts in Portuguese Language. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_17

  • DOI: https://doi.org/10.1007/978-3-319-41552-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41551-2

  • Online ISBN: 978-3-319-41552-9

  • eBook Packages: Computer Science (R0)
