
Automatic Extraction and Learning of Keyphrases from Scientific Articles

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Abstract

Many academic journals and conferences require that each article include a list of keyphrases. These keyphrases should provide general information about the contents and topics of the article, and they can save valuable time in tasks such as filtering, summarization, and categorization. In this paper, we investigate the automatic extraction and learning of keyphrases from scientific articles written in English. First, we introduce various baseline extraction methods, some of which, formalized by us, are very successful for academic papers. Then, we integrate these methods using different machine learning methods. The best results were achieved by J48, an improved variant of the C4.5 decision-tree algorithm. These results are significantly better than those of previous extraction systems regarded as the state of the art.
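The abstract does not spell out the baseline extraction methods, so the following is only a minimal sketch under stated assumptions. It scores candidate phrases with two generic baselines that are common in the keyphrase-extraction literature: term frequency and the relative position of a phrase's first occurrence. The stopword list, the 1-to-3-word candidate window, and the function names `tokenize`, `candidate_phrases`, `baseline_features`, and `top_k` are hypothetical illustration choices, not the authors' definitions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
    "of", "on", "or", "such", "that", "the", "these", "this", "to", "we", "with",
}


def tokenize(text):
    """Lowercase word tokens; punctuation and sentence boundaries are dropped."""
    return re.findall(r"[a-z][a-z0-9-]*", text.lower())


def candidate_phrases(tokens, max_len=3):
    """Yield (phrase, start_index) for every 1..max_len-gram that neither
    begins nor ends with a stopword. N-grams may cross sentence boundaries."""
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            if len(gram) < n:  # ran past the end of the token list
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i


def baseline_features(text, max_len=3):
    """Map each candidate to (term frequency, relative position of first occurrence)."""
    tokens = tokenize(text)
    total = max(1, len(tokens))  # avoid division by zero on empty input
    tf = Counter()
    first = {}
    for phrase, start in candidate_phrases(tokens, max_len):
        tf[phrase] += 1
        first.setdefault(phrase, start / total)  # starts arrive in increasing order
    return {p: (tf[p], first[p]) for p in tf}


def top_k(text, k=5, max_len=3):
    """Rank by frequency (descending), breaking ties by earlier first occurrence."""
    feats = baseline_features(text, max_len)
    return sorted(feats, key=lambda p: (-feats[p][0], feats[p][1]))[:k]


if __name__ == "__main__":
    sample = ("Keyphrase extraction selects keyphrases. Keyphrase extraction uses "
              "machine learning. Machine learning helps keyphrase extraction.")
    print(top_k(sample, k=5))
    # -> ['keyphrase', 'keyphrase extraction', 'extraction', 'machine', 'machine learning']
```

A frequency-only ranking lets single words crowd out multi-word phrases, which is one reason to combine several baselines rather than rely on any one of them. Per-candidate feature tuples like these are the kind of input a supervised learner such as J48 could be trained on, with each candidate labeled by whether it appears in the author-assigned keyphrase list. The abstract does not state how the paper actually encodes its features.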




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

HaCohen-Kerner, Y., Gross, Z., Masa, A. (2005). Automatic Extraction and Learning of Keyphrases from Scientific Articles. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_74


  • DOI: https://doi.org/10.1007/978-3-540-30586-6_74

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24523-0

  • Online ISBN: 978-3-540-30586-6

  • eBook Packages: Computer Science, Computer Science (R0)
