Automatic Keyphrases Extraction from Document Using Neural Network

  • Jiabing Wang
  • Hong Peng
  • Jing-song Hu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3930)


Keyphrase extraction is a task with many applications in information retrieval, text mining, and natural language processing. In this paper, a keyphrase extraction approach based on a neural network is proposed. To determine whether a phrase is a keyphrase, the following features of the phrase in a given document are used: its term frequency and inverse document frequency, whether it appears in the title or headings (subheadings) of the document, and its frequency of appearance in the paragraphs of the document. The algorithm is evaluated with the standard information retrieval metrics of precision and recall, and by human assessment.
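As an illustration of the features and metrics named in the abstract, the following is a minimal sketch rather than the authors' implementation. The abstract gives no formulas, so the smoothed inverse document frequency, the reading of paragraph frequency as the fraction of paragraphs containing the phrase, and the names `count_phrase`, `phrase_features`, and `precision_recall` are all assumptions made for this example. The neural network itself is not shown; the returned feature vector is what such a classifier would presumably take as input.

```python
import math
import re


def count_phrase(phrase, text):
    """Count whole-word, case-insensitive occurrences of phrase in text."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def phrase_features(phrase, title, headings, paragraphs, corpus):
    """Return the four features named in the abstract for one candidate phrase.

    corpus is a list of other documents (plain strings) used only for IDF.
    The exact formulas are assumptions; the abstract does not give them.
    """
    body = " ".join(paragraphs)
    total_words = max(len(body.split()), 1)
    # Term frequency: phrase occurrences relative to document length in words.
    tf = count_phrase(phrase, body) / total_words

    # Smoothed inverse document frequency (an assumed variant).
    docs_with_phrase = sum(1 for doc in corpus if count_phrase(phrase, doc) > 0)
    idf = math.log((1 + len(corpus)) / (1 + docs_with_phrase))

    # Binary feature: does the phrase occur in the title or any heading?
    in_title_or_heading = any(
        count_phrase(phrase, t) > 0 for t in [title, *headings]
    )

    # Assumed reading: fraction of paragraphs that mention the phrase.
    para_hits = sum(1 for p in paragraphs if count_phrase(phrase, p) > 0)
    para_freq = para_hits / max(len(paragraphs), 1)

    return [tf, idf, float(in_title_or_heading), para_freq]


def precision_recall(extracted, reference):
    """Precision and recall of extracted keyphrases against a reference set."""
    extracted_set = {p.lower() for p in extracted}
    reference_set = {p.lower() for p in reference}
    correct = len(extracted_set & reference_set)
    precision = correct / len(extracted_set) if extracted_set else 0.0
    recall = correct / len(reference_set) if reference_set else 0.0
    return precision, recall
```

Under these assumptions, each candidate phrase maps to a four-element vector, and the extracted keyphrases can be scored against author-assigned ones with `precision_recall`. Matching here is exact after lowercasing; a real evaluation might also stem phrases before comparing them.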


Keywords (added by machine, not by the authors): Hide Layer; Digital Library; Noun Phrase; Inductive Logic Programming; Candidate Phrase





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jiabing Wang (1)
  • Hong Peng (1)
  • Jing-song Hu (1)
  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
