
Ensemble Learning for Keyphrases Extraction from Scientific Document

  • Jiabing Wang
  • Hong Peng
  • Jing-song Hu
  • Jun Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3971)

Abstract

Keyphrase extraction is a task with many applications in information retrieval, text mining, and natural language processing. In this paper, a keyphrase extraction approach based on a neural network ensemble is proposed. To determine whether a phrase is a keyphrase, the following features of the phrase in a given document are used: its term frequency, whether it appears in the title, abstract, or headings (subheadings), and its frequency of occurrence in the paragraphs of the document. The approach is evaluated using the standard information retrieval metrics of precision and recall. Experimental results show that ensemble learning can significantly increase precision and recall.
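
As an illustration only, the features and metrics named in the abstract could be computed along the following lines. This is a minimal sketch, not the authors' implementation: the `Document` class, the function names, the substring-based matching, and the reading of "frequency in the paragraphs" as the fraction of paragraphs containing the phrase are all assumptions made for this example. The neural network ensemble itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical container for the parts of a paper the features refer to."""
    title: str
    abstract: str
    headings: list[str] = field(default_factory=list)
    paragraphs: list[str] = field(default_factory=list)


def count_occurrences(phrase: str, text: str) -> int:
    """Count non-overlapping, case-insensitive substring matches of phrase."""
    return text.lower().count(phrase.lower())


def phrase_features(phrase: str, doc: Document) -> dict[str, float]:
    """Build one feature vector for a candidate phrase (assumed encoding)."""
    body = " ".join(doc.paragraphs)
    n_paragraphs = len(doc.paragraphs) or 1  # avoid division by zero
    containing = sum(
        1 for p in doc.paragraphs if count_occurrences(phrase, p) > 0
    )
    return {
        # Raw count of the phrase in the body text.
        "term_frequency": float(count_occurrences(phrase, body)),
        # Binary indicators for the structural positions.
        "in_title": float(count_occurrences(phrase, doc.title) > 0),
        "in_abstract": float(count_occurrences(phrase, doc.abstract) > 0),
        "in_headings": float(
            any(count_occurrences(phrase, h) > 0 for h in doc.headings)
        ),
        # Assumed reading: share of paragraphs that mention the phrase.
        "paragraph_frequency": containing / n_paragraphs,
    }


def precision_recall(extracted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision and recall of extracted keyphrases against a gold set."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

In a setup like this, each candidate phrase would be mapped to such a vector and passed to the trained classifiers, and the phrases the ensemble accepts would be compared with the author-assigned keyphrases via `precision_recall`.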

Keywords

Neural Network, Class Label, Feature Subset, Ensemble Learn, AdaBoost Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jiabing Wang (1)
  • Hong Peng (1)
  • Jing-song Hu (1)
  • Jun Zhang (2)
  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
  2. School of Information Science, Guangdong Commerce College, Guangzhou, China
