A Generic Classifier-Ensemble Approach for Biomedical Named Entity Recognition

  • Zhihua Liao
  • Zili Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7301)

Abstract

In named entity recognition (NER) for biomedical literature, approaches that combine classifiers have shown substantial performance improvements over a single (best) classifier. This is mainly due to a sufficient level of diversity among the classifiers, a property that depends on which classifiers are selected for the set. Given a large number of classifiers, how to select which ones to include is a crucial issue in the design of a multiple-classifier ensemble. With this observation in mind, we propose a generic genetic classifier-ensemble method for classifier selection in biomedical NER. Various diversity measures and majority voting are considered, and disjoint feature subsets are selected to construct the individual classifiers. A basic type of individual classifier, the Support Vector Machine (SVM), is adopted to form the SVM-classifier committee. A multi-objective genetic algorithm (GA) is employed as the classifier selector to help the ensemble improve overall sample classification accuracy. The proposed approach is tested on a benchmark dataset, the GENIA version 3.02 corpus, and compared with the best individual SVM classifier, an SVM-classifier ensemble algorithm, and other machine learning methods such as CRF, HMM and MEMM. The results show that the proposed approach outperforms the other classification algorithms and can be a useful method for the biomedical NER problem.
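
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each classifier's token-level predictions are already available as a list of labels, uses mean pairwise disagreement as one possible diversity measure, and collapses the two objectives (accuracy and diversity) into a single weighted sum rather than running a true multi-objective GA. All function names and the `diversity_weight` parameter are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def majority_vote(predictions):
    """Combine per-classifier label sequences by token-wise plurality vote."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def accuracy(pred, gold):
    """Fraction of tokens whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def disagreement(predictions):
    """Mean pairwise disagreement rate (one simple diversity measure)."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 0.0
    rates = [sum(a != b for a, b in zip(p, q)) / len(p) for p, q in pairs]
    return sum(rates) / len(rates)


def fitness(mask, all_preds, gold, diversity_weight=0.1):
    """Weighted sum of ensemble accuracy and diversity (assumed scalarization)."""
    chosen = [p for bit, p in zip(mask, all_preds) if bit]
    if not chosen:
        return 0.0  # an empty ensemble cannot classify anything
    vote_acc = accuracy(majority_vote(chosen), gold)
    return vote_acc + diversity_weight * disagreement(chosen)


def select_ensemble(all_preds, gold, pop_size=20, generations=30,
                    mutation_rate=0.1, seed=0):
    """Search for a 0/1 mask over classifiers with a small generational GA."""
    rng = random.Random(seed)
    n = len(all_preds)

    def score(mask):
        return fitness(mask, all_preds, gold)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit independently with probability mutation_rate
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)


if __name__ == "__main__":
    # Toy data: four hypothetical classifiers labelling six tokens.
    gold = ["O", "B", "I", "O", "B", "O"]
    preds = [
        ["O", "B", "I", "O", "O", "O"],
        ["O", "B", "O", "O", "B", "O"],
        ["O", "O", "I", "O", "B", "O"],
        ["O", "O", "O", "O", "O", "O"],
    ]
    mask = select_ensemble(preds, gold)
    print("selected classifiers:", mask)
    print("fitness:", fitness(mask, preds, gold))
```

In a real setting the fitness would be evaluated on held-out data rather than on the labels used to train the classifiers, and a Pareto-based GA would keep accuracy and diversity as separate objectives instead of summing them.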

Keywords

Support Vector Machine, Hidden Markov Model, Natural Language Processing, Ensemble Method, Conditional Random Field
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Zhihua Liao (1)
  • Zili Zhang (2, 3)

  1. Modern Foreign-Language Education Technology Center, Foreign Studies College, Hunan Normal University, China
  2. Faculty of Computer and Information Science, Southwest University, China
  3. School of Information Technology, Deakin University, Australia
