
Soft Computing, Volume 19, Issue 12, pp 3529–3549

MODE: multiobjective differential evolution for feature selection and classifier ensemble

  • Utpal Kumar Sikdar
  • Asif Ekbal (corresponding author)
  • Sriparna Saha
Focus

Abstract

In this paper, we propose multiobjective differential evolution (MODE)-based approaches to feature selection and ensemble learning for entity extraction in biomedical texts. The first step of the algorithm addresses automatic feature selection within a machine learning framework, namely conditional random fields (CRF). The final Pareto-optimal front obtained as the output of the feature selection module contains a set of solutions, each of which represents a particular feature representation, i.e., a CRF classifier trained with that feature subset. In the second step of our algorithm, we combine a subset of these classifiers using a MODE-based ensemble technique. Experiments on three benchmark datasets, namely GENIA, GENETAG and AIMed, yield F-measure values of 76.75, 94.15 and 91.91 %, respectively. Comparisons with existing systems show that the proposed algorithm performs on par with the state of the art. These results also indicate that our method is general in nature and therefore performs well across datasets from several domains. The key contribution of this work is the development of generalized MODE-based feature selection and ensemble learning techniques for extracting entities from biomedical texts of several domains.
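The first step rests on a MODE search over feature subsets followed by extraction of the non-dominated (Pareto-optimal) solutions. The Python snippet below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. Assumptions: each candidate is a real-valued vector in [0, 1] decoded into a feature mask by thresholding at 0.5; variation uses classic DE/rand/1/bin with F = 0.5 and CR = 0.9 (common textbook defaults, not values taken from the paper); survivor selection simply keeps successive Pareto fronts of parents plus trials (a simplified, crowding-free variant of NSGA-II-style sorting); and the hypothetical `toy_objectives` function with its made-up `USEFUL` weights stands in for training and scoring a CRF on each feature subset. The names `mode_feature_selection`, `to_mask`, `toy_objectives` and `USEFUL` are illustrative only.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = Tuple[bool, ...]
Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """Pareto dominance for maximization: a is no worse on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Scores]) -> List[int]:
    """Indices of the score vectors that no other vector dominates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


def to_mask(vec: Sequence[float]) -> Mask:
    """Decode a real-valued DE vector into a feature mask (gene > 0.5 means 'use feature')."""
    return tuple(v > 0.5 for v in vec)


def mode_feature_selection(n_features: int, evaluate: Callable[[Mask], Scores],
                           pop_size: int = 20, generations: int = 30,
                           F: float = 0.5, CR: float = 0.9,
                           seed: int = 0) -> List[Tuple[Mask, Scores]]:
    """Hypothetical MODE loop: DE/rand/1/bin variation plus Pareto-front survivor selection."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        trials = []
        for i in range(pop_size):
            # Mutation uses three distinct partners, all different from the target i.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(n_features)  # guarantees at least one mutated gene
            trial = []
            for j in range(n_features):
                if j == j_rand or rng.random() < CR:  # binomial crossover
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip back into [0, 1]
                else:
                    trial.append(pop[i][j])
            trials.append(trial)
        # Elitist selection: fill the next population front by front from parents + trials.
        merged = pop + trials
        scores = [evaluate(to_mask(v)) for v in merged]
        remaining = list(range(len(merged)))
        survivors: List[int] = []
        while len(survivors) < pop_size:
            front = [remaining[k] for k in pareto_front([scores[r] for r in remaining])]
            survivors.extend(front[:pop_size - len(survivors)])  # truncate last front by order
            remaining = [r for r in remaining if r not in front]
        pop = [merged[k] for k in survivors]
    final = [evaluate(to_mask(v)) for v in pop]
    return [(to_mask(pop[k]), final[k]) for k in pareto_front(final)]


# Hypothetical stand-in for "train a CRF with these features and score it on held-out data".
USEFUL = [0.9, 0.1, 0.7, 0.05, 0.6, 0.2, 0.8, 0.3]


def toy_objectives(mask: Mask) -> Scores:
    quality = sum(w for w, on in zip(USEFUL, mask) if on)
    return (quality, -float(sum(mask)))  # maximize quality, minimize number of features


if __name__ == "__main__":
    front = mode_feature_selection(len(USEFUL), toy_objectives, seed=1)
    for mask, (quality, neg_k) in sorted(front, key=lambda p: p[1][1]):
        print(int(-neg_k), round(quality, 2), [i for i, on in enumerate(mask) if on])
```

Each returned mask is one point on the final Pareto front; in the setting described above, each such mask would define one CRF classifier. The second (ensemble) step could plausibly reuse the same loop with a different encoding, for example one gene per candidate classifier, but that encoding is an assumption of this sketch.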

Keywords

Feature Selection, Differential Evolution, Pareto Front, Conditional Random Field, Classifier Ensemble

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Utpal Kumar Sikdar (1)
  • Asif Ekbal (1), corresponding author
  • Sriparna Saha (1)

  1. Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India
