
Discernible neighborhood counting based incremental feature selection for heterogeneous data

  • Yanyan Yang (email author)
  • Shiji Song
  • Degang Chen
  • Xiao Zhang
Original Article

Abstract

Incremental feature selection updates a subset of informative features as new samples arrive, without discarding previously learned knowledge. However, most existing incremental feature selection algorithms have no explicit mechanism for handling heterogeneous data that mix symbolic and real-valued features. This paper therefore presents an incremental feature selection method for heterogeneous data in which samples arrive sequentially in groups. Discernible neighborhood counting, a measure that accommodates different types of features, is first introduced to establish a framework for feature selection from heterogeneous data. As new samples arrive, the discernible neighborhood counting of a feature subset is then updated, which yields the incremental feature selection scheme. This scheme determines the criterion for efficiently adding informative features and deleting redundant ones. Based on the incremental scheme, an incremental feature selection algorithm is formulated to select valuable features from heterogeneous data. Finally, extensive experiments demonstrate the effectiveness and efficiency of the proposed algorithm.
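
The abstract does not give the formal definition of discernible neighborhood counting, so the following is only an illustrative sketch of the general idea, not the authors' measure or algorithm. It assumes a simplified stand-in: a pair of differently labelled samples counts as discernible by a feature subset if some symbolic feature differs, or some real-valued feature differs by more than a threshold `delta`. The function names, the `delta` parameter, and the toy data are all hypothetical. The sketch shows how such a pair count can be updated when a group of samples arrives without revisiting pairs among the old samples.

```python
from itertools import combinations

def discernible(x, y, features, numeric, delta=0.1):
    """True if at least one feature in `features` tells samples x and y apart.

    Assumption (not from the paper): symbolic features discern by inequality,
    real-valued features discern when they differ by more than `delta`.
    """
    for f in features:
        if f in numeric:
            if abs(x[f] - y[f]) > delta:
                return True
        elif x[f] != y[f]:
            return True
    return False

def count_pairs(samples, labels, features, numeric, delta=0.1):
    """Count differently labelled sample pairs that `features` can discern."""
    return sum(
        1
        for i, j in combinations(range(len(samples)), 2)
        if labels[i] != labels[j]
        and discernible(samples[i], samples[j], features, numeric, delta)
    )

def update_count(old_count, old, old_labels, new, new_labels,
                 features, numeric, delta=0.1):
    """Update the count after a new group arrives.

    Only old-new pairs and new-new pairs are examined; the contribution of
    old-old pairs is reused from `old_count`.
    """
    cross = sum(
        1
        for x, lx in zip(old, old_labels)
        for y, ly in zip(new, new_labels)
        if lx != ly and discernible(x, y, features, numeric, delta)
    )
    return old_count + cross + count_pairs(new, new_labels, features, numeric, delta)

# Toy heterogeneous data: feature 0 is symbolic, feature 1 is real-valued.
numeric = {1}
features = [0, 1]
old = [("a", 0.10), ("a", 0.90), ("b", 0.15)]
old_labels = [0, 1, 1]
new = [("b", 0.80), ("a", 0.12)]
new_labels = [0, 1]

old_count = count_pairs(old, old_labels, features, numeric)
updated = update_count(old_count, old, old_labels, new, new_labels, features, numeric)
```

Under these assumptions the incremental result agrees with recomputing the count from scratch on the combined data, while skipping the pairs that were already counted. A greedy selector could then compare such counts across candidate feature subsets, but the actual adding and deleting criterion is defined in the paper itself.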

Keywords

Incremental feature selection; Feature selection; Neighborhood rough set; Heterogeneous data

Notes

Acknowledgements

The paper is supported by the National Key R&D Program of China under Grant no. 2016YFB1200203, the National Natural Science Foundation of China under Grant nos. 61806108, 71471060 and 61602372, and the Project funded by China Postdoctoral Science Foundation under Grant no. 2018M631475.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Yanyan Yang (1), email author
  • Shiji Song (1)
  • Degang Chen (2)
  • Xiao Zhang (3)
  1. Department of Automation, Tsinghua University, Beijing, People’s Republic of China
  2. Department of Mathematics and Physics, North China Electric Power University, Beijing, People’s Republic of China
  3. Department of Applied Mathematics, School of Sciences, Xi’an University of Technology, Xi’an, People’s Republic of China
