Advertisement

Soft Computing

, Volume 23, Issue 24, pp 13105–13126 | Cite as

Feature–granularity selection with variable costs for hybrid data

  • Shujiao LiaoEmail author
  • Qingxin Zhu
  • Yuhua Qian
Methodologies and Application
  • 48 Downloads

Abstract

In recent years, cost-sensitive feature selection has drawn much attention. However, some issues still remain to be investigated. Particularly, most existing work deals with single-typed data, while only a few studies deal with hybrid data; moreover, both the test cost of a feature and the misclassification cost of an object are often assumed to be fixed, but in fact they are usually variable with the error range of the data, or equivalently the data granularity. In view of these facts, a feature–granularity selection approach is proposed to select the optimal feature subset and the optimal data granularity simultaneously to minimize the total cost for processing hybrid data. In the approach, firstly an adaptive neighborhood model is constructed, in which the neighborhood granules are generated adaptively according to the types of features. Then, multiple kinds of variable cost setting are discussed according to reality, and finally, an optimal feature–granularity selection algorithm is designed. Experimental results on sixteen UCI datasets show that a good trade-off among feature dimension reduction, data granularity selection and total cost minimization could be achieved by the proposed algorithm. In particular, the influences of different cost settings to the feature–granularity selection are also discussed thoroughly in the paper, which would provide some feasible schemes for decision making.

Keywords

Adaptive neighborhood Feature–granularity selection Hybrid data Measurement errors Variable costs 

Notes

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61603173 and 11871259, the Natural Science Foundation of Fujian Province, China, under Grant No. 2017J01771, the Institute of Meteorological Big Data-Digital Fujian and Fujian Key Laboratory of Data Science and Statistics.

Compliance with ethical standards

Funding

This study was funded by the National Natural Science Foundation of China under Grant Nos. 61603173 and 11871259, the Natural Science Foundation of Fujian Province, China, under Grant No. 2017J01771, the Institute of Meteorological Big Data-Digital Fujian and Fujian Key Laboratory of Data Science and Statistics.

Conflict of interest

Author Shujiao Liao declares that she has no conflict of interest. Author Qingxin Zhu declares that he has no conflict of interest. Author Yuhua Qian declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

References

  1. Ansorge S, Schmidt J (2015) Visualized episode mining with feature granularity selection. In: Industrial conference on data mining. Springer, Cham, pp 201–215Google Scholar
  2. Bian J, Peng XG, Wang Y, Zhang H (2016) An efficient cost-sensitive feature selection using chaos genetic algorithm for class imbalance problem. Math Probl Eng 2016:1–9Google Scholar
  3. Blake CL, Merz CJ (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/mlrepository.html
  4. Boussouf M, Quafafou M (2000) Scalable feature selection using rough set theory. In: Proceedings of rough sets and current trends in computing, vol. 2005. LNCS, pp 131–138Google Scholar
  5. Cao P, Zhao DZ, Zaiane O (2013) An optimized cost-sensitive SVM for imbalanced data learning. In: Advances in knowledge discovery and data mining, vol 7819. LNCS, pp 280–292Google Scholar
  6. Chai XY, Deng L, Yang Q, Ling CX (2004) Test-cost sensitive Naïve Bayes classification. In: Proceedings of the 5th international conference on data mining, pp 51–58Google Scholar
  7. Chen DG, Yang YY (2014) Attribute reduction for heterogeneous data based on the combination of classical and fuzzy rough set models. IEEE Trans Fuzzy Syst 22(5):1325–1334MathSciNetCrossRefGoogle Scholar
  8. Dai JH, Wang WT, Xu Q, Tian HW (2012) Uncertainty measurement for interval-valued decision systems based on extended conditional entropy. Knowl Based Syst 27:443–450CrossRefGoogle Scholar
  9. Dash M, Liu H (2003) Consistency-based search in feature selection. Artif Intell 151:155–176MathSciNetCrossRefGoogle Scholar
  10. Domingos P (1999) Metacost: a general method for making classifiers cost-sensitive. In: Proceedings of the 5th international conference on knowledge discovery and data mining, pp 155–164Google Scholar
  11. Doquire G, Verleysen M (2011) An hybrid approach to feature selection for mixed categorical and continuous data. In: Proceedings of the international conference on knowledge discovery and information retrieval, pp 394–401Google Scholar
  12. Du J, Cai ZH, Ling CX (2007) Cost-sensitive decision trees with pre-pruning. In: Proceedings of Canadian AI, No. 4509. LNAI, pp 171–179Google Scholar
  13. Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philos Trans R Soc Lond Ser A Contain Pap Math Phys Charact 222:309–368CrossRefGoogle Scholar
  14. Greiner R, Grove AJ, Roth D (2002) Learning cost-sensitive active classifiers. Artif Intell 139(2):137–174MathSciNetCrossRefGoogle Scholar
  15. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182zbMATHGoogle Scholar
  16. Hu QH, Yu DR, Liu JF, Wu CX (2008) Neighborhood rough set based heterogeneous feature subset selection. Inf Sci 178(18):3577–3594MathSciNetCrossRefGoogle Scholar
  17. Hu QH, Pedrycz W, Yu DR, Lang J (2010) Selecting discrete and continuous features based on neighborhood decision error minimization. IEEE Trans Syst Man Cybern Part B Cybern 40(1):137–150CrossRefGoogle Scholar
  18. Huang TY, Zhu W (2017) Cost-sensitive feature selection via manifold learning. J Shandong Univ 52(3):91–96zbMATHGoogle Scholar
  19. Iswandy K, Koenig A (2006) Feature selection with acquisition cost for optimizing sensor system design. Adv Radio Sci 4:135–141CrossRefGoogle Scholar
  20. Jia XY, Liao WH, Tang ZM, Shang L (2013) Minimum cost attribute reduction in decision-theoretic rough set models. Inf Sci 219:151–167MathSciNetCrossRefGoogle Scholar
  21. Kannan SS, Ramaraj N (2010) A novel hybrid feature selection via symmetrical uncertainty ranking based local memetic search algorithm. Knowl Based Syst 23:580–585CrossRefGoogle Scholar
  22. Liang JY, Wang F, Dang CY, Qian YH (2012) An efficient rough feature selection algorithm with a multi-granulation view. Int J Approx Reason 53:912–926MathSciNetCrossRefGoogle Scholar
  23. Liao SJ, Zhu QX, Min F (2014) Cost-sensitive attribute reduction in decision-theoretic rough set models. Math Probl Eng 2014:1–9MathSciNetzbMATHGoogle Scholar
  24. Liao SJ, Zhu QX, Liang R (2017) An efficient approach of test-cost-sensitive attribute reduction for numerical data. Int J Innov Comput Inf Control 13(6):2099–2111Google Scholar
  25. Liao SJ, Zhu QX, Qian YH, Lin GP (2018) Multi-granularity feature selection on cost-sensitive data with measurement errors and variable costs. Knowl Based Syst 158:25–42CrossRefGoogle Scholar
  26. Liu GL, Sai Y (2009) A comparison of two types of rough sets induced by coverings. Int J Approx Reason 50(3):521–528MathSciNetCrossRefGoogle Scholar
  27. Luo C, Li TR, Chen HM, Lu LX (2015) Fast algorithms for computing rough approximations in set-valued decision systems while updating criteria values. Inf Sci 299:221–242MathSciNetCrossRefGoogle Scholar
  28. Min F, He HP, Qian YH, Zhu W (2011) Test-cost-sensitive attribute reduction. Inf Sci 181:4928–4942CrossRefGoogle Scholar
  29. Min F, Hu QH, Zhu W (2014) Feature selection with test cost constraint. Int J Approx Reason 55:167–179MathSciNetCrossRefGoogle Scholar
  30. Pendharkar PC (2013) A maximum-margin genetic algorithm for misclassification cost minimizing feature selection problem. Expert Syst Appl 40(10):3918–3925CrossRefGoogle Scholar
  31. Shu WH, Shen H (2016) Multi-criteria feature selection on cost-sensitive data with missing values. Pattern Recogn 51:268–280CrossRefGoogle Scholar
  32. Turney PD (2000) Types of cost in inductive concept learning. In: Proceedings of the workshop on cost-sensitive learning at the 17th ICML, pp 1–7Google Scholar
  33. Wang T, Qin ZX, Jin Z, Zhang S (2010) Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning. J Syst Softw 83(7):1137–1147CrossRefGoogle Scholar
  34. Weiss Y, Elovici Y, Rokach L (2013) The cash algorithm-cost-sensitive attribute selection using histograms. Inf Sci 222:247–268MathSciNetCrossRefGoogle Scholar
  35. Yao YY (2004) A partition model of granular computing. Lect Notes Comput Sci 3100:232–253CrossRefGoogle Scholar
  36. Yao YY, Zhao Y (2008) Attribute reduction in decision-theoretic rough set models. Inf Sci 178(17):3356–3373MathSciNetCrossRefGoogle Scholar
  37. Yu SL, Zhao H (2018) Rough sets and Laplacian score based cost sensitive feature selection. PLoS ONE 13(6):1–23Google Scholar
  38. Zhang SC, Liu L, Zhu XF, Zhang C (2008) A strategy for attributes selection in cost-sensitive decision trees induction. In: IEEE 8th international conference on computer and information technology workshops, Sydney, QLD, pp 8–13Google Scholar
  39. Zhang Y, Gong DW, Cheng J (2017) Multi-objective particle swarm optimization approach for cost-based feature selection in classification. IEEE/ACM Trans Comput Biol Bioinform 14(1):64–75CrossRefGoogle Scholar
  40. Zhao H, Yu SL (2019) Cost-sensitive feature selection via the \(l_{2,1}\)-norm. Int J Approx Reason 104:25–37Google Scholar
  41. Zhao H, Zhu W (2014) Optimal cost-sensitive granularization based on rough sets for variable costs. Knowl Based Syst 65:72–82CrossRefGoogle Scholar
  42. Zhao H, Min F, Zhu W (2013) Cost-sensitive feature selection of numeric data with measurement errors. J Appl Math 2013:1–13zbMATHGoogle Scholar
  43. Zhou YH, Zhou ZH (2016) Large margin distirbution learning with cost interval and unlabeled data. IEEE Trans Knowl Data Eng 28(7):1749–1763CrossRefGoogle Scholar
  44. Zhou QF, Zhou H, Li T (2016) Cost-sensitive feature selection using random forest: selecting low-cost subsets of informative features. Knowl Based Syst 95:1–11CrossRefGoogle Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. 1.School of Mathematics and StatisticsMinnan Normal UniversityZhangzhouChina
  2. 2.School of Information and Software EngineeringUniversity of Electronic Science and Technology of ChinaChengduChina
  3. 3.Institute of Big Data Science and IndustryShanxi UniversityTaiyuanChina

Personalised recommendations