A New Hybrid Model of Feature Selection for Imbalanced Data
Customer identification is vital to an enterprise's core competitiveness. This paper focuses on feature selection for customer identification under class imbalance and proposes a hybrid method. We improve the Tomek Links data-cleaning technique to obtain a new model called I-Tomlinks. After preprocessing the data with I-Tomlinks, we combine the group method of data handling (GMDH) with transfer learning to construct a new feature selection model that addresses class imbalance. Experiments show that the new method gives better predictive performance than the methods used as benchmarks. The new model provides a new tool for customer identification.
Keywords: Customer identification; Feature selection; Transfer learning; I-Tomlinks; GMDH
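
For readers unfamiliar with the data-cleaning step that the abstract builds on, the following is a minimal sketch of the classic Tomek Links procedure (Tomek, 1976), not the I-Tomlinks variant, whose details are not given here. Two samples form a Tomek link when they belong to different classes and each is the other's nearest neighbor. The function names `tomek_links` and `remove_majority_in_links`, the Euclidean distance, and the choice to drop only the majority-class member of each link are assumptions made for illustration.

```python
import numpy as np

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A pair is a Tomek link when the two samples are mutual nearest
    neighbors (Euclidean distance assumed) and carry different labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # A sample must not be selected as its own nearest neighbor.
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    links = []
    for i, j in enumerate(nearest):
        if i < j and nearest[j] == i and y[i] != y[j]:
            links.append((i, int(j)))
    return links

def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class member of every Tomek link (an assumed policy)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    drop = set()
    for i, j in tomek_links(X, y):
        drop.add(i if y[i] == majority_label else j)
    keep = np.array([k not in drop for k in range(len(y))], dtype=bool)
    return X[keep], y[keep]

# Toy data: class 0 is the majority, class 1 the minority.
X_demo = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0], [10.0, 10.0]]
y_demo = [0, 1, 0, 0, 1]
links_demo = tomek_links(X_demo, y_demo)
X_clean, y_clean = remove_majority_in_links(X_demo, y_demo, majority_label=0)
```

In the toy data, samples 0 and 1 are mutual nearest neighbors with different labels, so they form the only link, and the majority-class sample 0 is removed. The pairwise distance matrix makes this sketch quadratic in the number of samples, so it is suitable only for small data sets.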