Similarity Aggregation: A New Version of Rank Aggregation Applied to a Credit Scoring Case

  • Waad Bouaguel
  • Ghazi Bel Mufti
  • Mohamed Limam
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8284)

Abstract

Credit scoring is one of the most challenging research topics and has been a source of many innovative works in the banking field. Choosing an appropriate set of features is one of the most interesting and difficult tasks, with a key effect on the performance of credit scoring models. Given the large number of feature selection techniques, and especially ranking techniques for feature selection, rank aggregation methods have become indispensable tools for fusing individual ranked lists into a single consensus list with better performance. However, in some cases the obtained rankings may be noisy or incomplete, which leads to an unsatisfactory final rank. We investigate this issue by proposing a similarity-based algorithm that extends two standard rank aggregation methods, namely majority vote and mean aggregation, using the similarity between the features in the dataset. Evaluations on four credit datasets show that feature subsets selected by the similarity-based aggregation technique give superior results to those selected by individual filters and by the standard aggregation techniques.
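For context, the two standard baselines the paper extends can be sketched as follows. This is an illustrative sketch only, not the paper's similarity-based algorithm; the feature names and filter rankings below are invented for demonstration.

```python
# Sketch of two standard rank aggregation baselines for feature selection:
# mean aggregation and majority vote. The rankings are hypothetical.
from collections import Counter

def mean_aggregation(ranked_lists):
    """Order features by their mean rank position across lists (lower is better)."""
    features = ranked_lists[0]
    mean_rank = {f: sum(rl.index(f) for rl in ranked_lists) / len(ranked_lists)
                 for f in features}
    return sorted(features, key=lambda f: mean_rank[f])

def majority_vote(ranked_lists, top_k):
    """Keep features that appear in the top-k of more than half the lists."""
    votes = Counter(f for rl in ranked_lists for f in rl[:top_k])
    quorum = len(ranked_lists) / 2
    return [f for f, v in votes.items() if v > quorum]

# Three hypothetical filter rankings (e.g. ReliefF, correlation, information gain)
lists = [
    ["income", "age", "debt", "history"],
    ["income", "debt", "age", "history"],
    ["debt", "income", "history", "age"],
]
print(mean_aggregation(lists))          # consensus ordering by mean rank
print(sorted(majority_vote(lists, 2)))  # features in the top-2 of most lists
```

The paper's contribution, per the abstract, is to weight such aggregations by feature similarity so that noisy or incomplete individual rankings degrade the consensus less.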

Keywords

Feature selection, filter, mutual information



Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Waad Bouaguel (1)
  • Ghazi Bel Mufti (2)
  • Mohamed Limam (1, 3)
  1. LARODEC, ISG, University of Tunis, Tunisia
  2. LARIME, ESSEC, University of Tunis, Tunisia
  3. Dhofar University, Oman
