PAKDD Data Mining Competition 2009: New Ways of Using Known Methods

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5669)

Abstract

The PAKDD 2009 competition focused on credit risk assessment. The task required a credit-scoring model that is robust against the performance degradation caused by gradual market changes over a few years of business operation. We used the following standard models: logistic regression, KNN, SVM, GBM and decision trees. The novelty of our approach is twofold: first, integrating existing models, namely feeding the results of KNN as an input variable to the logistic regression; second, re-coding categorical variables as numerical values that represent each category’s statistical impact on the target label. Our best solution placed 3rd in the competition, with an AUC score of 0.655.
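The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy rows, the category names, the smoothed-mean encoding with its `prior_weight` parameter, the choice of k = 3, and the plain gradient-descent logistic regression are all assumptions made for illustration. The abstract does not say how the "statistical impact" of a category was computed, so a smoothed target rate stands in for it here.

```python
import math
from collections import defaultdict

# Hypothetical toy rows: (category, numeric feature 1, numeric feature 2, label).
ROWS = [
    ("retail", 0.20, 0.10, 0), ("retail", 0.30, 0.20, 0), ("retail", 0.25, 0.15, 0),
    ("online", 0.80, 0.90, 1), ("online", 0.70, 0.80, 1), ("online", 0.90, 0.70, 1),
    ("phone", 0.50, 0.50, 0), ("phone", 0.60, 0.60, 1),
]

def target_encode(categories, labels, prior_weight=2.0):
    """Map each category to its target rate, shrunk toward the global rate.

    The smoothing scheme is an assumption, not taken from the paper.
    """
    global_rate = sum(labels) / len(labels)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, labels):
        sums[cat] += y
        counts[cat] += 1
    return {c: (sums[c] + prior_weight * global_rate) / (counts[c] + prior_weight)
            for c in counts}

def knn_score(points, labels, query, k=3, skip=None):
    """Fraction of positive labels among the k nearest points (Euclidean)."""
    dists = [(math.dist(p, query), y)
             for i, (p, y) in enumerate(zip(points, labels)) if i != skip]
    dists.sort(key=lambda pair: pair[0])
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

categories = [r[0] for r in ROWS]
points = [(r[1], r[2]) for r in ROWS]
labels = [r[3] for r in ROWS]

# Idea 2: re-code the categorical variable as a number tied to the target.
mapping = target_encode(categories, labels)
# Idea 1: the KNN result (leave-one-out here) becomes an input variable.
knn_feature = [knn_score(points, labels, p, k=3, skip=i)
               for i, p in enumerate(points)]

# The logistic regression sees the encoded category and the KNN score.
X = [[mapping[c], s] for c, s in zip(categories, knn_feature)]
w, b = fit_logistic(X, labels)
probs = [sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) for row in X]

print(mapping)
print(knn_feature)
print([round(p, 3) for p in probs])
```

In this sketch the category encoding is computed on the same rows used for fitting, which leaks the target into the features. On real data the encoding and the KNN score would normally be computed out-of-fold or on a separate split.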

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Linhart, C., Harari, G., Abramovich, S., Buchris, A. (2010). PAKDD Data Mining Competition 2009: New Ways of Using Known Methods. In: Theeramunkong, T., et al. New Frontiers in Applied Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14640-4_7

  • DOI: https://doi.org/10.1007/978-3-642-14640-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14639-8

  • Online ISBN: 978-3-642-14640-4

  • eBook Packages: Computer Science, Computer Science (R0)
