Versatile Decision Trees for Learning Over Multiple Contexts

  • Reem Al-Otaibi
  • Ricardo B. C. Prudêncio
  • Meelis Kull
  • Peter Flach
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9284)

Abstract

Discriminative models for classification assume that training and deployment data are drawn from the same distribution. The performance of these models can vary significantly when they are learned and deployed in different contexts with different data distributions. In the literature, this phenomenon is called dataset shift. In this paper, we address several important issues in the dataset shift problem. First, how can we automatically detect a significant difference between training and deployment data in order to take action or adjust the model appropriately? Second, different kinds of shift can occur in real applications (e.g., linear and non-linear), which may require diverse solutions. Third, how should we combine the original model learned on the training data with other models to achieve better performance? This work offers two main contributions towards these issues. We propose a Versatile Model that is rich enough to handle different kinds of shift without making strong assumptions such as linearity, and that furthermore does not require labelled data to identify the data shift at deployment. Empirical results on both synthetic and real-world dataset shift show strong performance gains achieved by the proposed model.
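One way to realise label-free shift detection of the kind described above is a two-sample Kolmogorov-Smirnov test comparing each input feature's training and deployment values, as suggested by the keywords below. The following is a minimal sketch of that general idea, not the paper's exact procedure; the synthetic data, variable names and significance threshold alpha are assumptions for illustration only.

    # Illustrative sketch: per-feature covariate-shift detection with a
    # two-sample Kolmogorov-Smirnov test; no deployment labels are needed.
    # Data, names and the threshold `alpha` are hypothetical.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)   # feature values in the training context
    deploy_feature = rng.normal(loc=0.5, scale=1.0, size=1000)  # same feature, shifted at deployment

    statistic, p_value = ks_2samp(train_feature, deploy_feature)

    alpha = 0.05  # significance level for declaring a shift (assumed)
    if p_value < alpha:
        print(f"Shift detected (KS statistic={statistic:.3f}, p={p_value:.3g}); adapt the model.")
    else:
        print("No significant shift detected; keep the original model.")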

Keywords

Versatile model · Decision trees · Dataset shift · Percentile · Kolmogorov-Smirnov test

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Reem Al-Otaibi (1, 2)
  • Ricardo B. C. Prudêncio (3)
  • Meelis Kull (1)
  • Peter Flach (1)
  1. Intelligent Systems Laboratory, Computer Science, University of Bristol, Bristol, UK
  2. King Abdulaziz University, Jeddah, Saudi Arabia
  3. Informatics Center, Universidade Federal de Pernambuco, Recife, Brazil
