Chance influence in datasets with a large number of features

  • Abdel Aziz Taha
  • Alexandros Bampoulidis
  • Mihai Lupu
Conference paper

Abstract

Machine learning research, e.g. genomics research, is often based on sparse datasets that have very large numbers of features but small sample sizes. Such a configuration promotes the influence of chance on the learning process as well as on the evaluation. Prior research has underlined the problem of generalization of models obtained from such data. In this paper, we investigate in depth the influence of chance on classification and regression. We empirically show how considerable the influence of chance on such datasets is, which calls the conclusions drawn from them into question. We relate the observations of chance correlation to the problem of method generalization. Finally, we provide a discussion of chance correlation and guidelines that mitigate the influence of chance.
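
The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental setup; it is a minimal illustration, assuming standard normal features and a standard normal target that are independent by construction. The function names `pearson` and `max_chance_correlation`, the sample size of 20, and the feature counts are hypothetical choices made for this example. It reports the largest absolute Pearson correlation found between the random target and any of the random features.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def max_chance_correlation(n_samples, n_features, seed=0):
    """Largest |r| between a random target and purely random features.

    Assumption for illustration: features and target are independent
    standard normal draws, so any correlation found is due to chance.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    # Same small sample size, growing number of unrelated features.
    for n_features in (10, 100, 5000):
        r = max_chance_correlation(20, n_features)
        print(f"features={n_features:5d}  max |r| = {r:.3f}")
```

Because the features are drawn in sequence from one seeded generator, a run with more features contains all the features of a smaller run, so the reported maximum can only grow as features are added. With only 20 samples, a few thousand unrelated features are typically enough to produce at least one that looks strongly correlated with the target.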

Index Terms

Chance correlation, Generalization, Reproducibility, Sparse data, Genomics

Copyright information

© Springer Fachmedien Wiesbaden GmbH, ein Teil von Springer Nature 2019

Authors and Affiliations

  • Abdel Aziz Taha
    • 1
  • Alexandros Bampoulidis
    • 1
  • Mihai Lupu
    • 1
  1. Research Studios Austria, Vienna, Austria