Pattern-Based Causal Feature Extraction

  • Diogo Moitinho de Almeida
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)


The cause-effect pairs challenge was motivated by the contrast between the high cost of the controlled experiments needed to determine causality and the abundance of observational data. Our goal was to produce, from observational data alone, a value representing our confidence of causality, which would help identify the most promising variables for experimental verification of their causal relationship. By identifying patterns in the functions that generate relevant features, we designed a feature extraction pipeline that creates large numbers of complex features with minimal human intervention. Using this pipeline, we finished second on the public leaderboard and first on the private leaderboard. By default, the process generates over 20,000 features. In this paper, we analyze which aspects of the pipeline are most important and build a new pipeline that achieves comparable performance with only 324 features.
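The abstract does not list the function families the pipeline uses, so the following is only a minimal sketch of the general idea of pattern-based feature generation, not the author's implementation. It assumes a pattern of the form "apply a unary transform to each variable, then a pairwise function in both directions", and every name in it (`UNARY`, `BINARY`, `roughness`, `extract_features`) is hypothetical. The point it illustrates is that a few small function families combine into many features without hand-writing each one.

```python
import itertools
import statistics


def standardize(xs):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs) or 1.0  # guard against constant input
    return [(x - mu) / sd for x in xs]


def rank(xs):
    """Replace each value by its position in sorted order."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for position, i in enumerate(order):
        ranks[i] = float(position)
    return ranks


def pearson(xs, ys):
    """Pearson correlation; symmetric in its arguments."""
    sx, sy = standardize(xs), standardize(ys)
    return statistics.fmean([x * y for x, y in zip(sx, sy)])


def roughness(xs, ys):
    """Mean absolute jump in ys when points are ordered by xs; asymmetric.

    Needs at least two points.
    """
    ordered = [y for _, y in sorted(zip(xs, ys))]
    return statistics.fmean([abs(b - a) for a, b in zip(ordered, ordered[1:])])


# Hypothetical function families; the real pipeline's families are not
# described in the abstract.
UNARY = {"identity": list, "standardize": standardize, "rank": rank}
BINARY = {"pearson": pearson, "roughness": roughness}


def extract_features(a, b):
    """Combine every (transform, transform, pairwise function) triple.

    Each triple yields three features: the A->B value, the B->A value,
    and their difference as a crude asymmetry signal.
    """
    features = {}
    combos = itertools.product(UNARY.items(), UNARY.items(), BINARY.items())
    for (ta_name, ta), (tb_name, tb), (f_name, f) in combos:
        xa, xb = ta(a), tb(b)
        forward = f(xa, xb)
        backward = f(xb, xa)
        key = f"{f_name}({ta_name}(A),{tb_name}(B))"
        features[key + "|A->B"] = forward
        features[key + "|B->A"] = backward
        features[key + "|diff"] = forward - backward
    return features


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = [1.0, 4.0, 9.0, 16.0, 25.0]
    feats = extract_features(a, b)
    print(len(feats), "features from 3 transforms and 2 pairwise functions")
```

In this sketch, 3 transforms per variable and 2 pairwise functions give 3 × 3 × 2 × 3 = 54 features; adding one more function to either family grows the feature set multiplicatively, which is the kind of scaling that makes feature counts in the thousands plausible.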


Feature extraction, Machine learning, Causality



Special thanks to the organizers of the ChaLearn Cause-Effect Pair Challenge hosted by Kaggle.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Diogo Moitinho de Almeida (1)
  1. Google, Menlo Park, USA
