Training Gradient Boosting Machines Using Curve-Fitting and Information-Theoretic Features for Causal Direction Detection

Samothrakis, Spyridon; Perez, Diego; Lucas, Simon

doi:10.1007/978-3-030-21810-2_11

Part of the book series: The Springer Series on Challenges in Machine Learning (SSCML)

Abstract

Detecting causal relationships between random variables using only matched pairs of noisy observations is a crucial problem in many scientific fields. In this paper we address the problem by extracting, for each matched pair, a number of curve-fitting and information-theoretic features. Using these features, we train a pair of Gradient Boosting Machines whose hyperparameters we optimise using Stochastic Simultaneous Optimistic Optimisation (StoSOO). Our method was relatively successful, taking third place in Kaggle's 2013 Causality Challenge. It is sound enough to be used in causality detection (or as part of a more comprehensive toolkit), although we believe the quality of the results could be improved considerably by adding more features in the same vein.

Acknowledgment

This work was supported by EPSRC grant EP/H048588/1 entitled: “UCT for Games and Beyond”.

Author information

Authors and Affiliations

University of Essex, Wivenhoe Park, Colchester, Essex, UK
Spyridon Samothrakis, Diego Perez & Simon Lucas
School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
Simon Lucas

Corresponding author

Correspondence to Spyridon Samothrakis.

Editor information

Editors and Affiliations

Team TAU - CNRS, INRIA, Université Paris Sud, Université Paris Saclay, Orsay, France; ChaLearn, Berkeley, CA, USA
Isabelle Guyon
SoFi, San Francisco, CA, USA
Alexander Statnikov
University of Paris-Sud, Paris-Saclay, Paris, France
Berna Bakir Batu

Appendix: Causality Challenge

Title: Training Gradient Boosting Machines using Curve-fitting and Information theoretic features for Causal Direction Detection.

Participant name, address, email and website: Spyridon Samothrakis, Diego Perez, https://github.com/ssamot/causality.

Task(s) solved: Kaggle Competition.

Reference: This paper.

Method: A combination of feature extraction from the sample data, Gradient Boosting Machines and StoSOO meta-optimisation.

Preprocessing: Exploit Symmetries.
Causal discovery: Gradient Boosting Machine, Curve fitting/Information theoretic features.
Feature selection: Feature Ranking.
Classification: Gradient Boosting Machine.
Model selection/hyperparameter selection: Cross-validation, Stochastic Simultaneous Optimistic Optimisation.
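
The feature extraction and symmetry steps listed above can be illustrated with a minimal sketch. It is not the authors' code (that is in the repository linked in this appendix), and everything in it is an assumption made for illustration: the function names, the binned-mean curve fit standing in for whichever curve families were actually fitted, the equal-width discretisation used for the entropy estimate, and the label convention (+1 for A causes B, -1 for B causes A, 0 otherwise). The resulting feature vectors would then be passed to a gradient boosting classifier.

```python
import math
import random
from collections import Counter, defaultdict


def standardise(v):
    """Rescale a sample to zero mean and unit variance."""
    mean = sum(v) / len(v)
    sd = math.sqrt(sum((t - mean) ** 2 for t in v) / len(v)) or 1.0
    return [(t - mean) / sd for t in v]


def bin_index(v, bins):
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(v), max(v)
    width = (hi - lo) / bins or 1.0
    return [min(int((t - lo) / width), bins - 1) for t in v]


def fit_residual(x, y, bins=8):
    """Mean squared residual of a crude binned-mean curve fit of y on x."""
    idx = bin_index(x, bins)
    sums, counts = defaultdict(float), defaultdict(int)
    for i, t in zip(idx, y):
        sums[i] += t
        counts[i] += 1
    return sum((t - sums[i] / counts[i]) ** 2 for i, t in zip(idx, y)) / len(y)


def conditional_entropy(x, y, bins=8):
    """Discretised H(Y | X) in nats, computed as H(X, Y) - H(X)."""
    ix, iy = bin_index(x, bins), bin_index(y, bins)
    n = len(x)

    def entropy(counter):
        return -sum(c / n * math.log(c / n) for c in counter.values())

    return entropy(Counter(zip(ix, iy))) - entropy(Counter(ix))


def pair_features(a, b):
    """Two asymmetry features for a matched pair (hypothetical feature set).

    A negative first feature means B is fitted more easily from A
    than A is from B.
    """
    a, b = standardise(a), standardise(b)
    return [
        fit_residual(a, b) - fit_residual(b, a),
        conditional_entropy(a, b) - conditional_entropy(b, a),
    ]


def augment_with_swaps(pairs, labels):
    """Exploit symmetry: (B, A) is a valid example with the negated label."""
    out_pairs, out_labels = [], []
    for (a, b), label in zip(pairs, labels):
        out_pairs += [(a, b), (b, a)]
        out_labels += [label, -label]
    return out_pairs, out_labels


# Synthetic pair in which A drives B through a non-invertible function.
random.seed(0)
a = [random.uniform(-1, 1) for _ in range(500)]
b = [t * t + random.gauss(0, 0.05) for t in a]
features = pair_features(a, b)
pairs, labels = augment_with_swaps([(a, b)], [1])
```

Because both features are differences between the two directions, swapping the pair simply flips their signs, which is what makes the swapped examples consistent with the negated labels.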

Results (Table 11.1):

Table 11.1 Result table

Quantitative advantages: The method and the ideas behind it are relatively simple. We advocate a feature extraction strategy based on curve-fitting and information-theoretic features.
Qualitative advantages: There are some elements of novelty, mostly in the ideas behind the feature extraction and the hyperparameter optimisation.

Code and installation instructions can be found here: https://github.com/ssamot/causality.

About this chapter

Cite this chapter

Samothrakis, S., Perez, D., Lucas, S. (2019). Training Gradient Boosting Machines Using Curve-Fitting and Information-Theoretic Features for Causal Direction Detection. In: Guyon, I., Statnikov, A., Batu, B. (eds) Cause Effect Pairs in Machine Learning. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-21810-2_11

DOI: https://doi.org/10.1007/978-3-030-21810-2_11
Published: 23 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21809-6
Online ISBN: 978-3-030-21810-2
eBook Packages: Computer Science, Computer Science (R0)
