Soft Computing, Volume 20, Issue 11, pp 4575–4588

Multiple-cause discovery combined with structure learning for high-dimensional discrete data and application to stock prediction

  • Weiqi Chen
  • Zhifeng Hao
  • Ruichu Cai
  • Xiangzhou Zhang
  • Yong Hu
  • Mei Liu
Methodologies and Application

DOI: 10.1007/s00500-015-1764-8

Cite this article as:
Chen, W., Hao, Z., Cai, R. et al. Soft Comput (2016) 20: 4575. doi:10.1007/s00500-015-1764-8

Abstract

Causal discovery from observational data is crucial to a wide range of scientific and business research. Although many causal discovery algorithms have been proposed in recent decades, none of them is effective enough in dealing with high-dimensional discrete data. The main challenge is the complex interactions among a large number of variables, which lead to numerous spurious causal relationships being found. In this work, we propose a novel multiple-cause discovery method combined with structure learning (McDSL) to eliminate these spurious causalities. The method is carried out in two phases. In the first phase, conditional independence tests are used to distinguish direct causal candidates from indirect ones. In the second phase, the causal direction of each multi-cause structure is determined with a hybrid causal discovery method. Validation experiments on synthetic data showed that McDSL is reliable in discovering multi-cause structures and eliminating indirect causes. We then applied the algorithm to discover multiple causes of stock return based on 13 years of historical financial data from the Shanghai Stock Exchange of China, and established a stock prediction model. Experimental results showed that the causes discovered by McDSL revealed changes in the key risk factors of the stock market over the 13 years, which indicates that investors should adapt their investment strategies over time. Moreover, the causes discovered by McDSL predict stock return better than the features selected by other common filter-based feature selection algorithms.
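
The idea behind the first phase, separating direct causal candidates from indirect ones by conditioning, can be illustrated with a small sketch. This is not the McDSL implementation: the abstract does not say which conditional independence test is used, so the sketch substitutes empirical conditional mutual information (the G² statistic equals 2N times this quantity when measured in nats). The function name `conditional_mutual_information` and the toy counts are hypothetical and chosen only for illustration. The data follow a chain X → Z → Y, so X and Y are associated marginally but become independent once Z is given.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X; Y | Z) in nats for equally long discrete sequences."""
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), c in n_xyz.items():
        # p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ], written with counts
        cmi += (c / n) * log((c * n_z[z]) / (n_xz[(x, z)] * n_yz[(y, z)]))
    return cmi


# Hypothetical toy counts for (x, y, z): within each stratum of Z,
# X and Y are independent, but Z shifts both of them (chain X -> Z -> Y).
toy_counts = {
    (0, 0, 0): 9, (0, 1, 0): 3, (1, 0, 0): 3, (1, 1, 0): 1,
    (1, 1, 1): 9, (1, 0, 1): 3, (0, 1, 1): 3, (0, 0, 1): 1,
}
rows = [triple for triple, c in toy_counts.items() for _ in range(c)]
xs, ys, zs = zip(*rows)

# Marginal dependence: condition on a constant, i.e. on nothing.
marginal = conditional_mutual_information(xs, ys, [0] * len(xs))
# Conditional dependence given Z.
conditional = conditional_mutual_information(xs, ys, zs)

print(f"I(X;Y)   = {marginal:.4f}")
print(f"I(X;Y|Z) = {conditional:.4f}")
```

In this toy case the marginal value is clearly positive while the conditional value vanishes, which is the pattern that marks X as an indirect rather than a direct cause candidate of Y. With real, finite samples the conditional value is never exactly zero, so a significance test with a chosen threshold would be needed in its place.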

Keywords

  • Causal discovery
  • High-dimensional discrete data
  • Structure learning
  • Additive noise model
  • Stock prediction

Funding information

Funder Name: Grant Number
  • National Natural Science Foundation of China: 71271061
  • National Natural Science Foundation of China: 70801020
  • Science and Technology Planning Project of Guangdong Province, China: 2010B010600034
  • Science and Technology Planning Project of Guangdong Province, China: 2012B091100192
  • Business Intelligence Key Team of Guangdong University of Foreign Studies: TD1202

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Weiqi Chen
    • 1
  • Zhifeng Hao
    • 2
  • Ruichu Cai
    • 2
  • Xiangzhou Zhang
    • 3
    • 4
  • Yong Hu
    • 3
    • 4
  • Mei Liu
    • 4
    • 5
  1. Faculty of Automation, Guangdong University of Technology, Guangzhou, China
  2. Department of Computer Science, Guangdong University of Technology, Guangzhou, China
  3. School of Business, Sun Yat-sen University, Guangzhou, China
  4. Big Data Decision Institute, Jinan University, Guangzhou, China
  5. University of Kansas Medical Center, Kansas City, USA
