Significant Pattern Mining with Confounding Variables

Conference paper

DOI: 10.1007/978-3-319-31753-3_23

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9651)
Cite this paper as:
Terada A., duVerle D., Tsuda K. (2016) Significant Pattern Mining with Confounding Variables. In: Bailey J., Khan L., Washio T., Dobbie G., Huang J., Wang R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham


Recent pattern mining algorithms such as LAMP allow us to compute the statistical significance of patterns with respect to an outcome variable. Their p-values are adjusted to control the family-wise error rate, which is the probability of at least one false discovery occurring. However, they are a poor fit for medical applications because they cannot handle potential confounding variables such as age or gender. We propose a novel pattern mining algorithm that evaluates statistical significance in the presence of confounding variables. Using a new testability bound based on the exact logistic regression model, the algorithm can exclude a large number of combinations without testing them, limiting the amount of correction required for multiple testing. Using synthetic data, we showed that our method could remove the bias introduced by confounding variables while still detecting true patterns correlated with the class. In addition, we demonstrated an application to data integration using a confounding variable.
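To make the idea of testability concrete, the following is a minimal sketch of a Tarone-style correction for the simpler setting without confounders, using the minimum achievable p-value of a one-sided Fisher's exact test. It is not the algorithm proposed in the paper, which relies on a bound derived from exact logistic regression. The function names, the example supports, and the default significance level are assumptions made for illustration, and pattern enumeration is left out entirely.

```python
from math import comb


def min_p_value(support, n_pos, n_total):
    """Lower bound on the one-sided Fisher exact p-value for a pattern.

    support: number of samples that contain the pattern
    n_pos:   number of samples in the positive class
    n_total: total number of samples
    """
    if support <= n_pos:
        # Most extreme table: every occurrence falls in the positive class.
        return comb(n_pos, support) / comb(n_total, support)
    # For support > n_pos, use a monotone lower bound (assumed here).
    return 1 / comb(n_total, n_pos)


def tarone_factor(supports, n_pos, n_total, alpha=0.05):
    """Smallest k such that at most k patterns are testable at level alpha / k."""
    min_ps = [min_p_value(s, n_pos, n_total) for s in supports]
    k = 1
    # A pattern is testable at level alpha / k only if its bound can reach it.
    while sum(p <= alpha / k for p in min_ps) > k:
        k += 1
    return k


# Hypothetical supports of four candidate patterns in 20 samples, 10 positive.
supports = [1, 2, 3, 8]
k = tarone_factor(supports, n_pos=10, n_total=20)
print("correction factor:", k, "instead of", len(supports))
```

In this toy input only the pattern with support 8 can ever reach significance, so the correction factor is smaller than the Bonferroni factor, which equals the number of candidate patterns.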


Significant pattern mining, Multiple testing, Exact logistic regression

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
  2. Research Fellow of Japan Society for the Promotion of Science, Kojimachi, Japan
  3. Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
  4. Center for Materials Research by Information Integration, National Institute for Materials Science, Ibaraki, Japan
