Statistically Significant Discriminative Patterns Searching

  • Hoang Son Pham (email author)
  • Gwendal Virlet
  • Dominique Lavenier
  • Alexandre Termier
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11708)

Abstract

In this paper, we propose a novel algorithm, named SSDPS, to discover statistically significant discriminative patterns in two-class datasets. The SSDPS algorithm owes its efficiency to an original pattern enumeration strategy, which exploits some degree of anti-monotonicity in the measures of discriminance and statistical significance. Experimental results show that SSDPS outperforms the other algorithms tested. In addition, it generates far fewer patterns than they do. An experiment on real data also shows that SSDPS efficiently detects combinations of multiple SNPs in genetic data.
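
The abstract does not say which measures of discriminance and statistical significance are used, so the following is only an illustrative sketch, not the SSDPS algorithm itself. It assumes the odds ratio as the discriminative measure and a Wald confidence interval as the significance check, with 0.5 added to every cell of the contingency table so that empty cells do not cause a division by zero. The function names and the toy data are hypothetical.

```python
import math

def class_supports(pattern, positives, negatives):
    """Count, per class, the transactions that contain every item of `pattern`."""
    p = set(pattern)
    pos_support = sum(1 for t in positives if p <= set(t))
    neg_support = sum(1 for t in negatives if p <= set(t))
    return pos_support, neg_support

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a = positives containing the pattern, b = positives not containing it,
    c = negatives containing the pattern, d = negatives not containing it.
    Assumption: 0.5 is added to every cell to avoid zero counts.
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * std_err)
    upper = math.exp(math.log(odds_ratio) + z * std_err)
    return odds_ratio, lower, upper

# Toy two-class dataset (hypothetical): each transaction is a set of items.
positives = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"b"}]
negatives = [{"a"}, {"c"}, {"b", "c"}, {"a", "c"}]

pattern = {"a", "b"}
pos_sup, neg_sup = class_supports(pattern, positives, negatives)
odds_ratio, lower, upper = odds_ratio_with_ci(
    pos_sup, len(positives) - pos_sup,
    neg_sup, len(negatives) - neg_sup,
)
# Under this sketch, a pattern would count as significant only if lower > 1.
print(pattern, pos_sup, neg_sup, odds_ratio, lower, upper)
```

Class-wise support is anti-monotone: adding items to a pattern can never increase its support in either class. This is the kind of property an enumeration strategy can use to prune candidate patterns, although measures such as the odds ratio are not anti-monotone in general.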

Keywords

Discriminative patterns, Discriminative measures, Statistical significance, Anti-monotonicity

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Hoang Son Pham (1), email author
  • Gwendal Virlet (2)
  • Dominique Lavenier (2)
  • Alexandre Termier (2)

  1. ICTEAM, UCLouvain, Louvain-la-Neuve, Belgium
  2. Univ Rennes, Inria, CNRS, IRISA, Rennes, France
