Customer Activity Sequence Classification for Debt Prevention in Social Security

  • Huaifeng Zhang
  • Yanchang Zhao
  • Longbing Cao
  • Chengqi Zhang
  • Hans Bohlscheid
Regular Paper

Abstract

From a data mining perspective, sequence classification means building a classifier from frequent sequential patterns. However, mining the complete set of sequential patterns on a large dataset can be extremely time-consuming, and the large number of patterns discovered makes pattern selection and classifier building very time-consuming as well. In sequence classification, discovering discriminative patterns matters far more than discovering a complete pattern set. In this paper, we propose a novel hierarchical algorithm that builds sequential classifiers from discriminative sequential patterns. First, we mine the sequential patterns that are most strongly correlated with each target class; in this step, an aggressive strategy is employed to select a small set of sequential patterns. Second, pattern pruning and a serial coverage test are applied to the mined patterns. The patterns that pass the serial coverage test are used to build the sub-classifier at the first level of the final classifier. Third, the training samples that cannot be covered are fed back to the sequential pattern mining stage with updated parameters. This process continues until the predefined interestingness measure thresholds are reached or all samples are covered. The patterns generated in each loop form the sub-classifier at the corresponding level of the final classifier. Within this framework, the search space can be reduced dramatically while good classification performance is still achieved. The proposed algorithm is tested in a real-world business application for debt prevention in the social security area. The results show that it is effective and efficient at predicting debt occurrences from customer activity sequence data.
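
The loop described in the abstract (mine, coverage-test, feed back the uncovered samples) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the interestingness measure or the parameter-update rule, so the sketch assumes plain support and confidence as the measures and a simple threshold decay. All function names, parameters, and the toy activity codes (`A`, `B`, `C`, `D`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def mine_patterns(samples, min_support, min_confidence, max_len=2):
    """Brute-force stand-in for a sequential pattern miner (assumption).

    Returns (pattern, label, confidence, support) rules, best first.
    """
    candidates = set()
    for seq, _ in samples:
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                candidates.add(tuple(seq[i] for i in idx))
    rules = []
    for pat in candidates:
        hits = Counter(lab for seq, lab in samples if is_subsequence(pat, seq))
        label, count = hits.most_common(1)[0]
        support = count / len(samples)          # assumed measure
        confidence = count / sum(hits.values())  # assumed measure
        if support >= min_support and confidence >= min_confidence:
            rules.append((pat, label, confidence, support))
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0]), r[0]))
    return rules


def serial_coverage(rules, samples):
    """Keep a rule only if it correctly covers a still-uncovered sample."""
    kept, remaining = [], list(samples)
    for pat, label, _, _ in rules:
        if any(is_subsequence(pat, s) and lab == label for s, lab in remaining):
            kept.append((pat, label))
            remaining = [(s, lab) for s, lab in remaining
                         if not is_subsequence(pat, s)]
    return kept, remaining


def build_hierarchical_classifier(samples, min_support=0.5, min_confidence=1.0,
                                  support_floor=0.05, confidence_floor=0.5):
    """One sub-classifier per loop; uncovered samples go to the next loop."""
    levels, remaining = [], list(samples)
    while remaining and min_support >= support_floor:
        rules = mine_patterns(remaining, min_support, min_confidence)
        kept, remaining = serial_coverage(rules, remaining)
        if kept:
            levels.append(kept)
        # Hypothetical update rule: relax both thresholds for the next level.
        min_support *= 0.5
        min_confidence = max(confidence_floor, min_confidence - 0.1)
    return levels


def predict(levels, sequence, default=None):
    """Return the label of the first matching pattern, level by level."""
    for level in levels:
        for pat, label in level:
            if is_subsequence(pat, sequence):
                return label
    return default


# Toy training data with hypothetical activity codes.
samples = [
    (["A", "B", "C"], "debt"),
    (["A", "C"], "debt"),
    (["C", "D"], "debt"),
    (["B", "D"], "no_debt"),
    (["D", "B"], "no_debt"),
]
levels = build_hierarchical_classifier(samples)
```

On this toy data the strict first pass keeps only one pattern; the two samples it leaves uncovered are mined again with relaxed thresholds and yield a second-level sub-classifier. Calling `predict(levels, ["A", "C"])` walks the levels in order and returns the label of the first pattern contained in the sequence.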

Keywords

sequential pattern mining; sequence classification; coverage test; interestingness measure

Copyright information

© Springer 2009

Authors and Affiliations

  • Huaifeng Zhang (1)
  • Yanchang Zhao (2)
  • Longbing Cao (2)
  • Chengqi Zhang (2)
  • Hans Bohlscheid (1)
  1. Payment Reviews Branch, Business Integrity Division, Canberra, Australia
  2. Centre for Quantum Computation and Intelligent Systems (QCIS), University of Technology, Sydney, Australia
