World Wide Web, Volume 19, Issue 4, pp 633–651

An effective contrast sequential pattern mining approach to taxpayer behavior analysis

  • Zhigang Zheng
  • Wei Wei
  • Chunming Liu
  • Wei Cao
  • Longbing Cao
  • Maninder Bhatia


Data mining for client behavior analysis has become increasingly important in business. Further analysis of transactions and sequential behaviors would be of even greater value, especially in financial services such as banking and insurance, and in government. In a real-world business application of taxation debt collection, understanding the relationship between taxpayers’ sequential behaviors (payments, lodgments and actions) and their debt compliance requires finding the contrast sequential behavior patterns that distinguish compliant from non-compliant taxpayers. Contrast Patterns (CP) are defined as itemsets that show the difference, or discrimination, between two classes or datasets (Dong and Li, 1999). However, existing CP mining methods can only mine itemset patterns and are not suitable for mining sequential patterns, such as the time-ordered transactions in taxpayer sequential behaviors. Little work has been conducted on Contrast Sequential Pattern (CSP) mining so far. To address this issue, we develop a CSP mining approach, eCSP, which uses an effective CSP-tree structure that improves the PrefixSpan tree (Pei et al., 2001) for mining contrast patterns. We propose heuristics and interestingness filtering criteria and integrate them into the CSP-tree to reduce the search space and to find patterns of business interest. The performance of the proposed approach is evaluated on three real-world datasets. In addition, a case study shows how to apply the approach to analyse taxpayer behavior. The results show a very promising performance and convincing business value.
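The difference between an itemset pattern and a sequential pattern can be made concrete with a small sketch. The Python below is not the eCSP algorithm or the CSP-tree; it is a brute-force illustration that assumes a pattern's contrast is measured by the ratio of its supports in the two classes, a growth-rate measure borrowed from the emerging-pattern literature. The event names (`lodge`, `pay`, `reminder`), the toy sequences and the function names are all hypothetical.

```python
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], dataset: List[Sequence[str]]) -> float:
    """Fraction of sequences in `dataset` that contain `pattern`."""
    if not dataset:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in dataset) / len(dataset)


def contrast_rate(pattern: Sequence[str],
                  target: List[Sequence[str]],
                  background: List[Sequence[str]]) -> float:
    """Support ratio target/background (assumed contrast measure)."""
    sup_t = support(pattern, target)
    sup_b = support(pattern, background)
    if sup_b == 0.0:
        return float("inf") if sup_t > 0.0 else 0.0
    return sup_t / sup_b


# Hypothetical toy data: one event sequence per taxpayer.
compliant = [
    ["lodge", "pay"],
    ["reminder", "lodge", "pay"],
    ["lodge", "reminder", "pay"],
    ["pay"],
]
non_compliant = [
    ["reminder", "lodge", "pay"],
    ["pay", "lodge"],          # same items as the pattern, wrong order
    ["lodge", "reminder"],
    ["reminder", "reminder"],
]

pattern = ["lodge", "pay"]
rate = contrast_rate(pattern, compliant, non_compliant)
print(pattern, rate)
```

The sequence `["pay", "lodge"]` contains both items of the pattern but not in the required order, so it does not count towards the pattern's support. An itemset-based method would count it, which is why itemset CP mining cannot be applied directly to time-ordered behavior.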


Keywords: Contrast pattern; Sequential pattern; Client behavior analysis


References

  1. Agichtein, E., Zheng, Z.: Identifying best bet web search results by mining past user behavior. In: KDD 2006, pp. 902–908. ACM (2006)
  2. Agrawal, R., Srikant, R.: Mining sequential patterns. In: ICDE, pp. 3–14 (1995)
  3. Attenberg, J., Pandey, S., Suel, T.: Modeling and predicting user behavior in sponsored search. In: KDD 2009, pp. 1067–1076. ACM (2009)
  4. Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential Pattern Mining Using a Bitmap representation. In: KDD 2002, pp. 429–435 (2002)
  5. Bailey, J., Manoukian, T., Ramamohanarao, K.: Fast algorithms for mining emerging patterns. Prin Data Min. Knowl. Disc. 2431, 187–208 (2002)
  6. Bayardo, R.J.: Efficiently Mining Long Patterns from Databases. SIGMOD (1998)
  7. Chan, S., Kao, B., Yip, C., Tang, M.: Mining emerging substrings. In: DASFAA 2003, pp. 119–126 (2003)
  8. Cao, L.: Behavior informatics and analytics: Let behavior talk. In: ICDM 2008 Workshops, pp. 87–96 (2008)
  9. Cao, L., Zhang, H., Zhao, Y., Luo, D., Zhang, C.: Combined mining: Discovering informative knowledge in complex data. IEEE Trans. Syst. Man. Cybern. B. Cybern. 41(3), 699–712 (2011)
  10. Dong, G., Li, J.: Efficient mining of emerging patterns: discovering trends and differences. In: KDD 1999, pp. 43–52 (1999)
  11. Dong, G., Li, J., Zhang, X.: Discovering Jumping Emerging Patterns and Experiments on Real Datasets. (IDC99) (1999)
  12. Dong, G., Zhang, X., Wong, L., Li, J.: CAEP: Classification by aggregating emerging patterns. In: Discovery Science, vol. 1721, pp. 737–737 (1999)
  13. Fan, H., Ramamohanarao, K.: Efficiently mining interesting emerging patterns. In: WAIM 2003, pp. 189–201 (2003)
  14. Fan, H., Ramamohanarao, K.: Fast discovery and the generalization of strong jumping emerging patterns for building compact and accurate classifiers. TKDE 18(6), 721–737 (2006)
  15. Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.-C.: Freespan: Frequent Pattern-projected Sequential Pattern Mining. In: KDD, pp. 355–359 (2000)
  16. Ji, X., Bailey, J., Dong, G.: Mining minimal distinguishing subsequence patterns with gap constraints. Knowl. Inf. Syst. 11, 259–286 (2007)
  17. Li, W., Han, J., Pei, J.: CMAR: Accurate and efficient classification based on multiple class-association rules. In: ICDM 2001, pp. 369–376 (2001)
  18. Loekito, E., Bailey, J.: Fast mining of high dimensional expressive contrast patterns using binary decision diagrams. In: SIGKDD 2006, pp. 307–316 (2006)
  19. Mannila, H., Toivonen, H.: Levelwise Search and Borders of Theories in Knowledge Discovery. Data Min. Knowl. Disc. 1(3), 41 (1997)
  20. Mozer, M., Wolniewicz, R., Grimes, D., Johnson, E., Kaushansky, H.: Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry. IEEE Trans. Neural Netw. 11(3), 690–696 (2000)
  21. Pasquier, N., Bastide, R., Taouil, R., Lakhal, L.: Efficient Mining of Association Rules using Closed Itemset Lattices. Information Systems 24(1) (1999)
  22. Pei, J., Han, J., Asl, M.B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.C.: PrefixSpan: Mining Sequential Patterns Efficiently by Prefix Projected Pattern Growth. In: ICDE, pp. 215–226 (2001)
  23. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Los Altos (1993)
  24. Ramamohanarao, K., Bailey, J.: Emerging patterns: mining and applications. In: ICISIP 2004, pp. 409–414 (2004)
  25. Wang, X., Duan, L., Dong, G., Yu, Z., Tang, C.: Efficient Mining of Density-Aware Distinguishing Sequential Patterns with Gap Constraints. DASFAA 372–387 (2014)
  26. Zaki, M.J.: SPADE: An efficient algorithm for mining frequent sequence. Mach. Learn. 42, 31–60 (2001)
  27. Zhao, Y., Zhang, H., Cao, L., Zhang, C., Bohlscheid, H.: Combined Pattern Mining: From Learned Rules to Actionable Knowledge. AI 393–403 (2008)

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Zhigang Zheng (1), corresponding author
  • Wei Wei (1)
  • Chunming Liu (1)
  • Wei Cao (1)
  • Longbing Cao (1)
  • Maninder Bhatia (2)

  1. Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
  2. Australian Taxation Office, Sydney, Australia
