A Tax Evasion Detection Method Based on Positive and Unlabeled Learning with Network Embedding Features

  • Conference paper
Neural Information Processing (ICONIP 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)

Abstract

Tax evasion detection plays a crucial role in reducing tax revenue loss. In practice, an accessible tax dataset contains only a small number of labeled taxpayers who evade tax (positive samples) and a large number of unlabeled taxpayers who may or may not evade tax. Because such a dataset has no labeled negative samples, conventional supervised classification is difficult to apply. In addition, the basic features of taxpayers, designed from tax experts’ domain knowledge and experience, are of limited use in determining whether a taxpayer evades tax. These limitations motivate this work. We argue that real-world tax evasion detection should be formalized as a positive and unlabeled (PU) learning problem, and we propose PUNE, a novel tax evasion detection method based on PU learning with Network Embedding features. PUNE detects tax evasion using both basic features and transaction network features extracted by a network embedding algorithm. Moreover, PUNE works well even under label noise. To evaluate PUNE, we conduct experiments on a real-world tax dataset. The results demonstrate that PUNE significantly improves tax evasion detection performance.
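
The abstract does not spell out PUNE's training objective, so the following is only an illustrative sketch of what a PU formulation over combined features can look like, not the authors' implementation. It assumes a non-negative PU risk with a sigmoid loss, a known class prior, and a fixed linear scorer. The names `combine_features`, `nn_pu_risk`, and `class_prior`, the feature dimensions, and the random data are all hypothetical.

```python
import numpy as np


def combine_features(basic, embedding):
    """Concatenate per-taxpayer basic features with network embedding features."""
    return np.concatenate([basic, embedding], axis=1)


def sigmoid_loss(scores, label):
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)) for a label y in {+1, -1}."""
    return 1.0 / (1.0 + np.exp(label * scores))


def nn_pu_risk(scores_pos, scores_unl, class_prior):
    """Non-negative PU risk: pi * R_p^+ + max(0, R_u^- - pi * R_p^-).

    scores_pos: classifier scores on labeled positive samples.
    scores_unl: classifier scores on unlabeled samples.
    class_prior: assumed fraction pi of positives among all samples.
    """
    risk_pos_as_pos = sigmoid_loss(scores_pos, +1).mean()
    risk_pos_as_neg = sigmoid_loss(scores_pos, -1).mean()
    risk_unl_as_neg = sigmoid_loss(scores_unl, -1).mean()
    # The unlabeled set mixes positives and negatives, so the positive share
    # is subtracted; clipping at zero keeps the negative-class risk valid.
    negative_part = risk_unl_as_neg - class_prior * risk_pos_as_neg
    return class_prior * risk_pos_as_pos + max(0.0, negative_part)


# Synthetic stand-ins for real data (assumption: 3 basic + 4 embedding dims).
rng = np.random.default_rng(0)
x_pos = combine_features(rng.normal(1.0, 1.0, (20, 3)), rng.normal(1.0, 1.0, (20, 4)))
x_unl = combine_features(rng.normal(0.0, 1.0, (200, 3)), rng.normal(0.0, 1.0, (200, 4)))

weights = np.full(7, 1.0 / 7.0)  # hypothetical fixed linear scorer
demo_risk = nn_pu_risk(x_pos @ weights, x_unl @ weights, class_prior=0.1)
print(f"non-negative PU risk on synthetic data: {demo_risk:.4f}")
```

In a real pipeline the embedding columns would come from a network embedding algorithm run on the transaction network, the class prior would have to be estimated, and the risk would be minimized over the scorer's parameters rather than evaluated once.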

Acknowledgments

This research was partially supported by “The Fundamental Theory and Applications of Big Data with Knowledge Engineering” under the National Key Research and Development Program of China with Grant No. 2018YFB1004500, the MOE Innovation Research Team No. IRT_17R86, the National Science Foundation of China under Grant Nos. 61721002 and 61532015, and the Project of XJTU-SERVYOU Joint AI Lab.

Author information

Corresponding author

Correspondence to Bo Dong.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Mi, L., Dong, B., Shi, B., Zheng, Q. (2020). A Tax Evasion Detection Method Based on Positive and Unlabeled Learning with Network Embedding Features. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_12

  • DOI: https://doi.org/10.1007/978-3-030-63833-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63832-0

  • Online ISBN: 978-3-030-63833-7

  • eBook Packages: Computer Science, Computer Science (R0)
