Abstract
Tax evasion detection plays a crucial role in reducing tax revenue loss. In practice, an available tax dataset contains only a small number of taxpayers labeled as evading tax (positive samples) and a large number of unlabeled taxpayers who may or may not evade tax. Such a nontraditional dataset makes tax evasion detection difficult. In addition, the basic features of taxpayers, which are designed according to tax experts' domain knowledge and experience, are of limited use in determining whether a taxpayer evades tax. These limitations motivate this work. In this paper, we argue that the real-world tax evasion detection task should be formalized as a positive and unlabeled (PU) learning problem. We propose a novel tax evasion detection method based on PU learning with Network Embedding features (PUNE). PUNE detects tax evasion using both basic features and transaction network features extracted by a network embedding algorithm. Moreover, PUNE works well even under label noise. To evaluate the effectiveness of PUNE, we conduct experiments on a real-world tax dataset. The results demonstrate that PUNE can significantly improve the performance of tax evasion detection.
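To make the PU formulation concrete, the following is a minimal sketch and not the PUNE implementation, since the abstract does not specify the estimator or the embedding algorithm. It assumes each taxpayer's basic features are concatenated with a precomputed network embedding, and it uses the non-negative PU risk estimator with a sigmoid loss as a stand-in training objective. The function names and the class prior value are hypothetical.

```python
import numpy as np


def combine_features(basic, embedding):
    """Concatenate basic features with network-embedding features per taxpayer.

    Both inputs are 2-D arrays with one row per taxpayer (assumed layout).
    """
    return np.concatenate([basic, embedding], axis=1)


def sigmoid_loss(margin):
    """Sigmoid surrogate loss l(z) = 1 / (1 + exp(z)); small when z is large."""
    return 1.0 / (1.0 + np.exp(margin))


def nn_pu_risk(scores_pos, scores_unl, class_prior):
    """Non-negative PU risk estimate from real-valued classifier scores.

    scores_pos:  scores of labeled positive samples (known evaders)
    scores_unl:  scores of unlabeled samples
    class_prior: assumed fraction of positives among all taxpayers
    """
    # Loss of labeled positives when predicted positive / predicted negative.
    risk_p_pos = sigmoid_loss(scores_pos).mean()
    risk_p_neg = sigmoid_loss(-scores_pos).mean()
    # Loss of unlabeled samples when predicted negative.
    risk_u_neg = sigmoid_loss(-scores_unl).mean()
    # The negative-class risk is estimated by removing the positive share
    # from the unlabeled risk; clipping at zero keeps the estimate non-negative.
    neg_risk = max(0.0, risk_u_neg - class_prior * risk_p_neg)
    return class_prior * risk_p_pos + neg_risk


# Usage with made-up shapes: 4 taxpayers, 2 basic and 3 embedding features.
features = combine_features(np.ones((4, 2)), np.zeros((4, 3)))
risk = nn_pu_risk(np.zeros(3), np.zeros(5), class_prior=0.3)
```

In a full pipeline, a classifier would be trained on the combined features by minimizing this risk, with the class prior either known or estimated separately.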
Acknowledgments
This research was partially supported by “The Fundamental Theory and Applications of Big Data with Knowledge Engineering” under the National Key Research and Development Program of China with Grant No. 2018YFB1004500, the MOE Innovation Research Team No. IRT_17R86, the National Science Foundation of China under Grant Nos. 61721002 and 61532015, and the Project of XJTU-SERVYOU Joint AI Lab.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Mi, L., Dong, B., Shi, B., Zheng, Q. (2020). A Tax Evasion Detection Method Based on Positive and Unlabeled Learning with Network Embedding Features. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_12
DOI: https://doi.org/10.1007/978-3-030-63833-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7
eBook Packages: Computer Science, Computer Science (R0)