A Tax Evasion Detection Method Based on Positive and Unlabeled Learning with Network Embedding Features

Mi, Lingyun; Dong, Bo; Shi, Bin; Zheng, Qinghua

doi:10.1007/978-3-030-63833-7_12

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Tax evasion detection plays a crucial role in reducing tax revenue loss. In practice, an available tax dataset contains only a small number of labeled taxpayers known to evade tax (positive samples) and a large number of unlabeled taxpayers who may or may not evade tax. Because such a dataset has no labeled negative samples, conventional supervised classification does not apply directly. In addition, the basic features of taxpayers, designed from tax experts’ domain knowledge and experience, carry limited information for determining whether a taxpayer evades tax. These two limitations motivate this work. We argue that the real-world tax evasion detection task should be formalized as a positive and unlabeled (PU) learning problem, and we propose a novel tax evasion detection method based on PU learning with Network Embedding features (PUNE). PUNE detects tax evasion using both basic features and transaction network features extracted by a network embedding algorithm. Moreover, PUNE works well even under label noise. To evaluate PUNE, we conduct experiments on a real-world tax dataset. The results demonstrate that PUNE significantly improves tax evasion detection performance.
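
The PU setting described in the abstract can be illustrated with a minimal sketch. The abstract does not state which PU objective, classifier, or embedding algorithm PUNE uses, so everything below is an assumption made for illustration: the sketch substitutes the non-negative PU risk estimator with a sigmoid loss, a hypothetical linear scorer, and precomputed embedding vectors, and the names `concat_features`, `linear_score`, and `nn_pu_risk` are invented here. It shows how basic and embedding features might be joined, and how a risk can be computed from labeled positives and unlabeled samples when the class prior (the assumed share of evaders) is given.

```python
import math
from typing import Iterable, List, Sequence


def concat_features(basic: Sequence[float], embedding: Sequence[float]) -> List[float]:
    """Join a taxpayer's basic features with its (precomputed) embedding vector."""
    return list(basic) + list(embedding)


def linear_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical scorer g(x) = w . x; larger scores suggest evasion."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)) for y in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def _mean(values: Iterable[float]) -> float:
    items = list(values)
    if not items:
        raise ValueError("need at least one sample")
    return sum(items) / len(items)


def nn_pu_risk(scores_pos: Sequence[float],
               scores_unl: Sequence[float],
               prior: float) -> float:
    """Non-negative PU risk (an assumed stand-in, not necessarily PUNE's objective).

    risk = prior * R_p^+ + max(0, R_u^- - prior * R_p^-)
    where R_p^+ / R_p^- are losses of positives treated as +1 / -1 and
    R_u^- is the loss of unlabeled samples treated as -1.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    risk_p_pos = _mean(sigmoid_loss(s, +1) for s in scores_pos)
    risk_p_neg = _mean(sigmoid_loss(s, -1) for s in scores_pos)
    risk_u_neg = _mean(sigmoid_loss(s, -1) for s in scores_unl)
    # Clamping keeps the estimated negative-class risk from dropping below zero.
    return prior * risk_p_pos + max(0.0, risk_u_neg - prior * risk_p_neg)


if __name__ == "__main__":
    # Toy, made-up numbers: two basic features plus a 2-d embedding per taxpayer.
    weights = [0.8, -0.3, 0.5, 0.1]
    positives = [concat_features([1.2, 0.4], [0.9, -0.2])]
    unlabeled = [concat_features([0.1, 0.7], [-0.4, 0.3]),
                 concat_features([0.9, 0.2], [0.6, 0.0])]
    pos_scores = [linear_score(weights, x) for x in positives]
    unl_scores = [linear_score(weights, x) for x in unlabeled]
    print(nn_pu_risk(pos_scores, unl_scores, prior=0.3))
```

In a full pipeline the weights would be fitted by minimizing this risk, and the class prior would have to be estimated or supplied by domain experts; both steps are outside this sketch.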

Acknowledgments

This research was partially supported by “The Fundamental Theory and Applications of Big Data with Knowledge Engineering” under the National Key Research and Development Program of China with Grant No. 2018YFB1004500, the MOE Innovation Research Team No. IRT_17R86, the National Science Foundation of China under Grant Nos. 61721002 and 61532015, and the Project of XJTU-SERVYOU Joint AI Lab.

Author information

Authors and Affiliations

School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
Lingyun Mi, Bin Shi & Qinghua Zheng
SPKLSTN Lab, Xi’an Jiaotong University, Xi’an, China
Lingyun Mi, Bin Shi & Qinghua Zheng
School of Continuing Education, Xi’an Jiaotong University, Xi’an, China
Bo Dong
National Engineering Lab for Big Data Analytics, Xi’an Jiaotong University, Xi’an, China
Bo Dong

Corresponding author

Correspondence to Bo Dong.

Editor information

Editors and Affiliations

Department of AI, Ping An Life, Shenzhen, China
Haiqin Yang
Faculty of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
City University of Hong Kong, Kowloon, China
Andrew Chi-Sing Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
James T. Kwok
School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Jonathan H. Chan
The Chinese University of Hong Kong, New Territories, Hong Kong
Irwin King

About this paper

Cite this paper

Mi, L., Dong, B., Shi, B., Zheng, Q. (2020). A Tax Evasion Detection Method Based on Positive and Unlabeled Learning with Network Embedding Features. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_12

DOI: https://doi.org/10.1007/978-3-030-63833-7_12
Published: 20 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7
eBook Packages: Computer Science, Computer Science (R0)
