
Design of a NLP-empowered finance fraud awareness model: the anti-fraud chatbot for fraud detection and fraud classification as an instance


Advanced technologies, the Internet of Things and fundamental information and communication technology frameworks in particular, facilitate information sharing. A single click on an end device can make every tool accessible to users; however, whether the information received is correct remains an open question. Incorrect information, whether fake, malicious, or fraudulent, and whether spread deliberately or not, may worsen misunderstandings. To keep such cases from escalating to the level of crime, a universal financial fraud-awareness model was designed in this study. The model first targets accurate fraud detection and classification using natural language processing techniques. An anti-fraud chatbot is then implemented as an instance of the model and deployed on a widely used social network service, namely LINE. This implementation aims to manage finance-fraud cases and provide anti-fraud suggestions for dealing with foreseeable fraud events. A comparison of Word2vec, ELMo, BERT, and DistilBERT across five conventional machine-learning models and artificial neural network models indicates that the proposed model can achieve an accuracy of over 98% when detecting potential finance-fraud cases. In addition, the more efficient models, which pair DistilBERT with a support vector machine or a random forest, have lower computational resource cost and faster execution time in real applications.
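The two-stage pattern described in the abstract (turn a message into a vector, then classify the vector) can be illustrated with a minimal sketch. This is not the authors' implementation, which is not publicly available. To stay self-contained, it swaps in a normalised bag-of-words vector for the DistilBERT embedding and a nearest-centroid rule for the support vector machine or random forest. The training messages, labels, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    """Collect the sorted set of tokens seen in the training texts."""
    return sorted({w for t in texts for w in tokenize(t)})


def embed(text, vocab):
    """Stand-in embedding (assumption): L2-normalised bag-of-words counts.

    In the paper this stage is a pretrained model such as DistilBERT.
    """
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def fit_centroids(vectors, labels):
    """Stand-in classifier (assumption): average the vectors of each class."""
    sums = {}
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(vec, centroids):
    """Return the label whose centroid has the largest dot product with vec."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(centroids, key=lambda label: dot(vec, centroids[label]))


# Hypothetical toy training data, for illustration only.
TRAIN = [
    ("your account is frozen, transfer money to this safe account now", "fraud"),
    ("you won a prize, pay the handling fee by bank transfer to claim it", "fraud"),
    ("are we still meeting for lunch tomorrow", "normal"),
    ("the quarterly report is attached for your review", "normal"),
]

texts, labels = zip(*TRAIN)
VOCAB = build_vocab(texts)
CENTROIDS = fit_centroids([embed(t, VOCAB) for t in texts], list(labels))


def classify(message):
    """Embed a message and assign it to the closest class centroid."""
    return predict(embed(message, VOCAB), CENTROIDS)
```

Calling `classify("please transfer the fee to this account")` on this toy data yields `"fraud"`. Keeping the embedding and the classifier as separate functions mirrors the comparison in the abstract, where each embedding is paired with several classifiers in turn.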



Availability of data and material

The data, including the experiment dataset, are available upon reasonable request.

Code Availability

The code will be available upon reasonable request after the patent is filed.











Special thanks to Mr. Hou-Hsun Wang for his assistance with the programming for this study.


This work was partially supported by the Ministry of Science and Technology, Taiwan, R.O.C. [grant number MOST 108-2218-E-025-002-MY3].

Author information

Authors and Affiliations


Corresponding author

Correspondence to Neil Yen.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest or competing interests.

Consent to participate

Authors are aware of everything related to this submitted work.

Consent for publication

The authors are aware of the submission of this work for publication in the Journal of Ambient Intelligence and Humanized Computing.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



See Table 15.

Table 15 Sample Fraud Events and Categories


About this article


Cite this article

Chang, JW., Yen, N. & Hung, J.C. Design of a NLP-empowered finance fraud awareness model: the anti-fraud chatbot for fraud detection and fraud classification as an instance. J Ambient Intell Human Comput 13, 4663–4679 (2022).



  • Natural language processing
  • Fraud detection
  • Fraud classification
  • Context awareness
  • Machine learning
  • Smart city service