Unsupervised Anomaly Detection for Discrete Sequence Healthcare Data

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12602)

Abstract

Fraud in healthcare is widespread: doctors can prescribe unnecessary treatments to inflate bills. Insurance companies want to detect these anomalous fraudulent bills and reduce their losses. Traditional fraud detection methods rely on expert rules and manual data processing. More recently, machine learning techniques have automated this process, but hand-labeled data is extremely costly and usually out of date. We propose a machine learning model that automates fraud detection in an unsupervised way. We consider two deep learning approaches: an LSTM neural network that predicts the next patient visit, and a seq2seq model. To normalize the anomaly scores they produce, we propose an Empirical Distribution Function (EDF) approach, so the algorithm works on problems with high class imbalance.

We validate the models on real sequences of patients’ visits provided by Allianz. The models provide state-of-the-art results for unsupervised anomaly detection applied to fraud detection in healthcare. Our EDF approach further improves the quality of the LSTM model.
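
The abstract does not spell out how the EDF normalization is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a next-visit model assigns a probability to each visit that actually occurred, that the raw anomaly score of a visit is its negative log-probability, and that a raw score is normalized by the fraction of reference scores that do not exceed it. The function names `fit_edf` and `raw_scores`, the example probabilities, and the use of the maximum as the sequence-level score are all hypothetical.

```python
import numpy as np


def fit_edf(reference_scores):
    """Build an empirical distribution function from reference scores."""
    sorted_ref = np.sort(np.asarray(reference_scores, dtype=float))
    n = sorted_ref.size

    def edf(scores):
        # Fraction of reference scores less than or equal to each input score.
        ranks = np.searchsorted(
            sorted_ref, np.asarray(scores, dtype=float), side="right"
        )
        return ranks / n

    return edf


def raw_scores(step_probs):
    """Negative log-probability of each observed visit (assumed raw score)."""
    probs = np.clip(np.asarray(step_probs, dtype=float), 1e-12, 1.0)
    return -np.log(probs)


# Hypothetical probabilities that a next-visit model gave to observed visits.
reference_probs = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
edf = fit_edf(raw_scores(reference_probs))

# A new sequence whose last visit the model considered very unlikely.
new_sequence_probs = np.array([0.85, 0.75, 0.01])
normalized = edf(raw_scores(new_sequence_probs))

# Assumed aggregation: the most anomalous visit decides the sequence score.
sequence_anomaly = normalized.max()
```

Because the normalized scores are ranks in [0, 1] rather than raw magnitudes, a threshold such as 0.99 has the same meaning regardless of how the raw scores are scaled, which is one reason a rank-based normalization can be convenient when anomalies are rare.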

Acknowledgments

We thank Martin Spindler for providing the data and Ivan Fursov for providing code for data processing. This work was supported by the federal program “Research and development in priority areas for the development of the scientific and technological complex of Russia for 2014–2020” via grant RFMEFI60619X0008.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Snorovikhina, V., Zaytsev, A. (2021). Unsupervised Anomaly Detection for Discrete Sequence Healthcare Data. In: van der Aalst, W.M.P., et al. Analysis of Images, Social Networks and Texts. AIST 2020. Lecture Notes in Computer Science, vol 12602. Springer, Cham. https://doi.org/10.1007/978-3-030-72610-2_30

  • DOI: https://doi.org/10.1007/978-3-030-72610-2_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72609-6

  • Online ISBN: 978-3-030-72610-2

  • eBook Packages: Computer Science, Computer Science (R0)
