Hybrid Approach to Document Anomaly Detection: An Application to Facilitate RPA in Title Insurance

Abstract

Anomaly detection (AD) is important in many domains, and title insurance (TI) is no exception. Robotic process automation (RPA) is taking over manual tasks in TI business processes, but it has limitations without the support of artificial intelligence (AI) and machine learning (ML). As data dimensionality increases, and in composite population scenarios, detecting anomalies becomes more complex, and AD in automated document management systems (ADMS) is the least explored domain. Deep learning, being the fastest maturing technology, can be combined with traditional anomaly detectors to facilitate and improve RPA in TI. We present a hybrid model for AD that uses autoencoders (AE) and a one-class support vector machine (OSVM). In the present study, the OSVM receives input features that represent real-time documents from the TI business, orchestrated and reduced in dimension by the AE. The results obtained from multiple experiments are comparable with those of traditional methods and within a business-acceptable range of accuracy and performance.
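The pipeline described in the abstract (document features, an AE that reduces their dimension, then an OSVM trained on the reduced features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn and NumPy are available, uses TF-IDF features (taken from the keywords), stands in a single-hidden-layer `MLPRegressor` trained to reconstruct its input for the autoencoder, and uses hypothetical sample documents, a hypothetical code size of 4, and hypothetical helper names (`encode`, `fit_hybrid`, `predict`).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPRegressor
from sklearn.svm import OneClassSVM

# Hypothetical "normal" documents; real inputs would be TI business documents.
NORMAL_DOCS = [
    "deed of trust recorded in county records",
    "warranty deed conveying property to grantee",
    "deed of trust securing loan on property",
    "quitclaim deed recorded with county clerk",
    "mortgage deed recorded in county records",
    "warranty deed recorded with county clerk",
]


def encode(autoencoder, X):
    """Return bottleneck activations of a one-hidden-layer MLP with tanh units."""
    hidden = X @ autoencoder.coefs_[0] + autoencoder.intercepts_[0]
    return np.tanh(hidden)


def fit_hybrid(normal_docs, code_size=4, nu=0.1, seed=0):
    """Fit TF-IDF, an autoencoder stand-in, and a one-class SVM on normal documents."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(normal_docs).toarray()

    # Autoencoder stand-in: the network is trained to reproduce its own input,
    # so the hidden layer learns a compressed representation.
    autoencoder = MLPRegressor(
        hidden_layer_sizes=(code_size,),
        activation="tanh",
        max_iter=2000,
        random_state=seed,
    )
    autoencoder.fit(X, X)

    # The one-class SVM sees only the reduced features.
    detector = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    detector.fit(encode(autoencoder, X))
    return vectorizer, autoencoder, detector


def predict(model, docs):
    """Label documents: +1 for inliers, -1 for anomalies."""
    vectorizer, autoencoder, detector = model
    X = vectorizer.transform(docs).toarray()
    return detector.predict(encode(autoencoder, X))
```

Calling `predict(fit_hybrid(NORMAL_DOCS), new_docs)` returns one label per document, where `-1` marks a document the OSVM treats as anomalous. The kernel, `nu`, and the code size are placeholders that would need tuning on real data.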

Author information

Corresponding author

Correspondence to Abhijit Guha.

Additional information

Abhijit Guha received the B. Sc. degree (Chemistry Honors) from Calcutta University, India in 2006, and the MCA (master of computer applications) degree from Academy of Technology under West Bengal University of Technology, India in 2009. He is a Ph. D. degree candidate in Department of Data Science, CHRIST (Deemed to be University), India. Presently, he is working as a research and development scientist at First American India Private Limited. His research areas include document image processing, data mining, statistical modeling, and machine learning modeling in the title insurance domain. He has delivered multiple business solutions using AI technologies and received three consecutive “Innovation of the year” awards from 2015 to 2017 from First American India for his research contributions.

His research interests include artificial intelligence, natural language processing, text mining, statistical learning and machine learning.

Debabrata Samanta received the B. Sc. degree (Physics Honors) from Calcutta University, India in 2007, the MCA degree from Academy of Technology under West Bengal University of Technology, India in 2010, and the Ph. D. degree in computer science and engineering from National Institute of Technology, India in 2014. In 2015, he was a faculty member at Dayananda Sagar University, India, and in 2019 he was at CHRIST (Deemed to be University), India. Currently, he is an assistant professor in Department of Computer Science at CHRIST (Deemed to be University), India. He is a professional IEEE member, an associate life member of Computer Society of India (CSI) and a life member of Indian Society for Technical Education (ISTE). He has authored and coauthored over 127 papers in SCI/Scopus/Springer/Elsevier journals and IEEE/Springer/Elsevier conference proceedings in areas of artificial intelligence, natural language processing and image processing. He received the “Scholastic Award” at the 2nd International Conference on Computer Science and IT Application, CSIT-2011, India. He has published 9 books, available for sale on Amazon and Flipkart. He has edited 1 book available on Google Book server. He has authored and coauthored 2 Elsevier and 5 Springer book chapters. He is a convener, keynote speaker, and technical programme committee (TPC) member in various conferences and workshops. He was an invited speaker at several institutions.

His research interests include artificial intelligence, natural language processing and image processing.

About this article

Cite this article

Guha, A., Samanta, D. Hybrid Approach to Document Anomaly Detection: An Application to Facilitate RPA in Title Insurance. Int. J. Autom. Comput. 18, 55–72 (2021). https://doi.org/10.1007/s11633-020-1247-y

Keywords

  • Anomaly detection
  • title insurance
  • autoencoder
  • one-class support vector machine (OSVM)
  • term frequency-inverse document frequency (TF-IDF)
  • robotic process automation
  • dimensionality reduction