Outlier based intrusion detection in databases for user behaviour analysis using weighted sequential pattern mining

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

With the rise of traffic over wide-area networks, particularly the internet, and of cloud-based transactions and interactions, database security is important for any organisation. Detecting and protecting against unauthorised external attacks and insiders who abuse their privileges is an integral part of database security. To that end, we propose Outlier based Intrusion Detection in Databases for User Behaviour Analysis using Weighted Sequential Pattern Mining (BWSPM), a novel method that detects malicious transactions through a sequential flow: outlier detection, followed by behavioural checks in the role-based rule mining component, and finally a user-level behavioural check. In the worst case, a transaction passes through three levels of security validation, which take the model from general checks to specific ones. The Outlier Detection module generates clusters based on the syntactic characteristics of transactions and detects transactions that do not adhere to their closest cluster. Role-level analysis mines rules that capture the dynamic usage of attributes local to each role domain, and transactions are verified against these rules. Finally, user behaviour profiling models each user's behaviour from past transactions, and an incoming transaction is flagged if it diverges from that model. A security check is made at every level so that further analysis of a transaction can be avoided, which reduces the false positive rate and achieves a higher degree of optimisation. Experiments on a dataset generated using the TPC-C benchmark of the Transaction Processing Performance Council gave encouraging results, with accuracy of around 86.4%.
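
The three-level flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the clustering algorithm, the rule format, or the profiling measure, so the sketch substitutes simple stand-ins (distance to the nearest cluster centroid, a per-role set of allowed attributes, and Jaccard similarity against a user's past transactions). The early-exit behaviour, in which a transaction flagged at one level is not analysed further, is also an assumption. All names, thresholds, and attribute labels are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Transaction:
    user: str
    role: str
    features: Tuple[float, ...]   # hypothetical syntactic features, e.g. (reads, writes)
    attributes: FrozenSet[str]    # attributes the transaction touches


def is_outlier(txn: Transaction, centroids: Sequence[Tuple[float, ...]],
               max_distance: float) -> bool:
    """Level 1 stand-in: too far from the closest cluster centroid."""
    nearest = min(dist(txn.features, c) for c in centroids)
    return nearest > max_distance


def violates_role_rules(txn: Transaction,
                        role_rules: Dict[str, FrozenSet[str]]) -> bool:
    """Level 2 stand-in: touches an attribute outside the role's allowed set."""
    allowed = role_rules.get(txn.role, frozenset())
    return not txn.attributes <= allowed


def deviates_from_user(txn: Transaction,
                       history: Dict[str, List[FrozenSet[str]]],
                       min_similarity: float) -> bool:
    """Level 3 stand-in: no past transaction of this user is similar enough."""
    def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    past = history.get(txn.user, [])
    best = max((jaccard(txn.attributes, p) for p in past), default=0.0)
    return best < min_similarity


def classify(txn: Transaction, centroids, role_rules, history,
             max_distance: float = 2.0, min_similarity: float = 0.5) -> str:
    """Run the checks from general to specific, stopping at the first flag."""
    if is_outlier(txn, centroids, max_distance):
        return "malicious: outlier"
    if violates_role_rules(txn, role_rules):
        return "malicious: role"
    if deviates_from_user(txn, history, min_similarity):
        return "malicious: user"
    return "normal"


# Hypothetical reference data that would normally be learned from training logs.
centroids = [(2.0, 1.0), (5.0, 0.0)]
role_rules = {"clerk": frozenset({"customer.balance", "orders.status"})}
history = {"alice": [frozenset({"orders.status"})]}

usual = Transaction("alice", "clerk", (2.0, 1.0), frozenset({"orders.status"}))
print(classify(usual, centroids, role_rules, history))  # normal
```

Only a transaction that passes the first two checks reaches the user-level check, which corresponds to the worst case of three validations mentioned above.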

Data availability

The datasets generated and/or analysed during the current study are available at http://www.tpc.org/tpcc/default.asp.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Contributions

IS: Conceptualization, Methodology, Writing, Validation, Review & Editing. RJ: Conceptualization, Supervision, Validation, Review & Editing.

Corresponding author

Correspondence to Indu Singh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest and no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Singh, I., Jindal, R. Outlier based intrusion detection in databases for user behaviour analysis using weighted sequential pattern mining. Int. J. Mach. Learn. & Cyber. (2023). https://doi.org/10.1007/s13042-023-02049-4

  • DOI: https://doi.org/10.1007/s13042-023-02049-4
