Abstract
Database security is pertinent to every organisation, given the increase in traffic over large networks, especially the internet, and the growing use of cloud-based transactions and interactions. Greater exposure of organisations to the cloud implies greater risks to organisational as well as user data. In this paper, we propose a novel approach to database intrusion detection systems (DIDS) based on Expectation Maximization Clustering and Sequential Pattern Mining (EMSPM). This approach, unlike any other, does not have records and assumes a predetermined policy to be maintained in an organisational database, and it can operate seamlessly on databases that follow Role Based Access Control as well as on those that do not conform to any such access control and restrictions. This is achieved by focusing on pre-existing logs for the database and using the Expectation Maximization clustering algorithm to allot role profiles according to each database user's activities. These clusters and patterns are then fed into an algorithm that first prevents the generation of unwanted rules and then prevents malicious transactions. An assessment of the accuracy of EMSPM over sets of synthetically generated transactions yielded promising results, with accuracies over 93%.
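To make the role-profiling step concrete, the following is a minimal sketch of Expectation Maximization clustering applied to per-user activity summaries derived from a database log. It is not the authors' implementation: the user names, the two-feature summary (average reads and writes per transaction), the spherical Gaussian mixture model, and the function name `em_cluster` are all assumptions chosen for illustration.

```python
import math

def em_cluster(points, k, iters=50):
    """Fit a mixture of k spherical Gaussians with EM and return hard labels."""
    n, dim = len(points), len(points[0])
    # Deterministic start: pick initial means spread across the sorted data.
    ordered = sorted(points)
    means = [list(ordered[(i * (n - 1)) // max(k - 1, 1)]) for i in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point,
        # computed in log space to avoid underflow.
        resp = []
        for p in points:
            logs = []
            for j in range(k):
                sq = sum((x - m) ** 2 for x, m in zip(p, means[j]))
                logs.append(math.log(weights[j])
                            - 0.5 * dim * math.log(2 * math.pi * variances[j])
                            - sq / (2 * variances[j]))
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate weights, means, and variances from the
        # responsibilities (small floors guard against empty components).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / n, 1e-12)
            means[j] = [sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                        for d in range(dim)]
            sq = sum(r[j] * sum((x - m) ** 2 for x, m in zip(p, means[j]))
                     for r, p in zip(resp, points))
            variances[j] = max(sq / (nj * dim), 1e-6)
    # Assign each point to its most responsible component.
    return [max(range(k), key=lambda j: r[j]) for r in resp]

# Hypothetical log summary: (average reads, average writes) per transaction.
users = {
    "alice": (9.0, 1.0), "bob": (8.0, 2.0), "carol": (10.0, 1.0),
    "dave": (1.0, 9.0), "erin": (2.0, 8.0), "frank": (1.0, 10.0),
}
labels = em_cluster(list(users.values()), k=2)
roles = dict(zip(users, labels))  # user -> inferred role-profile index
print(roles)
```

On this toy data the read-heavy users and the write-heavy users should fall into separate profiles. A fuller system would presumably use richer features per user and would mine sequential patterns within each resulting profile, which this sketch does not attempt.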
Cite this article
Singh, I., Jindal, R. Expectation maximization clustering and sequential pattern mining based approach for detecting intrusive transactions in databases. Multimed Tools Appl 80, 27649–27681 (2021). https://doi.org/10.1007/s11042-021-10786-3
DOI: https://doi.org/10.1007/s11042-021-10786-3