Expectation maximization clustering and sequential pattern mining based approach for detecting intrusive transactions in databases

Multimedia Tools and Applications

Abstract

Database security matters to every organisation: traffic over large networks, especially the internet, keeps increasing, as does the use of cloud-based transactions and interactions. Greater exposure of organisations to the cloud implies greater risks for organisational as well as user data. In this paper, we propose a novel approach to database intrusion detection systems (DIDS) based on Expectation Maximization Clustering and Sequential Pattern Mining (EMSPM). Unlike other approaches, it does not have records and assumes a predetermined policy to be maintained in an organisational database, and it can operate seamlessly on databases that follow Role Based Access Control as well as on those that do not conform to any such access control and restrictions. This is achieved by focusing on pre-existing logs for the database and using the Expectation Maximization clustering algorithm to allot role profiles according to the database users' activities. These clusters and patterns are then fed into an algorithm that prevents the generation of unwanted rules, followed by prevention of malicious transactions. Assessment of the accuracy of EMSPM over sets of synthetically generated transactions yielded promising results, with accuracies over 93%.
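
To make the clustering step in the abstract concrete, the following is a minimal sketch of Expectation Maximization on a two-component, one-dimensional Gaussian mixture. It is not the authors' implementation: the feature (a per-user fraction of write operations taken from a log), the sample values, the fixed number of clusters, and the function names `gaussian_pdf` and `em_two_clusters` are all assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_clusters(values, iterations=50, min_var=1e-4):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (means, variances, weights, responsibilities).
    """
    n = len(values)
    overall_mean = sum(values) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in values) / n, min_var)
    # Deterministic start: one component at each extreme of the data.
    means = [min(values), max(values)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in values:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            if total == 0.0:
                # Both densities underflowed; fall back to an even split.
                resp.append([0.5, 0.5])
            else:
                resp.append([j / total for j in joint])
        # M-step: re-estimate each component from its weighted observations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, min_var)  # floor avoids a collapsed component
            weights[k] = nk / n
    return means, variances, weights, resp

# Hypothetical feature: fraction of write operations per user in the log.
write_fraction = [0.05, 0.10, 0.08, 0.90, 0.85, 0.95]
means, variances, weights, resp = em_two_clusters(write_fraction)
# Hard "role profile" per user: the component with the larger responsibility.
profiles = [0 if r[0] >= r[1] else 1 for r in resp]
print(profiles)
```

On this toy input the mostly-read users and the mostly-write users fall into separate components. A real system would use multi-dimensional features and select the number of components from the data, neither of which this sketch attempts.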

Author information

Corresponding author

Correspondence to Indu Singh.

Cite this article

Singh, I., Jindal, R. Expectation maximization clustering and sequential pattern mining based approach for detecting intrusive transactions in databases. Multimed Tools Appl 80, 27649–27681 (2021). https://doi.org/10.1007/s11042-021-10786-3

  • DOI: https://doi.org/10.1007/s11042-021-10786-3
