Random resampling algorithms for addressing the imbalanced dataset classes in insider threat detection

Al-Shehari, Taher; Alsowail, Rakan A.

doi:10.1007/s10207-022-00651-1

Random resampling algorithms for addressing the imbalanced dataset classes in insider threat detection

Regular Contribution
Published: 26 December 2022

Volume 22, pages 611–629, (2023)
Cite this article

International Journal of Information Security Aims and scope Submit manuscript

665 Accesses
4 Citations
Explore all metrics

Abstract

Cybersecurity threats can be perpetrated by insiders or outsiders. The threats that could be carried out by insiders are far more serious due to their privileged access, which they may use to cause financial loss and reputation harm for an organization. Thus, insider threats represent a major cybersecurity challenge for private and government organizations. Researchers and cybersecurity practitioners have proposed different approaches for detecting and mitigating insider threats, but they face many challenges (e.g., dataset availability and the highly imbalanced classes of the available dataset). Because the shortcoming of an insider threat dataset, the benchmarking dataset given by The Computer Emergency Response Team (CERT) was used to validate the majority of the insider threat detection approaches. The CERT dataset of insider threat is extremely imbalanced, and hence, once utilized to validate an insider threat detection model, the detection results may be biased and inaccurate. Such imbalance issue of the CERT dataset is ignored by most existing approaches of insider threat detection. As result, effective model is required to detect insider data leakage incidents from an imbalanced dataset more precisely. In this paper an insider data leakage detection model is proposed to leverage various random sampling techniques and well-known machine learning algorithms to deal with the dataset's extremely imbalanced classes. We evaluate the model on CERT r4.2 insider threat dataset utilizing different sampling techniques, and then compare its performance with the baseline and existing work. The empirical results show that by resolving the imbalanced dataset issue, our model enhances the detection performance of insider data leakage events by surpassing existing approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Insider threat detection using supervised machine learning algorithms

Article 28 December 2023

Insider Threat Detection on an Imbalanced Dataset Using Balancing Methods

Empirical Investigation of Resampling Techniques in an Intruder Detection System

Data availability

The dataset analyzed during the current study is publicly available.

References

Silowash, G., Shimeall, T.J., Cappelli, D., Moore, A., Flynn, L., Trzeciak, R.: Common sense guide to mitigating threats. CERT Progr. 4, 1–144 (2012)
Google Scholar
IBM, O. and: Cost of Insider Threats: Global Report, https://www.proofpoint.com/us/resources/threat-reports/2020-cost-of-insider-threats
CSO, CERT Division of SEI-CMU, U.S. Secret Service, and K.: The 2018 U.S. state of cybercrime survey, https://www.idg.com/tools-for-marketers/2018-u-s-state-of-cybercrime/
Partners, C.R.: 2019 insider threat report, https://crowdresearchpartners.com/insider-threat-report/
Collins, M.: Common sense guide to mitigating insider threats. CARNEGIE-MELLON UNIV PITTSBURGH PA PITTSBURGH United States (2016)
Azaria, A., Richardson, A., Kraus, S., Subrahmanian, V.S.: Behavioral analysis of insider threat: a survey and bootstrapped prediction in imbalanced data. IEEE Trans. Comput. Soc. Syst. 1, 135–155 (2014). https://doi.org/10.1109/TCSS.2014.2377811
Article Google Scholar
Homoliak, I., Toffalini, F., Guarnizo, J., Elovici, Y., Ochoa, M.: Insight into insiders and it: a survey of insider threat taxonomies, analysis, modeling, and countermeasures. ACM Comput. Surv. 52, 30 (2018)
Google Scholar
Liu, L., De Vel, O., Han, Q.L., Zhang, J., Xiang, Y.: Detecting and Preventing Cyber Insider Threats: A Survey, https://ieeexplore.ieee.org/document/8278157/, (2018)
Zeadally, S., Yu, B., Jeong, D.H., Liang, L.: Detecting insider threats solutions and trends. Inf. Secur. J. 21, 183–192 (2012). https://doi.org/10.1080/19393555.2011.654318
Article Google Scholar
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 41, 1–58 (2009)
Article Google Scholar
Salem, M. Ben, Hershkop, S., Stolfo, S.J.: A Survey of Insider Attack Detection Research. In: Insider Attack and Cyber Security. Springer US, Boston, MA (2008)
Alsowail, R.A., Al-Shehari, T.: Empirical detection techniques of insider threat incidents. IEEE Access. 8, 78385–78402 (2020). https://doi.org/10.1109/ACCESS.2020.2989739
Article Google Scholar
Gayathri, R.G., Sajjanhar, A., Xiang, Y.: Image-based feature representation for insider threat classification. Appl. Sci. (2020). https://doi.org/10.3390/app10144945
Article Google Scholar
CERT and ExactData LLC: Insider Threat Test Dataset, https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=508099
Glasser, J., Lindauer, B.: Bridging the gap: A pragmatic approach to generating insider threat data. In: Proceedings - IEEE CS Security and Privacy Workshops, SPW 2013. pp. 98–104. IEEE (2013)
Roberts, S.C., Holodnak, J.T., Nguyen, T., Yuditskaya, S., Milosavljevic, M., Streilein, W.W.: A Model-Based Approach to Predicting the Performance of Insider Threat Detection Systems. In: 2016 IEEE Security and Privacy Workshops (SPW). pp. 314–323. IEEE, San Jose, CA, USA (2016)
Rashid, T., Agrafiotis, I., Nurse, J.R.C.: A new take on detecting insider threats: Exploring the use of Hidden Markov Models. In: MIST 2016 - Proceedings of the International Workshop on Managing Insider Security Threats, co-located with CCS 2016 (2016)
Le, D.C., Zincir-Heywood, A.N.: Evaluating Insider Threat Detection Workflow Using Supervised and Unsupervised Learning. In: 2018 IEEE Security and Privacy Workshops (SPW). pp. 270–275. IEEE, San Francisco, CA, USA (2018)
Le, D.C., Zincir-Heywood, N.: Exploring anomalous behaviour detection and classification for insider threat identification. Int. J. Netw. Manag. (2021). https://doi.org/10.1002/nem.2109
Article Google Scholar
Lo, O., Buchanan, W.J., Griffiths, P., Macfarlane, R.: Distance measurement methods for improved insider threat detection. Secur. Commun. Networks. (2018). https://doi.org/10.1155/2018/5906368
Article Google Scholar
Lin, L., Zhong, S., Jia, C., Chen, K.: Insider threat detection based on deep belief network feature representation. In: Proceedings - 2017 International Conference on Green Informatics, ICGI 2017 (2017)
Le, D.C., Zincir-Heywood, A.N., Heywood, M.I.: Dynamic insider threat detection based on adaptable genetic programming. In: 2019 IEEE Symposium Series on Computational Intelligence, SSCI 2019 (2019)
Lv, B., Wang, D., Wang, Y., Lv, Q., Lu, D.: A hybrid model based on multi-dimensional features for insider threat detection. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018)
Hall, A.J., Pitropakis, N., Buchanan, W.J., Moradpoor, N.: Predicting malicious insider threat scenarios using organizational data and a heterogeneous stack-classifier. In: 2018 IEEE International Conference on Big Data (Big Data). pp. 5034–5039. IEEE (2018)
Aldairi, M., Karimi, L., Joshi, J.: A trust aware unsupervised learning approach for insider threat detection. In: Proceedings - 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science, IRI 2019 (2019)
Gamachchi, A., Boztas, S.: Insider threat detection through attributed graph clustering. In: 2017 IEEE Trustcom/BigDataSE/ICESS. pp. 112–119. IEEE, Sydney, NSW, Australia (2017)
Yuan, F., Cao, Y., Shang, Y., Liu, Y., Tan, J., Fang, B.: Insider threat detection with deep neural network. In: Shi, Y., Haohuan, Fu., Tian, Y., Krzhizhanovskaya, V.V., Lees, M.H., Dongarra, J., Sloot, P.M.A. (eds.) Computational Science – ICCS 2018: 18th International Conference, Wuxi, China, June 11–13, 2018, Proceedings, Part I, pp. 43–54. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-319-93698-7_4
Chapter Google Scholar
Diop, A., Emad, N., Winter, T.: A parallel and scalable framework for insider threat detection. In: Proceedings - 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics, HiPC 2020 (2020)
Tabash, K.A., Happa, J.: Insider-threat detection using gaussian mixture models and sensitivity profiles. Comput. Secur. 77, 838–859 (2018)
Article Google Scholar
Le, D.C., Zincir-Heywood, N., Heywood, M.I.: Analyzing data granularity levels for insider threat detection using machine learning. IEEE Trans. Netw. Serv. Manag. 17, 30–44 (2020). https://doi.org/10.1109/TNSM.2020.2967721
Article Google Scholar
Yuan, F., Shang, Y., Liu, Y., Cao, Y., Tan, J.: Data augmentation for insider threat detection with GAN. In: 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI). pp. 632–638. IEEE (2020)
Le, D.C., Zincir-Heywood, N.: Anomaly detection for insider threats using unsupervised ensembles. IEEE Trans. Netw. Serv. Manag. 18, 1152–1164 (2021). https://doi.org/10.1109/TNSM.2021.3071928
Article Google Scholar
Wang, J., Cai, L., Yu, A., Meng, D.: Embedding learning with heterogeneous event sequence for insider threat detection. In: 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI). pp. 947–954. IEEE (2019)
Ye, X., Han, M.-M.: An improved feature extraction algorithm for insider threat using hidden Markov model on user behavior detection. Inf. Comput. Secur. (2020)
Al-Mhiqani, M.N., Ahmed, R., Abidin, Z.Z., Isnin, S.N.: An integrated imbalanced learning and deep neural network model for insider threat detection. Int. J. Adv. Comput. Sci. Appl. 12, (2021)
Alsowail, R.A., Al-Shehari, T.: A multi-tiered framework for insider threat prevention. Electronics 10, 1005 (2021). https://doi.org/10.3390/electronics10091005
Article Google Scholar
Nelli, F.: Machine learning with scikit-learn. Python Data Anal. 19, 237–264 (2015). https://doi.org/10.1007/978-1-4842-0958-5_8
Article Google Scholar
Su, L., Wu, S., Cao, D.: Windows-based analysis for HFS+ file system. Adv. Mater. Res. (2011). https://doi.org/10.4028/www.scientific.net/AMR.179-180.538
Article Google Scholar
Pereira, R.M., Costa, Y.M.G., Silla, C.N., Jr.: Toward hierarchical classification of imbalanced data using random resampling algorithms. Inf. Sci. (Ny) 578, 344–363 (2021). https://doi.org/10.1016/j.ins.2021.07.033
Article MathSciNet Google Scholar
He, H., Ma, Y.: Imbalanced Learning: Foundations, Algorithms, and Applications. Wiley, New Jersey (2013)
Book MATH Google Scholar
Branco, P., Torgo, L., Ribeiro, R.P.: A survey of predictive modeling on imbalanced domains. ACM Comput. Surv. 49, 1–50 (2016). https://doi.org/10.1145/2907070
Article Google Scholar
Alqahtani, Gumaei, Mathkour, Maher Ben Ismail: A Genetic-Based Extreme Gradient Boosting Model for Detecting Intrusions in Wireless Sensor Networks. Sensors. 19, 4383 (2019). https://doi.org/10.3390/s19204383
Al-Rakhami, M., Gumaei, A., Alsanad, A., Alamri, A., Hassan, M.M.: An ensemble learning approach for accurate energy load prediction in residential buildings. IEEE Access. 7, 48328–48338 (2019). https://doi.org/10.1109/ACCESS.2019.2909470
Article Google Scholar
Farnaaz, N., Jabbar, M.A.: Random forest modeling for network intrusion detection system. Proced. Comput. Sci. 89, 213–217 (2016). https://doi.org/10.1016/j.procs.2016.06.047
Article Google Scholar
Jabbar, M.A., Deekshatulu, B.L., Chndra, P.: Alternating decision trees for early diagnosis of heart disease. In: International Conference on Circuits, Communication, Control and Computing. pp. 322–328. IEEE (2014)
Ali, J., Khan, R., Ahmad, N., Maqsood, I.: Random forests and decision trees. Int. J. Comput. Sci. Issues. 9, (2012)
Fawagreh, K., Gaber, M.M., Elyan, E.: Random forests: from early developments to recent advancements. Syst. Sci. Control Eng. 2, 602–609 (2014). https://doi.org/10.1080/21642583.2014.956265
Article Google Scholar
Sahu, S., Mehtre, B.M.: Network intrusion detection system using J48 Decision Tree. In: 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI). pp. 2023–2026. IEEE (2015)
Ruggieri, S.: Efficient C4.5 [classification algorithm]. IEEE Trans. Knowl. Data. Eng. 14, 438–444 (2002)
Article Google Scholar
Chen, F., Ye, Z., Wang, C., Yan, L., Wang, R.: A Feature Selection Approach for Network Intrusion Detection Based on Tree-Seed Algorithm and K-Nearest Neighbor. In: 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS). pp. 68–72. IEEE (2018)
Buczak, A.L., Guven, E.: A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun. Surv. Tutorials. 18, 1153–1176 (2016). https://doi.org/10.1109/COMST.2015.2494502
Article Google Scholar
Jin, Y., Wang, H., Sun, C.: Introduction to Machine Learning. Presented at the (2021)
Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10, e0118432 (2015). https://doi.org/10.1371/journal.pone.0118432
Article Google Scholar
Liu, Z., Bondell, H.D.: Binormal precision-recall curves for optimal classification of imbalanced data. Stat. Biosci. 11, 141–161 (2019). https://doi.org/10.1007/s12561-019-09231-9
Article Google Scholar
Paper, D.: Hands-on Scikit-Learn for Machine Learning Applications. Apress, Berkeley, CA (2020)
Book Google Scholar
Singh, M., Mehtre, B.M., Sangeetha, S.: Insider Threat Detection Based on User Behaviour Analysis. In: International Conference on Machine Learning, Image Processing, Network Security and Data Sciences. pp. 559–574. Springer (2020)
Yuan, F., Shang, Y., Liu, Y., Cao, Y., Tan, J.: Attention-based LSTM for insider threat detection. In: International Conference on Applications and Techniques in Information Security. pp. 192–201. Springer (2019)
Al-Shehari, T., Alsowail, R.A.: An insider data leakage detection using one-hot encoding, synthetic minority oversampling and machine learning techniques. Entropy 23, 1258 (2021). https://doi.org/10.3390/e23101258
Article Google Scholar

Download references

Acknowledgements

The authors of this study extend their appreciation to the Researchers Supporting Project number (RSPD2023R544), King Saud University, Riyadh, Saudi Arabia.

Author information

Authors and Affiliations

Computer Skills, Self-Development Skills Department, Deanship of Common First Year, King Saud University, Riyadh, 11362, Saudi Arabia
Taher Al-Shehari & Rakan A. Alsowail

Authors

Taher Al-Shehari
View author publications
You can also search for this author in PubMed Google Scholar
Rakan A. Alsowail
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Taher Al-Shehari.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Al-Shehari, T., Alsowail, R.A. Random resampling algorithms for addressing the imbalanced dataset classes in insider threat detection. Int. J. Inf. Secur. 22, 611–629 (2023). https://doi.org/10.1007/s10207-022-00651-1

Download citation

Accepted: 05 December 2022
Published: 26 December 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10207-022-00651-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Random resampling algorithms for addressing the imbalanced dataset classes in insider threat detection

Abstract

Access this article

Similar content being viewed by others

Insider threat detection using supervised machine learning algorithms

Insider Threat Detection on an Imbalanced Dataset Using Balancing Methods

Empirical Investigation of Resampling Techniques in an Intruder Detection System

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Random resampling algorithms for addressing the imbalanced dataset classes in insider threat detection

Abstract

Access this article

Similar content being viewed by others

Insider threat detection using supervised machine learning algorithms

Insider Threat Detection on an Imbalanced Dataset Using Balancing Methods

Empirical Investigation of Resampling Techniques in an Intruder Detection System

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation