Predicting sensitive information leakage in IoT applications using flows-aware machine learning approach

Naeem, Hajra; Alalfi, Manar H.

doi:10.1007/s10664-022-10157-y

Predicting sensitive information leakage in IoT applications using flows-aware machine learning approach

Published: 23 July 2022

Volume 27, article number 137, (2022)
Cite this article

Empirical Software Engineering Aims and scope Submit manuscript

Hajra Naeem¹ &
Manar H. Alalfi¹

332 Accesses
Explore all metrics

Abstract

This paper presents an approach for identification of vulnerable IoT applications. The approach focuses on a category of vulnerabilities that leads to sensitive information leakage which can be identified by using taint flow analysis. Tainted flows vulnerability is very much impacted by the structure of the program and the order of the statements in the code, designing an approach to detect such vulnerability needs to take into consideration such information in order to provide precise results. In this paper, we propose and develop an approach, FlowsMiner, that mines features from the code related to program structure such as control statements and methods, in addition to program’s statement order. FlowsMiner, generates features in the form of tainted flows. We developed, Flows2Vec, a tool that transform the features recovered by FlowsMiner into vectors, which are then used to aid the process of machine learning by providing a flow’s aware model building process. The resulting model is capable of accurately classify applications as vulnerable if the vulnerability is exhibited by changes in the order of statements in source code. When compared to a base Bag of Words (BoW) approach, the experiments show that the proposed approach has improved the AUC of the prediction models for all algorithms and the best case for Corpus1 dataset is improved from 0.91 to 0.94 and for Corpus2 from 0.56 to 0.96.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Listing 2

Listing 3

Listing 4

Listing 6

Listing 11

Listing 12

Listing 13

Listing 14

Listing 16

An Information Flow-Based Taxonomy to Understand the Nature of Software Vulnerabilities

iDetect for vulnerability detection in internet of things operating systems using machine learning

Article Open access 12 October 2022

IoT-Malware Classification Model Using Byte Sequences and Supervised Learning Techniques

References

Alon U, Zilberstein M, Levy O, Yahav E (2018) Code2vec: learning distributed representations of code. CoRR, arXiv:1803.09473
Andersen LO (1994) Program analysis and specialization for the C programming language. Ph.D. Dissertation. University of Cophenhagen
Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Yves LT, Octeau D, McDaniel P (2014) FLOWDROID: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. ACM SIGPLAN Not 49:259–269
Article Google Scholar
Avdiienko V, Kuznetsov K, Gorla A, Zeller A, Arzt S, Rasthofer S, Bodden E (2015) Mining apps for abnormal usage of sensitive data. In: 37th IEEE/ACM international conference on software engineering, ICSE 2015, Florence, Italy, vol 1, pp 426–436
Boris C, Rakesh V (2018) Machine learning methods for software vulnerability detection, pp 31–39
Celik ZB, Babun L, Sikder AK, Aksu H, Tan G, McDaniel PD, Uluagac AS (2018) Sensitive information tracking in commodity IoT. In: 27th USENIX security symposium, USENIX security 2018, Baltimore, MD, USA, pp 1687–1704
Dam HK, Tran T, Pham TTM, Ng SW, Grundy J, Ghose A (2018) Automatic feature learning for predicting vulnerable software components. IEEE Trans Softw Eng 1–1
Dam HK, Pham T, Ng SW, Tran T, Grundy J, Ghose A, Kim T, Kim C (2019) Lessons learned from using a deep Tree-Based model for software defect prediction in practice. In: 2019 IEEE/ACM 16th international conference on mining software repositories (MSR), pp 46–57
Harer JA, Kim LY, Russell RL, Ozdemir O, Kosta LR, Rangamani A, Hamilton LH, Centeno GI, Key JR, Ellingwood PM, McConley MW, Opper JM, Chin SP, Lazovich T (2018) Automated software vulnerability detection with machine learning. CoRR, arXiv:1803.04497
Hassan J, Shoaib U (2020) Multi-class review rating classification using deep recurrent neural network. Neural Process Lett 51:1031–1048
Article Google Scholar
Irfan MN, Oriat C, Groz R (2010) Angluin style finite state machine inference with non-optimal counterexamples. In: Proceedings of the first international workshop on model inference in testing, pp 11–19
Irfan M -N, Oriat C, Groz R (2013) Model inference and testing. Adv Comput 89:89–139
Article Google Scholar
Kim H, Choi T, Jung S, Kim H, Lee O, Doh K (2008) Applying dataflow analysis to detecting software vulnerability. In: 2008 10th International conference on advanced communication technology, pp 255–258
López V, Fernández A, García S, Palade V, Herrera F (2013) An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf Sci 250:113–141
Article Google Scholar
Medeiros I, Neves NF, Correia M (2016) DEKANT: a static analysis tool that learns to detect web application vulnerabilitiess. In: Proceedings of the 25th international symposium on software testing and analysis, ISSTA 2016, Saarbrücken, Germany, pp 1–11
Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. In: 1st International conference on learning representations, ICLR 2013, Scottsdale, Arizona, USA, May 2–4, 2013, Workshop Track Proceedings
Naeem H, Alalfi MH (2020) Identifying vulnerable IoT applications using deep learning. In: 27th IEEE international conference on software analysis, evolution and reengineering, SANER 2020, London, ON, Canada, pp 582–586
Parveen S, Alalfi MH (2020) A mutation framework for evaluating security analysis tools in IoT applications. In: 27th IEEE international conference on software analysis, evolution and reengineering, SANER 2020, London, ON, Canada, pp 587–591
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine Learning in Python. J Mach Learn Res 12:2825–2830
MathSciNet MATH Google Scholar
Sadeghi A, Bagheri H, Malek S (2015) Analysis of android Inter-App security vulnerabilities using COVERT. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering, vol 2, pp 725–728
Scandariato R, Walden J, Hovsepyan A, Joosen W (2014) Predicting vulnerable software components via text mining. IEEE Trans Softw Eng 40:993–1006
Article Google Scholar
Schmeidl F, Nazzal B, Alalfi MH (2019) Security analysis for SmartThings IoT applications. In: Proceedings of the 6th international conference on mobile software engineering and systems, MOBILESoft@ICSE, Montreal, QC, Canada, pp 25–29
Shar LK, Tan HBK (2012) Mining input sanitization patterns for predicting SQL injection and cross site scripting vulnerabilities. In: 34th International conference on software engineering, ICSE 2012, Zurich, Switzerland, pp 1293–1296
Shar LK, Tan HBK, Briand LC (2013) Mining SQL injection and cross site scripting vulnerabilities using hybrid program analysis. In: 35th International conference on software engineering, ICSE ’13, San Francisco, CA, USA, pp 642–651
Shoaib U, Ahmad N, Prinetto P, Tiotto G (2014) Integrating MultiWordNet with Italian Sign Language lexical resources. Expert Syst Appl 41:2300–2308
Article Google Scholar
SmartThings Classic Developer Documentation (2019) https://buildmedia.readthedocs.org/media/pdf/smartthings/latest/smartthings.pdf
Sui Y, Cheng X, Zhang G, Wang H (2020) Flow2vec: value-flow-based precise code embedding. Proc ACM Program Lang 4(OOPSLA):233:1-233:27
Article Google Scholar
Tai KS, Socher R, Manning CD (2015) Improved semantic representations from Tree-Structured long Short-Term memory networks. CoRR, arXiv:1503.00075
The Pandas Development Team (2020) Pandas-dev/pandas. Pandas, Zenodo
Google Scholar
Towards a definition of the Internet of Things (IoT) (2015) IEEE Internet Initiative and others
Walden J, Stuckman J, Scandariato R (2014) Predicting vulnerable components: software metrics vs text mining. In: 25th IEEE International symposium on software reliability engineering, ISSRE 2014, naples, Italy, pp 23–33
Wang S, Liu T, Tan L (2016) Automatically learning semantic features for defect prediction, pp 297–308
Zhao K, Zhang D, Su X, Li W (2015) Fest: a feature extraction and selection tool for Android malware detection. In: 2015 IEEE Symposium on computers and communication, ISCC 2015, Larnaca, Cyprus, pp 714–720
Zheng W, Gao J, Wu X, Xun Y, Liu G, Chen X (2020) An empirical study of high-impact factors for machine Learning-Based vulnerability detection. In: 2020 IEEE 2nd International workshop on intelligent bug fixing (IBF), pp 26–34
Zhu D, Jin H, Yang Y, Wu D, Chen W (2017) Deepflow: deep learning-based malware detection by mining Android application for abnormal usage of sensitive data. In: 2017 IEEE Symposium on computers and communications (ISCC), pp 438–443

Download references

Author information

Authors and Affiliations

Department of Computer Science, Toronto Metropolitan University, Toronto, Ontario, Canada
Hajra Naeem & Manar H. Alalfi

Authors

Hajra Naeem
View author publications
You can also search for this author in PubMed Google Scholar
Manar H. Alalfi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Manar H. Alalfi.

Additional information

Communicated by: Foutse Khomh, Gemma Catolino and Pasquale Salza

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE).

Dataset used in the experiments FlowsMiner Webfront

Appendix A: Visual Analysis of sinks in Corpus1 and Coprpus2 datasets

1.1 A.1 Figures for Multiple Sinks in Corpus 1 and 2

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Naeem, H., Alalfi, M.H. Predicting sensitive information leakage in IoT applications using flows-aware machine learning approach. Empir Software Eng 27, 137 (2022). https://doi.org/10.1007/s10664-022-10157-y

Download citation

Accepted: 22 March 2022
Published: 23 July 2022
DOI: https://doi.org/10.1007/s10664-022-10157-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Predicting sensitive information leakage in IoT applications using flows-aware machine learning approach

Abstract

Access this article

Similar content being viewed by others

An Information Flow-Based Taxonomy to Understand the Nature of Software Vulnerabilities

iDetect for vulnerability detection in internet of things operating systems using machine learning

IoT-Malware Classification Model Using Byte Sequences and Supervised Learning Techniques

References