Machine Learning Approaches for Anomaly Detection in IoT: An Overview and Future Research Directions

Wireless Personal Communications

Abstract

The Internet of Things (IoT) is a network of interrelated devices and sensors connected through the internet to transfer and share data. The data gathered from these devices may contain anomalies or other errors for various reasons, such as malicious activity or sensor failure. Anomaly detection is applied in several domains, such as fault detection and health monitoring systems. In this paper, we review and analyze the relevant literature on existing anomaly detection techniques that apply different machine learning approaches in the IoT. In addition, we examine the anomaly detection datasets used in IoT research, highlight the most pressing issues with these datasets for different approaches, and list several future research directions. We believe this survey will serve as a starting point for researchers seeking to understand how machine learning approaches are used to detect anomalies in the IoT, which datasets are associated with anomaly detection in the IoT, and what issues and future directions arise from a dataset perspective.

Data Availability

Data sharing is not applicable to this article, as no datasets were generated or analysed during the current study.

Abbreviations

IoT: Internet of things

IWC: Inverse weight clustering

DBSCAN: Density-based spatial clustering of applications with noise

RF: Random forest

LDA: Linear discriminant analysis

PCA: Principal component analysis

NB: Naive Bayes

FF: Farthest first

EM: Expectation maximization

SVDD: Support vector data description

DNNs: Deep neural networks

DT: Decision tree

DOS: Denial of service

MD: Mahalanobis distance

LOF: Local outlier factor

KNNs: K-nearest neighbours

SVMs: Support vector machines

NNs: Neural networks

OC-SVM: One-class SVM

RBF: Radial basis function

FPR: False positive rate

TPR: True positive rate

Author information

Contributions

NA carried out the conceptualization, formal analysis, methodology, implementation, visualization, and writing the original draft of the manuscript. RA carried out the conceptualization, formal analysis, methodology, revising and editing the paper. SB revised and edited the paper.

Corresponding author

Correspondence to Nusaybah Alghanmi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Alghanmi, N., Alotaibi, R. & Buhari, S.M. Machine Learning Approaches for Anomaly Detection in IoT: An Overview and Future Research Directions. Wireless Pers Commun 122, 2309–2324 (2022). https://doi.org/10.1007/s11277-021-08994-z

  • DOI: https://doi.org/10.1007/s11277-021-08994-z
