Application of Machine Learning Techniques to Detecting Anomalies in Communication Networks: Datasets and Feature Selection Algorithms

Cyber Threat Intelligence

Part of the book series: Advances in Information Security (ADIS, volume 70)

Abstract

Detecting, analyzing, and defending against cyber threats is an important topic in cyber security. Applying machine learning techniques to detect such threats has received considerable attention in the research literature. Anomalies in the Border Gateway Protocol (BGP) affect network operations, and their detection is of interest to researchers and practitioners. In this chapter, we describe the main properties of the protocol and datasets that contain BGP records collected from various public and private domain repositories such as Route Views, Réseaux IP Européens (RIPE), and BCNET. We employ various feature selection algorithms to extract the most relevant features, which are later used to classify BGP anomalies.
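
To make the feature selection step concrete, the following is a minimal sketch of one common filter method, the Fisher score, which ranks each feature by the ratio of between-class to within-class variance. The abstract does not say which algorithms the chapter uses, so the choice of Fisher score here is an assumption made for illustration. The feature names (`announcements`, `withdrawals`, `avg_as_path_len`), the toy values, and the helper functions are all hypothetical and are not taken from the chapter or its datasets.

```python
from statistics import fmean, pvariance


def fisher_scores(rows, labels):
    """Return one Fisher score per feature column.

    rows   -- list of equal-length numeric feature vectors
    labels -- class label per row (here 1 = anomaly, 0 = regular)
    """
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        overall_mean = fmean(column)
        between = 0.0  # weighted spread of class means around the overall mean
        within = 0.0   # weighted sum of per-class variances
        for c in classes:
            values = [r[j] for r, y in zip(rows, labels) if y == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        # A feature that is constant inside every class gets score 0 here
        # (a simplification chosen for this sketch).
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_k_features(rows, labels, names, k):
    """Return the names of the k features with the highest Fisher score."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical toy data: three features per time window, six windows.
names = ["announcements", "withdrawals", "avg_as_path_len"]
rows = [
    [10, 2, 4.0],
    [12, 3, 4.2],
    [11, 2, 3.9],
    [90, 40, 4.1],
    [95, 45, 4.0],
    [88, 42, 4.2],
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = regular, 1 = anomaly (assumed labelling)

selected = top_k_features(rows, labels, names, 2)
print(selected)
```

In this toy data the two volume features differ sharply between the classes while the path-length feature barely changes, so a filter of this kind would keep the first two and discard the third before a classifier is trained.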

Acknowledgements

We thank Yan Li, Hong-Jie Xing, Qiang Hua, and Xi-Zhao Wang from Hebei University, Marijana Ćosović from University of East Sarajevo, and Prerna Batta from Simon Fraser University for their helpful contributions in earlier publications related to this project.

Author information

Corresponding author

Correspondence to Ljiljana Trajković.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Ding, Q., Li, Z., Haeri, S., Trajković, L. (2018). Application of Machine Learning Techniques to Detecting Anomalies in Communication Networks: Datasets and Feature Selection Algorithms. In: Dehghantanha, A., Conti, M., Dargahi, T. (eds) Cyber Threat Intelligence. Advances in Information Security, vol 70. Springer, Cham. https://doi.org/10.1007/978-3-319-73951-9_3

  • DOI: https://doi.org/10.1007/978-3-319-73951-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73950-2

  • Online ISBN: 978-3-319-73951-9

  • eBook Packages: Computer Science, Computer Science (R0)
