Designing a Machine Learning Intrusion Detection System

How Machine Learning is Revolutionizing Intrusion Detection


This segment helps you understand the promise of machine learning and how it is revolutionizing intrusion detection.

Keywords

  • intrusion detection system
  • IDS
  • machine learning
  • AI

About this video

Author(s)
Emmanuel Tsukerman
First online
08 October 2020
DOI
https://doi.org/10.1007/978-1-4842-6591-8_2
Online ISBN
978-1-4842-6591-8
Publisher
Apress
Copyright information
© Emmanuel Tsukerman 2020

Video Transcript

Welcome to module 2 on designing a machine learning intrusion detection system. In this module, I’m going to discuss how machine learning is revolutionizing intrusion detection. IDSes fall into two varieties: traditional IDSes and next-generation IDSes.

Traditional IDSes, the kind that have been used for many years, rely on rules. In other words, as traffic comes in and out, the IDS monitors it and applies a set of rules to decide whether that traffic should be allowed or flagged as malicious. One of the great challenges with traditional IDSes is that these rules are very specific, so you constantly have to update them. It’s a bit like keeping up with the latest flu strain: your best shot is to stay up to date.
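To make the rule-based idea concrete, here is a minimal Python sketch of a traditional IDS check. The `Packet` fields, the rule names, and the addresses (taken from documentation-only IP ranges) are hypothetical; production rule-based systems such as Snort or Suricata use dedicated rule languages that are far richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical, heavily simplified view of one packet."""
    src_ip: str
    dst_port: int
    payload: bytes


# Hypothetical rule set: each rule is a (name, predicate) pair.
# Every new attack variant means someone must write and ship another entry.
RULES = [
    ("blocked-source", lambda p: p.src_ip == "203.0.113.7"),
    ("telnet-port", lambda p: p.dst_port == 23),
    ("shell-signature", lambda p: b"/bin/sh" in p.payload),
]


def apply_rules(packet: Packet) -> list[str]:
    """Return the names of all rules the packet triggers; an empty list means allow."""
    return [name for name, matches in RULES if matches(packet)]


# Usage
print(apply_rules(Packet("198.51.100.2", 80, b"GET / HTTP/1.1")))      # []
print(apply_rules(Packet("198.51.100.2", 4444, b"exec /bin/sh -i")))  # ['shell-signature']
```

Each rule matches one narrowly specified pattern, so the rule list only grows as attackers find new variants, which is the maintenance burden described above.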

Unfortunately, even if you update your IDS consistently, the bad guys keep coming up with new methods of attack. There are so many variants, and they are constantly evolving, so traditional IDSes just cannot keep up. Let me give you an example.

For ease of explanation, I’ll use a firewall here, but the same ideas apply to IDSes. One type of attack that hackers use is packet fragmentation: splitting malicious traffic across several fragments so that the rules applied to each packet never see the pattern that would mark it as malicious. A second type of attack is spoofing the source IP address, and a third is spoofing the source port.
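The fragmentation trick can be illustrated with a toy example. This sketch assumes a hypothetical byte signature and fragments that arrive in order; real IP reassembly also has to deal with offsets, out-of-order arrival, and overlapping fragments, which is part of what makes this evasion hard to stop.

```python
# Hypothetical byte signature that a rule looks for in packet payloads.
SIGNATURE = b"/bin/sh"


def per_fragment_match(fragments: list[bytes]) -> bool:
    """Naive rule: inspect each fragment on its own."""
    return any(SIGNATURE in fragment for fragment in fragments)


def reassembled_match(fragments: list[bytes]) -> bool:
    """Safer rule: rebuild the payload (fragments assumed in order) before matching."""
    return SIGNATURE in b"".join(fragments)


payload = b"exec /bin/sh -i"
# Split the payload in the middle of the signature: b"exec /bi" + b"n/sh -i"
fragments = [payload[:8], payload[8:]]

print(per_fragment_match(fragments))  # False: the attack slips through
print(reassembled_match(fragments))   # True: caught after reassembly
```

Closing this one gap required understanding exactly how the evasion works, and the fix covers only this single trick.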

Stopping any single one of these attacks requires great ingenuity, deep domain knowledge, and plenty of expert hours. But these are just three attacks on firewalls off the top of my head. What about the hundreds or even thousands of other attacks that hackers are constantly inventing, improving, and evolving? How can you possibly keep up with all that?

We can hope to keep up with the attackers: to figure out their methods, improve our own defenses, keep updating, and work together. But if we look at the statistics, the reality is that it is practically impossible to keep up. If it were possible, the number of breaches would be decreasing, whereas we know that breaches are increasing in number, scale, and cost.

Fortunately, human ingenuity does offer us a solution. It’s not that we cannot keep up at all; it’s that we cannot keep up using these old-fashioned methods. The promising solution is AI, artificial intelligence, and in particular its subfield, machine learning. AI is so promising for cybersecurity because it is scalable: it can be applied in small organizations, but also in the largest organizations, with the largest amounts of data, the most monitoring needed, and the largest numbers of attacks.

It is scientific, in the sense that you can always re-create your experiments, control for variables, and debug the system to understand why one thing or another happens, why it made a certain prediction, and what lessons to learn from it. It is able to stop zero-day threats, in other words, threats that have never been seen before, an area where traditional rule-based methods have had essentially no success.

It can be made fast, fast enough to work in real time. For instance, if you need it deployed on a network to inspect packets as they come in and out, that can be done. It can also be adapted for low-memory situations. In other words, it’s a very flexible framework. And finally, it is tunable to satisfy the goals of the organization or the customer.
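Real-time, low-memory operation usually means processing traffic as a stream. As one hedged illustration (a stand-in technique chosen for this example, not necessarily the approach used in this course), the sketch below keeps a running mean and variance of a single numeric traffic feature, such as packets per second from one host, using Welford's online algorithm, so memory stays constant however much traffic passes. Values far from the learned baseline are flagged even if that exact pattern has never been seen before. The class name, the 3.0 threshold, and the 30-sample warm-up are assumptions.

```python
import math


class StreamingZScoreDetector:
    """Flags values far from a running baseline using O(1) memory (Welford's algorithm)."""

    def __init__(self, threshold: float = 3.0, warmup: int = 30) -> None:
        self.threshold = threshold    # assumed alert level, in standard deviations
        self.warmup = max(warmup, 2)  # observations to collect before alerting
        self.n = 0                    # number of values seen so far
        self.mean = 0.0               # running mean
        self.m2 = 0.0                 # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Score x against the baseline so far, then fold it in. Returns True if anomalous."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford's online update: constant memory regardless of stream length.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


# Usage: a hypothetical packets-per-second feature for one host.
detector = StreamingZScoreDetector()
normal = [detector.update(100 + (i % 3)) for i in range(100)]
print(any(normal))           # False: ordinary traffic raises no alerts
print(detector.update(500))  # True: a sudden burst is flagged
```

Folding every value into the baseline, flagged ones included, lets the detector adapt to genuine shifts in traffic, at the risk that an attacker who ramps up slowly gets absorbed into "normal"; skipping updates on flagged values is the opposite trade-off.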

In other words, you can decide on the relative importance of catching a certain threat compared to another threat, compared to a false alarm, or compared to letting benign traffic through. You have full control over all of these decisions. If you know that a particular attack is especially damaging, you can encode this in your AI so that it treats the attack as much more costly, which radically increases the odds of catching it.
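One common way to encode such priorities, assuming the model outputs a probability p that a flow is malicious, is cost-sensitive thresholding: flag when p × (cost of a missed attack) exceeds (1 − p) × (cost of a false alarm), which gives a threshold of cost_false_alarm / (cost_false_alarm + cost_missed_attack). The function names, attack categories, and cost figures below are hypothetical.

```python
def decision_threshold(cost_false_alarm: float, cost_missed_attack: float) -> float:
    """Probability above which flagging minimizes expected cost.

    Flag when p * cost_missed_attack > (1 - p) * cost_false_alarm,
    i.e. when p > cost_false_alarm / (cost_false_alarm + cost_missed_attack).
    """
    return cost_false_alarm / (cost_false_alarm + cost_missed_attack)


def should_flag(p_malicious: float, cost_false_alarm: float, cost_missed_attack: float) -> bool:
    """Flag the traffic if its malicious probability exceeds the cost-based threshold."""
    return p_malicious > decision_threshold(cost_false_alarm, cost_missed_attack)


# Hypothetical costs as (false alarm, missed attack): missing ransomware is rated
# 50x worse than a false alarm, while missing a port scan is rated only 2x worse.
COSTS = {"port_scan": (1.0, 2.0), "ransomware": (1.0, 50.0)}

# The same 10% model score leads to different actions depending on the stakes.
print(should_flag(0.1, *COSTS["port_scan"]))   # False (threshold is about 0.33)
print(should_flag(0.1, *COSTS["ransomware"]))  # True (threshold is about 0.02)
```

Raising the cost of missing an attack lowers its threshold, so borderline traffic of that type gets flagged more often, at the price of more false alarms.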

So now that I have discussed the numerous benefits of machine learning, what exactly is it? Machine learning is the science of applying sophisticated statistical algorithms, designed to be scalable and to run on our computational infrastructure, that learn automatically from data. In particular, a next-generation intrusion detection system is a machine-learning-based intrusion detection system that automatically learns from the traffic it monitors.
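To contrast "learning from traffic" with hand-written rules, here is a minimal sketch of a classifier whose decision boundary comes entirely from labeled examples. It uses a deliberately simple nearest-centroid model as a stand-in for whatever model a real system would use; the three features (packets per second, distinct destination ports, failed logins) and all training values are made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical labeled traffic: [packets_per_sec, distinct_dst_ports, failed_logins]
TRAINING = [
    ([5.0, 1.0, 0.0], "benign"),
    ([8.0, 2.0, 0.0], "benign"),
    ([6.0, 1.0, 1.0], "benign"),
    ([90.0, 40.0, 0.0], "malicious"),  # port-scan-like behavior
    ([70.0, 3.0, 25.0], "malicious"),  # brute-force-like behavior
]


def train_centroids(samples: list[tuple[list[float], str]]) -> dict[str, list[float]]:
    """Learn one mean feature vector (centroid) per label from labeled traffic."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids: dict[str, list[float]], features: list[float]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


centroids = train_centroids(TRAINING)
print(predict(centroids, [7.0, 1.0, 0.0]))    # benign
print(predict(centroids, [85.0, 35.0, 2.0]))  # malicious
```

In a real system the features would be extracted from live traffic and standardized so that no single feature dominates the distance, and a stronger model would usually replace nearest centroid; the point here is only that retraining on new data, rather than writing new rules, is what updates the detector.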