In the world of cybersecurity, there are more than 213,800 known "keys," or vulnerabilities, and they are already in the hands of criminals. This makes it difficult for cyber defenders to track, prioritize and prevent threats and attacks. However, a team of scientists from the Department of Energy's Pacific Northwest National Laboratory, Purdue University, Carnegie Mellon University, and Boise State University has developed an AI-based model to help solve this problem. The model automatically links vulnerabilities to specific lines of attack that adversaries could use to compromise computer systems.

The AI-Based Model

The new AI model uses natural language processing and supervised learning to bridge information in three separate cybersecurity databases: vulnerabilities (CVEs), weaknesses (CWEs), and attack patterns (CAPECs). Vulnerabilities refer to the specific pieces of computer code that could serve as an opening for an attack. Weaknesses classify the vulnerabilities into categories based on what could happen if they were exploited. Attack patterns describe what an actual attack exploiting those vulnerabilities and weaknesses might look like.
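To make the relationship concrete, the sketch below shows how a single vulnerability record can be chained through a weakness category to the attack patterns that exploit it. The identifiers and mappings are invented placeholders for illustration, not real database entries or the team's data.

```python
# Illustrative sketch only: the IDs and mappings below are invented placeholders,
# not real CVE/CWE/CAPEC entries.
cve_to_cwe = {
    "CVE-0000-0001": "CWE-0001",   # a vulnerability classified under one weakness
}
cwe_to_capec = {
    "CWE-0001": ["CAPEC-0001", "CAPEC-0002"],  # attack patterns exploiting that weakness
}

def attack_patterns_for(cve_id: str) -> list[str]:
    """Follow the CVE -> CWE -> CAPEC chain to find relevant attack patterns."""
    cwe = cve_to_cwe.get(cve_id)
    return cwe_to_capec.get(cwe, []) if cwe else []

print(attack_patterns_for("CVE-0000-0001"))  # ['CAPEC-0001', 'CAPEC-0002']
```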

While all three databases contain information crucial for cyber defenders, there have been few attempts to knit them together so that a user can quickly detect and understand possible threats, trace their origins, and then weaken or prevent them.

The Implications

By classifying vulnerabilities into general categories and understanding how an attack might proceed, cyber defenders can neutralize threats much more efficiently. The more general the category a vulnerability falls into, the more threats a single defensive action can stop. The goal is to prevent all possible exploitations.

The team's model automatically links vulnerabilities to the appropriate weaknesses with up to 87% accuracy and links weaknesses to appropriate attack patterns with up to 80% accuracy. Those figures are a marked improvement over what today's tools provide, but the scientists caution that their new methods need to be tested more widely.

Challenges

One hurdle is the lack of labeled data for training: currently, fewer than 1% of vulnerabilities are explicitly linked to specific attacks, which leaves very little data to learn from.

To overcome the lack of data, the team fine-tuned pretrained natural language models, using both an auto-encoder (BERT) and a sequence-to-sequence model (T5). The first approach used the language model to associate CVEs with CWEs, and then CWEs with CAPECs, through binary link prediction. The second used sequence-to-sequence techniques to translate CWEs into CAPECs, with intuitive prompts for ranking the associations. The two approaches produced very similar results, which were then validated by the cybersecurity expert on the team.
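As a rough illustration of the first, binary link-prediction approach, the sketch below pairs a vulnerability description with a candidate weakness description and asks a BERT classifier whether the two are linked. It assumes the Hugging Face transformers library; the checkpoint name, the example text, and the single-pair setup are placeholders, not the team's actual training pipeline.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Placeholder setup: a generic BERT checkpoint with a 2-class head
# (linked / not linked). The team's fine-tuned weights are not shown here.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Invented example text: one CVE-style description and one CWE-style description.
cve_text = "Improper input validation in the login form allows remote code execution."
cwe_text = "The product does not validate input, allowing attackers to alter control flow."

# Encode the pair and score it; after fine-tuning on labeled CVE-CWE pairs,
# the second logit would indicate how likely the vulnerability maps to the weakness.
inputs = tokenizer(cve_text, cwe_text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probability_linked = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"Predicted link probability: {probability_linked:.2f}")
```

The same pair-scoring idea would then be repeated for CWE-to-CAPEC candidates, while the T5 variant instead generates or ranks CAPEC text from a prompt built around the CWE description.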

The new AI-based model should help defenders spot and prevent attacks more often and more quickly. The work is open source, with a portion now available on GitHub. The team will release the rest of the code soon. Cybersecurity experts are encouraged to put this open-source platform to the test.
