Leveraging a natural language processing approach towards a more informed vulnerability documentation process

Date

2024

Abstract

Cybersecurity vulnerabilities are an ever-increasing threat. Previous work has suggested that Twitter is a robust source of Cyber Threat Intelligence data, including information about vulnerabilities, which can be retrieved via their Common Vulnerabilities and Exposures (CVE) identifiers. However, the culture of post-disclosure vulnerability discussion is changing: a vulnerability is sometimes referred to by a “nickname”, a short name used in place of its CVE identifier. This trend poses a significant challenge to the retrieval of CVE-relevant information, because not all relevant text includes the CVE identifier. To address this challenge, a system was designed that uses an off-the-shelf machine learning model to link tweets that do not explicitly mention a CVE identifier to their corresponding CVE. The system was tested on several datasets with several metrics to determine the parameters required to obtain satisfactory retrieval performance. The results show that machine learning makes it possible to retrieve information relevant to a specific CVE in the absence of the CVE identifier.
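
The retrieval problem described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis system: the abstract does not name the off-the-shelf model, so a simple token-overlap (Jaccard) score stands in for it here. The function names, the threshold value, and the short CVE descriptions used in the usage note are assumptions made for illustration only, not official CVE text.

```python
import re

# CVE identifiers have the form CVE-YYYY-NNNN, with four or more digits
# in the sequence number (for example CVE-2021-44228).
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def explicit_cves(text):
    """Return the CVE identifiers written out in the text, upper-cased and sorted."""
    return sorted({match.upper() for match in CVE_PATTERN.findall(text)})


def tokens(text):
    """Lower-case the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_tweet(tweet, cve_descriptions, threshold=0.1):
    """Link a tweet to a CVE identifier, or return None if nothing fits.

    Hypothetical helper: the token-overlap score is a stand-in for the
    machine learning model, and the threshold is an arbitrary assumption.
    """
    found = explicit_cves(tweet)
    if found:
        # Easy case: the tweet names its CVE explicitly.
        return found[0]

    # Hard case: no identifier, so fall back to text similarity.
    tweet_tokens = tokens(tweet)
    best_id, best_score = None, 0.0
    for cve_id, description in cve_descriptions.items():
        score = jaccard(tweet_tokens, tokens(description))
        if score > best_score:
            best_id, best_score = cve_id, score
    return best_id if best_score >= threshold else None
```

For example, given the illustrative mapping `{"CVE-2021-44228": "log4j jndi lookup remote code execution", "CVE-2014-0160": "openssl heartbeat extension memory disclosure"}`, the call `link_tweet("log4j jndi lookup exploited in the wild", ...)` selects the first entry even though the tweet contains no identifier, while an unrelated tweet falls below the threshold and yields `None`. A learned text representation would be expected to handle nicknames that share no words with the description, which plain token overlap cannot do.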

Keywords

Cybersecurity, Cyber Threat Intel, Twitter, Natural Language Processing

Graduation Month

December

Degree

Master of Science

Department

Department of Computer Science

Major Professor

Doina Caragea

Type

Thesis
