Automated malware analysis for Android applications through raw bytecode

Abstract

Securing mobile applications is a major area of research, given how widespread mobile phones are today. Android encourages developers to build Java applications that run on Android devices. While this gives developers a great deal of freedom, it gives malware authors the same opportunity. Defenses are therefore needed to determine whether an application is malicious or benign. The determination also needs to be automatic, given the massive number of applications that incident responders would otherwise need to review. To address the question of how to determine whether an application is malicious, this thesis uses an LSTM model. The goal was to determine whether treating individual Java bytecode instructions as "words" in a sentence for an NLP task would provide decent performance compared to the expectations for this dataset. A logistic regression model provided a baseline for the expected results. Six configurations were tried for each model to determine which performed best on applications pulled from the Androzoo repository. The LSTM model achieved very similar performance across all six experiments, with only the loss value changing. The best LSTM configuration achieved an accuracy of 0.9, a precision of 0.933, a recall of 0.83, an F1-score of 0.841, and a loss of 0.332. The equivalent logistic regression experiment resulted in a loss of 10.198, an accuracy of 0.86, a precision of 0.733, a recall of 0.75, and an F1-score of 0.731. The LSTM model performed better than the logistic regression model, but increasing the amount of input may provide better results.
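The idea of treating bytecode instructions as "words" can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual preprocessing, so everything below is an assumption made for illustration: the helper names `build_vocab` and `encode`, the reserved padding and unknown ids, the fixed sequence length, and the two toy instruction sequences (which use standard JVM opcode mnemonics). The sketch maps each opcode to an integer id and pads each sequence to a fixed length, which is the usual input shape for an embedding layer followed by an LSTM.

```python
from collections import Counter

# Reserved ids (an assumption of this sketch, not taken from the thesis).
PAD, UNK = 0, 1


def build_vocab(sequences, min_count=1):
    """Map each opcode mnemonic to an integer id, in sorted order."""
    counts = Counter(op for seq in sequences for op in seq)
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for op, n in sorted(counts.items()):
        if n >= min_count:
            vocab[op] = len(vocab)
    return vocab


def encode(seq, vocab, max_len):
    """Truncate or right-pad an opcode sequence to exactly max_len ids."""
    ids = [vocab.get(op, UNK) for op in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Two toy "sentences" of JVM opcodes standing in for disassembled apps.
apps = [
    ["aload_0", "invokespecial", "return"],
    ["iconst_1", "istore_1", "iload_1", "ireturn"],
]
vocab = build_vocab(apps)
encoded = [encode(seq, vocab, max_len=5) for seq in apps]
```

Each row of `encoded` is a fixed-length list of integers, one per instruction. Opcodes not seen when the vocabulary was built map to the unknown id rather than raising an error, which matters when new applications contain instructions absent from the training set.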

Keywords

Android, Malware, LSTM, Java bytecode, Logistic regression, Malware analysis

Graduation Month

August

Degree

Master of Science

Department

Department of Computer Science

Major Professor

George Amariucai

Date

2021

Type

Thesis