Android malware detection using network-based approaches
Abstract
This thesis investigates the use of networks to identify potentially malicious Android applications. Many techniques exist for determining whether an application is malicious, and they change constantly. Detection techniques must be robust because the schemes for creating malicious applications change as well. We propose a network-based approach that is potentially effective at separating malicious from benign apps given a small and noisy training set. The applications in our dataset come from the Google Play Store and were scanned for malicious behavior with VirusTotal to produce a ground-truth dataset. Each app in the resulting dataset is represented as a binary feature vector whose features encode permissions, intent actions, discriminative APIs, obfuscation signatures, and native code signatures. From these feature vectors we build a weighted network that captures the "closeness" between applications. We then propagate labels (benign or malicious) from the labeled applications that form the training set to the unlabeled applications we aim to label, and we evaluate the effectiveness of the approach in terms of precision, recall, and F1-measure. We outline the label propagation algorithms used in our research and discuss the fine-tuning of their hyper-parameters. We compare our results with established supervised learning algorithms, such as k-nearest neighbors and Naive Bayes, which learn a classifier from the labeled training data and then use it to label the unlabeled test data. Finally, we discuss potential improvements to our methods and directions for further research.
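The pipeline described in the abstract (binary feature vectors, a weighted similarity network, label propagation, then precision/recall/F1) can be illustrated with a minimal sketch. The abstract does not state which similarity measure or propagation algorithm the thesis uses, so the choices here are assumptions made for illustration only: Jaccard similarity as the edge weight, and a simple iterative propagation that clamps the known labels after every step. The function names and the toy feature matrix are hypothetical.

```python
import numpy as np

def jaccard_weights(X):
    """Pairwise Jaccard similarity between binary feature vectors (rows of X).
    Assumption: Jaccard stands in for whatever "closeness" the thesis uses."""
    X = np.asarray(X, dtype=float)
    inter = X @ X.T                              # shared features per pair
    counts = X.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    W = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    np.fill_diagonal(W, 0.0)                     # no self-loops
    return W

def propagate_labels(W, y, n_iter=100):
    """Iterative label propagation with clamping (illustrative variant).
    y uses 1 = malicious, 0 = benign, -1 = unlabeled."""
    y = np.asarray(y)
    idx = np.flatnonzero(y >= 0)                 # indices of labeled apps
    F = np.zeros((len(y), 2))                    # per-app class scores
    F[idx, y[idx]] = 1.0
    clamp = F[idx].copy()
    row_sums = W.sum(axis=1, keepdims=True)
    P = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    for _ in range(n_iter):
        F = P @ F                                # average neighbors' scores
        F[idx] = clamp                           # keep known labels fixed
    return F.argmax(axis=1)

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (malicious) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy data (hypothetical): 6 apps, 7 binary features.
X = np.array([
    [1, 1, 1, 0, 0, 0, 0],   # app 0: labeled malicious
    [1, 1, 0, 0, 0, 0, 1],   # app 1: unlabeled
    [1, 0, 1, 0, 0, 0, 0],   # app 2: unlabeled
    [0, 0, 0, 1, 1, 1, 0],   # app 3: labeled benign
    [0, 0, 0, 1, 1, 0, 1],   # app 4: unlabeled
    [0, 0, 0, 0, 1, 1, 0],   # app 5: unlabeled
])
y = np.array([1, -1, -1, 0, -1, -1])
y_true = np.array([1, 1, 1, 0, 0, 0])            # toy ground truth

pred = propagate_labels(jaccard_weights(X), y)
unlabeled = y == -1
scores = precision_recall_f1(y_true[unlabeled], pred[unlabeled])
print(pred, scores)
```

Scoring only the apps that started unlabeled mirrors the train/test split described in the abstract. In practice the similarity measure, any edge sparsification, and the number of iterations would be among the hyper-parameters to tune.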