Security of deep reinforcement learning
Abstract
Since the inception of Deep Reinforcement Learning (DRL) algorithms, there has been growing interest from both the research and industrial communities in the promising potential of this paradigm. Current and envisioned applications of DRL range from autonomous navigation and robotics to control applications in critical infrastructure, air traffic control, defense technologies, and cybersecurity. While the landscape of opportunities and advantages of DRL algorithms is justifiably vast, the security risks and issues in such algorithms remain largely unexplored. It has been shown that DRL algorithms are brittle, in that they are highly sensitive to small perturbations of their observations of the state. Furthermore, recent reports demonstrate that such perturbations can be applied by an adversary to manipulate the performance and behavior of DRL agents. To address such problems, this dissertation aims to advance the current state of the art in three separate but interdependent directions. First, I build on recent developments in adversarial machine learning and robust reinforcement learning to develop techniques and metrics for evaluating the resilience and robustness of DRL agents to adversarial perturbations applied to their observations of state transitions. A main objective of this task is to disentangle the vulnerabilities in the learned representation of state from those that stem from the sensitivity of DRL policies to changes in transition dynamics. A further objective is to investigate evaluation methods that are independent of attack techniques and their specific parameters. Accordingly, I develop two DRL-based algorithms that enable quantitative measurement and benchmarking of the worst-case resilience and robustness of DRL policies. Second, I present an analysis of adversarial training as a solution to the brittleness of Deep Q-Network (DQN) policies, and investigate the impact of hyperparameters on the training-time resilience of policies. I also propose a new exploration mechanism for sample-efficient adversarial training of DRL agents. Third, I address the previously unexplored problem of model extraction attacks on DRL agents. Accordingly, I demonstrate that imitation learning techniques can be used to effectively replicate a DRL policy from observations of its behavior. Moreover, I establish that the replicated policies can be used to launch effective black-box adversarial attacks through the transferability of adversarial examples. Lastly, I address the problem of detecting replicated models by developing a novel technique for embedding sequential watermarks in DRL policies. The dissertation concludes with remarks on the remaining challenges and future directions of research in the emerging domain of DRL security.
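To make the kind of observation-space perturbation discussed in the abstract concrete, the following Python sketch applies a fast gradient sign method (FGSM) style perturbation to the input of a toy Q-network. This is a minimal illustration, not the dissertation's actual attack or evaluation code; the network architecture, observation dimensions, and epsilon value are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the dissertation): FGSM-style perturbation
# of a DRL agent's state observation, implemented with PyTorch.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Toy Q-network mapping a flat observation to per-action values."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def fgsm_observation_attack(q_net: nn.Module, obs: torch.Tensor,
                            epsilon: float = 0.01) -> torch.Tensor:
    """Perturb an observation to discourage the agent's greedy action.

    The gradient of a cross-entropy loss against the unperturbed greedy
    action is used to pick the sign of a small additive perturbation.
    """
    obs = obs.clone().detach().requires_grad_(True)
    q_values = q_net(obs)
    greedy_action = q_values.argmax(dim=-1)
    loss = nn.functional.cross_entropy(q_values, greedy_action)
    loss.backward()
    # Move the observation in the direction that increases the loss,
    # i.e. away from the action the clean policy would have taken.
    perturbed = obs + epsilon * obs.grad.sign()
    return perturbed.detach()

if __name__ == "__main__":
    q_net = QNetwork(obs_dim=8, n_actions=4)       # toy dimensions
    clean_obs = torch.randn(1, 8)                   # stand-in observation
    adv_obs = fgsm_observation_attack(q_net, clean_obs, epsilon=0.05)
    print("action before:", q_net(clean_obs).argmax(dim=-1).item())
    print("action after :", q_net(adv_obs).argmax(dim=-1).item())
```

In practice, evaluations of resilience and robustness such as those described above would sweep the perturbation budget and attack strategy rather than fix a single epsilon, but this sketch captures the basic mechanism by which an adversary manipulates the agent's perceived state.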