Optimizing last-mile logistics with drones: A simulation-based stochastic deep Q-learning approach
Abstract
This research addresses the optimization of last-mile logistics using unmanned aerial vehicles (drones), focusing on the challenges of limited battery capacity, diverse charging strategies, and efficient routing in a complex, stochastic environment. The study aims to enhance drone delivery operations through advanced modelling and reinforcement learning techniques.
The inherent complexity of drone delivery systems, characterized by high-dimensional state spaces, continuous action spaces, and stochastic elements such as variable delivery demands and energy consumption, calls for advanced optimization methods, since traditional approaches often struggle with such dynamic systems. This study therefore employs deep Q-learning, a reinforcement learning technique capable of handling high-dimensional state spaces and learning optimal policies in complex environments without explicitly programmed decision rules.
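For readers unfamiliar with the method, deep Q-learning trains a neural network to approximate the action-value function by regressing toward the Bellman target r + γ·max_a' Q(s', a'). The minimal sketch below shows that update for one batch of transitions; the network architecture, dimensions, and hyperparameters are illustrative assumptions, not the configuration used in this study.

# Minimal illustration of the deep Q-learning update for one batch of
# transitions. Layer sizes and hyperparameters are assumed for illustration
# only; they are not the configuration used in this research.
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 8, 5, 0.99            # assumed dimensions

q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, n_actions))      # online network
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                           nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())       # periodically synced copy

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones):
    """One gradient step toward the target r + gamma * max_a' Q_target(s', a')."""
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * target_net(next_states).max(1).values * (1 - dones)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()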
A deep Q-learning model was implemented to optimize decision-making, with particular attention to the state and action spaces, the reward function, and the training procedure. The model's performance was evaluated on its ability to improve operational efficiency and to make optimal charging and routing decisions in real time. A simulated drone delivery network comprising 14 nodes, including 2 origin nodes and 1 charging station, was built in Simio to test the model. The simulation captured the discrete timing of drone movements and charging decisions, informing the development of a stochastic model that accurately represents the system's uncertainties.
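The exact state and action encodings used in the study are not reproduced in this abstract; purely to illustrate how a decision state and discrete action set might be encoded for a 14-node network with one charging station, a sketch follows (all field names and the action list are assumptions).

# Illustrative encoding of a drone-delivery decision state and discrete action
# set for a small network such as the 14-node testbed described above.
# Field names and the action list are assumptions, not the thesis's encoding.
from dataclasses import dataclass

N_NODES = 14                      # 14 nodes, including 2 origins and 1 charging station

@dataclass
class DroneState:
    node: int                     # current node index, 0..13
    battery: float                # remaining battery, normalised to [0, 1]
    pending_deliveries: int       # outstanding delivery requests
    time_of_day: float            # fraction of the operating day elapsed

    def to_vector(self):
        """Flatten to the feature vector a Q-network would consume."""
        one_hot = [1.0 if i == self.node else 0.0 for i in range(N_NODES)]
        return one_hot + [self.battery,
                          self.pending_deliveries / 10.0,
                          self.time_of_day]

# One discrete action per reachable node plus the two charging modes.
ACTIONS = [f"fly_to_{i}" for i in range(N_NODES)] + ["charge_fast", "charge_normal"]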
Key findings reveal significant improvements in drone operation efficiency through the application of the deep Q-learning model. The model demonstrated the ability to learn complex strategies, balancing immediate rewards with long-term consequences in a way that would be challenging for traditional optimization methods. Analysis of charging behaviours provided insights into the trade-offs between fast and normal charging options and their impact on overall delivery performance, showcasing the model's capability to make nuanced decisions in a multi-objective optimization context.
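The study's reward function is not detailed in this abstract; the hypothetical sketch below only illustrates how a fast-versus-normal charging trade-off can be expressed as an immediate cost weighed against longer-term delivery performance (all values are illustrative).

# Hypothetical reward shaping for the fast/normal charging trade-off: fast
# charging restores availability sooner but at a higher per-step cost (e.g.
# energy price or battery wear). Numbers are illustrative assumptions only.
def step_reward(delivered: bool, late: bool, action: str) -> float:
    reward = 0.0
    if delivered:
        reward += 10.0            # completed delivery
    if late:
        reward -= 5.0             # penalty for missing a delivery window
    if action == "charge_fast":
        reward -= 1.0             # higher cost, shorter downtime
    elif action == "charge_normal":
        reward -= 0.3             # cheaper, but the drone is idle longer
    return reward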
This research contributes to the field by offering a scalable and effective solution for managing the complexity of drone delivery networks, with potential applications across various logistics scenarios. The proposed approach demonstrates the potential for substantial improvements in last-mile logistics efficiency and reliability, highlighting the value of combining advanced optimization techniques with deep learning to address complex transportation challenges in dynamic, uncertain environments.