Reinforcement learning in home energy management systems: tutorial, survey and application




This thesis provides a comprehensive overview of all major reinforcement learning algorithms and a review of all areas of home energy management systems. The last chapter proposes a subjective method to improve human environmental comfort using imitation learning and a state-of-the-art tabular machine learning algorithm, the deep attentive tabular neural network (TabNet). Previous studies, which have focused on using reinforcement learning to replace human-based control, are limited because human comfort is a subjective value that cannot be easily measured. This research takes an alternate route: it applies imitation learning, a paradigm strongly connected to reinforcement learning, to simulate human behavior while maintaining energy consumption at desirable levels. By treating humans as the ‘experts’, it eliminates the need to quantify occupant comfort during the learning process. Policy models were developed using TabNet. TabNet is trained on the expert’s samples, which are obtained from the occupant's changes to the HVAC thermostat settings, and the learned policy is then applied to the periods when the occupant is inactive (sleeping or lazy). The primary objective is to minimize the difference in indoor temperature and humidity experienced by the occupant between the active and inactive (sleeping or lazy) periods. Predicted mean vote (PMV), a parametric equation used extensively to measure indoor comfort, is used to assess whether that objective has been met. The results show that applying the learned policy improved the comfort level by about 9% and closed the comfort gap between the occupant's active and sleeping periods.
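The imitation-learning idea described in the abstract can be illustrated with a minimal behavior-cloning sketch. This is not the thesis's implementation: the demonstration data, the state features, the function name `cloned_policy`, and the use of a 1-nearest-neighbor lookup in place of a trained TabNet model are all assumptions made purely for illustration.

```python
from math import dist

# Hypothetical expert demonstrations recorded while the occupant is active:
# (indoor temperature in deg C, relative humidity in %) -> thermostat setpoint in deg C.
# These values are invented for illustration and are not data from the thesis.
expert_samples = [
    ((27.0, 60.0), 23.0),
    ((25.0, 50.0), 24.0),
    ((22.0, 40.0), 24.0),
    ((19.0, 35.0), 25.0),
]


def cloned_policy(state, samples=expert_samples):
    """Return the setpoint the expert chose in the most similar recorded state.

    A 1-nearest-neighbor lookup on unscaled features stands in for the
    trained TabNet policy model (an assumption, chosen to keep the sketch
    dependency-free).
    """
    _, setpoint = min(samples, key=lambda s: dist(s[0], state))
    return setpoint


# Apply the cloned policy to states observed while the occupant is inactive.
inactive_states = [(26.5, 58.0), (19.5, 36.0)]
setpoints = [cloned_policy(s) for s in inactive_states]
```

The point of the sketch is that no comfort score appears anywhere in the training signal: the policy only reproduces the choices the occupant made when active, which is why comfort does not have to be quantified during learning.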



Machine learning, Deep learning, Reinforcement learning, HEMS, Thermal comfort, TabNet




Master of Science


Department of Electrical and Computer Engineering

Major Professor

Sanjoy Das