International Symposium On Leveraging Applications of Formal Methods, Verification and Validation • Crete, Greece
Time: Sunday, 3.11
Room: Room 1
Authors:
Abstract: Modern model checking tools like UPPAAL Stratego provide a rich framework for modeling cyber-physical systems involving non-determinism as well as time, stochastic and continuous state descriptors. A key objective is to design controllers for such systems that optimize a given objective, e.g., minimizing energy consumption. At an abstract level, the controller design problem can be cast as optimizing a strategy in a continuous (Euclidean) Markov decision process. Partitioning the continuous state space is a simple yet e↵ective strategy to solve this optimization problem in a flexible, non-parametric manner. In previous work we have introduced a reinforcement learning strategy under an undiscounted cost objective on dynamically refined partitions, and we have analyzed at the semantic level approximations of Euclidean MDPs by Imprecise MDPs. In this paper we are extending the approximation analysis to discounted and average cost objectives, and we are moving to close the gap between the theoretical analysis and the practical reinforcement learning approach. We introduce several alternative simulation strategies that on the one hand maintain approximation guarantees as the granularity of the partitioning increases, and on the other hand turns our learning scenario into a standard Q-learning procedure.
Paper: Download