Reinforcement Learning with Temporal Logic Constraints

The objective is to develop a model-free reinforcement learning method for stochastic planning under temporal logic constraints. In recent work [1], we propose an approach that translates high-level system specifications, expressed in a subclass of Probabilistic Computation Tree Logic (PCTL), into chance constraints. We devise a variant of approximate dynamic programming, namely approximate value iteration, to solve for the optimal policy while guaranteeing that the PCTL formula is satisfied.
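As a rough illustration of what a chance-constrained planning problem can look like, the block below gives a generic form and not necessarily the exact formulation used in [1]. All symbols are assumptions introduced here for illustration: a policy \pi, a reward function r, a discount factor \gamma, a path \rho generated under \pi, a PCTL-derived path property \varphi, and a required probability threshold p.

```latex
\begin{aligned}
\max_{\pi} \quad & \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right] \\
\text{subject to} \quad & \Pr\nolimits^{\pi}\!\left( \rho \models \varphi \right) \ge p .
\end{aligned}
```

In this reading, the constraint bounds from below the probability that a trajectory satisfies the specification, while the objective remains the usual expected discounted reward.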



In [2], we study model-free reinforcement learning to maximize the probability of satisfying high-level system specifications expressed in a subclass of temporal logic, namely syntactically co-safe linear temporal logic. To address the sparse reward that arises when reward is given only upon satisfaction of the temporal logic formula, we propose a topological approximate dynamic programming method with two steps. First, we decompose the planning problem into a sequence of sub-problems based on the topological structure of the task automaton, which is translated from the temporal logic formula. Second, we extend a model-free approximate dynamic programming method to solve for the value functions, one for each state in the task automaton, in an order reverse to the causal dependency. In particular, we show that the run-time of the proposed algorithm does not grow exponentially with the size of the specification. The correctness and efficiency of the algorithm are demonstrated on a robotic motion planning example.
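The ordering idea in the second step can be sketched as follows. This is a minimal illustration and not the algorithm from [2]: the function name `solve_order`, the example automaton, and the assumption that the task automaton is acyclic once self-loops are removed are all introduced here for illustration. Automata with cycles among distinct states would need extra handling, such as grouping mutually dependent states, which this sketch omits.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def solve_order(transitions):
    """Return automaton states in the order their value functions would be solved.

    `transitions` maps each automaton state to the set of states it can move to.
    Self-loops are dropped; the remaining graph is assumed to be acyclic.
    """
    # The value function at q depends on the value functions at q's successors.
    # TopologicalSorter takes node -> nodes that must come first, so passing
    # the successors makes every state appear after the states it depends on.
    depends_on = {
        q: {q_next for q_next in successors if q_next != q}
        for q, successors in transitions.items()
    }
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical task automaton for "eventually a, and afterwards eventually b":
# q0 --a--> q1 --b--> q2 (accepting), with a self-loop on every state.
task_automaton = {
    "q0": {"q0", "q1"},
    "q1": {"q1", "q2"},
    "q2": {"q2"},
}

order = solve_order(task_automaton)
print(order)  # ['q2', 'q1', 'q0']
```

A per-state solver would then be called once for each state in `order`, so that the value function of the accepting state is available before the states that lead to it are processed.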



Related Publications:

[1] Lening Li, Jie Fu, “Approximate Dynamic Programming with Probabilistic Temporal Logic Constraints”, arXiv:1810.02199, Annual American Control Conference, 2019.

[2] Lening Li, Jie Fu, “Topological Approximate Dynamic Programming under Temporal Logic Constraints”, arXiv:1907.10510, IEEE Conference on Decision and Control, 2019.

[3] Lening Li, Jie Fu, “Sampling-Based Approximate Optimal Temporal Logic Planning”, IEEE International Conference on Robotics and Automation (ICRA), 2017.