The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs

From MaRDI portal
Publication: Q6380341

arXiv: 2110.07409
MaRDI QID: Q6380341

Author name not available.

Publication date: 14 October 2021

Abstract: We consider the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process (POMDP) with finite state and action spaces, with respect to either the discounted or the mean reward criterion. We show that the (discounted) state-action frequencies and the expected cumulative reward are rational functions of the policy, whose degree is determined by the degree of partial observability. We then describe the optimization problem as a linear optimization problem in the space of feasible state-action frequencies, subject to polynomial constraints that we characterize explicitly. This allows us to address the combinatorial and geometric complexity of the optimization problem using recent tools from polynomial optimization. In particular, we estimate the number of critical points and use the polynomial programming description of reward maximization to solve a navigation problem in a grid world.
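The abstract's setup can be sketched numerically: a memoryless policy acts on observations rather than states, the induced discounted state-action frequencies are obtained from a linear system, and the expected discounted reward is a linear functional of those frequencies. The toy POMDP below (all numbers, names, and sizes are illustrative assumptions, not taken from the paper) is a minimal sketch of that computation:

```python
import numpy as np

# Hypothetical toy POMDP: 2 states, 2 observations, 2 actions (illustrative numbers).
nS, nO, nA = 2, 2, 2
gamma = 0.9

# Observation kernel beta[s, o] = Pr(o | s); off-diagonal mass creates partial observability.
beta = np.array([[0.8, 0.2],
                 [0.3, 0.7]])

# Transition kernel P[s, a, s'] = Pr(s' | s, a).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])

r = np.array([[1.0, 0.0],
              [0.0, 2.0]])           # reward r[s, a]
mu = np.array([0.5, 0.5])            # initial state distribution

def state_action_frequencies(pi):
    """Discounted state-action frequencies eta[s, a] induced by a memoryless
    observation-based policy pi[o, a] = Pr(a | o)."""
    tau = beta @ pi                          # effective state policy tau[s, a]
    P_pi = np.einsum('sa,sat->st', tau, P)   # state-to-state transitions under pi
    # Discounted occupancy: rho = (1 - gamma) * mu^T (I - gamma * P_pi)^{-1}
    rho = (1 - gamma) * np.linalg.solve(np.eye(nS) - gamma * P_pi.T, mu)
    return rho[:, None] * tau                # eta[s, a] = rho[s] * tau[s, a]

def discounted_reward(pi):
    """Expected discounted reward as the linear functional <eta, r> / (1 - gamma)."""
    return (state_action_frequencies(pi) * r).sum() / (1 - gamma)
```

Note that `discounted_reward` is linear in `eta` but, through `tau = beta @ pi` and the matrix inverse, a rational function of the policy entries `pi[o, a]`, which is the structure the abstract exploits.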




Has companion code repository: https://github.com/muellerjohannes/geometry-pomdps-iclr-2022
