Online Markov Decision Processes Under Bandit Feedback
From MaRDI portal
Publication:2983230
DOI: 10.1109/TAC.2013.2292137
zbMath: 1360.90281
OpenAlex: W2000850397
MaRDI QID: Q2983230
András Antos, András György, Gergely Neu, Csaba Szepesvári
Publication date: 16 May 2017
Published in: IEEE Transactions on Automatic Control
Full work available at URL: https://doi.org/10.1109/tac.2013.2292137
Learning and adaptive systems in artificial intelligence (68T05)
Markov and semi-Markov decision processes (90C40)
Online algorithms; streaming algorithms (68W27)
Related Items (6)
Learning Stationary Nash Equilibrium Policies in \(n\)-Player Stochastic Games with Independent Chains ⋮ Chasing Ghosts: Competing with Stateful Policies ⋮ An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions ⋮ Multi-period orienteering with uncertain adoption likelihood and waiting at customers ⋮ Unnamed Item ⋮ Allocating resources via price management systems: a dynamic programming-based approach