Logarithmic regret bounds for continuous-time average-reward Markov decision processes

From MaRDI portal
Publication:6608781