Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning
Publication:2887009
DOI: 10.1162/NECO_a_00199; zbMath: 1237.68147; OpenAlex: W1971492381; Wikidata: Q51539172; Scholia: Q51539172; MaRDI QID: Q2887009
Hirotaka Hachiya, Masashi Sugiyama, Jan Peters
Publication date: 15 May 2012
Published in: Neural Computation
Full work available at URL: https://doi.org/10.1162/neco_a_00199
Related Items
- Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
- Autonomous reinforcement learning with experience replay
Cites Work
- Improving predictive inference under covariate shift by weighting the log-likelihood function
- Adaptive importance sampling for value function approximation in off-policy reinforcement learning
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Efficient exploration through active learning for value function approximation in reinforcement learning
- Using Expectation-Maximization for Reinforcement Learning
- Input-dependent estimation of generalization error under covariate shift
- Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression