Optimal policy evaluation using kernel-based temporal difference methods
From MaRDI portal
Publication:6656605
DOI: 10.1214/24-aos2399
MaRDI QID: Q6656605
Yaqi Duan, Martin J. Wainwright, Unnamed Author
Publication date: 3 January 2025
Published in: The Annals of Statistics
dynamic programming, reinforcement learning, reproducing kernel Hilbert space, sequential decision-making, nonparametric estimation, temporal difference learning, policy evaluation, Markov reward process
Cites Work
- Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
- Markov decision processes in practice
- Kernel-based reinforcement learning
- Optimal global rates of convergence for nonparametric regression
- Linear least-squares algorithms for temporal difference learning
- Randomized sketches for kernels: fast and optimal nonparametric regression
- Some results on Tchebycheffian spline functions and stochastic processes
- On Rate Optimality for Ill-Posed Inverse Problems in Econometrics
- Error Bounds for Approximations from Projected Linear Equations
- Instrumental Variables Regression with Independent Observations
- A Lower Bound on the Risks of Non-Parametric Estimates of Densities in the Uniform Metric
- An analysis of temporal-difference learning with function approximation
- High-Dimensional Statistics
- The variance of discounted Markov decision processes
- Policy Evaluation in Continuous MDPs With Efficient Kernelized Gradient Temporal Difference
- Instance-Dependent ℓ∞-Bounds for Policy Evaluation in Tabular Reinforcement Learning
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- Discrete Dynamic Programming
- Discounted Dynamic Programming
- Minimax-optimal rates for sparse additive models over kernel classes via convex programming
- Instrumental Variable Estimation of Nonparametric Models
- Learning Bounds for Kernel Regression Using Effective Data Dimensionality
- Smoothing spline ANOVA models
- Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item