Optimal policy evaluation using kernel-based temporal difference methods

Yaqi Duan, Martin J. Wainwright, Unnamed Author

Publication date: 3 January 2025

Published in: The Annals of Statistics

Keywords: dynamic programming, reinforcement learning, reproducing kernel Hilbert space, sequential decision-making, nonparametric estimation, temporal difference learning, policy evaluation, Markov reward process

Mathematics Subject Classification ID

Nonparametric estimation (62G05)
Markov processes: estimation; hidden Markov models (62M05)

Cites Work

Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
Markov decision processes in practice
Kernel-based reinforcement learning
Optimal global rates of convergence for nonparametric regression
Linear least-squares algorithms for temporal difference learning
Randomized sketches for kernels: fast and optimal nonparametric regression
Some results on Tchebycheffian spline functions and stochastic processes
On Rate Optimality for Ill-Posed Inverse Problems in Econometrics
Error Bounds for Approximations from Projected Linear Equations
Instrumental Variables Regression with Independent Observations
A Lower Bound on the Risks of Non-Parametric Estimates of Densities in the Uniform Metric
An analysis of temporal-difference learning with function approximation
High-Dimensional Statistics
The variance of discounted Markov decision processes
Policy Evaluation in Continuous MDPs With Efficient Kernelized Gradient Temporal Difference
Instance-Dependent ℓ∞-Bounds for Policy Evaluation in Tabular Reinforcement Learning
Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
Discrete Dynamic Programming
Discounted Dynamic Programming
Minimax-optimal rates for sparse additive models over kernel classes via convex programming
Instrumental Variable Estimation of Nonparametric Models
Learning Bounds for Kernel Regression Using Effective Data Dimensionality
Smoothing spline ANOVA models
Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation
