scientific article
From MaRDI portal
Publication:3096132
zbMath 1225.68203
MaRDI QID Q3096132
Publication date: 8 November 2011
Full work available at URL: http://www.jmlr.org/papers/v9/munos08a.html
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Keywords: optimal control, supervised learning, reinforcement learning, regression, statistical learning theory, generative model, fitted value iteration, discounted Markovian decision processes, Pollard's inequality
Related Items (24)
Dynamic Programming Deconstructed: Transformations of the Bellman Equation and Computational Efficiency
A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
A convex optimization approach to dynamic programming in continuous state and action spaces
Efficient approximate dynamic programming based on design and analysis of computer experiments for infinite-horizon optimization
Batch mode reinforcement learning based on the synthesis of artificial trajectories
Variational actor-critic algorithms
Target Network and Truncation Overcome the Deadly Triad in Q-Learning
Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains
Approximate dynamic programming for stochastic N-stage optimization with application to optimal consumption under uncertainty
Quadratic approximate dynamic programming for input-affine systems
Approximate dynamic programming with a fuzzy parameterization
A linear programming methodology for approximate dynamic programming
Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach
Empirical Dynamic Programming
Solving dynamic discrete choice models using smoothing and sieve methods
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Learning When-to-Treat Policies
Convergence of Recursive Stochastic Algorithms Using Wasserstein Divergence
Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis
Analyzing Approximate Value Iteration Algorithms
Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics
Batch policy learning in average reward Markov decision processes