scientific article - MaRDI portal

scientific article

From MaRDI portal

Publication:3096132

zbMath: 1225.68203; MaRDI QID: Q3096132

Rémi Munos, Csaba Szepesvári

Publication date: 8 November 2011

Full work available at URL: http://www.jmlr.org/papers/v9/munos08a.html

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

zbMATH Keywords

optimal control; supervised learning; reinforcement learning; regression; statistical learning theory; generative model; fitted value iteration; discounted Markovian decision processes; Pollard's inequality

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)

Related Items (24)

Dynamic Programming Deconstructed: Transformations of the Bellman Equation and Computational Efficiency ⋮ A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic ⋮ Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms ⋮ A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications ⋮ A convex optimization approach to dynamic programming in continuous state and action spaces ⋮ Efficient approximate dynamic programming based on design and analysis of computer experiments for infinite-horizon optimization ⋮ Batch mode reinforcement learning based on the synthesis of artificial trajectories ⋮ Variational actor-critic algorithms ⋮ Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning ⋮ Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains ⋮ Approximate dynamic programming for stochastic \(N\)-stage optimization with application to optimal consumption under uncertainty ⋮ Quadratic approximate dynamic programming for input‐affine systems ⋮ Approximate dynamic programming with a fuzzy parameterization ⋮ A linear programming methodology for approximate dynamic programming ⋮ Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach ⋮ Empirical Dynamic Programming ⋮ Solving dynamic discrete choice models using smoothing and sieve methods ⋮ Multi-agent reinforcement learning: a selective overview of theories and algorithms ⋮ Learning When-to-Treat Policies ⋮ Convergence of Recursive Stochastic Algorithms Using Wasserstein Divergence ⋮ Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis ⋮ Analyzing Approximate Value Iteration Algorithms ⋮ Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics ⋮ Batch policy learning in average reward Markov decision processes
