scientific article - MaRDI portal


From MaRDI portal
Publication:3096132

zbMath: 1225.68203; MaRDI QID: Q3096132

Rémi Munos, Csaba Szepesvári

Publication date: 8 November 2011

Full work available at URL: http://www.jmlr.org/papers/v9/munos08a.html

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.



Related Items (24)

Dynamic Programming Deconstructed: Transformations of the Bellman Equation and Computational Efficiency
A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
A convex optimization approach to dynamic programming in continuous state and action spaces
Efficient approximate dynamic programming based on design and analysis of computer experiments for infinite-horizon optimization
Batch mode reinforcement learning based on the synthesis of artificial trajectories
Variational actor-critic algorithms
Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning
Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains
Approximate dynamic programming for stochastic \(N\)-stage optimization with application to optimal consumption under uncertainty
Quadratic approximate dynamic programming for input‐affine systems
Approximate dynamic programming with a fuzzy parameterization
A linear programming methodology for approximate dynamic programming
Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach
Empirical Dynamic Programming
Solving dynamic discrete choice models using smoothing and sieve methods
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Learning When-to-Treat Policies
Convergence of Recursive Stochastic Algorithms Using Wasserstein Divergence
Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis
Analyzing Approximate Value Iteration Algorithms
Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics
Batch policy learning in average reward Markov decision processes



