10.1162/1532443041827907 - MaRDI portal

Publication:4826001

DOI: 10.1162/1532443041827907
zbMath: 1094.68080
OpenAlex: W2130005627
MaRDI QID: Q4826001

Ronald Parr, Michail G. Lagoudakis

Publication date: 5 November 2004

Published in: CrossRef Listing of Deleted DOIs

Full work available at URL: https://doi.org/10.1162/1532443041827907

zbMATH Keywords

Markov decision processes; reinforcement learning

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)

Related Items

Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
Approximate policy iteration: a survey and some new methods
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
An incremental off-policy search in a model-free Markov decision process using a single sample path
Reinforcement learning-based design of sampling policies under cost constraints in Markov random fields: application to weed map reconstruction
Potential-based least-squares policy iteration for a parameterized feedback control system
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
Approximate dynamic programming for the dispatch of military medical evacuation assets
Probabilistic inference for determining options in reinforcement learning
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
Least squares approximate policy iteration for learning bid prices in choice-based revenue management
Heuristic decision rules for short-term trading of renewable energy with co-located energy storage
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
Unnamed Item
Simple and Optimal Methods for Stochastic Variational Inequalities, II: Markovian Noise and Policy Evaluation in Reinforcement Learning
Approximate dynamic programming for the military inventory routing problem
Continual curiosity-driven skill acquisition from high-dimensional video inputs for humanoid robots
Anticipatory action selection for human-robot table tennis
Batch mode reinforcement learning based on the synthesis of artificial trajectories
Dynamic appointment scheduling with wait-dependent abandonment
Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
Reducing reinforcement learning to KWIK online regression
Multibody dynamics and control using machine learning
A reinforcement learning approach to the stochastic cutting stock problem
Unnamed Item
Model selection in reinforcement learning
ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT
Recent advances in reinforcement learning in finance
Offline reinforcement learning with task hierarchies
Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
Reinforcement learning algorithms with function approximation: recent advances and applications
Analysis and improvement of policy gradient estimation
Optimization of heuristic search using recursive algorithm selection and reinforcement learning
Hybrid least-squares algorithms for approximate policy evaluation
Optimal Curiosity-Driven Modular Incremental Slow Feature Analysis
Abstraction from demonstration for efficient reinforcement learning in high-dimensional domains
Temporal difference-based policy iteration for optimal control of stochastic systems
A Method to Effectively Detect Vulnerabilities on Path Planning of VIN
Epoch-incremental reinforcement learning algorithms
Parameterized Markov decision process and its application to service rate control
Towards Min Max Generalization in Reinforcement Learning
Unnamed Item
Quadratic approximate dynamic programming for input‐affine systems
Approximate dynamic programming for missile defense interceptor fire control
Sell or store? An ADP approach to marketing renewable energy
Approximate dynamic programming with a fuzzy parameterization
Approximate dynamic programming via direct search in the space of value function approximations
Unnamed Item
A linear programming methodology for approximate dynamic programming
Adaptive importance sampling for value function approximation in off-policy reinforcement learning
Efficient exploration through active learning for value function approximation in reinforcement learning
Proximal algorithms and temporal difference methods for solving fixed point problems
Rollout sampling approximate policy iteration
MREKLM: a fast multiple empirical kernel learning machine
Kernel dynamic policy programming: applicable reinforcement learning to robot systems with high dimensional states
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
An approximate dynamic programming approach for comparing firing policies in a networked air defense environment
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
Regularized feature selection in reinforcement learning
Estimating optimal shared-parameter dynamic regimens with application to a multistage depression clinical trial
Challenges of real-world reinforcement learning: definitions, benchmarks and analysis
Learning with policy prediction in continuous state-action multi-agent decision processes
Unnamed Item
Natural actor-critic algorithms
Adaptive dynamic programming for model‐free tracking of trajectories with time‐varying parameters
Multi-agent reinforcement learning using ordinal action selection and approximate policy iteration
Unnamed Item
Unnamed Item
A simulation-based approach to stochastic dynamic programming
Model-free optimal control of discrete-time systems with additive and multiplicative noises
Allocating resources via price management systems: a dynamic programming-based approach
Batch policy learning in average reward Markov decision processes
Actor-Critic Algorithms with Online Feature Adaptation