Pages that link to "Item:Q1568533"
From MaRDI portal
The following pages link to Convergence results for single-step on-policy reinforcement-learning algorithms (Q1568533):
Displaying 47 items.
- Temporal-difference search in Computer Go (Q420936)
- Semiconductor final test scheduling with Sarsa\((\lambda, k)\) algorithm (Q421685)
- Adaptive dynamic programming and optimal control of nonlinear nonaffine systems (Q472591)
- Reinforcement learning algorithms with function approximation: recent advances and applications (Q903601)
- A theoretical analysis of temporal difference learning in the iterated prisoner's dilemma game (Q1048261)
- Multiagent learning using a variable learning rate (Q1605410)
- Approximate stochastic annealing for online control of infinite horizon Markov decision processes (Q1937498)
- A performance-centred approach to optimising maintenance of complex systems (Q2030609)
- A reinforcement learning approach for dynamic multi-objective optimization (Q2055564)
- Multi-agent reinforcement learning: a selective overview of theories and algorithms (Q2094040)
- Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning (Q2094051)
- Whittle index based Q-learning for restless bandits with average reward (Q2116660)
- Reference points and learning (Q2138367)
- A Q-learning predictive control scheme with guaranteed stability (Q2220029)
- Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies (Q2318167)
- An information-theoretic analysis of return maximization in reinforcement learning (Q2375396)
- Guiding exploration by pre-existing knowledge without modifying reward (Q2383522)
- Restricted gradient-descent algorithm for value-function approximation in reinforcement learning (Q2389624)
- The asymptotic equipartition property in reinforcement learning and its relation to return maximization (Q2488678)
- Generalised weakened fictitious play (Q2507678)
- A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (Q2633537)
- On the convergence of reinforcement learning with Monte Carlo exploring starts (Q2665181)
- Convergence of discretization procedure in \(Q\)-learning (Q2725088)
- On modification of population-based search algorithms for convergence in stochastic combinatorial optimization (Q2808309)
- A simulation-based approach to stochastic dynamic programming (Q2863720)
- Optimal Learning with Local Nonlinear Parametric Models over Continuous Designs (Q3303989)
- Posterior Weighted Reinforcement Learning with State Uncertainty (Q3564827)
- On the Asymptotic Equivalence Between Differential Hebbian and Temporal Difference Learning (Q3616511)
- Optimal Learning for Nonlinear Parametric Belief Models Over Multidimensional Continuous Spaces (Q4554064)
- Optimal Learning for Stochastic Optimization with Nonlinear Parametric Belief Models (Q4586173)
- Convergence Properties of Policy Iteration (Q4652513)
- On-policy concurrent reinforcement learning (Q4670596)
- Q-Learning: computation of optimal Q-values for evaluating the learning level in robotic tasks (Q4784353)
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning (Q4943730)
- Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms (Q5037552)
- Variance-penalized Markov decision processes: dynamic programming and reinforcement learning techniques (Q5166474)
- Asynchronous stochastic approximation with differential inclusions (Q5168859)
- A Gentle Introduction to Reinforcement Learning (Q5268414)
- Machine Learning: ECML 2004 (Q5450769)
- SOLVING DYNAMIC WILDLIFE RESOURCE OPTIMIZATION PROBLEMS USING REINFORCEMENT LEARNING (Q5697240)
- Concentration of Contractive Stochastic Approximation and Reinforcement Learning (Q5870773)
- Premium control with reinforcement learning (Q6174076)
- Simulation-based search (Q6198646)
- Reinforcement learning for control design of uncertain polytopic systems (Q6494651)
- Eligibility traces and forgetting factor in recursive least-squares-based temporal difference (Q6495643)
- Off-policy RL algorithms can be sample-efficient for continuous control via sample multiple reuse (Q6539379)
- A Q-learning algorithm for Markov decision processes with continuous state spaces (Q6569411)