Pages that link to "Item:Q4540300"
From MaRDI portal
The following pages link to Simulation-based optimization of Markov reward processes (Q4540300):
Displaying 41 items.
- On the Fisher metric of conditional probability polytopes (Q296470)
- Finding optimal memoryless policies of POMDPs under the expected average reward criterion (Q418072)
- An online actor-critic algorithm with function approximation for constrained Markov decision processes (Q438776)
- Performance optimization of queueing systems with perturbation realization (Q439492)
- A tutorial on event-based optimization -- a new optimization framework (Q461464)
- Event-based optimization of admission control in open queueing networks (Q461465)
- Parameterized Markov decision process and its application to service rate control (Q492972)
- Simulation-based optimization of Markov decision processes: an empirical process theory approach (Q608432)
- A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases (Q705478)
- Natural actor-critic algorithms (Q1049136)
- A reinforcement learning adaptive fuzzy controller for robots. (Q1398938)
- A time aggregation approach to Markov decision processes (Q1614322)
- Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains (Q1699932)
- Variance minimization of parameterized Markov decision processes (Q1745941)
- Basic ideas for event-based optimization of Markov systems (Q1773104)
- Coupling based estimation approaches for the average reward performance potential in Markov chains (Q1796998)
- Approximate gradient methods in policy-space optimization of Markov reward processes (Q1870312)
- Deep reinforcement learning for inventory control: a roadmap (Q2076812)
- Whittle index based Q-learning for restless bandits with average reward (Q2116660)
- Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint (Q2242923)
- On tight bounds for function approximation error in risk-sensitive reinforcement learning (Q2243003)
- Performance optimization for a class of generalized stochastic Petri nets (Q2348377)
- Dynamic programming and suboptimal control: a survey from ADP to MPC (Q2511993)
- Computing optimal policies for Markovian decision processes using simulation (Q2808287)
- On-Line Optimization of Simulated Markovian Processes (Q3348731)
- Simulation‐based Uniform Value Function Estimates of Markov Decision Processes (Q3593009)
- Automatic generation of efficient policy alternatives via simulation-optimization (Q4656747)
- Queueing Network Controls via Deep Reinforcement Learning (Q5084497)
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search (Q5102286)
- Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets (Q5145843)
- Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme (Q5153609)
- Policy Gradient Approach of Event‐Based Optimization and Its Online Implementation (Q5177188)
- Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities (Q5265786)
- Actor-Critic Algorithms with Online Feature Adaptation (Q5270681)
- Concentration of Contractive Stochastic Approximation and Reinforcement Learning (Q5870773)
- Stochastic approximation algorithms: overview and recent trends. (Q5955825)
- A sensitivity formula for risk-sensitive cost and the actor-critic algorithm (Q5958425)
- On-line policy gradient estimation with multi-step sampling (Q5962027)
- Deep reinforcement trading with predictable returns (Q6098411)
- Geometry and convergence of natural policy gradient methods (Q6138809)
- Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning (Q6143823)