The following pages link to (Q3093369):
- Time-varying policy rule under learning (Q500481)
- The factored policy-gradient planner (Q835832)
- On high-order differentiability of the policy function (Q1341453)
- Inhomogeneous deep Q-network for time sensitive applications (Q2093364)
- Policy iterations for reinforcement learning problems in continuous time and space -- fundamental theory and methods (Q2664203)
- (Q4533363)
- (Q4558153)
- (Q4969241)
- (Q5053195)
- An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions (Q5380403)
- Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality (Q6136230)
- Linear Convergence of a Policy Gradient Method for Some Finite Horizon Continuous Time Control Problems (Q6140987)
- Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems (Q6490237)
- Recent developments in machine learning methods for stochastic control and games (Q6615618)