
Simulation-based optimization of Markov reward processes

From MaRDI portal
Publication:4540300

DOI: 10.1109/9.905687
zbMath: 0992.93088
OpenAlex: W2120465407
MaRDI QID: Q4540300

Peter Marbach, John N. Tsitsiklis

Publication date: 21 July 2002

Published in: IEEE Transactions on Automatic Control

Full work available at URL: https://doi.org/10.1109/9.905687




Related Items (35)

A time aggregation approach to Markov decision processes
On the Fisher metric of conditional probability polytopes
Queueing Network Controls via Deep Reinforcement Learning
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
Deep reinforcement trading with predictable returns
A reinforcement learning adaptive fuzzy controller for robots.
Geometry and convergence of natural policy gradient methods
Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning
Finding optimal memoryless policies of POMDPs under the expected average reward criterion
Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains
An online actor-critic algorithm with function approximation for constrained Markov decision processes
Performance optimization of queueing systems with perturbation realization
A tutorial on event-based optimization -- a new optimization framework
Event-based optimization of admission control in open queueing networks
Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets
Parameterized Markov decision process and its application to service rate control
Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme
Variance minimization of parameterized Markov decision processes
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint
On tight bounds for function approximation error in risk-sensitive reinforcement learning
Policy Gradient Approach of Event‐Based Optimization and Its Online Implementation
A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases
Basic ideas for event-based optimization of Markov systems
Stochastic approximation algorithms: overview and recent trends.
A sensitivity formula for risk-sensitive cost and the actor-critic algorithm
On-line policy gradient estimation with multi-step sampling
Dynamic programming and suboptimal control: a survey from ADP to MPC
Coupling based estimation approaches for the average reward performance potential in Markov chains
Deep reinforcement learning for inventory control: a roadmap
Natural actor-critic algorithms
Performance optimization for a class of generalized stochastic Petri nets
Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities
Concentration of Contractive Stochastic Approximation and Reinforcement Learning
Actor-Critic Algorithms with Online Feature Adaptation
Whittle index based Q-learning for restless bandits with average reward



