scientific article - MaRDI portal


Publication:3093197


zbMATH: 1222.68099 · MaRDI QID: Q3093197

Shie Mannor, John N. Tsitsiklis

Publication date: 12 October 2011

Full work available at URL: http://www.jmlr.org/papers/v5/mannor04b.html

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05) ⋮ Markov and semi-Markov decision processes (90C40)

Related Items (23)

Best Arm Identification for Contaminated Bandits ⋮ Approximation algorithms for stochastic combinatorial optimization problems ⋮ Sequential estimation of quantiles with applications to A/B testing and best-arm identification ⋮ Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model ⋮ Unnamed Item ⋮ An instance-based algorithm for deciding the bias of a coin ⋮ A perpetual search for talents across overlapping generations: a learning process ⋮ Pure exploration in finitely-armed and continuous-armed bandits ⋮ Learning the distribution with largest mean: two bandit frameworks ⋮ The \(K\)-armed dueling bandits problem ⋮ Online Regret Bounds for Markov Decision Processes with Deterministic Transitions ⋮ Amplification and Derandomization without Slowdown ⋮ Tractable Sampling Strategies for Ordinal Optimization ⋮ Near-optimal PAC bounds for discounted MDPs ⋮ UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem ⋮ Simple Bayesian Algorithms for Best-Arm Identification ⋮ Online regret bounds for Markov decision processes with deterministic transitions ⋮ Explore First, Exploit Next: The True Shape of Regret in Bandit Problems ⋮ Bayesian Incentive-Compatible Bandit Exploration ⋮ Pure Exploration in Multi-armed Bandits Problems ⋮ Multi-armed bandits with episode context ⋮ Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models ⋮ Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning


Retrieved from "https://portal.mardi4nfdi.de/w/index.php?title=Publication:3093197&oldid=16172848"