scientific article

From MaRDI portal
Publication:3093197

zbMath: 1222.68099
MaRDI QID: Q3093197

Shie Mannor, John N. Tsitsiklis

Publication date: 12 October 2011

Full work available at URL: http://www.jmlr.org/papers/v5/mannor04b.html

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.



Related Items (23)

Best Arm Identification for Contaminated Bandits
Approximation algorithms for stochastic combinatorial optimization problems
Sequential estimation of quantiles with applications to A/B testing and best-arm identification
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
Unnamed Item
An instance-based algorithm for deciding the bias of a coin
A perpetual search for talents across overlapping generations: a learning process
Pure exploration in finitely-armed and continuous-armed bandits
Learning the distribution with largest mean: two bandit frameworks
The \(K\)-armed dueling bandits problem
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Amplification and Derandomization without Slowdown
Tractable Sampling Strategies for Ordinal Optimization
Near-optimal PAC bounds for discounted MDPs
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
Simple Bayesian Algorithms for Best-Arm Identification
Online regret bounds for Markov decision processes with deterministic transitions
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Bayesian Incentive-Compatible Bandit Exploration
Pure Exploration in Multi-armed Bandits Problems
Multi-armed bandits with episode context
Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models
Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning