scientific article; zbMATH DE number 6253919 - MaRDI portal

Publication:5396654

zbMATH: 1280.91038
MaRDI QID: Q5396654

Rémi Munos, Csaba Szepesvári, Gilles Stoltz, Sébastien Bubeck

Publication date: 3 February 2014

Full work available at URL: http://www.jmlr.org/papers/v12/bubeck11a.html

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

zbMATH Keywords

regret bounds
minimax rates
bandits with infinitely many arms
optimistic online optimization

Mathematics Subject Classification ID

Decision theory (91B06)
Learning and adaptive systems in artificial intelligence (68T05)
Stochastic programming (90C15)
Markov and semi-Markov decision processes (90C40)
Optimality conditions for minimax problems (49K35)
Probabilistic games; gambling (91A60)

Related Items (25)

Distributed Bayesian: A Continuous Distributed Constraint Optimization Problem Solver ⋮ Unnamed Item ⋮ Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information ⋮ Adaptive-treed bandits ⋮ Information theory for ranking and selection ⋮ Multi-armed bandits with censored consumption of resources ⋮ Nonparametric learning for impulse control problems -- exploration vs. exploitation ⋮ Treatment recommendation with distributional targets ⋮ Gaussian process bandits with adaptive discretization ⋮ Deep learning for ranking response surfaces with applications to optimal stopping problems ⋮ Learning in Combinatorial Optimization: What and How to Explore ⋮ Filtered Poisson process bandit on a continuum ⋮ A derivative-free optimization algorithm for the efficient minimization of functions obtained via statistical averaging ⋮ Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization ⋮ Unnamed Item ⋮ Learning to Optimize via Information-Directed Sampling ⋮ Learning‐based iterative modular adaptive control for nonlinear systems ⋮ Derivative-free optimization methods ⋮ Learning to Optimize via Posterior Sampling ⋮ Nonparametric Pricing Analytics with Customer Covariates ⋮ Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm ⋮ A Primal–Dual Learning Algorithm for Personalized Dynamic Pricing with an Inventory Constraint ⋮ Satisficing in Time-Sensitive Bandit Learning ⋮ Sequential Design for Ranking Response Surfaces ⋮ On two continuum armed bandit problems in high dimensions
