scientific article; zbMATH DE number 6253919
Publication:5396654
zbMath: 1280.91038 ⋮ MaRDI QID: Q5396654
Rémi Munos, Csaba Szepesvári, Gilles Stoltz, Sébastien Bubeck
Publication date: 3 February 2014
Full work available at URL: http://www.jmlr.org/papers/v12/bubeck11a.html
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Decision theory (91B06) ⋮ Learning and adaptive systems in artificial intelligence (68T05) ⋮ Stochastic programming (90C15) ⋮ Markov and semi-Markov decision processes (90C40) ⋮ Optimality conditions for minimax problems (49K35) ⋮ Probabilistic games; gambling (91A60)
Related Items (25)
Distributed Bayesian: A Continuous Distributed Constraint Optimization Problem Solver ⋮ Unnamed Item ⋮ Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information ⋮ Adaptive-treed bandits ⋮ Information theory for ranking and selection ⋮ Multi-armed bandits with censored consumption of resources ⋮ Nonparametric learning for impulse control problems -- exploration vs. exploitation ⋮ Treatment recommendation with distributional targets ⋮ Gaussian process bandits with adaptive discretization ⋮ Deep learning for ranking response surfaces with applications to optimal stopping problems ⋮ Learning in Combinatorial Optimization: What and How to Explore ⋮ Filtered Poisson process bandit on a continuum ⋮ A derivative-free optimization algorithm for the efficient minimization of functions obtained via statistical averaging ⋮ Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization ⋮ Unnamed Item ⋮ Learning to Optimize via Information-Directed Sampling ⋮ Learning‐based iterative modular adaptive control for nonlinear systems ⋮ Derivative-free optimization methods ⋮ Learning to Optimize via Posterior Sampling ⋮ Nonparametric Pricing Analytics with Customer Covariates ⋮ Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm ⋮ A Primal–Dual Learning Algorithm for Personalized Dynamic Pricing with an Inventory Constraint ⋮ Satisficing in Time-Sensitive Bandit Learning ⋮ Sequential Design for Ranking Response Surfaces ⋮ On two continuum armed bandit problems in high dimensions