Pages that link to "Item:Q5959973"
From MaRDI portal
The following pages link to Finite-time analysis of the multiarmed bandit problem (Q5959973):
- Bayesian policy reuse (Q1689554)
- A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing (Q1690964)
- A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems (Q1694949)
- An optimal bidimensional multi-armed bandit auction for multi-unit procurement (Q1714944)
- A Monte Carlo tree search approach to finding efficient patrolling schemes on graphs (Q1735189)
- Markov decision processes with sequential sensor measurements (Q1737870)
- Anytime discovery of a diverse set of patterns with Monte Carlo tree search (Q1741383)
- On Bayesian index policies for sequential resource allocation (Q1750289)
- A methodology for determining an effective subset of heuristics in selection hyper-heuristics (Q1753518)
- Comparison of Kriging-based algorithms for simulation optimization with heterogeneous noise (Q1753578)
- A hybrid breakout local search and reinforcement learning approach to the vertex separator problem (Q1753627)
- Reinforcement learning with immediate rewards and linear hypotheses (Q1762980)
- An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem (Q1785430)
- On the probability of correct selection in ordinal comparison over dynamic networks (Q1935307)
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm (Q1945130)
- Regret bounds for sleeping experts and bandits (Q1959599)
- Batch repair actions for automated troubleshooting (Q1989400)
- Latest stored information based adaptive selection strategy for multiobjective evolutionary algorithm (Q1992994)
- Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards (Q2006767)
- Efficient crowdsourcing of unknown experts using bounded multi-armed bandits (Q2014933)
- An online algorithm for the risk-aware restless bandit (Q2029383)
- Rollout sampling approximate policy iteration (Q2036256)
- Neural precedence recommender (Q2055885)
- Enhancing gene expression programming based on space partition and jump for symbolic regression (Q2056306)
- A revised approach for risk-averse multi-armed bandits under CVaR criterion (Q2060576)
- Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit (Q2074307)
- Nonparametric Bayesian multiarmed bandits for single-cell experiment design (Q2078782)
- Two-armed bandit problem and batch version of the mirror descent algorithm (Q2081125)
- Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit (Q2081727)
- Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm (Q2091834)
- Multi-agent reinforcement learning: a selective overview of theories and algorithms (Q2094040)
- Fairness in learning-based sequential decision algorithms: a survey (Q2094049)
- Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning (Q2094051)
- The pure exploration problem with general reward functions depending on full distributions (Q2102381)
- Exploring search space trees using an adapted version of Monte Carlo tree search for combinatorial optimization problems (Q2108169)
- Online machine learning algorithms to optimize performances of complex wireless communication systems (Q2130281)
- PAC-Bayesian lifelong learning for multi-armed bandits (Q2134066)
- Adaptive large neighborhood search for mixed integer programming (Q2146445)
- Learning to steer nonlinear interior-point methods (Q2175369)
- The multi-armed bandit problem: an efficient nonparametric solution (Q2176624)
- A reliability-aware multi-armed bandit approach to learn and select users in demand response (Q2207171)
- A pricing problem with unknown arrival rate and price sensitivity (Q2216175)
- Ballooning multi-armed bandits (Q2238588)
- Dynamic pricing with finite price sets: a non-parametric approach (Q2238754)
- Machine learning at the service of meta-heuristics for solving combinatorial optimization problems: a state-of-the-art (Q2242290)
- BoostingTree: parallel selection of weak learners in boosting, with application to ranking (Q2251442)
- Effective deadlock resolution with self-interested partially-rational agents (Q2254627)
- A multi-objective Monte Carlo tree search for forest harvest scheduling (Q2286904)
- Adaptive policies for perimeter surveillance problems (Q2286935)
- A survey of network interdiction models and algorithms (Q2294622)