On a sequential two-action decision model with unbounded reward functions
Publication:3830831
DOI: 10.1080/02331938908843421; zbMath: 0675.90090; OpenAlex: W2049913415; MaRDI QID: Q3830831
H. Benzing, Radu Theodorescu, Dieter Kalin
Publication date: 1989
Published in: Optimization
Full work available at URL: https://doi.org/10.1080/02331938908843421
stopping rule; monotonicity properties; two-armed bandit; stay-on-a-winner rule; Bernoulli bandit models; maximum expected total discounted reward; one-step-look-ahead policies; sampling without recall; sequential Markov decision
Dynamic programming (90C39); Markov and semi-Markov decision processes (90C40); Sequential statistical design (62L05); Optimal stopping in statistics (62L15)
Cites Work