When to stop learning? Bounding the stopping time in the PAC model (Q2740050)
From MaRDI portal
| This is the item page for this Wikibase entity, intended for internal use and editing purposes. Please use this page instead for the normal view: When to stop learning? Bounding the stopping time in the PAC model |
scientific article; zbMATH DE number 1646470
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | When to stop learning? Bounding the stopping time in the PAC model |
scientific article; zbMATH DE number 1646470 |
Statements
16 September 2001
0 references
learning process
0 references
optimal stopping time
0 references
PAC model
0 references
forecast
0 references
expected reward
0 references
When to stop learning? Bounding the stopping time in the PAC model (English)
0 references
The author deals with the upper and the lower bounds of the stopping time in a PAC learning model. Let the learner be to determine a function \(f:X\rightarrow Y\), where \(f\) belongs to a class \(F=F(X,Y),\;Y=\{0,1\}\). The information arrives sequentially during learning in the form \(f(x_{t}), t=0,1,2,\ldots\) The points \(x_{t}\in X\) appear randomly and independently according to unknown to the learner's probability measure \(\mu\) on \(X\). At the time \(s\) the learner forms a hypothesis \(h_{x_0,\ldots,x_{s-1}}\in F\), which is consistent with \(f\) on the past data \(h_{x_0,\ldots,x_{s-1}}(x_{t})=f(x_{t}), t<s\). Suppose a correct guess \(h_{x_0,\ldots,x_{s-1}}(x_{s})=f(x_{s})\) at time \(s\) is rewarded while a wrong one \(h_{x_0,\ldots,x_{s-1}}(x_{s})\neq f(x_{s})\) is penalized with \(a\) and \(b\) monetary units, respectively. The expected reward \(r_{h,f,\mu}(s)\) from the recall at the time \(s\) is \(r_{h,f,\mu}(s)=a-(a+b)e_{h,f,\mu}(s)\), where \(e_{h,f,\mu}(s)\) is the expectation of a wrong guess. Assuming a fixed discounting factor \(0<\lambda<1\), the expected value of the combined learning and recalling process has the form \(-\sum_{0\leq j<s}\lambda^{j}+r(s) \sum_{s\leq j}\lambda^{j}=[(r(s)+1)\lambda^{s}-1]/(1-\lambda)\). The stopping time \(s\) is optimal if it maximizes the expected value of the combined learning and recalling process. The authors present the bounds for the optimal stopping time \(s\) for the ``least lucky'' learner with \(e(s)=\sup_{h,f,\mu}e_{h,f,\mu}(s)\).
0 references