When to stop learning? Bounding the stopping time in the PAC model (Q2740050)

The author deals with the upper and the lower bounds of the stopping time in a PAC learning model. Let the learner be to determine a function \(f:X\rightarrow Y\), where \(f\) belongs to a class \(F=F(X,Y),\;Y=\{0,1\}\). The information arrives sequentially during learning in the form \(f(x_{t}), t=0,1,2,\ldots\) The points \(x_{t}\in X\) appear randomly and independently according to unknown to the learner's probability measure \(\mu\) on \(X\). At the time \(s\) the learner forms a hypothesis \(h_{x_0,\ldots,x_{s-1}}\in F\), which is consistent with \(f\) on the past data \(h_{x_0,\ldots,x_{s-1}}(x_{t})=f(x_{t}), t<s\). Suppose a correct guess \(h_{x_0,\ldots,x_{s-1}}(x_{s})=f(x_{s})\) at time \(s\) is rewarded while a wrong one \(h_{x_0,\ldots,x_{s-1}}(x_{s})\neq f(x_{s})\) is penalized with \(a\) and \(b\) monetary units, respectively. The expected reward \(r_{h,f,\mu}(s)\) from the recall at the time \(s\) is \(r_{h,f,\mu}(s)=a-(a+b)e_{h,f,\mu}(s)\), where \(e_{h,f,\mu}(s)\) is the expectation of a wrong guess. Assuming a fixed discounting factor \(0<\lambda<1\), the expected value of the combined learning and recalling process has the form \(-\sum_{0\leq j<s}\lambda^{j}+r(s) \sum_{s\leq j}\lambda^{j}=[(r(s)+1)\lambda^{s}-1]/(1-\lambda)\). The stopping time \(s\) is optimal if it maximizes the expected value of the combined learning and recalling process. The authors present the bounds for the optimal stopping time \(s\) for the ``least lucky'' learner with \(e(s)=\sup_{h,f,\mu}e_{h,f,\mu}(s)\).

0 references

reviewed by

Aleksandr D. Borisenko

0 references

Identifiers

zbMATH Open document ID

0971.91043

0 references

Mathematics Subject Classification ID

0 references

0 references

0 references

Sitelinks

Mathematics(1 entry)

mardi Publication:2740050