Deprecated: $wgMWOAuthSharedUserIDs=false is deprecated, set $wgMWOAuthSharedUserIDs=true, $wgMWOAuthSharedUserSource='local' instead [Called from MediaWiki\HookContainer\HookContainer::run in /var/www/html/w/includes/HookContainer/HookContainer.php at line 135] in /var/www/html/w/includes/Debug/MWDebug.php on line 372
When to stop learning? Bounding the stopping time in the PAC model - MaRDI portal

When to stop learning? Bounding the stopping time in the PAC model (Q2740050)

From MaRDI portal





scientific article; zbMATH DE number 1646470
Language Label Description Also known as
English
When to stop learning? Bounding the stopping time in the PAC model
scientific article; zbMATH DE number 1646470

    Statements

    0 references
    0 references
    16 September 2001
    0 references
    learning process
    0 references
    optimal stopping time
    0 references
    PAC model
    0 references
    forecast
    0 references
    expected reward
    0 references
    When to stop learning? Bounding the stopping time in the PAC model (English)
    0 references
    The author deals with the upper and the lower bounds of the stopping time in a PAC learning model. Let the learner be to determine a function \(f:X\rightarrow Y\), where \(f\) belongs to a class \(F=F(X,Y),\;Y=\{0,1\}\). The information arrives sequentially during learning in the form \(f(x_{t}), t=0,1,2,\ldots\) The points \(x_{t}\in X\) appear randomly and independently according to unknown to the learner's probability measure \(\mu\) on \(X\). At the time \(s\) the learner forms a hypothesis \(h_{x_0,\ldots,x_{s-1}}\in F\), which is consistent with \(f\) on the past data \(h_{x_0,\ldots,x_{s-1}}(x_{t})=f(x_{t}), t<s\). Suppose a correct guess \(h_{x_0,\ldots,x_{s-1}}(x_{s})=f(x_{s})\) at time \(s\) is rewarded while a wrong one \(h_{x_0,\ldots,x_{s-1}}(x_{s})\neq f(x_{s})\) is penalized with \(a\) and \(b\) monetary units, respectively. The expected reward \(r_{h,f,\mu}(s)\) from the recall at the time \(s\) is \(r_{h,f,\mu}(s)=a-(a+b)e_{h,f,\mu}(s)\), where \(e_{h,f,\mu}(s)\) is the expectation of a wrong guess. Assuming a fixed discounting factor \(0<\lambda<1\), the expected value of the combined learning and recalling process has the form \(-\sum_{0\leq j<s}\lambda^{j}+r(s) \sum_{s\leq j}\lambda^{j}=[(r(s)+1)\lambda^{s}-1]/(1-\lambda)\). The stopping time \(s\) is optimal if it maximizes the expected value of the combined learning and recalling process. The authors present the bounds for the optimal stopping time \(s\) for the ``least lucky'' learner with \(e(s)=\sup_{h,f,\mu}e_{h,f,\mu}(s)\).
    0 references

    Identifiers