

Of the values of states based on estimates of the values of successor The book describes various reinforcement algorithms where the target value is based on a previous approximation as bootstrap methods: Supervised learning were much more influential. This was an isolatedįoray into reinforcement learning by Widrow, whose contributions to

“ selective bootstrap adaptation” and described it as “learning with aĬritic” instead of “learning with a teacher.” They analyzed this ruleĪnd showed how it could learn to play blackjack. Learning rule that could learn from success and failure signals Widrow, Gupta, and Maitra (1973) modified the Least-Mean-Square (LMS)Īlgorithm of Widrow and Hoff (1960) to produce a reinforcement by kdgregory) and its use in statistics as discussed by Dirk Eddelbuettel. Bootstrapping has yet another meaning in the context of reinforcement learning that may be useful to know for developers, in addition to its use in software development (most answers here, e.g.
