
Rollout, Policy Iteration, and Distributed Reinforcement Learning






¥15,631 (tax included)


[In stock]
 

Title: Rollout, Policy Iteration, and Distributed Reinforcement Learning
Author/Editor: Bertsekas, D. P.
Publisher: Athena Scientific
Publication date: August 2020
Binding: Hardcover
Pages: 376 pages
ISBN: 978-1-886529-07-6
Shipping: Ships within 1-2 business days

Description

The purpose of this book is to develop in greater depth some of the methods from the author's recently published textbook Reinforcement Learning and Optimal Control (Athena Scientific, 2019). In particular, we present new research relating to systems involving multiple agents, partitioned architectures, and distributed asynchronous computation. We pay special attention to the contexts of dynamic programming/policy iteration and control theory/model predictive control. We also discuss in some detail the application of the methodology to challenging discrete/combinatorial optimization problems, such as routing, scheduling, assignment, and mixed integer programming, including the use of neural network approximations within these contexts.

The book focuses on the fundamental idea of policy iteration, i.e., start from some policy and successively generate one or more improved policies. If just one improved policy is generated, this is called rollout, which, based on broad and consistent computational experience, appears to be one of the most versatile and reliable of all reinforcement learning methods. In this book, rollout algorithms are developed for both discrete deterministic and stochastic DP problems, along with distributed implementations in both multiagent and multiprocessor settings that aim to take advantage of parallelism.
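To make the rollout idea above concrete, here is a minimal Python sketch for a finite-horizon deterministic problem; it only illustrates the one-step lookahead structure, and the callables step, stage_cost, base_policy, and controls are hypothetical placeholders rather than anything from the book.

# Minimal rollout sketch for a deterministic, finite-horizon problem.
# step, stage_cost, base_policy, and controls are illustrative placeholders.

def heuristic_cost(state, k, horizon, step, stage_cost, base_policy):
    """Cost-to-go of simply following the base policy from stage k onward."""
    total = 0.0
    for t in range(k, horizon):
        u = base_policy(state, t)
        total += stage_cost(state, u, t)
        state = step(state, u, t)
    return total

def rollout_control(state, k, horizon, controls, step, stage_cost, base_policy):
    """One-step lookahead, with the base policy supplying the cost approximation."""
    best_u, best_q = None, float("inf")
    for u in controls(state, k):
        next_state = step(state, u, k)
        q = stage_cost(state, u, k) + heuristic_cost(
            next_state, k + 1, horizon, step, stage_cost, base_policy)
        if q < best_q:
            best_u, best_q = u, q
    return best_u

Applied at every stage as the system evolves, this defines the rollout policy, which under appropriate conditions performs no worse than the base policy (the policy improvement principle discussed in the book).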

Approximate policy iteration is more ambitious than rollout, but it is a strictly off-line method, and it is generally far more computationally intensive. This motivates the use of parallel and distributed computation. One of the purposes of the monograph is to discuss distributed (possibly asynchronous) methods that relate to rollout and policy iteration, both in the context of exact and approximate implementations involving neural networks or other approximation architectures.
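As a rough structural sketch of the approximate policy iteration cycle just described (simulation-based evaluation, training of an approximation architecture, improvement by lookahead), consider the following Python fragment; simulate_costs, fit, and greedy are hypothetical helpers named here only for illustration, not functions from the book or any library.

# Schematic approximate policy iteration loop.
# Assumed (hypothetical) helpers:
#   simulate_costs(policy, states) -> sampled costs-to-go for each state
#   fit(states, targets)           -> a value approximator J_hat (e.g., a neural net)
#   greedy(J_hat)                  -> policy obtained by one-step lookahead on J_hat

def approximate_policy_iteration(initial_policy, sample_states,
                                 simulate_costs, fit, greedy, num_iters=10):
    policy = initial_policy
    for _ in range(num_iters):
        targets = simulate_costs(policy, sample_states)  # policy evaluation by simulation
        J_hat = fit(sample_states, targets)              # train the approximation architecture
        policy = greedy(J_hat)                           # policy improvement via lookahead
    return policy

Each of the three steps (simulation, training, and lookahead minimization) can be parallelized, which is what motivates the distributed and asynchronous implementations the book discusses.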

Special Features

- Presents new research relating to distributed asynchronous computation, partitioned architectures, and multiagent systems, with application to challenging large scale optimization problems, such as combinatorial/discrete optimization, as well as partially observed Markov decision problems.
- Describes variants of rollout and policy iteration for problems with a multiagent structure, which allow a dramatic reduction of the computational requirements for lookahead minimization (see the sketch after this list).
- Establishes a connection of rollout with model predictive control, one of the most prominent control system design methodologies.
- Expands the coverage of some research areas discussed in the author's 2019 textbook Reinforcement Learning and Optimal Control.
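To illustrate the computational saving behind the multiagent variant mentioned above, the following Python sketch performs the lookahead minimization one agent at a time, holding the other agents at their base-policy controls; base_policy and q_value are hypothetical placeholders, not functions from the book.

# Agent-by-agent (multiagent) rollout sketch.
# agent_controls[i] : candidate controls for agent i
# base_policy(state): tuple of base-policy controls, one per agent
# q_value(state, joint_control): rollout Q-factor of a joint control (placeholder)

def multiagent_rollout_control(state, agent_controls, base_policy, q_value):
    joint = list(base_policy(state))           # start from the base-policy controls
    for i, candidates in enumerate(agent_controls):
        best_u, best_q = joint[i], float("inf")
        for u in candidates:
            trial = joint.copy()
            trial[i] = u                        # vary only agent i's component
            q = q_value(state, tuple(trial))
            if q < best_q:
                best_u, best_q = u, q
        joint[i] = best_u                       # fix agent i's improved control
    return tuple(joint)

With m agents and n candidate controls per agent, this requires on the order of m*n Q-factor evaluations, rather than the n**m evaluations needed to minimize over the joint control tuple.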


 

Contents:

1. Dynamic Programming Principles
1.1. Deterministic Dynamic Programming
1.2. Stochastic Dynamic Programming
1.3. Examples, Variations, and Simplifications
1.4. Reinforcement Learning and Optimal Control - Some Terminology
1.5. Notes and Sources

2. Rollout and Policy Improvement
2.1. Approximation in Value and Policy Space
2.2. General Issues of Approximation in Value Space
2.3. Rollout and the Policy Improvement Principle
2.4. Stochastic Rollout and Monte Carlo Tree Search
2.5. Rollout for Infinite-Spaces Problems - Optimization Heuristics
2.6. Notes and Sources

3. Specialized Rollout Algorithms
3.1. Model Predictive Control
3.2. Multiagent Rollout
3.3. Constrained Rollout for Deterministic Optimization
3.4. Constrained Rollout - Combinatorial and Discrete Optimization
3.5. Surrogate Dynamic Programming and Rollout
3.6. Rollout for Minimax Control
3.7. Notes and Sources

4. Learning Values and Policies
4.1. Approximation Architectures
4.2. Neural Networks
4.3. Training of Cost Functions in Approximate DP
4.4. Training of Policies in Approximate DP
4.5. Notes and Sources

5. Infinite Horizon: Distributed and Multiagent Algorithms
5.1. Stochastic Shortest Path and Discounted Problems
5.2. Exact and Approximate Policy Iteration
5.3. Abstract View of Infinite Horizon Problems
5.4. Multiagent Value and Policy Iteration
5.5. Asynchronous Distributed Value Iteration
5.6. Asynchronous Distributed Policy Iteration
5.7. Notes and Sources