Dynamic Policy Programming - Computer Science > Learning

Abstract: In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. We prove finite-iteration and asymptotic ℓ∞-norm performance-loss bounds for DPP in the presence of approximation/estimation error. The bounds are expressed in terms of the ℓ∞-norm of the average accumulated error, as opposed to the ℓ∞-norm of the error itself in the case of standard approximate value iteration (AVI) and approximate policy iteration (API). This suggests that DPP can achieve better performance than AVI and API, since it averages out the simulation noise caused by Monte-Carlo sampling throughout the learning process. We examine these theoretical results numerically by comparing the performance of approximate variants of DPP with existing reinforcement learning (RL) methods on different problem domains. Our results show that, in all cases, DPP-based algorithms outperform the other RL methods by a wide margin.
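
The abstract describes DPP's mechanism only informally. As a concrete illustration, here is a minimal NumPy sketch of a DPP-style preference-accumulation update on a known tabular MDP, where action preferences Ψ are updated additively and the policy is a soft-max over them. The Boltzmann soft-max average used here, the toy transition/reward arrays, and all parameter values (`gamma`, `eta`, `iters`) are illustrative assumptions, not details given in this abstract.

```python
import numpy as np

def softmax_avg(psi_s, eta):
    """Boltzmann soft-max average of the action preferences at one state
    (an assumed form of the soft-max operator; not quoted from the paper)."""
    w = np.exp(eta * (psi_s - psi_s.max()))  # shift for numerical stability
    p = w / w.sum()
    return p @ psi_s

def dpp_iteration(P, r, gamma=0.95, eta=5.0, iters=200):
    """DPP-style preference iteration on a known tabular MDP.

    P: transition tensor, shape (S, A, S); r: rewards, shape (S, A).
    Returns the final preferences Psi and the induced soft-max policy.
    """
    S, A, _ = P.shape
    psi = np.zeros((S, A))
    for _ in range(iters):
        m = np.array([softmax_avg(psi[s], eta) for s in range(S)])  # shape (S,)
        # Preferences accumulate rather than being overwritten:
        # Psi <- Psi + r + gamma * (P m) - m
        psi = psi + r + gamma * (P @ m) - m[:, None]
    w = np.exp(eta * (psi - psi.max(axis=1, keepdims=True)))
    policy = w / w.sum(axis=1, keepdims=True)
    return psi, policy

# Toy two-state, two-action MDP (illustrative numbers only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[0.0, 1.0],
              [0.5, 0.0]])
psi, pi = dpp_iteration(P, r)
print(np.round(pi, 3))  # soft-max policy after the preference iteration
```

The averaging effect claimed in the bounds is visible in the update line: each iteration adds the new evaluation into Ψ instead of replacing it, so per-iteration estimation noise would enter the soft-max policy only through its running sum.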



Author: Mohammad Gheshlaghi Azar, Vicenc Gomez, Hilbert J. Kappen

Source: https://arxiv.org/






