Combining policies: the best of human expertise and neurocontrol


1 LRI - Laboratoire de Recherche en Informatique
2 TAO - Machine Learning and Optimisation (LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623)

Abstract: We consider sequential decision making in the case where a generative model and a parametric policy are available. Such a framework is naturally tackled with Direct Policy Search, i.e. parametric optimisation over simulations. We propose a simple method that combines this parametric policy with a more generic neural network, where all parameters are trained simultaneously. As such, our approach doesn't require any computational overhead. We show that the resulting policy significantly outperforms both the domain-specific policies and the neural network on a unit commitment test problem.
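The idea of training an expert policy and a neural network jointly can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not say how the two policies are combined, so the sketch simply adds the network output to the expert output; the one-parameter linear expert, the toy cost (which stands in for the generative model and is not the unit commitment problem), and the (1+1) evolution strategy used as the optimiser are all hypothetical choices.

```python
import math
import random

H = 4  # hidden units in the generic network (arbitrary illustrative choice)

def expert_action(k, x):
    # Hypothetical domain-specific policy: a one-parameter linear rule.
    return k * x

def network_action(w, x):
    # Tiny one-hidden-layer tanh network; w holds 3*H + 1 weights.
    out = w[3 * H]
    for j in range(H):
        out += w[2 * H + j] * math.tanh(w[j] * x + w[H + j])
    return out

def combined_action(params, x):
    # Assumed combination rule: expert output plus a network correction.
    return expert_action(params[0], x) + network_action(params[1:], x)

def simulate_cost(params, states):
    # Stand-in for the generative model: mean squared gap to a target
    # that the expert's linear rule cannot represent exactly.
    total = 0.0
    for x in states:
        target = 2.0 * x + math.sin(3.0 * x)
        total += (combined_action(params, x) - target) ** 2
    return total / len(states)

def direct_policy_search(states, iterations=2000, seed=0):
    # (1+1) evolution strategy over ALL parameters at once
    # (expert and network), scored only through simulated cost.
    rng = random.Random(seed)
    params = [1.0] + [0.0] * (3 * H + 1)  # network starts at zero: pure expert policy
    initial = best = simulate_cost(params, states)
    sigma = 0.3
    for _ in range(iterations):
        candidate = [p + sigma * rng.gauss(0.0, 1.0) for p in params]
        cost = simulate_cost(candidate, states)
        if cost <= best:
            params, best = candidate, cost
            sigma *= 1.5   # widen the search after a success
        else:
            sigma *= 0.9   # narrow it after a failure
    return params, initial, best

if __name__ == "__main__":
    rng = random.Random(1)
    states = [rng.uniform(-1.0, 1.0) for _ in range(50)]
    params, initial, best = direct_policy_search(states)
    print("expert-only cost:", initial)
    print("combined cost after search:", best)
```

Because the network weights start at zero, the search begins from the expert policy itself, and a candidate is accepted only if it does not increase the simulated cost, so in this sketch the combined policy can never end up worse than its starting point on the training states. Each iteration costs one batch of simulations whether the optimiser is tuning the expert alone or the expert and the network together.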

Authors: Vincent Berthier, Adrien Couëtoux, Olivier Teytaud


