On the tactical and strategic behaviour of MCTS when biasing random simulations


1 LISIC - Laboratoire d'Informatique Signal et Image de la Côte d'Opale

Abstract: Over the last few years, many new algorithms have been proposed to solve combinatorial problems. In this field, Monte-Carlo Tree Search (MCTS) is a generic method that performs remarkably well in several applications; for instance, it has been used with notable results in the game of Go. To find the most promising decision, MCTS builds a search tree in which new nodes are selected by sampling the search space randomly, i.e., by Monte-Carlo simulations. However, efficient Monte-Carlo policies are generally difficult to learn. Even if an improved Monte-Carlo policy performs adequately in some games, it can become useless or even harmful in others, depending on how the algorithm takes the tactical and strategic elements of the game into account. In this article, we address this problem by studying when and why a learned Monte-Carlo policy works. To this end, we use (1) two known Monte-Carlo policy improvements (PoolRave and Last-Good-Reply) and (2) two connection games (Hex and Havannah). We aim to understand how the benefit is related (a) to the number of random simulations and (b) to the game rules and, within them, the tactical and strategic elements of the game. Our results indicate that improved Monte-Carlo policies, such as PoolRave or Last-Good-Reply, work better for games with a strong tactical element and small numbers of random simulations, whereas more general policies seem better suited to games with a strong strategic element and higher numbers of random simulations.
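The abstract describes the core MCTS loop: selection in the tree, expansion of a new node, a random simulation, and backpropagation of the result. The following is a minimal UCT-style sketch of that loop. Since the paper's games (Hex, Havannah) would make the example long, it uses a toy single-pile Nim game instead (remove 1-3 stones; the player who takes the last stone wins); all names and constants below are illustrative assumptions, not the authors' implementation.

```python
import math
import random

MOVES = (1, 2, 3)  # a move removes 1, 2 or 3 stones from the pile

def legal_moves(pile):
    return [m for m in MOVES if m <= pile]

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile, self.parent, self.move = pile, parent, move
        self.children = []
        self.untried = legal_moves(pile)
        self.visits = 0
        self.wins = 0.0  # wins for the player who moved INTO this node

    def ucb_child(self, c=1.4):
        # UCT selection: exploit win rate, explore rarely-visited children.
        return max(self.children,
                   key=lambda n: n.wins / n.visits
                   + c * math.sqrt(math.log(self.visits) / n.visits))

def rollout(pile):
    """Uniform-random playout; returns 1 if the player to move wins.
    This is the Monte-Carlo policy that PoolRave or Last-Good-Reply
    would bias with previously successful moves."""
    turn = 0
    while True:
        pile -= random.choice(legal_moves(pile))
        if pile == 0:
            return 1 if turn == 0 else 0
        turn ^= 1

def mcts(pile, iterations=5000):
    root = Node(pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = node.ucb_child()
        # 2. Expansion: add one untried child.
        if node.untried:
            m = node.untried.pop()
            child = Node(node.pile - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout (0 = terminal loss for the mover).
        result = 0 if node.pile == 0 else rollout(node.pile)
        # 4. Backpropagation: flip the result at each ply up the tree.
        while node is not None:
            node.visits += 1
            node.wins += 1 - result  # credit the player who moved here
            result = 1 - result
            node = node.parent
    # Return the most-visited move, the usual final decision rule.
    return max(root.children, key=lambda n: n.visits).move
```

In this toy game, leaving the opponent a multiple of 4 stones is winning, so from a pile of 5 the search converges on taking 1 stone. A PoolRave-like improvement would replace `random.choice` in `rollout` with a draw from a pool of moves that scored well in earlier simulations, which is exactly the kind of bias whose tactical/strategic trade-off the paper studies.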

Keywords: connection games; Monte-Carlo tree search; reinforcement learning

Authors: Fabien Teytaud, Julien Dehos

Source: https://hal.archives-ouvertes.fr/
