Maximin Action Identification: A New Bandit Framework for Games





1 IMT - Institut de Mathématiques de Toulouse UMR5219
2 CNRS - Centre National de la Recherche Scientifique
3 CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
4 SEQUEL - Sequential Learning Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
5 CWI - Centrum Wiskunde & Informatica

Abstract: We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.
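The abstract describes Maximin-Racing only at a high level. Below is a minimal sketch, not the authors' algorithm, of a successive-elimination strategy for the same problem: find the action i maximizing min_j mu[i][j], where each sample of the pair (i, j) returns a random outcome. It assumes Bernoulli outcomes, a hypothetical mean matrix `means`, and a Hoeffding radius with a crude union bound chosen purely for illustration; the paper's exact elimination rules and thresholds may differ.

```python
import math
import random


def maximin_racing(means, delta=0.05, max_rounds=100_000, seed=0):
    """Illustrative successive-elimination sketch for maximin action identification.

    means[i][j] is a hypothetical Bernoulli mean for the outcome when the player
    picks action i and the opponent picks response j. Returns the index of the
    action believed to maximize min_j means[i][j].
    """
    rng = random.Random(seed)
    K, L = len(means), len(means[0])
    active_actions = set(range(K))
    # For each action, the responses that may still be its worst case.
    active_pairs = {i: set(range(L)) for i in range(K)}
    sums = [[0.0] * L for _ in range(K)]
    n = 0  # every still-active pair has been sampled exactly n times
    while len(active_actions) > 1 and n < max_rounds:
        n += 1
        for i in active_actions:
            for j in active_pairs[i]:
                sums[i][j] += 1.0 if rng.random() < means[i][j] else 0.0
        # Assumed Hoeffding radius; the log term is a union bound over pairs and rounds.
        radius = math.sqrt(math.log(4 * K * L * n * n / delta) / (2 * n))
        mu = {(i, j): sums[i][j] / n for i in active_actions for j in active_pairs[i]}
        for i in active_actions:
            row_min = min(mu[i, j] for j in active_pairs[i])
            # Drop responses that are confidently not the minimizer of row i.
            active_pairs[i] = {
                j for j in active_pairs[i] if mu[i, j] - radius <= row_min + radius
            }
        mins = {i: min(mu[i, j] for j in active_pairs[i]) for i in active_actions}
        best_lower = max(mins[i] - radius for i in active_actions)
        # Eliminate actions whose worst case is confidently below the best worst case.
        active_actions = {i for i in active_actions if mins[i] + radius >= best_lower}
    if len(active_actions) == 1:
        return next(iter(active_actions))
    # Budget exhausted: fall back to the empirical maximin among survivors
    # (all active pairs share the same sample count, so raw sums compare fairly).
    return max(active_actions, key=lambda i: min(sums[i][j] for j in active_pairs[i]))
```

For example, with `means = [[0.9, 0.3], [0.6, 0.7], [0.2, 0.8]]` the row minima are 0.3, 0.6 and 0.2, so the maximin action is 1. Sampling all active pairs in lockstep keeps every confidence radius identical, which makes both elimination tests simple comparisons.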

Keywords: racing, LUCB, multi-armed bandit problems, games, best-arm identification





Authors: Aurélien Garivier, Emilie Kaufmann, Wouter Koolen

Source: https://hal.archives-ouvertes.fr/






