Two-stage index computation for bandits with switching penalties I : switching costsReportar como inadecuado




Two-stage index computation for bandits with switching penalties I : switching costs - Descarga este documento en PDF. Documentación en PDF para descargar gratis. Disponible también para leer online.

Editor: Universidad Carlos III de Madrid. Departamento de Estadística

Issued date: 2007-05

Serie-No.: UC3M Working papers. Statistics and Econometrics07-09

Keywords: Dynamic programming , Markov , Finite state , Bandits , Switching costs , Index policy , Whittle index , Hysteresis , Work-reward analysis , PCL-indexability , Analysis of algorithms

Rights: Atribución-NoComercial-SinDerivadas 3.0 España

Abstract:This paper addresses the multi-armed bandit problem with switching costs. Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a -continuation index- (its Gittins index) and a -switching indThis paper addresses the multi-armed bandit problem with switching costs. Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a -continuation index- (its Gittins index) and a -switching index-. They proposed to jointly compute both as the Gittins index of a bandit having 2n states — when the original bandit has n states — which results in an eight-fold increase in O(n^3) arithmetic operations relative to those to compute the continuation index alone. This paper presents a more efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster in at most n^2+O(n) arithmetic operations. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching costs in its restless reformulation, by deploying work-reward analysis and PCL-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against the benchmark Gittins index policy across a wide range of instances.+-





Autor: Niño-Mora, José

Fuente: http://e-archivo.uc3m.es


Introducción



Universidad Carlos III de Madrid Repositorio institucional e-Archivo http:--e-archivo.uc3m.es Departamento de Estadística DES - Working Papers.
Statistics and Econometrics.
WS 2007-05 Two-stage index computation for bandits with switching penalties I : switching costs Niño-Mora, José http:--hdl.handle.net-10016-794 Descargado de e-Archivo, repositorio institucional de la Universidad Carlos III de Madrid Working Paper 07-41 Statistics and Econometrics Series 09 May 2007 Departamento de Estadística Universidad Carlos III de Madrid Calle Madrid, 126 28903 Getafe (Spain) Fax (34-91) 6249849 Two-Stage Index Computation for Bandits with Switching Penalties I: Switching Costs ∗ José Niño-Mora 1 Abstract This paper addresses the multi-armed bandit problem with switching costs.
Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a ``continuation index (its Gittins index) and a ``switching index.
They proposed to jointly compute both as the Gittins index of a bandit having 2n states — when the original bandit has n states — which results in an eight-fold increase in O ( n 3 ) arithmetic operations relative to those to compute the continuation index alone.
This paper presents a more efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster in at most n 2 O ( n ) arithmetic operations.
The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching costs in its restless reformulation, by deploying work-reward analysis and PCL-indexability methods introduced by the author.
A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against the benchmark Gittins index policy ac...





Documentos relacionados