Interpolation-restart strategies for resilient eigensolvers

1 HiePACS - High-End Parallel Algorithms for Challenging Numerical Simulations, LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
2 GAUS - Département de génie mécanique - Groupe d'Acoustique de l'Université de Sherbrooke

Abstract: The solution of large eigenproblems is involved in many scientific and engineering applications when, for instance, stability analysis is a concern. For large simulations in material physics or thermo-acoustics, the calculation can last for many hours on large parallel platforms. On future large-scale systems, the time interval between two consecutive faults is forecast to decrease, so that many faults could occur during the solution of large eigenproblems. Consequently, it becomes critical to design parallel eigensolvers that can survive faults. In that framework, we mainly investigate the relevance of approaches relying on numerical techniques that might be combined with more classical techniques for real large-scale parallel implementations. Because we focus on numerical remedies, we consider neither parallel implementations nor parallel experiments, but only numerical experiments. We assume that a separate mechanism ensures fault detection and that a system layer provides support for restoring the environment (processes, ...) to a running state. Once the system is in a running state after a fault, our main objective is to provide robust resilient schemes so that the eigensolver may keep converging through the fault without restarting the calculation from scratch. For this purpose, we extend the interpolation-restart (IR) strategies introduced in a previous work for linear systems. For a given numerical scheme, the IR strategies consist in extracting relevant spectral information from the data still available after a fault. After data extraction, a well-selected part of the missing data is regenerated through interpolation strategies to constitute meaningful input to restart the numerical algorithm.
A main feature of this numerical remedy is that it does not require extra resources, i.e., computational units or computing time, when no fault occurs. In this paper, we revisit a few state-of-the-art methods for solving large sparse eigenvalue problems, namely the Arnoldi methods, subspace iteration methods and the Jacobi-Davidson method, in the light of our IR strategies. For each considered eigensolver, we adapt the IR strategies to regenerate as much spectral information as possible. Through intensive numerical experiments, we illustrate the qualitative behavior of the resulting schemes when the number of faults and the amount of lost data are varied.
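
The abstract does not spell out the interpolation step, so the following is only a minimal sketch of one plausible variant, not the authors' scheme. It assumes that a fault wipes a block I of entries of an approximate eigenvector u, that the approximate eigenvalue lam survives (for instance because such small data is replicated), and that the lost entries are regenerated from block row I of A u = lam u, i.e. by solving (A_II - lam I) u_I = -A_IJ u_J with J the surviving indices. The function names, the test matrix and the fault pattern are all hypothetical.

```python
import numpy as np

def interpolate_lost_entries(A, u, lam, lost):
    """Regenerate u[lost] from the surviving entries of u.

    Assumption (not taken from the abstract): the lost block is recovered
    from block row I of A u = lam u, i.e. (A_II - lam*I) u_I = -A_IJ u_J.
    """
    n = A.shape[0]
    lost = np.asarray(lost)
    kept = np.setdiff1d(np.arange(n), lost)   # surviving indices J
    A_II = A[np.ix_(lost, lost)]              # diagonal block of lost rows
    A_IJ = A[np.ix_(lost, kept)]              # coupling to surviving data
    rhs = -A_IJ @ u[kept]
    u_new = u.copy()
    u_new[lost] = np.linalg.solve(A_II - lam * np.eye(len(lost)), rhs)
    return u_new

def demo(n=50):
    # Hypothetical test problem: 1D Laplacian, largest eigenpair.
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    w, V = np.linalg.eigh(A)
    lam, u = w[-1], V[:, -1]

    # Simulated fault: entries 10..19 of the eigenvector are lost.
    lost = np.arange(10, 20)
    u_fault = u.copy()
    u_fault[lost] = 0.0

    u_rec = interpolate_lost_entries(A, u_fault, lam, lost)
    # Error after interpolation, and error if the lost block is left at zero.
    return np.linalg.norm(u_rec - u), np.linalg.norm(u_fault - u)

if __name__ == "__main__":
    err_rec, err_fault = demo()
    print(f"error after interpolation: {err_rec:.2e}")
    print(f"error without recovery:    {err_fault:.2e}")
```

In this toy setting the eigenpair is exact, so the interpolated block matches the lost one up to rounding. With an eigenpair that is only partially converged, the regenerated entries would be an approximation intended as a restart vector, and the local solve requires A_II - lam I to be nonsingular.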

Keywords: resilience, fault tolerance, eigenvalue problems, linear algebra, HPC, numerical methods, Arnoldi, IRAM, subspace iteration, Jacobi-Davidson

Authors: Emmanuel Agullo, Luc Giraud, Pablo Salas, Mawussi Zounon


