Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU - Computer Science > Hardware Architecture





Abstract: Graphics processing units (GPUs) are gaining widespread use in computational chemistry and other scientific simulation contexts because of their huge performance advantages relative to conventional CPUs. However, the reliability of GPUs in error-intolerant applications is largely unproven. In particular, a lack of error checking and correcting (ECC) capability in the memory subsystems of graphics cards has been cited as a hindrance to the acceptance of GPUs as high-performance coprocessors, but the impact of this design has not been previously quantified.

In this article we present MemtestG80, our software for assessing memory error rates on NVIDIA G80 and GT200-architecture-based graphics cards. Furthermore, we present the results of a large-scale assessment of GPU error rate, conducted by running MemtestG80 on over 20,000 hosts on the Folding@home distributed computing network. Our control experiments on consumer-grade and dedicated-GPGPU hardware in a controlled environment found no errors. However, our survey over cards on Folding@home finds that, in their installed environments, two-thirds of tested GPUs exhibit a detectable, pattern-sensitive rate of memory soft errors. We demonstrate that these errors persist after controlling for overclocking and environmental proxies for temperature, but depend strongly on board architecture.
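To make "pattern-sensitive" concrete, here is a minimal host-side Python sketch of the generic write-pattern/read-back idea used by memory testers. It is not MemtestG80's implementation (which runs on the GPU), and the names `SimulatedMemory`, `count_bit_errors`, and `run_patterns` are hypothetical. It assumes 32-bit words and a deliberately simple, persistent fault model: a hypothetical cell that cannot hold a 1. Real soft errors are transient, which is one reason testers typically repeat patterns over many iterations. Even so, the toy model shows how a single fault is visible under some test patterns and invisible under others.

```python
from typing import Dict, Iterable, List, Optional

WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit memory words


class SimulatedMemory:
    """Toy word-addressed memory (hypothetical fault model).

    `stuck_at_zero` maps an address to a bit mask of cells that
    cannot store a 1; writes to those bits silently read back as 0.
    """

    def __init__(self, n_words: int, stuck_at_zero: Optional[Dict[int, int]] = None):
        self.words: List[int] = [0] * n_words
        self.stuck_at_zero: Dict[int, int] = stuck_at_zero or {}

    def write(self, addr: int, value: int) -> None:
        # Clear any bits that the faulty cells cannot hold.
        self.words[addr] = value & WORD_MASK & ~self.stuck_at_zero.get(addr, 0)

    def read(self, addr: int) -> int:
        return self.words[addr]


def count_bit_errors(mem: SimulatedMemory, pattern: int) -> int:
    """Fill every word with `pattern`, read it all back, count flipped bits."""
    pattern &= WORD_MASK
    n = len(mem.words)
    for addr in range(n):
        mem.write(addr, pattern)
    errors = 0
    for addr in range(n):
        # XOR exposes differing bits; popcount tallies them.
        errors += bin(mem.read(addr) ^ pattern).count("1")
    return errors


def run_patterns(mem: SimulatedMemory, patterns: Iterable[int]) -> Dict[int, int]:
    """Run one write/read-back pass per pattern; map pattern -> bit errors."""
    return {p: count_bit_errors(mem, p) for p in patterns}
```

For example, with a single faulty cell at bit 3 of word 7, `run_patterns(SimulatedMemory(1024, {7: 0b1000}), [0x00000000, 0xFFFFFFFF, 0xAAAAAAAA, 0x55555555])` reports one error for `0xFFFFFFFF` and `0xAAAAAAAA` (both set bit 3) and none for `0x00000000` and `0x55555555`. A tester that used only the latter two patterns would miss the fault entirely.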



Authors: Imran S. Haque, Vijay S. Pande

Source: https://arxiv.org/






