* Corresponding author
1 RUNTIME - Efficient runtime systems for parallel architectures (Inria Bordeaux - Sud-Ouest, UB - Université de Bordeaux, CNRS - Centre National de la Recherche Scientifique : UMR5800)
2 TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing (LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest)
3 LaBRI - Laboratoire Bordelais de Recherche en Informatique

Abstract: Recent cluster architectures include dozens of cores per node, with all cores sharing the network resources. To program such architectures, hybrid models mixing MPI+threads, and in particular MPI+OpenMP, are gaining popularity. This imposes new requirements on communication libraries, such as the need for the MPI_THREAD_MULTIPLE level of multi-threading support. Moreover, the high number of cores brings new opportunities to parallelize communication libraries, so as to obtain proper background progression of communication and communication-computation overlap. In this paper, we present pioman, a generic framework to be used by MPI implementations, which brings seamless asynchronous progression of communication by opportunistically using available cores. It uses system threads and is thus composable with any runtime system used for multithreading. Through various benchmarks, we demonstrate that our pioman-based MPI implementation exhibits very good properties regarding overlap, progression, and multithreading, and outperforms state-of-the-art MPI implementations.

Keywords: MPI, pioman, NewMadeleine

