A Lightweight Continuous Jobs Mechanism for MapReduce Frameworks


1 DOLPHIN - Parallel Cooperative Multi-criteria Optimization, LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
2 OASIS - Active objects, semantics, Internet and security, CRISAM - Inria Sophia Antipolis - Méditerranée, COMRED - COMmunications, Réseaux, systèmes Embarqués et Distribués

Abstract: MapReduce is a programming model which allows the processing of vast amounts of data in parallel, on a large number of machines. It is particularly well suited to static or slowly changing sets of data, since the execution time of a job is usually high. However, in practice, data centers collect data at fast rates, which makes it very difficult to maintain up-to-date results. To address this challenge, we propose in this paper a generic mechanism for dealing with dynamic data in MapReduce frameworks. Long-standing MapReduce jobs, called continuous jobs, are automatically re-executed to process new incoming data at a minimum cost. We present a simple and clean API which integrates nicely with the standard MapReduce model. Furthermore, we describe cHadoop, an implementation of our approach based on Hadoop which does not require modifications to the source code of the original framework. Thus, cHadoop can quickly be ported to any new version of Hadoop. We evaluate our proposal with two standard MapReduce applications, WordCount and WordCount-N-Count, and one real-world application, RDF Query, on real datasets. Our evaluations on clusters ranging from 5 to 40 nodes demonstrate the benefit of our approach in terms of execution time and ease of use.
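The abstract does not reproduce the actual cHadoop API, so the following is only a hypothetical sketch of the continuous-job idea it describes: instead of re-running a job over the full dataset, only newly arrived data is processed and the result is merged with the previous output. All function names here are illustrative, not taken from the paper.

```python
from collections import Counter

def map_words(lines):
    """Map phase: emit (word, 1) pairs for each input line."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_counts(pairs):
    """Reduce phase: sum the counts for each word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

def continuous_wordcount(previous, new_lines):
    """Hypothetical continuous job: run MapReduce over the new data
    only, then merge the partial result with the previous output."""
    delta = reduce_counts(map_words(new_lines))
    return previous + delta  # Counter addition merges the two results

# First execution over the initial dataset.
counts = continuous_wordcount(Counter(), ["a b a"])
# Later, new data arrives; only the new lines are processed.
counts = continuous_wordcount(counts, ["b c"])
```

The key property this models is that the cost of each re-execution depends on the size of the new data, not on the size of the accumulated dataset; how cHadoop actually tracks and merges partial results on top of Hadoop is described in the paper itself.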

Authors: Trong-Tuan Vu, Fabrice Huet

Source: https://hal.archives-ouvertes.fr/

