An Optimized Data Structure for High Throughput 3D Proteomics Data: mzRTree - Computer Science > Computational Engineering, Finance, and Science





Abstract: As an emerging field, MS-based proteomics still requires software tools for efficiently storing and accessing experimental data. In this work, we focus on the management of LC-MS data, which are typically made available in standard XML-based portable formats. The structures that are currently employed to manage these data can be highly inefficient, especially when dealing with high-throughput profile data. LC-MS datasets are usually accessed through 2D range queries. Optimizing this type of operation could dramatically reduce the complexity of data analysis. We propose a novel data structure for LC-MS datasets, called mzRTree, which embodies a scalable index based on the R-tree data structure. mzRTree can be efficiently created from the XML-based data formats and it is suitable for handling very large datasets. We experimentally show that, on all range queries, mzRTree outperforms other known structures used for LC-MS data, even on those queries these structures are optimized for. Besides, mzRTree is also more space efficient. As a result, mzRTree reduces data analysis computational costs for very large profile datasets.
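
To make the notion of a 2D range query concrete, the following is a minimal Python sketch, not the mzRTree implementation. It assumes LC-MS data points are given as (retention time, m/z, intensity) tuples and groups them into fixed-size blocks, each with a bounding box, so that a query can skip blocks that do not overlap the requested rectangle. This bounding-box pruning is the basic idea behind R-tree style indexes; the names `Block`, `build_blocks`, and `range_query` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed point layout: (retention_time, mz, intensity)
Point = Tuple[float, float, float]


@dataclass
class Block:
    """A group of points plus its bounding box in the (rt, m/z) plane."""
    rt_min: float
    rt_max: float
    mz_min: float
    mz_max: float
    points: List[Point]


def build_blocks(points: List[Point], block_size: int = 4) -> List[Block]:
    """Sort points by (rt, m/z) and pack them into fixed-size blocks."""
    ordered = sorted(points)
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        blocks.append(Block(
            rt_min=min(p[0] for p in chunk),
            rt_max=max(p[0] for p in chunk),
            mz_min=min(p[1] for p in chunk),
            mz_max=max(p[1] for p in chunk),
            points=chunk,
        ))
    return blocks


def range_query(blocks: List[Block], rt_lo: float, rt_hi: float,
                mz_lo: float, mz_hi: float) -> List[Point]:
    """Return all points inside the closed rectangle [rt_lo, rt_hi] x [mz_lo, mz_hi]."""
    hits = []
    for b in blocks:
        # Skip blocks whose bounding box does not overlap the query rectangle.
        if (b.rt_max < rt_lo or b.rt_min > rt_hi
                or b.mz_max < mz_lo or b.mz_min > mz_hi):
            continue
        hits.extend(p for p in b.points
                    if rt_lo <= p[0] <= rt_hi and mz_lo <= p[1] <= mz_hi)
    return hits
```

A real R-tree arranges such bounding boxes hierarchically so that whole subtrees can be pruned at once; this flat sketch only illustrates the pruning test at a single level.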



Authors: Sara Nasso 1, Francesco Silvestri 1, Francesco Tisiot 1, Barbara Di Camillo 1, Andrea Pietracaprina 1, Gianna Maria Toffolo 1 1 D

Source: https://arxiv.org/






