Expanding the boundaries of local similarity analysis

BMC Genomics, 14:S3

First Online: 21 January 2013

DOI: 10.1186/1471-2164-14-S1-S3

Cite this article as: Durno, W.E., Hanson, N.W., Konwar, K.M. et al. BMC Genomics (2013) 14(Suppl 1): S3. doi:10.1186/1471-2164-14-S1-S3


Background

Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies local and lagged relationships, but determining its significance through a p-value has been algorithmically cumbersome: it requires an intensive permutation test that shuffles rows and columns and recalculates the statistic after every shuffle. Furthermore, this p-value is calculated under the assumption of normality, a statistical luxury that most real-world datasets do not afford.
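To make the cost of the permutation approach concrete, the following is a minimal sketch of a local similarity score and a permutation p-value for a single pair of series. It is an illustration under stated assumptions, not the authors' implementation: the function names (`zscore`, `lsa_score`, `permutation_p_value`), the `max_lag` parameter, the use of plain z-scoring for normalization, and the choice to shuffle only one series are all simplifications introduced here.

```python
import random
import statistics


def zscore(series):
    """Standardize a series (assumption: plain z-scores, chosen for simplicity)."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return [0.0 for _ in series]
    return [(v - mean) / sd for v in series]


def lsa_score(x, y, max_lag=1):
    """Signed local similarity score of two equal-length series.

    For every lag in [-max_lag, max_lag], find the contiguous stretch whose
    summed products x[i] * y[i + lag] is largest (positive association) or
    most negative (negative association), then scale by the series length.
    """
    n = len(x)
    x, y = zscore(x), zscore(y)
    best_pos = best_neg = 0.0
    for lag in range(-max_lag, max_lag + 1):
        pos = neg = 0.0
        for i in range(n):
            j = i + lag
            if j < 0 or j >= n:
                continue
            prod = x[i] * y[j]
            # Running sums are reset to zero whenever they would go negative.
            pos = max(0.0, pos + prod)
            neg = max(0.0, neg - prod)
            best_pos = max(best_pos, pos)
            best_neg = max(best_neg, neg)
    if best_pos >= best_neg:
        return best_pos / n
    return -best_neg / n


def permutation_p_value(x, y, permutations=200, max_lag=1, seed=0):
    """Estimate a p-value by recomputing the score on shuffled copies of y."""
    rng = random.Random(seed)
    observed = abs(lsa_score(x, y, max_lag))
    shuffled = list(y)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if abs(lsa_score(x, shuffled, max_lag)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (permutations + 1)
```

Each call to `permutation_p_value` evaluates the statistic once per permutation, and that repetition has to be paid for every pair of series being compared; this is the repeated work that makes the permutation test expensive on large datasets.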

Results

To improve the performance of LSA on big datasets, an asymptotic upper bound on the p-value calculation was derived without the assumption of normality. This change in the bound calculation markedly reduced computational cost from O(pmn) to O(mn), where p is the number of permutations in a permutation test, m is the number of time series, and n is the length of each time series. The bounding process is implemented as a computationally efficient software package, FAST LSA, written in C and optimized for threading on multi-core computers, which further reduces its practical computation time. We computationally compare our approach to previous implementations of LSA, demonstrate broad applicability by analyzing time series data from public health, microbial ecology, and social media, and visualize the resulting networks using the Cytoscape software.

Conclusions

The FAST LSA software package expands the boundaries of LSA, allowing analysis of datasets with millions of co-varying time series. Mapping metadata onto force-directed graphs derived from FAST LSA allows investigators to view correlated cliques and explore previously unrecognized network relationships. The software is freely available for download at: http://www.cmde.science.ubc.ca/hallam/fastLSA/

List of abbreviations

LSA: Local Similarity Analysis

PCC: Pearson's Correlation Coefficient

PCA: Principal Component Analysis

MDS: Multidimensional Scaling

DFA: Discriminant Fraction Analysis

MPH: Moving Pictures of the Human Microbiome

CDC: Centre of Disease Control

Authors: W Evan Durno, Niels W Hanson, Kishori M Konwar, Steven J Hallam

Source: https://link.springer.com/
