# Relative-Error CUR Matrix Decompositions

Computer Science > Data Structures and Algorithms

Abstract: Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of components. Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an $m \times n$ matrix $A$ and a rank parameter $k$. In our first algorithm, $C$ is chosen, and we let $A' = CC^+A$, where $C^+$ is the Moore-Penrose generalized inverse of $C$. In our second algorithm $C$, $U$, $R$ are chosen, and we let $A' = CUR$. $C$ and $R$ are matrices that consist of actual columns and rows, respectively, of $A$, and $U$ is a generalized inverse of their intersection. For each algorithm, we show that with probability at least $1-\delta$: $$\|A - A'\|_F \leq (1+\epsilon)\,\|A - A_k\|_F,$$ where $A_k$ is the best rank-$k$ approximation provided by truncating the singular value decomposition (SVD) of $A$. The number of columns of $C$ and rows of $R$ is a low-degree polynomial in $k$, $1/\epsilon$, and $\log(1/\delta)$. Our two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist. Both of our algorithms are simple, they take time of the order needed to approximately compute the top $k$ singular vectors of $A$, and they use a novel, intuitive sampling method called subspace sampling.
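To make the $A' = CUR$ construction concrete, below is a minimal numpy sketch of a CUR decomposition in the spirit of the abstract: columns and rows are sampled with probabilities proportional to leverage scores computed from the top-$k$ singular vectors (a common way to instantiate "subspace sampling"), and $U$ is taken as a generalized inverse of the sampled, rescaled intersection. The function name, the sample-size parameters `c` and `r`, and the exact rescaling are illustrative assumptions, not the authors' precise algorithm, which carries the stated probabilistic guarantees only for specific sample sizes.

```python
import numpy as np

def cur_decomposition(A, k, c, r, rng=None):
    """Illustrative CUR sketch via leverage-score ("subspace") sampling.

    A    : (m, n) input matrix
    k    : target rank
    c, r : number of columns / rows to sample (assumed parameters)
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape

    # Column leverage scores from the top-k right singular vectors of A;
    # these sum to 1 and define the column-sampling distribution.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    col_lev = np.sum(Vt[:k, :] ** 2, axis=0) / k

    cols = rng.choice(n, size=c, replace=True, p=col_lev)
    # Rescale sampled columns so C C^+ A is an unbiased-style estimator.
    C = A[:, cols] / np.sqrt(c * col_lev[cols])

    # Row leverage scores from the top left singular vectors of C.
    Uc, _, _ = np.linalg.svd(C, full_matrices=False)
    kc = min(k, Uc.shape[1])
    row_lev = np.sum(Uc[:, :kc] ** 2, axis=1) / kc

    rows = rng.choice(m, size=r, replace=True, p=row_lev)
    scale = 1.0 / np.sqrt(r * row_lev[rows])
    R = A[rows, :] * scale[:, None]
    W = C[rows, :] * scale[:, None]  # sampled, rescaled intersection of C and R

    U = np.linalg.pinv(W)  # generalized inverse of the intersection
    return C, U, R
```

`C @ U @ R` then gives the rank-restricted approximation $A'$; its Frobenius error can be compared against the truncated-SVD error $\|A - A_k\|_F$ to see the relative-error behavior empirically.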

Authors: Petros Drineas, Michael W. Mahoney, S. Muthukrishnan

Source: https://arxiv.org/