# The sparsity and bias of the Lasso selection in high-dimensional linear regression

Subject: Mathematics > Statistics Theory

Abstract: Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436-1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the Lasso is consistent even when the number of variables is of greater order than the sample size. Zhao and Yu [J. Machine Learning Research 7 (2006) 2541-2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the Lasso selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the Lasso selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the Lasso in model selection, it is proved that the sum of error squares for the mean response and the $\ell_\alpha$-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.

Authors: Cun-Hui Zhang, Jian Huang

Source: https://arxiv.org/
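The abstract concerns the Lasso's ability to select a sparse model when the number of variables exceeds the sample size. The following NumPy sketch, which is not from the paper, illustrates this setting with a simple coordinate-descent Lasso: in a design with $p > n$, the estimator recovers the few large coefficients while zeroing out most of the rest. All names, dimensions, and the penalty level `lam` are illustrative choices, not values from the paper.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso: minimize (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_norm = (X ** 2).sum(axis=0) / n  # per-coordinate curvature
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding coordinate j, then univariate update.
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / col_norm[j]
    return b

# High-dimensional setting: more variables (p) than observations (n).
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.5, 2.0, 1.5, -1.0]  # a few coefficients bounded away from zero
y = X @ beta + 0.1 * rng.standard_normal(n)

b_hat = lasso_cd(X, y, lam=0.2)
selected = np.flatnonzero(np.abs(b_hat) > 1e-8)
print("selected variables:", selected)
```

In line with the paper's theme, the selected set contains all coefficients of sufficiently large order, and the fitted model remains far sparser than the full variable set, though the soft-thresholding step introduces the shrinkage (threshold) bias that the abstract refers to.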