Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data


BMC Bioinformatics, 8:144

First Online: 02 May 2007; Received: 11 September 2006; Accepted: 02 May 2007


Background

Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. Selecting the genes that are important for distinguishing the sample classes being compared poses a challenging problem in high-dimensional data analysis. We describe a new procedure for selecting significant genes that uses recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures that use RFE.

Results

We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method that identifies correlated gene clusters, with Support Vector Machines (SVMs), a supervised machine learning classification method that scores and ranks those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Using gene clusters rather than individual genes enhances supervised classification accuracy on the same data, compared with the accuracy obtained when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) is used to remove genes based on their individual discriminant weights.
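The cluster-then-eliminate loop described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the SVM, leave-one-out accuracy stands in for whatever cross-validation scheme the paper uses, genes are z-scored so that Euclidean K-means roughly approximates grouping by correlation, and exactly one cluster is dropped per round. The function names (`zscore`, `kmeans`, `loo_accuracy`, `rce`) and the parameters `n_clusters` and `final_clusters` are hypothetical.

```python
import random
from statistics import mean, pstdev


def zscore(profile):
    """Standardize one gene's expression profile across samples."""
    mu, sd = mean(profile), pstdev(profile)
    return [(v - mu) / sd if sd else 0.0 for v in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier
    (an assumed stand-in for the SVM) restricted to `genes`.
    Assumes every class has at least two samples."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for cls in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == cls]
            centroids[cls] = [mean(r[g] for r in rows) for g in genes]
        sample = [X[i][g] for g in genes]
        pred = min(centroids, key=lambda cls: sq_dist(sample, centroids[cls]))
        correct += pred == y[i]
    return correct / len(X)


def rce(X, y, n_clusters=4, final_clusters=1):
    """Recursive cluster elimination over the columns (genes) of X.
    Each round: cluster the surviving genes, score every cluster,
    drop the lowest-scoring one, then re-cluster with one fewer cluster.
    Returns the sorted indices of the surviving genes."""
    genes = list(range(len(X[0])))
    k = n_clusters
    while k > final_clusters:
        profiles = [zscore([row[g] for row in X]) for g in genes]
        labels = kmeans(profiles, min(k, len(genes)))
        clusters = {}
        for g, lab in zip(genes, labels):
            clusters.setdefault(lab, []).append(g)
        ranked = sorted(clusters.values(),
                        key=lambda c: loo_accuracy(X, y, c), reverse=True)
        if len(ranked) > 1:
            ranked = ranked[:-1]  # eliminate the weakest cluster
        genes = sorted(g for c in ranked for g in c)
        k -= 1
    return genes
```

Here `X` is a list of samples, each a list of expression values, and `y` holds the class labels; `rce(X, y, n_clusters=3)` returns the gene indices that survive elimination. Scoring whole clusters rather than single genes is the point of the sketch: a gene is kept or discarded together with the genes whose profiles it resembles, which is the contrast with RFE drawn in the text.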

Conclusion

SVM-RCE provides improved classification accuracy on complex microarray datasets compared with the classification accuracy obtained on the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that, when considered together, provide greater insight into the structure of the microarray data. Clustering genes for classification appears to result in some concomitant clustering of samples into subgroups.

Our present implementation of SVM-RCE groups genes using the correlation metric. The success of the SVM-RCE method in classification suggests that gene interaction networks or other biologically relevant metrics that group genes based on functional parameters might also be useful.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-8-144) contains supplementary material, which is available to authorized users.


Authors: Malik Yousef, Segun Jung, Louise C Showe, Michael K Showe

