Improving accuracy for cancer classification with a new algorithm for genes selection

Zhang, Hongyan; Wang, Haiyan; Dai, Zhijun; Chen, Ming-Shun; Yuan, Zheming

dc.citation.doi	doi:10.1186/1471-2105-13-298	en_US
dc.citation.jtitle	BMC Bioinformatics	en_US
dc.citation.spage	298	en_US
dc.citation.volume	13	en_US
dc.contributor.author	Zhang, Hongyan
dc.contributor.author	Wang, Haiyan
dc.contributor.author	Dai, Zhijun
dc.contributor.author	Chen, Ming-Shun
dc.contributor.author	Yuan, Zheming
dc.contributor.authoreid	hwang	en_US
dc.contributor.authoreid	mchen	en_US
dc.date.accessioned	2013-04-02T17:02:49Z
dc.date.available	2013-04-02T17:02:49Z
dc.date.issued	2013-04-02
dc.date.published	2012	en_US
dc.description.abstract	Background: Although the classification of cancer tissue samples based on gene expression data has advanced considerably in recent years, improving accuracy remains a great challenge. One of the challenges is to establish an effective method that can select a parsimonious set of relevant genes. So far, most methods for gene selection in the literature focus on screening individual genes or pairs of genes without considering the possible interactions among genes. Here we introduce a new computational method named the Binary Matrix Shuffling Filter (BMSF). It not only overcomes the difficulty associated with the search schemes of traditional wrapper methods and the overfitting problem in a high-dimensional search space but also takes potential gene interactions into account during gene selection. This method, coupled with a Support Vector Machine (SVM) for implementation, often selects a very small number of genes, which makes the model easy to interpret. Results: We applied our method to 9 two-class gene expression datasets involving human cancers. During the gene selection process, the set of genes to be kept in the model was recursively refined and repeatedly updated according to the effect of a given gene on the contributions of other genes in reference to their usefulness in cancer classification. The small number of informative genes selected from each dataset leads to significantly improved leave-one-out cross-validation (LOOCV) classification accuracy across all 9 datasets for multiple classifiers. Our method also generalizes broadly in the genes selected, since multiple commonly used classifiers achieved either equivalent or much higher LOOCV accuracy than those reported in the literature. Conclusions: A gene’s contribution to binary cancer classification is better evaluated after adjusting for the joint effect of a large number of other genes. A computationally efficient search scheme was provided to perform an effective search in the extensive feature space that includes possible interactions of many genes. The performance of the algorithm on 9 datasets suggests that it is possible to improve the accuracy of cancer classification by a large margin when the joint effects of many genes are considered.	en_US
dc.identifier.uri	http://hdl.handle.net/2097/15443
dc.language.iso	en_US	en_US
dc.relation.uri	http://www.biomedcentral.com/1471-2105/13/298	en_US
dc.subject	Cancer classification	en_US
dc.subject	Gene expression	en_US
dc.subject	Binary Matrix Shuffling Filter (BMSF)	en_US
dc.subject	Support Vector Machine (SVM)	en_US
dc.title	Improving accuracy for cancer classification with a new algorithm for genes selection	en_US
dc.type	Article (publisher version)	en_US

Files

Original bundle

Name: WangBMCBioinformatics2012.pdf
Size: 1.07 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission

Collections

Statistics Faculty Research and Publications