Neighborhood-Oriented feature selection and classification of Duke’s stages on colorectal Cancer using high density genomic data.

dc.contributor.author: Peng, Liang
dc.date.accessioned: 2011-07-28T15:05:45Z
dc.date.available: 2011-07-28T15:05:45Z
dc.date.graduationmonth: August [en_US]
dc.date.issued: 2011-07-28
dc.date.published: 2011 [en_US]
dc.description.abstract: The selection of relevant genes for classifying disease phenotypes from gene expression data has been extensively studied. Previously, most relevant-gene selection was conducted on individual genes with limited sample sizes. Modern technology makes it possible to obtain microarray data with higher resolution along the chromosomes. Considering gene sets on an entire block of a chromosome rather than individual genes could help to reveal important connections between relevant genes and disease phenotypes. In this report, we consider feature selection and classification that take into account the spatial location of probe sets when classifying Duke’s stages B and C using DNA copy number data or gene expression data from colorectal cancers. A novel method for feature selection is presented in this report. A chromosome was first partitioned into blocks after the probe sets were aligned by chromosome location. Then a test of interaction between Duke’s stage and probe sets was conducted on each block of probe sets to select significant blocks. For each significant block, a new multiple comparison procedure was carried out to identify truly relevant probe sets while preserving the neighborhood location information of the probe sets. Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classification using the final selected probe sets was conducted for all samples. The Leave-One-Out Cross Validation (LOOCV) estimate of accuracy is reported as an evaluation of the selected features. We applied the method to two large data sets, each containing more than 50,000 features. Excellent classification accuracy was achieved by the proposed procedure with SVM or KNN for both data sets, even though classifying prognosis stages (Duke’s stages B and C) is much more difficult than classifying normal versus tumor types. [en_US]
dc.description.advisor: Haiyan Wang [en_US]
dc.description.degree: Master of Science [en_US]
dc.description.department: Department of Statistics [en_US]
dc.description.level: Masters [en_US]
dc.identifier.uri: http://hdl.handle.net/2097/10751
dc.language.iso: en_US [en_US]
dc.publisher: Kansas State University [en]
dc.subject: Feature selection [en_US]
dc.subject: Classification [en_US]
dc.subject: Hypothesis testing [en_US]
dc.subject: Cross validation [en_US]
dc.subject: Multiple comparison [en_US]
dc.subject: Genomic data [en_US]
dc.subject.umi: Bioinformatics (0715) [en_US]
dc.subject.umi: Computer Science (0984) [en_US]
dc.subject.umi: Statistics (0463) [en_US]
dc.title: Neighborhood-Oriented feature selection and classification of Duke’s stages on colorectal Cancer using high density genomic data. [en_US]
dc.type: Report [en_US]
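
The workflow outlined in the abstract (partition position-ordered features into blocks, keep informative blocks, then evaluate a KNN classifier with LOOCV) can be sketched roughly as below. This is a toy illustration only, not the report's method: the block score is a hypothetical stand-in (mean absolute difference of class means) for the report's interaction test and multiple comparison procedure, and the data, the function names, block_size and k are all invented for the example.

```python
import math
from collections import Counter


def partition_into_blocks(positions, block_size):
    """Order feature indices by chromosome position, then cut into consecutive blocks."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    return [order[i:i + block_size] for i in range(0, len(order), block_size)]


def block_score(X, y, block):
    """Hypothetical stand-in score (NOT the report's interaction test):
    mean absolute difference between the two class means over the block."""
    first, second = sorted(set(y))
    a = [row for row, lab in zip(X, y) if lab == first]
    b = [row for row, lab in zip(X, y) if lab == second]
    diffs = []
    for j in block:
        mean_a = sum(r[j] for r in a) / len(a)
        mean_b = sum(r[j] for r in b) / len(b)
        diffs.append(abs(mean_a - mean_b))
    return sum(diffs) / len(diffs)


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), lab) for row, lab in zip(train_X, train_y))
    votes = Counter(lab for _, lab in nearest[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of KNN restricted to the given feature indices."""
    Xs = [[row[j] for j in features] for row in X]
    correct = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, Xs[i], k) == y[i]:
            correct += 1
    return correct / len(Xs)


# Invented toy data: 6 samples, 6 features; labels mimic stages "B" and "C".
positions = [30, 10, 20, 60, 40, 50]  # made-up chromosome coordinates
X = [
    [0.1, 0.0, 0.2, 1.0, 0.5, 0.3],
    [0.2, 0.1, 0.0, 0.2, 0.4, 0.9],
    [0.0, 0.2, 0.1, 0.6, 0.9, 0.1],
    [1.0, 0.9, 1.1, 0.9, 0.6, 0.2],
    [0.9, 1.1, 1.0, 0.3, 0.3, 0.8],
    [1.1, 1.0, 0.9, 0.5, 0.8, 0.2],
]
y = ["B", "B", "B", "C", "C", "C"]

blocks = partition_into_blocks(positions, block_size=3)
best_block = max(blocks, key=lambda blk: block_score(X, y, blk))
accuracy = loocv_accuracy(X, y, best_block, k=3)
print(blocks, best_block, accuracy)
```

For brevity the sketch picks the block using all samples before running LOOCV; repeating the selection inside each left-out fold is the stricter variant and generally gives a less optimistic accuracy estimate.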

Files

Original bundle
Name: LiangPeng2011.pdf
Size: 659.82 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed upon to submission