Improving the performance of the prediction analysis of microarrays algorithm via different thresholding methods and heteroscedastic modeling

dc.contributor.author: Sahtout, Mohammad Omar
dc.date.accessioned: 2014-07-09T15:45:01Z
dc.date.available: 2014-07-09T15:45:01Z
dc.date.graduationmonth: August
dc.date.issued: 2014-07-09
dc.date.published: 2014
dc.description.abstract: This dissertation considers different methods to improve the performance of the Prediction Analysis of Microarrays (PAM) algorithm. PAM is a popular algorithm for high-dimensional classification. However, it has the drawback of retaining too many features, even after multiple runs of the algorithm to perform further feature selection. When PAM is applied to 10 multi-class human cancer microarray datasets, the average number of selected features is 2611. Such a large number of features makes follow-up studies difficult. This drawback results from the soft thresholding method used in the PAM algorithm and from PAM's estimate of the thresholding parameter. In this dissertation, we extend the PAM algorithm with two other thresholding methods (hard and order thresholding) and a deep search algorithm that obtains a better estimate of the thresholding parameter. In addition to the newly proposed algorithms, we derive an approximation for the probability of misclassification for the hard-thresholded algorithm in the binary case. Beyond this work, the dissertation considers the heteroscedastic case, in which the variance of each feature differs across classes. The PAM algorithm assumes that the variance of each predictor is constant across classes. We found that this homogeneity assumption is invalid for many features in most datasets, which motivates us to develop heteroscedastic versions of the algorithms. These versions also incorporate the different thresholding methods. All new algorithms proposed in this dissertation are extensively tested and compared using real data or Monte Carlo simulation studies. In general, the newly proposed algorithms not only achieved better cancer status prediction accuracy but also produced more parsimonious models with a significantly smaller number of genes.
dc.description.advisor: Haiyan Wang
dc.description.degree: Doctor of Philosophy
dc.description.department: Department of Statistics
dc.description.level: Doctoral
dc.identifier.uri: http://hdl.handle.net/2097/17914
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.rights: © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Prediction analysis of microarrays
dc.subject: High dimensional classification
dc.subject: Nearest shrunken centroids
dc.subject: Thresholding
dc.subject.umi: Statistics (0463)
dc.title: Improving the performance of the prediction analysis of microarrays algorithm via different thresholding methods and heteroscedastic modeling
dc.type: Dissertation
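
Illustrative note on the abstract: the contrast between soft and hard thresholding can be made concrete with a minimal sketch. It assumes the thresholds are applied to per-gene standardized centroid differences, as in nearest shrunken centroids. The function names, the example scores, and the threshold value are hypothetical and are not taken from the dissertation. Order thresholding is not sketched because the abstract does not define it.

```python
def soft_threshold(d, delta):
    """Soft thresholding: shrink d toward zero by delta.

    Scores with |d| <= delta become exactly 0; surviving scores
    are also pulled toward zero by delta.
    """
    if d > delta:
        return d - delta
    if d < -delta:
        return d + delta
    return 0.0


def hard_threshold(d, delta):
    """Hard thresholding: keep d unchanged if |d| > delta, else set it to 0."""
    return d if abs(d) > delta else 0.0


# Hypothetical standardized centroid differences for four genes.
scores = [2.5, -0.4, 1.0, -3.0]
delta = 1.0  # hypothetical thresholding parameter

soft = [soft_threshold(d, delta) for d in scores]
hard = [hard_threshold(d, delta) for d in scores]

print(soft)  # [1.5, 0.0, 0.0, -2.0]
print(hard)  # [2.5, 0.0, 0.0, -3.0]
```

With the same threshold, both rules zero out the same genes here, but soft thresholding also shrinks the retained scores, whereas hard thresholding leaves them untouched. A gene whose score is zero in every class drops out of the classifier, which is how the choice of rule and threshold affects the number of selected features.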

Files

Original bundle

Name: MohammadSahtout2014.pdf
Size: 1.27 MB
Format: Adobe Portable Document Format