A pipeline for improved QSAR analysis of peptides: physiochemical property parameter selection via BMSF, near-neighbor sample selection via semivariogram, and weighted SVR regression and prediction

K-REx Repository


dc.contributor.author Dai, Zhijun
dc.contributor.author Wang, Lifeng
dc.contributor.author Chen, Yuan
dc.contributor.author Wang, Haiyan
dc.contributor.author Bai, Lianyang
dc.contributor.author Yuan, Zheming
dc.date.accessioned 2014-06-24T20:09:25Z
dc.date.available 2014-06-24T20:09:25Z
dc.date.issued 2014-06-24
dc.identifier.uri http://hdl.handle.net/2097/17878
dc.description.abstract In this paper, we present a pipeline for improved QSAR analysis of peptides. The modeling involves a double selection procedure that first performs feature selection and then conducts sample selection before the final regression analysis. Five hundred and thirty-one physicochemical property parameters of amino acids were used as descriptors to characterize the structure of peptides. These high-dimensional descriptors were then passed through a feature selection process, the Binary Matrix Shuffling Filter (BMSF), to obtain a low-dimensional set of important features. Each descriptor that passed the BMSF filtering also received a weight defined by its contribution to reducing the estimation error. The selected features served as the predictors for subsequent sample selection and modeling. Based on the weighted Euclidean distances between samples, a common range was determined with a high-dimensional semivariogram and then used as a threshold to select the near-neighbor samples from the training set. For each sample to be predicted, the QSAR model was established using SVR with the weighted, selected features, based on the set of near-neighbor training samples exclusive to that sample. Prediction was conducted for each test sample accordingly. The performance of this pipeline was tested with the QSAR analysis of angiotensin-converting enzyme (ACE) inhibitors and HLA-A*0201 data sets. Improved prediction accuracy was obtained in both applications. This pipeline can optimize QSAR modeling from both the feature selection and sample selection perspectives, which leads to improved accuracy over single selection methods. We expect this pipeline to have broad application prospects in the field of regression prediction. en_US
dc.language.iso en_US en_US
dc.relation.uri http://link.springer.com/article/10.1007%2Fs00726-014-1667-5 en_US
dc.rights The final publication is available at link.springer.com en_US
dc.subject Peptides en_US
dc.subject Quantitative structure-activity regression en_US
dc.subject Feature selection en_US
dc.subject Semivariogram en_US
dc.subject Support vector regression en_US
dc.title A pipeline for improved QSAR analysis of peptides: physiochemical property parameter selection via BMSF, near-neighbor sample selection via semivariogram, and weighted SVR regression and prediction en_US
dc.type Article (author version) en_US
dc.date.published 2014 en_US
dc.citation.doi doi:10.1007/s00726-014-1667-5 en_US
dc.citation.epage 1119 en_US
dc.citation.issue 4 en_US
dc.citation.jtitle Amino Acids en_US
dc.citation.spage 1105 en_US
dc.citation.volume 46 en_US
dc.contributor.authoreid hwang en_US
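
The near-neighbor sample selection described in the abstract can be illustrated with a minimal sketch. It assumes the weighted Euclidean distance takes the form sqrt(sum_k w_k (a_k - b_k)^2), that the common range has already been obtained from the semivariogram (it is passed in here as the hypothetical parameter `common_range`), and that the feature weights are given. All function names and the toy data are invented for illustration and are not taken from the paper. A mean-of-neighbors predictor stands in for SVR so that the example needs only the standard library.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance, assumed form: sqrt(sum_k w_k * (a_k - b_k)^2)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def select_near_neighbors(query, train_X, weights, common_range):
    """Return indices of training samples whose distance to query is within common_range."""
    return [
        i for i, x in enumerate(train_X)
        if weighted_distance(query, x, weights) <= common_range
    ]


def predict_local(query, train_X, train_y, weights, common_range, fit_predict):
    """Fit a model on the query's own near neighbors only, then predict the query.

    fit_predict(X, y, query) is any regressor; the abstract names SVR for this step.
    """
    idx = select_near_neighbors(query, train_X, weights, common_range)
    if not idx:
        raise ValueError("no training sample lies within the common range")
    X = [train_X[i] for i in idx]
    y = [train_y[i] for i in idx]
    return fit_predict(X, y, query)


def mean_regressor(X, y, query):
    # Stand-in for SVR (assumption): predicts the mean activity of the neighbors.
    return sum(y) / len(y)


# Toy data, made up for illustration only.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [1.0, 2.0, 3.0, 10.0]
weights = [1.0, 0.25]  # hypothetical per-feature weights
query = [0.5, 0.5]

neighbors = select_near_neighbors(query, train_X, weights, common_range=1.0)
prediction = predict_local(query, train_X, train_y, weights, 1.0, mean_regressor)
```

In a fuller version, `mean_regressor` would be replaced by an SVR fitted on the weighted features of the selected neighbors, and `common_range` would come from a semivariogram fitted to the training set rather than being chosen by hand.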










