Li, Jinliang; Wang, Lifeng; Wang, Haiyan; Bai, Lianyang; Yuan, Zheming

2013-04-03

http://hdl.handle.net/2097/15449

Identification of splice sites plays a key role in the annotation of genes, so improving the accuracy of computational splice site prediction is of great significance. In this article, we first quantitatively determined the window length and the number and positions of the consensus bases by a Chi-square test, and then extracted the sequence multi-scale component (MSC) features together with the position (Pos) and adjacent positions relationship (APR) features of the consensus sites. We then constructed a novel classification model using an SVM with these features and applied it to the HS³D dataset. Compared with results in the current literature, our method substantially improves the 10-fold cross-validation accuracy for training sets containing true and spurious splice sites in both equal and different proportions. The method was also applied to the NN269 dataset for further evaluation and an independent test. The results obtained are superior to those reported in the literature, which demonstrates the stability and superiority of this method. These results show that our method predicts splice sites with high accuracy.

en-US

Permission to archive granted by Genetics and Molecular Research, March 18, 2013

Splice site prediction; Multi-scale component features; Position features; Adjacent positions relationship features; Support Vector Machine (SVM)

High-accuracy splice sites prediction based on sequence component and position features

Article (author version)
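
The abstract says the consensus positions were chosen with a Chi-square test but does not spell out the procedure. The following is a minimal Python sketch of one way a per-position test of that kind could look, not the authors' implementation. The function names, the 0.05 significance threshold, and the toy sequences are assumptions made for illustration, and the sequences are assumed to be equal-length strings over A, C, G, T.

```python
from collections import Counter

BASES = "ACGT"


def chi_square_at_position(true_seqs, false_seqs, pos):
    """Pearson chi-square statistic for the 2 x 4 table (class x base) at one position."""
    true_counts = Counter(s[pos] for s in true_seqs)
    false_counts = Counter(s[pos] for s in false_seqs)
    n_true, n_false = len(true_seqs), len(false_seqs)
    total = n_true + n_false
    stat = 0.0
    for base in BASES:
        col = true_counts[base] + false_counts[base]
        if col == 0:
            continue  # base never observed at this position
        for observed, row in ((true_counts[base], n_true),
                              (false_counts[base], n_false)):
            expected = row * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


def informative_positions(true_seqs, false_seqs, threshold=7.815):
    """Return positions whose statistic exceeds the threshold.

    7.815 is the 0.05 critical value for 3 degrees of freedom (assumed cutoff).
    """
    length = len(true_seqs[0])
    return [p for p in range(length)
            if chi_square_at_position(true_seqs, false_seqs, p) > threshold]


# Toy data (hypothetical): only position 2 differs between the two classes.
true_sites = ["AAGT"] * 20
false_sites = ["AACT"] * 10 + ["AATT"] * 10
print(informative_positions(true_sites, false_sites))
```

Positions flagged this way would be the candidates for position-specific features. The fixed threshold assumes all four bases occur at a position; a fuller treatment would adjust the degrees of freedom when some bases are absent.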