High-accuracy splice sites prediction based on sequence component and position features

Date

2013-04-03

Abstract

Identification of splice sites plays a key role in the annotation of genes; improving the accuracy of computational splice site prediction is therefore of great significance. In this article, we first quantitatively determined the window length and the number and positions of the consensus bases using a Chi-square test, and then extracted the sequence multi-scale component (MSC) features together with the position (Pos) and adjacent positions relationship (APR) features of the consensus sites. We then constructed a novel classification model using a support vector machine (SVM) with these features and applied it to the HS³D dataset. Compared with results reported in the current literature, our method yields a substantial improvement in 10-fold cross-validation accuracy for training sets containing true and spurious splice sites in both equal and unequal proportions. The method was also applied to the NN269 dataset for further evaluation and an independent test. The results obtained are superior to those in the literature, which demonstrates the stability and superiority of the method. These results show that our method predicts splice sites with high accuracy.
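
The abstract does not say how the Chi-square test was set up, so the following is only a minimal sketch of one plausible reading: for each position in a fixed-length window, build a 2 x 4 contingency table (true vs. spurious site, by base) and treat positions with a large statistic as candidate consensus positions. The function names, the toy sequences, and the threshold of 7.815 (the 5% critical value for 3 degrees of freedom) are assumptions made for illustration and are not taken from the article.

```python
from collections import Counter

BASES = "ACGT"


def chi_square_per_position(true_seqs, false_seqs):
    """Return one chi-square statistic per window position.

    Each statistic comes from a 2 x 4 table (class x base) built from
    equal-length sequences. Characters outside ACGT are ignored.
    """
    length = len(true_seqs[0])
    stats = []
    for pos in range(length):
        # Observed base counts at this position for each class.
        rows = [
            Counter(seq[pos] for seq in true_seqs),
            Counter(seq[pos] for seq in false_seqs),
        ]
        row_totals = [sum(row[b] for b in BASES) for row in rows]
        col_totals = [rows[0][b] + rows[1][b] for b in BASES]
        total = sum(row_totals)
        stat = 0.0
        for row, row_total in zip(rows, row_totals):
            for base, col_total in zip(BASES, col_totals):
                expected = row_total * col_total / total
                if expected > 0:  # skip bases never seen at this position
                    stat += (row[base] - expected) ** 2 / expected
        stats.append(stat)
    return stats


def select_positions(stats, threshold=7.815):
    """Indices whose statistic exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(stats) if s > threshold]


# Toy data (hypothetical): only the middle position separates the classes.
true_sites = ["AGT"] * 20
false_sites = ["ACT"] * 10 + ["AGT"] * 10
stats = chi_square_per_position(true_sites, false_sites)
selected = select_positions(stats)
```

In this toy input the first and last positions are identical in both classes, so only the middle position is selected. A fuller pipeline would compute features at the selected positions and pass them to an SVM, which is outside the scope of this sketch.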

Keywords

Splice site prediction, Multi-scale component features, Position features, Adjacent positions relationship features, Support Vector Machine (SVM)
