Semi-supervised and transductive learning algorithms for predicting alternative splicing events in genes.

Tangirala, Karthik

dc.contributor.author	Tangirala, Karthik
dc.date.accessioned	2011-08-12T13:14:50Z
dc.date.available	2011-08-12T13:14:50Z
dc.date.graduationmonth	August
dc.date.issued	2011-08-12
dc.date.published	2011
dc.description.abstract	As genomes are sequenced, a major challenge is their annotation: the identification of genes and regulatory elements, their locations, and their functions. For years, it was believed that one gene corresponds to one protein, but the discovery of alternative splicing provided a mechanism for generating different gene transcripts (isoforms) from the same genomic sequence. In recent years, it has become clear that a large fraction of genes undergo alternative splicing. Understanding alternative splicing is therefore a problem of great interest to biologists. Supervised machine learning approaches can be used to predict alternative splicing events at the genome level. However, supervised approaches require large amounts of labeled data to produce accurate classifiers. While new sequencing technologies produce large amounts of genomic data, labeling these data can be costly and time-consuming. Therefore, semi-supervised learning approaches that can make use of large amounts of unlabeled data, in addition to small amounts of labeled data, are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm uses two views of the data to iteratively learn two classifiers that inform each other, at each step, with their most confident predictions on the unlabeled data. We consider three sets of features for constructing views for the problem of predicting alternatively spliced exons: the lengths of the exon of interest and its flanking introns, exonic splicing enhancers (ESE motifs), and intronic regulatory sequences (IRS motifs). Naive Bayes and Support Vector Machine (SVM) algorithms are used as base classifiers in our study.
Experimental results show that using the unlabeled data can produce better classifiers than those obtained from the small amount of labeled data alone. In addition to semi-supervised approaches, we also study the usefulness of graph-based transductive learning approaches for predicting alternatively spliced exons. Like semi-supervised learning algorithms, transductive learning algorithms can use unlabeled data, together with labeled data, to produce labels for the unlabeled data. However, in this case no classification model is learned that could be used to classify new unlabeled data. Experimental results show that graph-based transductive approaches can make effective use of the unlabeled data.
dc.description.advisor	Doina Caragea
dc.description.degree	Master of Science
dc.description.department	Department of Computing and Information Sciences
dc.description.level	Masters
dc.identifier.uri	http://hdl.handle.net/2097/12013
dc.language.iso	en
dc.publisher	Kansas State University
dc.rights	© the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject	Alternative splicing
dc.subject	Co-training
dc.subject	Semi-supervised learning
dc.subject	Transductive learning
dc.subject	Graph-based approach
dc.subject.umi	Bioinformatics (0715)
dc.subject.umi	Computer Science (0984)
dc.title	Semi-supervised and transductive learning algorithms for predicting alternative splicing events in genes.
dc.type	Thesis
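The abstract describes co-training only in outline. As a rough illustration, here is a minimal Python sketch, assuming real-valued features split into two views (for example, hypothetical exon/intron length features in view 1 and a single motif score in view 2). It uses a small hand-written Gaussian Naive Bayes as a stand-in base classifier (the thesis also uses SVMs) and a simplified selection rule: in each round, each view's classifier adds its single most confident prediction to the shared labeled set, from which the other view's classifier is then retrained. The names `co_train` and `GaussianNB`, the labels "AS"/"CONST", and all toy numbers are illustrative and not taken from the thesis.

```python
import math


class GaussianNB:
    """Tiny Gaussian Naive Bayes, used here as a stand-in base classifier."""

    def fit(self, X, y):
        self.stats = {}
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # Small constant keeps variances positive for very small classes.
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                         for col, m in zip(cols, means)]
            self.stats[c] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def predict_with_confidence(self, x):
        """Return (most probable class, its normalized posterior probability)."""
        scores = {}
        for c, (log_prior, means, variances) in self.stats.items():
            scores[c] = log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances))
        best = max(scores, key=scores.get)
        top = scores[best]
        norm = sum(math.exp(s - top) for s in scores.values())
        return best, 1.0 / norm


def co_train(view1, view2, seed_labels, unlabeled, rounds=10, per_round=1):
    """Co-training sketch: the classifier trained on each view labels its most
    confident unlabeled examples, which the other view then learns from."""
    labels = dict(seed_labels)
    pool = list(unlabeled)

    def train(view):
        idx = sorted(labels)
        return GaussianNB().fit([view[i] for i in idx], [labels[i] for i in idx])

    for _ in range(rounds):
        if not pool:
            break
        for view in (view1, view2):
            clf = train(view)
            ranked = sorted(pool, key=lambda i: clf.predict_with_confidence(view[i])[1],
                            reverse=True)
            for i in ranked[:per_round]:
                labels[i] = clf.predict_with_confidence(view[i])[0]
                pool.remove(i)
    # Final per-view classifiers trained on seed plus self-labeled examples.
    return labels, train(view1), train(view2)


# Hypothetical toy data: view 1 = two length features, view 2 = one motif score.
view1 = [[1.0, 1.2], [1.1, 0.9], [5.0, 5.2], [5.1, 4.9],
         [1.05, 1.0], [4.9, 5.0], [0.95, 1.1], [5.2, 5.1]]
view2 = [[5.0], [4.8], [1.0], [1.2], [5.1], [0.9], [4.9], [1.1]]
seeds = {0: "AS", 1: "AS", 2: "CONST", 3: "CONST"}

labels, clf1, clf2 = co_train(view1, view2, seeds, unlabeled=[4, 5, 6, 7])
print(labels)
```

Classic co-training selects a fixed number of confident positive and negative examples per round; taking the top `per_round` predictions regardless of class is a simplification chosen to keep the sketch short.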

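The abstract does not name the specific graph-based transductive algorithm used. One widely used example is iterative label propagation in the style of Zhu and Ghahramani, sketched below in Python under the assumption of a dense graph with Gaussian (RBF) edge weights. The function name `label_propagation`, the bandwidth parameter `sigma`, and the one-dimensional toy points are hypothetical. Labeled nodes stay clamped to their labels while each unlabeled node repeatedly takes the weighted average of its neighbors' class scores. The result is a label for every node already in the graph rather than a reusable model, which matches the transductive setting described in the abstract.

```python
import math


def label_propagation(points, seed_labels, sigma=1.0, iterations=100):
    """Iterative label propagation on a dense RBF similarity graph.

    points: list of feature vectors (one graph node per example).
    seed_labels: dict mapping node index -> class for the labeled nodes.
    Returns a predicted class for every node (transductive: no model for new points).
    """
    n = len(points)
    classes = sorted(set(seed_labels.values()))
    k = len(classes)
    # Edge weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), no self-loops.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Labeled nodes start one-hot; unlabeled nodes start with uniform scores.
    F = [[1.0 if c == seed_labels[i] else 0.0 for c in classes] if i in seed_labels
         else [1.0 / k] * k for i in range(n)]
    for _ in range(iterations):
        new_F = []
        for i in range(n):
            total = sum(W[i])
            if i in seed_labels or total == 0.0:
                new_F.append(F[i])  # clamp labeled (or isolated) nodes
                continue
            new_F.append([sum(W[i][j] * F[j][c] for j in range(n)) / total
                          for c in range(k)])
        F = new_F
    return [classes[max(range(k), key=lambda c: F[i][c])] for i in range(n)]


# Hypothetical toy data: six 1-D points, one labeled seed per class.
points = [[1.0], [1.1], [5.0], [5.1], [1.05], [4.95]]
seeds = {0: "AS", 2: "CONST"}
predicted = label_propagation(points, seeds)
print(predicted)
```

The bandwidth `sigma` controls how quickly similarity decays with distance; in practice it would need tuning for real exon features.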
Files

Original bundle

Name:: KarthikTangirala2011.pdf
Size:: 1.95 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 1.61 KB
Format:: Item-specific license agreed to upon submission

Collections

K-State Electronic Theses, Dissertations, and Reports: 2004 -