Exploring transcription patterns and regulatory motifs in Arabidopsis thaliana

K-REx Repository


dc.contributor.author Bahirwani, Vishal
dc.date.accessioned 2010-05-21T16:21:51Z
dc.date.available 2010-05-21T16:21:51Z
dc.date.issued 2010-05-21T16:21:51Z
dc.identifier.uri http://hdl.handle.net/2097/4194
dc.description.abstract Recent work has shown that bidirectional genes (genes located on opposite strands of DNA whose transcription start sites are no more than 1000 base pairs apart) are often co-expressed and have similar biological functions. Identifying such genes can be useful when constructing gene regulatory networks. Furthermore, analysis of the intergenic regions corresponding to bidirectional genes can help to identify regulatory elements, such as transcription factor binding sites. Approximately 2500 bidirectional gene pairs have been identified in Arabidopsis thaliana, and the corresponding intergenic regions have been shown to be rich in regulatory elements that are essential for the initiation of transcription. Identifying such elements is especially important, because simply searching for known transcription factor binding sites in the promoter of a gene can produce many hits that are not necessarily important for transcription initiation. Encouraged by the findings about the presence of essential regulatory elements in the intergenic regions corresponding to bidirectional genes, in this thesis we explore a motif-based machine learning approach to identify intergenic regulatory elements. More precisely, we consider the problem of predicting the transcription pattern for pairs of consecutive genes in Arabidopsis thaliana using motifs from AthaMap and PLACE. We use machine learning algorithms to learn models that can predict the direction of transcription for pairs of consecutive genes. To identify the most predictive motifs and, therefore, the most significant regulatory elements, we perform feature selection based on mutual information and feature abstraction based on family or sequence similarity. Preliminary results demonstrate the feasibility of our approach. en_US
dc.description.sponsorship National Science Foundation and Ecological Genomics Institute at Kansas State University en_US
dc.language.iso en_US en_US
dc.publisher Kansas State University en
dc.subject Gene regulatory networks en_US
dc.subject Machine learning en_US
dc.subject Arabidopsis thaliana en_US
dc.subject Motif en_US
dc.subject Hierarchical agglomerative clustering en_US
dc.subject Bioinformatics en_US
dc.subject Bidirectional genes en_US
dc.title Exploring transcription patterns and regulatory motifs in Arabidopsis thaliana en_US
dc.type Thesis en_US
dc.description.degree Master of Science en_US
dc.description.level Masters en_US
dc.description.department Department of Computing and Information Sciences en_US
dc.description.advisor Doina Caragea en_US
dc.subject.umi Computer Science (0984) en_US
dc.date.published 2010 en_US
dc.date.graduationmonth August en_US
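The abstract names two technical ingredients: labeling consecutive gene pairs by their transcription pattern (with bidirectional pairs defined as opposite-strand genes whose transcription start sites are at most 1000 bp apart) and ranking motif features by mutual information with that label. The Python sketch below illustrates both under stated assumptions; it is not the thesis's code. The function names, the three orientation classes ("divergent", "convergent", "tandem"), the '+'/'-' strand encoding, and the toy motif presence matrix are all hypothetical.

```python
import math
from collections import Counter


def pair_orientation(strand_left: str, strand_right: str) -> str:
    """Label two consecutive genes (left, right) by strand, '+' or '-'.

    Assumed classes (hypothetical labeling): 'divergent' when the genes point
    away from their shared intergenic region (-, +), 'convergent' when they
    point toward it (+, -), and 'tandem' when both lie on the same strand.
    """
    if strand_left == strand_right:
        return "tandem"
    if strand_left == "-" and strand_right == "+":
        return "divergent"
    return "convergent"


def is_bidirectional(strand_left, strand_right, tss_left, tss_right, max_gap=1000):
    """Divergent pair whose transcription start sites are at most max_gap bp apart."""
    return (pair_orientation(strand_left, strand_right) == "divergent"
            and abs(tss_right - tss_left) <= max_gap)


def mutual_information(feature, labels):
    """I(X;Y) in bits between a discrete feature and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ); only observed pairs contribute
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_motifs(presence, labels, k):
    """Return the k motif names with the highest mutual information.

    presence maps a motif name to a 0/1 list (motif absent/present in each
    pair's intergenic region), aligned with labels.
    """
    scores = {m: mutual_information(v, labels) for m, v in presence.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy, made-up data: six gene pairs and three hypothetical motifs.
labels = ["divergent", "divergent", "tandem", "convergent", "tandem", "divergent"]
presence = {
    "motif_A": [1, 1, 0, 0, 0, 1],  # present exactly in the divergent pairs
    "motif_B": [1, 0, 1, 0, 1, 0],  # only weakly associated with the label
    "motif_C": [0, 0, 0, 0, 0, 0],  # never present, so it carries no information
}
print(rank_motifs(presence, labels, 2))
```

Mutual information is computed against the multiclass orientation label directly, so a motif scores highly if its presence or absence separates any of the classes; keeping the top-k motifs then gives a reduced feature set for a downstream classifier.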
