Exploring transcription patterns and regulatory motifs in Arabidopsis thaliana

dc.contributor.author: Bahirwani, Vishal
dc.date.accessioned: 2010-05-21T16:21:51Z
dc.date.available: 2010-05-21T16:21:51Z
dc.date.graduationmonth: August [en_US]
dc.date.issued: 2010-05-21T16:21:51Z
dc.date.published: 2010 [en_US]
dc.description.abstract: Recent work has shown that bidirectional genes (genes located on opposite strands of DNA, whose transcription start sites are not more than 1000 basepairs apart) are often co-expressed and have similar biological functions. Identification of such genes can be useful in the process of constructing gene regulatory networks. Furthermore, analysis of the intergenic regions corresponding to bidirectional genes can help to identify regulatory elements, such as transcription factor binding sites. Approximately 2500 bidirectional gene pairs have been identified in Arabidopsis thaliana and the corresponding intergenic regions have been shown to be rich in regulatory elements that are essential for the initiation of transcription. Identifying such elements is especially important, as simply searching for known transcription factor binding sites in the promoter of a gene can result in many hits that are not always important for transcription initiation. Encouraged by the findings about the presence of essential regulatory elements in the intergenic regions corresponding to bidirectional genes, in this thesis, we explore a motif-based machine learning approach to identify intergenic regulatory elements. More precisely, we consider the problem of predicting the transcription pattern for pairs of consecutive genes in Arabidopsis thaliana using motifs from AthaMap and PLACE. We use machine learning algorithms to learn models that can predict the direction of transcription for pairs of consecutive genes. To identify the most predictive motifs and, therefore, the most significant regulatory elements, we perform feature selection based on mutual information and feature abstraction based on family or sequence similarity. Preliminary results demonstrate the feasibility of our approach. [en_US]
dc.description.advisor: Doina Caragea [en_US]
dc.description.degree: Master of Science [en_US]
dc.description.department: Department of Computing and Information Sciences [en_US]
dc.description.level: Masters [en_US]
dc.description.sponsorship: National Science Foundation and Ecological Genomics Institute at Kansas State University [en_US]
dc.identifier.uri: http://hdl.handle.net/2097/4194
dc.language.iso: en_US [en_US]
dc.publisher: Kansas State University [en]
dc.subject: Gene regulatory networks [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Arabidopsis thaliana [en_US]
dc.subject: Motif [en_US]
dc.subject: Hierarchical agglomerative clustering [en_US]
dc.subject: Bioinformatics [en_US]
dc.subject: Bidirectional genes [en_US]
dc.subject.umi: Computer Science (0984) [en_US]
dc.title: Exploring transcription patterns and regulatory motifs in Arabidopsis thaliana [en_US]
dc.type: Thesis [en_US]
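
Illustrative sketch (not taken from the thesis): the abstract defines bidirectional genes by strand and transcription start site (TSS) distance, and names mutual information as the feature selection criterion. The Python below shows one way those two ideas could be expressed. The function names, the `(strand, tss)` tuple layout, and the toy inputs are assumptions made for illustration only; the thesis's actual data structures, motif encoding, and learning pipeline are not described in this record.

```python
import math
from collections import Counter


def is_bidirectional(gene_a, gene_b, max_distance=1000):
    """Apply the abstract's definition to two genes.

    Each gene is a hypothetical (strand, tss) tuple, e.g. ("+", 5600).
    The pair qualifies when the genes lie on opposite strands and their
    transcription start sites are at most max_distance base pairs apart.
    """
    (strand_a, tss_a), (strand_b, tss_b) = gene_a, gene_b
    return strand_a != strand_b and abs(tss_a - tss_b) <= max_distance


def mutual_information(feature, labels):
    """Mutual information (in bits) between a discrete feature and labels.

    feature: per-example values, e.g. 1/0 for motif present/absent
             (an assumed encoding, not the thesis's).
    labels:  per-example class labels, e.g. a transcription pattern.
    """
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, l), count in joint.items():
        p_joint = count / n
        p_f = feature_counts[f] / n
        p_l = label_counts[l] / n
        mi += p_joint * math.log2(p_joint / (p_f * p_l))
    return mi


def rank_motifs(motif_features, labels):
    """Sort motif names by mutual information with the labels, highest first.

    motif_features maps a motif name to its per-example feature values.
    """
    scores = {name: mutual_information(values, labels)
              for name, values in motif_features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Under these assumptions, a motif whose presence tracks the class label closely receives a higher score than one distributed independently of it, so `rank_motifs` places the more predictive motifs first.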

Files

Original bundle
Name: VishalBahirwani2010.pdf
Size: 1.68 MB
Format: Adobe Portable Document Format