Exploring transcription patterns and regulatory motifs in Arabidopsis thaliana

dc.contributor.author: Bahirwani, Vishal
dc.date.accessioned: 2010-05-21T16:21:51Z
dc.date.available: 2010-05-21T16:21:51Z
dc.date.graduationmonth: August
dc.date.issued: 2010-05-21T16:21:51Z
dc.date.published: 2010
dc.description.abstract: Recent work has shown that bidirectional genes (genes located on opposite strands of DNA, whose transcription start sites are not more than 1000 basepairs apart) are often co-expressed and have similar biological functions. Identification of such genes can be useful in the process of constructing gene regulatory networks. Furthermore, analysis of the intergenic regions corresponding to bidirectional genes can help to identify regulatory elements, such as transcription factor binding sites. Approximately 2500 bidirectional gene pairs have been identified in Arabidopsis thaliana and the corresponding intergenic regions have been shown to be rich in regulatory elements that are essential for the initiation of transcription. Identifying such elements is especially important, as simply searching for known transcription factor binding sites in the promoter of a gene can result in many hits that are not always important for transcription initiation. Encouraged by the findings about the presence of essential regulatory elements in the intergenic regions corresponding to bidirectional genes, in this thesis, we explore a motif-based machine learning approach to identify intergenic regulatory elements. More precisely, we consider the problem of predicting the transcription pattern for pairs of consecutive genes in Arabidopsis thaliana using motifs from AthaMap and PLACE. We use machine learning algorithms to learn models that can predict the direction of transcription for pairs of consecutive genes. To identify the most predictive motifs and, therefore, the most significant regulatory elements, we perform feature selection based on mutual information and feature abstraction based on family or sequence similarity. Preliminary results demonstrate the feasibility of our approach.
dc.description.advisor: Doina Caragea
dc.description.degree: Master of Science
dc.description.department: Department of Computing and Information Sciences
dc.description.level: Masters
dc.description.sponsorship: National Science Foundation and Ecological Genomics Institute at Kansas State University
dc.identifier.uri: http://hdl.handle.net/2097/4194
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.rights: © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Gene regulatory networks
dc.subject: Machine learning
dc.subject: Arabidopsis thaliana
dc.subject: Motif
dc.subject: Hierarchical agglomerative clustering
dc.subject: Bioinformatics
dc.subject: Bidirectional genes
dc.subject.umi: Computer Science (0984)
dc.title: Exploring transcription patterns and regulatory motifs in Arabidopsis thaliana
dc.type: Thesis

Files

Original bundle

Name: VishalBahirwani2010.pdf
Size: 1.68 MB
Format: Adobe Portable Document Format
