Inferential considerations for low-count RNA-seq transcripts: a case study on an edaphic subspecies of dominant prairie grass Andropogon gerardii

dc.contributor.author: Raithel, Seth
dc.date.accessioned: 2015-06-05T20:23:44Z
dc.date.available: 2015-06-05T20:23:44Z
dc.date.graduationmonth: August
dc.date.issued: 2015-06-05
dc.description.abstract: Big bluestem (Andropogon gerardii) is a wide-ranging dominant prairie grass of ecological and agricultural importance to the US Midwest, while the edaphic subspecies sand bluestem (A. gerardii ssp. hallii) grows exclusively on sand dunes. Sand bluestem exhibits phenotypic divergence related to epicuticular properties and enhanced drought tolerance relative to big bluestem. Understanding the mechanisms underlying differential drought tolerance is relevant in the face of climate change. For bluestem subspecies, presence or absence of these phenotypes may be associated with RNA transcripts characterized by low read counts. So-called low-count transcripts pose particular inferential challenges and are thus usually filtered out at early steps of data management protocols and ignored in analyses. In this study, we use a plasmode-based approach to assess the relative performance of alternative inferential strategies on RNA-seq transcripts, with special emphasis on low-count transcripts as motivated by differential bluestem phenotypes. Our dataset consists of RNA-seq read counts for 25,582 transcripts (60% of which are classified as low-count) collected from leaf tissue of 4 individual plants of big bluestem and 4 of sand bluestem. We also compare alternative ad hoc data filtering techniques commonly used in RNA-seq pipelines and assess the performance of recently developed statistical methods for differential expression (DE) analysis, namely DESeq2 and edgeR robust. These methods attempt to overcome the inherently noisy behavior of low-count transcripts by shrinkage or by differential weighting of observations, respectively. Our results indicate that proper specification of DE methods can remove the need for ad hoc data filtering at an arbitrary expression threshold, thus allowing for inference on low-count transcripts. Practical recommendations for inference are provided for cases in which low-count RNA-seq transcripts are of interest, as in the comparison of subspecies of bluestem grasses. Insights from this study may also be relevant to other applications focused on transcripts with low expression levels.
dc.description.advisor: Nora M. Bello
dc.description.degree: Master of Science
dc.description.department: Statistics
dc.description.level: Masters
dc.description.sponsorship: United States Department of Agriculture, Abiotic Stress Program (2008-35001-04545)
dc.identifier.uri: http://hdl.handle.net/2097/19712
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.rights: © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: RNA-Seq low-count transcripts
dc.subject: Low-count transcripts
dc.subject: Andropogon gerardii
dc.subject: Gene filtering plasmode
dc.subject.umi: Statistics (0463)
dc.title: Inferential considerations for low-count RNA-seq transcripts: a case study on an edaphic subspecies of dominant prairie grass Andropogon gerardii
dc.type: Report

Files

Original bundle

Name: SethRaithel2015.pdf
Size: 12.42 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission