Statistical analysis of pyrosequence data

K-REx Repository

dc.contributor.author Keating, Karen
dc.date.accessioned 2012-07-13T14:46:36Z
dc.date.available 2012-07-13T14:46:36Z
dc.date.issued 2012-07-13
dc.description.abstract Since their commercial introduction in 2005, DNA sequencing technologies have become widely available and are now cost-effective tools for determining the genetic characteristics of organisms. While the biomedical applications of DNA sequencing are apparent, these technologies have also been applied to many other research areas. One such area is community ecology, in which DNA sequence data are used to identify the presence and abundance of microscopic organisms that inhabit an environment. This is currently an active area of research, since it is generally believed that a change in the composition of microscopic species in a geographic area may signal a change in the overall health of the environment. An overview of DNA pyrosequencing, as implemented by the Roche/Life Science 454 platform, is presented, and aspects of the process that can introduce variability into the data are identified. Four ecological data sets that were generated by the 454 platform are used for illustration. Characteristics of these data include high dimensionality, a large proportion of zeros (usually in excess of 90%), and nonzero values that are strongly right-skewed. A nonparametric method to standardize these data is presented, and the effects of standardization on outliers and skewness are examined. Traditional statistical methods for analyzing macroscopic species abundance data are discussed, and the applicability of these methods to microscopic species data is examined. One objective that receives focus is the classification of microscopic species as either rare or common. This is an important distinction, since there is much evidence to suggest that the biological and environmental mechanisms that govern common species are distinctly different from the mechanisms that govern rare species. This suggests that the abundance patterns for common and rare species may follow different probability models, and the suitability of the Pareto distribution for rare species is examined. Techniques for classifying macroscopic species are shown to be ill-suited for microscopic species, and an alternative technique is presented. Recognizing that the structure of the data is similar to that of data from financial applications (such as insurance claims and the distribution of wealth), the Gini index and other statistics based on the Lorenz curve are explored as potential test statistics for distinguishing rare from common species. en_US
dc.language.iso en_US en_US
dc.publisher Kansas State University en
dc.subject Statistics en_US
dc.subject Ecology en_US
dc.subject Standardization en_US
dc.subject Gini Index en_US
dc.subject Pareto distribution en_US
dc.title Statistical analysis of pyrosequence data en_US
dc.type Dissertation en_US
dc.description.degree Doctor of Philosophy en_US
dc.description.level Doctoral en_US
dc.description.department Department of Statistics en_US
dc.description.advisor Gary L. Gadbury en_US
dc.subject.umi Ecology (0329) en_US
dc.subject.umi Statistics (0463) en_US 2012 en_US August en_US

