Brown, Shawn PaulVeach, Allison M.Rigdon-Huss, Anne R.Grond, KirstenLickteig, Spencer K.Lothamer, Kale M.Oliver, Alena K.Jumpponen, Ari M.2015-09-222015-09-22http://hdl.handle.net/2097/20444Metabarcoding data generated using next-generation sequencing (NGS) technologies are overwhelmed with rare taxa and skewed in Operational Taxonomic Unit (OTU) frequencies comprised of few dominant taxa. Low frequency OTUs comprise a rare biosphere of singleton and doubleton OTUs, which may include many artifacts. We present an in-depth analysis of global singletons across sixteen NGS libraries representing different ribosomal RNA gene regions, NGS technologies and chemistries. Our data indicate that many singletons (average of 38 % across gene regions) are likely artifacts or potential artifacts, but a large fraction can be assigned to lower taxonomic levels with very high bootstrap support (~32 % of sequences to genus with ≥90 % bootstrap cutoff). Further, many singletons clustered into rare OTUs from other datasets highlighting their overlap across datasets or the poor performance of clustering algorithms. These data emphasize a need for caution when discarding rare sequence data en masse: such practices may result in throwing the baby out with the bathwater, and underestimating the biodiversity. Yet, the rare sequences are unlikely to greatly affect ecological metrics. As a result, it may be prudent to err on the side of caution and omit rare OTUs prior to downstream analyses.en-USNOTICE: this is the author's version of a work that was accepted for publication in . Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Fungal Ecology, 13 (2015), DOI: 10.1016/j.funeco.2014.08.006FungiHigh-throughput sequencingRare biosphereSingletonScraping the bottom of the barrel: are rare high throughput sequences artifacts?Article (author version)