Imputation of unordered markers and the impact on genomic selection accuracy

dc.citation: Rutkoski, Jessica E., Jesse Poland, Jean-Luc Jannink, and Mark E. Sorrells. “Imputation of Unordered Markers and the Impact on Genomic Selection Accuracy.” G3: Genes|Genomes|Genetics 3, no. 3 (March 1, 2013): 427–39. https://doi.org/10.1534/g3.112.005363.
dc.citation.doi: 10.1534/g3.112.005363
dc.citation.epage: 439
dc.citation.issn: 2160-1836
dc.citation.issue: 3
dc.citation.jtitle: G3
dc.citation.spage: 427
dc.citation.volume: 3
dc.contributor.author: Rutkoski, Jessica E.
dc.contributor.author: Poland, Jesse A.
dc.contributor.author: Jannink, Jean-Luc
dc.contributor.author: Sorrells, Mark E.
dc.contributor.authoreid: jpoland [en_US]
dc.date.accessioned: 2013-05-08T21:14:36Z
dc.date.available: 2013-05-08T21:14:36Z
dc.date.issued: 2013-03-01
dc.date.published: 2013
dc.description: Citation: Rutkoski, Jessica E., Jesse Poland, Jean-Luc Jannink, and Mark E. Sorrells. “Imputation of Unordered Markers and the Impact on Genomic Selection Accuracy.” G3: Genes|Genomes|Genetics 3, no. 3 (March 1, 2013): 427–39. https://doi.org/10.1534/g3.112.005363.
dc.description.abstract: Genomic selection, a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large proportion of missing data. Because marker imputation algorithms were developed for species with a reference genome, algorithms suited for unordered markers have not been rigorously evaluated. Using four empirical datasets, we evaluate and characterize four such imputation methods, referred to as k-nearest neighbors, singular value decomposition, random forest regression, and expectation maximization imputation, in terms of their imputation accuracies and the factors affecting accuracy. The effect of imputation method on the genomic selection accuracy is assessed in comparison with mean imputation. The effect of excluding markers with a large proportion of missing data on the genomic selection accuracy is also examined. Our results show that imputation of unordered markers can be accurate, especially when linkage disequilibrium between markers is high and genotyped individuals are related. Of the methods evaluated, random forest regression imputation produced superior accuracy. In comparison with mean imputation, all four imputation methods we evaluated led to greater genomic selection accuracies when the level of missing data was high. Including rather than excluding markers with a large proportion of missing data nearly always led to greater genomic selection accuracies. We conclude that high levels of missing data in dense marker sets are not a major obstacle for genomic selection, even when marker order is not known.
dc.description.version: Article: Version of Record
dc.identifier.uri: http://hdl.handle.net/2097/15767
dc.language.iso: en_US [en_US]
dc.relation.uri: https://doi.org/10.1534/g3.112.005363
dc.rights: © 2013. This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/
dc.subject: Genomic selection [en_US]
dc.subject: Imputation algorithms [en_US]
dc.subject: Genotyping-by-sequencing [en_US]
dc.subject: GenPred [en_US]
dc.subject: Shared data resources [en_US]
dc.title: Imputation of unordered markers and the impact on genomic selection accuracy [en_US]
dc.type: Text
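The abstract compares mean imputation with k-nearest neighbors (kNN) imputation, among other methods, and also considers excluding markers with a large proportion of missing data. As an illustration only, the following is a minimal sketch of how these three steps could be applied to an unordered marker matrix. It is not the authors' implementation: the 0/1/2 allele-dosage coding, the use of NaN for missing calls, the root-mean-squared distance over shared markers, the hypothetical function names (`missing_fraction`, `mean_impute`, `knn_impute`), the value of k, and the 20% filtering threshold are all assumptions. Because no marker order is used, neighbors are chosen purely by genotype similarity between individuals, which suggests why relatedness among genotyped individuals could help imputation.

```python
import numpy as np

# Assumed layout: rows are individuals, columns are unordered markers coded
# as allele dosage 0/1/2; NaN marks a missing call. Names are hypothetical.

def missing_fraction(G):
    """Proportion of missing calls at each marker (column)."""
    return np.isnan(G).mean(axis=0)

def mean_impute(G):
    """Replace each missing call with the mean of the observed calls at that marker."""
    out = np.array(G, dtype=float)
    col_means = np.nanmean(out, axis=0)
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = col_means[cols]
    return out

def knn_impute(G, k=2):
    """Fill each missing call with the mean of the k most similar individuals
    that have that marker observed. Similarity uses only markers observed in
    both individuals, so no marker order or map positions are needed."""
    G = np.array(G, dtype=float)
    out = G.copy()
    observed = ~np.isnan(G)
    n = G.shape[0]
    col_means = np.nanmean(G, axis=0)  # fallback when no donor is available
    for i in range(n):
        missing = np.where(~observed[i])[0]
        if missing.size == 0:
            continue
        # Root-mean-squared genotype difference to every other individual,
        # computed over the markers both individuals have observed.
        dists = np.full(n, np.inf)
        for r in range(n):
            if r == i:
                continue
            shared = observed[i] & observed[r]
            if shared.any():
                diff = G[i, shared] - G[r, shared]
                dists[r] = np.sqrt(np.mean(diff ** 2))
        for j in missing:
            # Candidate donors: individuals with marker j observed and a finite distance.
            donors = np.where(observed[:, j] & np.isfinite(dists))[0]
            if donors.size == 0:
                out[i, j] = col_means[j]
                continue
            nearest = donors[np.argsort(dists[donors])[:k]]
            out[i, j] = G[nearest, j].mean()
    return out

# Toy data, not from the paper.
G = np.array([
    [0, 1,      2, np.nan],
    [0, 1,      2, 2],
    [2, 1,      0, 0],
    [0, np.nan, 2, 2],
])

print(missing_fraction(G))       # 0.25 missing at the second and fourth markers
print(mean_impute(G)[0, 3])      # about 1.33, the mean of the observed calls
print(knn_impute(G, k=2)[0, 3])  # 2.0

# Optional filtering step: drop markers above a hypothetical 20% missing threshold.
keep = missing_fraction(G) <= 0.2
G_filtered = G[:, keep]
print(G_filtered.shape)          # (4, 2)
```

In this toy matrix, individual 0 matches individuals 1 and 3 exactly at the markers they share, so kNN fills its missing call with their value (2), whereas mean imputation returns about 1.33. The singular value decomposition, random forest regression, and expectation maximization methods named in the abstract are not sketched here.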

Files

Original bundle
Name: Poland-G3-2013.pdf
Size: 1.83 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed to upon submission