Abstract
Predictive breeding has quickly become a powerful tool in plant breeding because it allows a high level of genotypic selection to be applied in non-target environments. Much of the work on predictive breeding, however, has used training data that is highly replicated and keeps the same genotypes and locations over multiple years, for instance the Genomes2Fields project. In commercial breeding programs, selection based on predictive breeding could have the greatest return early in the pipeline, when there is a great deal of genetic diversity. Training models on this early-pipeline data is difficult because genotypes are generally not replicated and locations may not be static over time. The greatest difficulty arising from the lack of replication is separating genotypic effects from environmental effects. This research, conducted through the Purdue Data Mine and Becks Hybrids, uses machine learning and deep learning to develop predictive breeding models that can accurately separate and predict genotypic and environmental effects for novel hybrids in unknown environments. We have developed methods for representing environmental and genotypic data that limit overfitting and emphasize within-location ranking accuracy. This poster presents our findings on the model types and architectures that give the most accurate predictions for unknown genotypes and environments, and details some of the unique challenges of working with a commercial breeding dataset.
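The abstract does not say how within-location ranking accuracy is measured. As a minimal sketch, assuming it is scored as a Spearman rank correlation between observed and predicted values computed separately for each location, the idea could look like the following. The function names, the record layout, and the numbers are hypothetical and are not taken from the poster.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation, or None when either input has no variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def within_location_rank_accuracy(records):
    """Spearman correlation per location.

    records: iterable of (location, observed, predicted) tuples (assumed layout).
    Locations with fewer than two hybrids or no variance are skipped.
    """
    by_location = defaultdict(list)
    for location, observed, predicted in records:
        by_location[location].append((observed, predicted))

    scores = {}
    for location, pairs in by_location.items():
        if len(pairs) < 2:
            continue
        observed = [p[0] for p in pairs]
        predicted = [p[1] for p in pairs]
        # Spearman is the Pearson correlation of the two rank vectors.
        rho = pearson(average_ranks(observed), average_ranks(predicted))
        if rho is not None:
            scores[location] = rho
    return scores


# Made-up yields: location A is ranked correctly despite a large offset
# in predicted level; location B is ranked in reverse order.
records = [
    ("A", 200.0, 150.0), ("A", 210.0, 160.0), ("A", 220.0, 170.0),
    ("B", 120.0, 190.0), ("B", 130.0, 180.0), ("B", 140.0, 170.0),
]
scores = within_location_rank_accuracy(records)
```

Scoring each location separately means a constant error in the predicted environmental level does not affect the result, so the metric reflects only how well hybrids are ordered against each other at the same site.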
Keywords
Predictive Breeding, Deep Learning, Machine Learning
DOI
10.5703/1288284318195
Applied Statistical and Deep Learning Methods for Multi-Environment Genomic Prediction in Maize