Learning from data: Plant breeding applications of machine learning

Alencar Xavier, Purdue University


Increasingly, new sources of data are being incorporated into plant breeding pipelines. Enormous amounts of data from field phenomics and genotyping technologies place data mining and analysis on a completely different level that is challenging from both practical and theoretical standpoints. Intelligent decision-making relies on our ability to extract from data useful information that can help us achieve our goals more efficiently. Many plant breeders, agronomists and geneticists perform analyses without knowing the relevant underlying assumptions, strengths or pitfalls of the methods employed. This study endeavors to assess the statistical learning properties and plant breeding applications of supervised and unsupervised machine learning techniques. A soybean nested association panel (a.k.a. SoyNAM) was the base population for experiments designed in situ and in silico. We used mixed models and Markov random fields to evaluate phenotypic-genotypic-environmental associations among traits and the learning properties of genome-wide prediction methods. Alternative methods of analysis were proposed.




Muir, Purdue University.

Subject Area

Biostatistics|Agriculture|Plant sciences
