Key points
- Genomic selection techniques require more rapid statistical analysis of larger datasets than the current system can handle
- University of Wollongong biometricians are developing a new 'real-time' analysis program to help select better yielding crop varieties
Australian biometricians are set to make history again with the creation of a new real-time analysis program for genomic selection of better-yielding crop varieties.
Genomic selection, which uses DNA markers to predict complex traits - such as yield - is considered the most effective way to speed the process of breeding new varieties. Extensive genomic records are now routinely incorporated into the analysis of already large plant improvement datasets.
It is a massive task and the current statistical analysis package, ASReml, is not able to complete these analyses in a practical timeframe, forcing breeders to use suboptimal approaches for their genomic analyses.
Early innovation
ASReml was cutting-edge technology when it was designed in the early 1990s to handle the uniquely large and complex NSW Crop Variety Evaluation Program wheat dataset.
At the time, existing software was unable to handle complex genotype-by-environment modelling that spanned many years and a large number of individual trials.
To solve this problem, ASReml was developed through an international collaboration between Rothamsted Research (UK) and the NSW Department of Primary Industries.
The system was based on the original Residual Maximum Likelihood (REML) technique developed in the UK by professors Robin Thompson and Desmond Patterson in 1971.
Since ASReml's release in 1996, GRDC investment in the software has:
- yielded a suite of new statistical models;
- developed more efficient computing approaches; and
- supported the training of biometricians and statistical computing scientists.
Today, ASReml is ubiquitous in the analysis of data from GRDC's research programs, including the National Variety Trials, frost and blackleg ratings, and the classification of wheat varieties for expression of late-maturity alpha-amylase. It is also used extensively by Australia's private breeding companies and many international research programs.
But a lot has changed since last century and the genomic datasets being generated by breeding companies are becoming too complex for ASReml to compute in a timely manner.
To overcome these problems and take Australian plant improvement to the next level, GRDC and the University of Wollongong are developing a new system, called MFXLM, as part of the EssCargoT project.
New processes
By employing techniques such as improved memory management and parallel processing, MFXLM can significantly reduce processing time and get results back to breeders in real time.
One of the major drawbacks of many older systems, including ASReml, is that large chunks of computer memory are allocated at the start of the process and only return to the pool once the analysis is complete.
Modern systems are able to allocate and release memory as required, as well as being able to undertake shared memory and distributed memory parallel processing - enabling faster, more efficient analysis.
Contemporary memory management allows MFXLM to handle much larger datasets across several dimensions, such as a larger number of varieties, trials or genetic markers.
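The difference between the two memory patterns can be illustrated with a small Python sketch. It is not MFXLM or ASReml code: the `genotype` function, the marker scores and the block size are all hypothetical stand-ins for a real marker dataset. The first function builds the whole marker matrix before doing any work, while the second works through the same data one block at a time, so earlier blocks can be released as the analysis proceeds.

```python
def genotype(line, marker):
    """Hypothetical marker score (0, 1 or 2) for one breeding line."""
    return (line * 7 + marker * 5) % 3


def upfront_marker_totals(n_lines, n_markers):
    """Older pattern: build the full matrix first and hold it to the end."""
    matrix = [[genotype(i, j) for j in range(n_markers)] for i in range(n_lines)]
    return [sum(column) for column in zip(*matrix)]


def marker_blocks(n_lines, n_markers, block_size):
    """Yield the marker matrix in blocks of at most block_size lines."""
    for start in range(0, n_lines, block_size):
        stop = min(start + block_size, n_lines)
        yield [[genotype(i, j) for j in range(n_markers)] for i in range(start, stop)]


def streamed_marker_totals(n_lines, n_markers, block_size):
    """Allocate-as-needed pattern: only a block or two is alive at any time."""
    totals = [0] * n_markers
    for block in marker_blocks(n_lines, n_markers, block_size):
        for j, column in enumerate(zip(*block)):
            totals[j] += sum(column)
        # The previous block is no longer referenced once the loop moves on,
        # so its memory can be reclaimed before the analysis finishes.
    return totals


# Both approaches give the same per-marker totals.
assert streamed_marker_totals(1000, 20, 64) == upfront_marker_totals(1000, 20)
```

Because the streamed version only ever holds a small part of the data, its peak memory use depends on the block size rather than on the total number of lines, which is the property that lets an analysis scale to more varieties, trials or markers.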
No single solution is optimal in all situations and the system has been built with modularity to leverage existing packages to improve performance. For instance, it can link with existing off-the-shelf programs to solve specific computational challenges for the more intensive components of the mathematical process.
The new software is already being put through its paces in a one-on-one race against ASReml using real genomic datasets provided by Australian Grain Technologies.
Preliminary results are very exciting, with MFXLM achieving a twenty-fold reduction in computing time for the most intensive component of the algorithm, even on modest hardware. The gain is expected to be even greater when parallel processing is exploited across multiple computer nodes.
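A twenty-fold gain in one component does not translate directly into a twenty-fold gain for a whole analysis; the overall effect depends on how much of the run time that component accounts for. The short Python sketch below applies the standard Amdahl's law calculation to show the arithmetic. The 90 per cent share used in the example is purely an assumption for illustration and is not a figure from the project.

```python
def overall_speedup(fraction_accelerated, component_speedup):
    """Amdahl's law: whole-run speedup when only part of the run is faster.

    fraction_accelerated: share of the original run time spent in the
        accelerated component (between 0 and 1).
    component_speedup: how many times faster that component becomes.
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    remaining = 1.0 - fraction_accelerated
    return 1.0 / (remaining + fraction_accelerated / component_speedup)


# Hypothetical case: the intensive component takes 90% of the original run.
print(round(overall_speedup(0.9, 20), 1))  # prints 6.9
```

Under that assumed share, the whole analysis would run about seven times faster, and the closer the intensive component is to the full run time, the closer the overall gain gets to twenty-fold.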
The University of Wollongong has also attracted separate investment from private breeding companies to customise solutions to meet their needs.
GRDC Research Code UW00010
More information: Professor Brian Cullis, 02 4221 5641, bcullis@uow.edu.au