GRDC is investing in innovative computing capabilities to support plant-breeding developments
Modern plant breeding methods, once relatively mechanistic, are changing thanks to monumental shifts in computational and mathematical capacities. Researchers can now take an integrated approach that considers biological complexities.
This amounts to an ability to decode genotype-by-environment-by-management (GxExM) interactions and to predict, with models, their impact on yield and yield stability.
This new capability requires analytical tools that can process vast amounts of data. Collected over decades, the data spans the genetic, physiological, ecological, climate and agronomic spheres. Analysing this amount of data and the interactions also requires skills in statistics and analytics.
Scientists have needed a new language to communicate the breakthroughs that are now occurring. Some are now speaking in terms of a cultivar’s ‘idiotype’ – a term we can add to genotypes (and genomics) and phenotypes (and phenomics). These interactive propensities or tendencies to behave in a particular way – idiotypes – are statistical.
Recognising this need, GRDC invested in research dedicated to supporting the grains industry’s need for statistical and computational skill. This is represented today by the Analytics for the Australian Grains Industry (AAGI) initiative.
AAGI is a strategic partnership with the University of Adelaide, Curtin University and the University of Queensland. It was designed to provide grower-facing analytic technologies and work with GRDC research partners to improve research and development outcomes, especially in pre-breeding work.
Less widely known is that AAGI also undertakes research to develop new analytical tools for agricultural scientists.
Recently, AAGI researchers at the University of Adelaide – biomathematician Dr Mario Fruzangohar and Associate Professor Julian Taylor – announced a breakthrough. Its impact is on the genomics data pillar – the one that attempts to understand the underlying genetic framework of a variety’s genotype and ultimately its statistical contribution to the GxExM effect.
Efficient detection of genetic diversity
The AAGI breakthrough takes the form of a new software tool called CoreDetector. It allows for improved alignment of large and evolutionarily diverse genomes, including bread wheat. CoreDetector increases the efficiency of detecting DNA sequence differences between cultivars, breeding lines or mapping populations.
It is these differences that are statistically linked with crop performance in different environments over time. These differences also reveal ancestry DNA-style relationships (so-called ‘phylogenetic trees’) between organisms in a gene pool.
To get at these insights, scientists were previously limited to breaking down genome data into smaller fragments and only then aligning them to detect genetic differences.
These limitations were due to software constraints. Dr Fruzangohar says that the resulting phylogenetic trees were not sufficiently accurate. The software could also only align two sequences at a time, making larger comparisons cumbersome.
“With CoreDetector, we have extended the ability to align whole genome DNA sequence data for a population, even if that genome is very large – as is the case with bread wheat,” he says. “No other tool can do that.”
Co-developed with PhD student Nicolete Bakaj, the software started out mapping genetic differences within a pool of organisms with a much smaller genome – the pathogens that infect wheat.
“We couldn’t find a tool that could generate a phylogenetic tree of wheat pathogens, so we decided to generate one. We started with pathogens and then extended the software’s capabilities to the evolutionary diversity of much larger genomes,” Dr Fruzangohar says.
The proof-of-concept study came in 2020 when the entire DNA sequences of 14 bread wheat cultivars from around the world were completed. At that time, no means existed to directly align and compare those genomes and detect what made them different from each other. The best option then was to align two at a time and then create a program to merge the output.
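To give a sense of why the two-at-a-time approach was cumbersome, the short Python sketch below counts how many pairwise alignments would be needed to cover every pair among 14 genomes. The cultivar labels are placeholders rather than the actual cultivar names, and the sketch is only arithmetic; it says nothing about how CoreDetector works internally.

```python
from itertools import combinations

# Placeholder labels; the real cultivar names are not listed in this article.
genomes = [f"cultivar_{i}" for i in range(1, 15)]

# Each unordered pair of genomes would need its own pairwise alignment.
pairs = list(combinations(genomes, 2))
n_pairs = len(pairs)

print(n_pairs)  # 14 * 13 / 2 = 91
```

Each of those outputs would then still have to be merged into a single comparison, which is the step a multiple-alignment tool removes.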
“We redid the analysis with CoreDetector and aligned all 14 genomes,” Dr Fruzangohar says.
“What’s more, the complexity associated with comparing 14 very large genomes at once is hidden from the user.”
This amounts to a formidable gain in analytic and bioinformatics capability.
Multiple sequence alignments
Multiple sequence alignments offer numerous benefits.
The software can detect:
- single nucleotide polymorphisms (SNPs), which are the genetic differences mapped by SNP markers that are often used to identify quantitative trait loci (QTLs) during trait discovery work;
- insertions, deletions and inversions of genetic material, which are useful when accounting for trait differences between cultivars; and
- introgressions, which are especially important when bringing in and working with novel genetic diversity from wild relatives.
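As a rough illustration of the first two items in the list above, the Python sketch below scans a tiny, already-aligned set of sequences column by column and flags SNP and insertion/deletion positions. The sequences, line names and the `classify_columns` helper are invented for illustration; this is a toy under the assumption that the alignment has already been done, not CoreDetector's method, and it does not attempt inversions or introgressions.

```python
def classify_columns(aligned):
    """Classify columns of a pre-aligned set of equal-length sequences.

    Returns two lists of 0-based column indices:
    SNP columns (several bases, no gaps) and indel columns (a gap plus a base).
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")

    snps, indels = [], []
    for i, column in enumerate(zip(*aligned.values())):
        symbols = set(column)
        if "-" in symbols:
            # A gap alongside at least one base marks an insertion/deletion.
            if len(symbols) > 1:
                indels.append(i)
        elif len(symbols) > 1:
            # More than one base and no gap marks a SNP.
            snps.append(i)
    return snps, indels


# Hypothetical toy alignment ("-" is a gap); not real wheat data.
toy = {
    "line_A": "ACGTACGTAC",
    "line_B": "ACGTTCGTAC",
    "line_C": "ACG-ACGTAC",
}
snps, indels = classify_columns(toy)
print(snps, indels)  # [4] [3]
```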
Furthermore, Associate Professor Taylor says that CoreDetector is also the precursor to yet another step change in analytics – a tool capable of pan-genome analytics – in which the entire set of genes from all strains within a ‘clade’ (that is, the modern descendants of a common ancestor) is aligned. This would unleash another wave of breeder-friendly information about cultivars and the source of their adaptation over time to different environments.
Work is also getting underway, with Professor Dave Edwards at the University of Western Australia, to replace the command-line, Java-based interface with a more user-friendly graphical interface.
Applications
CoreDetector has proven already that it can solve problems that were previously beyond reach. For example, growers have long known that some wheat cultivars are resistant to orange wheat blossom midge, but nobody knew why. Even when the resistance was mapped to chromosome 2B, nobody could figure out the key genetic difference that accounted for the resistance.
Following multiple sequence alignments with CoreDetector, a ‘copy number’ variation was detected in which a short sequence of DNA is repeated a variable number of times within a gene. This variation then alters the length of a particular domain within the encoded protein (a kinase enzyme). It is the ‘copy number’ that mediates the change from sensitive to resistant. Once a key genetic difference is detected, it is a simple step to produce an analytical DNA marker for easy selection of the correlated trait.
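The idea of a ‘copy number’ variation can be sketched in a few lines of Python: count how many times a short motif is repeated back to back in each line's sequence. The motif, sequences and line names below are invented, and the `max_tandem_copies` helper is hypothetical; the actual repeat in the midge-resistance gene is not given here, and this is not how CoreDetector detected it.

```python
import re


def max_tandem_copies(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be non-empty")
    # Find every uninterrupted run of the motif, then measure the longest run.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


motif = "GCA"  # hypothetical repeat unit
alleles = {
    "line_X": "TTGCAGCATT",        # two copies of the motif
    "line_Y": "TTGCAGCAGCAGCATT",  # four copies of the motif
}
copies = {name: max_tandem_copies(seq, motif) for name, seq in alleles.items()}
print(copies)  # {'line_X': 2, 'line_Y': 4}
```

A difference like this in repeat count is the kind of signal that could then be turned into a DNA marker.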
CoreDetector is a pure pre-breeding enabling tool. It allows us to see things at the genome level that can be applied downstream in the breeding pipeline to improve varieties and make some seriously big gains.
Importantly, Dr Fruzangohar has integrated into CoreDetector a set of additional analytical tools, making the platform highly elastic in terms of its applications. For example, there are tools for pairwise alignment and for calculating the percentage level of divergence between genomes. There are also tools to assemble a newly derived genome sequence. The software is publicly available for free on GitHub.
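One simple way to think about a percentage level of divergence between two aligned sequences is the share of compared positions at which they differ. The Python sketch below uses that definition and skips any column containing a gap; both choices are assumptions made for illustration, as are the toy sequences and the `percent_divergence` helper, and CoreDetector's own calculation may differ.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of gap-free aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumption: columns with a gap are not counted
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * mismatches / compared


# Hypothetical aligned fragments: one mismatch across ten compared positions.
divergence = percent_divergence("ACGTACGTAC", "ACGTTCGTAC")
print(divergence)  # 10.0
```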
More information: Dr Mario Fruzangohar, mario.fruzangohar@adelaide.edu.au