- Machine learning is the next-generation solution for identifying patterns in large datasets
- GRDC is putting machine learning to the test with nine use-cases designed to tackle some of the grain industry's most intractable challenges
- A technical consultation group will help build capacity and capability in machine learning within the grains sector, with links to international experts and other industry sectors
Closing the yield gap, overcoming disease and breeding better varieties - every year GRDC invests in research to solve these and other challenges.
Research is based on collecting data and identifying the trends to provide solutions, but how much data is enough, and can it ever be too much to handle?
Data generation and processing capabilities are increasing exponentially. My first computer was a Mac Plus, with the entire system based on a single 800-kilobyte floppy disk. Today even my mobile phone has 64 gigabytes of storage - 80,000 times as much.
In 2017, precision agriculture researchers from Ohio State University set out to see how much data they could collect about a single stalk of corn. Starting with the tractor in the field and working their way up to thermal imagery collected by plane, they eventually collected 18.5 gigabytes for that single stalk. They estimated the data for a 40-hectare field would be in the order of 60 million gigabytes.
'Big data' is the term now commonly used to describe this explosion of data collected about almost every aspect of our lives - from what we buy in the supermarket, right through to how much was harvested from the dune compared with the swale.
Data has always been the mainstay of research, and statistical analysis that was once done by hand is now handled by skilled biometricians with larger and more complicated computer programs.
However, while the volume of data has grown exponentially, traditional analysis methods have not been able to keep up.
Big data has the potential to solve some of our most intractable problems, but its sheer size has moved beyond the capacity of our traditional methods of analysis.
The next quantum leap in analysis is 'machine learning' and it is increasingly becoming the only way to handle big data.
Machine learning involves using computer science and statistics to analyse very large datasets to identify patterns that are too complex to uncover with traditional human-led analysis.
So-called 'supervised' machine learning algorithms are a powerful method for building predictive computational models of systems with complex inner workings.
These algorithms are trained with known data and known solutions, building computational models that recreate the complex relationships between the two. After training, the model can use new input data to generate powerful predictions (see Figure 1 above).
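The train-then-predict idea described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a single input (growing-season rainfall) and a single known solution (grain yield), with a straight-line model standing in for the far more complex models used in practice. The data values and the function names `fit_line` and `predict` are invented for this example and do not come from any GRDC project.

```python
# Hypothetical training data, invented for illustration only:
# known inputs (growing-season rainfall, mm) and known solutions (yield, t/ha).
rainfall_mm = [150.0, 200.0, 250.0, 300.0, 350.0]
yield_t_ha = [1.4, 2.1, 2.4, 3.1, 3.5]


def fit_line(xs, ys):
    """Train: find the least-squares straight line relating inputs to outputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict: apply the trained model to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(rainfall_mm, yield_t_ha)   # training step
prediction = predict(model, 275.0)          # prediction for a new season
print(f"Predicted yield at 275 mm: {prediction:.2f} t/ha")
```

The same two-step pattern (train on known examples, then predict for new inputs) applies whatever the model, although real use-cases would involve many more inputs and far more data.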
Now, GRDC has invested in nine use-cases to put machine learning to the test with research data that has already been generated (see Figure 2, below).
The projects will tackle subjects as broad as:
- crop genetics;
- crop and soil variability mapping;
- agronomy and farming systems;
- crop monitoring; and
- information delivery.
GRDC manager of data analytics Dr Jeff Cumpston says the new project is about building capacity in machine learning.
"It's an opportunity to build links between the people who have skills in machine learning and grains researchers, who would like to use big data to solve specific problems," he says.
"We are investing in some really creative projects over the next two years, with the aim of identifying opportunities to take our research to the next level."
As part of its investment, GRDC will establish a Machine Learning Technical Consultation Group to help build capacity and capability in machine learning within the grains sector.
Project team members will be joined by Associate Professor Ian Stavness, from the University of Saskatchewan in Canada, and by University of Queensland experts Associate Professor Yoni Nazarathy, Associate Professor Marcus Gallagher, Dr Thomas Taimre and Dr Sally Shrapnel. Together they will peer-review reports and build on each other's expertise in the machine learning field.
This team will also be on hand to provide GRDC with insights into opportunities for investments in new analytics projects.
Associate Professor Stavness specialises in machine learning for analysis of aerial images and remote sensing, and the UQ team will bring a range of complementary skills in data science.
The group will link researchers and GRDC with local and international expertise and draw on the lessons from other industries, such as engineering, logistics and finance, to ensure Australian grain growers realise timely and robust benefits from machine learning.
The University of Western Australia will lead two of the investments.
One, led by Professor Mohammed Bennamoun, will use computer vision for the early detection of crop disease and stress.
The other, led by Professor Dave Edwards, will identify the genetic contributions to stress tolerance in crops.
"This technology is emerging so quickly," Professor Edwards says.
"We need to bring a really broad range of skills together to get the most from machine learning.
"Professor Bennamoun is an expert in computer science, particularly robot vision and image analysis, and my own skills are in crop genetics.
"We will need both these skills and more."
Computer vision, in particular, has the potential to identify crop diseases and other stresses before they become obvious to the human eye.
For instance, drones or small field robots could be fitted with hyperspectral cameras to collect images based on both visible light and a whole spectrum of invisible wavelengths.
"We want to use machine learning to train a model to recognise these diseases and stresses from the hyperspectral images so that we can predict problems before the damage becomes obvious," Professor Edwards says.
"As farm sizes increase, there is huge potential for this sort of automation on-farm.
"We could have small robot scouts in the field to alert growers to the need to spray for disease - or we could map frosted areas and program the harvester to separate frost-damaged grain from undamaged grain during harvest."
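As a rough illustration of what training a model to recognise stress from spectral data might look like, the sketch below uses a nearest-centroid classifier. It assumes each image pixel has been reduced to reflectance in four bands (blue, green, red, near-infrared); real hyperspectral cameras record many more bands. All reflectance values, labels and function names here are hypothetical, and this is not the method the UWA team has said it will use.

```python
import math

# Hypothetical labelled reflectance spectra (blue, green, red, near-infrared).
# Values are invented for illustration; real hyperspectral data has many more bands.
TRAINING = {
    "healthy": [
        [0.04, 0.10, 0.05, 0.50],
        [0.05, 0.11, 0.06, 0.48],
        [0.04, 0.09, 0.05, 0.52],
    ],
    "stressed": [
        [0.06, 0.12, 0.12, 0.35],
        [0.07, 0.13, 0.14, 0.30],
        [0.06, 0.12, 0.13, 0.33],
    ],
}


def centroid(spectra):
    """Average spectrum of one class, band by band."""
    return [sum(band) / len(spectra) for band in zip(*spectra)]


def train(training):
    """Training step: summarise each labelled class by its mean spectrum."""
    return {label: centroid(spectra) for label, spectra in training.items()}


def classify(centroids, spectrum):
    """Assign a new spectrum to the class with the closest mean spectrum."""
    return min(centroids, key=lambda label: math.dist(centroids[label], spectrum))


centroids = train(TRAINING)
label = classify(centroids, [0.05, 0.11, 0.11, 0.36])
print(label)
```

A classifier this simple would not pick up the subtle pre-symptomatic signals the researchers are after; it only shows the shape of the workflow, from labelled spectra to a prediction for a new measurement.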
Overcoming soil constraints
A University of Adelaide project will seek a better understanding of the soil factors that drive grain yield variability and identify opportunities to overcome these constraints.
"Currently, we can identify areas in the paddock that are underperforming, but we want to better understand the cause and work out what we can do about it," says project leader Dr Rhiannon Schilling, who is based at the South Australian Research and Development Institute and leads the agronomy program.
"We are particularly interested in sodic and saline soils and want to use machine learning to analyse existing genetic, soil and environmental data - including mapping layers."
The team's aim is to use machine learning to undertake a more detailed analysis of data already generated as part of GRDC research.
This will include better linking of soil, agronomic and weather data with spatial information, such as remote sensing by the University of Queensland.
The model will be trained on paddock data and tested on research plots to see whether the machine learning algorithms can detect patterns from the research that traditional approaches were unable to identify.
The Agricultural Production Systems sIMulator (APSIM) model will also be used to generate additional scenarios for testing with the machine learning algorithm.
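The idea of training on one dataset and testing on another can be sketched as follows. This is a minimal example, assuming two soil measurements per site (a sodicity figure and a salinity figure) and using a simple k-nearest-neighbour model in place of whatever algorithms the project adopts. The numbers, variable names and the `knn_predict` helper are hypothetical. The point is the workflow: fit on paddock data, score on separate plot data, and compare against a naive baseline.

```python
import math

# Hypothetical data, invented for illustration:
# ((sodicity %, salinity dS/m), yield t/ha)
paddock_data = [
    ((2.0, 1.0), 3.2), ((4.0, 2.0), 2.9), ((6.0, 3.0), 2.5),
    ((8.0, 4.0), 2.1), ((12.0, 6.0), 1.4), ((15.0, 8.0), 0.9),
]
research_plots = [((3.0, 1.5), 3.0), ((10.0, 5.0), 1.8), ((14.0, 7.0), 1.1)]


def knn_predict(train, features, k=2):
    """Predict yield as the mean yield of the k most similar training sites."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    return sum(y for _, y in nearest) / k


def mean_absolute_error(predictions, actuals):
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


actual = [y for _, y in research_plots]

# Model trained on paddock data, evaluated on the separate research plots.
model_preds = [knn_predict(paddock_data, x) for x, _ in research_plots]
mae_model = mean_absolute_error(model_preds, actual)

# Naive baseline: always predict the average paddock yield.
paddock_mean = sum(y for _, y in paddock_data) / len(paddock_data)
mae_baseline = mean_absolute_error([paddock_mean] * len(actual), actual)

print(f"model MAE: {mae_model:.2f} t/ha, baseline MAE: {mae_baseline:.2f} t/ha")
```

Keeping the test data separate from the training data is what makes the error figure meaningful: a model that only memorised the paddock data would score well on it but poorly on the plots.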
"We are bringing a lot of data to the table with 13 different partner organisations, including the Australian Institute for Machine Learning," Dr Schilling says.
"Ultimately, by bringing together agricultural science and data science, we hope to extract the maximum value from historical datasets to improve our understanding of crop and soil variability and better inform paddock management decisions."
"One of the really exciting projects is working towards unlocking the information buried in thousands of GRDC reports and scientific publications," says Dr Cumpston.
"The University of Queensland (UQ), CSIRO, and the Queensland Department of Agriculture and Fisheries propose to build 'AgAsk', a machine learning 'question-and-answer' system that will be able to search through existing publications to provide succinct and relevant evidence-based answers to growers' questions (see Figure 3)."
While traditional search engines, such as Google, use machine learning to help identify the most useful resources on the internet, it is still up to the individual person to visit all of the web pages, review the content and come to their own conclusions.
The project team wants to streamline this process by developing a tool that will search GRDC project reports and scientific publications to provide growers with evidence-based information.
The tool could even be augmented with contextual information about the grower's location, along with weather and soil data, to personalise the proposed solutions.
"We are going to design a conversational agent based on 'question-and-answer' dialogues, like those deployed in public and private customer service settings," says project leader Dr Guido Zuccon from UQ.
"The tool will use natural language processing to understand questions posed by growers and then draw upon machine-learnt text to provide succinct and relevant answers to their questions.
"It is about providing contextualised access to insights into agricultural research. Improving information delivery to growers and advisers has the potential to lead to enormous improvements in profitability for the industry."
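One simple way to picture the retrieval step of a question-and-answer tool is to rank stored passages by how closely their wording matches the question. The sketch below does this with TF-IDF weighting and cosine similarity, a long-established text-retrieval technique. It is not the AgAsk design, which the article does not describe in detail. The passages are short placeholder sentences written for this example, not extracts from GRDC reports, and every function name is hypothetical.

```python
import math
import re
from collections import Counter

# Placeholder passages written for this example (not real report extracts).
PASSAGES = [
    "Gypsum application can improve soil structure on sodic soils.",
    "Stripe rust in wheat can be managed with resistant varieties and fungicide.",
    "Frost damage at flowering can reduce grain yield in wheat and barley.",
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_index(passages):
    """Count words per passage and weight rarer words more heavily (IDF)."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return docs, idf


def vectorize(counts, idf):
    # Words never seen in the passages carry no weight.
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(question, passages):
    """Return the passage whose weighted wording best matches the question."""
    docs, idf = build_index(passages)
    query = vectorize(Counter(tokenize(question)), idf)
    scores = [cosine(query, vectorize(doc, idf)) for doc in docs]
    best = max(range(len(passages)), key=scores.__getitem__)
    return passages[best]


reply = answer("How do I manage stripe rust in my wheat?", PASSAGES)
print(reply)
```

Word matching of this kind cannot interpret a question or compose an answer; the natural language processing and contextual data described by the project team would sit on top of, or replace, a retrieval step like this.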
GRDC's Dr Cumpston says: "These use-cases are an opportunity to take a fresh look at the vast collection of data from our existing and prior investments and dig deeper to identify patterns that may hold the keys to better yields and other intractable challenges."
More information: Dr Jeff Cumpston, 0437 160 693, firstname.lastname@example.org; Professor Mohammed Bennamoun, 08 6488 2715, email@example.com; Professor Dave Edwards, 0423 826 042, firstname.lastname@example.org; Dr Rhiannon Schilling, 0431 469 200, email@example.com; Dr Guido Zuccon, 07 3365 3864, firstname.lastname@example.org