TESTING

AUCKLAND RESEARCHERS ANALYSE DAIRY PROCESSING DATA WITH MACHINE LEARNING

Industrial Information and Control Centre (I2C2) researchers have developed machine learning models that are helping Fonterra to optimise product quality and streamline production processes.

I2C2 is a joint research institute between Auckland University of Technology (AUT) and the University of Auckland, established to improve process simulation and control in New Zealand’s dairy and other export industries.

Among the institute’s industrial partners is Fonterra, the largest producer of milk powder in the country. In a recent project, I2C2 researchers developed machine learning models that are helping Fonterra to optimise product quality and streamline production processes.

Using MATLAB and Statistics and Machine Learning Toolbox, the researchers analysed data collected from several production facilities across New Zealand to predict the functional properties of milk powder from process conditions.

“The breadth of MATLAB is unmatched by other environments we’ve used for statistical analysis,” says David Wilson, co-director of I2C2 and associate professor in the Department of Electrical and Electronic Engineering at AUT. “With MATLAB, we work with huge amounts of information within a single environment without needing to move large datasets from one tool to another.”

Challenge
Milk powder quality is assessed by its chemical composition, such as fat and protein content, and by its physical and functional properties, such as bulk density and solubility. Although chemical composition is relatively well regulated by existing industrial processes, ensuring consistent functional properties has proved more challenging. The plants that produce the powder vary widely in design and age, and often use vastly different process settings. As a result, when a batch of powder is produced with variable quality, determining what went wrong, and exactly when, can be difficult.

Motivated in part by the US Food and Drug Administration’s ‘Quality by Design’ and ‘Process Analytical Technology’ initiatives, I2C2 researchers set out to analyse millions of rows of time-series data from three different processing plants over a six-year period, including temperatures and other logged process variables as well as measured values of physical and functional properties. As collected, the raw data was inconsistent and poorly aligned: there was no common reference between the process measurements and the product values, recording errors and instrument failures had occasionally resulted in missing data, and the time stamps for different datasets were in disparate formats.

Nevertheless, the team needed to use this data to determine the conditions under which a plant was operating when a particular sample was produced. They then needed to identify which abnormal conditions contributed to milk powder of varying quality, and recommend procedures for correcting those conditions. Ideally, corrections would be made while the plant was still in operation, rather than hours or days later when the relevant lab test results became available.

Solution
I2C2 used MATLAB to preprocess and align the data from the milk processing plants, analyse and visualise it, and develop machine learning models capable of predicting the milk powder’s functional properties.

I2C2 researchers loaded process data extracted from Fonterra’s databases into MATLAB. Cleaning and aligning the data involved estimating missing values by interpolation and aligning disparate datasets by interpreting time stamps recorded in multiple formats.

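The article does not describe the team’s MATLAB code for this step. As a rough illustration only, the Python sketch below parses time stamps that may arrive in any of several formats and linearly interpolates a logged process variable at the moment a lab sample was taken, which is one way to give process and product data a common time reference. The three format strings, the readings, and the helper names `parse_timestamp` and `interpolate_at` are hypothetical, not details of Fonterra’s databases.

```python
from datetime import datetime

# Hypothetical time-stamp formats; the real formats in the plant data are not stated.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%Y%m%dT%H%M%S")

def parse_timestamp(text):
    """Try each known format in turn and return a datetime."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")

def interpolate_at(times, values, t):
    """Linearly interpolate a value at time t from time-sorted readings,
    skipping readings that are missing (None)."""
    points = [(ti, v) for ti, v in zip(times, values) if v is not None]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return v0
            frac = (t - t0).total_seconds() / (t1 - t0).total_seconds()
            return v0 + frac * (v1 - v0)
    return None  # t lies outside the logged range; do not extrapolate

# Hypothetical process log (e.g. a temperature) with one reading lost to an instrument failure.
log_times = [parse_timestamp(s) for s in
             ["2019-04-01 10:00:00", "01/04/2019 10:10", "20190401T102000"]]
log_values = [180.0, None, 184.0]

# A lab sample whose time stamp uses another of the known formats.
sample_time = parse_timestamp("01/04/2019 10:15")
print(interpolate_at(log_times, log_values, sample_time))
```

Interpolating only between real readings, and returning `None` outside the logged range, avoids inventing values by extrapolation; a real pipeline would also need to handle time zones and daylight saving, which this sketch ignores.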
Once the team had a clean dataset, they used Statistics and Machine Learning Toolbox to perform multivariate statistical analyses with principal component analysis (PCA) and partial least squares (PLS) regression. They complemented this analysis with 3D histograms, scatter plots, and other graphs to visualise the results and share their findings with Fonterra engineers.

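PCA finds the directions along which correlated process variables vary most. In MATLAB this is available through functions such as `pca` (and `plsregress` for PLS), though the article does not say which functions the team called. The Python sketch below is a minimal stand-in, not the team’s code: it standardises each variable, builds the correlation matrix, and finds the first principal component by power iteration. The readings are invented, and PLS regression is not shown.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loadings, explained_fraction) for the first principal component.

    rows: list of observations, each a list of the same process variables.
    Variables are standardised first so that differences in units do not
    dominate the result. Assumes no variable is constant (zero spread).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]
    # Correlation matrix of the standardised variables.
    corr = [[sum(zi[a] * zi[b] for zi in z) / (n - 1) for b in range(p)]
            for a in range(p)]

    # Power iteration: repeatedly multiply by the matrix and renormalise; the
    # vector converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient gives the eigenvalue, i.e. the variance along v.
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(corr[a][a] for a in range(p))
    return v, eigenvalue / total_variance

# Invented readings: variable 2 tracks variable 1 exactly; variable 3 is unrelated.
readings = [[1, 2, 1], [2, 4, 0], [3, 6, 0], [4, 8, 1]]
loadings, explained = first_principal_component(readings)
print([round(x, 3) for x in loadings], round(explained, 3))
```

Here the two perfectly correlated variables load equally on the first component and the unrelated one does not, so that single direction explains about two thirds of the total standardised variance. With plant data, large loadings point to groups of variables that move together.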
The I2C2 team then implemented more advanced regression models using the least absolute shrinkage and selection operator (LASSO) method and evaluated various machine learning classifiers.

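LASSO adds a penalty on the absolute size of the regression coefficients, which shrinks them and drives the coefficients of uninformative variables to exactly zero, so the model also selects variables. MATLAB provides this as the `lasso` function in Statistics and Machine Learning Toolbox; the Python sketch below is an independent, minimal coordinate-descent implementation of the objective (1/(2n)) Σ residual² + λ Σ |β_j|, not the team’s code. The data and the penalty λ = 0.1 are invented for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma, returning exactly 0.0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso(X, y, lam, sweeps=500):
    """Fit y ~ intercept + X . beta by minimising
    (1/(2n)) * sum(residual**2) + lam * sum(abs(beta_j))
    with cyclic coordinate descent. Returns (intercept, beta).
    Assumes no column of X is constant."""
    n, p = len(X), len(X[0])
    # Centre X and y so the (unpenalised) intercept drops out of the updates.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual when beta = 0
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            for i in range(n):  # keep the residual in sync with beta
                resid[i] -= Xc[i][j] * delta
            beta[j] = new
    intercept = y_mean - sum(x_means[j] * beta[j] for j in range(p))
    return intercept, beta

# Invented data: y depends strongly on the first variable, weakly on the second.
X = [[1, 1], [2, 0], [3, 0], [4, 1]]
y = [3.05, 4.95, 6.95, 9.05]
intercept, beta = lasso(X, y, lam=0.1)
print(intercept, beta)
```

Ordinary least squares fits this data exactly with coefficients 2 and 0.1; with λ = 0.1 the first is shrunk to about 1.92 and the second is set exactly to zero, the variable-selection behaviour that gives LASSO its name. For simplicity the sketch does not standardise the predictors, which is usually done in practice so that the penalty treats variables on different scales equally.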
Initially, the classifiers achieved a prediction accuracy of less than 50%. This was because the training data included only a few instances recorded when milk powder processing parameters varied significantly. While such a low number of instances pleases operations staff, it does not provide sufficient data for model building. To rectify this, the team upsampled the substandard samples in the training data and downsampled the remaining samples.

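Rebalancing of this kind can be sketched in a few lines. The Python below illustrates one common approach and is not the team’s procedure: it keeps every substandard sample and tops them up with random duplicates (upsampling with replacement), then draws an equal number of in-specification samples without replacement (downsampling). The class labels, counts, target size, and the helper name `rebalance` are hypothetical. It is meant for the training data only, so that evaluation still reflects the real, imbalanced mix.

```python
import random

def rebalance(samples, is_substandard, target_per_class, seed=0):
    """Return a training set with target_per_class samples from each class."""
    rng = random.Random(seed)  # fixed seed so the resampling is reproducible
    minority = [s for s in samples if is_substandard(s)]
    majority = [s for s in samples if not is_substandard(s)]
    if not len(minority) <= target_per_class <= len(majority):
        raise ValueError("target must lie between the two class sizes")
    # Upsample: keep all substandard samples, then add random duplicates.
    extra = [rng.choice(minority) for _ in range(target_per_class - len(minority))]
    # Downsample: pick a random subset of the in-spec samples, without repeats.
    kept = rng.sample(majority, target_per_class)
    balanced = minority + extra + kept
    rng.shuffle(balanced)
    return balanced

# Hypothetical training set: 5 substandard batches among 100.
training = [(i, "substandard" if i < 5 else "in_spec") for i in range(100)]
balanced = rebalance(training, lambda s: s[1] == "substandard", target_per_class=40)
print(sum(1 for s in balanced if s[1] == "substandard"), len(balanced))  # prints 40 80
```
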
To improve prediction accuracy, they used the resampled training data to assess other classifier types. With the Classification Learner app, they rapidly evaluated more than 20 classifiers, including

36 APRIL 2019