Assessing cellulose micro/nanofibre morphology using a high throughput fibre analysis device to predict nanopaper performance

Characterising cellulose nanofibre (CNF) morphology has been identified as a grand challenge for the nanocellulose research field. Direct techniques for CNF morphology characterisation exhibit various difficulties related to the material network structure and equipment cost, while indirect techniques that investigate fibre-light interaction, fibre-solvent interaction, fibre-fibre interaction, or specific fibre surface area involve relatively facile methods but may be more unreliable. Nanopaper mechanical testing is a prevalent metric for assessing fibre-fibre interaction, but is an off-line, time-consuming, and destructive methodology. In this study, an optical fibre morphology analyser (MorFi, Techpap) was employed as an on-line, high throughput, fast turnaround tool to assess micro/nanofibre pulp morphology and predict the properties of nanopaper material. Correlation analysis identified fibre content and fibre kink properties as most correlated with nanopaper strength and toughness, while fibre width and coarseness were most inversely correlated with nanopaper performance. Principal component analysis (PCA) was employed to visualise interdependent morphological and mechanical data. Subsequently, two data driven statistical models—multiple linear regression (MLR) and machine learning based support vector regression (SVR)—were established to predict nanopaper properties from fibre morphology data, with SVR generating a more accurate prediction across all nanopaper properties (NRMSE = 0.13–0.33) compared to the MLR model (NRMSE = 0.33–0.51). This study highlights that statistical methods are useful to disentangle and visualise interdependent morphological data from an on-line fibre analysis device, while regression models are also capable of predicting paper mechanical properties from CNF samples even though these devices do not operate at nanoscale resolution.

Introduction
Characterising CNF morphology has been identified as a grand challenge for nanocellulose research (Moon et al. 2011). Across various nanocellulose applications, it is an important parameter for assessment of product quality, quality control, and material safety (Campano et al. 2020). Fibre morphology encompasses the average fibre dimensions (length, width), the relative size distribution of fibre dimensions throughout the sample, fibre aspect ratio, fibre surface area, the degree of fibrillation and branching, fibre hydrodynamic volume (rigidity), and fibre shape (kink, curl, curvature). In turn, nanofibre morphology influences the effective nanofibre surface area, fibre-fibre and fibre-solvent interaction, net surface charge, gel point or networking concentration, and therefore the product quality in terms of its mechanical properties, rheological and colloidal behaviour, hydrophilicity, optical properties, electrical conductivity, film permeability, and material reactivity (Li et al. 2021). Fibre morphology has a well-established impact on the mechanical properties of paper: fibre length strongly affects paper strength, whereas greater fibre width decreases fibre flexibility and conformability, which has a negative impact on paper strength (Seth 1995; Larsson et al. 2018). Fine content improves the strength, smoothness, and optical properties of the final paper (Moral et al. 2010; Motamedian et al. 2019), while decreasing the freeness of the pulp (Dienes et al. 2005).

Analysing cellulose nanofibre morphology
Measuring fibre morphology has been a difficult task in nanocellulose research and development, primarily due to nanocellulose materials existing over a range of length scales, from poorly fibrillated millimetric scale cellulose bundles to micron scale cellulose microfibres (CMF), and down to nanoscale fibrillated cellulose nanofibres (Tanaka et al. 2012; Chinga-Carrasco 2013). This length scale range introduces a challenge for capturing a representative, bulk analysis of fibre morphology. Other challenges for characterising CNF morphology include the interconnected network structure of the material, and the difficulty of observing nanofibres in their native, aqueous suspended state (Haapala et al. 2013).
Both direct and indirect fibre morphology characterisation exhibit difficulties. Direct nanofibre characterisation enables high resolution visualisation of individual particles or aggregates at multiple magnification levels, but is limited in its ability to analyse a representative sample of the material in a timely manner and for fibres across multiple length scales (Legland and Beaugrand 2013). While TEM can resolve fibre widths down to a few nanometres, SEM is only capable of analysing fibre widths greater than 100 nm (Kangas et al. 2014). In addition, the three-dimensional structure and bulk morphology of the sample is disrupted during the drying step required for sample preparation (Peng et al. 2012; Silva et al. 2021). Meanwhile, indirect characterisation involves measuring a derived property of the nanofibre system, such as the fibre-light interaction (DLS, UV-vis transmittance), fibre-solvent interaction (rheology, sedimentation behaviour, water retention capacity), fibre-fibre interaction (nanopaper mechanical properties), or specific surface area (SANS, SAXS, DSC, BET adsorption, solvent relaxation NMR, conductimetric titration). Typically, calculations or models are employed to infer fibre morphology characteristics. These tools generally involve simpler methods of nanofibre characterisation but are often unreliable due to potential inaccuracy and lack of generalisation in the models inferring nanofibre properties. This issue is emphasised in the nanocellulose field, where different biomass sources and processing methodologies can produce material ranging widely in terms of fibre morphology and mechanical properties.
As nanocellulose materials become increasingly commercial, there is strong incentive to shift from laboratory scale characterisation methods of fibre morphology to scalable, fully automated, on-line characterisation systems that are capable of assessing thousands of fibre elements over a relatively short time period (Legland and Beaugrand 2013; Balea et al. 2021).

Fibre analysis tools in nanocellulose literature
Optical fibre analysis devices are a potential solution for this characterisation challenge. These tools have the potential to provide high throughput, fast turnaround analysis of micro and nanofibre pulp morphology. Commercial fibre analysis devices have previously been compared with varying results depending on their fibre analysis algorithm (Guay et al. 2005; Turunen et al. 2005; Hirn and Bauer 2006). One such commercial fibre analysis device is the MorFi (Techpap, France), which has been selected as the fibre analysis system for this study. Although MorFi is designed to analyse fibres in the size range produced within the pulp and paper industry, many studies have been conducted with MorFi as an indirect nanofibre characterisation tool, as shown in Table 1.
Typically, MorFi is used to add another facet of characterisation for CNF pulp, but the fact that it cannot detect nano-scale fibre dimensions is rarely taken into account (Espinosa et al. 2020). Alternatively, MorFi has also been used as a quality control tool to inform when sufficient mechanical processing has been applied to the sample to reach a desired morphology setpoint. Less commonly, MorFi has been used to conduct detailed characterisation of the nanofibre pulp morphology (Lacerda et al. 2013; Rol et al. 2018), and to integrate the results with additional characterisation methods to assess emergent relationships (Rol et al. 2019; Espinosa et al. 2020). This style of investigation is valuable, because information obtained at the fibre population level can greatly aid process development strategies and product quality monitoring (Haapala et al. 2013).

Predicting nanopaper quality
Nanopaper fabrication and testing is one of the predominant methods to characterise cellulose nanofibre quality. It is well established that nanopaper mechanical performance is correlated to the degree of fibrillation and nano-scale morphology of CNF pulp. Although nanopaper fabrication and testing is a relatively facile method for CNF characterisation (TAPPI 2006), it is necessarily off-line, time-consuming, and destructive (Aguado et al. 2016). In addition, nanopaper mechanical characterisation can suffer from low precision due to defects introduced during the sample cutting procedure (Hervy et al. 2017). As such, a model that is capable of predicting nanopaper performance from predictor variables, such as fibre morphology data, is advantageous for CNF research and development (García-Gonzalo et al. 2016). In addition, conventional CNF characterisation typically involves off-line measurement techniques performed in a laboratory setting, which generally require high capital investment, highly qualified personnel for device operation, and involve complex and time-consuming post-processing and analysis of data (Balea et al. 2021). Alternatively, a fibre morphology analysis device could be employed as an on-line quality control tool for continuous CNF processing, with its simple operation, fast analysis, and low cost encouraging the commercialisation of CNF products (Aguado et al. 2016). While the integration of fibre analysis tools with pulp and paper production would greatly benefit CNF commercialisation, it has only been addressed in a handful of previous studies (Pande and Roy 1998; Oluwafemi and Sotannde 2007; Lin et al. 2014; Nasser et al. 2015; Aguado et al. 2016). To address this gap in the literature, this study investigates the relationships between fibre morphology and nanopaper properties for a broad sample population consisting of different varieties and sections of sorghum biomass, a globally important agricultural crop (Borrell et al. 2021), processed under 3 different energy levels. A statistical methodology has been undertaken, including correlation analysis, principal component analysis, and regression modelling, to assess the relationships between fibre morphology parameters and nanopaper mechanical properties, and to predict nanopaper performance from fibre morphology data.

Biomass preparation & CNF production
Biomass was cut into approximately 5 cm lengths, washed three times in distilled water at 80°C for approximately 30 min, and subsequently dried in a convection oven at ~55°C for 3 days. Dried biomass was ground using a Retsch SM300 mill (Retsch GmbH, Germany) at 3000 rpm with a 1 mm trapezoidal mesh screen. Oversized material was separated from the ground biomass with a 0.71 mm aperture sieve. Ground biomass was dispersed and stirred overnight at ~350 rpm in deionised water at a solid ratio of 1:20 (20 g of water for every 1 g of ground biomass). Chemical pretreatment was performed using a 2% NaOH solution (w/v) at 80°C for 2 h, stirred at ~350 rpm. NaOH treated (delignified) pulp was separated from the waste liquor through a fine mesh sieve (53 µm aperture) and rinsed extensively until the filtrate pH was below 8. The delignified pulp suspension was diluted to a setpoint of 0.5% (w/v) using a Mettler Toledo moisture analyser.

Fibre morphology analysis
10 g of CNF suspension at 0.5% (w/v) was randomly sampled and added to approximately 1 L of water for fibre analysis using the MorFi Compact analyser (Techpap, France) equipped with a CCD video camera, a high magnification optical flow cell and MorFi R.10.07 automatic analysis software. According to the default settings of the MorFi device, Fibres are classified as elements with a length between 100 µm and 10 mm and a width between 5 and 75 µm, while Fine elements have a length between 5 and 100 µm and a width < 5 µm. Material below the optical resolution of this device (length < 5 µm) was not captured in the analysis. Four technical replicates were run for each sample. The MorFi parameters are outlined in Table 2.
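The size classes quoted above can be expressed as a small rule set. The following Python sketch is illustrative only: the function names are hypothetical, the treatment of values that fall exactly on a boundary is an assumption, and it does not reproduce the MorFi software's own image analysis.

```python
# Illustrative classifier for the default MorFi size classes quoted in the text.
# Inclusive/exclusive handling of the boundaries is an assumption of this sketch.

def classify_element(length_um: float, width_um: float) -> str:
    """Return 'fibre', 'fine', 'undetected' or 'unclassified' for one element."""
    if length_um < 5:
        return "undetected"      # below the stated optical resolution
    if 100 <= length_um <= 10_000 and 5 <= width_um <= 75:
        return "fibre"           # 100 um to 10 mm long, 5 to 75 um wide
    if length_um < 100 and width_um < 5:
        return "fine"            # 5 to 100 um long, narrower than 5 um
    return "unclassified"        # outside both stated windows


def replicate_mean(values: list[float]) -> float:
    """Average the technical replicates recorded for one MorFi parameter."""
    return sum(values) / len(values)


print(classify_element(850.0, 22.0))   # fibre
print(classify_element(40.0, 3.0))     # fine
print(classify_element(2.0, 1.0))      # undetected
```

Elements that satisfy neither window (for example, short but wide particles) are returned as "unclassified" here; how the instrument treats such elements is not stated in the text.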

Nanopaper mechanical properties
Nanopaper was prepared from CNF pulp according to our previous work (Pennells et al. 2021), as described in brief below. CNF pulp was dewatered and dried into a nanopaper handsheet using an automatic Rapid Köthen handsheet former (Xell, Austria) according to the ISO 5269-2:2004 standard operating procedure (International Organization for Standardization 2001). First, the CNF suspension was dewatered under -0.7 kPa vacuum until sufficiently dry, as assessed with a P-2000 handheld moisture analyser (Delmhorst, USA). Subsequently, the wet cake was dried under vacuum at 110°C for 20 min. Nanopaper handsheets were conditioned in the laboratory (25°C, 50-75% RH) for at least 1 day prior to mechanical testing. Up to eight rectangular nanopaper strips (L = 150 mm, W = 15 mm) were cut out from each handsheet, and up to two handsheets were prepared for each sample. Tensile properties of each nanopaper strip were measured using an Instron model 5543 universal testing machine (Instron Pty Ltd., Melbourne, Australia) equipped with a 500 N load cell. Tensile index was calculated with Eq. 1, where $\sigma_T^w$ is the tensile index or specific tensile strength per unit weight in N·m·g⁻¹, UTS is the ultimate tensile strength in Pa, and $\rho_n$ is the nanopaper density in kg·m⁻³.
Toughness was calculated as a numerical approximation of the energy absorbed by the nanopaper strip, according to Eq. 2, where $U_T$ is the toughness in MJ·m⁻³, 0 is the zero-strain starting point, and $\varepsilon_f$ is the nanopaper failure strain.
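Eq. 1 and Eq. 2 are not reproduced in this text, so the sketch below assumes their conventional forms: tensile index as ultimate tensile strength divided by density, and toughness as the area under the stress-strain curve from zero strain to failure, approximated with the trapezoidal rule. The function names and the unit conversion to N·m·g⁻¹ are assumptions made for illustration.

```python
def tensile_index(uts_pa: float, density_kg_m3: float) -> float:
    """Specific tensile strength in N·m/g (assumed form: UTS / density)."""
    # Pa / (kg/m^3) gives N·m/kg; dividing by 1000 expresses it per gram.
    return uts_pa / density_kg_m3 / 1000.0


def toughness(strain: list[float], stress_mpa: list[float]) -> float:
    """Trapezoidal area under the stress-strain curve up to failure, in MJ/m^3.

    Strain must be dimensionless (mm/mm, not percent) and stress in MPa,
    so that MPa x strain = MJ/m^3.
    """
    area = 0.0
    for i in range(1, len(strain)):
        mean_stress = 0.5 * (stress_mpa[i] + stress_mpa[i - 1])
        area += mean_stress * (strain[i] - strain[i - 1])
    return area


# Hypothetical strip: 100 MPa strength at a density of 1000 kg/m^3
print(tensile_index(100e6, 1000.0))
# Hypothetical linear stress-strain curve failing at 2% strain and 100 MPa
print(toughness([0.0, 0.01, 0.02], [0.0, 50.0, 100.0]))
```

The last recorded strain value plays the role of the failure strain, so the curve passed in should be truncated at the point of failure.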
Statistical methodology

Correlation analysis
Following the generation and compilation of fibre morphology data, for which the four technical replicates of each sample were averaged, two Pearson's correlation matrices were built to assess the relationships between fibre morphology parameters and nanopaper mechanical properties for all sorghum varieties, sections, and energy levels. The two correlation matrices included: (1) the correlation between each fibre morphology parameter, and (2) the correlation between each fibre morphology parameter and each of the four nanopaper mechanical properties.
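As a minimal sketch of this step, the Python below averages technical replicates and computes a Pearson correlation coefficient between one morphology parameter and one nanopaper property. All numbers are hypothetical and the helper names are not taken from the study; the original analysis software for this step is not stated in the text.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: four technical replicates of mean fibre width per sample
fibre_width_replicates = [
    [18.1, 18.4, 17.9, 18.2],
    [16.0, 16.3, 15.8, 16.1],
    [14.2, 14.0, 14.5, 14.1],
]
fibre_width = [mean(reps) for reps in fibre_width_replicates]

# Hypothetical tensile index values for the same three samples
tensile_index = [35.0, 52.0, 71.0]

print(round(pearson_r(fibre_width, tensile_index), 3))
```

Repeating `pearson_r` over every pair of columns fills the two matrices described above: parameter against parameter, and parameter against nanopaper property.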

Principal component analysis
Principal component analysis (PCA) was performed using the prcomp command of the R statistical ...

... where $O_i$ and $P_i$ represent the observed and predicted values for each sample (of size $n$) (Ritter and Muñoz-Carpena 2013). RMSE was normalised by the mean value of each nanopaper metric.
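The RMSE expression itself is not reproduced in this text. The sketch below assumes the standard definition, the square root of the mean squared difference between observed and predicted values, and assumes that the normalising mean is the mean of the observed values for the metric in question.

```python
import math


def rmse(observed, predicted):
    """Root mean square error over n paired observed/predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nrmse(observed, predicted):
    """RMSE normalised by the mean observed value (assumed normalisation)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


# Hypothetical observed and predicted values for one nanopaper metric
observed = [2.0, 4.0, 6.0]
predicted = [1.0, 4.0, 7.0]
print(round(rmse(observed, predicted), 4), round(nrmse(observed, predicted), 4))
```

Because NRMSE is dimensionless, it allows prediction error to be compared across metrics with different units, such as tensile index and Young's modulus.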

Model validation
For model validation, an additional series of CNF suspensions was prepared through HPH processing over an extended mechanical energy series. The biomass sample used for this analysis was the Sugargraze variety with all sections combined in equal proportions. CNF pulp was prepared according to the mechanical processing conditions outlined in Table 3. Nanopaper handsheets were fabricated from the prepared CNF pulp and tested in accordance with the previously described experimental methodology. A graphical overview of the experimental methodology and statistical analysis employed in this study is presented in the Graphical Abstract.

Fibre Morphology Correlations
Initially, a correlation matrix detailing the relationship between each pair of fibre morphology parameters was calculated, as displayed in Fig. 1. Using a significance threshold of x ≥ 0.7, a total of 27 of the 210 pairwise correlations were considered significantly correlated.
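The count of 210 pairwise correlations is consistent with 21 morphology parameters, since 21 × 20 / 2 = 210. The sketch below checks that arithmetic and shows one way to list pairs that meet the threshold. Applying the threshold to the absolute value of the coefficient is an assumption, and the three correlation values are hypothetical.

```python
from itertools import combinations
from math import comb


def strong_pairs(corr, names, threshold=0.7):
    """Return (a, b, r) for every unordered pair with |r| >= threshold."""
    return [
        (a, b, corr[a][b])
        for a, b in combinations(names, 2)
        if abs(corr[a][b]) >= threshold
    ]


# Number of unordered pairs among 21 parameters
print(comb(21, 2))  # 210

# Hypothetical upper-triangle correlations for three MorFi parameters
names = ["fibre_W", "fibre_coarse", "fibre_cont"]
corr = {
    "fibre_W": {"fibre_coarse": 0.85, "fibre_cont": -0.78},
    "fibre_coarse": {"fibre_cont": -0.40},
}
for a, b, r in strong_pairs(corr, names):
    print(a, b, r)
```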
Relationships of interest included the positive correlation between fibre width (fibre_W), fibre coarseness (fibre_coarse), fibre curl index (fibre_curl), macrofibrillation index (MF.index), and broken fibre content (fibre_broken), the inverse correlation between these parameters and fibre content (fibre_cont), and the inverse correlation between kinked fibre content (fibre_kink.cont) and mean fine area (fine_A) and length (fine_L). Following the analysis of pairwise correlations in Fig. 1 ...

Fig. 2 Principal Component Analyses for: (a) the association between nanopaper properties grouped by HPH energy level; and (b) the association between fibre morphology parameters grouped by HPH energy level

... for a single nanopaper property was the inverse correlation between Young's modulus and mean fibre width (-0.8). Other inverse correlations between fibre morphology parameters and nanopaper properties include fibre coarseness (fibre_coarse) for all nanopaper metrics besides strain at break (-0.51 to -0.8) and mean fine length (fine_L) and mean fine area (fine_A) for Young's modulus (-0.69 and -0.71, respectively).

Principal component analysis (PCA)
Following the correlation analysis, a series of PCAs was run to visualise the grouping and variance of fibre morphology data across the entire sample population in a reduced dimensionality format. Initially, a PCA was conducted on the nanopaper mechanical property data across all biomass samples and energy levels (Fig. 2a). The results from the nanopaper PCA demonstrate that the increase in mechanical processing energy from low to high shifted the data along the first principal component, which explains 61.3% of the overall variance. This shift was closely matched by the shift in nanopaper density and Young's modulus, and to a lesser extent tensile index (TI), toughness and strain at break. This confirms the existing notion that energy level has a strong positive correlation with these nanopaper material properties. Subsequently, a PCA was conducted on the fibre morphology data across all biomass samples and energy levels (Fig. 2b). The results from the fibre morphology PCA demonstrate the fibre morphology parameters that are correlated and inversely correlated with the mechanical processing energy (along PC1), which are known to relate to the nanopaper properties from Fig. 2a. Fibre kink properties and fibre content are positively associated with processing energy, and therefore nanopaper performance, while parameters such as fibre width, broken fibre content, and fibre coarseness are inversely associated with processing energy and nanopaper performance. Fibre morphology parameters that correlate to PC2 include the fine number and the fine content weighted by area, length, and length-weighted length, respectively. The variance in these parameters is more closely associated with different plant sections, specifically the leaf section, as demonstrated in the Supplementary Material. The first two principal components for the fibre morphology PCA explain 35.6% and 22.8% of the total variance, respectively.
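To illustrate what the "variance explained" by the first principal component means, the sketch below standardises a small, hypothetical data table and extracts PC1 by power iteration on its correlation matrix. This is a simplified stand-in rather than the study's procedure: the study used R's prcomp, whereas the function names, the data, and the choice to scale each variable to unit variance are assumptions made here.

```python
import math


def standardise(rows):
    """Centre each column and scale it to unit sample standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def covariance(centred):
    """Sample covariance matrix of rows that are already centred."""
    n, p = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=1000):
    """Power iteration: return (PC1 loadings, share of total variance)."""
    p = len(cov)
    v = [1.0 / (k + 1) for k in range(p)]  # asymmetric start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue / sum(cov[i][i] for i in range(p))


# Hypothetical samples (rows) by three morphology parameters (columns)
data = [[18.2, 0.31, 4.1], [16.9, 0.28, 5.0], [15.1, 0.25, 6.3],
        [14.0, 0.22, 7.2], [13.2, 0.21, 7.9]]
loadings, share = first_component(covariance(standardise(data)))
print([round(x, 2) for x in loadings], round(share, 3))
```

The sign of each loading indicates whether a parameter increases or decreases along PC1, which is the information conveyed by arrow direction in a biplot such as Fig. 2.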

Nanopaper regression models
Prediction of nanopaper mechanical properties from fibre morphology data was performed using two regression modelling techniques: Multiple Linear Regression (MLR) and Support Vector Regression (SVR). For both regression models, fitting of the fibre morphology parameters was assessed for each nanopaper metric based on their R² and NRMSE values, as seen in Table 5. Based on the MLR and SVR model outputs, it can be concluded that the given MorFi data best explains nanopaper performance in terms of tensile index (NRMSE of 0.4 and 0.23, respectively) and Young's modulus (NRMSE of 0.33 and 0.13, respectively). For the given dataset, both the MLR and SVR model outputs demonstrate a high level of accuracy for predicting nanopaper properties from fibre morphology data across all four nanopaper metrics (R² > 0.88). A full overview of the MLR and SVR regression coefficients and parameters is provided in the Supplementary Material.
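As a minimal sketch of the MLR step, the code below fits an ordinary least squares model through the normal equations and reports R². The data are hypothetical and constructed to follow an exact linear rule, and the function names are illustrative; this is not the study's implementation. SVR is not shown because it relies on kernel optimisation that is better taken from an established library.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, b1, ...]."""
    A = [[1.0] + list(row) for row in X]          # prepend intercept column
    p = len(A[0])
    # Augmented matrix [A'A | A'y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coeffs, row):
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


def r_squared(observed, predicted):
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictors (two morphology parameters) and response y = 1 + 2*x1 + 3*x2
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
coeffs = fit_mlr(X, y)
fitted = [predict(coeffs, row) for row in X]
print([round(c, 6) for c in coeffs], round(r_squared(y, fitted), 6))
```

With many strongly correlated predictors, as reported for the MorFi parameters, the normal equations become ill-conditioned, so a library routine based on QR or SVD decomposition would be the safer choice in practice.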
However, as the SVR model demonstrated the higher accuracy for predicting nanopaper properties, it was selected for further investigation in the subsequent analyses. Figure 3 portrays the measured and SVR predicted nanopaper tensile index, which demonstrates the high level of accuracy for this model for all sorghum samples across different varieties, sections, and energy levels.
Considering the impact of the mechanical energy level on the accuracy of the regression model predictions, Table 6 demonstrates that medium and high energy samples had the lowest NRMSE values across the majority of nanopaper metrics (excluding toughness), indicating a higher fibre morphology to nanopaper prediction capability when higher processing energy was applied. For tensile index and Young's modulus, which were the metrics that the SVR model most accurately predicted in the previous section, low energy samples had the highest NRMSE across the energy series, while the high energy samples for these metrics were the most accurately predicted out of all energy levels, nanopaper metrics, and model types.

Validation of model predictions
Following the high level of accuracy achieved for the fibre morphology to nanopaper SVR model, experimental data for an additional HPH validation series was collected to test the fibre morphology to nanopaper model predictions within a new sample population. The HPH validation series extended the processing energy input for CNF production to range from a minimum of one pass at 200 bar, to a maximum of 3 passes at 1100 bar. The validation series was performed on an aggregated biomass sample of the Sugargraze variety with all sections combined in equal proportions. The predicted values for each of the four nanopaper metrics, based on the HPH validation series fibre morphology data and the previously established SVR model, were compared to the actual mechanical property results collected from HPH validation series nanopaper samples, to test the degree of overfitting of the initial SVR model. As seen in Table 7, the SVR model is once again more accurate than the MLR model for all nanopaper metrics besides tensile index, with Young's modulus again demonstrated as the most accurate metric for the SVR model. However, the accuracy of the validation series models was significantly lower than the original regression models, with

Influence of kinked fibres
The relationship between fibre kink and paper properties has rarely been described and has not been extensively elucidated in the literature thus far (Leopold and Thorpe 1968; Guangsheng et al. 2012; Sood and Sharma 2021). For tissue paper applications, the presence of kinked fibres increases material porosity and surface roughness, but negatively impacts the paper density and inter-fibre bond strength (Morais et al. 2021). However, in the case of a cellulose nanofibre system, fibre bundles that have undergone partial microfibrillation through homogenisation may be interpreted as fibre kinks due to limitations in optical resolution, which would be associated with an increase in nanopaper strength. Alternatively, fibre kinks have previously been described as deformations induced by mechanical stress rather than by chemical pulping (Aguado et al. 2016). Therefore, their association with nanopaper performance could be related to the increasing energy applied to the fibre bundles over the HPH processing series, which induced fibre deformation.

Influence of fibre width and coarseness
Unsurprisingly, fibre width and coarseness are inversely correlated with all facets of nanopaper performance, such that less fibrillated materials with higher average fibre width yield lower performing nanopaper. Fibres with a larger width decrease the degree of fibre collapse during paper formation, impacting paper density and strength, in addition to reducing water retention properties (Morais et al. 2021). In addition, the higher size and rigidity of coarse fibres decreases the number of contact points and the bonding strength between fibres. Correspondingly, the wet strength of paper has previously been shown to be inversely proportional to the square of the fibre coarseness (Seth 1995).

Influence of fibre content
In the context of a cellulose nanofibre suspension, fibre content can be considered a proxy for the degree of fibrillation, such that a higher number of distinct fibres within a gram of material implies the disintegration of larger aggregated fibre bundles into smaller fibre structures. Fibre content only relates to the detectable cellulose microfibres (CMF), as a substantial fraction of the total fibre content has been liberated into smaller nanofibres through mechanical processing. Nanofibres that exist below the theoretical detection limit of the MorFi device (Fine element length < 5 µm) are not accounted for in this parameter (Di Giuseppe et al. 2016). To take fibre content as a reliable proxy for the degree of nanofibrillation, an assumption must be made that the shift in micron-scale fibre content with increased processing energy is mirrored in magnitude by the shift in the nanofibre content, which itself is unable to be measured directly with precision and reliability. Many confounders are present in this parameter. Firstly, a significant fraction of the nanofibre population within the sample is hidden from detection (Morais et al. 2021). In addition, with the increase in mechanical processing, the content of micron-scale CMF will first increase in number as cellulose bundles are disrupted and partially fibrillated, but subsequently decrease once they are sufficiently fibrillated below the MorFi detection limit, creating a non-linear trend with increasing mechanical energy. In summary, it is impossible to know the true fibre content (percentage of fibres between 100 µm and 10 mm in length) or fine content of any sample using the MorFi device. However, fibre content still appears to be a promising proxy for the degree of nanofibrillation due to the relatively strong correlation with nanopaper mechanical properties, as demonstrated in Fig. 2a and Table 4.
In a broad sense, this challenge of accurately characterising the true nanofibre content of CNF pulp is pervasive across nanocellulose research (Foster et al. 2018). High resolution microscopy is unreliable due to previously discussed drawbacks such as the analysis of representative samples, the greater length scale of CNF material than the observation window, sufficient image quality, and the time-consuming postprocessing and analysis of images. Fractionation methods such as mechanical screening (Tanaka et al. 2012), gravimetric centrifugation (Ahola et al. 2008), and tube flow fractionation (Haapala et al. 2013) can assess fibre size distribution over multiple length scales, but are limited by their minimum size range for analysis and time-consuming operation. Fractionation and flow cytometry analyses are promising techniques for population level analysis of fibre size distribution, and have the potential to be used as a process quality monitoring and development tool. However, they are excluded from nano-scale particle analysis due to the limited particle size recognition range (Haapala et al. 2013). The MorFi device fits in a similar category to flow cytometry analysis as a potential on-line, high throughput process quality tool for monitoring micron-scale particles within CMFs or CNFs. While optical fibre analysis does not provide true quantitative information on the fibre dimensions and morphology of the material, it may provide valuable insights into the status of the micron scale sub-region of the material at the population level, which can be used to comparatively assess shifts in morphology across different source materials or mechanical processing levels.

Influence of random variation
Across the different facets of nanopaper performance, fibre morphology parameters did not correlate as strongly with nanopaper strain at break as they did with other mechanical properties. A hypothesis to explain this is the impact of non-fibre related factors on the strain at break value. A number of microfractures are expected to be imparted to the edges of some nanopaper strips during the sample cutting procedure, which could be a random process or associated with the biochemical composition and rigidity of the nanopaper handsheet itself (Pennells et al. 2021). As such, the number and size of microfractures imparted to the nanopaper strip would disproportionately impact the strain at break result for the tested nanopaper strip.

PCA for population level fibre analysis
PCA has previously been employed to visualise and assess the properties that influence fibre quality (Legland and Beaugrand 2013; García-Gonzalo et al. 2016; Desmaisons et al. 2017). In the case of Desmaisons et al., a PCA methodology was employed to reduce the number of relevant parameters required for subsequent multivariate linear regression (Desmaisons et al. 2017). In the case of Legland and Beaugrand, a PCA methodology was employed to identify highly correlated variables and variable clusters within the fibre morphology data to eliminate redundant variables. This methodology was extended to identify groups of variables that cluster together to generate a hierarchical clustering dendrogram that delineated fibre morphology features based on size, elongation, and tortuosity. This methodology allows for the high resolution morphological characterisation of a diverse fibre population (Legland and Beaugrand 2013). Lastly, in the case of García-Gonzalo et al., a PCA methodology was employed to cluster together different paper properties and the effect of different biomass sources on paper properties (García-Gonzalo et al. 2016).
In this study, PCA was employed to assess and visualise fibre morphology and nanopaper properties from a large population of CNF samples across different biomass types and processing energy levels. Each point represents an individual sample replicate characterised by MorFi fibre morphology or nanopaper mechanical performance. The PCA ellipses represent the region defined by the 68% normal probability (Prager et al. 2020). The normal probability definition of the ellipse can be adjusted to generate a more or less rigorous ellipse visualisation. Arrows represent the fibre morphology parameters or nanopaper properties determined through CNF pulp and nanopaper characterisation, respectively. The arrow direction represents the correlation between the fibre morphology parameter/nanopaper property and the principal component, and the arrow length represents the strength of the relationship between the parameter/property and the principal component. The strength of this methodology is the visualisation of an array of fibre morphology data at the population level on a single plot, with the elucidation of biomass and processing factors that are associated with different morphology parameters and nanopaper properties through the grouping with probability ellipses.
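One way to construct such a probability ellipse is sketched below, under the assumption that the PC1/PC2 scores of a group follow a bivariate normal distribution. In that case the contour enclosing probability p has a squared Mahalanobis radius of -2 ln(1 - p), which is the chi-square quantile with two degrees of freedom. The function names and data points are hypothetical, and plotting packages may instead use a t-distribution or other corrections.

```python
import math


def confidence_scale(level=0.68):
    """Mahalanobis radius of the bivariate normal contour enclosing `level`."""
    return math.sqrt(-2.0 * math.log(1.0 - level))


def probability_ellipse(points, level=0.68):
    """Return (centre, semi-major axis, semi-minor axis, rotation in radians)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of the 2x2 covariance matrix
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    lam_major, lam_minor = half_trace + root, max(half_trace - root, 0.0)
    scale = confidence_scale(level)
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # direction of the major axis
    return (mx, my), scale * math.sqrt(lam_major), scale * math.sqrt(lam_minor), angle


# Hypothetical PC1/PC2 scores for one energy-level group
scores = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
centre, major, minor, angle = probability_ellipse(scores)
print(centre, round(major, 3), round(minor, 3), round(angle, 3))
```

Raising `level` towards 0.95 widens the ellipse, which corresponds to the adjustment of the normal probability definition mentioned above.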

Predicting nanopaper properties
The overarching goal of this publication is to analyse whether fibre morphology generated from an optical fibre analysis device can be used to predict the quality of cellulose nanofibres in aqueous suspension, without having to fabricate and test nanopaper samples. Achieving this goal would provide substantial benefit for industrial processing of nanocellulose, as this would allow for the adoption of an on-line, fast turnaround quality control tool and save time from the fabrication and testing of nanopaper samples. This goal was addressed by analysing fibre morphology and nanopaper mechanical property data using two modelling techniques: Multiple Linear Regression (MLR) and Support Vector Regression (SVR). All fibre morphology parameters generated by the MorFi device were included in the MLR model, as no additional effort is required to gather all data outputs when running this fibre analysis. However, this approach has the potential to lead to model overfitting, which was assessed through model validation. The NRMSE values were 4.1 to 7.5 times higher for the validation series than for the original data series, which indicates that the SVR model was somewhat overfit to the original sample population. The inclusion of all fibre morphology parameters is a potential explanation for the result, with the exclusion of non-significant parameters expected to reduce the degree of model overfitting. Subsequent work will investigate the adjusted R² of the model as an indicator of sufficient parameter inclusion. An additional factor that may reduce model overfitting is further optimisation of SVR hyperparameters.

Effect of energy level on nanopaper predictions
It is well established that the level of energy applied during mechanical processing of biomass into CNFs influences the fibre morphology and mechanical properties of the resulting materials, as demonstrated by PCA visualisation in Fig. 2. Therefore, it is important to assess the effect of processing energy level on the accuracy of nanopaper prediction models. The results in Table 6 demonstrate that low energy samples had the highest NRMSE values, which indicates that the prediction of nanopaper properties from fibre morphology was less accurate for these samples. This result was somewhat unexpected, considering that the MorFi device is attuned to analysing micro-scale CMF that are more likely to be present in higher proportions at low energy conditions. The higher the mechanical energy level, the more likely it is that microfibres are deconstructed to nano-sized fibres that are outside the detection limit of the device. On the other hand, the nanopaper mechanical properties showed a wider spread between biomass samples at low energy conditions. Homogenisation at the high energy level led to a more uniform data distribution between biomass samples, which allowed the regression models to predict nanopaper properties more accurately.
To visualise the accuracy of the SVR model for the validation series and further assess the effect of energy level on nanopaper predictions, measured nanopaper values were compared to the SVR estimated values for the validation series (Fig. 4). Firstly, these results demonstrate that the medium processing energy region is most accurately predicted in the validation series, while the samples at the low and high energy extremes were less accurately predicted. This relationship held for both tensile index and toughness, the latter of which demonstrated a lower prediction accuracy for medium energy samples earlier in Table 6. This suggests that the applicability of the model is lessened when the processing energy conditions are broadened, indicating some degree of overfitting of the SVR model to the specific processing energy conditions of the initial dataset.
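A comparison of prediction error by energy level, of the kind discussed above, could be computed along the following lines. This is a sketch under stated assumptions: the error in each group is normalised by the range of measured values across all groups so that the groups share a common scale, which may differ from the normalisation used for Table 6. The function name `nrmse_by_group`, the energy labels, and the tensile index values are hypothetical.

```python
import math
from collections import defaultdict

def nrmse_by_group(records, value_range):
    """Per-group RMSE divided by a common range.

    records: iterable of (group_label, measured, predicted) tuples.
    """
    squared_errors = defaultdict(list)
    for label, measured, predicted in records:
        squared_errors[label].append((measured - predicted) ** 2)
    return {
        label: math.sqrt(sum(errs) / len(errs)) / value_range
        for label, errs in squared_errors.items()
    }

# Hypothetical measured and predicted tensile index values (illustrative only).
records = [
    ("low", 50.0, 62.0), ("low", 58.0, 49.0),
    ("medium", 85.0, 83.0), ("medium", 90.0, 92.0),
    ("high", 110.0, 101.0), ("high", 118.0, 126.0),
]
measured_all = [measured for _, measured, _ in records]
errors = nrmse_by_group(records, max(measured_all) - min(measured_all))

# Energy levels ordered from least to most accurately predicted.
worst_first = sorted(errors, key=errors.get, reverse=True)
```

Using a common range avoids penalising a group simply because its measured values happen to be tightly clustered, which matters here because the spread of nanopaper properties differs between energy levels.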

Conclusions
To address the challenges associated with scalable CNF characterisation, this study investigated the relationships between fibre morphology and nanopaper properties for a broad sample population of sorghum biomass. Important relationships elucidated through correlation analysis included the positive correlation of fibre content and fibre kink with nanopaper properties, and the inverse correlation of fibre width and coarseness with nanopaper properties. Regression modelling of the fibre morphology to predict nanopaper properties demonstrated superior predictive power for the machine learning based support vector regression (SVR) model. The SVR model was further validated with the replication dataset over an extended processing energy range, which yielded a lower prediction accuracy than for the original dataset and implied some degree of model overfitting. This study constitutes a platform for future investigation targeted at predicting nanopaper mechanical properties from the morphological properties of CNF pulp, with a focus on improving model generalisability. Ultimately, the development of more accurate and generalisable models for the prediction of nanopaper mechanical properties from morphological data will enable scalable and expedient characterisation of CNF products in future industrial settings.

Acknowledgments
The authors gratefully acknowledge the Australian Government Research Training Program (RTP) scholarship and the University of Queensland's tuition fee offset, along with the Grains Research and Development Corporation (GRDC) Research Scholarship for their support of this research. We appreciate the assistance provided by Yan Luo and Didier Rech from Techpap, France for installation and maintenance of the MorFi device, along with assistance in the interpretation of fibre morphology data. The authors acknowledge the contribution of Dugalunji Aboriginal Corporation on behalf of the Indjalandji-Dhidhanu peoples through use of their equipment and in-kind support.

Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.