Introduction

Wetlands are areas where the soil is saturated or inundated for some extended period and ecological communities have specifically adapted to these conditions. Wetlands are an important component of our environment, providing ecological services such as maintaining water quality (Crumpton et al. 2020), flood control (Tang et al. 2020) and carbon sequestration (Carnell et al. 2018; Macreadie et al. 2017). Wetlands are under increasing stress from threatening processes that include, but are not limited to, climate change, land clearing and development (Burgin et al. 2016; Davis and Froend 1999; Edyvane 1999; Kingsford and Thomas 2002; Finlayson et al. 2013; Anderson et al. 2023).

Wetlands in the Northern Jarrah Forest in Southwestern Australia face threats similar to those affecting wetlands worldwide. The wetlands in this region are experiencing decline due to pressures from vegetation clearing, altered fire regimes, and altered hydrology (Williams and Mitchell 2003). The task of protecting wetlands in the Northern Jarrah Forest is hampered by the absence of a consistent dataset capturing the location, type and condition of the region’s wetlands. Although wetland datasets exist in the surrounding landscape (e.g. the Swan Coastal Plain), there have not been any recent efforts to map the Northern Jarrah Forest. The Swan Coastal Plain has been mapped according to geomorphic wetland type (Semeniuk and Semeniuk 1995) and conservation importance (Department of Biodiversity, Conservation and Attractions 2017), which incorporates a vegetation condition scale (Keighery 1994) (Table 1). This has been achieved through a combination of field survey and interpretation of aerial imagery over the period 1986–1995. It is important that the location, type, and condition of Northern Jarrah Forest wetlands are assessed and documented so that they can be managed appropriately.

Table 1 Geomorphic wetland classification matrix taken from Semeniuk and Semeniuk (2011)

Alternative methodologies are available for mapping wetlands. The Interim Australian National Aquatic Ecosystem (ANAE) Classification Framework (Aquatic Ecosystems Task Group 2012) is used to classify wetland type at various scales according to attributes of geomorphic setting, dominant water source, hydrologic dynamics and habitat. A simplified Vegetation Assets, States and Transitions (VAST) framework exists for vegetation condition assessment (Thackway and Leslie 2006) (Table 2). These methodologies may be appropriate for classification based on remote sensing data and would deliver the added benefit of aligning with national guidelines and regional practices.

Table 2 The VAST framework for vegetation modification from Thackway and Leslie (2006)

The last decade has brought about fundamental changes in remote sensing and classification methods. A movement towards open data policies and the development of cloud-based data archives, coupled with advances in processing power and storage capacity, mean that analysis can now be applied to time-series and provide more than a snapshot of the world. These changes, combined with refinement of the methods, are enabling higher quality mapping. With the launch of Sentinel 2A and 2B beginning in 2015, there is access to 10 m-by-10 m resolution imagery with a revisit of 5 days. This has enabled classification studies to use satellite imagery with finer resolution (Simioni et al. 2020; Araya-López et al. 2018; DeLancey et al. 2019; Kaplan and Avdan 2019; Minotti et al. 2021; Valenti et al. 2020). Further, convolutional neural networks (CNN) and decision-tree methods (such as Random Forest™) are generating more accurate classifications. A recent meta-analysis has summarised the methodological traits which improve outcomes as including: the use of high spatial resolution imagery; utilising CNN, Random Forest™ or Support Vector Machine learning (SVM); using object-based image analysis; including time-series; and combining imagery with ancillary data (Mahdianpari et al. 2020). As a result, it is now possible to map land cover at a finer spatial resolution and capture more complexity in class attributes.

Many different wetland classification schemes have been used in remote sensing projects in recent years. Most classification schemes applied a description of hydrology and dominant vegetation (structure and/or species) (Corcoran et al. 2013, 2015; Ding et al. 2020; Hunter et al. 2012; Li et al. 2018; Liu et al. 2018; Minotti et al. 2021). But other classification schemes took approaches such as: percent annual inundation (Halabisky et al. 2018); hydrological system (emergent, lacustrine, riparian, riverine) (Fickas et al. 2016); broad grouping of open water, wetland and dryland (Araya-López et al. 2018; Heine et al. 2016; Valenti et al. 2020); hydroperiod (Schlaffer et al. 2016); likelihood of a wetland type (Heimhuber et al. 2018; Raney et al. 2018; Rebelo et al. 2017); dominant vegetation (Berhane et al. 2018; Zhu et al. 2017).

The ‘Canadian Wetland Classification System’ (Zoltai and Vitt 1995), which describes wetlands according to the three characteristics of nutrient availability, chemistry and hydrology, is of relevance to this research. This classification scheme was applied across Canada over numerous studies and was significant in demonstrating that a structured scheme based on hydrology, without considering vegetation characteristics, can be applied successfully on a large scale (Amani et al. 2020, 2021; DeLancey et al. 2019; Mahdianpari et al. 2017; Rezaee et al. 2018).

To maximise useful application of mapping for natural resource management and conservation, the wetlands of the Northern Jarrah Forest should be mapped in a format that captures their extent, type, and condition. To be fit for purpose, this dataset needs to be comprehensive, reliable, and aligned with wetland mapping projects from other regions of Western Australia. A traditional wetland mapping project, involving attentive interpretation of aerial imagery and intensive ground truthing, is time and resource intensive, and as a result the Northern Jarrah Forest region is one of a number of large areas in the South West of Western Australia where wetland mapping does not exist. The use of existing environmental datasets, remotely sensed resources and machine learning offers a cost-effective approach to developing a wetland dataset for the Northern Jarrah Forest region.

The overall aim of this research was to use machine learning and remote sensing resources with a variety of existing environmental datasets to map wetland type, vegetation type and vegetation condition in the Northern Jarrah Forest region of Western Australia. Further, the study aimed to determine which wetland type and vegetation condition frameworks, or combination thereof, could be applied to produce the most reliable mapping of wetlands using Random Forest™. This study set out to achieve this through the following objectives: map wetland type; map vegetation type; map vegetation condition; and compare accuracy achieved using different classification combinations. Providing wetland delineation and degradation mapping, previously unavailable for the region, will enable further wetland research, streamline assessments of wetland significance, and facilitate conservation efforts. This methodology can be applied to numerous other WA and international regions lacking wetland mapping. This study contributes to the growing literature on using Random Forest™, remote sensing, and ancillary datasets to map ecosystems in general and/or the impacts of anthropogenic land use.

Study Area

The Northern Jarrah Forest region lies east of Perth and west of Narrogin in the South West of Western Australia (Fig. 1). The study area for this classification included the southern portion of the Northern Jarrah Forest subregion from the Interim Biogeographic Regionalisation for Australia dataset which covers 15,327 km2 (Australian Government Department of Sustainability 2012). The Northern Jarrah Forest has a warm Mediterranean climate with mean annual rainfall ranging from 1200 mm in the west to 400 mm in the northeast (Bureau of Meteorology 2023a, b). The vegetation of the region is typified by Jarrah-Marri forest over lateritic soils in the west, Wandoo-Marri woodland over clay soils in the east and Agonis shrublands over alluvial deposits.

Fig. 1 Study area (black boundary), the southern portion of the Northern Jarrah Forest IBRA subregion (blue boundary)

Methods

The approach followed to map wetlands in the southern portion of the Northern Jarrah Forest involved collecting training data, assembling explanatory variables and applying a Random Forest algorithm in Google Earth Engine (GEE). The training data was improved upon, and the explanatory and dependent variables were refined, through an iterative approach that assessed the output of an initial classification attempt. These steps are explained in detail below and a diagrammatic representation is provided in Appendix D.

Initial Classification – Explanatory Variables

In the first run of the model, 70 explanatory variables were tested. These variables included data derived from the Sentinel 2 MSI Level-2A GEE image collection and several relevant ancillary datasets. The first group, accounting for 44 explanatory variables, was the seasonal medians of the Sentinel 2 spectral bands (11 bands in each of four seasons). The seasonal medians were defined as summer (335th to 59th day of the year), autumn (60th to 151st day of the year), winter (152nd to 243rd day of the year), and spring (244th to 334th day of the year), beginning from the start of Sentinel coverage in the study area (October 2015) to the time of classification (May 2022). The second group of variables were normalised difference pond index (NDPI), normalised difference vegetation index (NDVI) and normalised difference water index (NDWI) derived from the seasonal median spectral values. The third group of variables were topographic, including contributing area, Prescott index, topographic position index and topographic wetness index (derived from 1″ SRTM DEM-S CSIRO) and a digital elevation model with 5 m resolution. The fourth group of variables were land quality factors from the 2017 Soil Landscape Mapping (Department of Primary Industries and Regional Development 2017). These factors included flood hazard, subsurface acidification susceptibility, salinity hazard, water repellence susceptibility, soil water storage capacity, site drainage potential, waterlogging and inundation risk, water erosion hazard, and wind erosion hazard, all of which were ordinal ranks (for example, “3–10% of map unit has a high to extreme hazard”). Where required, these explanatory variables were converted to raster format and uploaded to GEE.
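The seasonal windows and normalised difference indices described above can be sketched in a few lines of Python. This is an illustration only, not the GEE code used in the study: the function names are hypothetical, and the band pairings for NDVI (near infrared and red), NDWI (green and near infrared) and NDPI (shortwave infrared and green) are commonly used formulations assumed here, since the text does not state which bands were used.

```python
from statistics import median

# Day-of-year windows as defined in the text; summer wraps over the new year.
SEASONS = {
    "summer": (335, 59),
    "autumn": (60, 151),
    "winter": (152, 243),
    "spring": (244, 334),
}


def season_of(day_of_year):
    """Return the season name for a day of the year (1-366)."""
    if not 1 <= day_of_year <= 366:
        raise ValueError(f"day of year out of range: {day_of_year}")
    for name, (start, end) in SEASONS.items():
        if start <= end:
            if start <= day_of_year <= end:
                return name
        elif day_of_year >= start or day_of_year <= end:
            return name
    raise ValueError(f"no season covers day {day_of_year}")


def seasonal_medians(observations):
    """Median of (day_of_year, value) observations for each season present."""
    grouped = {name: [] for name in SEASONS}
    for day_of_year, value in observations:
        grouped[season_of(day_of_year)].append(value)
    return {name: median(vals) for name, vals in grouped.items() if vals}


def normalised_difference(a, b):
    """(a - b) / (a + b); returns 0.0 when both inputs are zero."""
    total = a + b
    return (a - b) / total if total else 0.0


# Assumed band pairings (not specified in the text).
def ndvi(nir, red):
    return normalised_difference(nir, red)


def ndwi(green, nir):
    return normalised_difference(green, nir)


def ndpi(swir, green):
    return normalised_difference(swir, green)
```

In this sketch the indices would be computed from the seasonal median of each band, matching the order of operations described in the text (medians first, indices second).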

Initial Classification – Training Data

Training and validation data were obtained from several sources, including historical records (Table 3), as well as the interpretation of high-resolution aerial imagery. Coarse wetland delineation, the description of wetland habitat or the presence of wetland flora in historic records focused efforts to delineate reliable wetland polygons through interpretation of aerial imagery, which was conducted by an experienced wetland scientist. A total of 3,058 polygons were delineated and attributed to wetland type categories over an area of 316 km2; five points from each polygon were randomly selected for the purpose of classification. Each reference polygon was classified according to five tiers of wetland type and two tiers of vegetation condition (Table 4, which also gives the percentage of training polygons in each class).

Table 3 Sources of historic records used for deriving model training and validation data when mapping wetlands in the Northern Jarrah Forest, Western Australia
Table 4 Dependent variable hierarchical classification scheme and proportion of training samples

Initial Classification – Training and Classification

The training and classification process used the ‘smileRandomForest’ classifier available through the GEE JavaScript API. The initial training and classification run was applied to 19 dependent variable combinations (Table 5).

Table 5 Dependent variable combinations explored in the initial pass at classification of wetlands in the Northern Jarrah Forest of Western Australia

Initial Classification – Verification

For verification, the training data polygons were used to generate 30,520 random points with the same distribution amongst classes as in the training data. The output of the classification was sampled at these attributed points, and confusion matrices were used to derive overall accuracy, producer accuracy and user accuracy.
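As a minimal sketch of how the three accuracy measures follow from a confusion matrix, the Python below assumes rows hold the reference (attributed) class and columns hold the mapped class. Producer accuracy is then the diagonal count divided by the row total, and user (consumer) accuracy is the diagonal count divided by the column total. The function name and the two-class counts are invented for illustration and are not results from this study.

```python
def accuracy_metrics(matrix, labels):
    """Overall, producer and user accuracy from a square confusion matrix.

    matrix[i][j] is the number of verification points whose reference class
    is labels[i] and whose mapped class is labels[j].
    """
    size = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))
    producer, user = {}, {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])                           # reference points in class
        col_total = sum(matrix[r][i] for r in range(size))   # mapped points in class
        producer[label] = matrix[i][i] / row_total if row_total else float("nan")
        user[label] = matrix[i][i] / col_total if col_total else float("nan")
    return correct / total, producer, user


# Invented counts for a two-class example.
labels = ["Wetland", "Upland"]
matrix = [
    [40, 10],  # reference Wetland: 40 mapped as Wetland, 10 as Upland
    [5, 45],   # reference Upland: 5 mapped as Wetland, 45 as Upland
]
overall, producer, user = accuracy_metrics(matrix, labels)
```

A class with high user accuracy but low producer accuracy is mapped reliably where it is mapped, but part of its true extent is missed; the reverse pattern indicates over-mapping of the class.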

Final Classification – Explanatory Variables

The 70 explanatory variables were optimised for the final classifications by conducting iterative regressions to remove variables with multicollinearity based on a variance inflation factor threshold of 10. Once a minimum set of non-multicollinear variables was obtained, explanatory variables were reintroduced one at a time to replace variables already in the set. Preference was accorded to importance indicated by the Gini Index reported in the initial classification. The explanatory variables for the classification of wetland type (hydroperiod), vegetation type and vegetation condition are detailed in Table 6.
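The first step, removing multicollinear variables against a variance inflation factor (VIF) threshold of 10, can be sketched as follows. This is an assumption-laden illustration rather than the procedure actually run: it computes each VIF as the corresponding diagonal element of the inverse correlation matrix (equivalent to 1 / (1 − R²) from regressing that variable on the others), drops the variable with the largest VIF each round, and uses hypothetical variable names and toy values. The reintroduction step guided by Gini importance is not covered.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length, non-constant columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = sqrt(sum(d * d for d in dev))
        unit.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]


def invert(matrix):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


def drop_collinear(data, threshold=10.0):
    """Drop the highest-VIF variable until every VIF is at or below threshold."""
    kept = dict(data)
    while len(kept) > 1:
        names = list(kept)
        scores = vif([kept[name] for name in names])
        worst = max(range(len(names)), key=lambda i: scores[i])
        if scores[worst] <= threshold:
            break
        del kept[names[worst]]
    return kept


# Toy data: the third variable is almost the sum of the first two.
var_a = [1, 2, 3, 4, 5, 6, 7, 8]
var_b = [5, 3, 8, 1, 7, 2, 6, 4]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
example = {
    "var_a": var_a,
    "var_b": var_b,
    "var_sum": [a + b + e for a, b, e in zip(var_a, var_b, noise)],
}
kept = drop_collinear(example)
```

With the toy data, one of the three variables is removed and the two that remain have VIF values well below the threshold.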

Table 6 Explanatory variables used to classify wetlands in the Northern Jarrah Forest of Western Australia according to wetland type (hydroperiod), vegetation type, and vegetation condition in the final classification

Final Classification – Training Data

An additional 1,896 polygons were delineated to balance the categories in the training data. The new polygons were only attributed for Type Level 3 (hydroperiod). This increased the area of the training data to 365 km2 (2.4% of the study area). The training polygons were divided into 2-hectare segments and 3 points were randomly selected from each segment, resulting in 71,621 training points. The distribution of these training points amongst the wetland type classes was: ‘Permanent inundation’ (2%); ‘Seasonal inundation’ (19%); ‘Seasonal waterlogging’ (22%); ‘Upland’ (44%); ‘Bare Rock’ (2%); ‘Built up’ (7%); and ‘Artificial Lake’ (4%). The training points for vegetation type and vegetation condition were taken from the original set of training polygons. These polygons were divided into 2-hectare sections and 3 points were randomly selected from each section, resulting in 65,652 training points. The distribution amongst classes for vegetation type and vegetation condition remained the same as in the initial classification.

Final Classification – Training and Classification

The training and classification used the ‘smileRandomForest’ classifier available through the GEE JavaScript API. The final training and classification run was applied to three dependent variables separately: wetland type level 3 (hydroperiod), vegetation type, and vegetation condition level B. First, the respective explanatory variables in Table 6 were selected and the ‘sampleRegions’ function was applied using the training points. Second, the sampled training points were used to train an untrained Random Forest classifier. Finally, the trained classifier was applied to the stacked explanatory variable raster, creating a classified raster that was exported. The classification results were output to a single-band TIFF with a resolution of 10 m.

Final Classification – Post-Processing

To create the final maps, the classification output was post-processed. The output raster was smoothed using a majority filter with a radius of 1.5 pixels (15 m), which was iterated twenty times. The filtered raster was then transformed to a polygon format.
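The smoothing step can be illustrated with a small pure-Python majority filter. This is a sketch under stated assumptions, not the tool used in the study: a circular kernel of radius 1.5 pixels is taken to cover the centre pixel and its eight neighbours, edge pixels use only the neighbours that fall inside the raster, and a pixel keeps its class when that class ties for the majority. The function names and the example grids are hypothetical.

```python
from collections import Counter


def circular_offsets(radius):
    """Row/column offsets whose pixel centres lie within `radius` pixels."""
    reach = int(radius)
    return [(dr, dc)
            for dr in range(-reach, reach + 1)
            for dc in range(-reach, reach + 1)
            if dr * dr + dc * dc <= radius * radius]


def majority_filter(grid, radius=1.5):
    """Replace each pixel with the most common class in its neighbourhood."""
    offsets = circular_offsets(radius)
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            counts = Counter(
                grid[r + dr][c + dc]
                for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            top = max(counts.values())
            winners = [value for value, n in counts.items() if n == top]
            # Assumption: on a tie the pixel keeps its own class if it is
            # among the winners; otherwise the lowest class code is used.
            if grid[r][c] not in winners:
                out[r][c] = min(winners)
    return out


def smooth(grid, radius=1.5, iterations=20):
    """Apply the majority filter repeatedly, stopping early once stable."""
    for _ in range(iterations):
        filtered = majority_filter(grid, radius)
        if filtered == grid:
            break
        grid = filtered
    return grid


# A single stray pixel of class 2 inside class 1 is removed.
speckled = [[1] * 5 for _ in range(5)]
speckled[2][2] = 2
cleaned = smooth(speckled)

# A block of class 2 three pixels wide is left unchanged.
banded = [[2, 2, 2, 1, 1] for _ in range(5)]
unchanged = smooth(banded)
```

Stopping once the raster no longer changes gives the same result as running all twenty passes, because a raster that the filter leaves unchanged stays unchanged on every later pass.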

Final Classification – Verification

For verification of wetland type, the training data polygons were used to generate 49,288 random points with the same distribution amongst classes as in the training data. Verification for vegetation type and vegetation condition used 36,977 random points. The output of the classification was sampled at these attributed points, and confusion matrices were used to derive overall accuracy, producer accuracy and user accuracy.
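One way to split a fixed number of verification points across classes in proportion to the training distribution is largest-remainder rounding, sketched below. The approach and the function name are assumptions for illustration, as the text does not say how the points were allocated. The shares are the rounded wetland type percentages reported for the final training data, so the resulting counts are indicative only.

```python
def allocate_points(total, shares):
    """Split `total` points across classes in proportion to `shares`.

    Uses largest-remainder rounding so the counts always sum to `total`.
    """
    weight = sum(shares.values())
    exact = {name: total * share / weight for name, share in shares.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining points to the classes with the largest fractions.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Rounded class shares (percent) from the final training data.
shares = {
    "Permanent inundation": 2,
    "Seasonal inundation": 19,
    "Seasonal waterlogging": 22,
    "Upland": 44,
    "Bare Rock": 2,
    "Built up": 7,
    "Artificial Lake": 4,
}
points = allocate_points(49288, shares)
```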

Results

The initial classification (19 dependent variable combinations) gave an indication of which level within the classification hierarchy would return meaningful results while maximising detail on wetland type. The accuracy of both wetland type and vegetation condition was greater with less refined levels of classification (Table 7). When classified at a level of simplified land cover (wetland type level 1), the overall accuracy was 89.8%. This fell to 83% for ANAE classification (wetland type level 2), 79.1% for hydroperiod (wetland type level 3) and 72.7% for geomorphic wetland type (wetland type level 4) (Table 7).

Table 7 Overall accuracy achieved for different dependent variable combinations in the initial and final classifications when mapping wetlands in the Northern Jarrah Forest of Western Australia

The overall accuracy for identifying vegetation condition when classified alone was 89.3% for condition level A and 76.3% for condition level B (Table 7). The accuracy fell for every class when wetland type, vegetation type and/or vegetation condition were merged as the dependent variable (Table 7). When classified by hydroperiod or a lower level of detail, the output was consistent in producing objects which followed expected patterns of wetland occurrence across the landscape and were grouped at an appropriate scale rather than a large variation of pixels within a small area. The lowest performing model was geomorphic wetland type combined with vegetation type and condition level B with an overall accuracy of 54.3% (Table 7). When classified according to geomorphic wetland type, the output included small, scattered, intermixed areas of geomorphic type. Visual inspection confirmed that the best wetland type classification level to be targeted for refinement was hydroperiod as the results from geomorphic wetland type were too unstable. The results supported the need for further classification to be applied to wetland type, vegetation type and vegetation condition as separate dependent variables.

The final classification improved the overall accuracy of the wetland type level 3 (hydroperiod) classification to 82% (Table 7), with an out-of-bag (OOB) error estimate of 0.11. The most accurate class in the hydroperiod classification in terms of producer accuracy was ‘Upland’ at 92.4%, while the least accurate was ‘Artificial Lake’ at 43%. The best performing class in the hydroperiod classification in terms of user accuracy was ‘Artificial Lake’ at 96.8%, while the worst was ‘Bare Rock’ at 15.8%. Further, ‘Built Up’ and ‘Permanent Inundation’ performed well in terms of user accuracy (95.7% and 92.4%), but less well in terms of producer accuracy (56.2% and 77.5%), meaning there is high confidence that those features captured are correct, but less confidence that all features have been captured. The opposite is true for ‘Seasonal Inundation’ and ‘Seasonal Waterlogging’, for which there is high confidence that all features have been captured, but less confidence in the accuracy of the features in these classes (Table 8). The most important explanatory variables were DEM, Prescott Index, the Spring B11 median, Spring NDVI and the Summer B12 median according to the accumulated Gini index (Appendix A).

Table 8 Producer and user accuracy achieved for different classes in the final classification of wetlands in the Northern Jarrah Forest of Western Australia according to Wetland type (hydroperiod), Vegetation type, and Vegetation Condition

The overall accuracy achieved for vegetation condition level B increased from 76% to 80% (Table 7), with an out-of-bag (OOB) error estimate of 0.13. The best performing class in vegetation condition level B in terms of producer accuracy was ‘Residual’ at 92.9%, while the worst was ‘Modified’ at 50%. The best performing class in vegetation condition level B in terms of user accuracy was ‘Replaced or Removed’ at 88.7%, while the worst was ‘Modified’ at 64%. Further, ‘Naturally Bare’ performed well in terms of user accuracy but less well in terms of producer accuracy, while the opposite is true for ‘Residual’. ‘Replaced or Removed’ performed well in both forms of accuracy, while ‘Modified’ and ‘Transformed’ performed poorly in both (Table 8). The most significant explanatory variables were DEM, Prescott Index, the Winter B11 median, Summer NDWI and the Winter B6 median according to the accumulated Gini index (Appendix B).

The overall accuracy achieved for vegetation type was 78%, with an out-of-bag (OOB) error estimate of 0.12. This dependent variable was not tested alone in the initial classification (Table 7). The best performing class in vegetation type in terms of producer accuracy was ‘Forbs, Grasses and Sedges’ at 86.5%, while the worst was ‘Scrubland and Shrubland’ at 53.5%. The best performing class in vegetation type in terms of user accuracy was ‘Bare’ at 94.6%, while the worst was ‘Scrubland and Shrubland’ at 69.9%. Further, ‘Forbs, Grasses and Sedges’ and ‘Forest and Woodland’ performed well in terms of user accuracy, but less well in terms of producer accuracy. The opposite is true for the class ‘Bare’ (Table 8). The most significant explanatory variables were DEM, Prescott Index, the Summer B8A median, the Winter B8A median and the Spring B2 median according to the accumulated Gini index (Appendix C).

The classification of wetland type across the southern portion of the Northern Jarrah Forest indicates that there are 62km2 of permanently inundated wetland, 766km2 of seasonally inundated wetland, 1,480km2 of seasonally waterlogged wetland and 76km2 of artificial lake. Wetlands occur in greatest concentration in the western portion of the study area, where narrow seasonally waterlogged wetlands dominate (Figs. 2 and 3). In the northeast portion of the study area, wetlands are distributed sparsely and seasonally inundated wetlands are the most common type (Fig. 4). In the southeast portion of the study area wetlands occur as broad, seasonally waterlogged systems (Fig. 5).

Fig. 2 Classification of wetlands by hydroperiod in the northwest of the Northern Jarrah Forest of Western Australia study area

Fig. 3 Classification of wetlands by hydroperiod in the southwest of the Northern Jarrah Forest of Western Australia study area

Fig. 4 Classification of wetlands by hydroperiod in the northeast of the Northern Jarrah Forest of Western Australia study area

Fig. 5 Classification of wetlands by hydroperiod in the southeast of the Northern Jarrah Forest of Western Australia study area

Vegetation condition varied between the wetland types. Just under half the area of permanently inundated wetlands was covered by vegetation of residual condition (45%), 19% was naturally bare, 14% was replaced or removed, 12% was modified, and 10% was transformed (Fig. 6). About half the area of seasonally inundated wetlands was covered by vegetation of residual condition (52%), 18% was replaced or removed, 15% was transformed, 13% was modified, and 1% was naturally bare (Fig. 6). About half the area of seasonally waterlogged wetlands had been replaced or removed (51%), 35% was residual, 8% was transformed, and 6% was modified (Fig. 6). Most of the artificial lake area was replaced or removed (95%), 2% was residual, 1% was naturally bare, 1% was modified, and 0.2% was transformed (Fig. 6).

Fig. 6 Vegetation condition within different wetland types by hydroperiod in the Northern Jarrah Forest of Western Australia

Most wetland degradation in terms of vegetation condition occurs in the east of the study area (Fig. 7). There is also a string of heavily impacted subcatchments along the western edge of the study area. Despite this and localised impacts around dams and mining activities, the inner west of the study area is largely intact (Fig. 7).

Fig. 7 Subcatchments with percent of wetlands by area in a modified or better condition in the Northern Jarrah Forest of Western Australia

Vegetation type varied between wetland types. More than half the area of permanently inundated wetlands was covered by forest and woodland (56%), 31% was bare, 12% was forbs, grasses and sedges, and 1% was scrubland and shrubland (Fig. 8). The area of seasonally inundated wetlands was covered by forbs, grasses and sedges (42%), forest and woodland (40%), scrubland and shrubland (13%), and bare (5%) (Fig. 8). More than half the area of seasonally waterlogged wetlands was covered by forbs, grasses and sedges (58%), 36% was forest and woodland, 4% was scrubland and shrubland, and 2% was bare (Fig. 8). Most of the artificial lake area was bare (97%), 2% was forest and woodland, 0.5% was scrubland and shrubland, and 0.4% was forbs, grasses and sedges (Fig. 8).

Fig. 8 Vegetation classes found across different wetland types (by hydroperiod) in the Northern Jarrah Forest of Western Australia

Discussion

This study demonstrates the utility of using machine learning, remote sensing datasets and ancillary variables to map wetlands in the Northern Jarrah Forest of Western Australia, which until now lacked a dataset showing the extent, type, and condition of wetlands. The classification produced a mapping product that shows indicative wetland distribution, while capturing wetland type at a hydroperiod level and vegetation condition with an overall accuracy of approximately 80% or higher. Wetland type was mapped at the hydroperiod level with an overall accuracy of 82%. The ANAE system type was mapped slightly more accurately, with an overall accuracy of 83%. Mapping vegetation condition using the VAST framework achieved an overall accuracy of 79.6%.

Results show that wetlands occur in greater concentration in the western portion of the study area, specifically as narrow seasonally waterlogged sites. In the northeast, wetlands tend to be more sparse and seasonally inundated, while in the southeast they occur as broad systems that are seasonally waterlogged. The geographic variation in wetland type may be attributed to climatic and landform influence. Climate may influence wetland distribution with rainfall increasing and temperatures decreasing going south. The mean annual rainfall since records began for Dwellingup in the west of the study area is 1227.1 mm; this drops to 644.7 mm for Bannister in the east of the study area (Bureau of Meteorology 2022). Landform may influence wetland distribution with a greater prevalence of steep gullies on the scarp and broad basins in the southeast. The wetlands in the western portion of the study area are more likely to be seasonally waterlogged and smaller than those in the east. This might be attributed to shallower valleys with smaller contributing area, or it could be an impact of the vegetation influencing the training data. The wetlands in the northeastern portion of the study area were more sparsely distributed but more likely to be classed as seasonally inundated and to cover larger elongated areas. This may be attributed to larger catchments creating systems with more contributing area. The southeastern portion of the study area was the only place where broad seasonally inundated and waterlogged flats were identified, which can be attributed to the flat topography in this area. This relationship is similar to that described by a study in Maryland, USA, which found that TWI (closely related to contributing area) and normalised relief were effective predictors of the periodicity of wetland inundation (Lang et al. 2012).

Degradation in terms of vegetation condition is concentrated in the eastern part of the study area. Most impacts are on seasonally waterlogged wetlands where 51% of wetlands by area were replaced or removed, compared to 14% and 18% in permanently inundated and seasonally inundated wetlands respectively. The concentration of degraded wetlands in the east of the study area, and the disproportionate impact on seasonally waterlogged wetlands, reflects the distribution of arable land. This trend is repeated in other parts of the country where flat seasonally waterlogged land is primarily cleared for agriculture, such as on the Swan Coastal Plain where 95% of palusplain by area is cleared (Department of Biodiversity, Conservation and Attractions 2017). This information on wetland degradation in subcatchments can be used to steer conservation efforts in the Northern Jarrah Forest. However, as roughly a tenth of the subcatchments in the study area were smaller than 100 hectares, subcatchments are too granular to serve as management blocks for guiding conservation prioritisation. In contrast, whole catchments are too coarse to derive meaningful insights. Managers should direct efforts towards aggregating smaller subcatchments into conservation units to match some of the larger subcatchments.

In this study, the greatest level of detail which could be classified was hydroperiod. Mapping at the geomorphic type level achieved an accuracy of only 73%. Although the difference in overall accuracy between the hydroperiod and geomorphic approaches seems small, the effective difference in performance at producing interpretable results was considerable. This may be due to two factors: the lack of an explanatory variable for morphology and the training data quality. None of the explanatory variables tested in this study held a meaningful connection to morphology. A recent study applied a CNN to extract features from DEM to classify plains, platforms, hills and mountains in Shandong, China, with up to 84.35% overall accuracy (Xu et al. 2021). This suggests that future models that tie topographic variables to morphology, if interpreted by a model using object-based image analysis, are likely to increase the applicability of these methods for mapping geomorphology. Furthermore, the distinction between geomorphic types is prone to subjectivity. For instance, it is difficult to distinguish between a very broad dampland and a small palusplain. As such, there will be overlap between adjacent classes in terms of geomorphology in both the historic wetland data and the newly derived training data. The study by Semeniuk and Semeniuk (1995), which led to the adoption of geomorphic wetlands in Western Australia, did not provide a methodology for delineating wetland geomorphology which could be applied programmatically. If clear metrics to describe geomorphic wetlands could be determined, the study by Xu et al. (2021) shows that there is a good prospect for geomorphic classification using remote sensing.

While the overall accuracy for wetland hydroperiod was high, there were examples of misaligned mapping when the output is visually assessed. There is overtraining error in the northeast quadrant of the mapping due to sparse training data and poor generalisation. Nevertheless, the output from this study can be considered a useful source of indicative information for wetland distribution and condition in the southern portion of the Northern Jarrah Forest. The results can be used to guide focused investigation of reportedly heavily impacted subcatchments. Definitive wetland mapping for the Northern Jarrah Forest can be achieved through remote sensing if follow up efforts are made to balance the training data and conduct field verification of training attribution.

Another finding from this study is that the classifier performed better when dependent variables were mapped independently. For instance, mapping wetland type alone produced better results than mapping wetland type combined with vegetation condition. The initial hypothesis was that two degraded wetlands of different types might be more similar than a pristine and a degraded wetland of the same type. To the contrary, the best results were obtained by mapping dependent variables separately and then overlaying the results. This reflects the outcomes of a classification of freshwater wetlands and aquatic habitats in Lake Baikal, Russia, where the overall accuracy fell as more detail was targeted in a hierarchical class structure (Lane et al. 2014).

The model did not perform equally well for all classes. The most accurately classified classes were those that represented terminal states. ‘Permanent Inundation’ and ‘Upland’ performed best in the wetland type category, while ‘Naturally Bare’, ‘Residual’, and ‘Replaced or Removed’ performed best in the vegetation condition category. Wetlands of transitory vegetation condition will have a variety of spectral signatures reflecting different types and degrees of disturbance. Similarly, seasonal wetlands can vary greatly in their degree of saturation before becoming inundated. As a result, they support different ecological communities with unique spectral signatures. Similar results were reported by Li et al. (2018) in their efforts to classify coastal wetlands. The authors reported low producer accuracy for the class ‘Impervious Surface Area’, which they attributed to mixed land use associated with diverse spectral signatures.
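
For readers less familiar with per-class accuracy measures, the following sketch shows how overall, producer and user accuracy can be derived from a confusion matrix. It assumes that rows hold reference classes and columns hold predicted classes. The matrix values and the label 'Seasonal' are invented for illustration and are not results from this study.

```python
def accuracy_metrics(matrix, labels):
    """Return overall accuracy plus producer and user accuracy per class.

    Assumes matrix[i][j] counts samples of reference class i
    that were predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    overall = correct / total

    producers, users = {}, {}
    for i, label in enumerate(labels):
        reference_total = sum(matrix[i])                 # row sum
        predicted_total = sum(row[i] for row in matrix)  # column sum
        # Producer accuracy: share of reference samples mapped correctly.
        producers[label] = (
            matrix[i][i] / reference_total if reference_total else float("nan")
        )
        # User accuracy: share of mapped samples that are correct.
        users[label] = (
            matrix[i][i] / predicted_total if predicted_total else float("nan")
        )
    return overall, producers, users


# Hypothetical confusion matrix (rows: reference, columns: predicted).
labels = ["Upland", "Seasonal", "Permanent Inundation"]
matrix = [
    [45, 5, 0],
    [10, 25, 5],
    [0, 2, 8],
]

overall, producers, users = accuracy_metrics(matrix, labels)
print(f"overall accuracy: {overall:.2f}")
for label in labels:
    print(f"{label}: producer {producers[label]:.3f}, user {users[label]:.3f}")
```

In this invented example the intermediate class has the lowest producer accuracy because its reference samples are confused with both neighbouring classes, which mirrors the pattern described above for transitional classes.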

Elevation and Prescott index were the most important input classification features. For each of the classification targets (wetland type, vegetation type and vegetation condition), elevation had the highest accumulated Gini index value, followed by Prescott index. There is a direct link between these variables and wetland type: the accumulation of surface water is determined by relative elevation, and Prescott index is a measure of water balance. The link between these variables and vegetation type may simply be that dryland was much more likely to host woodland and grassland, while wetlands were more likely to host shrubland and sedgeland. The link between these two variables and vegetation condition may be less causal and may instead reflect the increased land clearing in the lower, drier half of the study area. The importance of elevation has been reported in numerous wetland mapping studies (Franklin and Oumer 2017; Valenti et al. 2020). Topographic indices and spectral information were consistently more important than soil characteristics in the classification. This was expected, with topography driving wetland distribution and spectral information reflecting the outcome of increased moisture and dense vegetation.
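
The idea of an accumulated Gini value can be sketched as follows. The sketch assumes the definition commonly used for random forests, in which the impurity decrease of each split is weighted by the number of samples at the node and summed per feature. The exact computation performed by the classifier used in this study may differ, and the feature names and splits below are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Reduction in Gini impurity achieved by splitting parent into two children."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)


def accumulate_importance(splits):
    """Sum sample-weighted impurity decreases per feature.

    splits: iterable of (feature, parent_labels, left_labels, right_labels).
    """
    importance = defaultdict(float)
    for feature, parent, left, right in splits:
        importance[feature] += len(parent) * impurity_decrease(parent, left, right)
    return dict(importance)


# Hypothetical splits gathered from the trees of a forest.
splits = [
    # Elevation separates the two classes perfectly at the root.
    ("elevation", ["wet"] * 4 + ["dry"] * 4, ["wet"] * 4, ["dry"] * 4),
    # Prescott index purifies one of the two children.
    ("prescott_index", ["wet", "wet", "wet", "dry"], ["wet", "wet"], ["wet", "dry"]),
    # The soil variable leaves both children as mixed as the parent.
    ("soil_clay", ["wet", "wet", "dry", "dry"], ["wet", "dry"], ["wet", "dry"]),
]

importance = accumulate_importance(splits)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

A feature that is chosen for splits near the root, where many samples are separated at once, accumulates a large value, whereas a feature whose splits leave the classes mixed contributes little.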

This study utilised the ‘smileRandomForest’ classifier; future efforts should investigate the use of other classification algorithms. A Random Forest based algorithm was chosen because this approach yielded positive results in the literature (Berhane et al. 2018; Liu et al. 2018; Mahdianpari et al. 2017). The aim of this study was to demonstrate that wetland type, vegetation type and condition can be delineated in regional Australia over a broad area using sparse historic reference material and without excessive labour. When similar methodology is applied in other under-surveyed regions, other classification algorithms should be explored with the aim of achieving more reliable wetland delineation.

Conclusion

This study set out to map wetland extent, type, and condition in the Northern Jarrah Forest of Western Australia using machine learning, remote sensing datasets and ancillary variables. We found that, within the southern half of the Northern Jarrah Forest, wetlands occur in greater concentration as narrow seasonally waterlogged sites in the west, more sparsely as seasonally inundated sites in the northeast, and as broad seasonally waterlogged sites in the southeast. The broad condition of wetlands in the Northern Jarrah Forest was found to be good relative to surrounding IBRA regions, but the distribution of vegetation degradation is uneven. Degradation, as determined through vegetation condition, is concentrated in the east of the southern half of the Northern Jarrah Forest, and most impacts are on seasonally waterlogged wetlands.

Our study suggests that satellite imagery and topographic indices can be used to map wetlands at a fine level of detail in the Southwest of Western Australia. Using easily accessible cloud resources and existing datasets, it is possible to map wetland type to the level of hydroperiod, and vegetation condition using the VAST framework, with reliability suitable for indicative wetland mapping in land use management. While the results can only be considered indicative, applying this methodology to other regions that lack wetland mapping would provide great value for conservation efforts. The methods detailed in this study can be adopted with minimal expense in terms of computing power and labour and are readily replicable. Early investigation indicated that mapping wetlands by remote sensing at the level of ANAE system type would yield more reliable results and align Western Australian wetland mapping products with national standards. Future work should investigate alternative approaches to gathering training data efficiently in regions where survey coverage in historic resources is low. The novel insight offered by this study is the comparison of the suitability of prevalent Australian wetland classification schemes as dependent variables for mapping wetlands at broad scales. Wetland delineation in a region that is only coarsely mapped is invaluable for wetland management. This classification methodology should be improved and repeated to monitor change in wetland condition in the Northern Jarrah Forest as threatening processes such as land clearing and climate change progress.