Using harvester data from on-board computers: a review of key findings, opportunities and challenges

Single-grip harvesters are equipped with an on-board computer that can normally collect standardized data. In times of increased mechanization, digitalization and climate change, use of this extensive data could provide a solution for better managing calamities-outbreaks and gaining competitiveness. Because it remains unclear in which way harvester data can contribute to this and optimization of the forest supply chain, the focus of this review was to provide a synopsis of how harvester data can be used and present the main challenges and opportunities associated with their use. The systematic literature review was performed with Scopus and Web of Science in the period from 1993 to 2019. Harvester data in form of length and diameter measurements, time, position and fuel data were used in the fields of bucking, time study, inventory and forest operation management. Specifically, harvester data can be used for predicting stand, tree and stem parameters or improving and evaluating the bucking. Another field of application is to evaluate their performance and precision in comparison to other time study methods. Harvester data has a broad range of application, which offers great possibilities for research and practice. Despite these advantages, a lack of precision for certain data types (length and diameter), particularly for trees exhibiting complex architecture where the contact of the measuring wheel on the harvesting head to the wooden body cannot be maintained, and position data, due to signal deflection, should be kept in mind.


Cut-to-length mechanization
In the field of mechanized forest operations, over 25 years have passed since the first single-grip harvester was equipped with a computer-based measurement system (Ponsse n.d.). During this time, further developments and refinements have led to a full on-board computer (OBC), which collects data in real time. The data can contain information about production control, for example the product instructions in the form of price or demand matrices. In most instances, the data is standardized and covers production reporting, such as the harvested production with specifications of each harvested log (e.g., assortment, length, diameters, tree number, species, GPS position). The data also encompasses information concerning quality assurance and calibration (Skogforsk 2019). The global standard for communication between computers in forest machines is referred to as the "Standard for Forest machine Data and Communication" (StanForD) (Wodniok 2018); it was introduced in 1990, upgraded to StanForD 2010 in 2011, and is used by most forest machine producers. The standard is an XML-based format with an open interface, which enables the extraction and reading of the collected data in a structured report. Data collected by harvester OBCs is becoming more relevant since the number of forest machines used in Europe and North America (Skogsforum/Red. 2019; Harbauer 2018) is growing due to increasing awareness of productivity, stand and soil protection as well as ergonomics and work safety of single-grip harvesters (BaySF 2019). The collected data offers great potential for exchange and for improving forest operations and the wood supply chain, especially in times when the environment (nature, society) is rapidly changing.

Communicated by Thomas Seifert.

Electronic supplementary material
The online version of this article (https://doi.org/10.1007/s10342-020-01313-4) contains supplementary material, which is available to authorized users.

Climate change
Forest ecosystems are being exposed to a changing climate that entails a higher frequency and severity of wind storms and drought periods (Gregow et al. 2017; Hogan et al. 2018; Rodriguez-Vallejo and Navarro-Cerrillo 2019). Winters are also becoming warmer (IPCC 2007; Lindner et al. 2010), thus often removing the natural barrier that could be relied upon to limit the further expansion of certain pest outbreaks. To maintain constant wood flow to processing facilities, a larger proportion of forest operations are now performed on unfrozen soils that commonly have a high moisture content, thus presenting a higher risk of soil disturbance. These variations can have significant influence on forest ecosystem dynamics in such a way that forests are progressively becoming more unstable (Dale et al. 2001). In an attempt to circumvent this instability, mixed-species stands are often preferred to monocultures since the presence of multiple species on a single site can increase stem volume and stability of the trees (Peltola et al. 2000; Dhôte 2005). However, within an operational context, more species entail a higher number of assortments, which further complicates bucking (the process of cross-cutting a stem into desired assortments) and sorting.

Calamities-outbreaks
As a consequence of increased wind storms, there will be a higher occurrence of wind throws and wind breaks (Gardiner et al. 2010). Because of stressed trees and warm weather, bark beetles and other pathogens will find optimal conditions to attack and damage trees (Williams and Liebhold 2002; Schlyter et al. 2006; Cudmore et al. 2010). These are some of the most relevant reasons why it is imperative for forest logistics to extract the timber from the forest as fast as possible to mitigate bark beetle infestations and the associated timber value loss (Dale et al. 2001). It is therefore necessary to obtain knowledge about the harvested and processed timber as quickly as possible. Harvester data could be a starting point for this information and would also allow for better planning of timber transportation from the stand to the processing facilities.

Digitalization-Forestry 4.0
In times of the digital revolution, transformation processes are becoming increasingly fast. Society is becoming an information society, which is completely saturated with information and communication technology (Koshel et al. 2019). Physical and virtual worlds will be merged (BMVI 2019). In the case of forestry, data collection and its exchange among stakeholders will gain enormous importance as a central decision-making basis and will play a key role for technical developments and the automation of processes beyond company boundaries. The aim is to collect digital data that can be fully integrated into planning, wood purchasing, procurement and harvesting operations, as well as the planning of wood processing (Müller et al. 2019; Söderberg and Pihlajamaki 2019). As an example of this paradigm change, the German government is now offering funds for digitalization, in particular where digitalization can be used throughout entire supply chains and for keeping the forestry and timber sector internationally competitive (FNR 2018). This reiterates the importance of harvester data and how it can be used.

State of knowledge
Harvester data is already being used on different levels by researchers. In studies performed by Siipilehto et al. (2016) and Barth et al. (2015), harvester data was not the target objective but rather used as reference data for evaluating three different pre-harvest inventory methods and for assessing two inventory methods to predict product recovery. Harvester data was also used as test data for optimizing the bucking algorithm (Liski and Nummi 1995). However, other articles are focusing on harvester data as the main objective. For example, Holmgren et al. (2012), Caccamo et al. (2018), Maltamo et al. (2019) and Saukkola et al. (2019) predicted stem attributes or other forest inventory attributes by combining airborne laser scanning data (ALS) and harvester measurements, whereas Vesa and Palander (2010) used harvester measurements for modeling stump biomass in stands. In other instances, harvester data can also supplement harvesting productivity analyses (Purfürst and Erler 2011;Gerasimov et al. 2012;Eriksson and Lindroos 2014;Labelle et al. 2017). In a review article by Hiesl and Benjamin (2013), the productivity of international harvesting equipment was compared and a short overview of studies in which on-board computers were used was provided. Through their summary of on-board computer functions,  already provided some potential uses of harvester data. Irrespective of study objectives, harvester data is used as an absolute method or as a tool for research.
Despite these advances, it remains unclear in which way harvester data can contribute to digitalization and optimization of the forest supply chain processes. The focus of this review was to provide a synopsis of ways harvester data can be used. More specifically, the objective was to provide an overview of the different applications of harvester data and discuss their possible uses, including benefits, shortfalls and areas for future research. In this review, harvester data refers to all data that can be read out from the on-board computer of a single-grip harvester or of an excavator with a processing head, or that is available as a protocol in the StanForD format. Data originating from additional instruments or sensors such as cameras or 3D laser scanning were not considered.

Databases
The systematic literature review was performed with Scopus and Web of Science (WOS). In Scopus, the Medline database was excluded because medicine topics were not relevant. In WOS, the literature results in the internal database called "Web of Science Core Collection" were used but chemical indexes were removed in the search mask. Both databases were limited to the document type "articles" to receive only high-quality literature. Furthermore, the advanced search was used as search mask for each database.

Research strategy
To obtain relevant articles relating to the objective of the review, titles, keywords and abstracts were searched with the grouped synonyms of the search terms "harvester", "mechanized harvesting", "data" and "on-board computer". The synonyms were combined with the Boolean operator OR and the search terms with AND. The time span settings were 1st of January 1993 till 31st of December 2019, because the first harvester on-board computer was available in 1993.
Selection of appropriate search terms was pivotal since the interest in the word harvester was limited to the field of forestry, not agriculture. After several trials, certain key words were omitted because their addition did not result in new relevant articles. Final search terms were:
• ("single grip" OR (harvester AND forest*)) OR (timber W/3 harvest*) OR ("forest operation")
• data OR measurement OR report
• computer
Search query for Scopus was: (((TITLE-ABS-KEY ("single grip" OR (harvester AND forest*)) PUBYEAR > 1992 AND NOT INDEX (medline)) OR (TITLE-ABS-KEY ((((timber) W/3 harvest*) OR ("forest operation"))) PUBYEAR > 1992 AND NOT INDEX (medline))) AND (TITLE-ABS-KEY (data OR measurement OR report) PUBYEAR > 1992 AND NOT INDEX (medline)) AND (TITLE-ABS-KEY (computer) PUBYEAR > 1992 AND NOT INDEX (medline))) AND (LIMIT-TO (DOCTYPE,"ar")).
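The logic behind the search terms (synonyms joined with OR, term groups joined with AND) can be sketched in a few lines of Python. This is only an illustration and not the tooling used for the review: the helper names `or_group` and `build_query` are hypothetical, and the output reproduces the Boolean structure only, not the full Scopus syntax with its field codes and filters.

```python
def or_group(synonyms):
    """Join the synonyms of one search term with OR and wrap them in parentheses."""
    return "(" + " OR ".join(synonyms) + ")"


def build_query(groups):
    """Join the OR-groups of all search terms with AND."""
    return " AND ".join(or_group(group) for group in groups)


# Term groups as listed in the text; the grouping into Python lists is an assumption.
groups = [
    ['"single grip"', "(harvester AND forest*)", "(timber W/3 harvest*)", '"forest operation"'],
    ["data", "measurement", "report"],
    ["computer"],
]

query = build_query(groups)
print(query)
```

Keeping each group in its own list makes it easy to test the effect of adding or dropping a synonym, which mirrors the trial-and-error selection of key words described in the text.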

Search query and article analysis
The search query resulted in 86 articles for Scopus and 37 for WOS (Fig. 1). The line in Fig. 1 shows the yearly sum of published articles from Scopus and WOS from 1993 to 2019 (total search results). Search results common to both databases were counted only once. Despite yearly oscillations, a slight upward trend in yearly publications over the reviewed period can be seen. On average, three to four articles matching the search criteria were published per year, with a maximum of 10 published in 2006.
Each article resulting from the search queries was evaluated in a three-step process.
1. Titles were verified to make sure that the field of study was forestry.
2. Abstracts were read and those that did not meet the following criteria were excluded (harvester data, planning with data/collecting data, log length/diameter, bucking, productivity, automatic time study/modeling).
3. Full articles were read and inclusion rested on the use of harvester data meeting the above-mentioned definition.
In total, 23 articles fulfilled all criteria (Fig. 1). Eleven articles were reported solely in Scopus, three in WOS and nine more were common to both databases.
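Counting articles that appear in both databases only once amounts to simple set operations on article identifiers. The sketch below assumes that each article carries a unique identifier (for example a DOI). The identifiers are hypothetical placeholders, sized only to match the counts reported above, and the function name `merge_results` is an assumption.

```python
def merge_results(scopus_ids, wos_ids):
    """Split two result lists into database-specific and shared articles."""
    scopus, wos = set(scopus_ids), set(wos_ids)
    return {
        "scopus_only": scopus - wos,
        "wos_only": wos - scopus,
        "both": scopus & wos,
        "total": scopus | wos,  # each article counted once
    }


# Hypothetical identifiers: 11 Scopus-only, 3 WOS-only, 9 shared articles.
shared = [f"B{i}" for i in range(9)]
scopus_ids = [f"S{i}" for i in range(11)] + shared
wos_ids = [f"W{i}" for i in range(3)] + shared

merged = merge_results(scopus_ids, wos_ids)
counts = {name: len(ids) for name, ids in merged.items()}
print(counts)
```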

Descriptive analysis
From the reviewed literature, four main topics emerged as most relevant (inventory, bucking, time study, and improving forest operation management). The most frequently reported topic was bucking (10 of 23), where the data were utilized for bucking algorithms, evaluating the accuracy of harvester measurements, comparing manual versus automatic bucking, or predicting stem quality. The second most cited topic was time study (8 of 23). In this instance, harvester data was used to determine the accuracy of the observers during manual time studies. Moreover, manual time study methods were compared with those using harvester data, as well as those determining harvesting productivity. Time study data was also used to determine factors that can influence harvesting productivity. Inventory, relating to the use of harvester data for stand estimation and for reconstructing tree heights, was reported in 4 of 23 articles. A single article (1 of 23) was concerned with improving forest operation management of live operations by integrating scientific methods using harvester data.

Categories of harvester data
Beyond the initial topical classification, articles were also clustered based on the type of harvester data used, categorized as time data, measurement data, positioning data, and fuel data. For a comprehensive overview, the specific purpose of data use from each article is presented in Table 1. The most frequently used data type was measurements originating from the harvesting head, in particular length and diameter of the log or parameters derived from them, such as volume and taper curves, followed by time, position and lastly fuel data.
For clarity, the table is structured by topic; within each topic, the articles are grouped by sub-topic in chronological order. The results are presented in the order of the table.

Inventory
For inventory purposes, stand parameters can be predicted. For example, harvester-collected stem data was used as input in different locally adaptable nonparametric Most Similar Neighbor (MSN) methods to estimate stand characteristics (Malinen 2003). This was achieved by comparing different local adaptation methods with the k-nearest neighbor (k-nn) MSN method via a stem database prototype, which resulted from the collection of harvester data. It contained mean stand characteristics such as stand area (ha), stand age (year), basal area (m²), basal area mean diameter (cm) and height of basal area median tree (m). As a result, the locally adaptable neighborhood (LAN) MSN method was more accurate than the k-nn methods. The local k-nn MSN method was not notably better than the k-nn MSN method (Malinen 2003). Rasinmäki and Melkas (2005) combined spatial data in the form of the harvester's GPS position and two probability density functions (for the distance and the angle from the harvester at which a tree was cut) with diameter, length and volume data to simulate tree composition and volume of arbitrary regions of a harvested stand. The average root mean squared error (RMSE) of the estimated volume varied from 4 to 29% depending on the size of the subregions. With the use of tree location simulation, improvements in volume estimates (total and species-wise) varied between 5 and 35% as compared to when only harvester data was used (Rasinmäki and Melkas 2005). Beyond stand parameters, it is also possible to predict tree parameters. In a study by Lu et al. (2018), diameter overbark (DOB) measured at any height up to 3 m above ground was used to estimate the DBH overbark (DBHOB) of trees in a radiata pine (Pinus radiata D. Don) plantation. Harvester data was also used to develop equations that predict the total height of harvested stems.
In detail, a harvester dataset containing information about log length, overbark volume and small-end diameter (SED) overbark of all logs cut from each stem was used to derive patterns for the simulated cutting of the constructed stem profiles. For the estimation of total tree height, both harvester and taper data were used to generate a dataset that contained stump height, number of logs, their length, large-end diameter outside bark (LEDOB), small-end diameter outside bark (SEDOB) and the length of the top section of every tree. Furthermore, the harvester dataset was compared with the DBHOB and the estimated total length of each tree of the taper dataset to select the most similar stem for the tree. The developed equation for predicting total tree height from DBHOB, total log length, and SED of the top log performed better than the conventional equations found in the literature (Lu et al. 2018).

Fig. 1 Search results of Scopus and WOS databases
In another study, Murphy et al. (2006) evaluated the use of harvester data for augmenting or replacing pre-harvest inventory data from the MARVL (Method of assessment of recoverable volume by log type) system of a company. In this instance, the harvester optimizer was used to destructively sub-sample trees of a radiata pine plantation to estimate volume and grade with the aim to test the effectiveness of the harvester optimizer as a pre-harvest inventory system. The method performed well and the value recovery of the harvest was 98% of what was predicted from the 12% or less pre-harvested trees. Higher sampling intensities provided even better agreement between the measured and predicted values. In addition, Murphy et al. (2006) compared the estimated and actual product outturns from the harvester with the MARVL inventory system. Total volume estimates of the MARVL system were 8% lower than the harvester optimizer volume and 14% less than the estimated value recovery. In a second analysis, harvester optimizer and MARVL diameters, as well as log length were compared to quantify the effects that the MARVL taper function had on total volume recovery and log product outturns. Results indicated that the harvester underestimated underbark diameters and the MARVL predicted diameters were partly incorrect. This is why Murphy et al. (2006) assumed that harvester collected stem data can be used to build site-specific taper functions. As a last analysis, the value recovery of non-optimizing versus optimizing harvesters was compared and the appropriateness of using swath cuts as a pre-harvest inventory method was examined. Therefore, different harvesters (optimizing, nonoptimizing) were used to harvest stands or swaths of a stand.
In this situation, non-optimizing harvesters produced 19% less total volume as medium sawlogs and 20% more volume as pulp logs than predicted by pre-harvest swath cutting with an optimizing harvester. This led to only 81% of the predicted value. Optimizing harvesters produced more logs with an overall lower average SED. Total stand value recovery estimates were better when optimizing harvesters were used for the actual cutting of the stands as well as for pre-harvest inventory swath cutting. Estimates by Murphy et al. (2006) indicated that up to 50% of the conventional pre-harvest inventory costs for the fieldwork component can be eliminated through the use of harvesters for collecting inventory data.

Bucking
A broad field of application of measurement data is bucking. An important sub-category of bucking is the optimization of the bucking algorithm. For example, Liski and Nummi (1996) used length and diameter measurements of Norway spruce (Picea abies (L.) H. Karst) trees to predict the unknown part of the stem by using an expectation maximization (EM) algorithm. The prediction was based on the known part of the currently processed stem and on the knowledge of previously processed trees. The stem curve estimate was determined utilizing these new measurements. The results were that longer known stem parts led to better predictions and that the second-degree polynomial model proved to be a good choice in most cases. Furthermore, the predictor with independence structure provided the most promising starting point for the implementation of stem curve predictors in harvesters. These results provided important knowledge for developing automatic bucking systems of modern harvesters (Liski and Nummi 1996). Measurement data of Norway spruce was also used in a bucking optimization algorithm to compile tree profiles required by the genetic algorithm (GA) in order to optimize the system that searches for stand-specific price matrices (Kivinen 2004). Moreover, Kivinen (2004) used real stem data for pre-adjustment of log prices, which led to better results than simply using price matrices controlled by estimated stem data. The simulation showed that even if log prices were adjusted, the pre-control of price matrices did not improve the fit between the overall demand matrix and the global output matrix. Harvester stem data collected from a previously harvested block was used in an adaptive control heuristic developed to improve the prediction of appropriate prices and log specifications to meet market operational constraints (Murphy et al. 2004).
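To illustrate what predicting the unknown part of a stem from its known part means, the sketch below fits a second-degree polynomial to diameters measured along the already processed stem section and extrapolates to a greater height. It is a deliberately simplified stand-in: it uses ordinary least squares on a single stem, not the EM-based predictor of Liski and Nummi (1996), and it ignores the information from previously processed trees. All function names and all measurement values are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(heights_m, diameters_mm):
    """Least-squares fit of d = b0 + b1*h + b2*h**2 via the normal equations."""
    power_sums = [sum(h ** p for h in heights_m) for p in range(5)]
    xtx = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    xty = [sum(d * h ** i for h, d in zip(heights_m, diameters_mm)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict_diameter(coeffs, height_m):
    """Evaluate the fitted stem curve at a given height."""
    b0, b1, b2 = coeffs
    return b0 + b1 * height_m + b2 * height_m ** 2


# Hypothetical known stem section: heights (m) and diameters (mm) on an exact quadratic.
known_heights_m = [1.3, 3.0, 5.0, 7.0, 9.0]
known_diameters_mm = [300 - 8 * h - 0.2 * h ** 2 for h in known_heights_m]

coeffs = fit_quadratic(known_heights_m, known_diameters_mm)
predicted_mm = predict_diameter(coeffs, 12.0)  # extrapolate into the unknown stem part
print(round(predicted_mm, 1))
```

With real harvester measurements the points scatter around the curve, and the extrapolation becomes more reliable as the known stem section grows, which is consistent with the finding reported above.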
Harvester stem data was also used for comparison with data from pre-harvest inventory. Results indicated that previously collected harvester data improved meeting the order book and targeted proportions the most (19-26%), followed by pre-harvest inventory data (17-22%), and the combination of both datasets lay between the results (Murphy et al. 2004).

In a study by Kivinen (2006), harvester data in the form of taper curves were used in GA-controlled demand matrices in parallel with the overall demand and price matrices of each log product to optimize bucking, in such a way that the fit between the required overall log demand distribution and the cumulative log output distribution could be improved. Results showed that the GA-controlled demand matrices improved the bucking by 22 to 103% compared to the overall demand matrices, which were used as reference (Kivinen 2006). Moberg et al. (2006) took a different approach to bucking optimization. In their study, the relationship between diameter at breast height (DBH) and the largest small-end diameter (SED), originating from the harvester on-board computer, was used to automatically identify saw logs containing sound knots suitable for appearance-grade sawn wood to produce sound-knot sawn wood in center boards. A low sound knot quotient (SKQ) level, which meant strict settings, led to a good in-grade sawn wood classification, but there was still a proportion of appearance-grade sawn wood in the out-of-grade logs. The harvester was able to identify about 80% of the total available furniture-grade sawn wood (Moberg et al. 2006).
Other articles focused on the accuracy and errors of bucking. Chiorescu and Grönlund (2001), for example, used simulated harvester measurements in modeled scenarios to investigate the impact of their accuracy on the theoretical sensitivity in comparison to other final product parameters such as sawlog features, sawing pattern optimization, and log positioning in a saw line. When considering the accuracy of diameter measurements, results indicated that 29% of the logs were incorrectly sorted, 45% of the sawlogs were overestimated and 55% were underestimated. It was determined that small improvements in the harvester's measuring performance could lead to considerable improvements in the wood transformation chain. When focusing on value loss, the diameter seemed to have more influence on the sorting than the length. In general, logs with a trimming allowance of less than 50 mm always produced off-grade boards (Chiorescu and Grönlund 2001). In a study by Nummi and Möttönen (2004), the prediction accuracy of stem curves for low-degree polynomial models under harvester data measurement errors was analyzed with respect to the error type. For harvester data of pine trees, the first-degree polynomial model with dependent measurement errors provided the best prediction, whereas second-degree polynomial models with independent measurement errors performed well for manually measured data. Marshall et al. (2006) analyzed the length and diameter measurement errors of harvesters concerning their impact on value loss by using an error estimation simulation model with an embedded optimal bucking algorithm. The error was calculated by subtracting the manual log measurement from the harvester measurement. Value losses of between 3 and 23% were found when harvesting with a harvester, depending on the type of error (scanning length and diameter, bucking). Similarly, Leitner et al. (2014) analyzed the accuracy of harvester length measurements to determine the impact on the supply chain and the economic value loss. Furthermore, they wanted to assess the influence of harvesting head calibration on the accuracy of length measurements. This was done by comparing the target length shown on the OBC with the actual physical length. The comparison was done before and after calibration, as well as for different harvesting heads. Results showed that in 73.7% of the cases, logs were cut longer than the desired length and that logs processed during the winter were on average 2 cm longer as compared to other seasons. When logs were too long, the associated value loss was between 0.93 € and 1.90 € per cubic meter. Proper calibration of harvesting heads could improve the length measurement by between 58 and 70% with a length difference of 0.5 cm (Leitner et al. 2014).
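The two error measures described in this paragraph are simple to compute once harvester and manual measurements are paired log by log. The sketch below shows the signed error as harvester measurement minus manual measurement, as described for Marshall et al. (2006), and the share of logs cut longer than the target length, as reported by Leitner et al. (2014). The function names and all values are hypothetical and serve only to illustrate the arithmetic.

```python
def measurement_errors(harvester_cm, manual_cm):
    """Signed error per log: harvester measurement minus manual measurement."""
    return [h - m for h, m in zip(harvester_cm, manual_cm)]


def share_overlength(actual_cm, target_cm):
    """Fraction of logs whose actual length exceeds the target length."""
    over = sum(1 for a, t in zip(actual_cm, target_cm) if a > t)
    return over / len(actual_cm)


# Hypothetical log lengths in cm.
target_cm = [410, 500, 400, 410]     # target length shown on the OBC
harvester_cm = [410, 505, 398, 412]  # length recorded by the harvester
manual_cm = [412, 503, 400, 412]     # length measured manually on the log

errors = measurement_errors(harvester_cm, manual_cm)
mean_error = sum(errors) / len(errors)
overlength = share_overlength(manual_cm, target_cm)

print(errors, mean_error, overlength)
```

A negative mean error indicates that the harvester tends to record shorter lengths than the manual reference, which results in logs that are physically longer than intended.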
A different field of bucking optimization is the comparison of bucking methods. Stem diameter values and length measurements were used in the study by Labelle and Huß (2018) to feed the bucking algorithm of the OBC to predict optimal bucking in a Norway spruce dominated stand. The objective was to determine whether automatic bucking with optimization software led to different harvesting productivity, product recovery and product revenue as compared to manual bucking. Results showed that, when using identical price matrices, product recovery was slightly increased with manual bucking but that automatic bucking entailed a 17% higher harvesting productivity. Revenues of products stemming from automatic bucking were approx. 4% higher than in manual bucking.

Time study
In time studies, harvester data is generally used to develop productivity models using automatically collected harvester data or to analyze influencing factors of harvester productivity and to evaluate their performance in comparison to other time study methods.
In the article by Palander et al. (2013), which focuses on the development of an automated time study, automatically computed controller area network (CAN-bus) time consumption of a harvester OBC was used in combination with manually recorded time study data to develop an automatic time study method with data input of a harvester OBC. The main work phases could be identified by Palander et al. (2013) and the method provided good results.
Harvester data is used for analyzing influences on productivity in the following articles. In the first of these, time stamp data was used to calculate cycle time. More specifically, the processing time component of the effective work time from a .drf file (detailed information on time and mechanical events during the operation) was used to identify the beginning of each period of down time and to calculate work statistics. The DBH (DBH < 40 mm or DBH > 400 mm) was used as a criterion for excluding trees from the study, as well as the commercial height and the top diameter (top diameter > DBH). With automatically collected harvester data as input, the influence of different factors such as terrain slope, operator, species, DBH, and shift on machine productivity was assessed. To this end, the stem merchantable volume divided by the cycle time was used as the modeled dependent productivity variable. Independent variables concerning metric data were DBH, interactions between DBH and species, and DBH and terrain slope. For slope, the automatically collected geographic coordinates of a harvester were also used as input to create a shapefile of all stem records. The shapefile was then overlaid with a slope surface to evaluate its effect, besides other factors, on machine productivity. Results indicated that DBH was the most influential variable on harvester productivity and that terrain slope and shift (early vs. late) had no significant effect, although the study area only had flat and mildly sloping terrain. A similar approach to evaluating influencing factors on productivity was taken by Rossit et al. (2019), where harvester data was used to calculate the productivity of harvesters and to compare the decision tree (DT) technique with multiple regression analysis. In this test, time data in the form of time stamp records were used to calculate cycle time by determining the difference between consecutive stem time stamps. Afterward, productivity was calculated by dividing the volume of a processed tree by the respective cycle time. The dependency of productivity on DBH, operator, shift, and species was also tested, with DBH tested as an independent variable alone and in combination. The result was that the most significant influencing variable on productivity was DBH, followed by operator and species. According to Rossit et al. (2019), decision trees and k-means algorithms were deemed suitable methods for analyzing large amounts of data and the DT algorithm was suitable to model harvester productivity.
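The calculation of cycle time and productivity from stem time stamps can be sketched as follows. The example assumes that each stem record carries the time at which its processing was completed, so that the cycle time of a stem is the difference to the previous stem's time stamp and the first stem has no cycle time. The time stamp format, the function names and all values are hypothetical, and real data would additionally need filtering for delays and the exclusion criteria mentioned above.

```python
from datetime import datetime


def cycle_times_s(timestamps):
    """Cycle time in seconds as the difference between consecutive stem time stamps."""
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


def productivity_m3_per_h(volumes_m3, cycles_s):
    """Productivity per stem: processed volume divided by the respective cycle time."""
    return [volume / (cycle / 3600.0) for volume, cycle in zip(volumes_m3, cycles_s)]


# Hypothetical stem records: completion time stamps and merchantable volumes (m3).
timestamps = [
    "2019-05-01 08:00:00",
    "2019-05-01 08:00:45",
    "2019-05-01 08:01:45",
    "2019-05-01 08:02:15",
]
volumes_m3 = [0.30, 0.25, 0.50, 0.20]

cycles = cycle_times_s(timestamps)
# The first stem has no preceding time stamp, so its volume is skipped.
productivity = productivity_m3_per_h(volumes_m3[1:], cycles)

print(cycles)
print([round(p, 1) for p in productivity])
```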
Focusing on a complete harvesting system, Apăfăian et al. (2017) investigated, as part of a productivity study, the performance of a mid-sized harvester-forwarder system in clear-cuts when additionally processing tree tops. They used daily initial and end fuel levels to estimate the hourly fuel consumption of the harvester. The result was an hourly consumption of about 21 l, resulting in a unit fuel consumption of about 1.1 l/m³ o.b. The delivery of one cubic meter of wood to roadside (entire harvesting system) required on average 1.7 l of fuel (Apăfăian et al. 2017).
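The fuel figures above follow from two simple ratios, sketched below. The example assumes that the machine was not refuelled between the initial and the end reading of a day. The function names and all input values are hypothetical and are not taken from Apăfăian et al. (2017).

```python
def hourly_fuel_l(start_level_l, end_level_l, machine_hours):
    """Hourly fuel consumption from daily initial and end fuel levels (no refuelling assumed)."""
    return (start_level_l - end_level_l) / machine_hours


def unit_fuel_l_per_m3(fuel_used_l, volume_m3):
    """Fuel used per cubic meter of processed wood."""
    return fuel_used_l / volume_m3


# Hypothetical working day: fuel levels in liters, 8 machine hours, 160 m3 processed.
start_level_l, end_level_l = 300.0, 132.0
machine_hours = 8.0
volume_m3 = 160.0

per_hour = hourly_fuel_l(start_level_l, end_level_l, machine_hours)
per_m3 = unit_fuel_l_per_m3(start_level_l - end_level_l, volume_m3)

print(per_hour, round(per_m3, 2))
```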
In another study, Nuutinen et al. (2008) used time consumption data of the harvester's PlusCAN data logger for felling and processing as the reference for manual time measurements in order to investigate the accuracy and variation of experienced versus inexperienced time recording observers. Results indicated that inexperienced observers made more measurement errors, but there was no significant difference between the two groups (Nuutinen et al. 2008).
Another topic addressed by the reviewed articles on time and productivity studies is the comparison of manual versus automated methods. For example, time data of a harvester's PSION OBC was used in a work cycle analysis (Szewczyk and Sowa 2017). This cumulative method was compared statistically with cumulative video recording time analysis and the standard snap-back timing method, with the result that snap-back timing was slightly more accurate than the cumulative timing methods. The OBC PSION method yielded significantly shorter mean durations of harvester work cycles in clear-cutting (22%) and in thinning (approx. 14%) operations as compared to video recording. Snap-back timing underestimated work cycle time in clear-cut operations by 6%, but overestimated it in thinning operations by approx. 12% (Szewczyk and Sowa 2017).
In a study performed by Strandgard et al. (2013), StanForD stem files were used to create harvester productivity models, which were then compared with results obtained from a conventional time and motion method (video recording). Time differences between consecutive StanForD stem files were used to estimate cycle time, and the log volumes in each stem file were summed to estimate the merchantable tree volume. Productivity was then estimated by dividing merchantable tree volume by cycle time. Furthermore, end height (total length of processed stem) was used as one of the filters to remove stems with broken tops or multiple leaders. Merchantable tree volume was calculated for both time and motion methods (video recording vs. harvester data) on the basis of the stem files. In summary, the merchantable volume of trees whose final stem section passed through the harvesting head without further cutting was slightly underestimated, because the OBC only recorded logs that were cut at the small end. The time and motion comparison showed no statistical differences between harvester productivity models created from stem files and those created from the conventional time and motion method. However, stem file productivity models had a poorer fit than the time and motion models (Strandgard et al. 2013).
Brewer et al. (2018) also modeled and estimated the productivity of a CTL harvesting operation from harvester data and compared it with a manual time study. This was done by using the harvest time stamp, which gives the exact time in hour, minute and second (h:m:s) format at which processing of an individual tree was completed. The difference between consecutive time stamps determined individual cycle times (seconds per tree). Harvester-estimated tree volumes were calculated by summing the volume of merchantable logs from the respective tree. This technique led to underestimations compared to the manually calculated volumes, which is the reason Brewer et al. (2018) used individual tree volumes from manual measurements for the calculation of harvesting productivity for each individual cycle. When comparing both methods (harvester data vs. manual data (Time Study App)), volume estimates and DBHs differed significantly. However, the productivity of the harvester data method was not significantly lower than that measured by the manual time study. Brewer et al. (2018) reported that when the volume estimates are standardized, no statistical differences existed between the different productivity models. Brown et al. (2011) used time and stem file information in a productivity model for harvesters to: (1) estimate rates and manage operations by harvesting entrepreneurs, (2) plan harvest schedules and estimate harvest costs by forest managers, and (3) allow researchers to evaluate harvester performance. The tool developed could import stem files from the harvester and apply user-selected filters to remove cycle times with large delays and trees with broken tops and multiple leaders. Results showed no statistical differences between productivity models from the stem file approach and those produced from time and motion studies (Brown et al. 2011).
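The stem file approach described above can be illustrated with a minimal Python sketch. It assumes each stem record carries a completion time stamp and the volumes of its merchantable logs; the record layout, the sample values and the 300 s delay threshold are hypothetical and are not taken from StanForD or from the reviewed studies. Cycle time is the difference between consecutive time stamps, tree volume is the sum of log volumes, and cycles with large delays are filtered out before productivity is calculated.

```python
from datetime import datetime

# Hypothetical stem records: (completion time stamp, merchantable log volumes in m3).
# Layout and values are illustrative only, not read from a real stem file.
stems = [
    ("2019-05-02 08:00:00", [0.21, 0.15, 0.08]),
    ("2019-05-02 08:00:50", [0.30, 0.18]),
    ("2019-05-02 08:01:35", [0.25, 0.12, 0.05]),
    ("2019-05-02 08:21:35", [0.19, 0.11]),  # preceded by a long delay
    ("2019-05-02 08:22:25", [0.28, 0.16, 0.07]),
]

MAX_CYCLE_S = 300  # assumed threshold for removing cycles with large delays


def cycle_productivity(records, max_cycle_s=MAX_CYCLE_S):
    """Return (cycle_time_s, volume_m3, productivity_m3_per_h) for each retained stem."""
    results = []
    previous = None
    for stamp, log_volumes in records:
        current = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # The first stem has no preceding time stamp, so it yields no cycle time.
        if previous is not None:
            cycle_s = (current - previous).total_seconds()
            if 0 < cycle_s <= max_cycle_s:
                volume = sum(log_volumes)  # merchantable tree volume
                results.append((cycle_s, volume, volume / cycle_s * 3600))
        previous = current
    return results


cycles = cycle_productivity(stems)
total_volume = sum(volume for _, volume, _ in cycles)
total_time_h = sum(cycle_s for cycle_s, _, _ in cycles) / 3600
mean_productivity = total_volume / total_time_h  # m3 per hour over retained cycles

print(f"{len(cycles)} cycles retained, {mean_productivity:.1f} m3/h")
```

A filter on end height for broken tops or multiple leaders, as mentioned in the text, could be added in the same loop; it is omitted here because the sample records carry no stem length.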

Limitations of literature review
Even though different search terms were tested diligently to obtain the most appropriate search query, it is possible that other relevant articles were omitted. Testing supplementary search terms more in the direction of digitalization and information technologies led to different search results, since more specific computer-related keywords could be used. For example, additional bucking algorithm articles could appear. Despite the expected benefits of an increased pool of articles, these search terms were not used since the resulting literature was too broad and out of context (e.g., in Scopus the search query TITLE-ABS-KEY (forest* AND machine AND data) PUBYEAR > 1992 AND NOT INDEX (medline) AND (LIMIT-TO (DOCTYPE, "ar")) led to almost 5000 search results). Conversely, using more specific search terms easily led to search results that were too narrow (e.g., TITLE-ABS-KEY ("single grip" AND harvester AND computer) PUBYEAR > 1992 AND NOT INDEX (medline) AND (LIMIT-TO (DOCTYPE, "ar")) returned fewer than 10 search results), thus leaving out pertinent articles. This occurred frequently with the word "harvester", because of its dual application in forestry and agriculture. Another potential limitation is that the search focused on scientific articles. In computer science, it is common to publish in conference papers or proceedings, because it is a quickly evolving research field.
Another reason why relevant articles might not have passed the filtering process could be that, before the era of forestry 4.0, harvester data was not as important. It could be used in the methods, for volume calculations or time measurements, but was only rarely mentioned. For example, in productivity or time studies it could be used without being explicitly described. This is also suggested by the rising number of relevant articles published in the last 8 years. This may be why harvester data is not explicitly mentioned in the title or keywords of every article, as is the case for Rasinmäki and Melkas (2005), Lu et al. (2018) and Brewer et al. (2018).

Time
Harvester data plays an important role in time and productivity studies. The easily and quickly available time data makes it possible to rapidly obtain cycle times and to calculate productivity. As Brown et al. (2011) reported, there were no statistical differences between productivity models from the stem file approach and those from manual time and motion studies. It is also possible to evaluate the influence of different factors such as DBH, terrain slope, shift, operator and species on machine productivity. Another strong advantage of calculating productivity from harvester-recorded data is that it does not require people to be sent to the stands and placed in rather dangerous positions in relation to the harvester. It is therefore worth considering whether expensive manual time studies are still necessary.
Concerning the accuracy of harvester data used for time and productivity studies, Strandgard et al. (2013) reported no significant differences in productivity compared to manual time and motion methods. However, stem file productivity models had a poorer fit (Strandgard et al. 2013). The lack of a field observer when using stem file productivity models is a considerable advantage in overcoming the Hawthorne effect (variation in the performance of forest machine operators caused by the knowledge that they are being observed), but can also become a disadvantage since no records of changing stand or field conditions can be collected. Palander et al. (2013) were able to identify the main work phases from harvester time data. In combination with manually recorded time data, the automatic time study model could be adjusted to improve accuracy. Another reviewed study explained 73% of the variability of productivity with a mixed-effect model. Brewer et al. (2018) found that no statistical differences existed between the productivity models from harvester data and those from the manual time study. With their productivity model based on a DT algorithm, Rossit et al. (2019) achieved an accuracy of over 90%.
In contrast to these positive findings, Szewczyk and Sowa (2017) found that snap-back timing was slightly more accurate than the harvester data method. The latter had significantly shorter mean durations of some harvester work cycles (Szewczyk and Sowa 2017). It is also important to note that stem volume and DBH estimates differed significantly from those obtained from manual measurements (Brewer et al. 2018), which is why only time data was used in that study. Nonetheless, the use of harvester data makes it possible to record the important work phases more accurately and to collect larger volumes of data at a lower cost (Palander et al. 2013; Strandgard et al. 2013). Ultimately, the decision to select an automated or manual method depends on the goal and the conditions of a time study. Although automatic data collection is relatively easy to implement and allows large amounts of data to be collected over extended periods, new knowledge about the interpretation of new data formats and the organization and processing of large databases is needed (Košir et al. 2015). It is also important that researchers expand their objectives in accordance with the superior strength of their new methods (Košir et al. 2015).

Precision
In studies performed by Strandgard et al. (2013) and Brewer et al. (2018), the volume of harvested wood was generally underestimated by the harvester, but this had no significant influence on productivity models when the volume estimates were standardized (Brewer et al. 2018). Concerning the estimation of tree stock volume of a subregion, the improvement of volume estimation was between 5 and 35% when using harvester data with tree location simulation compared to only using harvester data (Rasinmäki and Melkas 2005). The estimation of DBHOB at any height below 3 m was good, because errors were smaller than 1 cm for the first tree heights below 0.3 m (Lu et al. 2018). For the prediction of the total tree height, values between 1.25 and 1.35 m across the height range were reported. Stem parameter prediction from harvester data was quite satisfactory. It was possible to identify about 80% of the total available furniture grade sawn wood with the harvester (Moberg et al. 2006). Concerning bucking optimization, previously collected harvester data improved the ability to meet the order book (Murphy et al. 2004), and the GA-controlled demand matrices also improved the bucking drastically (Kivinen 2006). Automatic bucking can also improve productivity and product revenue compared to quality bucking (performed by the harvester operator), particularly when trees with strong apical dominance are harvested (Labelle and Huß 2018). Conversely, when trees exhibit severe crooks and forks, as was the case with Scots pine (Pinus sylvestris L.) found in Germany, quality bucking generally generated a higher harvesting productivity and revenue (Labelle et al. 2017). This diverging result could be linked to the fact that the algorithm of the OBC was designed for pine trees with a simpler crown architecture than the trees harvested in the study.
Furthermore, an operator performing quality bucking can see problematic sections approaching while processing occurs and can therefore react accordingly, a task that the bucking optimization system cannot currently achieve.
The versatility and usefulness of harvester data can be increased substantially when geopositioning is available. Cross-referencing harvester volume data with geospatial information of processed trees opens up new avenues for research and can strengthen our understanding of the effect of tree form on harvesting productivity (Labelle et al. 2016). When considering the positioning of the harvester or that of felled trees during clear-cutting operations, Hauglin et al. (2017) reported promising results where sub-meter accuracy was obtained for tree positioning with an integrated accurate positioning system based on real-time kinematic global satellite positioning. However, under partial harvest conditions, the presence of a continuous cover could greatly hinder the precision of the positioning system.

Value creation
It is important to maintain the measurement system of the harvester, since incorrect cutting and sorting can cause value losses that are very difficult to recover at a later stage. In the study of Chiorescu and Grönlund (2001), nearly one third of the logs were sorted incorrectly due to diameter overestimation (45%) and underestimation (55%). When considering length, nearly 74% of the logs were longer than required in the study of Leitner et al. (2014), thus triggering a value loss between 0.93€ and 1.90€ per cubic meter. For example, Marshall et al. (2006) found that mechanized harvesting operations lose 18% of the potential value compared to 11% for motor-manual operations. Appropriate calibration of the harvesting head remains the most suitable method of improving length measurements.
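The kind of shares reported above (incorrectly sorted logs split into diameter over- and underestimation, and logs longer than required) can be derived from paired control measurements. The following Python sketch is only an illustration under stated assumptions: the field names, the 10 mm sorting class width and all sample values are hypothetical and do not reproduce the data or methods of the cited studies.

```python
# Hypothetical paired measurements per log (all values are illustrative):
# harvester and reference diameter in mm, reference and required length in cm.
logs = [
    {"d_harv": 182, "d_ref": 178, "len_ref": 412, "len_req": 410},
    {"d_harv": 205, "d_ref": 207, "len_ref": 409, "len_req": 410},
    {"d_harv": 238, "d_ref": 241, "len_ref": 415, "len_req": 410},
    {"d_harv": 264, "d_ref": 262, "len_ref": 411, "len_req": 410},
    {"d_harv": 299, "d_ref": 301, "len_ref": 410, "len_req": 410},
    {"d_harv": 315, "d_ref": 313, "len_ref": 413, "len_req": 410},
]

CLASS_WIDTH_MM = 10  # assumed width of a diameter sorting class


def diameter_class(diameter_mm, width_mm=CLASS_WIDTH_MM):
    """Map a diameter to the index of its sorting class."""
    return diameter_mm // width_mm


# A log counts as incorrectly sorted when the harvester diameter falls into
# a different class than the reference diameter.
missorted = [
    log for log in logs
    if diameter_class(log["d_harv"]) != diameter_class(log["d_ref"])
]
overestimated = [log for log in missorted if log["d_harv"] > log["d_ref"]]

missorted_share = len(missorted) / len(logs)
overestimated_share = len(overestimated) / len(missorted) if missorted else 0.0
underestimated_share = 1 - overestimated_share if missorted else 0.0

# Share of logs whose reference length exceeds the required length.
overlength_share = sum(log["len_ref"] > log["len_req"] for log in logs) / len(logs)

print(f"incorrectly sorted: {missorted_share:.0%}, longer than required: {overlength_share:.0%}")
```

Tracking these shares over successive control measurements is one simple way to judge whether a calibration of the harvesting head has had the intended effect.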
Considering the findings above, it makes sense to first improve the harvester measurements, because this is where the error potential and its effects are largest. Most errors are rather easy to address as they are not technical errors, but mostly managerial errors. Many forest entrepreneurs do not know which stem volume and bark equations their system is using and are often operating with the wrong functions. Selecting the appropriate stem volume and bark equations, as well as calibrating and regularly checking the measurement systems, can lead to substantial improvements. In Germany, a quality assurance system for harvester measurement is available to forest entrepreneurs (KWF 2020). After appropriate calibration, the bucking sequence with its algorithm should be optimized and, finally, the price and/or demand matrices should be adapted in order to maximize the value of the harvest.

Areas of improvement for science and practice
For forest science, and in particular forest growth and yield, the possibilities of using harvester data instead of or in addition to conventional inventories are promising. However, a crucial aspect to consider is that harvester-based inventory data are only available post-harvest. Stem profiles of harvested trees, which are generated in the harvester, can provide an opportunity for increasing the knowledge of tree growth and refining tree growth models. Through the establishment of ratios between merchantable and non-merchantable volume per tree, forest managers could better understand the impact of different silvicultural treatments on harvested volume. Within the same theme, using the estimation of tree height (Lu et al. 2018) for calculation of the total tree volume could provide more representative values for the calculation of harvesting productivity than simply relying on biomass expansion factors. In general, harvester data should be faster and less expensive to acquire than data from conventional labor-intensive methods, thus opening the door to big data approaches.
Within a practical scope, the question of measurement accuracy is both relevant and timely in harvesting productivity studies. Determining an acceptable level of accuracy in harvesting productivity studies is a topic that requires further attention. If a harvesting head is properly calibrated, the bottleneck could quickly become the noise in the datasets produced over long periods of time as opposed to the accuracy of the measurements and the methods of analyzing them. With a standardized procedure, it would be possible to get a broader overview of the range of services of the machines depending on the objectives of the data user. Some key examples are harvesting productivity and fuel consumption, information that would prove highly helpful for forest entrepreneurs when preparing offers. In the broader context of digitalization, it will become even more relevant not only to use harvester data, but also to feed price and demand matrix data into the harvester and to integrate both incoming and outgoing data continuously within the supply chain and among its stakeholders. Another worthy aspect is that, with the large quantity and fast availability of data collected from processed trees, accuracy improvements could be made to inventory methods to better reflect the actual volume (Malinen 2003). However, one has to consider that harvester data is only available post-harvest.
Monitoring data on machine parameters could also help to reduce interruptions of working time and to optimize fleet management. Furthermore, the volume of timber actually harvested could be verified, as could the amount of timber remaining as deadwood in the forest.

Conclusions
This systematic review shows that, despite being comprehensive, quickly acquired and readily accessible to users, harvester data remains under-utilized in many regions. Thus far, it has most frequently been used for bucking optimization and in productivity studies, while the topics of inventory and improvement of operation management have received less attention. Despite some noticeable advancements, available data from harvester OBCs is often not used to its full potential. An avenue of future research lies in big data analysis, where different approaches (data mining, machine learning, predictive modeling, etc.) should be further developed and tested. Moreover, by intersecting harvester data with other data sources, such as data collected at wood processing facilities, laser scanner data, manual measurement (inventory, productivity) data and geo data (satellite and aerial data), the current methods could be improved. Future research should therefore analyze how much time and money could be saved with the use of integrated harvester data. As the review focused on time, measurement, position and fuel data, other harvester data types, such as machine monitoring data, could also be worthy of future consideration. Extracting and using harvester data seems rather straightforward, but data ownership and protection still lack clarity, and additional research could be directed at these topics.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.