The need for reference machines when energy labelling electric household appliances that are tested to international standards

There are over 80 countries in the world that currently use some kind of energy label for electric household appliances. In Europe, for example, many appliances must carry an energy label when offered for sale, including online. Energy labels give consumers relevant information to help them make an environmentally beneficial choice when buying a new appliance. However, the desire for an energy-efficient appliance does not outweigh the wish for good performance. Therefore, some energy labels provide information about the performance of the appliance based on international performance measurement standards (hereafter: “international standards”). Indeed, for a given appliance, higher performance often means higher energy consumption, so product designers and users need to strike a balance between these parameters. Unlike measurements that are traceable to the International System of Units (SI) through metrological traceability chains, performance measurements have no natural reference. Therefore, some international standards relate their test results to a reference machine. Comparing the test machine with the reference machine eliminates variations caused, for example, by the auxiliary materials used and by manual preparation or assessment methods. This paper closely examines three international standards that currently use reference machines. It assesses how the reference machines and their test results are treated, whether the reference machines are comparable with their corresponding test machines and whether the use of a reference machine can be considered beneficial for the testing procedure. Additionally, three key questions are developed to indicate whether 13 other international standards for electric household appliances could also benefit from using a reference machine.
The paper concludes with six recommendations for standardisation groups and energy policymakers that will help with deciding whether a reference machine should be implemented.


Introduction
Energy labels for electric household appliances are implemented worldwide to help consumers make an energy-efficient choice when buying a new household appliance. A study on washing machines, for example, showed that energy and water efficiency are the most important aspects for European consumers when buying a new product (Alborzi et al. 2016). Jeong and Kim (2015) showed that South Korean customers were even willing to pay more for a household appliance that was labelled as environmentally friendly. However, although low energy consumption is important to most consumers, the performance of the electric household appliance also matters (Bengtsson and Berghel 2017; Hook et al. 2018). That is why many energy labels not only display the energy efficiency of the product but also set performance requirements. Many energy labels do not display non-energy performance on the label itself, though; often the minimum performance requirements are specified separately. This is now the case for dishwashers and clothes washers in Europe, for example, where minimum requirements for cleaning performance are specified under the Ecodesign Directive and are no longer shown on the European energy label as a performance parameter. For this kind of information, energy labels refer to performance measurement standards (hereafter "standards"), which link the energy use of the appliance to the service it provides to the consumer.
All labels in the European Union (EU), for example, refer to the testing methods of the European standards (EN) which, in turn, are based on the International Electrotechnical Commission (IEC) standards. Some of these standards use reference machines to compare their measurement results, while others do not. Therefore, the question arises: When and why are reference appliances in international standards for electric household appliances useful?
This paper examines whether the existing reference machines are useful and whether new ones would be of benefit for other standards. General recommendations are derived from what we have learned about which criteria should be applied when thinking about implementing a reference machine into an international standard.

Reference systems in international standards for electric household appliances
The IEC is one of the leading global organisations developing and publishing international standards. Experts from industry, consumer organisations, research institutes and testing laboratories, for example, have formed international committees to discuss the contents of standards (European Parliament and Council 2012; ISO/IEC Guide 21-1 2005).
According to the IEC, a standard is "a document, established by consensus and approved by a recognized body, that provides, for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context. [...] Standards should be based on the consolidated results of science, technology and experience, and aimed at the promotion of optimum community benefits." (ISO/IEC Guide 21-1 2005, page 2)
Although international standards are not legally binding in themselves, they are often referenced in national laws and regulations. They are therefore adopted at regional and/or national level in a variety of countries. The World Trade Organization explicitly acknowledged the potential of international standards to remove technical barriers to trade in the "Agreement on Technical Barriers to Trade" (European Parliament and Council 2012; ISO/IEC Guide 21-1 2005; WTO 1994).
Performing a measurement in metrology means experimentally obtaining quantity values that can reasonably be attributed to a quantity. In the SI, all quantities are expressed in terms of seven base quantities: length, mass, time, electric current, thermodynamic temperature, amount of substance and luminous intensity. Through calibration, the measured quantity value is compared with a measurement unit, a measurement procedure, a reference material or a combination of these. When the comparison is made with a reference material, that is, a corporeal reference, it is very important that the reference is constant over time; otherwise, the measurement result would vary. Any measurement result can be related to a reference, for example, a unit of the International System of Units (Système international d'unités, SI), through a so-called metrological traceability chain (BIPM 2008).
All measurements in international standards for electric household appliances that can follow this procedure do so. Measurements such as water consumption in litres (1 L = 0.001 m³) or energy consumption in joules (1 J = 1 kg·m²·s⁻²) can be related to the SI base units of length, mass and time through a metrological traceability chain. However, some measurements cannot follow this approach. The cleaning or drying performance of a dishwasher, for example, cannot be related to any SI unit. Therefore, in this case, the results are related to a reference machine (IEC60436 2015).
Usually, a measurement procedure that is implemented in an international standard needs to ensure repeatability and reproducibility. Repeatability means that the test results of a measurement procedure can be replicated in the same laboratory by the same staff. If they can be replicated in another laboratory by different staff, this is called reproducibility (BIPM 2008). According to Spiliotopoulos et al. (2018), it is additionally important that the measurement procedure in an international standard is relevant to the consumer: a measurement standard needs to deliver repeatable, reproducible and valid results at reasonable cost. Every international standard therefore has to manage a conflict between a robust measurement procedure with a high level of repeatability and reproducibility on the one hand and consumer relevance on the other.
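As an illustration of how repeatability and reproducibility are commonly quantified from inter-laboratory data, the following Python sketch estimates the repeatability standard deviation s_r and the reproducibility standard deviation s_R using the one-way variance-components approach described in ISO 5725-2. It is not taken from any of the appliance standards discussed here; it assumes a balanced design (every laboratory reports the same number of replicate results), and the laboratory names and scores are hypothetical.

```python
from statistics import mean, variance


def precision_estimates(results_by_lab):
    """Return (s_r, s_R) for a balanced inter-laboratory data set.

    results_by_lab maps a laboratory name to its list of replicate results.
    Follows the one-way variance-components model of ISO 5725-2.
    """
    groups = list(results_by_lab.values())
    if len(groups) < 2:
        raise ValueError("need results from at least two laboratories")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need the same number (>= 2) of replicates per laboratory")
    # Repeatability variance: pooled within-laboratory variance.
    s_r2 = mean(variance(g) for g in groups)
    # Variance of the laboratory means.
    s_d2 = variance([mean(g) for g in groups])
    # Between-laboratory component; negative estimates are set to zero.
    s_L2 = max(s_d2 - s_r2 / n, 0.0)
    # Reproducibility variance combines both components.
    s_R2 = s_L2 + s_r2
    return s_r2 ** 0.5, s_R2 ** 0.5


# Hypothetical cleaning scores, three replicate runs per laboratory.
scores = {
    "Lab 1": [3.1, 3.3, 3.2],
    "Lab 2": [3.5, 3.4, 3.6],
    "Lab 3": [3.0, 3.2, 3.1],
}
s_r, s_R = precision_estimates(scores)
print(f"repeatability s_r = {s_r:.3f}, reproducibility s_R = {s_R:.3f}")
```

With these invented numbers, the spread between laboratories dominates, so s_R is more than twice s_r. This between-laboratory component is the part of the variation that a reference machine is meant to help remove.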
General assessment of international standards using a reference machine
The main problem in the standardised testing of electric household appliances is that natural products are used in the process to ensure consumer relevance. For example, in the method for testing the performance of electric dishwashers, the dishes are soiled with various foodstuffs (IEC60436 2015). However, every natural product is subject to natural variation. The characteristics of spinach, for example, are strongly influenced by the season and the type of cultivation (Rimbach et al. 2015). It is therefore very difficult to truly standardise the auxiliary materials of the international standard for dishwashers. This is why a comparison with a reference machine is helpful in this case: by comparing the results of the test machine with those of the reference machine, the variations due to the auxiliary materials are cancelled out mathematically. Of course, this is only true if the reference machine reacts to changes in the auxiliary materials in the same way as the test machine does. This needs to be investigated and ensured before a reference machine is implemented in an international standard.
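The cancellation argument can be made concrete with a small simulation. The Python sketch below assumes, purely for illustration, that each soil batch multiplies the true score of every machine by the same random factor; the true scores 3.0 and 3.3 and the batch range are hypothetical. With ref_sensitivity=1.0 the reference reacts exactly like the test machine and the test/reference ratio stays constant. With ref_sensitivity=0.0 the reference ignores the batch, and the variation passes straight into the ratio, which is the caveat stated in the paragraph above.

```python
import random
from statistics import pstdev


def simulate_series(true_test, true_ref, n_runs=8, ref_sensitivity=1.0, seed=1):
    """Return (raw test scores, test/reference ratios) for one hypothetical series.

    Each run uses a new soil batch whose effect is modelled as a common
    multiplicative factor. ref_sensitivity controls how strongly the reference
    machine responds to that factor (1.0 = same response as the test machine).
    """
    rng = random.Random(seed)
    raw_test, ratios = [], []
    for _ in range(n_runs):
        batch = rng.uniform(0.8, 1.2)  # batch-to-batch variation of the soil
        test = true_test * batch
        ref = true_ref * batch ** ref_sensitivity
        raw_test.append(test)
        ratios.append(test / ref)
    return raw_test, ratios


raw, same_response = simulate_series(3.0, 3.3, ref_sensitivity=1.0)
_, no_response = simulate_series(3.0, 3.3, ref_sensitivity=0.0)
print("spread of raw test scores:", round(pstdev(raw), 3))
print("spread of ratios, same response:", round(pstdev(same_response), 6))
print("spread of ratios, reference unaffected:", round(pstdev(no_response), 3))
```

Real scores are bounded and not purely multiplicative, so this only shows the principle, not the behaviour of any actual reference machine.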
Another conceivable solution would be to test one machine with different batches or different brands of foodstuffs. One could, for example, test a toaster with ten different types of bread and then average the results. However, this would be quite time-consuming and expensive, and the natural variations within the product, for example, throughout the year, would still be reflected in the measurement results. This approach is therefore not a very practical alternative to a reference system in an international standard. A reference system, on the other hand, turns the reference machine into a kind of gold standard. A disadvantage is that, because the results of all test machines are compared with those of a predefined reference machine, the operating principles of the test machines are also implicitly prescribed to some extent. This can limit the scope for innovation in the test machines.
Another cause of variation in a testing method is the use of manual assessment methods. Even if the assessment criteria are described very well, a manual assessment always depends on the assessor and is therefore subjective. This is another reason why comparing test results with those of a reference machine can be useful: when both machines are assessed by the same assessor, the differences between assessors can be removed by calculation (IEC60456 2010). Of course, this also applies only within a limited range of test results. Regarding the cleaning performance in the dishwasher standard IEC60436, for example, there would be no way of eliminating such influences by calculation if no soil were left. If the reference machine and the test machine(s) contained only perfectly cleaned load items after the test run, there would no longer be any way to differentiate between test machines. The same applies if too much soil is left: if all items receive the lowest possible score, it is again impossible to differentiate between the test machines. That is why IEC60436 established an intricate system covering the amount and type of soil, the way the dishes are prepared for the test run, the amount and composition of the detergent and the programme parameters of the reference machine. This ensures that the reference cleaning performance always lies within a range of the 0 to 5 scoring scale (3.3 ± 0.4) in which differentiation is possible.
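Why the soiling must be tuned to the middle of the scale can be shown with a hypothetical scoring function. The Python sketch below is not the scoring table of IEC60436. It simply assumes that a load scores 5 when the remaining soil is below a visibility threshold, 0 above a saturation level, and falls linearly in between. Two invented machines that remove 80 % and 90 % of the soil are then indistinguishable under very light and very heavy soiling, and only differ at an intermediate soil load.

```python
def remaining_soil(soil_load, removal_efficiency):
    """Soil left on the load after a run (arbitrary units)."""
    return soil_load * (1.0 - removal_efficiency)


def visual_score(soil_left, visible_from=0.5, saturated_at=5.5):
    """Hypothetical 0-5 score: 5 = no visible soil, 0 = heavily soiled."""
    if soil_left <= visible_from:
        return 5.0
    if soil_left >= saturated_at:
        return 0.0
    # Linear decrease between the visibility threshold and saturation.
    return 5.0 * (saturated_at - soil_left) / (saturated_at - visible_from)


for soil_load in (2.0, 20.0, 60.0):  # light, intermediate, heavy soiling
    a = visual_score(remaining_soil(soil_load, 0.80))  # hypothetical machine A
    b = visual_score(remaining_soil(soil_load, 0.90))  # hypothetical machine B
    print(f"soil load {soil_load:>4}: machine A {a:.2f}, machine B {b:.2f}")
```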
However, to achieve high levels of repeatability, the reference machine must give ongoing consistent performance from run to run. To achieve high levels of reproducibility, reference machines located in different laboratories need to all achieve very similar performance for the parameters against which test machine performance is normalised. And, in order to serve as a corporeal reference, a reference machine for performance measurement standards also needs to be similar to the test machine to create results that can actually be compared to each other (BIPM 2008).
Measurements in comparison to a corporeal reference
For a long time, all measurements were made in relation to a corporeal reference. From 1889 to 2018, every measurement of mass referred to the international prototype kilogramme, which is stored near Paris and was the primary reference for this SI unit. Several nations held copies of the prototype kilogramme for convenience; these were the secondary references. The Bureau International des Poids et Mesures verified the secondary references at irregular intervals. It was noticed that the prototype kilogramme itself also varied over time. Because of this variability, the intention was to replace this reference with an invariable one based only on physical constants. The redefinition, adopted in 2018, took effect in 2019, when the kilogramme was defined by fixing the numerical value of the Planck constant h (BIPM 2019).
Another example of a corporeal reference can be found in colorimetry. A colour measuring instrument usually needs to be calibrated before first use. For this purpose, manufacturers of colour measuring instruments provide a white disc, often made of barium sulphate (BaSO₄). This measurement of the white standard is often combined with a measurement of a black value, for example, by measuring without any light entering the measuring instrument. After this procedure, which is a simple form of calibration, the measuring instrument can be used for other colorimetric measurements (ISO/CIE 11664-3:2019).
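The white/black procedure amounts to a two-point linear calibration. A minimal Python sketch, assuming that the instrument's raw signal is linear in reflectance, that the black reading corresponds to zero reflectance and that the white standard has a known reflectance; the reflectance 0.99 and the raw counts are hypothetical values chosen for illustration.

```python
def make_reflectance_converter(black_reading, white_reading, white_reflectance):
    """Return a function that maps raw instrument readings to reflectance.

    Assumes a linear detector: the black (dark) reading corresponds to zero
    reflectance and the white reading to the known reflectance of the standard.
    """
    span = white_reading - black_reading
    if span <= 0:
        raise ValueError("white reading must exceed black reading")

    def to_reflectance(raw_reading):
        return white_reflectance * (raw_reading - black_reading) / span

    return to_reflectance


# Hypothetical raw counts from the calibration step.
to_reflectance = make_reflectance_converter(black_reading=120, white_reading=3920,
                                            white_reflectance=0.99)
print(round(to_reflectance(2020), 3))  # a sample measured after calibration
```

As with the prototype kilogramme, the result is only as good as the reference: if the white disc ages or gets dirty, every subsequent reading shifts with it.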
In both cases, the corporeal references act as a source of validation for the measurement performed on another object. Hence, reference systems establish a relation between certain objects: one object is known and can, therefore, act as a means to which another unknown object is connected. A form of calibration needs to take place at (regular) intervals to verify the measurement procedure.
However, there is no corporeal reference for the performance of electric household appliances, for example, the cleaning performance of a dishwasher, where natural soils are used to achieve consumer-relevant results. There are therefore no absolute values that can be used for comparison, and the reference object for such a measurement needs to look different from, for example, the prototype kilogramme. Furthermore, the assessment of dishwasher performance uses natural foodstuffs and determines how much of the burned-on food is removed from, or remains on, the load items. Even more difficult from a measurement point of view, this assessment is done by visual inspection by trained technicians. The inherent variability of the soiling agents and the subjective assessment method therefore make the measurement less repeatable and comparable. That is why some international standards for household appliances, such as the one for dishwashers (IEC60436), use a reference machine as a corporeal reference that is handled in the same way as the test machine. Variations in the measurement process, for example, due to the use of different batches of auxiliary materials, are eliminated by comparing the test results of the test machine with those of the reference machine, because both machines use the same detergent and soiling agents and are assessed by the same evaluators. To ensure repeatability, the reference machine must produce consistent results from run to run. To ensure reproducibility, all reference machines located in different laboratories must deliver the same (or almost the same) performance.
Several questions arise in this context, such as the following: Does the use of a reference machine as a corporeal reference work properly? What are the requirements for the successful use of a reference machine within a reference system? What qualifications does a reference system in an international standard need to fulfil? And would it be more helpful or more disadvantageous to use a reference machine in more energy and performance measurement standards for electric household appliances?
This paper examines these questions by looking at different international standards for electric household appliances. The standards currently using a reference machine within their reference system are examined closely in the following sections. Data collected by the responsible standardisation committees are evaluated to investigate the similarity of current test machines and their corresponding reference machines. After a short overview of all relevant international standards for electric household appliances, a generalised list of recommendations for the implementation of a reference machine is derived.

Methodology
This paper evaluates 19 international standards for electrical household appliances to answer the questions above regarding the usefulness of reference machines in international standards and recommendations for their use. Three of them already use a reference machine and are therefore examined closely in the following sections, which describe the kind of reference system used in each standard and how the reference machine functions in each case. Three experts interviewed for this paper provided additional information: for each reference machine, the corresponding expert from the responsible standardisation committee was questioned. Anna Wendker from Miele & Cie. KG was interviewed regarding the dishwasher reference machine, John Johansson from Electrolux Professional AB provided information about the reference machine for the washing machine standard, and Albrecht Liskowsky from SLG Prüf- und Zertifizierungs GmbH was interviewed regarding the vacuum cleaner reference machine. Furthermore, data from so-called round robin tests ("RRTs") for the individual standards are evaluated to determine how well the test machines are currently reflected by their respective reference machines, which indicates how similar current test machines are to their corresponding reference machines. In this evaluation, p values < 0.05 are considered statistically significant.
Regarding those international standards that have not yet used reference machines, a collection of evidence is taking place to decide whether the implementation of one could be of benefit. Therefore, three guiding questions are used: Does the standard use natural products, such as foodstuffs or natural fibres? Are there any kinds of manual assessment methods, for example, for the preparation of the measurement or the assessment of the results? Is there currently an energy label for the electric household appliance examined in the standard? The answers to these questions will give hints regarding whether an implementation of a reference machine would be of benefit.
Finally, a generalised list of recommendations for the implementation of a reference machine in an international standard is derived from what has been learned in this paper. This is intended to help the responsible standardisation groups decide whether implementing a reference machine would benefit their standard. It may also help users of the international standards in general, such as those designing environmental policies, to understand the limits of standardised measurement procedures and to see what improvement potential still exists and at what cost.

Reference machines in international standards
Currently, five international standards put parts of their test results into perspective by comparing them with a reference machine. However, only the international standards for electric dishwashers, washing machines and vacuum cleaners define a testing method with a reference machine. The other two, the international standards for washer-dryers and cleaning robots, simply refer to the testing methods of the documents mentioned above and do not define their own reference machine. Therefore, only the first three international standards, IEC60436 (for dishwashers), IEC60456 (for washing machines) and IEC62885-2 (for vacuum cleaners), are examined in this paper regarding the specifications and uses of their reference machines.
The three international standards are presented in the following three subsections. Each subsection follows the same general structure:
- Description of the reference system
- Description of the reference machine in more detail, with
  - comments from the responsible standardisation experts
  - requirements for the calibration procedures
- Data that do or do not show statistically significant correlations between the reference machine examined and current test machines
These results are discussed later in the "Discussion of learnings and remaining questions" section.
An assessment of the possible benefits of a reference machine for other electric household appliance standards and a generalised list of recommendations for the implementation of a reference machine into an international standard is presented later on.

Dishwasher standard IEC60436
The international standard IEC60436 uses a reference system, consisting of a reference machine with a reference programme, a reference detergent and a reference rinse aid (IEC60436 2015). In the EN standard, this is adapted and used for energy label purposes (EN60436 2020).
The reference machine according to the standard is the model "G 1222 SC Reference". It is a household dishwasher that is run in parallel with the test dishwasher. The "G 1222 SC Reference" series is manufactured by Miele & Cie. KG solely for use in the standardised testing method. In addition to the information in the international standard, Anna Wendker from Miele & Cie. KG provided information about the manufacturing of the reference machine in a personal conversation. According to her, after manufacturing is completed, the new reference machines undergo a so-called specification check in the Miele & Cie. KG laboratory. The devices are loaded with a clean load, and the test is conducted without detergent. The parameters listed in the international standard IEC60436 are tested. A primary reference machine, operated in parallel with the newly manufactured reference machines (which can be seen as secondary references), serves as a validating comparison. Furthermore, the test results are compared with the values and ranges that IEC60436 prescribes. According to Miele & Cie. KG, all parameters tested must meet the prescriptions of the standard, and the performance of the secondary references must not differ substantially from that of the primary reference: the ratios, as calculated in IEC60436, should be close to 1.0, which would indicate equality. Unfortunately, the manufacturer could not disclose the exact tolerances. Newly manufactured machines that do not meet the prescriptions cannot be released. The only adjustments Miele & Cie. KG makes are the replacement of individual machine components, for instance, the rinse aid dosing device or the temperature sensor. Other parameters, for example, in the programming of the machine, are not altered, to ensure the integrity of the manufacturing process.
Machines that still do not meet the required values of IEC60436, even after these changes, must not be sold (IEC60436 2015). The production process on the one hand and the extensive tests that each reference machine has to pass before it is sold on the other are the main reasons for the considerable cost. A new "G 1222 SC Reference" currently costs about 15,000 to 20,000 euros.
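The release decision described above can be sketched as two checks per parameter: the measured value must lie within the range the standard prescribes, and the ratio to the primary reference run in parallel must be close to 1.0. Because the manufacturer did not disclose its tolerances, the tolerance, the parameter names and all values in this Python sketch are hypothetical placeholders, not figures from IEC60436 or Miele & Cie. KG.

```python
def release_check(new_machine, primary, allowed_ranges, ratio_tolerance):
    """Return a list of failed checks for a newly built reference machine.

    new_machine, primary: results per parameter from runs made in parallel.
    allowed_ranges: (minimum, maximum) per parameter, standing in for the
    ranges prescribed by the standard.
    ratio_tolerance: maximum allowed deviation of new/primary from 1.0.
    An empty list means the machine could be released.
    """
    failures = []
    for name, value in new_machine.items():
        low, high = allowed_ranges[name]
        if not low <= value <= high:
            failures.append(f"{name}: {value} outside [{low}, {high}]")
        ratio = value / primary[name]
        if abs(ratio - 1.0) > ratio_tolerance:
            failures.append(f"{name}: ratio to primary reference is {ratio:.3f}")
    return failures


# Hypothetical parameters, ranges and tolerance.
ranges = {"water_litres": (11.5, 12.5), "energy_kwh": (1.00, 1.10)}
primary = {"water_litres": 12.0, "energy_kwh": 1.04}
print(release_check({"water_litres": 12.1, "energy_kwh": 1.05}, primary, ranges, 0.05))
print(release_check({"water_litres": 13.0, "energy_kwh": 1.05}, primary, ranges, 0.05))
```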
Although the reference machines are tested in the Miele & Cie. KG laboratory, they should always be checked after delivery to the test laboratories before starting a test series. Additionally, a routine check of the reference machine should be done at least every 6 months. If the measurement results no longer match the prescriptions of the IEC60436, the manufacturer needs to be contacted and informed. The experts from Miele & Cie. KG then need to decide what interventions are possible and necessary for each individual problem (IEC60436 2015).
The reference programmes are also designed especially for the testing of electrical household dishwashers regarding the cleaning and drying performance. There are two reference programmes available on the "G 1222 SC Reference": one for the European application of the standard, called "IEC/EN", and one for the application in Australia and New Zealand, called "AS/NZ". The drying performance is assessed with a scale from 0 to 2 (with 0 being the worst and 2 being the best performance) for all applications. Regarding the assessment of the cleaning performance, the international standard foresees a scale from 0 to 5 (with 0 being the worst and 5 being the best performance). The reference system is designed to reach a target value of 3.3 ± 0.4 (when using the oven drying method) for one test series consisting of five to eight individual runs. This allows differentiation between the performance of the reference and the test machine and between different test machines (IEC60436 2015).
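The acceptance condition for the reference machine's cleaning performance can be expressed as a short check. This Python sketch assumes that the 3.3 ± 0.4 target applies to the mean cleaning score over the runs of one series; the standard's exact calculation (including how individual load items are scored) is not reproduced here, and the run scores are invented.

```python
from statistics import mean

CLEANING_TARGET = 3.3     # target cleaning score of the reference machine
CLEANING_TOLERANCE = 0.4  # permitted deviation from the target


def reference_series_within_target(run_scores):
    """Check a reference machine's cleaning scores for one test series.

    run_scores: one cleaning score (0-5 scale) per run; a series has 5 to 8 runs.
    Assumes the target applies to the series mean.
    """
    if not 5 <= len(run_scores) <= 8:
        raise ValueError("a test series consists of five to eight runs")
    if any(not 0.0 <= s <= 5.0 for s in run_scores):
        raise ValueError("cleaning scores lie on a 0-5 scale")
    return abs(mean(run_scores) - CLEANING_TARGET) <= CLEANING_TOLERANCE


print(reference_series_within_target([3.1, 3.4, 3.2, 3.5, 3.3]))  # hypothetical runs
print(reference_series_within_target([4.0, 4.1, 3.9, 4.2, 4.0]))  # e.g. soiling too light
```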
When a test series is conducted, the assessment of the cleaning and drying performance is carried out by trained technicians. When the assessment of a test series is finished, the test results of the reference machine are compared to those of the test machine. This comparison will generally eliminate variations due to the characteristics of the foodstuffs used for soiling the test load items, the detergent composition, the ambient conditions and the handling, as well as the visual assessment by the personnel (IEC60436 2015).
The actual measured values are used for the cleaning performance. This is currently changing for the drying performance: the upcoming amendment of IEC60436 foresees a fixed value for the drying performance of the reference machine. Hence, the adjusting calculation for the drying performance uses the value measured for the test machine and a predefined value for the reference machine. This change is necessary because the last examination of the international performance standard found diverging behaviour of the reference machine and current test machines regarding drying performance. Apparently, the reference machine was not able to normalise the variation occurring in the drying process and its assessment. In 2014, an RRT was conducted by APPLiA, the European association of the household appliance industry (formerly CECED: Conseil Européen de la Construction d'appareils Domestiques), the standardisation working group CLC TC59X WG2 and the University of Bonn. Two different dishwasher models were tested according to the European standard by 16 and 17 participating laboratories, respectively. The aim was to assess the clarity of the standard by showing whether there were significant differences between the results of the participating laboratories. The linear correlations between the results of the test and reference machines for drying and cleaning performance are shown in Figs. 1 and 2.
A Spearman's correlation analysis was run (see Table 1) to assess the relationship between machines A and B of the RRT and their corresponding reference machines, because the data were mostly not normally distributed. The RRT found statistically significant correlations between machines A and B and their corresponding references for the cleaning performance. For machine A, the significant positive correlation with its reference was slightly stronger (r_s = .4962, p < .0001) than for machine B (r_s = .4082, p = .0002). By contrast, no significant correlation was found for the drying performance. There was a weak positive correlation between machine A and its reference, but it was not statistically significant (r_s = .1171, p = .2585). The same holds for machine B (r_s = .2062, p = .0666).
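For readers who want to reproduce this kind of analysis, the Python sketch below computes Spearman's rank correlation as the Pearson correlation of average ranks, which handles the many tied scores that visual assessment produces. The paper does not state how its p values were obtained; this sketch uses a two-sided permutation test instead of the usual t-distribution approximation, so its p values will differ slightly from tabulated ones. The paired scores are hypothetical, not RRT data.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    return sxy / (sxx * syy) ** 0.5


def spearman_test(x, y, n_perm=10_000, seed=0):
    """Return (r_s, two-sided permutation p value) for paired samples x, y."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    rx, ry = average_ranks(x), average_ranks(y)
    observed = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = list(ry)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing to simulate "no correlation"
        if abs(pearson(rx, shuffled)) >= abs(observed) - 1e-12:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical paired cleaning scores (test machine, reference machine) per run.
test_scores = [3.1, 3.4, 2.9, 3.6, 3.3, 3.0, 3.5, 3.2]
ref_scores = [3.2, 3.5, 3.0, 3.4, 3.3, 3.1, 3.6, 3.3]
r_s, p = spearman_test(test_scores, ref_scores)
print(f"r_s = {r_s:.4f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```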
The 2014 dishwasher RRT could only show a correlation of the test results for the cleaning performance. The drying results of the two test machines and their corresponding reference machines did not show a statistically significant correlation. Consequently, using the reference machine for the drying assessment would increase the uncertainty of the measured result instead of reducing it. The responsible standardisation group therefore decided to use the IEC60436 target value for the drying performance of the reference machine as a constant value when calculating the drying ratio (in Amendment 1 to IEC60436 2015).
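The effect of the amendment on the calculation can be sketched as follows. This Python function forms the drying ratio from the mean test score and either the reference machine's measured mean (the original approach) or a fixed reference value (the amended approach). Using series means is a simplification of the standard's calculation, and the fixed value has to be taken from the standard; the 1.5 used below is only a placeholder, as are all scores.

```python
from statistics import mean


def drying_ratio(test_scores, reference_scores=None, reference_value=None):
    """Drying ratio of a test machine relative to the reference.

    Pass either reference_scores (measured on the reference machine in the
    same series) or reference_value (a fixed value for the reference).
    Scores are on the 0-2 drying scale.
    """
    if (reference_scores is None) == (reference_value is None):
        raise ValueError("give either measured reference scores or a fixed value")
    if reference_scores is not None:
        denominator = mean(reference_scores)
    else:
        denominator = reference_value
    return mean(test_scores) / denominator


test_runs = [1.5, 1.6, 1.4, 1.5, 1.5]       # hypothetical test machine scores
reference_runs = [1.2, 1.4, 1.3, 1.5, 1.1]  # hypothetical reference scores
print(drying_ratio(test_runs, reference_scores=reference_runs))  # measured reference
print(drying_ratio(test_runs, reference_value=1.5))              # placeholder fixed value
```

With a fixed reference value, run-to-run noise in the reference machine's drying results no longer enters the ratio, which is the intent when the reference does not track the test machines.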

Washing machine standard IEC60456
The international standard IEC60456 also prescribes a reference system that is used in the EN standard for energy label purposes. It consists of a reference machine with a reference detergent and a variety of reference programmes that match different test programmes of test machines (IEC60456 2010). The reference machine according to IEC60456 is Electrolux Professional AB's model "Wascator FOM 71 CLS". It is a professional washing machine and is to be run in the same way as, and in parallel with, the test washing machine. Comparing the professional washing machine with the household washing machine under test gives a measure of relative performance and plausible results (IEC60456 2010). According to the expert John Johansson, Electrolux Professional AB does not test newly produced reference washing machines for performance before sale. Thus, in contrast to the dishwasher and vacuum cleaner standards, no prototype reference machine is used in this case. However, the parameters that are decisive for performance (for example, power consumption, heating capacity and water level) are measured during production. In this way, performance is checked indirectly by ensuring that all influencing parameters are set correctly. For these tests, the same tolerances apply as those laid down in IEC60456. A reference washing machine currently costs around 18,000 euros.
Energy Efficiency (2021) 14: 28 Page 7 of 18
Before a test series is started, the international standard requires the reference programme "Cotton 60°C" or "Cotton 40°C" to be run without load. The results are then compared with the manufacturer's programming guide for the reference machine. If the measured values differ from the specifications in the programming guide, the reference machine needs to be calibrated. The same procedure takes place after every test run of a test series. The reference machine is generally calibrated at least once a year according to certified procedures or the manufacturer's calibration instructions. An additional check according to the manufacturer's programming guide for reference machines should take place 6 months after every calibration (IEC60456 2010).
Several reference materials are used for the testing of the washing machine itself. The loading of the machine requires certain pieces of laundry that the international standard specifies. Furthermore, specially stained test strips are added to the test load. The assessment of the cleaning performance is not conducted manually but with a photometer. One test series consists of five individual runs. The reflectance values measured with the photometer are summed up (y-value sum) and then compared to those of the reference machine. This comparison eliminates variations due to the characteristics of the reference laundry items, the stained test strips, the detergent composition, the ambient conditions and the handling by the personnel (IEC60456 2010).
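A simplified Python sketch of this comparison, assuming that the washing performance is expressed as the ratio of the test machine's y-value sum to the reference machine's y-value sum over the same series. The detailed calculation in IEC60456 is not reproduced here, and the reflectance values are invented.

```python
def y_value_sum(series):
    """Sum of photometer reflectance (Y) readings for one machine over a series.

    series: list of runs, each a list of Y values for the stained strips in that run.
    """
    return sum(sum(run) for run in series)


def washing_performance_ratio(test_series, reference_series):
    """Ratio of the test machine's y-value sum to the reference machine's."""
    if len(test_series) != len(reference_series):
        raise ValueError("test and reference series must have the same number of runs")
    return y_value_sum(test_series) / y_value_sum(reference_series)


# Hypothetical Y values: two runs, two stained strips per run.
test_series = [[63.0, 52.0], [65.0, 51.0]]
reference_series = [[55.0, 50.0], [57.0, 58.0]]
print(washing_performance_ratio(test_series, reference_series))
```

Because both machines are washed with the same strips and detergent batch and measured on the same photometer, a shift common to both sums largely cancels in the ratio.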
For the washing machine standard, the responsible standardisation working group conducted an RRT in 2015. One test machine was sent to six laboratories, which conducted performance tests according to IEC60456 in comparison with their own reference washing machine. The RRT measurements were performed with a full load and either two or three rinses and, additionally, with a half load and two rinses. Previous RRTs for washing machines had shown that the variability of the reference detergent and the test strips has the greatest influence on the variability of the measurement results. These two factors were therefore deliberately specified for the RRT WM 2015, which limited the variability of the measurement results. Consequently, no significant correlation was to be expected in the following correlation calculations. Unfortunately, no other RRT data were available for this paper.
This paper examines the correlation of the cleaning performance, which is derived from the y-value sum, and of the alkalinity. For the calculation of the different correlations, the test data were only compared with the reference data if the two machines were run on the same day. This is important because, among other things, the reference machine is intended to normalise the environmental conditions in the testing laboratory. If the test and the reference machine are run on different days, this cannot be achieved satisfactorily, because different conditions may apply to the runs. Consequently, only the results of four laboratories could be considered, some of them only partially.
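The same-day rule can be sketched as a simple join on laboratory and test day. The Python below assumes a hypothetical record layout of (laboratory, day, value). It also assumes that runs on the same (lab, day) key are matched in recording order. Any run without a same-day partner is discarded, which mirrors why only part of the RRT data could be used.

```python
from collections import defaultdict
from datetime import date

# (laboratory, test day, measured value); a hypothetical record layout
Run = tuple[str, date, float]


def pair_same_day(test_runs: list[Run], reference_runs: list[Run]) -> list[tuple[float, float]]:
    """Pair test and reference results only if both were obtained in the same lab on the same day.

    Runs sharing a (lab, day) key are matched in recording order; any run
    without a same-day partner is dropped from the correlation data.
    """
    reference_by_key: dict[tuple[str, date], list[float]] = defaultdict(list)
    for lab, day, value in reference_runs:
        reference_by_key[(lab, day)].append(value)

    used: dict[tuple[str, date], int] = defaultdict(int)
    pairs: list[tuple[float, float]] = []
    for lab, day, value in test_runs:
        key = (lab, day)
        candidates = reference_by_key.get(key, [])
        if used[key] < len(candidates):
            pairs.append((value, candidates[used[key]]))
            used[key] += 1
    return pairs


# Hypothetical records: Lab 2 ran its reference machine on a different day
test = [("Lab 1", date(2015, 5, 4), 1.0), ("Lab 1", date(2015, 5, 4), 2.0),
        ("Lab 2", date(2015, 5, 5), 3.0)]
reference = [("Lab 1", date(2015, 5, 4), 10.0), ("Lab 2", date(2015, 5, 6), 30.0)]
print(pair_same_day(test, reference))  # [(1.0, 10.0)]
```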
Because the data were mostly not normally distributed, a Spearman's correlation analysis was run (see Table 2) to assess the relationship between the test machine runs of the RRT and their corresponding reference machine runs. For the cleaning performance, the analysis found a statistically significant positive correlation between the test machine and its corresponding reference, but only for the runs with a half load and two rinses (r_s = .4162, p = .0385). For a full load with two rinses, the correlation is positive but not statistically significant (r_s = .3820, p = .0965). For a full load with three rinses, it is slightly negative and not statistically significant (r_s = −.0964, p = .07325).
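For readers unfamiliar with the method, Spearman's coefficient is the Pearson correlation of the ranks, with tied values given the mean of the ranks they occupy. The dependency-free Python sketch below computes it. In practice a statistics package (for example `scipy.stats.spearmanr`) would also supply the p-value. The t-statistic helper shows the common large-sample approximation and is only valid for |r_s| < 1.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of tied values starting at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation coefficient: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def t_statistic(rho: float, n: int) -> float:
    """Approximate test statistic for |rho| < 1; compare with a t distribution on n - 2 df."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


# Hypothetical paired test/reference results
print(round(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 4))  # 0.8
```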
For the alkalinity, significant correlations were found for the runs with a full load: a positive correlation of r_s = .6361 (p = .0026) for the results with two rinses and a positive correlation of r_s = .6500 (p = .0087) for the results with three rinses. In contrast, Spearman's test did not show a significant correlation for the runs with a half load and two rinses (r_s = .2816, p = .1727).

Vacuum cleaner standard IEC62885-2
The international standard IEC62885-2 also uses a reference system, which consists of different reference materials and a reference machine. According to the standard, the performance of vacuum cleaners is tested on different floor types, such as wooden floor and carpet, and with different testing […]. The reference machine according to IEC62885-2 is a special device manufactured by SLG Prüf- und Zertifizierungs GmbH solely for use in standardisation. The reference vacuum cleaner system (so-called RSB) is only used for testing dust removal from a specific carpet (the test carpet from the manufacturer Wilsons Carpets) and is run in the same way as the vacuum cleaner under test. Comparing the RSB results with those of the vacuum cleaner under test eliminates variations due to the characteristics of the carpet used (e.g. batch and condition), the ambient conditions and the handling by the personnel. One test series consists of three or five test runs. The calculation formula given in the international standard puts the measurement results that the testing laboratory obtains with its own carpet and RSB (secondary reference) into perspective by applying an adjustment factor. This factor relates to measurement data obtained by SLG Prüf- und Zertifizierungs GmbH with the prototype reference carpet piece and the prototype RSB, "RSB00" (primary reference) (IEC62885-2 2016). Each newly produced RSB is tested by SLG Prüf- und Zertifizierungs GmbH to determine its adjustment factor. According to the expert Albrecht Liskowsky, SLG Prüf- und Zertifizierungs GmbH conducts several tests for this purpose. Most importantly, it determines the dust absorption of the RSB with a passive and an active nozzle on the prototype reference carpet piece. The tolerance before correction for a newly manufactured RSB is 3.0% (absolute dust removal percentage).
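The adjustment described above can be pictured as a two-point correction: a straight line through the zero point and one reference point. The Python sketch below rests on that picture. It assumes, and this is not reproduced from IEC62885-2, that the adjustment factor is the ratio of the primary reference result (RSB00 on the prototype carpet) to the laboratory's own RSB result. It also reads the 3.0% tolerance as an absolute difference in percentage points between a new RSB and the prototype. All numbers are hypothetical.

```python
def adjustment_factor(primary_reference: float, laboratory_reference: float) -> float:
    """Factor mapping the laboratory RSB result onto the primary reference (assumed ratio form)."""
    if laboratory_reference <= 0:
        raise ValueError("laboratory reference dust pickup must be positive")
    return primary_reference / laboratory_reference


def corrected_dust_pickup(test_value: float, primary_reference: float,
                          laboratory_reference: float) -> float:
    """Scale the test result along the line through (0, 0) and (laboratory, primary) reference values."""
    return test_value * adjustment_factor(primary_reference, laboratory_reference)


def new_rsb_accepted(new_rsb: float, prototype: float, tolerance_pp: float = 3.0) -> bool:
    """Assumed acceptance check: absolute difference in dust removal (percentage points) within tolerance."""
    return abs(new_rsb - prototype) <= tolerance_pp


# Hypothetical dust pickup values in percent
print(round(corrected_dust_pickup(90.0, primary_reference=72.0, laboratory_reference=80.0), 2))  # 81.0
print(new_rsb_accepted(61.5, 60.0), new_rsb_accepted(63.5, 60.0))  # True False
```

Because the line passes through the origin, a single factor rescales every test result by the same proportion, whether the vacuum cleaner under test performs well or poorly.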
The production process on the one hand, and the extensive tests that each reference device has to pass before it is sold on the other, are the main reasons for the considerable costs. A new RSB currently costs 24,600 euros. According to the standard, the RSB needs to be sent in for recalibration after 2000 cleaning cycles or at least every 3 years (IEC62885-2 2016). In 2018, an RRT was conducted (RRT VC 2018). According to Albrecht Liskowsky, the goal of this RRT was to investigate how the current correction of the testing results against the calibrated laboratory RSB could be improved. It had been noticed beforehand that the current correction formula seems to discriminate against very high-performing vacuum cleaners by adjusting the value measured in reference to the RSB to a lower value. The RRT therefore examined the results of using two reference nozzles (a high-performing and a low-performing one) for the RSB instead of only one. The extra nozzle changes the adjustment of the test machine's dust pickup value from a two-point correction (one reference point and the zero point) into a three-point correction (two reference points and the zero point), which makes the process fairer. To this end, the RRT examined four different test machines in comparison with the respective RSBs of the participating laboratories. Two higher-performing and two lower-performing vacuum cleaners, which differed in their nozzle, were circulated. Thus, in each case, ten laboratories investigated the dust pickup performance of one higher- and one lower-performing test machine with either an active or a passive nozzle. The linear correlations calculated with the corresponding reference machine using either a high- or a low-performing nozzle are shown in Figs. 5, 6, 7, and 8.
Fig. 5 a, b Correlations of the cleaning performance of a high-performing test machine with an active (left) or a passive nozzle (right) and their corresponding reference machine with a low-performing nozzle from the vacuum cleaner round robin test (RRT VC 2018)
Fig. 6 a, b Correlations of the cleaning performance of a low-performing test machine with an active (left) or a passive nozzle (right) and their corresponding reference machine with a low-performing nozzle from the vacuum cleaner round robin test (RRT VC 2018)
Because the data were mostly not normally distributed, a Spearman's correlation analysis was run (see Table 3) to assess the relationship between the test machine runs of the RRT and their corresponding reference machine runs. Compared with the runs using a high-performing reference nozzle, all four test machines show statistically significant positive correlations. The high-performing vacuum cleaner shows a statistically significant positive correlation with an active (r_s = .4975, p < .0001) and a passive nozzle (r_s = .7137, p < .0001), and the low-performing vacuum cleaner shows a statistically significant positive correlation with an active (r_s = .6477, p < .0001) and a passive nozzle (r_s = .7904, p < .0001).
The same applies to the comparison of the RRT machine results with the low-performing reference nozzle runs. The high-performing vacuum cleaner shows a statistically significant positive correlation with an active (r_s = .4866, p < .0001) and a passive nozzle (r_s = .3869, p < .0001). The same holds for the low-performing vacuum cleaner, which shows a statistically significant positive correlation with an active (r_s = .6915, p < .0001) and a passive nozzle (r_s = .4232, p < .0001).
Regarding the change of the correction formula from a 2- to a 3-point correction, the test report of the RRT VC 2018 states that the new correction method is an improvement. This conclusion was derived from the values of the expanded uncertainty, which are relevant for assessing the label class intervals and verification tolerances of an energy label. According to the responsible standardisation experts, energy label class intervals should be larger than the expanded uncertainty to avoid ambiguous declarations, whereas the verification tolerances of declared values should be of the same order of magnitude as the expanded uncertainty. However, this is not currently the case for all energy labels. In Europe, for example, the range related to the expanded uncertainty for both the high- and low-performing vacuum cleaners exceeds the European label class intervals and verification tolerances. It would therefore be beneficial to reduce this range in order to align the European regulation with the view of the standardisation experts. The RRT VC 2018 showed that the 3-point correction can achieve this in comparison with the current 2-point correction. Consequently, the standardisation experts recommended implementing the new correction method in a revised version of the European regulation.
Fig. 7 a, b Correlations of the cleaning performance of a high-performing test machine with an active (left) or a passive nozzle (right) and their corresponding reference machine with a high-performing nozzle from the vacuum cleaner round robin test (RRT VC 2018)
Fig. 8 a, b Correlations of the cleaning performance of a low-performing test machine with an active (left) or a passive nozzle (right) and their corresponding reference machine with a high-performing nozzle from the vacuum cleaner round robin test (RRT VC 2018)
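The difference between a 2-point and a 3-point correction can be illustrated with piecewise-linear interpolation through the reference points. The Python sketch below is an assumption-laden illustration, not the formula of IEC62885-2 or of the RRT VC 2018 report. Each point maps a laboratory RSB result to the corresponding primary reference value. Results beyond the outermost reference point are extrapolated along the last segment. For illustration only, the 2-point version uses the high-performing nozzle. All values are hypothetical.

```python
def piecewise_linear(x: float, points: list[tuple[float, float]]) -> float:
    """Interpolate through (laboratory, primary) reference points; extrapolate with the outermost segment.

    The points must have distinct laboratory values.
    """
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two points")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x <= x1:
            break  # x lies in (or below) this segment; otherwise the last segment is used
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical dust pickup values (percent): laboratory RSB result -> primary reference value
low_nozzle = (40.0, 42.0)
high_nozzle = (80.0, 78.0)

two_point = [(0.0, 0.0), high_nozzle]                # zero point + one reference point
three_point = [(0.0, 0.0), low_nozzle, high_nozzle]  # zero point + two reference points

for test_value in (20.0, 60.0, 90.0):
    print(test_value, piecewise_linear(test_value, two_point),
          piecewise_linear(test_value, three_point))
```

With a single reference point, every result is scaled by one factor. The second reference point lets the correction differ between low- and high-performing ranges, which is the degree of freedom the 3-point method adds.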

Overview and assessment of international standards without a reference machine
Table 4 lists 16 international standards for different electric household appliances. Apart from indicating whether a standard already uses a reference machine, the table assesses three key questions. These can help standardisation groups whose standards do not currently specify a reference machine to decide whether one could be useful. Firstly, it is important to determine whether the standard uses natural products for testing, for example, foodstuffs or natural textiles such as cotton or wool, which are subject to natural variation. The second question asks whether there is a manual component in the measurement standard. This may occur during the preparation of the measurement, such as the soiling of dishes by hand in IEC60436, or in a visual assessment of performance results, which is prone to subjectivity. The third question asks whether an energy label already exists or is planned for the standardised product; for this column, Europe is taken as an example. The existence of an energy label is important for understanding the requirements of a measurement: an energy label demands repeatable and reproducible test results, whereas testing for a test magazine, for example, only needs repeatable results.
Thus, the three criteria considered for this assessment are as follows:
- Use of natural products
- Manual assessment
- Existence of an energy label (with Europe as an example)
Almost all of the 16 international standards for electric household appliances use some kind of natural auxiliary material, such as foodstuffs or natural textiles; the notable exceptions are the standards for electric kettles and household refrigerating appliances. In addition, half of the international standards examined include a manual assessment of the appliance's performance. However, so far an energy label is required for the products of only six of these standards. This is important information for deciding whether a reference machine is needed. If a testing laboratory only performs comparative testing of electric household appliances, a reference machine is not necessarily required. But as soon as the laboratory needs to produce repeatable and reproducible data, for example, for an energy label, a reference machine might help to level out both person-to-person and lab-to-lab differences. These differences naturally become greater when the measurement procedure uses natural products and/or manual assessment methods. Conversely, when an international standard only uses standardisable auxiliary materials and no manual assessment, as is the case for the standards for electric kettles (IEC60530) and refrigerators (IEC62552-2), a reference machine is not necessarily required. The combination of all three criteria suggests that a reference machine might benefit the measurement procedure of the respective standard. However, as Spiliotopoulos et al. (2018) and Lekov et al. (2014) pointed out, an international standard also needs to be economically justified.
Thus, the prescribed measurements must impose only reasonable costs, and the environmental benefits resulting from an international standard must exceed its burdens. A detailed cost-benefit analysis is therefore required before a reference machine is implemented in an international standard for electric household appliances. The responsible standardisation group can only make a decision after considering all associated costs and comparing them with the overall benefit of repeatable and reproducible testing results.
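The screening logic of this section can be written down as a small decision aid. The Python sketch below is a simplified reading of the text, not a procedure prescribed by any standard. It treats the combination of all three criteria as a hint, and it adds a cost-benefit condition that must also hold. The profile names are hypothetical. Expressing benefits and costs as single comparable numbers is itself a strong simplification.

```python
from dataclasses import dataclass


@dataclass
class StandardProfile:
    """Answers to the screening questions for one measurement standard (illustrative)."""
    name: str
    natural_products: bool  # e.g. foodstuffs, cotton or wool that vary naturally
    manual_steps: bool      # manual preparation (e.g. soiling by hand) or visual assessment
    energy_label: bool      # energy label exists or is planned (Europe as example)


def reference_machine_indicated(profile: StandardProfile) -> bool:
    """All three criteria together hint that a reference machine may be worthwhile."""
    return profile.natural_products and profile.manual_steps and profile.energy_label


def recommend_reference_machine(profile: StandardProfile, expected_benefit: float,
                                expected_cost: float) -> bool:
    """Screening hint plus a cost-benefit condition: benefits must exceed costs."""
    return reference_machine_indicated(profile) and expected_benefit > expected_cost


candidate = StandardProfile("hypothetical standard A", natural_products=True,
                            manual_steps=True, energy_label=True)
print(recommend_reference_machine(candidate, expected_benefit=120.0, expected_cost=100.0))  # True
```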

Discussion of learnings and remaining questions
The following paragraphs summarise the learnings of this paper up to this point. After this summary, some remaining questions are discussed.
The standards for dishwashers, washing machines and vacuum cleaners use similar methods for several aspects. They all use a reference machine that is run in the same way as the test machine. The goal is to eliminate variations, for example, due to auxiliary materials and human performance.
The reference machines required in the standards all incur costs that need to be considered. However, compared with the total costs of the respective tests, the reference machine is only one expense item. Other cost items include personnel costs, costs for electricity and, in the case of washing machines and dishwashers, water, further test materials (such as test laundry items, test dishwasher load items or test floors), auxiliary materials (such as soil strips, soiling food agents or test dust), measuring instruments and their supply, as well as chemicals (e.g. cleaner and rinse aid or for test water treatment). Overall, the costs of a reference machine must be offset by the improved quality of the measurement results. After all, a manufacturer invests a lot of time and money in the development of a new machine anyway. The bottom line is that the costs can be justified if the results are accurate, reproducible and on target.
An RRT was conducted for each of the three standards to compare the testing results of different laboratories that tested the same appliance against their own reference machine. The RRT VC 2018 showed that the values of the RSB and the test machine have a consistently high, statistically significant correlation. This seems to be independent of which nozzle was used for the RSB and for the test machine, and of whether the vacuum cleaners were high- or low-performing. In contrast, the correlation results were less consistent for the RRT DW 2014 and the RRT WM 2015. The results for the washing machine standard IEC60456 did not show significant correlations between the test and the reference machine for every parameter examined. However, this outcome was expected, because the variability of the measurement results had been deliberately limited. For the RRT DW 2014, on the other hand, a significant correlation was expected. Yet for the dishwasher standard IEC60436, the correlation was statistically significant only for the cleaning performance; the drying performance of the test machines and their corresponding reference did not correlate. In conclusion, an RRT appears to be a suitable tool for assessing whether a test and a reference machine behave in a correlated way.
The RRT results can also show whether a reference machine is useful for eliminating variations within and between laboratories. When the test and the reference machine react in the same way to different testing environments, the mathematical comparison of their testing results is suitable for eliminating the influence of environmental variations.
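Why "reacting in the same way" matters can be shown with a toy model. The Python sketch below assumes, purely for illustration, that each laboratory multiplies every result by a shared environmental factor. The factor could stand for a soil batch, the climate or the operator. If the reference machine responds to that factor exactly as the test machine does, the factor cancels in the test/reference ratio. If the reference does not respond, the lab-to-lab spread survives normalisation. All numbers are hypothetical.

```python
def normalised_results(true_test: float, true_reference: float, lab_factors: list[float],
                       reference_sensitivity: float = 1.0) -> list[float]:
    """Test/reference ratios when every lab scales results by a shared environmental factor.

    reference_sensitivity = 1.0: the reference reacts exactly like the test machine;
    reference_sensitivity = 0.0: the reference does not react to the environment at all.
    """
    return [
        (true_test * f) / (true_reference * f ** reference_sensitivity)
        for f in lab_factors
    ]


def spread(values: list[float]) -> float:
    """Range of the values, used here as a simple lab-to-lab variation measure."""
    return max(values) - min(values)


labs = [0.90, 1.00, 1.15]  # hypothetical lab-specific factors
raw = [0.80 * f for f in labs]
print(round(spread(raw), 3))                                        # 0.2
print(round(spread(normalised_results(0.80, 0.70, labs)), 3))       # 0.0
print(round(spread(normalised_results(0.80, 0.70, labs, 0.0)), 3))  # 0.286
```

Real influences are rarely purely multiplicative. A low correlation between test and reference results is one symptom that the reference does not track the test machine, as happened with the dishwasher drying performance.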
Another learning from the examination of the three international standards concerns the prescriptions for check-ups and recalibrations. The prescriptions of IEC60456 are very clear: the reference washing machine needs to be calibrated at least once a year, with an additional check-up 6 months after every calibration. In addition, the reference machine needs to be checked before and after each test series; if the prescribed values are not met, it must be calibrated. In IEC62885-2, the RSB needs to be sent to SLG Prüf- und Zertifizierungs GmbH for recalibration after 2000 cleaning cycles or at least every 3 years. IEC60436 sets the lowest requirements for the check-ups of the reference dishwasher: a routine check should be done before starting a test series and at least every 6 months. If the requirements prescribed in the standard are not met, the manufacturer needs to be contacted.
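The time-based parts of these prescriptions can be collected in one small scheduler. The Python sketch below encodes only the intervals summarised in this section. The event-driven checks before and after each test series are not modelled. It does not track whether an interim check has already been done. The action names and the day-capping rule in `add_months` are assumptions made for the sketch.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months; the day is capped at 28 so the result is always valid."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))


# Time-based intervals in months, as summarised in the text
INTERVALS = {
    "IEC60456": {"interim check": 6, "calibration": 12},  # counted from the last calibration
    "IEC60436": {"routine check": 6},                     # counted from the last routine check
    "IEC62885-2": {"recalibration": 36},                  # counted from the last recalibration
}
RSB_MAX_CLEANING_CYCLES = 2000


def actions_due(standard: str, last_date: date, today: date, cleaning_cycles: int = 0) -> list[str]:
    """Return the scheduled actions that have fallen due since last_date."""
    due = [action for action, months in INTERVALS[standard].items()
           if today >= add_months(last_date, months)]
    # IEC62885-2: the cycle limit triggers recalibration even before 3 years have passed
    if (standard == "IEC62885-2" and cleaning_cycles >= RSB_MAX_CLEANING_CYCLES
            and "recalibration" not in due):
        due.append("recalibration")
    return due


print(actions_due("IEC60456", date(2021, 1, 10), date(2021, 8, 1)))          # ['interim check']
print(actions_due("IEC62885-2", date(2020, 1, 10), date(2021, 1, 10), 2100))  # ['recalibration']
```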
In addition, for the dishwasher and vacuum cleaner standards, the reference machines that are sent out to the testing laboratories (secondary references) are compared with a prototype reference machine (primary reference) during the manufacturing process. In the case of the vacuum cleaner standard, this comparison of primary and secondary reference is maintained throughout the product life.
After examining the three international standards, a number of questions remained unanswered, which will now be discussed.
What happens when there is no statistically significant correlation between the test and the reference machine? In that case, the reference machine clearly cannot fulfil its function of eliminating variations. This is why, in IEC60436, the reference machine is currently excluded from the calculation of the drying index of the dishwasher under test. Because of the lack of correlation in the drying performance results, a fixed value is used in the calculation instead. But is this really sufficient, or would it be better to change or update the reference machine so that it fulfils its purpose again? Objects can only be considered similar if the parameters describing them are the same. For the drying assessment for the energy label according to the dishwasher standard, this is no longer the case for the programme parameters of common test machines and the reference machine. Current ECO programmes, which are the programmes tested (at least for the European energy label), are designed with longer durations and lower temperatures than the current reference programme of IEC60436 and EN60436. The physical processes influencing the drying performance are therefore completely different. That is why using a predefined value in the adjustment calculation is considered to give more reliable results. The drying performance of the reference machine will still be assessed in the future, but the results will no longer be used in the calculation. Instead, they will only indicate whether the test results are valid, by comparison with the range specified in the international standard. Thus, the programme design of the reference machine needs to be updated to reflect the current ECO programmes of test machines, so that the machine can act as a reference again.
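The role change described above, from normalising factor to validity check, can be sketched as follows. The Python below assumes a simplified drying index: the test machine's drying score divided by a fixed value. IEC60436's actual drying index formula, the fixed value and the valid range are not given in this paper, so all numbers here are hypothetical.

```python
from typing import Optional


def drying_result(test_score: float, reference_score: float, fixed_reference_value: float,
                  valid_range: tuple[float, float]) -> Optional[float]:
    """Relate the test score to a fixed value; the reference run only serves as a validity check.

    Returns None when the reference machine's own drying score lies outside the range
    specified in the standard, i.e. the test series is not considered valid.
    """
    low, high = valid_range
    if not low <= reference_score <= high:
        return None
    # The reference machine's measured score is deliberately NOT used in the calculation
    return test_score / fixed_reference_value


print(drying_result(1.6, 1.9, fixed_reference_value=2.0, valid_range=(1.5, 2.3)))  # 0.8
print(drying_result(1.6, 2.6, fixed_reference_value=2.0, valid_range=(1.5, 2.3)))  # None
```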
To some extent, this also applies to the cleaning performance, because the reference machine uses more water and higher temperatures for cleaning than current ECO programmes of test machines. This may explain why the correlation in the last RRT, although statistically significant, was relatively low (r_s = 0.50 for test machine A and r_s = 0.41 for test machine B).
The question of similarity also arises for washing machines. One would not generally consider an industrial washing machine, such as the reference machine, to be similar to common household test machines. However, when the new reference machine was introduced into the standard in 2005, tests conducted by wfk Testgewebe GmbH showed that the test results were comparable. Even the comparison of the reference machine, which has a horizontal axis, with test machines from the US and Asian markets, which usually have a vertical axis, was shown to be legitimate. Unfortunately, these examination results could not be made available for this paper. So, even where the physical similarity of test and reference machine cannot be derived from the physical theory of similarity, a correlation analysis of the test results can legitimise the use of the reference machine (Lysjanski et al. 1983). An examination of data from an RRT without limitations on the variability of the measurement results would therefore help. The standardisation committee chose an industrial model over a household machine as a reference to ensure the durability of the individual reference machines and the long-term availability of the model. These are also important requirements for a reference machine.
The vacuum cleaner standard raises further questions because its method diverges from that of the other two international standards. Why is the reference machine in this standard only used for one part of the testing method, the testing on the test carpet from Wilsons Carpets? And why does the standard need to compare the results not only with the corresponding RSB but also with the results the prototype RSB would have achieved on the prototype reference carpet? According to the experts from the responsible standardisation group, the use of the reference carpet answers both questions. Tests have shown that the test carpet from Wilsons Carpets produces results with very high variation because it is made of wool. This natural product varies between production batches and also within the same batch. In addition, the test carpet is subject to considerable wear and tear when the dust pickup is measured. The results only become repeatable and reproducible when compared with the local RSB and with the prototype RSB on the prototype reference carpet.
In all three cases, the question arises of what happens when a reference machine shows noticeable irregularities during calibration and is still out of tolerance even after adjustment. This question is not easy to answer. In all three standards, the manufacturer needs to be contacted if the reference machine does not meet the prescribed requirements. The manufacturer then needs to assess the specific problem and propose possible solutions. If this first step fails, the manufacturer's customer service needs to inspect the reference machine in the laboratory, or the machine needs to be sent in for inspection. If all repair attempts are unsuccessful, the reference machine needs to be replaced by a new, functioning one.
Another question is what happens when the primary reference itself changes. In the case of IEC62885-2, there is a calculation method to compensate for any changes in the prototype RSB00. In the case of IEC60456 and IEC60436, this point remains unclear.

Conclusion and recommendations
This paper discusses the usefulness of reference machines in different international performance measurement standards for electric household appliances. To this end, three international standards were investigated in detail: IEC60436 for dishwashers, IEC60456 for washing machines and IEC62885-2 for vacuum cleaners. The paper examined how the reference machines are currently used in these standards and why this benefits the measurement procedure. In addition, the data of the latest RRT were assessed in each case to examine the correlation between the reference machine in use and several test machines. It was concluded that such correlation data can be used to evaluate whether the reference machine behaves similarly to the corresponding test machines. A high correlation indicates that the reference machine fulfils its purpose of eliminating variations due to the structure of the measurement process.
The paper also examined which elements of a measurement procedure cause most of the variation in testing results. Two criteria were selected: the use of natural products and manual assessment. In addition, the existence of an energy label was considered, because measurements for such a label require low variation in results. These three criteria were assessed for 16 international standards. Together with the requirements compiled for a reference machine in general, the following recommendations can be derived for standardisation groups and energy policymakers in the field of performance measurement of electric household appliances:
I. In general, the whole international standard needs to be designed so that it not only provides repeatable, reproducible and valid testing results but also uses a testing procedure that is relevant to the consumer.
II. When deciding whether a reference machine could be useful for an international standard, the responsible standardisation group should answer the following questions:
- Does the standard use naturally grown auxiliary materials that are vulnerable to variation?
- Does the standard use manual methods for the preparation or assessment of the tests?
- Does the standard need to provide repeatable and reproducible results, for example, for an energy label?
- Does the cost-benefit analysis for the implementation support the assumption that a reference machine is useful?
III. Once the decision to use a reference machine has been made, the model chosen should fulfil the following criteria:
- Consistent, repeatable, reproducible and valid testing results (with the testing method of the international standard)
- Long product lifespan
- Long-term availability of the model
- Similarity to the test machines (according to the theory of similarity and/or comparability of results)
IV. To assess the usefulness of the reference machine, the standardisation group should conduct an RRT and calculate correlations between the test machines and the chosen reference machine.
V. To maintain a properly working reference system, the prescriptions for regular check-ups and recalibrations should be clear and monitored.
VI. Finally, there should be an explicit procedure for assessing changes in the primary reference machine and for dealing with them.
59D for the performance of household and similar electrical laundry appliances and SC 59F for surface cleaning appliances.

Declarations
Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.