Main findings
A novel aspect of this study was the application of genotype analysis using DNA from buccal cell samples to validate the identity of participants recruited via the internet. By replicating the analysis of 33 genetic variants, we showed 99.9 % concordance between patterns of genotypic variants in DNA collected in the VS and those observed in DNA obtained from previous, self-collected buccal cell samples. This demonstrates the utility of this approach for identity checking, a potentially sensitive aspect of internet-based interventions delivered remotely that has not been investigated in earlier studies. In addition, our findings provide further evidence that SR data for height, weight and BMI collected via the internet show a high degree of reliability compared with face-to-face measurements made by experienced researchers using standard protocols. Concordance for BMI classification between SR and measured data was strong, and we observed perfect agreement for SR sex and age with that assessed in the VS.
Validation of participant identity
Administering lifestyle-based interventions via the internet offers advantages of scale, efficiency and cost-effective data collection (Wright 2005; Celis-Morales et al. 2014). Nevertheless, internet-based intervention studies conducted remotely may raise problems of reliability in the recruitment of participants and in the collection of biological samples. To the best of our knowledge, the validation of participant identity has been overlooked in previous validation studies. Inevitably, using the internet to recruit participants to intervention studies provides opportunities for participant misrepresentation, which may undermine the study objectives. In the current VS, we replicated the analysis of 33 genetic variants as a proxy for validation of identity. We found agreement for over 99.9 % of participant genotypes, with just four examples showing disagreement. As our results showed perfect concordance for age and sex verification, these minor mismatches most likely represent technical errors during genotyping or may reflect the presence of copy number variants (CNVs), which complicate genotyping. LGC Genomics reports that the average genotyping error in positive control DNA samples using Kompetitive Allele Specific PCR, or KASP™, is between 0.7 and 1.6 % and the assay design success rate is between 98 and 100 % (Semagn et al. 2014). We conclude that it is likely that we had perfect agreement in participant identity between samples collected remotely during the Food4Me study and those collected in the VS. Furthermore, we suggest that this genotype-based approach to validation of participant identity may be used in many internet-based observational and intervention studies.
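The genotype comparison described above can be illustrated with a minimal sketch. It assumes that each sample is represented as a mapping from variant identifier to an unphased genotype call, that allele order within a call is irrelevant, and that missing calls are skipped rather than counted as mismatches. The variant identifiers, the data and the function names are hypothetical and are not taken from the Food4Me analysis pipeline.

```python
from typing import Dict, Tuple

Genotypes = Dict[str, str]  # variant identifier -> genotype call, e.g. "AG"


def normalise(call: str) -> str:
    """Sort the alleles so that 'GA' and 'AG' compare as equal."""
    return "".join(sorted(call.upper()))


def concordance(first: Genotypes, second: Genotypes) -> Tuple[int, int, float]:
    """Return (matching calls, compared calls, concordance fraction).

    Only variants with a non-empty call in both samples are compared
    (an assumption; other handling of missing calls is possible).
    """
    shared = [v for v in first if v in second and first[v] and second[v]]
    matches = sum(normalise(first[v]) == normalise(second[v]) for v in shared)
    total = len(shared)
    fraction = matches / total if total else float("nan")
    return matches, total, fraction


# Hypothetical calls for one participant (illustrative only, not study data)
remote_sample = {"var1": "AG", "var2": "CC", "var3": "TT", "var4": "GA"}
clinic_sample = {"var1": "GA", "var2": "CC", "var3": "CT", "var4": "AG"}

matches, total, fraction = concordance(remote_sample, clinic_sample)
print(f"{matches}/{total} calls concordant ({fraction:.1%})")
```

In practice a per-participant threshold on this fraction would be needed to decide whether two samples come from the same person. The source does not state such a threshold, so none is assumed here.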
Comparison with other studies
The magnitude of differences between SR and measured height (0.19 cm, SD 1.2), weight (−0.70 kg, SD 1.5) and BMI (−0.29 kg m−2, SD 0.6) observed here is similar to findings from previous internet-based studies in adult populations. NutriNet-Santé (Lassale et al. 2013), a French internet-based prospective cohort study that included a VS in a sub-sample of 815 adults, found that height was over-reported by 0.56 cm (SD 2.4) and that weight and BMI were under-reported by 0.49 kg (SD 1.4) and 0.34 kg m−2 (SD 1.5), respectively. A study conducted in 177 adults (aged 18–35 years) in Australia (Pursey et al. 2014) observed a larger over-reporting bias for height (1.36 cm, SD 1.9), and a similar under-reporting bias for weight (−0.55 kg, SD 2.0) and BMI (−0.56 kg m−2, SD 0.08) compared with the present study. In contrast, an internet-based study conducted in 149 adults in Sweden (Bonn et al. 2013) reported larger differences between SR and measured weight (1.2 kg, SD 2.6) compared with our results. A systematic review (Gorber et al. 2007) of validation of SR anthropometric data found that height was over-reported by 0.6–7.5 cm, whereas weight and BMI were under-reported by −0.1 to 6.5 kg and 0 to −2.2 kg m−2, respectively. It should be noted that under-reporting of body weight is quite common, particularly among overweight and obese subjects (Johansson et al. 1998; Spencer et al. 2002; Merrill and Richardson 2009; Lassale et al. 2013).
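The differences quoted in this comparison are means and standard deviations of paired differences (SR minus measured). A minimal sketch of that calculation follows. It assumes the sample standard deviation (n − 1 denominator), which the source does not specify, and uses hypothetical data and a hypothetical function name.

```python
from statistics import mean, stdev


def reporting_bias(self_reported, measured):
    """Return (mean, SD) of the paired differences SR minus measured.

    A positive mean indicates over-reporting and a negative mean
    under-reporting. The SD is the sample SD (an assumption).
    """
    if len(self_reported) != len(measured):
        raise ValueError("paired inputs must have the same length")
    diffs = [sr - m for sr, m in zip(self_reported, measured)]
    return mean(diffs), stdev(diffs)


# Hypothetical heights in cm (illustrative only, not study data)
sr_height = [170.5, 165.0, 180.5, 175.0]
measured_height = [170.0, 165.0, 180.0, 174.0]

bias, sd = reporting_bias(sr_height, measured_height)
print(f"height bias: {bias:+.2f} cm (SD {sd:.2f})")
```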
In agreement with some (Niedhammer et al. 2000; Spencer et al. 2002; Merrill and Richardson 2009) but not all previous studies (Bonn et al. 2013; Lassale et al. 2013), men in the Food4Me study were more likely to over-report height. Although women appeared more likely to under-report weight than men, this difference was not significant in our study. Previous studies have observed that women were significantly more likely to under-report their weight compared with men (Spencer et al. 2002; Merrill and Richardson 2009; Lassale et al. 2013). Whilst height was more likely to be over-reported with increasing age in previous studies (Kuczmarski et al. 2001; Bes-Rastrollo et al. 2011; Lassale et al. 2013), we did not find any effect of age on differences between SR and measured height.
In addition to sex and age, BMI was a strong predictor of differences between SR and measured values. As a consequence of misreporting of the primary measurements of height and weight, under-reporting of calculated BMI was 4.8 times greater in overweight and obese individuals than in normal weight participants (Δ −0.12, −0.54 and −0.53 kg m−2 for normal weight, overweight and obese participants, respectively). Our results confirm previous findings of under-reporting of BMI by 0.16, 0.36 and 0.63 kg m−2 for normal weight, overweight and obese participants, respectively (Lassale et al. 2013). However, we found smaller differences in weight misreporting between BMI categories than those observed by another internet-based study (Pursey et al. 2014) in which under-reporting among overweight and obese participants was −1.36 kg compared with −0.31 kg in those of normal BMI. A possible explanation for the greater degree of misreporting of body weight by overweight and obese individuals lies in the social desirability concept, which argues that perceptions are influenced by desires to conform to perceived societal norms and that, with respect to body weight, such pressures apply more strongly to obese participants (Larson 2000). However, the proportions of subjects for whom SR height, weight and calculated BMI were within 5 % of the measured values were 100 % (n = 140) for height, 96 % (n = 135) for weight and 92 % (n = 129) for BMI. This suggests that most Food4Me participants provided reliable measures of their anthropometrics.
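The "within 5 % of the measured value" criterion can be sketched as follows. BMI is calculated as weight in kg divided by the square of height in m, and a self-reported value is counted as within tolerance when its absolute difference from the measured value does not exceed 5 % of the measured value. The records and function names are hypothetical, and treating the boundary as inclusive is an assumption.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg per square metre."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def within_tolerance(reported: float, measured: float, tolerance: float = 0.05) -> bool:
    """True if reported lies within +/- tolerance (as a fraction) of measured."""
    return abs(reported - measured) <= tolerance * measured


# Hypothetical records (illustrative only, not study data):
# (SR height cm, SR weight kg, measured height cm, measured weight kg)
records = [
    (170.0, 65.0, 170.0, 65.0),
    (180.0, 80.0, 179.0, 82.0),
    (165.0, 70.0, 164.0, 76.0),
    (175.0, 90.0, 175.0, 91.0),
]

# Difference in calculated BMI (SR minus measured) for each participant
bmi_diffs = [bmi(sw, sh) - bmi(mw, mh) for sh, sw, mh, mw in records]

# Number and share of participants whose SR BMI is within 5 % of measured BMI
n_within = sum(within_tolerance(bmi(sw, sh), bmi(mw, mh)) for sh, sw, mh, mw in records)
share_within = n_within / len(records)
print(f"{n_within} of {len(records)} within 5 % ({share_within:.0%})")
```

Because BMI depends on the square of height, a small over-report of height combined with an under-report of weight compounds in the calculated value. This is consistent with the lower proportion within 5 % for BMI than for either primary measurement.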
Concordance of BMI classification
One of the main concerns arising from data collection, either SR via the internet or SR with paper-based forms, is the validity and accuracy of the data provided and their utility as a basis for provision of health-related advice. Several studies have reported greater underestimation of weight (and BMI) with remote SR collection methods than with face-to-face interviews (Hood et al. 2012). However, we observed good agreement between the BMI classifications derived from SR and measured height and weight (κ = 0.939), with just six participants being wrongly classified when SR data were used. There were no differences in the proportions of those classified as underweight, and only small differences in the proportions of normal weight (3.6 %), overweight (−2.9 %) and obese participants (−0.7 %). These results are comparable with previous findings reporting a κ of 0.97 for BMI classification and prevalence differences between SR and measured values of 0.6 and 0.7 % for overweight and obese participants, respectively (Lassale et al. 2013). Similarly, Pursey et al. (2014) reported that the prevalence of overweight was 2.6 % lower when using SR compared with measured values, but there was no difference for obesity prevalence.
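Agreement between two classifications is summarised here by the κ statistic. The sketch below computes unweighted Cohen's κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the marginal frequencies. It assumes the conventional WHO cut-offs for BMI categories (18.5, 25 and 30 kg m−2) and an unweighted κ, neither of which is stated in this section. The BMI values are hypothetical.

```python
from collections import Counter


def classify(bmi: float) -> str:
    """BMI category using conventional WHO cut-offs (an assumption here)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def cohen_kappa(first, second) -> float:
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(first) != len(second) or not first:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_a, counts_b = Counter(first), Counter(second)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / n ** 2
    if expected == 1:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical BMI values in kg m-2 (illustrative only, not study data)
sr_bmi = [17.9, 22.0, 24.6, 27.5, 29.4, 31.0, 23.0, 26.0]
measured_bmi = [18.0, 22.3, 25.2, 27.9, 30.1, 31.5, 23.1, 26.2]

sr_class = [classify(b) for b in sr_bmi]
measured_class = [classify(b) for b in measured_bmi]
kappa = cohen_kappa(sr_class, measured_class)
print(f"kappa = {kappa:.3f}")
```

The two misclassified participants in this toy example both sit just above a category boundary when measured and just below it when self-reported. This is the pattern expected when weight is slightly under-reported.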
Although social desirability may drive differences between SR and measured values (Joinson 1999), we found very good agreement between the internet-based SR and validation measures for the key anthropometric variables height and weight, suggesting that, in an internet-based setting, participants may be less prone to social desirability bias. This apparently enhanced truthfulness may result from the greater feeling of anonymity when using the web rather than other media such as the telephone (Joinson 1999). However, the reliability of more difficult self-measurements such as waist and hip circumferences needs to be explored in future studies.
Strengths and limitations
To our knowledge, this is the first internet-based study that has validated participant identity using genotypic analysis. Our findings of the utility, and practicability, of this approach to validation of participant identity provide proof of concept for remotely conducted, e.g. internet-based, studies in which participant misrepresentation is a potentially major, and often ignored, concern. A particular strength of this study was the collection of data via a novel internet-based server in European countries from a relatively large sample of the adult population with a wide range of ages and BMIs. Our ability to obtain reliable SR anthropometric data was enhanced by the use of standardised protocols by study participants. Protocols were provided in text format with pictures, but also as a series of online videos. In addition, during the VS, trained researchers collected the anthropometric data using the same standardised protocols. An additional strength of our study was the short period of time (i.e. up to 2 weeks) between the collection of internet-based SR data and direct measurement by the researchers. Furthermore, to ensure independence of measurements in the subsequent VS, subjects were invited to participate in the VS only after they had completed their internet-based measures.
A potential limitation of our study is that participants in the Food4Me study were recruited from those showing interest in an intervention study on PN (Livingstone et al. 2015). As a result, we may have recruited those with a particular interest in lifestyle-based interventions, but we have no reason to believe that this interest influenced the truthfulness of SR data. In addition, the BMI distribution among Food4Me participants was comparable with the prevalence of normal weight, overweight and obesity in the adult European population (OECD 2012; Celis-Morales et al. 2014; Livingstone et al. 2015).
In conclusion, we introduced and tested a simple genotype-based approach for validation of the identity of study participants recruited to internet-based studies. The approach is robust and, given the low cost of genotyping, we envisage that it may have wide utility for identity validation in the many types of studies (including internet-based studies) where participant recruitment and sample data collection are conducted remotely. Although overall agreement between SR and measured values was excellent, under-reporting of weight was more common among overweight and obese individuals, and such SR data should be interpreted with caution when adiposity is an important outcome. Overall, our findings demonstrate the reliability of internet-based, SR anthropometric and demographic data collected in the Food4Me study.