Do we still need psychological self-report questionnaires in the age of the Internet of Things?

Digital data are abundantly available to researchers in the age of the Internet of Things. In the psychological and psychiatric sciences, such data can be used in myriad ways to obtain insights into mental states and traits. Most importantly, such data allow researchers to record and analyze behavior in a real-world context, a scientific approach that was expensive and difficult to conduct until recently. Much research in recent years has linked digital footprints to self-report questionnaire data, likely to demonstrate proof of concept in the sciences investigating the human mind (for instance, linking socializing on the smartphone to self-reported extraversion, a personality trait associated with socializing). The present perspective piece reflects on this approach by revisiting recent work mining smartphone log and social media data, and asks whether, and when, self-report data will still be relevant in psychological/psychiatric research in the near future.

For several years, the psychological and psychiatric sciences have increasingly relied on the study of digital footprints to obtain insights into human nature [1][2][3][4]. By digital footprints we mean the data humans leave behind when interacting with their digital environment in the age of the Internet of Things (IoT). The smartphone in particular has been a game changer: it is currently owned by more than six billion people (see footnote 1), and its owners carry it wherever they go. With its many built-in sensors, the smartphone provides information on the environment a person is currently navigating, and the way a person interacts with the device provides insight into diverse mental states and traits (see also the smartphone psychology manifesto by Miller [5]). In this context, the terms "digital phenotyping" and "mobile sensing" are often used [6]. Digital phenotyping describes the prediction of mental states and traits from digital footprints left in the IoT (including smartphones). Mobile sensing is a narrower term, referring to the detection of mental states and traits via mobile devices such as the smartphone. Probably the most common research approach at present for gaining insight into mental states/traits from digital footprints is to study digital traces, such as smartphone log data or social media data, and link them to self-report data assessing variables such as mood or personality [7][8][9][10]. Is reliance on participant self-report still an approach for the future? In our view, the answer depends on the stance a researcher takes regarding the validity, reliability and objectivity of self-report measurements. We illustrate this in the following. A first example concerns predicting personality from digital footprints. In recent years, a large number of studies have linked digital footprints on social media/mobile devices with self-reported personality (via questionnaires) [11][12].
For instance, a meta-analysis by Marengo and Montag [13] showed that extraversion in particular could be associated with Facebook data. But what does such a robust link actually mean? First, it means that behavior on social media is linked to self-rated scores on the personality dimension of extraversion. Hence a person who views themselves as extraverted, that is, as sociable and outgoing, behaves in certain ways on social media, as reflected in their digital footprint (e.g., being more active on these platforms). It is well known, though, that self-report has many problems: some people have difficulty with introspection and are simply not good at rating themselves. Furthermore, social desirability might influence how people present themselves on a questionnaire, and some versions of personality questionnaires might be easier to understand and complete than others (see a good overview of the self-report method in [14]). Against this background, the question arises whether digital data should really be validated against such inventories to predict personality, or whether we should consider new avenues, such as relying exclusively on the digital data themselves (see also recent work by Boyd et al. [15] coming to similar conclusions). As behavior (and self-report) is ultimately the output of a biological system, one should also consider linking digital data to biological data, resulting in digital biomarkers [16].
There is broad consensus that, to witness core features of extraversion (to stay with this example), one would need to observe stable social, assertive and outgoing behavior, as the example above already illustrates. Would it not be desirable, and perhaps even more ecologically valid, to build personality profiles from actual behavior? For instance, one might build such "digital behavioral extraversion scores" from the size of a person's social network, as derived from the number of telephone numbers stored on their smartphone or from their social network site account (see a study [17] that investigated extraversion and the size of different social network layers). Another example might be to record how frequently a person contacts members of their social network [7]. This would seem to be a more objective way to obtain insight into personality. In this context, it will be interesting to learn whether factor analysis (or other data reduction methods) applied to digital behavior yields personality factors that differ from (or resemble) those obtained from the factor analysis of adjectives, which resulted in the Big Five taxonomy of personality; for a short history of the Big Five see [18]. Interestingly, recent research has also applied the lexical approach to social media text data [19]. A different taxonomy might still emerge from the study of digital footprints when actual behavior, rather than text alone, becomes the focus of analysis. Fig. 1 illustrates some of these ideas.
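The idea of a "digital behavioral extraversion score" can be made concrete with a small sketch. Assuming, hypothetically, that each participant's number of stored contacts, calls per week and messages per week have already been extracted from their phone, the Python below z-standardizes each indicator across participants and averages the results into one composite. The feature names, the equal weighting and the example values are our own assumptions for illustration, not a validated scoring scheme; a real study would also need to handle skewed count data (e.g., by log transformation) and check the composite's psychometric properties.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and population SD 1 (all zeros if SD is 0)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0] * len(values)
    return [(v - m) / s for v in values]


def behavioral_extraversion(participants, features):
    """Composite score: mean of z-standardized (hypothetical) digital indicators.

    participants: dict mapping participant id -> {feature name: value}
    features: names of the indicators to combine, weighted equally (an assumption)
    """
    ids = list(participants)
    # One list of z-scores per feature, standardized across participants.
    standardized = [zscores([participants[i][f] for i in ids]) for f in features]
    # Average each participant's z-scores over all features.
    return {pid: mean([col[k] for col in standardized]) for k, pid in enumerate(ids)}


# Toy example; feature names and values are invented for illustration.
data = {
    "p1": {"n_contacts": 120, "calls_per_week": 5, "messages_per_week": 40},
    "p2": {"n_contacts": 300, "calls_per_week": 20, "messages_per_week": 150},
    "p3": {"n_contacts": 80, "calls_per_week": 2, "messages_per_week": 10},
}
scores = behavioral_extraversion(data, ["n_contacts", "calls_per_week", "messages_per_week"])
print(max(scores, key=scores.get))  # p2
```

Equal weighting is the simplest possible choice; a factor-analytic approach of the kind discussed above would instead let the data determine how the indicators combine.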
In our view, the points discussed above do not suggest that self-report is, or will become, obsolete, whether in applied studies of digital footprints or in psychological/psychiatric research more broadly. Despite its inherent limitations, asking people about their thoughts and well-being is still the most direct and easiest path to insights into mental states/traits. Moreover, clinical research currently relies to a certain degree on asking people about their current (diminished) well-being, and such mental states cannot yet be fully derived by mining data from the IoT (e.g., the smartphone). Beyond this, self-report has without doubt yielded valid research: to stay with the example of personality, it has been shown that personality measured with psychometrically sound self-report instruments predicts, or is associated with, important real-world outcomes (e.g., dietary behavior, financial decision making, online social media use [20][21][22][23][24]). The self-report method has therefore led to important insights over many years. Still, one could ask: why measure personality via questionnaires to predict behavior if the behavior itself can readily be recorded? In this context, Harari et al. [25] point to physical data, social behavior and smartphone use data as targets of study. The future will show which personality taxonomies, or other psychological taxonomies, emerge from the study of different sets of digital data. This will also shed light on the validity of the different methods, including self-report, used to investigate the human mind.
In sum, with the abundance of digital data available in the age of the Internet of Things, the question arises whether behavioral (digital) data alone should be the primary source of investigation in future research. In many ways, studies meaningfully linking digital footprints to psychological constructs assessed via self-report questionnaires have been proof-of-concept research (see the extraversion example above). Interesting as this work is, should we now move on and conduct research in this area differently? The answer is yes and no. Beyond what has been mentioned so far, and taking a different perspective, we believe there is good reason to keep including standardized self-report data in psychological/psychiatric research on digital footprints in the coming years. For instance, in the study of technology use disorders it is well known that people have problems correctly estimating their own technology use [26]. Work by Turel et al. [27] showed that persons with higher tendencies towards disordered Facebook use had an "upward time estimate bias" (p. 84) regarding elapsed time when filling in questionnaires about their Facebook use. This phenomenon shows that, from a clinical psychologist's or psychiatrist's perspective, it is of great interest to contrast how a person perceives the world with what can be objectively observed with respect to the variable of interest. Discrepancies between objective digital data and a person's subjective views might be the most interesting variables to study in the near future. From this perspective, self-report data will be needed in any case, although their role may change in future work in the psychological and psychiatric sciences.
Moreover, at least for the moment, some digital footprints may give more valid insights into a person's characteristics than others. For instance, a person might behave in a certain way on social media and differently in an "offline" context (a distinction that will no longer exist in a world fully connected to the IoT). The researcher then has to consider which data are more relevant to the research question (for examples of traditional self-report and digital sources to be studied, see Table 1). Alternatively, we might return to the ideas of Walter Mischel [28], who used if-then functions to resolve the "personality paradox" (p. 250), namely that a person with a certain personality trait does not always behave consistently with that trait across different environments. A fictitious example would be a person who shows impulsive behavior on social media but is less impulsive offline. Clearly, interesting and important research questions will arise in this context in the near future (also regarding how people cultivate images of themselves in the digital space).

Table 1. Examples of (traditional) self-report approaches and digital phenotyping/mobile sensing approaches to obtain insights into human mental processes and related behaviors (positions in the left and right columns are not meant as direct contrasts). Left column: Traditional self-report approaches (some examples of currently applied methods). Right column: Digital phenotyping/mobile sensing (some examples of current digital sources to be studied).
a Language posted on social media could also be seen as a form of self-report. This underlines our argument that self-report will still happen in an Internet of Things, but it will mostly be unstructured and consist of everyday human communication.
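Mischel's if-then idea can be approximated, in a deliberately simplified form, by comparing a person's behavior within each context to that person's overall level. The Python below assumes, hypothetically, repeated observations of some impulsivity indicator tagged with a context label ("social_media" or "offline") and returns the context-specific deviations, which form a rough within-person if-then profile. This is our own simplification: Mischel's approach, as we understand it, also compares situation-specific behavior with how other people behave in the same situation, which this sketch omits.

```python
from collections import defaultdict
from statistics import mean


def if_then_profile(observations):
    """Context-specific deviations from one person's overall mean.

    observations: list of (context, value) pairs for a single person, where
    value is a hypothetical behavioral indicator (e.g., impulsivity).
    Returns {context: mean value in that context minus overall mean}.
    """
    by_context = defaultdict(list)
    for context, value in observations:
        by_context[context].append(value)
    overall = mean([value for _, value in observations])
    return {context: mean(values) - overall for context, values in by_context.items()}


# Toy observations, invented for illustration.
obs = [("social_media", 0.9), ("social_media", 0.7), ("offline", 0.3), ("offline", 0.1)]
print(if_then_profile(obs))  # roughly +0.3 for social_media, -0.3 for offline
```

In this toy case the profile reads as "if on social media, then more impulsive than usual; if offline, then less", which is the kind of pattern the fictitious example above describes.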
Please note that in this short perspective piece we cannot discuss the many barriers to be overcome in this rapidly evolving research field. Methodologically, it is not trivial to deal with Big Data [29], which comes with the three V's (volume, velocity and variety) and requires analysis methods going beyond inferential statistics. And Big Data is what researchers usually face when studying data from the IoT. Likewise, it is not trivial to deal with the many ethical issues that arise when applying digital phenotyping/mobile sensing, in particular privacy concerns [30][31][32]. Without fostering trust in such new methods among patients and study participants, progress in this lively research area will be hindered.
Finally, we want to mention that the present work mainly drew on examples from personality psychology and clinical psychology. For other disciplines of psychology, such as social psychology but also cognitive and developmental psychology, where structured or unstructured interviews and self-report questionnaires are likewise applied, we believe the arguments made here hold equally. For instance, social psychologists are interested in the study of group behavior. Here, social media data represent a highly relevant research opportunity to investigate how opinions are formed and how groups interact with each other in political debates. Moreover, smartphone log data could provide insights into both developing and declining cognitive functions, for example via reaction times while typing messages or the size of the vocabulary a person uses in everyday life. Hence, we expect the psychological sciences as a whole to see tremendous changes in the methods and analysis strategies they apply.
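The two cognitive indicators mentioned above can be sketched from raw logs. Assuming, hypothetically, that a keyboard logger provides key-press timestamps in milliseconds and that message texts are available, the Python below computes inter-key intervals (a crude proxy for typing speed) and the number of distinct word forms used (a crude proxy for vocabulary size). The input formats and the simple regular-expression tokenizer are assumptions for illustration, and collecting typed content raises the privacy issues noted above.

```python
import re
from statistics import median


def inter_key_intervals(timestamps_ms):
    """Differences between consecutive key-press timestamps (milliseconds)."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def vocabulary_size(messages):
    """Number of distinct lower-cased word forms across all messages."""
    words = set()
    for message in messages:
        # Naive tokenizer: runs of ASCII letters and apostrophes.
        words.update(re.findall(r"[a-z']+", message.lower()))
    return len(words)


# Toy inputs, invented for illustration.
timestamps = [0, 180, 390, 560, 800]
print(median(inter_key_intervals(timestamps)))            # 195.0
print(vocabulary_size(["See you soon", "see you at noon!"]))  # 5
```

The median is used rather than the mean because typing pauses produce occasional very long intervals that would otherwise dominate the estimate.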