1 Introduction

Digital Twins are gaining much attention in research and among practitioners (Detecon Consulting 2019; PwC 2020), which is also apparent in a steep rise in academic publications (Scopus 2020). Yet, there is still ambiguity regarding the term’s precise definition (Cimino et al. 2019). Instead, a spectrum of terms has emerged, including Digital Shadows, Digital Threads, and Digital Models (Helu et al. 2017; Schuh et al. 2019; Urbina Coronado et al. 2018).

Each term is defined differently, yet often used synonymously with Digital Twin, further blurring the concept. Our research addresses precisely that issue, as it proposes a clear distinction of types of Digital Twins. More specifically, we suggest foundational and distinguishable types, so-called archetypes, based on a taxonomical analysis and morphological characteristics. The significant advantage of that approach is that Digital Twins can be differentiated by morphological characteristics and representative patterns, which in turn distinguish the archetypes. To achieve that goal, our study pursues a mixed-method design.

First, we developed initial characteristics and archetypes purely based on findings from a structured literature review, carried out following the recommendations of Webster and Watson (2002) and Vom Brocke et al. (2009). Next, we validated, extended, and triangulated our findings by means of a qualitative interview study. In the study, we collected data from 15 industry experts from various industries in semi-structured interviews. The interviews brought to light rich findings of high relevance for the field of Digital Twin design and archetypical patterns. For example, when asked, “what do you think a Digital Twin looks like in general”, the overwhelming majority answered: “A Digital Twin is the digital picture of a physical asset”. The statement's uniformity somewhat clouds the more detailed specifics distinguishing one instance of a Digital Twin from another. Illustratively, points of differentiation include whether the Digital Twin is a basis for information (e.g., data repository) or a real-time representation of an asset's physical state. Interestingly, the understanding and definition of Digital Twins varied across the study’s participants according to their industrial sector. Those findings further added to our motivation to harmonize the understanding of Digital Twins and standardize it to some degree.

For that purpose, we develop archetypes that differentiate understandings of Digital Twins. Archetypes are a form of standardization (Thesaurus 2020). They originate from Greek and describe original patterns that are “[…] a primordial image, character, or pattern of circumstances that recurs throughout literature and thought consistently enough to be considered a universal concept or situation” (Encyclopedia Britannica 2011, p.1). Given the usefulness of the archetypical differentiation of Digital Twins, as per the argumentation given above, we derive the following research questions (RQ):

RQ1

What clusters of Digital Twins can be derived from the literature corpus?

RQ2

What are the design characteristics of Digital Twins in research and practice?

RQ3

What are the archetypes of Digital Twins?

The paper is structured as follows: After a brief overview of the definitions of Digital Twins, their usages, and origins, we will describe our research methods, i.e., the structured literature review, the development of a taxonomy, the qualitative research approaches, and the method of deriving archetypical patterns. After this, we introduce the different dimensions and characteristics of Digital Twins, followed by the derivation of the archetypes. The archetypes are discussed and evaluated before we provide the conclusion, limitations, and a brief outlook on future research.

2 Digital Twin Types

The concept of twins has been well known and widely used in classical manufacturing processes since NASA used physical twins as copies of space vehicles in the Apollo project (Rosen et al. 2015). Since then, the concept has been approached from various points of view, which has led to different types of Digital Twins. The different definitional approaches become apparent when drawing from the literature corpus (see Table 1). Yet, uniformity in Digital Twin definitions is still lacking. There is only a vague understanding of the concept of Digital Twins (Haag and Anderl 2019). For that matter, Wagner et al. (2019) state that the definition of Digital Twins highly depends on the individual use case. Evidently, using the Digital Twin concept in healthcare use cases (e.g., see Rivera et al. (2019) for the use case digital patients) requires a different set of specialized characteristics than a Digital Twin in manufacturing (e.g., see Kritzinger et al. (2018)). Yet, both use cases might share underlying technology and characteristics, for instance, real-time updates. The composition of the interview partners, therefore, reflects our desired use case independence. Each interviewee described a Digital Twin that was specific to the individual use case. For example, interviewee 1 described a use case in which the Digital Twin was used as a tool to gather, check, and maintain master data. The use case presented by interviewee 2 was an application of a Digital Twin in a warehouse to increase transparency and to analyze and improve the processes inside the warehouse. Further use cases included the utilization of Digital Twins in production environments (e.g., interviews 4, 11, 12, or 15), the application of Digital Twins to monitor products over their life cycle (e.g., interviews 10 or 13), the usage in the healthcare sector (e.g., interviews 16 and 17), and in supply chain management (interview 18).

Table 1 Commonly used definitions of digital twins

Based on our literature review and findings (van der Valk et al. 2020), Table 1 shows the most relevant definitions. From these, we synthesize a working definition that guides our research.

The definitions above show a baseline understanding of a Digital Twin. From them, we derived the following preliminary definition, which served as the working definition in the analysis of the literature and the interviews:

Definition 1

The Digital Twin is a virtual construct that represents a physical counterpart, integrates several data inputs with the aim of data handling and processing, and provides a bi-directional data linkage between the virtual world and the physical one. Synchronization is crucial to the Digital Twin in order to display any changes in the state of the physical object.

At this point, we have to stress the fact that more recent reviews brought the kind of data linkage into focus. Kritzinger et al. (2018) proposed that a Digital Twin should contain an automatic data linkage. This approach is backed by several reviews, e.g., the review of Errandonea et al. (2020). Fuller (2020) comes to a similar conclusion in his review. Nevertheless, he also describes the discrepancy between what is called a Digital Twin and what is a Digital Twin per the definition of Kritzinger et al. (2018). We also noticed that many so-called Digital Twins do not provide an automatic data linkage and, therefore, should not be labeled as a Digital Twin. However, at this point we will include descriptions of so-called Digital Twins which do not mandate an automatic data flow to gain deeper insights into this discrepancy.

Digital Twins possess many overlaps with other digitization technologies, and also a variety of synonymously used terms is noticeable. For example, similarities exist with the concepts of Digital Models (Urbina Coronado et al. 2018), Digital Shadows (Kritzinger et al. 2018), and Digital Threads (Helu et al. 2017). In the following, we aim to provide definitions of these concepts and to stress the differences to a Digital Twin.

Definition 2

A Digital Model is a virtual representation of a physical product that may contain a data linkage between both (Kritzinger et al. 2018). However, this linkage is manual at best. The Digital Model will not replicate a change of state of the physical object.

As representing both the physical object and any change of its state is crucial to a Digital Twin, the Digital Model cannot be seen as a Digital Twin. Furthermore, a Digital Model lacks the ability to handle and process any kind of data. We see a Digital Model as part of a Digital Twin in the sense that it provides the virtual picture of the physical object. Furthermore, a Digital Model does not provide a bi-directional data linkage per se.

Definition 3

The Digital Shadow provides highly accurate representations of processes with the aim to create a real-time picture based on the relevant data (Bauernhansl et al. 2016).

Digital Shadows do not possess automatic bi-directional data links (Kritzinger et al. 2018). Furthermore, internal data processing is not seen as mandatory for a Digital Shadow (Schuh et al. 2019). Hence, the Digital Shadow is a digital construct on the way to a Digital Twin, but not an actual twin.

Definition 4

Digital Threads connect various data sources along the life cycle of a product and enable a data linkage between physical assets and software products, but do not further process the data (Helu et al. 2017).

The main differences between Digital Threads and Digital Twins are the twins’ ability to process the data instead of just gathering it and that a Digital Twin shall represent a physical product. Data feedback, i.e., a bi-directional data flow, is not mandatory for a Digital Thread. Hence, the term Digital Threads may not be used synonymously to Digital Twins.
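The distinctions drawn in Definitions 1 to 4 can be condensed into a small decision rule. The following Python snippet is a minimal sketch under simplifying assumptions: the four boolean attributes and the name `DigitalConstruct` are hypothetical and reduce each definition to its most discriminating properties. It illustrates how the terms differ and is not a classification procedure used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalConstruct:
    """Hypothetical, simplified description of a digital construct."""
    represents_physical_asset: bool  # virtual picture of a physical counterpart
    automatic_data_flow: bool        # state changes arrive without manual steps
    bidirectional_link: bool         # feedback from the virtual to the physical side
    processes_data: bool             # handles and processes data internally


def classify(c: DigitalConstruct) -> str:
    """Map a construct to one of the four terms (assumed reading of Definitions 1-4)."""
    if c.represents_physical_asset:
        if c.automatic_data_flow and c.bidirectional_link and c.processes_data:
            return "Digital Twin"    # Definition 1
        if c.automatic_data_flow and not c.bidirectional_link:
            return "Digital Shadow"  # Definition 3: processing not mandatory
        if not c.automatic_data_flow and not c.processes_data:
            return "Digital Model"   # Definition 2: linkage manual at best
    elif not c.processes_data:
        return "Digital Thread"      # Definition 4: links data, no processing
    return "unclassified"            # combinations the definitions do not cover


twin = DigitalConstruct(True, True, True, True)
shadow = DigitalConstruct(True, True, False, False)
model = DigitalConstruct(True, False, False, False)
thread = DigitalConstruct(False, True, False, False)
labels = [classify(c) for c in (twin, shadow, model, thread)]
```

Combinations that none of the definitions covers are deliberately returned as "unclassified" rather than forced into one of the four terms.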

Among the related works, we emphasize Enders and Hoßbach (2019), who developed a taxonomy of different Digital Twin applications, Josifovska et al. (2019), who created a framework for Digital Twins in cyber-physical systems, and, most recently, Jones et al. (2020), who conducted a literature review and detected research gaps regarding Digital Twins. To a certain extent, the different reviews show a convergence, e.g., when portraying the automatic data flows. However, these studies focus on a narrow branch or specialized fields, like cyber-physical systems or manufacturing contexts. Hence, we aim to provide a more general view on Digital Twins that is independent of branches and use cases and contributes a broader classification of integral parts of Digital Twins. Therefore, it allows the classification of different, domain-independent Digital Twin types.

3 Mixed-Method Design

The paper aims to engineer theoretical and practical descriptions of Digital Twins in reverse, which were collected through a structured literature review and an interview study. We aim to synthesize comprehensive Digital Twin archetypes based on the literature and a qualitative interview study with industry professionals. Due to the large-scale research objective, our research design is a combinatory approach subsuming multiple, mixed methods. Figure 1 illustrates our research process, consisting of two qualitative (literature review, qualitative interview study) and one quantitative (cluster analysis) sections that are organized in action steps (Action 1–Action 7). Greene et al. (1989, p. 256) define mixed-method research processes as those “that include at least one quantitative method (designed to collect numbers) and one qualitative method (designed to collect words)”. The mixed-method approach has the clear advantage of triangulating results by using multiple data sources instead of just one. In our case, these data sources are the literature corpus on Digital Twins and practitioners from different industries. Summarizing the above, our research approach includes the following steps (see Fig. 1):

Fig. 1 Research process


Action 1: Exhaustive literature review following Webster and Watson (2002) and Vom Brocke et al. (2009). Action 2: Development of a taxonomy for Digital Twins based on van der Valk et al. (2020). Action 3: Cluster analysis of the underlying data of the taxonomy. Action 4: Qualitative interview study with 15 industry experts to triangulate the findings. Action 5: Cluster analysis with industry experts. Action 6: Synthesis of the qualitative and quantitative results. Action 7: Evaluation through a second interview series and finalization of archetypes. Figure 1 graphically illustrates the seven-step research process while simultaneously indicating its qualitative and quantitative parts.

3.1 Structured Literature Review

The literature review uses the method of Vom Brocke et al. (2009). Vom Brocke et al. (2009) recommend five steps when conducting a literature review: first, the definition of the review scope, second, the conceptualization of the topic, third, the actual search process, fourth, the analysis of the literature, and, lastly, the revision of the research agenda. Accordingly, in the first step, we defined the literature review's scope to consider only peer-reviewed publications dealing with the topic of Digital Twins. Because the literature about Digital Twins is growing exponentially (Scopus 2020), we limited the research scope to the scientific databases AIS eLibrary, ACM Digital, IEEE Xplore, Science Direct, and JSTOR. By selecting these databases, we cover the research in the fields of information systems and engineering. In the second step, we conceptualized the topic of the literature review. To do so, we especially searched for definitions of Digital Twins. As Cimino et al. (2019) highlight, there is a wide variety of definitions for Digital Twins. However, we could identify specific definitions that are used in many publications (see Table 1).

In step three, we searched the databases for publications with the search string “Digital Twin”. In total, we found 579 publications which contain the term Digital Twin in some context. During the analysis (step 4), we applied several filtering mechanisms, which we drew from Cooper (1988) and Vom Brocke et al. (2009) and which focus on relevance, accessibility, and removal of duplicates. The filtering mechanisms are threefold. First, we consider the relevance mechanism, meaning that the publication must explicitly deal with the Digital Twin. Therefore, we eliminated every paper which does not mention Digital Twins at least in the title, the abstract, or the keywords. The second filtering mechanism eliminated all papers that were not accessible. As many publications are published in multiple databases, the results contained a non-negligible number of duplicates, which we also eliminated. Lastly, the third filtering mechanism is quality-related. The publications have to include a comprehensive argumentation and a consistent use of established research methodologies, and they must deal with Digital Twins in a non-trivial fashion (Levy and Ellis 2006). To ensure that papers adhere to the quality criteria mentioned above, we analyzed each paper in-depth. All in all, very few papers did not meet our inclusion requirements. Following Webster and Watson (2002), we added relevant papers cited in the literature corpus through a backward search. Finally, we identified 233 publications as appropriate for our research purpose. Figure 2 shows the distribution of the publications about Digital Twins during the time frame from 2012 to 2020. One can notice the exponential growth of the literature, with more than 80% of the total papers on Digital Twins published in the last two years. That steep rise paints an illustrative picture of the importance and conception of Digital Twins in academia today. The quantitative distribution of papers about Digital Twins (see Fig. 2) is weighted towards the recent time period. As each description of a Digital Twin, i.e., each paper, has the same weight and, therefore, the same impact in our analysis, the more recent papers gain a bigger influence in this research because they outnumber the older papers. Hence, we consider the development of Digital Twins over time to be reflected in the cluster analysis.
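The filtering mechanisms described above can be pictured as a sequential pipeline. The following Python snippet is a minimal sketch, not the tooling used in this study: the record fields (`title`, `abstract`, `keywords`, `accessible`, `quality_ok`) are hypothetical, duplicates are detected by normalized title as a simplifying assumption, and the `quality_ok` flag merely stands in for the manual in-depth reading of each paper.

```python
def normalize(title: str) -> str:
    """Lower-case a title and collapse whitespace so database duplicates match."""
    return " ".join(title.lower().split())


def filter_corpus(records, term="digital twin"):
    """Apply the relevance, accessibility, duplicate, and quality filters in order."""
    kept, seen_titles = [], set()
    for rec in records:
        searchable = " ".join(
            [rec["title"], rec["abstract"], " ".join(rec["keywords"])]
        ).lower()
        if term not in searchable:   # relevance: term in title, abstract, or keywords
            continue
        if not rec["accessible"]:    # accessibility
            continue
        key = normalize(rec["title"])
        if key in seen_titles:       # duplicate from another database
            continue
        seen_titles.add(key)
        if not rec["quality_ok"]:    # placeholder for the manual quality assessment
            continue
        kept.append(rec)
    return kept


# Hypothetical example records (not taken from the literature corpus).
records = [
    {"title": "A Digital Twin for Warehouses", "abstract": "", "keywords": ["logistics"],
     "accessible": True, "quality_ok": True},
    {"title": "a digital twin  for warehouses", "abstract": "", "keywords": ["logistics"],
     "accessible": True, "quality_ok": True},
    {"title": "Sensor Networks", "abstract": "No twin here.", "keywords": ["IoT"],
     "accessible": True, "quality_ok": True},
    {"title": "Twin Concepts", "abstract": "We discuss the Digital Twin.", "keywords": [],
     "accessible": False, "quality_ok": True},
    {"title": "Digital Twin Hype", "abstract": "", "keywords": ["digital twin"],
     "accessible": True, "quality_ok": False},
]
kept = filter_corpus(records)
```

In practice, a persistent identifier such as a DOI would be a more robust duplicate key than the title; the backward search of Webster and Watson (2002) is not part of this sketch.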

Fig. 2 Yearly publications about digital twins from 2012 to 2020 (Scopus 2020)

3.2 Taxonomy Design

Starting from the literature base created during action 1 (see Fig. 1), we developed a taxonomy of Digital Twins. We applied the method of Nickerson et al. (2013), which has emerged as the de facto standard for taxonomy design in Information Systems (Oberländer et al. 2019; Szopinski et al. 2019). The methodology helps to create the taxonomy comprehensively and transparently (Oberländer et al. 2019). In general, a taxonomy can classify and structure a given field of interest (Glass and Vessey 1995). As a taxonomy enables an empirical structuring of the area of interest, we preferred using a taxonomy over a conceptualization via typologies or ontologies (Bailey 1994).

For the development of the taxonomy and the definition of the taxonomy's purpose, one has to determine the meta-characteristic (Step 1), define ending conditions (Step 2), and choose the empirical-to-conceptual or the conceptual-to-empirical approach (Step 3). This decision predetermines the steps 4e to 6e, or 4c to 6c, respectively. The conceptual-to-empirical approach requires defining characteristics a priori (Step 4c) before the analysis of objects (Steps 5c and 6c). The empirical-to-conceptual approach starts with studying a subset of objects (Step 4e) and crystallizing characteristics from their comparative analysis (Steps 5e and 6e). Both approaches are iteratively and mutually pursued for as long as the ending conditions have not been reached (Step 7). Once they are fulfilled, the taxonomy has reached its final state. Figure 3 shows the four iterations we conducted during the research process. In the 1st, 2nd, and 3rd iteration, we followed the empirical-to-conceptual approach and only analyzed the literature corpus (see van der Valk et al. (2020)). During the 4th iteration, we proceeded with the conceptual-to-empirical approach and only analyzed the interview transcripts. As we met all ending conditions after the 4th iteration, we stopped developing the taxonomy and continued with the cluster analysis.

Fig. 3 Creating a taxonomy following Nickerson et al. (2013)

3.3 Qualitative Research Design

To evaluate the literature base's conceptual insights, we expanded our research by means of a qualitative study with expert interviews. In total, we conducted 18 interviews in two interview series (see Table 2). The qualitative research follows the approach of Sarker and Sarker (2009). First, the interviewees were identified. In line with the ‘known sponsor approach’ (Patton 2002), we got in touch with the interviewees through senior personal contacts. In preparation for the interview, we provided each interviewee with a brief overview of the research project. The interviewees are industry experts with diverse backgrounds and from different industries. The interviews followed a semi-structured approach, as we only prescribed the superordinate areas of the questions (Myers and Newman 2007; Patton 2002). The research guide includes questions about the interviewees' general, individual understanding and definition of Digital Twins. Mirroring the literature-based taxonomy, we presented each interviewee with the taxonomy of van der Valk et al. (2020). Each interviewee could add or dismiss dimensions or characteristics. Furthermore, we asked the participants which characteristics would be part of their individual configuration of a Digital Twin. This approach allows for a discussion between the interviewer and the interviewee while ensuring comparability between the personal interviews (Merton and Kendall 1946; Myers and Newman 2007; Patton 2002). Each interview was recorded and transcribed. After the first interview series, we analyzed the interviews' transcriptions and coded them according to the Grounded Theory Methodology. The transcripts provide profound access to the full information potential and are the first step towards a thorough analysis (Lapadat and Lindsay 1999; Ochs 1979). Following the recommendations of Iivari et al. (2020), we had a second round of interviews with a smaller set of experts to validate our findings.

Table 2 Interview partners by sector and research action

3.4 Cluster Analysis

Archetypes are a "typical example of a certain […] thing" (Oxford Dictionary 2020, p. 1) and have emerged as purposeful results in Information Systems (e.g., see Möller et al. (2019) or Weking et al. (2018)). The literal Greek translation for archetype is “first-molded as a pattern” (Liddell et al. 1940), which we aim to achieve in this paper. Cluster analysis organizes patterns into clusters (Jain et al. 1999). We try to sort the patterns along the structure given by the taxonomy of Digital Twins. For the cluster analysis, we chose the statistical language R, using the daisy function (to compute dissimilarities between the observations in the data matrix), the Gower coefficient, and the library cluster to analyze and visualize the data (Gower 1971; Maechler et al. 2019). To partition the data into clusters, we used the k-means algorithm, which is the most popular partitional algorithm (Jain 2010). The algorithm “finds a partition such that the squared error between the empirical mean of a cluster and the points in the cluster is minimized” (Jain 2010, p. 653). Therefore, in an iterative process, the algorithm sorts the data points into clusters such that this error is minimized. To define the appropriate number of clusters, we used the elbow method. We triangulated the preliminary results using a mixed-methods approach (Greene et al. 1989) so that we could synthesize the final results into archetypical patterns. Denzin (2017) recommends triangulating the results, as just one empirical method cannot provide a valid result. Therefore, we compare two sets of clusters, eliminate duplicates, and condense them into non-optional characteristics. We synthesize the clusters into aggregated cluster types, from which we derive the archetypes by their configuration of each characteristic.
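To make the clustering step more tangible, the following Python snippet is a minimal sketch of the idea; it is not the R code used in this study. It assumes purely nominal taxonomy data, for which the Gower coefficient reduces to the share of mismatching dimensions. Because k-means needs coordinates to compute cluster means, the sketch swaps in a simple k-medoids procedure that works directly on the dissimilarity matrix; this substitution, the farthest-first initialization, and the example records are assumptions made for illustration only.

```python
from itertools import combinations


def gower_nominal(a, b):
    """Gower dissimilarity for nominal records: share of mismatching dimensions."""
    if len(a) != len(b):
        raise ValueError("records must cover the same dimensions")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dissimilarity_matrix(records):
    """Symmetric matrix of pairwise Gower dissimilarities."""
    n = len(records)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = gower_nominal(records[i], records[j])
    return d


def assign(d, medoids):
    """Label each record with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]


def k_medoids(d, k, max_iter=100):
    """Simple k-medoids (swapped in for k-means); returns (labels, total cost)."""
    n = len(d)
    medoids = [0]  # deterministic farthest-first initialization
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(d, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # New medoid: the member with the smallest summed dissimilarity.
            updated.append(min(members, key=lambda m: sum(d[m][i] for i in members))
                           if members else medoids[c])
        if updated == medoids:
            break
        medoids = updated
    labels = assign(d, medoids)
    cost = sum(d[i][medoids[labels[i]]] for i in range(n))
    return labels, cost


# Hypothetical descriptions: (data acquisition, data link, interface).
records = [
    ("automated", "bi-directional", "M2M"),
    ("automated", "bi-directional", "M2M"),
    ("automated", "bi-directional", "HMI"),
    ("semi-manual", "one-directional", "HMI"),
    ("semi-manual", "one-directional", "HMI"),
    ("semi-manual", "one-directional", "M2M"),
]
d = dissimilarity_matrix(records)
labels, cost = k_medoids(d, 2)
# Elbow method: inspect how the total cost drops as k grows.
elbow = {k: k_medoids(d, k)[1] for k in range(1, 4)}
```

Plotting the values in `elbow` against k and choosing the point where the decrease flattens corresponds to the elbow method named above.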

4 Taxonomy of Digital Twins

The following section discusses the literature-based taxonomy of Digital Twins (see van der Valk et al. (2020)) and provides the foundation for the first cluster analysis. The taxonomy required four design iterations (see section Taxonomy Design). In total, we identified eleven relevant dimensions with multiple characteristics. The dimensions are grouped into meta-dimensions. These meta-dimensions are arranged along the way that data move through a Digital Twin, i.e., the data collection, the data handling and distribution, and the conceptual scope.

To achieve the goal of integrating qualitative data from industry experts into the taxonomy, the 4th iteration of its design is embedded in a qualitative interview study. We presented each expert with the taxonomy from the 3rd iteration (see van der Valk et al. (2020)) and allowed them to adjust it and illustrate archetypical configurations. Even though the experts agreed on the validity of the dimensions and characteristics, naturally, a spectrum of different archetypical designs reflects each expert's unique perspective. Subsequently, the resulting taxonomy should reflect design characteristics and dimensions relevant to archetype design.

Given that understanding, a Digital Twin must contain mandatory and optional characteristics, as well as mutually exclusive dimensions. Table 3 describes the individual classification of the designation:

Table 3 Designation of characteristics as mandatory, mutually exclusive, not relevant, and optional

The categorization into mandatory, mutually exclusive, not relevant, and optional is based on the literature review and on an understanding derived from the experts’ insights. In the following, we will describe the different dimensions along with their classification into the meta-dimensions data collection, data handling and distribution, and conceptual scope. We derived the meta-dimensions inductively based on the dimensions' perceived similarity to each other (Bronowski 1953). Table 4 illustrates the final taxonomy (see also Table 5).

Table 4 Taxonomy of digital twins (M = mandatory, ME = mutually exclusive, N = not relevant, O = optional)
Table 5 Final characteristics with clusters, M = mandatory, O = optional

Interestingly, the taxonomy shows certain differences to the working definition provided in Sect. 2. Occurring differences will be discussed below under the corresponding dimension.
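The designation scheme can be read as a set of constraints on the configuration of an individual Digital Twin. The following Python snippet is a minimal sketch of such a check. It is restricted to designations stated in the running text of this section (synchronization, bi-directional data link, and multiple data sources as mandatory; interface and purpose characteristics as combinable); the full scheme is given in Tables 3 and 4. Treating data acquisition as a choice of exactly one characteristic is an assumed reading of "mutually exclusive", and all identifiers are hypothetical.

```python
# Designations limited to what the running text states (assumption: subset only).
MANDATORY = {
    "synchronization": "with",
    "data_link": "bi-directional",
    "data_source": "multiple",
}
EXACTLY_ONE = {  # assumed reading of "mutually exclusive"
    "data_acquisition": {"automated", "semi-manual"},
}
OPTIONAL = {
    "interface": {"HMI", "M2M"},
    "purpose": {"data processing", "data transfer", "repository"},
}


def validate(config):
    """Return a list of violated designations for one configuration."""
    problems = []
    for dim, required in MANDATORY.items():
        if required not in config.get(dim, set()):
            problems.append(f"{dim}: mandatory characteristic '{required}' missing")
    for dim, allowed in EXACTLY_ONE.items():
        chosen = config.get(dim, set())
        if len(chosen) != 1 or not chosen <= allowed:
            problems.append(f"{dim}: choose exactly one of {sorted(allowed)}")
    for dim, allowed in OPTIONAL.items():
        unknown = config.get(dim, set()) - allowed
        if unknown:
            problems.append(f"{dim}: unknown characteristics {sorted(unknown)}")
    return problems


good = {
    "synchronization": {"with"},
    "data_link": {"bi-directional"},
    "data_source": {"multiple"},
    "data_acquisition": {"automated"},
    "interface": {"HMI", "M2M"},
    "purpose": {"data processing", "repository"},
}
bad = dict(good, data_link={"one-directional"},
           data_acquisition={"automated", "semi-manual"})
good_problems = validate(good)
bad_problems = validate(bad)
```

A configuration such as `bad`, which offers only a one-directional data link, fails the check in the same way that such a construct fails Definition 1.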

4.1 Meta-Dimension: Data Collection

The meta-dimension Data Collection describes all processes to collect data. This category's dimensions are data acquisition, data source, synchronization, and data input.


Data Acquisition: Following Sect. 2, one would expect only an automated data acquisition. Some descriptions of Digital Twins merely mention a manual or semi-manual data acquisition (Miller et al. 2018). However, it was apparent that most publications only describe an automated data acquisition, e.g., through sensors (Cai et al. 2017). Nonetheless, the interview study showed the contrast between the literature and the industrial opinion, as a semi-manual data acquisition was demanded.

“Whereby data acquisition, if you see human–machine-interfaces as input-interfaces, then I would of course also allow manual and semi-manual data acquisition.” – Interview 9

Yet, (semi-)manual options are only additional options to the mandatory automated data acquisition. Hence, we continue with the decision of whether a Digital Twin acquires its data fully automatically or semi-manually, i.e., through a mixture of automated and manual processes.


Data Source: In this context, a single data source does not mean that just one device gathers data, but that only one type of device, e.g., sensors, is used. Multiple data sources include different types of sources. For example, a Digital Twin used in the chemistry sector receives its live data from sensors attached to the physical asset, historical data from external databases, and humidity or temperature data from national weather services. As Digital Twins overwhelmingly use multiple data sources, the analysis omits exclusively single sources. The interviewees backed this postulation, as they corroborated the notion that a Digital Twin necessarily requires data from multiple sources instead of a single source.


Synchronization: As most definitions mandate a synchronization between the Digital Twin and the physical part, the option “without synchronization” is somewhat surprising (Kritzinger et al. 2018). Nevertheless, there are some examples in which a Digital Twin is described as a non-synchronized digital object (e.g., Banerjee et al. (2017), Grube et al. (2019)). However, concepts without any kind of synchronization contradict Definition 1. Hence, we deem synchronization mandatory.

Interestingly, the dimension synchronization brought forward different views during the interviews: although the dichotomous division into discrete and continuous is valid, the choice between them is a matter of realizability.

“It is our reality in the industry that we cannot permanently transfer all data. […] we can't always guarantee the transmission or a WLAN in the depot. That's why the question is: what minimum data must be transmitted and which data can we do without, and of course it is not easy to agree with my demand that the Digital Twin is always up to date.” – Interview 6.

“Well, I think there are many cases where the real-time connection is not crucial. And where on the other hand it would cost you a lot of money to implement.” – Interview 15

The industry partners identify hurdles for the implementation of real-time synchronization. Implementation is expensive and depends on the availability of local mobile networks. Correspondingly, the experts stress that real-time synchronization should be implemented when it generates adequately high benefits.


Data Input: We distinguish between raw and preprocessed data. Raw data is unprocessed data. These data may stem from sensors, data collection devices, or databases. (Pre-)processed data contains all data which comes from software tools, i.e., analytical tools, applications, or smart devices. In most cases, the Digital Twin integrates both data types for internal data processing (Boschert and Rosen 2016; Shangguan et al. 2019).

4.2 Meta-Dimension: Data Handling and Distribution

The meta-dimension Data handling and distribution deals with the dimensions data governance, data link, interface, interoperability and the purpose of a Digital Twin.


Data Governance: Data governance is one of the most critical aspects of data flows (Otto and Weber 2018). In the analyzed descriptions, data governance served as an umbrella term for everything related to it, e.g., data security, data sovereignty, or access control. Therefore, a more detailed consideration was not possible, and we evaluated data governance as necessary for a Digital Twin. However, the descriptions did not make it clear which specific data governance rules were applied in each case. Hence, we divided this dimension into rules applied or not applied. The dimension data governance was highlighted as very relevant during the interviews, and suggestions for extensions to more detailed sub-dimensions, e.g., ownership of the Digital Twin, data accessibility, cyber-security, or data quality management, were provided for further research.


Data Link: We consider a data flow to be bi-directional when the Digital Twin communicates with the physical asset and gives feedback to the physical twin. A one-directional data link means a data flow just from the physical asset onwards. However, a one-directional data link does not fulfill definition 1. A Digital Twin must provide a bi-directional data link which is, therefore, mandatory. This dimension is especially important, as the data link provides the foundation for a Digital Twin. Furthermore, a bi-directional data link is one of the enablers for autonomous management of the physical asset through the Digital Twin. A multidirectional data link is conceivable when considering a network of a physical asset, Digital Twins, and supplementary systems (downstream and upstream). Data will flow from multiple sources into multiple systems with the Digital Twin as the center of gravity. Nonetheless, this characteristic is not part of a generic twin, and hence, it is not considered any further.


Interface: The dimension interface defines through which gateways the data and information leave the Digital Twin. At this point, we only consider two characteristics of interfaces to be relevant for data output, as data input through machine-to-machine interfaces is mandatory. The first one is a human–machine interface (HMI) that allows any operator or user of a Digital Twin to access the output data. We deliberately do not go into more detail, as several options, e.g., augmented reality (Tao et al. 2019), dashboards, light or audio signals, and more, seem possible (Lutters and Damgrave 2019; Ma et al. 2019). The second option for the output interface is a machine-to-machine interface (M2M). This interface provides the possibility for the Digital Twin to communicate with the physical asset directly. This is the primary enabler for an autonomously operating Digital Twin. We do not define the exact design of the M2M interfaces, as they can be manifold (Martinez et al. 2018; Merkle et al. 2019). Additionally, a Digital Twin can possess both interfaces simultaneously (Petrova-Antonova and Ilieva 2019).

During the interviews, many companies from safety-relevant infrastructure sectors reported problems with machine-to-machine interfaces. For example, direct integration with a digital tool via machine-to-machine interfaces is forbidden.

“This is certainly due to the special nature of railroad technology systems, but other critical infrastructures will also have this. We are always forced to prove that there are no retroactive effects.” – Interview 6

As the interviewees were given a choice to manipulate the status quo of design dimensions and characteristics, we introduced the dimension interoperability during the interviews.

Interviewer: “So in your opinion, an additional dimension of ‘interoperability’ would be necessary. What could the characteristics look like then?”

Interviewee: “Non-interoperable, interoperable with translation interface and interoperable per se. Standards play a role here. [...] So interoperability is about standards and the degree of interoperability.” – Interview 8

Interoperability guarantees standards for data transfer. The Digital Twin must be able to understand data, especially data that have been preprocessed by others. Fundamentally, the Digital Twin must have interoperable interfaces to represent the physical objects continuously. From the interviews, we derive the characteristics non-interoperable, interoperable via a translator, and fully interoperable.


Purpose: This dimension integrates a variety of purpose options. The different tasks, with their percentage of occurrence, are as follows:

Simulation (64%), Condition Monitoring and Analysis (50%), Forecast and Prediction (44%), Optimization (38%), Representation (15%), Data Transfer and Storage (10%), Controlling (8%), Machine Learning (7.5%), Decision Making (5%), and Cost Reduction (2.5%)

Due to the high number and variety of tasks, we saw the need to aggregate them further. We therefore opted for a threefold classification into data processing, data transfer, and repository. The characteristics of this dimension are not mutually exclusive: a Digital Twin can process, store, and transfer data at the same time.
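The non-exclusive nature of this dimension can be illustrated with a small sketch. It only models the three aggregated classes, because the text does not specify how the individual tasks map onto them. The names `Purpose` and `describe` are hypothetical and chosen for illustration only.

```python
from enum import Flag, auto


class Purpose(Flag):
    """Aggregated purpose classes; members can be combined freely."""
    DATA_PROCESSING = auto()
    DATA_TRANSFER = auto()
    REPOSITORY = auto()


def describe(purpose: Purpose) -> list[str]:
    """Return the names of all purpose classes contained in a combination."""
    return [member.name for member in Purpose if member in purpose]


# A twin that processes and stores data, but does not forward it downstream.
twin_purpose = Purpose.DATA_PROCESSING | Purpose.REPOSITORY
print(describe(twin_purpose))  # ['DATA_PROCESSING', 'REPOSITORY']
```

A flag type is used instead of a plain enumeration because a single Digital Twin may hold any combination of the three purposes.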

4.3 Meta-Dimension: Conceptual Scope

Finally, the last three dimensions belong to the meta-dimension conceptual scope. This meta-dimension contains accuracy, the conceptual elements, and the time of creation.


Accuracy: Accuracy deals with the model part of a Digital Twin. With this dimension, we address the scope of the model. We divided accuracy into identical and partial. Identical accuracy describes a physical asset comprehensively, while partial means that a physical asset is reduced to the crucial parts. The characteristic identical is idealized, as it designates an exact digital representation of physical objects. As full model accuracy is a state that is likely not attainable, the interviewees suggested that there is no merit in further considering this characteristic.

“I believe that this [identical accuracy] will never be achieved, because there are so many different characteristics that we have, because we always have only one model. And a model can never be complete. I can always think of something, which is part of it.” – Interview 15

Hence, in analogy to the single data source, we deem identical model accuracy not relevant for the taxonomy's practical usage.


Conceptual Elements: We divided this dimension into physically independent and physically bound. The former describes only the virtual representation, whereas the latter includes the physical aspect in the Digital Twin concept. This dimension focuses on the definition used in a publication. It does not affect the connection between the physical asset and the Digital Twin or the presence of a physical asset, and hence is not of relevance for the development of the archetypes.


Time of Creation: The last dimension consists of three mutually exclusive characteristics, namely, digital-first, physical-first, and simultaneous. They describe the point in time when the Digital Twin is created. As the creation of an artifact is a process, a discrete point in time cannot be determined. However, we regard the initial creation process as completed when the developed object's commissioning takes place. This point in time is the time of the creation of the Digital Twin. We evaluate whether the digital representation or the physical asset was developed and, therefore, commissioned first. Rarely, both objects were commissioned simultaneously. The time of creation marks the point in time when the object is completed. Digital-first means that the Digital Twin is usable before the physical asset's main development steps. On the other hand, physical-first means that the physical asset exists before the digital representation. All other cases fall under the characteristic simultaneous. In alignment with recent literature (Boschert and Rosen 2018), the experts agreed that Digital Twins are designed after the physical assets. Nonetheless, the experts identified that the dimensions of the conceptual scope have no merit for archetype derivation and, hence, are excluded from the further analysis.

Figure 4 graphically shows the remaining characteristics and how they relate to each other. Especially, the illustration emphasizes the differentiation between inputting raw data (e.g., from sensors) and pre-processed data (e.g., from external sources), as well as the feedback of data and information as an output of the Digital Twin. Structurally, the twin consists of a digital representation, the data flow, the internal processing, and the internal repository.

Fig. 4 Conceptual model of a digital twin

4.4 Digital Twin Clusters

We analyzed the database consisting of 233 publications and 15 interviews using the statistical software R, Gower (1971)’s coefficient, and the k-means algorithm (see section Cluster Analysis). First, we analyzed the 233 publications. For a sound analysis, we had to eliminate the outliers (Punj and Stewart 1983). As two dimensions (data acquisition and data source) did not contain relevant distinguishing characteristics, they do not influence archetype design. We designated these dimensions as not relevant for further analysis. As stated above, the conceptual scope dimensions concern the definitional scope of a Digital Twin but not the actual architecture. Thus, we marked these dimensions as applicable but not highly relevant.
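For purely nominal characteristics, Gower's coefficient reduces to the share of matching dimensions among those that both objects reveal. The following is a minimal sketch under that assumption and is not the R code used in the study. The function name, the example values, and the use of `None` for a dimension that a publication does not reveal are hypothetical.

```python
from typing import Optional, Sequence


def gower_dissimilarity(a: Sequence[Optional[str]], b: Sequence[Optional[str]]) -> float:
    """Gower dissimilarity for nominal variables with missing values.

    A dimension is compared only if both objects reveal it (value is not None).
    """
    if len(a) != len(b):
        raise ValueError("both objects need the same number of dimensions")
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        raise ValueError("no dimension is revealed by both objects")
    matches = sum(1 for x, y in compared if x == y)
    return 1.0 - matches / len(compared)


# Hypothetical codings; order: data link, purpose, interface, data input, synchronization
pub_a = ["bi-directional", "data processing", "M2M", "raw data", "with"]
pub_b = ["one-directional", "data processing", "HMI", None, "with"]
print(gower_dissimilarity(pub_a, pub_b))  # 2 of 4 comparable dimensions differ: 0.5
```

A pairwise dissimilarity of this kind is what a clustering step would operate on; the clustering itself is not shown here.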

Furthermore, we eliminated data governance and data interoperability as general concepts, which led us to five highly relevant dimensions with distinguishing characteristics. The dimensions data link, purpose, interface, data input, and synchronization remained for the cluster analysis. We rated every publication that did not reveal at least three of the five dimensions as irrelevant and omitted it. We proceeded with 187 publications. Several iterations of the cluster analysis showed that we gained the best results with seven clusters, which was in line with the elbow method. After removing twelve additional outliers, we could proceed with 175 objects.
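The relevance filter described above can be sketched as follows. The dimension names and the threshold of three come from the text; the dictionary layout, the function names, and the two example publications are hypothetical.

```python
from typing import Mapping, Optional

# The five dimensions that remained for the cluster analysis.
RELEVANT_DIMENSIONS = ("data_link", "purpose", "interface", "data_input", "synchronization")
MIN_REVEALED = 3  # a publication must reveal at least three of the five dimensions


def count_revealed(coding: Mapping[str, Optional[str]]) -> int:
    """Count how many relevant dimensions a coded publication reveals."""
    return sum(1 for dim in RELEVANT_DIMENSIONS if coding.get(dim) is not None)


def keep_for_clustering(coding: Mapping[str, Optional[str]]) -> bool:
    """Apply the relevance threshold to one coded publication."""
    return count_revealed(coding) >= MIN_REVEALED


# Hypothetical codings: paper_a reveals three dimensions, paper_b only two.
publications = {
    "paper_a": {"data_link": "bi-directional", "purpose": "data processing", "interface": "M2M"},
    "paper_b": {"purpose": "repository", "synchronization": "with"},
}
kept = [name for name, coding in publications.items() if keep_for_clustering(coding)]
print(kept)  # ['paper_a']
```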

The seven clusters distinguish themselves from one another, as there are no duplicates. Each cluster is denoted in the same way. To better understand the designations, we labeled the characteristic one-directional as without feedback because the data do not flow back to the physical asset.

The individual configurations of a Digital Twin from the interviews were analyzed analogously to the literature. The cluster analysis could be conducted with 12 of the 15 interviews of the first interview series, as three interviews were outliers and did not provide usable results for the analysis. The analysis revealed three clusters of Digital Twins with essential differences from each other. However, we can identify overlaps between the literature-based clusters and interview-based clusters by comparing the three new clusters with Cluster 1–Cluster 7. The second cluster of the interviews provides the same configuration as Cluster 6. Analogously, the third interview cluster is the same as Cluster 4. Lastly, Cluster 8 (first interview cluster) is a narrowed-down version of Cluster 5. In general, both configurations are designed in the same way. However, the interview-based configuration does not provide the option to transfer data to downstream systems.

Following definition 1 (see Sect. 2), a Digital Twin has to provide the mandatory characteristics. As clusters 1, 2, 6, and 9 lack crucial, mandatory characteristics, we exclude them from the development of the archetypes. Cluster 1 is missing synchronization with the physical world. Therefore, the digital part may control the physical world, but it cannot regulate the physical object depending on any state changes. The cluster contains only 2% of all reviewed literature, and none of the interviewees described a Digital Twin belonging to cluster 1. Hence, this cluster is of little relevance and is dismissed in further analysis. Examples of this cluster can be found in Beregi et al. (2018) or Lohtander et al. (2018).

Cluster 2 lacks two critical features, namely the bi-directional data linkage as well as the synchronization. Even though 10% of the analyzed literature described a Digital Twin concept that belongs to this cluster, it provides fewer characteristics than cluster 1. It does not fulfill the requirements stated in definition 1. This cluster describes a digital artifact that gathers and stores data, for instance, databases. Schluse et al. (2017) or Radchenko et al. (2018) provide examples for this cluster.

Finally, the identical clusters 6 and 9 have to be excluded from the further analysis. Again, there is no option to provide a bi-directional data link between the physical and digital parts. Furthermore, the concept does not offer the ability to store data. Nearly 20% of the literature and 3 out of 12 interviews described an artifact of this cluster. Nevertheless, the mandatory characteristics from definition 1 were not met. Examples of these clusters are provided by, among others, Buldakova and Suyatinov (2019) and Interviews 2, 10, and 12. This leaves six clusters for further analysis, namely the derivation of archetypes.
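The exclusion step can be summarized in a short sketch. It encodes only what the text reports, namely which of the two mandatory characteristics discussed here (bi-directional data link and synchronization) each excluded cluster lacks. The full cluster configurations are not reproduced, and all identifiers are hypothetical.

```python
MANDATORY = frozenset({"bi-directional data link", "synchronization"})

# Mandatory characteristics that a cluster lacks, as reported in the text.
# Clusters 1-7 stem from the literature, clusters 8-10 from the interviews.
MISSING = {
    1: {"synchronization"},
    2: {"bi-directional data link", "synchronization"},
    6: {"bi-directional data link"},
    9: {"bi-directional data link"},
}


def fulfils_definition(cluster_id: int) -> bool:
    """A cluster qualifies only if it lacks none of the mandatory characteristics."""
    return not (MISSING.get(cluster_id, set()) & MANDATORY)


remaining = [cluster for cluster in range(1, 11) if fulfils_definition(cluster)]
print(remaining)  # [3, 4, 5, 7, 8, 10]
```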

5 Archetypes of Digital Twins

Having illustrated archetypes generated through cluster analysis from the literature and a qualitative interview study, we triangulate our findings by synthesizing methodological approaches (Denzin 2017). The triangulation evaluates each of the qualitative and quantitative research results through each other (Hammersley 1996). Here, we evaluate the cluster analysis through the interviews and vice versa.

Clusters 3 to 5, 7, 8, and 10 describe possible Digital Twins according to definition 1. The optional characteristics provide the distinction between the clusters. For the development of the archetypes, we will proceed with these clusters. Comparing them, cluster 3 describes the Digital Twin with the fewest capabilities and clusters 4 and 10 the one with the most. All clusters are designated and described in Table 6:

Table 6 Definitions of the archetypes

6 Evaluation

We conducted a second series of interviews with experts from different industrial fields to evaluate the five archetypes. We presented the archetypes with the individual characteristics as shown in Table 5. The evaluation interview series confirmed these archetypes, however with minor tweaks:

For the time being, I do not find any contradiction. I believe that AT 5 would not be accepted in our industry [Healthcare] today. Technically, it can be painted on faster than it can be used. I lack the belief that it would be accepted today. In my world or in our world, this would mean that we would have to take action in the customer system. And I believe that no operator would authorize us to do so. The operator would be interested in the information but would not wish for/allow active intervention in his processes. – Interview 17

The problem seen here is that a highly developed archetype like AT 5 is technically possible, but practical acceptance and regulatory aspects may hinder its realization. Another interviewee agreed with the archetypes but also saw minor issues with the highly developed archetypes due to their high implementation costs:

The archetypes seem to make sense. I would make the Digital Twin as simple as possible. I usually have an incredible complexity in the surface, but if I have a lot of things that can be done on a thin budget, I don't think I can afford the luxury of having an expensive complex Digital Twin. – Interview 18

Besides, the evolutionary process from archetype one to five is described:

Today, we are at AT 1 for interlogistics. Maybe also partly AT 2, the topic of preprocessing data is already happening. If I look at intralogistics, we already have AT5 today, but as soon as we talk about industrial borders, about interlogistics between different destinations, then it still takes time. But within a test bed, no question, this [AT 5] is already possible today. – Interview 18

Hence, we conclude that the archetypical patterns AT 1 to AT 5 show an evolutionary process for Digital Twins. The patterns attain a high degree of validity through the application of the triangulation research process. Furthermore, they were confirmed by the evaluation interviews. Additionally, the archetypes represent a sizeable number of papers and interviews (see Table 5).

While AT 3 dominates the industrial view on Digital Twins with a 58% share, the shares of the archetypes based on the clusters from the literature analysis are more evenly distributed and range between 9 and 25%. Nevertheless, a high interest in the exhaustive Digital Twin is obvious, as its share is the second-highest amongst the literature clusters. Additionally, it corresponds to one of the interview clusters. The interest of the industry experts stretches from the autonomous control twin to the exhaustive twin. This is as one would expect, as the different archetypes can be seen as development steps towards the exhaustive one.

The question of interoperability in particular is highly discussed within the industry but neglected in research. This shows a particular gap between the theoretical understanding and the practical use of Digital Twins. Therefore, we demonstrate the industrial relevance of each archetype by supplying industrial examples fitted to the archetypes (Table 7).

Table 7 Examples for the archetypes

The archetypes are a reflection of recent trends and developments in Digital Twins. For example, the mandatory characteristics are echoed by the existing literature corpus (e.g., see Kritzinger et al. (2018), Jones et al. (2020), or van der Valk et al. (2020)). Consequently, we see the mandatory characteristics as the smallest common denominator that is a potential baseline for a common understanding of Digital Twins. Beyond that baseline, our archetypical representations enhance the prevailing understanding in the literature through an extensive and in-depth interview study that produces optional characteristics in dependence on various individual use cases and industrial applications. Exemplarily, we point to the issue of interoperability, which was discussed prominently in the interview study, yet neglected in the literature. This shows a conceptual disconnection between the existing theoretical understanding of Digital Twins and their practical application in industry.

With these insights, we extend definition 1 as follows:

Definition 5

The Digital Twin is a virtual construct that represents a physical counterpart, integrates several data inputs with the aim of data handling, data storing, and data processing, and provides an automatic, bi-directional data linkage between the virtual world and the physical one. Synchronization is crucial to the Digital Twin to display any changes in the state of the physical object. Additionally, a Digital Twin must comply with data governance rules and must provide interoperability with other systems.

7 Conclusion, Limitations, and Contributions

This paper developed archetypes based on Digital Twin characteristics derived from a sound literature base and extended through interviews with industry experts. From this database, we derived clusters of Digital Twins (RO1). Each cluster possesses a particular configuration of characteristics. We could identify seven clusters, which showed specific patterns in their configurations. From these patterns, we were able to derive characteristics that each cluster contains (RO2). Denoting these as preliminary mandatory characteristics, we could identify that a Digital Twin should contain an automated data acquisition, multiple data sources, the application of data governance rules, a data processing and repository, and raw data input.

The interview series provided some interesting insights. Most characteristics could be confirmed. However, some new aspects appeared, such as the semi-manual data acquisition and the dimension interoperability. The analysis of the configurations from the experts showed more mandatory characteristics. The additional mandatory characteristics are a synchronization between the Digital Twin and the physical asset and a bi-directional data link.

Furthermore, we could identify six optional characteristics. This leads to the identification of five archetypes for Digital Twins (RO3). These archetypes build upon each other. All archetypes contain the mandatory parts, but they show different configurations in the optional parts from an Assistance Twin to an Exhaustive Twin. Furthermore, we recognize the most important identifying characteristics, which distinguish a Digital Twin from other concepts, e.g., Digital Threads or virtual models, as the presence of synchronization and bi-directional data linkage between the Digital Twin and its physical counterpart. Additionally, the archetypes represent a development of Digital Twins from a more Basic Twin towards the Exhaustive Twin. Hence, the different archetypes may act as a maturity model for the overall development of Digital Twins.

Our work is subject to certain limitations. First, as the definition of the review scope for the literature analysis is subjective, other research teams might define other scopes and, therefore, might find other results. Second, in a similar way, the coding process is prone to subjective influences. This research provides several contributions. As a scientific contribution, this paper analyzes patterns of Digital Twins through the derivation of archetypes. It lays a profound framework for the classification of Digital Twins. We provide an ample contribution to the scientific knowledge base of Digital Twins, which is established by the generalized nature of archetypes. With the derived archetypes, one can sort the differentiating streams in research on Digital Twins. This lays the foundation for further research. Starting from this conceptualization, further scientific contributions could focus on one particular archetype and provide a deeper understanding and elaboration of it. For example, reference models or design principles, including specific technical regulations for implementation, are conceivable.

As our work is based on and partly evaluated through input from industry experts, it provides ample managerial contributions. It can be used as a guideline for the development of Digital Twins in commercial environments. Practitioners can compare their understanding with the archetypes and may find a perfect fit with additional information on supplementary modules of a Digital Twin. At the very least, practitioners will gain insights into the fast-growing field of Digital Twin research. Additionally, one can compare the position of one's own development process with the different groups of developed Digital Twins. The groups' size will make it possible to conclude how far the development processes have progressed.