The identification of the literature base with the help of Web of Science yielded 1876 hits. Most articles were published during the last five years, as seen in Fig. 3. We assume that interest in this research is still growing, as the topic has attracted increasing attention since 2013. More than 300 papers were published in the journal “Expert Systems with Applications”, which focuses on technical solutions and intelligent systems applied in different contexts and is not limited to a specific area. Moreover, many articles were published in “Decision Support Systems” and the “European Journal of Operational Research”. Besides these journals with a business perspective, other journals with a more psychological view were found.
The technologies investigated in the analyzed articles (identified by keywords) can be seen in Fig. 4. Research on big data in particular has gained increasing attention during the last five years. As big data can be understood as a large amount of data (Chen 2014) as well as the technological challenges associated with these data (Madden 2012), many articles deal with this topic. The number of articles on cloud computing has also risen significantly since 2013. As the Internet of Things was described as a concept by Kevin Ashton in 2009 (Ashton 2009), research has grown since that time. Artificial intelligence, machine learning, as well as augmented and virtual reality seem to be rather steady topics in research.
For the identification of clusters and superior research streams, the cited references were included in the analysis. For the qualitative analysis, 22 clusters, which represent the most important topics in our database, were analyzed in depth. For an overview of the clusters, see Table 2. The clusters are further introduced in the following chapters by presenting the research streams identified. This means we merged clusters dealing with similar research issues into one topic. In total, we introduce nine identified streams in the following chapters. The numbering of clusters is based on their size regarding articles found (see # in Table 2). During the qualitative analysis, we identified two clusters which were excluded from further examination because they do not fit the intended business perspective. One of these was named “methods” as it mainly deals with research methods, especially in statistics and game theory. Moreover, many papers are technology-focussed as they deal with programming issues. We also did not investigate the cluster “health care” in further detail because of a missing business perspective.
Table 2 Clusters with color coding, article count, and central keywords
The size of the clusters can be found in Table 2. “Total” includes articles from the base sample as well as references. The column “found” shows only the articles found during the Web of Science search. QA (qualitative analysis) is the number of articles that were analysed in depth in the third step. Lastly, the cluster trust index is used to evaluate the quality of the cluster-building process.
The ratio of the cluster sizes, measured by the number of articles, seems to be rather unchanged over time. A peak of articles can be found between 2011 and 2014 for the innovation and manufacturing clusters (see Fig. 5). Yet these topics seem to decline afterwards in the field of DT research, leading to the assumption that these fields are at a more advanced stage than the others from a research perspective. Research on innovation, especially, has been carried out extensively in the last five years. Analytics and society, too, have the most articles in 2014. A growing interest in societal questions can be observed, as there are more articles in the last few years. Research interest in the implications for whole societies is increasing, but this is still a less mature field of research, e.g. regarding changing labour markets due to more automation of tasks. Knowledge management, tourism, and marketing seem to be rather steady areas of research. Regarding DT in finance, interest has decreased slightly, which indicates an advanced stage in this application field of digital technologies. As the total number of papers has grown significantly only since 2006, there are no outstanding results before that time.
In the following, the identified research streams are presented by highlighting important results and articles.
Finance
Within this research stream, three clusters were identified and named credit and risk management (cluster 1), artificial intelligence (AI) methods (cluster 10), and trading of investment certificates (cluster 16). The leading journal in this field is ‘Expert Systems with Applications’. Within the second cluster, the ‘European Journal of Operational Research’ and, within the third cluster, ‘Quantitative Finance’ are additional sources with a high number of articles related to the field.
In the first cluster, three articles from ‘Expert Systems with Applications’ are highly cited, each with more than 150 citations. Regarding the in-degree, these articles are outstanding with values of six and five. Looking at the betweenness centrality, the articles by Tsai and Wu (2008) as well as Min and Lee (2005) show values above 1000. They are also the most cited. As “the performance of multiple classifiers in bankruptcy prediction and credit scoring is not fully understood,” Tsai and Wu (2008) propose to compare a single classifier with multiple classifiers and diversified multiple classifiers by using them on three different datasets.
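To make the two network measures used throughout this section concrete, the following minimal sketch computes in-degree and unnormalized betweenness centrality on a directed citation graph. It is an illustration only: the toy edge list and the function names `in_degree` and `betweenness` are hypothetical, the betweenness routine uses Brandes' algorithm as a stand-in, and it is not the tool or configuration used for the analysis reported here.

```python
from collections import deque

# Hypothetical toy citation network: each pair is (citing article, cited article).
edges = [("D", "C"), ("C", "B"), ("B", "A"), ("D", "B")]


def in_degree(edges):
    """Number of citations each article receives from within the sample."""
    counts = {}
    for citing, cited in edges:
        counts.setdefault(citing, 0)
        counts[cited] = counts.get(cited, 0) + 1
    return counts


def betweenness(edges):
    """Unnormalized betweenness centrality of a directed, unweighted graph
    (Brandes' algorithm): how often a node lies on shortest paths between others."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])
    score = {n: 0.0 for n in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {n: 0 for n in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {n: [] for n in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


print(in_degree(edges))
print(betweenness(edges))
```

In this toy network, article B is cited twice and lies on the shortest citation paths from D to A and from C to A, so it scores highest on both measures. An article with a high betweenness value can be read in the same way as one that connects otherwise separate parts of a cluster.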
In the second cluster, two articles, one from the ‘European Journal of Operational Research’ and one from ‘Information & Management’, have more than 100 citations. Looking further at in-degree and betweenness centrality, the article from the ‘European Journal of Operational Research’ is outstanding with values of 11 and 1538, respectively. This article was written by Zhang et al. (1999) and provides a general framework for better understanding artificial neural networks. The authors show the advantage of neural networks over logistic regression in classification rate estimation, relating to the prediction of bankruptcy, as well as their robustness towards variation in the sample.
In the third cluster, the four most cited articles have between 20 and 30 citations. All are from ‘Expert Systems with Applications’. Looking at the betweenness centrality, two articles show values above 100. Booth et al. (2014) also have a high number of citations. In their work, they use seasonal effects and regularities in financial data to build an expert system based on random forest techniques that derives a trading strategy. The performance of the models is assessed by using data from the German Stock Exchange Index (DAX). In general, using seasonal effects has proven to produce superior results.
Compared to the other two clusters, this third cluster is smaller and its articles are newer. Specific algorithms still need to be applied in this area. Interestingly, Hsu et al. (2016) question the efficiency of financial markets. Views that financial economists have held on markets for decades, such as Smith’s invisible hand, might have to be adjusted. All in all, the field of finance has already seen significant changes and developments due to DT; in particular, forecasts that are useful for financial decisions can be made using algorithms. Technology enables the control of complex environments like financial markets. However, many unpredictable events still make forecasting difficult and pose challenges for DT in the finance sector.
Marketing
The marketing stream focuses on three aspects: the use of virtual reality (VR) in marketing and sales (cluster 3), the possibilities to work with user-generated content to deduce sentiments and further data (cluster 5) and computer-assisted customer relationship management (cluster 19). For cluster 3, we dismissed topics regarding VR application for pedestrians and mere VR acceptance. The most cited article (288 times with betweenness centrality of 134) of cluster 3 is written by Coyle and Thorson (2001). This work deals with the perceptions towards websites and the influence of the characteristics vividness and interactivity. This work is closely tied to the work about the effects of different technologies on product ratings. Moreover, the ability to use reviews for further marketing and sales purposes is shown in this cluster (Singh et al. 2017; Ordenes et al. 2017; Sodero and Rabinovich 2017).
Cluster 19 is about customer relationship management (CRM) and technical implications using automated responses for service purposes. The analysis of the most used words within the keywords showed an accumulation of the fields of BD, user-generated content, and consumer. Cui et al. (2006) show the highest values of in-degree (3) and betweenness centrality (239) of cluster 19. The text deals with machine learning (ML) for direct marketing response to enable immediate response to customer inquiries.
The work of Das and Chen (2007) provides the highest in-degree (12) in cluster 5 and a betweenness centrality of 1133. The authors developed a methodology for extracting small investor sentiment from stock message boards. The content analysis of cluster 5 shows that BD, customer, social, marketing, and ML are the most frequently used words among the keywords. In general, cluster 5 deals with articles about user-generated content and text mining systems that are used to gain additional information from the data. The analysis of user- or customer-generated data via reviews and the fast reaction of enterprises play a vital role in this research stream. We identified several articles in all marketing clusters that focus on that topic and on response modelling (Kim et al. 2008). Furthermore, new technologies and opportunities like VR and AR enable new dimensions of online product presentation (Yim et al. 2017).
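The keyword-based content analysis reported for the clusters can be sketched as a simple frequency count over the author keywords of each article. The sample records and the helper name `top_keywords` below are hypothetical and serve only to illustrate the idea; the counting rules actually applied in the analysis (e.g. stemming or the merging of synonyms) are not specified here and are therefore not reproduced.

```python
from collections import Counter


def top_keywords(records, n=5):
    """Return the n most frequent keywords across articles.

    Each keyword is counted once per article after trimming whitespace
    and lower-casing, so spelling variants in case do not split counts.
    """
    counts = Counter()
    for keywords in records:
        counts.update({k.strip().lower() for k in keywords})
    return counts.most_common(n)


# Hypothetical keyword lists of three articles in one cluster.
sample = [
    ["Big Data", "marketing", "social"],
    ["big data", "Marketing"],
    ["big data ", "machine learning"],
]

print(top_keywords(sample, 2))
```

Counting each keyword once per article keeps a single article with repeated terms from dominating the ranking of a small cluster.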
In summary, marketing activities are highly influenced by DT, which opens up new possibilities for understanding customer behavior and for placing individually adapted advertising. This is possible due to the huge amount of data created by users or generated automatically. A further need for research in the field of VR and AR for marketing purposes is identified. These technologies should be developed and enhanced to create a richer sensory atmosphere.
Innovation
The clusters of this stream deal with business model innovation (cluster 18), adoption and diffusion of innovations (cluster 2), impact on the process of innovation and organizational learning (cluster 12) as well as strategic aspects of innovation in terms of, for example, search orientation and capabilities (cluster 20).
Cluster 18 is closely related to the manufacturing clusters for it deals with the industrial internet of things (IIoT). However, rather than investigating primarily manufacturing aspects of IIoT, studies in this cluster investigate the relationship between business model innovation and DT in general as well as IIoT in particular. The article with the highest in-degree (4) and 50 citations examines the effects of business model innovations triggered by the DT on accounting (Bhimani and Willcocks 2014). Other articles deal more strictly with the implications of IIoT for business models (Arnold et al. 2016) and how the new business models of the digital era can be identified and developed (Pisano et al. 2015; Najmaei 2016). Of particular interest is the emergence of these new business models in the context of the DT through entrepreneurship (Guo et al. 2017), as well as their more sustainable nature (Gerlitz 2016; Prause and Atari 2017).
While the technological focus of cluster 18 was on IIoT, cloud computing (CC) is the subject of cluster 2. In fact, the study in this cluster with the highest in-degree (7) and over 290 citations investigates determinants of its adoption. Oliveira et al. (2014) find significant differences in the determining factors between manufacturing and service firms. While adoption in manufacturing is driven by the relative advantages and cost savings of CC, service firms are more reluctant to adopt it due to the complexity of CC and require more top management support. In terms of theoretical frameworks, the technology acceptance model (TAM) is the most applied in this cluster (Gangwar 2016). One of the earlier studies integrates the TAM with marketing theory in order to explain firm adoption behavior regarding radical innovations like CC (Bohling et al. 2013). However, some studies also investigate combinations of theories (e.g., TAM and media richness) and technologies (e.g., CC and augmented reality) (Lin and Chen 2015).
Cluster 12 covers managerial challenges of the DT. For example, Khanagha et al. (2013) study the impact of management innovation on the adoption of emerging technologies. They show, based on an in-depth case study, that management innovations can provide the required changes in organizational structures that enable the adoption of emerging core technologies. Most importantly, it is argued that organizational routines that prevent early-stage experimentation with the new technology need to be overturned, as they can hinder knowledge accumulation. Other studies investigate the role of established management concepts like absorptive capacity (Lam et al. 2017; Trantopoulos et al. 2017) and ambidexterity (Khanagha et al. 2014). The managerial challenges during the innovation process most investigated by studies in this cluster are the changing opportunities and difficulties related to managing the customer and customer communities, in particular, managing customer co-creation and ideation (Hoornaert et al. 2017; Khanagha et al. 2017).
Cluster 20 also covers managerial challenges of the DT, but with a distinct focus on BD. The issues investigated regarding the relationship between management and BD range from human resources (Shah et al. 2017) through new product success (Xu et al. 2016) to firm performance and strategy (Akter et al. 2016; Mazzei and Noble 2017). The article with the highest in-degree (11) had received 130 citations on Google Scholar at the time of analysis and uses the resource-based view of the firm to explain the outcome of BD usage for consumer analytics (Erevelles et al. 2016).
In summary, innovation is by nature an important research avenue to pursue with regard to digital transformation because the transformation process has to be innovative itself to be successful. DT implies implementing and using new technologies in combination with a cultural change of the whole organization. Innovation literature can contribute to developing effective ways to apply and utilize DT.
Knowledge management
The cluster knowledge management (cluster 7) focuses on aspects of knowledge management and strategy in the realm of digitalization. The journal that occurs most often in this cluster is the ‘Journal of Knowledge Management’, which published one third of the articles; 57 percent of these were published in 2017. The most frequent keywords are big data and analytics and, for the content-related realms, knowledge management, intellectual capital, and performance. The article by Braganza et al. (2017) is the most cited article (in-degree = 2) with the highest betweenness centrality (168). They discuss the management of resources in BD initiatives and how to effectively introduce BD initiatives into companies.
We divided this cluster into two main areas as articles show tendencies towards (1) Knowledge Management as well as (2) Strategy.
(1) Knowledge Management is the primary topic focus of 13 articles. The major part of the cluster consists of articles focussing on digitalization in knowledge management. Among these papers, most (8) deal with BD and its use for knowledge management in companies. Half of the articles take a closer look at specific applications of BD in the realm of knowledge management. On the one hand, Fowler (2000) and Weber et al. (2001) focus more on use cases that involve AI and how it can “contribute to knowledge management solutions” (Weber et al. 2001, p. 17). On the other hand, Murray et al. (2016) as well as Uden and He (2017) take a look at IoT devices and how they can enhance knowledge management systems because of the data that are automatically generated. A strictly theoretical view can be found in Rothberg and Erickson (2017), who aim to bring together the existing theory from knowledge management, competitive intelligence and BD analytics. One article is quite critical of the use of BD and elucidates that “to describe it [BD in the context of knowledge management] as ‘revolutionary’ is premature” (Tian 2017, p. 113).
(2) Strategy is investigated by eight articles. The strategy topics can be divided into three subareas. Two articles focus heavily on decision making and how BD can be of use (Prescott 2014; O’Flaherty and Heavin 2015), while another two articles deal with text mining techniques and their impact on business strategy (Li et al. 2012; Zhang et al. 2016). Moreover, four articles investigate performance aspects of BD in relation to business strategy (Cleary and Quinn 2016; Tian 2017; Blackburn et al. 2017). This performance perspective includes papers that show how BD can help to improve the understanding of purchasing decisions (Tian 2017). It can also be seen how BD affects operation models (Roden et al. 2017), and whether BD might affect R&D Management (Blackburn et al. 2017), as well as “how the use of cloud-based accounting/finance infrastructure affects the business performance of small and medium-sized enterprises” (Cleary and Quinn 2016, p. 225).
Braganza et al. (2017) propose to utilize theories drawn from strategy and leadership fields. Deeper insights on how strategies are changing and still need to change are missing. Moreover, as business models are already studied in-depth regarding DT, concrete application scenarios would be useful.
Analytics and data management
Seventy percent of the articles in the Analytics and Data Management cluster were published in 2017. We further subclassify the publications into four major realms:
(1) Operations and supply chain management: in addition to BD and analytics, the enhancement of supply chain processes and, ultimately, performance are important areas of study. Bag (2017) shows empirically the positive relationship between BD, predictive analytics, and supply chain performance. Rajesh (2016) presents and tests a prediction model to forecast supply chain resilience performance. For an extensive literature review, see Lamba and Singh (2017). Tan et al. (2015) propose an analytic infrastructure to assist firms in capturing the potential of supply chain innovation afforded by data. This is also the article with the second highest values for in-degree (12) and betweenness centrality (764). Ji et al. (2017) present an example of how BD in the food chain can be combined with Bayesian network and deduction graph models to guide production decisions.
The second significant research realm is in the context of (2) innovation and operations management. Here, articles dealing with the application and exploitation of BD to create competitive advantage and value in business are studied. For instance, Barton and Court (2012), whose article is also the most cited in this cluster (in-degree: 26), present a practical perspective on how to improve companies’ performance with advanced analytics. Zhan et al. (2017) suggest how firms could use BD to facilitate product innovation processes. Moreover, Tan and Zhan (2017) present three principles related to BD which support new product development.
Another noteworthy topic is (3) analytics to improve decision-making in management. For example, Horita et al. (2017) present a framework that connects decision-making with data sources through an extended modelling notation and modelling process.
The last realm refers to (4) data analytic techniques and quality framework of data management systems. Zhang et al. (2015) discuss specific techniques for modelling BD and analytics in the context of computational efficiency. Others present explicit analytical modelling for designated business fields, such as quality control in manufacturing (He et al. 2016).
We conclude that “successfully introducing analytics requires substantial organizational transformation” (Dremel et al. 2017). Management decisions supported by BD analytics depend on the underlying data quality. With the highest values for in-degree (12) and betweenness centrality (3108), the article by Hazen et al. (2014) addresses the data quality problem within the supply chain management context. Lamba and Singh (2017) see a lack of data analytics techniques and of works that suggest practical implementations of BD. For future research, we suggest consulting, for example, Sivarajah et al. (2017). How to analyse and use data effectively is still a topic of growing interest in research and a big challenge for practice.
Manufacturing
The research stream manufacturing is represented by three sub-clusters that deal with the fields of cloud manufacturing, strategic implications for manufacturing and logistics.
Cluster 4 is quite diverse. We excluded specialized topics in the field of space science (Metzger 2016), mobile services (Qi et al. 2014) and football robots (Bi et al. 2017). Among representative works within this cluster, a visualization platform for IoT to control and monitor wireless sensor networks (Bi et al. 2016), resource allocation (Pillai and Rao 2016) and resource bundling (Guo et al. 2016) are examined. Moreover, strategic issues are discussed (Li et al. 2012; Guggenheim 2016). One particularly strategic article dealing with information architecture in the context of supply chain management (Xu 2011) has a very high betweenness centrality (number six and seven of the whole sample). Xu (2011) is also cited 124 times.
Cluster 17 focuses on cloud-manufacturing (also the most frequently mentioned keyword). The ‘International Journal of Computer Integrated Manufacturing’ focuses on topics in this area and is the publisher of most of the articles in the cluster. Cloud-manufacturing means that the principles of cloud computing are transferred to manufacturing concerns, so that related manufacturing resources are offered as services, which leads to a network for exchanging needed resources and products. This application of DT can optimize processes, as shown in an example of sheet metal processing (Helo and Hao 2017). Frameworks for building a cloud manufacturing solution (Cheng et al. 2016; Lu and Xu 2017) and the design of the network architecture (Škulj et al. 2015) are presented and discussed. Moreover, communication between machines in different companies is a necessary condition to make cloud-manufacturing a success. Therefore, a scheduling model was developed to efficiently exploit distributed resources (Li et al. 2017).
Cluster 22 is the smallest of all clusters in the sample. It includes articles on manufacturing, with a focus limited to logistics topics. Most articles were published in the ‘International Journal of Production Research’. The most cited article of the cluster, with 43 citations, is also the one with the highest betweenness centrality. Reaidy et al. (2015) and Zhong et al. (2017) show that RFID technology is especially useful in warehouses to track resources and to connect objects. Advantages of the aforementioned communication technologies in smart logistics, such as higher safety, are shown (Trab et al. 2017). Moreover, applications of technologies are demonstrated, like the development of an algorithm to optimize truck docking (Miao et al. 2014).
Smart factories, as well as smart industry (Haverkort and Zimmermann 2017), are popular areas of research which are shaped by examples from practical applications. Machines, information systems and workers become more connected. The future factory is decentralized and can produce diverse products in a short time period. The topic of DT is getting more and more important for the manufacturing industry.
Supply chain management
Two of the identified clusters were allocated to the topic supply chain management (SCM). The importance of the topic was extraordinarily high in the years between 2010 and 2014 when more than 100 articles were published.
The clusters differ especially in their technological focus: supply chain and CC for cluster 15, and supply chain and BD for cluster 21. Cluster 15 deals with the adoption and usage of cloud computing, one of the central technologies in DT, in the context of supply chain management. Empirical results show a positive effect of the technology on supply chain integration (Bruque Cámara et al. 2015; Bruque-Cámara et al. 2016), which also leads to higher operational performance. This fostering effect on collaborations is also examined by other authors in different contexts like manufacturing and humanitarian organizations (Schniederjans and Hales 2016; Yu et al. 2017). The highest betweenness centrality and the highest total number of citations can be observed for the article by Cegielski et al. (2012), which deals with the adoption of CC in supply chains. A few other technologies are also discussed in the context of SCM. O’Donnell et al. (2009) develop a genetic algorithm to reduce the bullwhip effect, and Cantor (2016) examines the effects of work monitoring technologies. The author with the most articles in this cluster is Dara Schniederjans, who published four of the 20 papers.
Cluster 21 focuses on the use of BD in SCM. Benefits like higher supply chain visibility and transparency, along with challenges like the balance between human and analytical management styles, are shown (Waller and Fawcett 2013; Dutta and Bose 2015; Kache and Seuring 2017). The article by Waller and Fawcett (2013) has been cited 95 times in total, as the authors give a broad overview of BD in SCM and define critical terms in this area. Two very well-known authors in the area of DT also appear in this cluster with an article on BD impacts (McAfee and Brynjolfsson 2012). Their reputation is reflected in the in-degree of 75 and a total of 387 citations.
In sum, collaborations between firms in supply chains are identified as one primary driver of DT (Liere-Netheler et al. 2018) as borders between enterprises are known to blur (Lucke et al. 2008). This means that technologies should support this change in the supply chains. Two of the significant technologies which lead to more exchange of data are CC and BD. Wieland et al. (2016) identified BD and analytics as an overestimated research theme in the next 5 years which is in accordance with our findings. Topics like people dimensions, ethical issues, and integration are underestimated as DT also includes a cultural change in companies and the whole supply chain. Moreover, the exchange of data is still an open question. Security and legal aspects are especially unclear (Richey et al. 2016).
Society
Cluster 8 contains 23 articles. An article by Boyd and Crawford (2012) has the highest betweenness centrality (2727) and the highest in-degree (37). Besides keywords from the digital context (BD, algorithms, and technology), the most frequently used keywords were social, communication, governance and epistemology. Hence, we further sub-classify the articles into three major realms:
(1) Society and communication: Articles in this realm deal with topics like an ‘analytic culture’ (Gano 2015), data-driven urban geographical imaginaries and understandings (Lake 2017; Shelton 2017), ‘datafication’ of daily life (Madsen et al. 2016), and the monetization of user data (Doyle 2015). Other topics include data-journalism (Parasie 2015), data protection (MacDonnell 2015), impacts of socio-technical systems (Carolan 2017), or BD as communication with targeted audiences in a social and cultural context (Holtzhausen 2016). Furthermore, we find articles that take a technical communication perspective, in which BD is found to ignore the crucial roles of interpretation and communication (Frith 2017).
(2) Policy and international: Most of the articles in this realm take a critical view on digitalization (Chandler 2015). For example, Sanders and Sheptycki discuss stochastic governance, “defined as the governance of populations and territory using statistical representations based on the manipulation of BD” (2017, p. 2), and work towards a critique of the moral economy of neo-liberalism. A considerable number of articles deal with the topic ‘algorithmic governance’/‘datafication-governance’ (e.g. Chandler 2015; Madsen et al. 2016; Rothe 2017). Rothe (2017), for example, highlights the role of visual technologies and discusses the construction of environmental security as a form of ontological politics.
(3) Philosophy and ethics: Lake (2017) integrates an epistemological view and discusses BD and urban governance in a democratic society based on an ontological approach. He concludes that BD leads to atomistic behaviour in management and thus “undermines the contribution of urban complexity as a resource for governance […]” (Lake 2017, p. 1). Furthermore, we find articles that provide critiques of the efficacy of BD approaches (Lowrie 2017) and of the hidden, positivist assumptions behind the movement (labelled techno positivism, e.g., Gano 2015). Criticisms of technological solutions and BD are also discussed, such as surveillance of the population (Heath-Kelly 2017). Furthermore, articles reflecting on how BD affects people as psychological beings are found (Raab 2015). The predicament of living in a networked world and being partly unable to sufficiently grasp the implications thereof is discussed epistemologically (Van Den Eede 2016).
In summary, the cluster provides multidisciplinary approaches to the impact of DT on society, and most of the articles engage with BD and digital technologies from critical positions. In the work of Madsen et al. (2016), we find an agenda for future research on BD within international political sociology. An important field for further studies is the importance of theory-driven data production. From a societal point of view, DT needs to be considered as a possibility for advancement but also, and probably more importantly, its risks need to be taken into account so that no one is left behind.
Tourism
The cluster tourism deals with research articles in the cross-area of tourism and social media. Starting from the year 2000, there was a peak in 2012 (116 articles) whereas in 2016 only 28 articles were published. A content analysis showed that besides the tourism aspects (tourism, destination, marketing), the most frequently used keywords from the digital context were Facebook, social media and data analytics.
We identified only two journals that provided more than one source: ‘Journal of Destination Marketing & Management’ (5 articles) and the ‘Journal of Tourism Management’ (2 publications). Only one author contributed more than one article (Kwok and Yu 2013, 2016). Both articles deal with the consumer communication via Facebook. Furthermore, the article of Kwok and Yu (2013)—an analysis of restaurant business-to-consumer communications—was one of the most cited articles in this cluster. Only Fuchs et al. (2014) with six citations and Xiang et al. (2015) with seven citations provided a higher in-degree. The research is about BD analysis in the field of hotel guest experience.
We aligned the articles to dominant fields of interest: destination management (Fuchs et al. 2014; Raun et al. 2016) and geospatial data (Supak et al. 2015) to improve the touristic attractiveness of an area. A further sub-cluster is the research on the use of forums, customer recommendations and consumer-to-consumer communication. Dominant research focuses on text mining and how user-generated content influences the success of tourism organizations and the feelings of customers (Xiang et al. 2015; Ksiazek 2015; Kim et al. 2017). The last sub-cluster deals with the use of social media for marketing purposes in this field (Buhalis and Foerste 2015; Hornik 2016).
In summary, the influence of consumers and peers has increased due to DT. Digital (user-generated) data are increasingly used for analytical purposes, such as text mining and sentiment analysis. Surprisingly, trust plays no critical role in the field of user-generated content. We assume this topic is linked more closely to specific marketing research. Moreover, DT has led to a change of the whole industry, as a huge amount of purchasing activities has shifted from travel agencies to online booking.