1 Introduction

Since the beginning of data analysis in the early 1950s, researchers have been interested in developing new methods that provide insights into data using business intelligence (BI) tools, which make it possible to produce and capture large quantities of data (Davenport 2013; Davenport and Harris 2007). Until the early 1990s, structured data, such as numeric data in tables, dominated the area of data analysis. Techniques and corresponding research relied on data collection, extraction, and analysis capabilities (Chen et al. 2012). After this first evolution stage of data analysis research, the emergence of unstructured data, such as video streams, music or text files, led to an exponential increase in the data to be analyzed. The term big data, which describes the change of data in volume, velocity, variety, and veracity (Chen et al. 2012; Davenport 2013), was born. The third and even more far-reaching wave of data analytics began with new data sources, such as mobile devices and wirelessly connected sensors. Both offer advanced opportunities for collecting and analyzing data.

Since Chen et al. (2012) published their highly regarded special issue introductory article on business intelligence and analytics (BI & A) research in the journal MISQ, many researchers have started investigating the third wave of BI & A. Several meta-studies exist that provide an overview of the status quo of the BI research field (e.g., Abbasi et al. 2016; Akter and Wamba 2016). All existing literature reviews have in common that they either focus on the first or second evolution stage of BI & A or apply an unstructured or semi-structured search strategy, which increases the probability of an incomplete or inadequate result set (Vom Brocke et al. 2009; Webster and Watson 2002). The article at hand extends the existing body of knowledge with a structured investigation of the third wave of BI & A. We aim to draw a comprehensive descriptive overview of research in the field of BI & A 3.0 between 2010 and 2018. Against this background, the paper at hand provides an answer to the following research question: How did information systems (IS) research address the emerging research area BI & A 3.0?

Our research contribution is fourfold. First, we develop a taxonomy to enable a rigorous classification of BI & A research results. Second, based on that taxonomy, we provide a structured and extensive overview of relevant and up-to-date big data research results within the IS discipline. Third, our results foster discussions about the developments in the field predicted by Chen et al. (2012) and how they differ from the developments observed in the past decade. Fourth, our analysis reveals research gaps in which no or only little research exists. We conceptualize these underrepresented research characteristics by suggesting an agenda for future research in the field.

The remainder of this paper is structured as follows. In Sect. 2, we describe the evolution of BI & A and provide insights into corresponding literature reviews in the field. Section 3 briefly describes the research design, which comprises the applied literature search and analysis strategy. Section 4 contains the development process of a taxonomy for BI & A 3.0 research, which defines our scope and contains the dimensions and characteristics used to classify the papers identified by the literature search. The results of the literature search process are presented in Sect. 5. Based on the analysis results, we develop a multidimensional research agenda and suggest relevant research questions in Sect. 6. Finally, in Sect. 7, we summarize the results and their impact on research and practice, point out the limitations of our study, and provide an outlook for further research in the field.

2 Business intelligence and analytics

2.1 BI & A evolution

Davenport (2013) describes BI as software that is used to query and report data in data warehouses. Since the early 1950s, the term intelligence has often been used in the context of artificial intelligence. The term “business analytics” was coined to represent key analytical capabilities of an organization (Davenport 2013). Nowadays, analytical skills are tied to the capability to analyze large amounts of heterogeneous data. Chen et al. (2012) as well as Davenport (2013) divide the evolution of BI & A into three major phases, whose characteristics are depicted in Fig. 1.

Fig. 1 Characteristics of BI & A evolution (adapted from Chen et al. 2012)

BI & A 1.0 begins in the early 1970s and primarily covers the analysis of structured data. Applications in the first evolution phase focus on extraction, transformation and loading (ETL) processes to select decision-relevant data from transactional systems and bring them into the right format for analyses (Chaudhuri et al. 2011; Turban et al. 2008; Watson and Wixom 2007). To store the collected data, companies implement relational database management systems (RDBMS) and data warehouses. Analysis technologies applied in the first evolutionary phase comprise mainly statistical methods. Since the 1980s, data mining techniques have additionally been applied to analyze the data. Online analytical processing (OLAP) applications enable intuitive and simple data analysis via reports and dashboards.

With the accelerating rise of the internet and the web in the early 2000s, new ways of collecting and analyzing unstructured and social media data emerged. In particular, user-generated content and web analytics data, collected through Web 2.0 applications (Doan et al. 2011; O'Reilly 2020), are drivers for the second evolutionary phase of BI & A. These new opportunities for collecting and analyzing customer feedback and opinion data enable innovative business models, such as user-centered advertising on Facebook or the discovery of users' purchasing patterns with Google Analytics. In addition to the DBMS-based systems of BI & A 1.0, systems of BI & A 2.0 require mature text mining, web mining, and social network analysis capabilities (Chen et al. 2012).

BI & A 3.0 focuses on the analysis of unstructured data from mobile devices and sensors. Sensor-based devices, which are connected to the internet and equipped, for example, with RFID or radio tags, enable “location-aware, person-centered, and context-relevant operations and transactions” (Chen et al. 2012). BI & A 3.0 focuses on analyzing this vast amount of sensor-generated data. Examples of research in this third evolutionary phase of BI & A are the analysis of data generated by cyber-physical systems (e.g., Matzner and Scholta 2014), of data from mobile devices, such as the development of situation-aware data mining applications (e.g., Haghighi et al. 2013), and of position-related data, such as the analysis of the use of location-based services (e.g., Lehrer et al. 2011).

2.2 Recent analyses of BI & A research

Several literature reviews about BI & A and big data have been published in the past decade (Abbasi et al. 2016; Akter and Wamba 2016; Günther et al. 2017; Kowalczyk et al. 2013; Phillips-Wren et al. 2015; Shollo and Kautz 2010; Trieu 2017; Yacioob et al. 2016). In the following, we introduce them and classify these reviews according to their addressed BI & A evolution phase(s) as well as the applied search strategy (unstructured, semi-structured, and structured approach), as depicted in Fig. 2.

Fig. 2 Current BI & A literature reviews (adapted from Eggert 2019)

The state-of-the-art analyses by Kowalczyk et al. (2013) as well as Shollo and Kautz (2010) mainly address the evolution stage BI & A 1.0. Only the work of Trieu (2017) additionally addresses the second BI & A evolution phase. All other reviewed literature analyses mainly focus on artifacts in the area of BI & A 2.0 (Abbasi et al. 2016; Akter and Wamba 2016; Günther et al. 2017; Phillips-Wren et al. 2015; Yacioob et al. 2016). From a methodological perspective, only Kowalczyk et al. (2013) apply a structured literature search method with the methodological rigor of the search approach according to Webster and Watson (2002). However, even the structured literature review of Kowalczyk et al. (2013) focuses solely on BI & A 1.0 research results. All other literature reviews apply an unstructured or a semi-structured approach, which might lead to either an incomplete scope or a missing concept. Both completeness and concept usage are basic requirements for a high-quality review (Vom Brocke et al. 2009; Webster and Watson 2002). Despite the high number of literature analyses in recent years, to the best of our knowledge, no study systematically analyzes research results of the third evolution phase of BI & A. Furthermore, a valid and applicable taxonomy for such research work is missing. The paper at hand aims at filling this research gap and shedding light on BI & A 3.0 research results of the past decade.

3 Research design

To analyze the literature, we follow the structured approach introduced by Webster and Watson (2002) and Vom Brocke et al. (2009). We divide the research process into six phases, shown in Fig. 3. First, we define the scope of the literature review in detail. Second, we identify a search string, which enables the selection of all relevant articles in scope. Third, we apply the search string in literature databases to get matching papers from all publications in scope. Fourth, we reduce the hits from the search string application by deleting duplicates and filtering for relevant articles. Fifth, based on exemplary results from the literature search, we develop a taxonomy for research results of BI & A 3.0, following the taxonomy development approach by Nickerson et al. (2013). Finally, in step six, we thoroughly read all relevant articles and systematically classify them by applying the developed taxonomy.

Fig. 3 Literature search and analysis strategy

3.1 Definition of scope

We follow the recommendation of Vom Brocke et al. (2009) and characterize our literature review according to the taxonomy introduced by Cooper (1988) (see Fig. 4). The focus of the literature review is on research outcomes and theories in the research area of information systems. The goal is to integrate existing BI & A 3.0 publications in the field of information systems research. According to Strike and Posner (1983), integration can be divided into three sub-goals: generalization, resolution of conflicts, and development of linguistic bridges. The review at hand focuses on generalization, as the literature is summarized with regard to BI & A 3.0, based on the characteristics defined by Chen et al. (2012). Criticism and the identification of central issues are secondary objectives, mainly resulting from the derivation and analysis of the literature in scope. Regarding the category organization, we perceive the work as historical, because we solely consider articles published from 2010 to 2018. BI & A 3.0 is the underlying concept of the literature review at hand. Consequently, we position the review as conceptual. The methodological form of organization is out of scope. The perspective is neutral, since we do not defend a certain position in IS research. Our literature review addresses BI & A experts as its main audience. This includes mainly researchers and lecturers and, secondarily, interested big data practitioners. The coverage of the literature is both representative and exhaustive/selective. As BI & A 3.0 is a quite new phenomenon, we set a wide scope for the covered literature instead of covering only a small number of high-quality outlets, as done in the review of Günther et al. (2017). Therefore, the literature search covers most of the A+, A, B and Top 30 C journals according to the VHB-JOURQUAL 3 ranking, subset for information systems (Hennig-Thurau and Sattler 2015). In total, 65 journals and conferences are targeted. A detailed list can be found in Appendix E. This procedure covers a large part of the relevant sources in IS. However, we cannot speak of an exhaustive or fully exhaustive/selective coverage, because not all potentially relevant journals and conferences are considered. Lower-ranked C journals and D journals are out of scope for the study at hand.

Fig. 4 Classification of literature review (taxonomy adapted from Cooper 1988)

3.2 Search string development

A review of the existing literature reviews (see Sect. 2.2) showed that none of them addresses BI & A 3.0. Thus, it is not possible to apply the meta-characteristics of the existing structured literature reviews as an explicit basis for our approach. Nevertheless, existing reviews address the field of BI & A in general, which lets us apply the corresponding concepts as one search string term. The applied search string for the literature search contains three search terms, which are determined by the definition of scope:

  1. Selection of named conferences and journals in the IS discipline (see Sect. 3.1).

  2. BI & A typical keywords.

  3. BI & A 3.0 key terms, defined by Chen et al. (2012).

We adapt the fundamental key terms of BI and BI & A 3.0 using truncations and masks to capture linguistic deviations or synonyms, and we limit the time horizon to articles published between January 1st 2010 and December 31st 2018. The second and third parts of the applied search string are provided in Fig. 5.

Fig. 5 Parts of the applied search string

The defined search string is applied in two major databases: Ebscohost and the electronic library of the Association for Information Systems (AISeL). AISeL does not allow entering the full search string as defined above. Thus, we first enter the BI & A attributes, export the results into the literature management software Citavi, and then filter the results according to the BI & A 3.0 attributes. In total, applying the search string in both databases initially yields 856 hits.

3.3 Relevance filtering

The search string application comprises five search stages, which filter the results (see Table 1). In the first stage, we applied the search string and received 742 hits at Ebscohost and 114 at AISeL. We exported all articles and stored them locally as RIS files. Afterwards, we imported all articles into Citavi for further processing. Citavi automatically conducts a duplicate cleansing (stage 2). After stage 2, 561 (447 Ebscohost, 114 AISeL) articles remained. A manual duplicate cleansing (stage 3) removed a further 41 hits. In stage 4, we read the titles, key terms and abstracts of the remaining 520 (406 Ebscohost, 114 AISeL) articles to sort out irrelevant ones.

Table 1 Literature search stages and hits

For filtering the remaining articles, we apply three criteria: formal requirements, methodology, and content. The criterion formal requirements covers the size and format of an article. We sort out articles with fewer than seven pages, including introduction and illustrations. In addition, we exclude non-final and discussion papers, such as working papers or research in progress, as well as special issue introductions from the result set. From a methodological perspective, literature reviews are excluded from the result set. Nevertheless, we read and analyze the literature reviews described in Sect. 2.2 to argue for the relevance of a structured literature review in the field of BI & A 3.0. With regard to content, we exclude articles that deal with theoretical principles without referring to the application of an IS artifact. Further reasons for exclusion are a lack of reference to decision support and the absence of data from mobile devices or sensors.

After search stage 4, 109 articles remain for complete article reading. The criteria for the final selection of hits are analogous to those described above for stage 4. Only articles that address at least one of the BI & A 3.0 attributes are included in the final result set. Applying this criterion, 75 articles finally remain for an in-depth analysis.
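The stage counts reported in this section can be recomputed as a quick consistency check. The following is a minimal sketch that uses only the numbers stated above; the stage labels and variable names are our own shorthand and are not taken from Table 1.

```python
# Totals per search stage as reported in Sect. 3.3 (Ebscohost + AISeL).
funnel = [
    ("stage 1: search string application", 742 + 114),
    ("stage 2: automatic duplicate cleansing", 447 + 114),
    ("stage 3: manual duplicate cleansing", 406 + 114),
    ("stage 4: title, key term and abstract screening", 109),
    ("stage 5: full-text reading", 75),
]

# Articles removed when moving from one stage to the next.
removed = [before - after for (_, before), (_, after) in zip(funnel, funnel[1:])]

# Stage 1 removes nothing, so its entry is padded with 0.
for (name, total), dropped in zip(funnel, [0] + removed):
    print(f"{name}: {total} articles ({dropped} removed)")
```

The difference between stages 2 and 3 reproduces the 41 manually removed duplicates mentioned in the text.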

3.4 Taxonomy development and result analysis

To classify all articles, we develop a taxonomy by applying the well-established taxonomy development process suggested by Nickerson et al. (2013). We iterate through a subset of articles to build and refine a taxonomy, which we then apply to classify all 75 articles resulting from the literature search. We apply this taxonomy development process because it aims at creating a useful taxonomy that “seeks utility, not truth” (Nickerson et al. 2013, p. 342), which is in line with Webster and Watson (2002) and Vom Brocke et al. (2009), who recommend using a meaningful concept matrix for a structured literature review.

The iterative method of Nickerson et al. (2013) consists of seven steps, which we depict in Fig. 6. First, the meta-characteristics of the taxonomy need to be defined (1) and the ending conditions that have to be met to end the process must be determined (2). Each iteration then starts with the choice of approach (3): the taxonomy can be created following either an empirical-to-conceptual or a conceptual-to-empirical approach. In the case of a conceptual-to-empirical approach, dimensions and characteristics are conceptualized based on domain knowledge without examining the objects of interest (4c). In the next step, they are evaluated against a sample set of the objects (5c). At the end of the iteration, an initial (revised) taxonomy is created (6c) and checked against the ending conditions (7). In the case of an empirical-to-conceptual approach, a relevant subset of objects is identified (4e) and common characteristics are identified to group the objects (5e). The next step comprises the derivation of dimensions from the group characteristics, followed by the creation of an initial (revised) taxonomy (6e) and the check whether the ending conditions are met (7). If the ending conditions have not been met, the next iteration starts again with step 3.
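The control flow of this iterative process can be sketched in a few lines of Python. This is an illustrative sketch only, not code from Nickerson et al. (2013): the function name `develop_taxonomy`, its callback parameters, and the toy revision rule are assumptions made for this example. Steps 1 and 2 enter as inputs (the initial taxonomy and the ending-condition check), while steps 3 to 7 form the loop body.

```python
from typing import Callable, Dict, List, Tuple

Taxonomy = Dict[str, List[str]]  # dimension -> characteristics

APPROACHES = ("conceptual-to-empirical", "empirical-to-conceptual")


def develop_taxonomy(
    initial: Taxonomy,
    choose_approach: Callable[[int], str],
    revise: Callable[[Taxonomy, str, int], Taxonomy],
    ending_conditions_met: Callable[[Taxonomy, Taxonomy], bool],
    max_iterations: int = 10,
) -> Tuple[Taxonomy, int]:
    """Repeat steps 3-7 until the ending conditions hold (hypothetical helper)."""
    taxonomy = initial
    for iteration in range(1, max_iterations + 1):
        approach = choose_approach(iteration)             # step 3
        if approach not in APPROACHES:
            raise ValueError(f"unknown approach: {approach}")
        revised = revise(taxonomy, approach, iteration)   # steps 4-6 (c or e branch)
        done = ending_conditions_met(taxonomy, revised)   # step 7
        taxonomy = revised
        if done:
            return taxonomy, iteration
    raise RuntimeError("ending conditions not met within max_iterations")


def toy_revise(taxonomy: Taxonomy, approach: str, iteration: int) -> Taxonomy:
    """Toy revision rule: add one dimension in the first iteration only."""
    revised = {dim: list(chars) for dim, chars in taxonomy.items()}
    if iteration == 1:
        revised["Data privacy"] = ["Not addressed", "Risks & problems mentioned"]
    return revised


final, iterations = develop_taxonomy(
    initial={"Technologies": ["Analytics", "Data storage"]},
    choose_approach=lambda i: "conceptual-to-empirical",
    revise=toy_revise,
    # Simplified stand-in for the ending conditions: nothing changed.
    ending_conditions_met=lambda old, new: old == new,
)
print(iterations, sorted(final))
```

In this toy run the loop stops in the iteration after the last change, which mirrors the requirement that no dimensions or characteristics change in the final iteration.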

Fig. 6 Taxonomy development process (adapted from Nickerson et al. 2013)

4 Taxonomy for BI & A 3.0 research

In a previous attempt to provide a concept matrix for covering big data research and particularly BI & A 3.0, Eggert (2019) suggests seven dimensions: BI & A 3.0 attributes, techniques, technology, research area, research method and evaluation, application area, and the handling of data privacy requirements. We applied that concept matrix to our sample set and found that it has some weaknesses, which make it unsuitable for our literature review. It is highly descriptive, but according to Nickerson et al. (2013), a taxonomy has to be explanatory rather than descriptive. Furthermore, the characteristics within one dimension should be mutually exclusive (Nickerson et al. 2013). The concept matrix of Eggert (2019) meets these basic criteria only partially and leaves room for improvement.

4.1 Meta-characteristics and ending conditions

We start the process by determining the basic attributes of BI & A 3.0 suggested by Chen et al. (2012) and also applied by Eggert (2019) as the explicit meta-characteristics (step 1): sensor-based content, mobile device data, location-based analysis, person-centered analysis, context-relevant analysis, mobile visualization, and human–computer interaction. Furthermore, we determine the ending conditions as follows (step 2): The iterative taxonomy development and refinement process stops when the taxonomy supports the goals of the structured literature review at hand. This is the case when both the objective and the subjective conditions of Nickerson et al. (2013) are met. In total, we determine seven objective conditions. All objects of the sample are analyzed (A). At least one object addresses each characteristic of every dimension (B). No new dimensions were added in the last iteration (C). No dimensions or characteristics changed in the last iteration (D). Each dimension is unique and not repeated (E). Every characteristic is unique within its dimension (F). Each cell is unique and is not repeated (G). The subjective conditions comprise the attributes of a useful taxonomy: concise, robust, comprehensive, extendible, and explanatory (Nickerson et al. 2013, p. 342).
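To illustrate how the objective conditions could be checked mechanically, the following sketch evaluates conditions A to F for a toy taxonomy and a toy sample. It is an assumption-laden illustration rather than the procedure used in this study: the data structures, the function `objective_conditions`, and the reading of condition A as "every sample object has a characteristic in every dimension" are our own choices, the papers are hypothetical, and condition G is left out because it can be operationalized in more than one way.

```python
from typing import Dict, List, Tuple

# A taxonomy is kept as a list of (dimension, characteristics) pairs so that
# accidental duplicates stay visible instead of being merged by a dict.
Taxonomy = List[Tuple[str, List[str]]]
# Maps each literature object to the characteristic chosen per dimension.
Classification = Dict[str, Dict[str, str]]


def objective_conditions(
    current: Taxonomy,
    previous: Taxonomy,
    classification: Classification,
    sample: List[str],
) -> Dict[str, bool]:
    dims = [dim for dim, _ in current]
    prev_dims = [dim for dim, _ in previous]
    # All (dimension, characteristic) pairs used by at least one object.
    used = {
        (dim, char)
        for chosen in classification.values()
        for dim, char in chosen.items()
    }

    def normalized(taxonomy: Taxonomy) -> List[Tuple[str, List[str]]]:
        return sorted((dim, sorted(chars)) for dim, chars in taxonomy)

    return {
        # A: every sample object is classified in every dimension (assumed reading).
        "A": all(set(classification.get(obj, {})) == set(dims) for obj in sample),
        # B: every characteristic is addressed by at least one object.
        "B": all((dim, c) in used for dim, chars in current for c in chars),
        # C: no dimension was added compared with the previous iteration.
        "C": set(dims) <= set(prev_dims),
        # D: dimensions and characteristics are unchanged.
        "D": normalized(current) == normalized(previous),
        # E: no dimension appears twice.
        "E": len(dims) == len(set(dims)),
        # F: no characteristic appears twice within its dimension.
        "F": all(len(chars) == len(set(chars)) for _, chars in current),
    }


taxonomy: Taxonomy = [
    ("Technologies", ["Analytics", "Data storage"]),
    ("Data privacy", ["Not addressed", "Introduction of a solution"]),
]
# Two hypothetical papers; neither proposes a data privacy solution.
classified: Classification = {
    "paper-1": {"Technologies": "Analytics", "Data privacy": "Not addressed"},
    "paper-2": {"Technologies": "Data storage", "Data privacy": "Not addressed"},
}
result = objective_conditions(taxonomy, taxonomy, classified, ["paper-1", "paper-2"])
print(result)
```

In the toy data, only condition B fails, because no object addresses the characteristic Introduction of a solution.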

For the first iteration, we choose a conceptual-to-empirical approach because of the extensive existing domain knowledge in the field (step 3). As a starting point, we take the concept matrix suggested by Eggert (2019) and apply the presented dimensions and characteristics (step 4c). The dimension Technique comprises all techniques for big data analytics according to the taxonomy of Goes (2014). In total, seven techniques are included in this dimension: statistics, econometrics, artificial intelligence, computer-aided methods, linguistics, optimization and simulation. In contrast, the dimension Technology describes whether the research work handles data storage and data management challenges or focuses on data analysis. The characteristic Data storage handles challenges of the physical storage of mass data. Data management includes, for example, the development of data models and enables the analysis of corporate data. Finally, the characteristic Analytics consists of four sub-characteristics for the evaluation, processing and preparation of data for decision support.

The dimension Emerging research consists of five major big data research areas that are also mentioned by Chen et al. (2012): Big data analytics comprises data mining and statistical methods on large data sets. Text analytics contains approaches that mainly focus on the analysis of text in various formats. Web analytics contains “the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing Web usage” (Web Analytics Association 2008). Network analytics focuses on the node relations and structure of networks and is used to understand network properties, such as centrality, betweenness, cliques or paths (Barabási 2014). Analytics of mobile data enables the analysis of “fine-grained, location-specific, context-aware, highly personalized, and content through these smart devices” (Chen et al. 2012, p. 1178).

The dimension Research method and evaluation contains commonly accepted research methods in IS research, according to Oates (2006). In total, this dimension comprises six characteristics: Survey, Ethnography, Case studies, Experiment, Action research, and Design and conceptualization. To analyze the application scenario for which a big data artifact is developed, the dimension Application area contains seven characteristics, which are partly derived from Chen et al. (2012): E-government and politics, E-commerce and market research, Science, Security and Public Safety, Smart Health, Industry 4.0, and Internet of Things (IoT).

The dimension Data privacy provides a classification schema for the role of data privacy in relevant big data research articles. A user's data privacy is defined as “how, and to what extent information about them is communicated to others” (Westin 1967). This criterion is of particular importance since manifold data privacy regulations, such as the EU General Data Protection Regulation, have been enacted. IS research already addresses this topic using data anonymization (Parent et al. 2013; Zhang et al. 2011). The dimension distinguishes between the absence of data privacy implications (Not addressed), the explicit mentioning of data privacy risks and problems (Risks & problems mentioned) as well as the proposal of a concrete solution (Introduction of a solution).

In step 5c, we examine a sample of ten literature objects from our final result set to review the dimensions and characteristics. We provide the classification results of the first iteration in Appendix A. Based on the results, we create the initial concept matrix (step 6c), which is comparable to the one suggested by Eggert (2019) (Fig. 7). Finally, we check whether the ending conditions are met (step 7). Five quantitative conditions, (B), (C), (D), (E) and (F), as well as the qualitative conditions are not met. To improve the taxonomy, we begin a second iteration.

Fig. 7 Taxonomy draft after first iteration (adapted from Eggert 2019)

For the second iteration, we use the conceptual-to-empirical approach again, as the taxonomy resulting from iteration one can be further improved conceptually (step 3). In iteration one, we allow one object to address multiple characteristics within one dimension to obtain a comprehensive taxonomy. In turn, this leads to a complex taxonomy, which makes it difficult to review a single object. Therefore, we conceptualize a refined taxonomy in which each object from the sample set addresses exactly one characteristic in each dimension (step 4c).

We delete the dimension BI & A 3.0 attributes, as it is represented by the meta-characteristics of the taxonomy. Furthermore, we replace the sub-dimension Analytics with the dimension Analytics maturity to avoid sub-dimensions in the taxonomy. Analytics maturity captures the evolution of data analytics approaches (Abbasi et al. 2016) and is based on the widely accepted Gartner Business Analytics Framework (Chandler et al. 2011). The framework distinguishes between Descriptive (what happened?), Diagnostic (why did it happen?), Predictive (what will happen?) and Prescriptive (how can we make it happen?) approaches.

To increase the explanatory value of the taxonomy, we review the remaining characteristics and dimensions and delete those that are of low explanatory value, such as Computer-aided methods (dimension Techniques) and Science (dimension Application area). We add the characteristics Data science foundation and Human–computer interaction (HCI) to the dimension Emerging research area. The characteristics IoT and Industry 4.0 within the dimension Application area are not fully distinct and are therefore merged. The characteristic E-commerce and market research (dimension Application area) is renamed Market intelligence to obtain a more precise expression. By analyzing the hits for techniques and technologies, we find that they are not distinct. Thus, we merge the remaining characteristics of the dimensions Techniques and Technologies into one dimension.

Again, we apply the revised taxonomy to the same sample set of literature objects to evaluate the dimensions and characteristics (step 5c). All classification results of the second iteration are provided in Appendix B. Afterwards, we create the second taxonomy draft, which we provide in Fig. 8 (step 6c). In step 7 of iteration two, we check whether the ending conditions are met. Even the revised taxonomy does not meet the quantitative conditions: conditions A, B, C, D, E and F are still not fulfilled. Three literature objects could not be fully classified (Hirt and Kühl 2018; Kauffman et al. 2017; Satyanarayanan 2017), which is in conflict with condition B.

Fig. 8 Taxonomy draft after second iteration

The results of iteration two still have the potential to be conceptually improved since we could not classify all objects. Thus, for the third iteration, we decide to use the conceptual-to-empirical approach again (step 3). Once again, we adjust the taxonomy draft (step 4c). We split the dimension Techniques and technologies into two separate dimensions Technologies and Analysis techniques, as we want to map the characteristics of each dimension more precisely. The newly created dimension Technologies consists of the characteristics Analytics, Data Storage, Data Management, and Visualization. We apply this dimension to give a general overview of the fundamental BI & A technologies. The second new dimension Analysis techniques consists of the characteristics Econometrics, Linguistics, Optimization, Statistics, and Non analytics.

Furthermore, we change the dimension Research method and evaluation entirely because, in iteration two, we cannot use this dimension to classify all literature objects from the sample set. Three out of ten papers cannot be classified with the taxonomy of iteration two (Hirt and Kühl 2018; Kauffman et al. 2017; Satyanarayanan 2017). We review the literature for other potential approaches to map common IS research methods and identify the work of Wilde and Hess (2007) and the follow-up research by Schreiner et al. (2015), which offer a comprehensive taxonomy with a finer granularity than the taxonomy drafts of iterations one and two. The newly derived characteristics are: Deductive analysis (formal, conceptual, argumentative), Simulation, Reference modeling, Action research, Prototyping, Ethnography, Case study, Grounded theory, Cross-sectional study and Lab or field experiment (Schreiner et al. 2015, pp. 3–4). Some research works, such as Musto et al. (2015), apply more than one research method. To address this kind of mixed-methods research, we allow the classification of a primary and a secondary research method, indicated by a p and an s.

In step 5c, we again examine the same sample set of ten literature objects and review the revised dimensions and characteristics. We provide all classification results in Appendix C. Afterwards, we create the third taxonomy draft (step 6c), which is depicted in Fig. 9. Finally, we check the ending conditions (step 7). Conditions A and B are fully supported. Since we changed dimensions and characteristics, conditions C and D are not met. Each dimension is now unique, so the current taxonomy draft fulfills condition E. All characteristics within the dimensions are unique, which fulfills ending condition F. In addition, the subjective conditions are met for this small sample, so we perceive this taxonomy draft as mature enough to conduct an empirical evaluation.

Fig. 9 Taxonomy draft after third iteration

4.2 Fourth iteration (empirical to conceptual)

For the fourth iteration, we use the empirical-to-conceptual approach to test a new sample of literature objects (step 3). We now examine whether all 75 papers that resulted from the literature search can be classified with the revised taxonomy from the third iteration (step 4e). In step 5e, we check whether new dimensions or characteristics are needed. After assigning each of the 75 papers to one characteristic in each dimension, we conclude that the taxonomy does not need further characteristics or dimensions. We add all literature search hits to the taxonomy shown in Fig. 9 (step 6e). Finally, we check the ending conditions again (step 7).

The taxonomy fully meets all qualitative conditions. Although it still has many dimensions, it is concise because it makes it easier to gain an overview of the results than the concept matrix of Eggert (2019). It is robust because it clearly differentiates the literature objects along the dimensions and their distinct characteristics. The taxonomy is comprehensive, as we are able to classify the complete sample of 75 literature objects. It is extendible, as it allows the inclusion of additional dimensions and characteristics. Most importantly, the dimensions are more explanatory and less descriptive, which results in a useful taxonomy for a structured literature review.

The quantitative conditions are met, except for condition B: the characteristic Grounded theory (dimension Research method and evaluation) has no object. Nevertheless, we suggest keeping this characteristic because of the limited sample size. Furthermore, one objective of the literature review at hand is the identification of research gaps. Requiring every characteristic to be addressed by at least one sample object would hinder the identification of unrepresented characteristics that are potentially important. In the following section, we present the results of the taxonomy application for all 75 papers identified by the literature search.

5 Results

After conducting the literature search, we classified each article by its meta-data and applied the developed taxonomy (cp. Sect. 4) to categorize the 75 papers provided by the literature search. By examining the meta-data, we noticed a clear trend towards BI & A 3.0 (see Fig. 10). Whereas only four articles could be retrieved for 2010, a total of 11 articles addressing BI & A 3.0 were published in 2018. The linear trend line clearly points to an increasing interest in the research field.
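
A linear trend line of this kind is an ordinary least-squares fit of article counts against publication year. The following Python sketch shows the computation. It is an illustration only: the function name is hypothetical, and the example uses just the two yearly counts stated in the text (2010: 4, 2018: 11), because the counts for the intermediate years are not given here. With only two points, the fitted line simply passes through both.

```python
def linear_trend(years, counts):
    """Ordinary least-squares fit of counts = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Only the two counts stated in the text; intermediate years are omitted.
years = [2010, 2018]
counts = [4, 11]

slope, intercept = linear_trend(years, counts)
print(f"{slope:.3f} additional articles per year")
```

A positive slope corresponds to the increasing interest described above. A fit over all nine yearly counts would be needed to reproduce the trend line in the figure.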

Fig. 10 Development of published articles for BI & A 3.0

The second analysis of the meta-data focuses on the distribution of the main outlets in which articles in the area of BI & A 3.0 were published in the past decade. Figure 11 provides an overview of the most relevant scientific journals and conferences in the field. With 14 relevant articles, the journal Decision Support Systems is the top outlet for research in the field, followed by the IS conferences European Conference on Information Systems (ECIS) (eleven articles) and International Conference on Information Systems (ICIS) (nine articles). Interestingly, MISQ, which published the inspiring article about BI & A 3.0 by Chen et al. (2012), published only one article in the investigated scope.

Fig. 11 Main outlets for BI & A 3.0 research

Each paper addresses one characteristic of one dimension in the developed taxonomy (except research methods, which might be addressed twice by papers that apply a multimethod approach). Figure 12 contains the applied taxonomy and provides the summarized hits of each characteristic. Additionally, we provide the details of our analysis in Appendix D. In the following, we briefly introduce the results in each dimension. Thereby, we focus on outliers.

Fig. 12 Summarized results of the literature review

5.1 Technologies

From a technological perspective, IS research clearly focuses on data analytics techniques (52 hits). Fourteen papers address data management aspects and seven research papers address the visual representation of large data sets. Only two research works focus on data storage in a narrow sense. Akhbar et al. (2016) present an approach that supports large central cloud systems and data centers. With it, the authors address the shortcomings of current data warehouse architectures regarding the storage of IoT data. Pei et al. (2018) suggest a data architecture approach to store and analyze data from different sensing devices. They particularly discuss implementation technologies and focus on applying the Apache Hadoop framework.

5.2 Analysis techniques

The largest share of big data research works focuses on optimization techniques (28 hits). According to our analysis, econometric, linguistic, and statistical techniques receive less attention in IS research. In total, 23 articles within the data sample do not address any analytical technique. They rather focus on conceptual (e.g., Griva et al. 2016) or descriptive outcomes (e.g., Zhang et al. 2011). Research results that apply econometric techniques focus, for example, on the evaluation of user behavior (Han et al. 2014, 2016) and the application of conjoint analysis (Mihale-Wilson et al. 2017; Siegfried et al. 2015). Cezar and Raghunathan (2016) analyze the welfare of buyers and retailers depending on the availability of smartphone data such as GPS location and direction of movement. Research articles based on linguistic analysis techniques investigate decision support systems for knowledge mobilization (Morente-Molinera et al. 2016) and content analysis of social media platforms (Dai et al. 2015; Miah et al. 2017; Musto et al. 2015). Miah et al. (2017) additionally apply simulation techniques to predict tourist behavior after analyzing social media content. Statistical techniques are applied, for example, to provide mobile analysis capabilities through the design of a generic, programmable position tracking platform based on Bayesian networks (Chu et al. 2017).

5.3 Analytics maturity

As described in Sect. 4, analytics maturity comprises four evolution stages of data analytics capabilities and is based on the Gartner Business Analytics Framework (Abbasi et al. 2016; Chandler et al. 2011). The largest share of papers addresses the Descriptive maturity level (25), followed by 18 research papers at the Predictive maturity level. Eight papers address the Diagnostic level, such as Ngai et al. (2012), who introduce a context-aware decision support for real-time accident handling and rescheduling. Only the research work of Krumeich et al. (2016) could be classified as Prescriptive, as it contains an approach for the prescriptive controlling of business processes based on process events. By analyzing complex events, the approach predicts the optimal behavior of a certain business process.

5.4 Emerging research area

Out of the six emerging research areas, network and mobile analytics receive the highest attention so far (25 and 26 hits). The third most addressed emerging research area is Social analytics, which comes to 13 hits. Based on activities in social media that are analyzed by sense recognition and sensor data, Chen et al. (2017) develop an approach to predict smog in big cities. Gkatziaki et al. (2017) develop a system that divides a city into smaller subspaces based on the spatial data of social network users. The idea is to get knowledge about urban centers in terms of temporal, functional and spatial usage of the inhabitants. Six papers address the area of Human computer interaction. For example, Chernbumroong et al. (2014) introduce a sensor-based activity recognition approach to enable people in need of care to remain at home as long as possible.

Furthermore, we found three research works that deal with Data science foundations (Hirt and Kühl 2018; Pertesis and Doulkeridis 2015; Zschech 2018). For example, Zschech (2018) introduces a taxonomy of recurring data analysis problems in the area of maintenance. Given the steady increase of textual data produced by voice recognition systems (Apple’s Siri was introduced in 2011, Amazon Alexa in 2015), it is surprising that only two research papers address Text analytics. Current text analysis research evaluates tourist behavior by analyzing geo-tagged smartphone photos and corresponding descriptions (Miah et al. 2017) and identifies emotions in message service providers and social networks (Dai et al. 2015).

5.5 Research method and evaluation

The largest share of papers analyzed in this literature review applies a Deductive analysis approach (34 hits), followed by Case studies (22 hits) as well as Lab and field experiments (21 hits). Reference modeling also received much attention in the past years. In total, 20 research papers apply this research method. Sixteen papers introduce a Prototype to validate their approaches. Papers addressing this characteristic mainly develop a software artifact, such as a decision support system for the planning of large energy grids (Gust et al. 2016). Thirteen research works apply Simulation methods to generate results. For example, Cakici et al. (2011) analyze the benefits of RFID technology compared to barcode usage for the management of pharmaceutical inventories. They examine the impact of automatic counting and show that a continuous review is superior to a periodic review of inventories. Twelve out of 75 papers apply Cross-sectional studies, such as identifying the drivers for the likelihood of app installation by conducting a conjoint analysis based on an online survey of smartphone users (Siegfried et al. 2015).

The analysis clearly indicates a research gap in conducting Ethnographies as well as Action research (3 and 2 hits). While the research work of Siegfried et al. (2015) also addresses ethnographical aspects (investigating the app installation behavior of German university students), we identified two more research works containing ethnographical aspects. Rock et al. (2016) analyze peer effects of user behavior in Yahoo! Go. Lehrer et al. (2011) analyze the cognitive process of mobile phone users when applying location-based services. They develop a theoretical framework for analyzing individuals' decision-making processes. Action research was applied by Chernbumroong et al. (2014) to improve nursing and the detection of emergency situations. Schuetz et al. (2018) developed a prototype of a data warehouse for the precision dairy farming context. The demonstrated prototype was evaluated in practice. We found no paper that uses Grounded theory as a research method, which reveals a further research gap.

In addition to the classic research methods, our literature search also provides insights into the practical evaluation of the developed artifacts. This is of great importance because BI & A is a major challenge for the industry (McAfee et al. 2012). Therefore, we analyzed all papers regarding their explicit evaluation in an industry setting. Surprisingly, only 11 out of 75 papers completely fulfill this characteristic and either contribute to a commercialized product (e.g., Chu et al. 2017; Morente-Molinera et al. 2016) or implement the presented approach as an information system in a company (e.g., Gust et al. 2016; Sha et al. 2018).

5.6 Application area

To identify BI & A 3.0 application areas that received low or no attention in IS research, the taxonomy differentiates between six application area characteristics and one characteristic for application area independent research works. The largest share of papers addresses an application scenario in Market intelligence (23 hits), followed by the application area Smart industry (19), such as the implementation of a big data system for improving production processes at Bosch through data from customer quality complaints and internal defect cost controlling (Santos et al. 2017). Eleven research outcomes have no clear application area and are classified as Application area independent. Zhang et al. (2011), for example, introduce and describe the evolution of social and community intelligence. They suggest a framework for integrating large-scale and heterogeneous data sources to support a fast application development and deployment process. Practitioners and researchers might apply such a framework in any industry sector, which is the reason for that classification. Ten papers contribute to the private and consumer usage of IoT. For example, O'Leary (2013) investigates the Street Bump app (an app for detecting street potholes), describes the challenges of using data from sensor-based mobile device apps, and provides guidelines to prevent common failures.

Five research papers address the application area Security and public safety. Raciti et al. (2012) introduce a system that applies self-learning algorithms to sensors that measure water quality for the authorities and water operators. The goal is to detect anomalies through improved monitoring. Chen et al. (2017) develop an approach to predict smog in big cities by combining sensor and social media data. Khalemsky and Schwartz (2017) present a smartphone app that sends a notification to relevant people in the immediate vicinity in case of a medical emergency. The authors aim at providing faster first aid than the regular rescue service.

A further five research papers address the application area Smart health. BI & A research for smart health focuses on supporting nursing and health care processes through the real-time analysis of sensor data (Bhatia and Sood 2017; Chernbumroong et al. 2014; Gaber et al. 2010). Bourouis et al. (2014) introduce a mobile app that detects eye diseases using a small external microscope connected to a smartphone. All approaches have in common that they collect sensor-based health data to track the health status of an individual. Tokar and Batoroev (2016) identify opportunities for the usage of mobile devices in mental health. To this end, they analyzed 124 iPhone apps for depression therapies.

The potential application area E-government & politics receives only little attention (2 hits). Ju et al. (2018) propose a framework for citizen-centered analytics and decision support for governance intelligence. They implement their approach in a blood donation administration in China. Chatterjee et al. (2018) present a machine learning approach that identifies defects of road surfaces.

5.7 Data privacy

Since big data research often implies data privacy issues and promises to be a compelling research field (Lowry et al. 2017), we finally analyzed all research papers in scope regarding their explicit naming of data privacy risks and the presentation of a solution. We are aware that the regarded papers commonly have no explicit focus on data privacy. However, against the background of the extensive public discourse on data privacy leaks (cp. the Cambridge Analytica scandal on Facebook), it is surprising that 52 out of 75 papers did not even mention privacy risks that possibly arise when the research artifact at hand is applied. We further analyzed all papers regarding their relevance to data privacy. In total, we classify 46 research outcomes as relevant in terms of data privacy regulations. Thirteen papers mention data privacy risks that come along with applying the respective research artifact. In the following, we focus on introducing those eleven papers that explicitly name a solution for data privacy.

The presented solutions for addressing data privacy risks contain procedures to transfer aggregated and non-personal data (Chernbumroong et al. 2014; Frey et al. 2017; Gkatziaki et al. 2017; Satyanarayanan 2017). For instance, Gkatziaki et al. (2017) analyze social media broadcasts to divide a city into smaller subspaces and identify urban centers. The introduced system solely transfers aggregated data to prevent personalization.

Other research papers contain a data anonymization approach (Parent et al. 2013; Zhang et al. 2011) or an explicit spatial collection of data that requires a minimum of user permissions (O'Leary 2013) to protect personal data. Mihale-Wilson et al. (2017) conduct a conjoint analysis to analyze data privacy concerns regarding a ubiquitous personal assistant, such as a speech assistant. Provost et al. (2015) develop a system that analyzes the similarities of mobile devices based on visited locations. The introduced system is meant to consider data privacy by design, for example by not storing personal data about mobile users. The authors perceive personal data as “nonanonymized identifiers, demographics, geographics, psychographics, etc.” (Provost et al. 2015, p. 264).

6 Research agenda

The synthesis of literature should lead to the identification of open research questions (Vom Brocke et al. 2009; Webster and Watson 2002). Based on the findings from the literature review, we create a research agenda that comprises the major BI & A 3.0 characteristics that are currently underrepresented in the IS research community. For this purpose, Vom Brocke et al. (2009) suggest applying a concept matrix to provide a basis for further research. We follow that approach and conceptualize the current research gaps of BI & A 3.0 by selecting the most underrepresented characteristics and corresponding dimensions. The results of our literature review reveal that three out of seven dimensions have at least three underrepresented characteristics: Application area, Research method, and Emerging research area.

We do not believe that technology plays an important role in future BI & A 3.0 research. Since the introduction and widespread adoption of cloud computing (Armbrust et al. 2010), we perceive data storage as a commodity and do not regard the topic as a primary research topic in the future. Thus, this dimension does not appear in the proposed research agenda. Furthermore, we argue for a less important role of analytics techniques. Our results reveal that outstanding IS research papers investigating BI & A 3.0 do not depend on technologies: 23 research results do not apply any analytic technique. For the same reason, we also perceive analytics maturity as a less important dimension. Nevertheless, we noted that the maturity levels “diagnostic” and “prescriptive” are underrepresented, which motivates the development of research questions that particularly address these maturity levels.

Finally, data privacy does not appear in our research agenda framework, because we perceive this dimension as inherent in each BI & A 3.0 research work. Each beneficial application of data analytics comes along with ethical issues, such as data privacy concerns (Martin 2015). Independently of a concrete research question, we argue for the relevance of data privacy and encourage researchers to extend each BI & A 3.0 project by a separate subproject that investigates data privacy risks.

We mark each suggested research question with a running number. By combining the remaining three dimensions, we create a cube metaphor that helps identify research topics that received no or only little attention in IS research. The whole cube with all dimensional values is depicted in Fig. 13. In addition, the cube metaphor supports the classification of other research artifacts in the field of BI & A 3.0. In the following, we introduce and discuss each research dimension and provide exemplary research questions.

Fig. 13 Research agenda dimensions and exemplary questions
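
The cube metaphor can be read as the Cartesian product of the underrepresented characteristics of the three remaining dimensions. The following Python sketch enumerates the resulting cells. It is an illustration under stated assumptions: the variable names are hypothetical, and only the two research questions whose three coordinates are all named explicitly in the text (questions 6 and 7) are placed in the cube. The positions of the other questions are not assumed here.

```python
from itertools import product

# Underrepresented characteristics per dimension, as named in the research agenda.
application_areas = ["Smart health", "Security & public safety", "E-Government & politics"]
research_methods = ["Grounded theory", "Ethnography", "Action research"]
research_areas = ["Data science foundations", "Text analytics", "Human computer interaction"]

# Every cell of the cube is one (application area, method, research area) triple.
cells = list(product(application_areas, research_methods, research_areas))

# Questions whose three coordinates are all stated in the text.
placed_questions = {
    ("E-Government & politics", "Action research", "Text analytics"): 6,
    ("Smart health", "Action research", "Data science foundations"): 7,
}

# Cells without a placed question remain candidates for further research questions.
open_cells = [cell for cell in cells if cell not in placed_questions]
print(len(cells), len(open_cells))
```

Listing the open cells in this way gives a simple checklist of combinations for which no exemplary question has been fixed yet.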

6.1 Application area

Three out of seven application areas of BI & A 3.0 are underrepresented according to the results of our literature review: Smart health, Security & public safety, and E-Government & politics. To extend the body of knowledge, we suggest focusing BI & A 3.0 research on these three application areas. We are aware that the sizes of these areas are not comparable. E-government research, for example, is generally underrepresented in IS research, although highly cited articles do appear (Belanger and Carter 2012). Security & public safety is also a central topic in other scientific disciplines, such as “computer security, computational criminology, and terrorism informatics” (Chen et al. 2012), which might be a reason for its current underrepresentation in IS research.

In the area of Smart health, researchers may investigate the effects of health simulations based on fitness tracker apps. The pharma industry is increasingly investing in the development of digitally supported therapies via apps or smart health devices, but many obstacles exist regarding data reliability (PwC Health Research Institute 2019). Combined with the dimension human computer interaction, the question “How can health simulation systems based on tracked fitness data be developed to influence individuals' health behavior?” needs to be answered (1). One challenge in this regard is the limited availability of efficient solutions to make use of all the data available from sensors in real time or close to real time (Haghi et al. 2017). In the Security and public safety area, researchers may investigate the effects of smart home systems. Both the effects on the probability of burglaries and the effects on the insurance industry as a whole need to be investigated. Therefore, we suggest the following research question: “How may smart home systems influence burglary probabilities and which effects does that have on the insurance industry?” Furthermore, access to smart home data is strictly regulated by privacy laws, so that the question of “how to secure smart home systems regarding the misuse of data” might additionally be raised (2). Predicting burglaries and their economic effects is a research field that needs further data science foundations. Prototypes that collect and analyze smart home data are necessary. Again, we suggest evaluating these kinds of prototypes by an action research approach.

The Cambridge Analytica scandal and its influence on the Trump campaign in the U.S. elections in 2016 evidenced the importance of data analytics for E-Government and politics. In this application area, researchers may, for example, investigate the influence of big data delivered by sensors, such as traffic, smartphone, or smart home devices, on the election behavior of citizens as well as ways to prevent such data from misuse (Janssen et al. 2012). This leads to the question of how sensor data may be used to manipulate elections and which regulations are needed to prevent such data misuse (3). Answering this question requires at least a theory building part, for which we suggest applying a grounded theory approach. We classify this research question as diagnostic, since it focuses on analyzing sensor-based data and its diagnostic power to explain election results.

6.2 Research method

Three research methods are currently underrepresented in BI & A 3.0 research: Grounded theory, Ethnography, and Action research. The grounded theory methodology and action research are commonly known and often applied in general IS research (Mathiassen et al. 2012; Wiesche et al. 2017). Surprisingly, these methodologies are largely underrepresented in BI & A 3.0 results. Ethnographies are applied quite seldom, and the low usage of this methodology is to be expected. Califf and Stumpf (2018) report a scant use of this qualitative research method and attribute it to vague methodological details, an unfocused and unsystematic data collection, a heavy time consumption, and missing sense-making of the findings. We further found that only 11 out of 75 papers evaluated their artifacts in a real-world setting, which brings us to the assumption that many research outcomes have no or only little practical relevance. At the same time, organizations strongly demand solutions that could be implemented to increase the diffusion of big data technologies, to handle the data sources available, and to potentially create competitive advantage (Kallinikos and Constantiou 2015; McAfee et al. 2012). Filling this gap means focusing on practical evaluations of research artifacts.

Applying a Grounded theory approach to develop a theoretical model for the influence of health tracking apps on pharmaceutical sales could be a fruitful research work that provides practical relevance. In addition, we suggest conducting a data analysis of relevant social media data for predicting purchase intentions to evaluate the theoretical model. Thus, we encourage answering the question: Does the usage of health tracking apps influence the revenue and sales of pharmaceutical products (4)?

In the area of Smart health, the analysis of language usage and its change over time might reveal insights into probabilities of certain diseases, which leads to the question: How can diseases be predicted through the analysis of spoken words (5). In a first step, we suggest building up a deeper understanding of the usage of language and voice and its relationship with a certain health status. Therefore, it is meaningful to conduct an Ethnography. In a second step, an analysis model needs to be trained that receives a voice stream and labeled disease data sets as input.

Action research might be a beneficial research method for increasing the practical evaluation of BI & A 3.0 artifacts. The method has a high potential to bridge the gap between theory and practice (Raffoni et al. 2018). Especially in combination with the application area E-government and politics, many research opportunities exist. We suggest analyzing the speeches of politicians to predict their success and effects based on data of former speeches and the resulting social media reactions. Text analytics combined with an Action research approach is a promising combination for that investigation. Accordingly, we suggest answering the question: How to predict the success of political speeches based on social media reactions (6).

6.3 Emerging research area

Three emerging research areas received less attention in the past: Data science foundations, Text analytics, and Human computer interaction (HCI). As Data science foundations comprise research results regarding the optimization and storage of large data sets (for example, Pertesis and Doulkeridis 2015), we suggest investigating data merging and filtering algorithms and their requirements for the storage of relevant data. Since the data science foundations are also addressed by other scientific disciplines, such as informatics and computer science, their underrepresentation in BI & A 3.0 topics is reasonable to some extent. Nevertheless, IS research might produce fruitful contributions regarding the investigation of data science foundations for BI & A 3.0. The identified underrepresentation of Text analytics is surprising, since this research area has had several proponents in the IS research community (Chen et al. 2012; Khan and Vorley 2017; Lim et al. 2013). In addition, the underrepresentation of research results in the area of HCI and BI & A 3.0 indicates a further research gap. Since HCI is a well-established research track in top IS conferences, such as ICIS and ECIS, we can only speculate that this topic is currently not in the focus of BI & A researchers.

In the application area of Smart health, the question of how to store and filter health data generated by smart health devices, such as fitness trackers or fitness apps, needs to be answered (7). The results might deliver the foundations for developing a prototype that creates diagnoses for certain illnesses. By applying an Action research approach, such a prototype can be evaluated.

In contrast to Data science foundations, Text analytics aims at understanding and interpreting textual content (Chen et al. 2012). In the area of E-Government & politics, we suggest answering the question of how the usage behavior of certain messenger apps influences voter turnout rates (8). To answer this question, we suggest using existing textual social media communication before a certain election to train a neural net model that predicts turnout rates of an upcoming election. For inspiring research in the area of Human computer interaction, applied in Security and public safety, we suggest answering the question of “how does the police interact with a system that enables the profiling of thieves based on smart home data” (9). We believe that the massive amount of smart home data (such as smoke detectors, window open detectors or motion detectors) has the potential to produce detailed profiles of thieves, which in turn may support the work of the police.

7 Discussion and outlook

The paper at hand contains a taxonomy and structured literature review to shed light on the BI & A 3.0 research developments of the past decade. Our research contribution is fourfold. First, we provide a systematically derived taxonomy for structuring BI & A 3.0 research. Other researchers may apply this taxonomy to classify their BI & A 3.0 research results. Second, based on this taxonomy, we provide a structured overview of current IS research in the area of BI & A 3.0 between 2010 and 2018. The results clearly show research topics of high interest as well as research gaps. The latter may inspire other researchers to address them in further research projects. Third, we provide an answer to the MISQ special issue article of Chen et al. (2012), in which the authors predict upcoming big data research from the perspective of 2012. We partially confirm their prognoses, such as the trend of increasing publications on BI & A 3.0 over time. Fourth, we provide a research agenda that clearly points out open research issues that address dimensions and characteristics which currently receive less attention in IS research. In total, we suggest nine research questions that are not answered yet.

Practitioners may use our results as a starting point to find a suitable data analysis solution for business challenges or even to find relevant expertise per taxonomy dimension. In particular, the eleven evaluated BI & A approaches could be valuable for practitioners, as their implementation in organizations will probably be easier than that of non-evaluated approaches.

Furthermore, our results contribute to the controversial debate initiated by Benbasat and Zmud (2003), who identify areas that belong to IS and represent the core of the discipline. Since all papers analyzed in our literature review discuss their investigated phenomena in the IS context, the results provide evidence that BI & A 3.0 is clearly part of the discipline. But BI & A 3.0 is also influenced by other disciplines, such as computer science, mathematics, and economics. Thus, the boundaries are too vague to assign the topic clearly to one research discipline. According to a Delphi study by Becker et al. (2015), leveraging “knowledge from data, with […] high data volumes” is one of the grand challenges of research in IS, and it is also one of the key characteristics of BI & A 3.0. Following Becker et al. (2015) and Benbasat and Zmud (2003), BI & A 3.0 can be perceived as a central part of the IS research discipline.

However, the results of this study are limited. We investigate the status quo of BI & A research in the IS discipline by analyzing the top IS literature. To define the scope of journals and conference proceedings, we apply the JOURQUAL 3 ranking for information systems (Hennig-Thurau and Sattler 2015) and focus on A+, A, B, and TOP 30 C outlets. We did not consider other potentially relevant publications, such as those from other disciplines, lower-ranked outlets, special-interest workshops, or industry reports. The paper selection might be biased, because we focused on papers considering IT/IS artifacts. Furthermore, we focus on the application of the developed taxonomy for structuring big data research. Thus, we did not consider other dimensions, possibly relevant for other scientific disciplines, in our literature review. Our results and findings are limited to the applied search string, which covers most, but not all, of the IS research in BI & A and could possibly be extended, e.g., with additional keywords and more synonyms. Some emerging aspects, such as blockchain technologies, were not mentioned by Chen et al. (2012) and were not in the scope of our investigation. The relation between research and practice remains ambiguous, as the field is very dynamic and reports from practice, such as industry reports, were out of the scope of our research. Possibly, the academic literature is lagging behind the current trends in practice, as companies like Google are publishing a lot in the area of data science and big data analytics outside the traditional academic world.

Based on our research contribution, further research is needed to close the identified gaps in big data research. Our findings motivate IS researchers to address e.g. the fields of E-Government and Smart health. The limited scope of IS scholarly publications opens room for repeating the search for outlets in closely related disciplines, such as informatics, mathematics or economics. In addition to academic literature, BI & A related industry reports could be analyzed as they might offer useful insights about current and forthcoming developments from practice. For all further classifying works in the context of BI & A 3.0, we recommend applying and possibly extending the developed taxonomy, presented in this paper. The work at hand is a starting point for ongoing research in the field of BI & A 3.0 and may foster fruitful discussions on the further development of this emerging research area.