The effect of web of science subject categories on clustering: the case of data-driven methods in business and economic sciences

The primary goal of this article is to identify the research fronts on the application of data-driven methods in business and economics. For this purpose, the research literature of the business and economic sciences Subject Categories from the Web of Science is mapped using BibExcel and VOSviewer. Since the assignment to subject categories is done at the journal level and since a journal is often assigned to several subject categories in Web of Science, two mappings are performed: one without considering multiple assignments (broad view) and one considering only those (articles from) journals that have been assigned exclusively to the business and economic sciences subject categories and no others (narrow view). A further aim of this article is therefore to identify differences in the two mappings. Surprisingly, engineering sciences play a major role in the broad mapping, in addition to the economic sciences. In the narrow mapping, however, only the following clusters with a clear business-management focus emerge: (i) Data-driven methods in management in general and data-driven supply chain management in particular, (ii) Data-driven operations research analyses with different business administration/management focuses, (iii) Data-driven methods and processes in economics and finance, and (iv) Data-driven methods in Information Systems. One limitation of the narrow mapping is that many relevant documents are not covered since the journals in which they appear are assigned to multiple subject categories in WoS. The paper comes to the conclusion that the multiple assignments of subject categories in Web of Science may lead to massive changes in the results. Adjacent subject areas—in this specific case the application of data-driven methods in engineering and more mathematically oriented contributions in economics (econometrics) are considered in the broad mapping (not excluding subject categories from neighbouring disciplines) and are even over-represented compared to the core areas of business and economics. If a mapping should only consider the core aspects of particular research fields, it is shown in this use case that the exclusion of Web of Science-subject categories that do not belong to the core areas due to multiple assignments (narrow view), may be a valuable alternative. Finally, it depends on the reader to decide which mapping is more beneficial to them.


Introduction
Data-driven technologies, i.e. technologies that use large amounts of data to generate added value, play a prominent role nowadays. The main reason for the rapid expansion of data-driven technologies is the high degree to which data has been created, shared and used in recent years. A further important step was achieved when the first "mass digitization" projects (e.g. Google's Mass Digitization Project at Oxford), i.e. the conversion of entire traditional libraries into machine-readable files, were accomplished (Coyle, 2006). As soon as signals are converted into a digital format, they can be used to create structured data sets. This step is known as ‚datafication ' and provides the capability to perform macrolevel analysis of relevant trends and patterns, that are impossible when using an analogue format of the data being investigated (Mayer-Schönberger & Cukier, 2013). In Google 's Mass Digitization Project, accordingly, the conversion of the vast amount of text strings into sequences of related words and the resulting capability of examining the frequency of occurrence of these sequences over a long-term period was the process of datafication (De Mauro et al. 2016).
Data-driven methods can be used in many disciplines. The aim of this paper is to give a first overview on the use of data-driven technologies in business and economic sciences. Furthermore, the effect of the (multiple) assignment of journals to subject categories, which is a common practice in Web of Science, is analyzed. There are several publications that have investigated the implications of the system of multiple assignments of publications and journals to subject categories in WoS. Leydesdorff and Bornmann (2016) show, using a case study of the WoS-category "information science & library science" and those WoS-categories that are assigned to journals in the field of "science and technology studies", that the classification system of the WoS-categories is not sufficiently clear to bear the bibliometric normalization in the evaluation practice. Boyack and Klavans (2011) even note that far too few journals are truly disciplinary, therefore classification is not purposeful, and consequently question the validity of any classification system for journals and disciplines when used for research evaluation. However, journal classification systems have been used as a basis for comparison and categorization of journals for a long time and although there were numerous other classification systems developed (e.g. Glänzel & Schubert, 2003;Klavans & Boyack, 2010;Perianes-Rodriguez & Ruiz-Castillo, 2017), the Thomson Reuters' Subject Categories are probably the most widely used such classification system (Boyack & Klavans, 2011). The approach used in this study is a new, rather radical attempt that aims to improve the visibility of research fronts within a particular scientific discipline, in this case business administration, by excluding journals with multiple attributions in its analysis.
It must be stated that this approach is very likely to eliminate important publications from the results, i.e. it also entails a loss of information, which is the main limitation. However, if the aim is to highlight the core aspects of a specific topic as clearly as possible within a particular scientific discipline, the chosen method can be considered as an alternative to conventional approaches, as the following use case will demonstrate.

Data, methodology and tools
This article focuses on the data sources covered by the Web of Science Core Collection which is composed by the Science Citation Index Expanded (SCI-EXPANDED), the Social Sciences Citation Index (SSCI), the Arts & Humanities Citation Index (A&HCI) and the Emerging Sources Citation Index (ESCI). Although Science Citation Index Expanded (SCI-EXPANDED) and Social Sciences Citation Index (SSCI) make up the majority of the publications found, a few relevant publications are also found in the other mentioned indices. In order to consider only publications (articles, reviews and conference proceedings) with a background in the business and economic sciences, the search was limited to the following Web of Science subject categories: The search was conducted on works published in all languages before 2020, obtaining 2528 records (date of retrieval 28/01/2020). 1 For these documents the major research fronts should be determined.
Research fronts are clusters of documents on a specific topic that can usually be identified by an analysis of bibliographic couplings (Huang and Chang 2014). They are characterized by a growing research activity on a particular topic (Schiebel, 2012). Hence, research fronts are of particular interest for identifying and illustrating current research topics and emerging trends (Persson, 1994) and therefore of main interest in this article. If research fronts are based on bibliographic couplings, they are time-invariable, i.e. new publications do not retrospectively change the result of an analysis of bibliographic couplings at a certain point in time (Morris et al. 2003). Furthermore, research fronts are not defined by individuals, but result from the sum of contributions of certain groups of authors (De Moya Anegón et al. 1998).
Data analysis was conducted with BibExcel and VOSviewer. The tool BibExcelwas developed at the University of Umea in Sweden and is free for non-commercial use. It is capable of reading bibliographic data from common literature databases. In addition, bibliographic networks from co-citations, bibliographic couplings and co-authorships can be created. In Bibexcel it is possible to perform most types of bibliometric analysis and it allows easy interaction with other software. For example, it is possible to create Pajek files, which are subsequently used in VOSviewer when creating the network files. The strength of the program lies in its flexibility in both data management and analysis (Persson et al. 2009) and offering functions that can be used to improve data quality (e.g. "Convert Upper Lower Case" or "Keep only author's first initial"). BibExcel is also capable of extracting bibliographic couplings from reference lists that are exported as text files (Pilkington, 2018). However, BibExcel has no visualization function. Instead, the data can be exported and visualized with other tools such as VOSviewer, UCINET, SPSS or Pajek (Cobo et al. 2011). BibExcel therefore serves exclusively to prepare the exported data for visualization.
The reason for choosing VOSviewer to display the bibliometric data in this article is the significant display quality, the choice of the display density of the links (Belli and Balta 2019) and the possibility to set a minimum total link strength (TLS) of an article to ensure that only substantial publications are included in the network. Another reason for choosing VOSviewer was that it uses a unified approach to mapping and clustering that avoids inconsistencies between the results produced by mapping and clustering techniques (for a detailed description see van Eck & Waltman, 2010;Waltman et al. 2010). In order to consider only publications which are inter-related to a certain degree, the minimal TLS was set to 20. This means, that only those papers with at least 20 bibliographical couplings to the other investigated publications were considered. As a consequence, out of the 2528 publications only 1234 remained in the following analysis.

Results
Mapping 1: multiple assignment of web of science subject categories Figure 1 shows the results of the mapping for those publications which were assigned to one of the seven before-mentioned subject categories. Since journals (and therefore also their included publications) are usually assigned to more than one subject category in Web of Science, it is most likely that further subject categories are involved (this is meant by multiple assignment of Web of Science subject categories) (Figs. 2, 3).
As can be seen in Table 1, technical subject categories can even outweigh several subject categories from business and economic sciences in numbers of assigned publications. In particular, Engineering, Electrical & Electronic; Communication; Computer Science, Artificial Intelligence; Engineering, Industrial; and Computer Science, Software Engineering are thematically distant from business economics. Their high proportion suggests some technological orientation when it comes to the application of data-driven applications and methods in business and economic sciences. The question arises whether this is also reflected in the resulting mapping. Figure 1 shows the result of the mapping procedure. A circle or node represents a publication on the topic under investigation. The more similar two publications are in terms of their bibliographic coupling, the closer they are to each other on this map. The size of the nodes depends on the level of the TLS. The higher the TLS (the more bibliographic couplings an article shares with the other articles in the cluster), the larger the node is. The nodes are colored according to the assigned cluster.
As can be seen, there are six more or less specific clusters as regards contents (see Table 2).

Data-driven methods in Engineering Sciences (red cluster)
The red cluster is strongly dominated by the engineering sciences. From the 454 publications of this cluster, 359 were assigned to the Computer Science research area and 203 to Engineering. The most frequent business management research area is Operations Research & Management Science with 125 assignments. Accordingly, the business management reference in this cluster is comparatively low.
This impression is reinforced when we look at the journals that frequently appear in this cluster: 69 articles were published in IEEE Access, an Open Access journal published by the Institute of Electrical and Electronics Engineers, and 49 articles in Expert Systems with Applications.
An analysis of the TOP 10 publications by total link strength (TLS) gives an indication of which contents are primarily covered in this cluster. Process monitoring and control play a central role. For instance, the article with the highest TLS ("Statistical learning methods applied to process monitoring: an overview and perspective") by Weese et al. (2017) provides a review of the state of the art of data-driven multivariate statistical process monitoring. Some of the monitoring and control methodologies, which are dominated by data mining techniques and are promising for monitoring large and diverse data sets, are highlighted.
Quality prediction in manufacturing processes is another example of how the analysis of massive amounts of data can influence modern manufacturing environments. For instance, the article "Imbalanced classification of manufacturing quality conditions using 1 3 cost-sensitive decision tree ensembles " by Kim et al. (2018) deals with the problem of classifying production process conditions into normal and defective products. Such a quality analysis data set is usually not balanced because the error rate is low in practice. To solve this unbalanced classification problem, a cost-sensitive decision tree ensemble algorithm is used to increase the small number of defective cases and to assign higher costs to the misclassification of defective products than to normal products.
The detection and diagnosis of faults in industrial chemical processes is the subject of another TOP 10 article from this cluster (Ragab et al. 2017). Logical Analysis of Data (LAD) is used for this purpose. This classification technique from machine learning uncovers hidden knowledge in industrial data sets by revealing interpretable patterns linked to underlying physical phenomena. The patterns are then combined to build a decision model that can be used to diagnose errors during process operation and explain the potential causes of these errors.
Data-driven fault prediction and diagnosis is also the main research focus in several other publications from this cluster (e. g. Riaz et al. 2018;Diallo et al. 2018;Zhang et al. 2018;Xu et al. 2017a, b).

Data-driven predictive maintenance of machine equipment (yellow)
The local proximity and the partly overlaps between the yellow and the red cluster in the network suggest that there is also a proximity in terms of content. The situation is similar with regards to the publishing journals. IEEE Access is by far the most frequently used journal with 63 publications. The other journals (Reliability Engineering & System Safety, Expert Systems with Applications, IEEE Transactions on Multimedia and Multimedia Tools and Applications) are also application-oriented and have a strong reference to engineering sciences.
In terms of content, data-driven maintenance of machines and machine equipment plays a particularly strong role. Since current research on predictive maintenance is still in a primarily experimental stage, most of the work is carried out using several open data sets. A review article with the title "Data-driven methods for predictive maintenance of industrial equipment: A survey" is the publication with the highest TLS in the yellow cluster. It presents a comprehensive survey on the applications and attempts to provide a better understanding of the existing works recently published .
The majority of the publications contained in this cluster is dedicated to the data-driven estimation of the remaining life time of machines. "Remaining useful life estimation-A review on the statistical data driven approaches" by Si et al. (2011) is the article with the second highest TLS in the cluster providing an overview of modelling developments Even the data-driven prediction of the remaining useful life of certain goods is the subject of some publications. For example, those of bearings in the articles entitled "Deep learning-based remaining useful life estimation of bearings using multi-scale feature extraction" by Li et al. (2019), "Prediction of bearing remaining useful life with deep convolution neural network" by Ren et al. (2018) or "Real-time bearing remaining useful life estimation based on the frozen convolutional and activated memory neural network" by Chen et al. (2019a, b, c).

Data-driven control and automation processes (turquoise)
The smallest cluster is also strongly technically oriented. Computer Science with 16 entries and Automation & Control Systems with 9 are the most frequently assigned Research Areas. This thematic direction can also be derived from the publishing journals. The two top journals are IEEE Access (7 articles) and International Journal of System Science (5 articles). In 16 of the 20 publications included in the cluster, the term "control" is used in the title. Data-driven control and automation technology therefore plays the central role in this cluster.
The top article of this cluster ("From model-based control to data-driven control: Survey, classification and perspective ") by Hou and Wang (2013) can be considered an overview of problems and challenges inherent in the theory of data-driven model-based control. Existing methods and applications are presented with corresponding classifications and findings.
The more recent publications take a very application-oriented and specific approach in this context. For example, the model-free adaptive control as a data-driven control method and its concrete application to a launcher vehicle is the subject of one article (Duan et al. 2019). Another paper entitled "Deterministic continuous-time Virtual Reference Feedback Tuning (VRFT) with application to PID design" by Formentin et al. 2019) presents a datadriven control design method that is not based on a model of a plant. In this context a PID autotuner is developed and its effectiveness is tested on an experimental brake-by-wire system and improved performance compared to the design based on a benchmark model is shown.

Data-driven methods in management in general and data-driven supply chain management in particular (blue)
The blue cluster is characterized by its focus on management areas (general management, but also somewhat more technical areas such as data-driven process management and supply chain management in the sense of optimizing and managing supply chains using datadriven approaches). This is also reflected in the frequencies of the assigned research areas. While Business & Economics is the most frequent research area with 104 publications,

3
Computer Science (66 articles), Engineering (66 articles) and Operations Research & Management Science 63 (articles) follow, nearly equally located, with some distance.
If we look at the distribution of the publishing journals, the business management relevance becomes even more pronounced. Almost all TOP journals, with the exception of IEEE Access and Technological Forecasting and Social Change, are purely business management journals. The background of the International Journal of Production Research, the International Journal of Production Economics and the Journal of Manufacturing Systems indicates topics from process and production management.
A look at the TOP publications of this cluster reveals a similar picture. "A multidisciplinary perspective of big data in management research " by Sheng et al. (2017) is a review article on data-driven management and has the highest TLS in the cluster. It tracks scholarly work's perspectives on big data in the management domain over the past decade.
The article suggests that research should use both structured and unstructured big data to promote the understanding of the value of big data as a basis for managerial organizational decisions and to improve the competitiveness of enterprises. However, in order to generate the full value from the data, it is important that companies formulate and implement a datadriven strategy in advance. Further publications with a similar approach on (data-driven) general management are, for instance, from Carillo (2017) The article entitled "Data-driven supply chain capabilities and performance: A resource-based view" by Yu et al. (2018) explores how data-driven supply chains affect the overall performance of enterprises. The authors come to the conclusion that a data-driven supply chain has a significantly positive effect on the financial performance. Another paper deals with the dimensions and sub-dimensions of a "Supply Chain Analytics Capability". The authors model their overall impact on supply chain agility and business performance (Wamba & Akter, 2019). There are several more publications which also investigate different contexts with data-driven supply chains (e. g. Yu, Jabobs, Chaves and Feng 2019, Chavez et al. 2017).

Data-driven procedures in economics and econometrics (green)
The green cluster has a strong link to economics and econometrics which distinguishes it significantly from the other clusters. If we take a look at the Research Areas, Business & Economics is by far the most frequently one (192 out of 226 articles). The fact that Mathematical Methods in Social Sciences and Mathematics are the second and third most frequent research areas in this cluster suggests a strong mathematical and in particular econometric orientation. A closer look at the journals confirms this perception. Most papers were published in the Journal of Econometrics, followed by Econometric Theory, Journal of Business & Economic Statistics and Empirical Economics.
In the article with the highest TLS of this cluster "A review on variable selection in regression analysis" by Desboulets (2018) several variable selection procedures are examined to provide an overview of the existing literature for practitioners. The article explains which methods should be used for which model purposes and which are the most important differences among them. Besides theoretical and statistical methods, data-driven methods provide an alternative for automatic model selection.
In another paper of this cluster by Qian and Su (2016) the problem of determining the number of structural changes in multiple linear regression models using group-fused lasso is investigated. A method is presented which can correctly determine the unknown number of breaks with a probability of tending to one. Furthermore, a data-driven method for determining the tuning parameter is also proposed and validated.
A data-driven cross-validation approach is the focus of another publication entitled "Nonparametric estimation of conditional quantile functions in the presence of irrelevant covariates" by . The aim is to simultaneously determine the optimal smoothing parameters and to remove irrelevant covariates. For this purpose, the conditional quantile regression function is estimated in advance using the kernel method based on the check function.
Also further publications (e.g., "Methods for nonparametric and semiparametric regressions with endogeneity: A gentle guide" by Chen and Qiu (2016), "Shrinkage estimation of regression models with multiple structural Changes" by Qian and Su (2016) or "Nonparametric/semiparametric estimation and testing of econometric models with data dependent smoothing parameters" by Li and Li (2010)) go in the direction of data-driven nonparametric/semiparametric estimation and testing of econometric models.

The distribution of Research Areas in this cluster indicates a clear focus on Operations
Research & Management. Out of the 104 publications, 84 articles were assigned to this subject category.
Data driven robust optimization in general and the data driven newsvendor problem as a special aspect of it are the main topics covered in this cluster. 6 of the 11 top-ranked publications by TLS deal with data-driven robust optimization, 3 with a data-driven approach to the newsvendor model. 2 In the publication "Data-driven robust optimization" by Bertsimas et al. (2014), a novel approach to the use of data is proposed in order to design uncertainty sets for robust optimization using statistical hypothesis testing. The authors come to the conclusion that datadriven sets significantly outperform traditional robust optimization techniques. A number of other publications also deal with the superiority of data-driven robust optimisation compared to traditional optimization in different contexts (e. g. Esfahani & Kuhn, 2018;Kang et al. 2018;Fernandes et al. 2016;Chen et al. 2018).
Traditionally, the newsvendor problem is solved by assuming a distribution. In reality, however, the decision-maker does not know the true distribution of the demand. The highest ranked paper by TLS in this cluster entitled "A data-driven newsvendor problem: From data to decision" by Huber et al. (2019) presents data-driven solution methods based on machine learning and quantile regression that do not require the assumption of a specific demand distribution. The authors conclude that machine learning approaches are clearly superior to traditional methods, provided the data set is large enough. Also Ban and Rudin (2019) come to the conclusion that single-step machine-learning algorithms are superior for solving the newsvendor problem. Several other publications from this cluster also cover this topic (e.g. Levi et al. 2015;Sachs & Minner, 2014;Saghafian & Tomlin, 2016).

Mapping 2: consideration of subject categories exclusively from the business and economic sciences
In the second mapping, only journals assigned solely to subject categories from business and economic sciences were considered. In order to avoid articles also from other subject categories due the multiple assignment strategy of Web of Science, they were explicitly excluded. 3 This resulted in 694 publications in total, which are exclusively assigned to one of the business and economics subject categories. 4 Since a minimum TLS of 20 was also specified in this mapping for reasons of comparability, there remained only 166 articles in the mapping. 5 They distribute over the WoS-subject categories of business and economic sciences as follows (see Tables 3, 4, Fig. 3): As can be seen, the most apparent difference to the first broader mapping is that two clusters with a stronger technological focus (data-driven predictive maintenance of machine equipment (yellow) and data-driven control and automation processes (turquoise) are no longer included. The remaining clusters have changed, in some cases significantly, in terms of content. A much stronger relation to business and economic sciences is evident.
With 69 included publications, the blue cluster is the largest in this mapping. In terms of content, not much has changed compared to the previous mapping, but the focus is more distinct. The use of data-driven methods in business administration and management in general (7 out of the top-10-publications), and data-driven supply chains (3 out 3 TS = (data-driven) AND WC = ( of the top-10-publications) in particular, form the core of the blue cluster. The analysis of this cluster also shows that top publications from journals that have (also) been assigned to other non-business subject areas (Engineering) and, as a consequence, are not included in this mapping though can have a high significance for business management research. This is the case for the two top publications by Sheng et al. (2017) and Yu et al. (2018) (TLS 421 and 381,respectively) from the previous clustering that are also assigned to the subject areas Engineering and Manufacturing.
13 publications that appeared in the top-20, ranked by TLS, in the previous mapping (without exclusion of the non-business subject categories) are also included in the top-20 publications of this mapping. The overlaps in content are therefore considerable, which is why the individual publications are not discussed in more detail anymore.  The purple cluster is now the second largest with 42 publications. Its analysis demo nstrates the benefits of excluding multiply assigned subject areas which are outside the core interest. 19 of the top-20 publications are assigned to the Web of Science-Categories Management and Operations Research & Management Science. Compared to the previous clustering, many publications were eliminated that had been assigned to the categories Mathematics, Applied and Mathematics with a much stronger mathematical focus. Furthermore, in the previous analysis only half of the top 10 publications had been assigned to the management sector. The content focus and, namely how data-driven operations research analyses are used in business and management, is therefore more clearly recognizable and easier to identify. Now a much stronger focus on business administration/ inventory management is visible, the originally strong mathematical/statistical emphasis is fading into the background. Chen et al. (2019a, b, c), for example, propose a data-driven algorithm that learns continuously from demand data during application and simultaneously supports managers in supply and pricing decisions. The algorithm should result in the company actively experimenting with pricing and inventory to collect demand information with minimal loss of profit. In another publication entitled "Quantile forecasting and data-driven inventory management under nonstationary demand" by Cao and Shen (2019) a data-based approach for the determination of stock levels (in the framework of the newsvendor problem and its multi-periodic extension) is presented. It is shown that the data-driven approach outperforms existing statistical methods and provides good solutions.
In the green cluster, there has also been a significant reduction in the number of publications included (from originally 226 to 31). Because the WoS-categories Social Sciences, Mathematical Methods and Statistics & Probability have been removed, the initially very strong econometric component is almost completely eliminated. This is also underlined by the fact that only 4 of the original Top 20 publications according to TLS are included in the new green cluster. After viewing the top publications, the content can now be described as the use of data-driven processes in economics and finance. Pricing and applied economics are central topics in this cluster. For example, one publication by Byrne et al. (2019) investigates the movement of global yield curve dynamics using a data-driven hierarchical factor model. This data-driven approach is capable of determining the driving forces of yield curve dynamics and producing plausible estimates of term premiums. In another article, the effects of the driving factors for natural gas prices in the period from 1999 to 2017 are examined (Ji et al. 2018). A data-driven approach is used to reveal the simultaneous relationships between natural gas, oil and otherfactors. In contrast to most previous studies, the data-driven approach established a long-term equilibrium relationship between oil and natural gas yields. The publication entitled "Network causality structures among Bitcoin and other financial assets: A directed acyclic graph approach" also uses a datadriven method to reveal the simultaneous and lagged relationships between Bitcoin and other asset classes. The results of the analysis suggest that the Bitcoin market is rather isolated and that no specific asset plays a dominant role in influencing it (Ji et al. 2018a).
The new red cluster consists mainly of those publications that belong to the WoS category Computer Science, Information Systems. They are not assigned to any other Computer Science category but in some cases also to a business and economics subject category. The result is a very heterogeneous cluster, which consists on the one hand of publications with a computer science background, and on the other hand also contains articles with a stronger business and management background. 5 of the TOP-10 publications were published in the journal "Information Sciences", which has a strong focus on informatics. The business publications cover topics ranging from innovative data-driven forecasting models (Luna & Ballini, 2011;Yu et al. 2019a, b) and market segment analysis (Dolnicar and Leisch 2014; Boztug & Reutterer, 2008) to text mining (Lim & Maglio, 2018). It is also noteworthy that the publications are on average considerably older than those in other clusters. 13 of the 24 included articles were published before 2017.

Conclusion
As was shown in our contribution, the multiple assignment of (articles in) journals to subject categories, which is common practice in Web of Science, has a strong impact on science mapping. In the case of data-driven applications and methods in business and economic sciences, six clusters could be found, which can be roughly classified to three business and economics and three technical clusters. More than half of the publications were assigned to more technical disciplines, although the original search was limited to Web of Science subject categories from business and economic sciences. This is due to the fact that journals were also assigned to more technical subject categories, in the case of our study also from engineering sciences.
Therefore, we excluded subject categories outside business and economic sciences explicitly before performing the second mapping. As a result, clusters with a stronger technical focus (data-driven predictive maintenance of machine equipment and data-driven control and automation processes) did not appear in the mapping anymore. Only four clusters with a main focus on business and economic sciences remained: (i) data-driven methods in management in general and data-driven supply chain management in particular, (ii) data-driven operations research analysis with different business administration/management focuses, (iii) data-driven methods and processes in economics and finance (iv) datadriven methods in information systems. These four clusters were partly much smaller than those in the broader mapping. However, in our case they showed a stronger relationship to business and economic sciences since mathematical aspects and core topics in computer science were not covered by them anymore.
It may be concluded that the multiple assignment to subject categories can have a great impact on the result of the mapping. In particular, neighboring topics (clusters) can be identified in the broader mapping which would not have been anticipated otherwise. If one prefers to identify only the core aspects of the research fields under investigation, our case shows that limiting the analysis to the core subject categories of a specific scientific discipline by excluding all categories that do not uniquely relate to it can have a positive impact on the visibility and clarity of the clusters. However, it must be pointed out that these potential achievements may come at the cost that highly relevant publications in terms of content might not be considered because they appear in journals that are also assigned to other Web of Science categories not associated with the scientific discipline under investigation. In our opinion both mappings are beneficial. It mainly depends on the focus of the readers which mapping they prefer. Readers with a broader focus will prefer the first mapping. However, if they have a narrower view and are primarily interested in the core aspects of a certain topic, they will find the second mapping more relevant.
Funding Open access funding provided by University of Graz.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.