Introduction

Entrepreneurship is considered a universal tool for economic growth, job creation, and expanding economic opportunities for countries (Krysko, 2022). The relevance of entrepreneurship in the economy and society has been analysed in depth and has attracted the attention of academics, researchers and practitioners. For instance, Santos (2012) highlights that “entrepreneurship is having profound implications in the economic system: creating new industries, validating new business models, and allocating resources to neglected societal problems” (p.335).

Entrepreneurship has become a critical area of interdisciplinary education, focusing on value creation and the ability to assume high levels of risk and uncertainty (Obschonka, 2017). Tan et al. (2005) define entrepreneurship as “the process of attempting to make business profits by innovation in the face of risks” (p.357). These risks and the difficulties of being an entrepreneur stem from the need to efficiently manage many areas of knowledge, such as marketing, management, business strategies, and leadership (Bresciani et al., 2021). One of the factors that can help mitigate these risks for entrepreneurs is the quantity and quality of available data, which is considered one of the most essential company assets (Albergaria & Chiappetta Jabbour, 2020; Kozjek et al., 2018). In this area, Guéneau et al. (2022) stress the importance of information and knowledge flows and of a collaborative environment in fostering entrepreneurial dynamics.

Along with entrepreneurship, the phenomenon of digitalisation, which has brought about a revolution and has had a positive impact on businesses, should also be considered (Yildirim & Erdil, 2024). Digital technology promotes the output of business production by reducing costs and improving efficiency and innovation (Zhang et al., 2022). Digitalisation has made the use of data essential for companies. This impact of data on business has been deeply analysed in the literature (Casson, 2005). It is even more critical in entrepreneurial environments, where information about the specific characteristics of any new business is not static and is often uncertain and insufficient (Eckhardt et al., 2018).

The capacity and availability of data have grown exponentially, giving rise to the term Big Data (Sagiroglu & Sinanc, 2013), which refers both to the explosion of data and to the tools used to analyse vast amounts of it. In this line of thinking, many authors highlight the relevance of managing extensive, high-quality data to develop better business strategies and performance (Gaur et al., 2014; Scuotto et al., 2017). According to Sestino et al. (2020), Big Data plays a crucial role in Schumpeterian-level disruption, leading to the conversion of previously immovable and static information into dynamic and transferable resources. In this domain, the quality and breadth of data become a critical factor and, in many cases, the main path to success (Ranjan & Foropon, 2021). An excellent example of this can be found in the work of Du et al. (2023), which demonstrates how information extracted in a Big Data environment helps reduce companies’ operational risks and improve their performance.

Big Data can be understood as extracting information to better support decision-making processes and properly use massive amounts of data and knowledge (Jin et al., 2015; Staegemann et al., 2021). The core features of Big Data are Velocity, Volume and Variety. A vast amount of data (volume) that is generated at high speed (velocity) and is heterogeneous (variety) needs new visualisation, aggregation, normalisation, and storage tools that let companies take advantage of the massive amounts of data collected from the companies themselves, their competitors, customers, etc. (Lull et al., 2022). According to Obschonka and Audretsch (2020), Big Data is gaining importance in broader research fields that are often considered foundational for research on entrepreneurship, such as management (George et al., 2014; Ransbotham et al., 2017), economics (Acemoglu & Restrepo, 2018; Brynjolfsson & McAfee, 2014, 2017; Brynjolfsson et al., 2017; Einav & Levin, 2014) and economic policy (Agrawal et al., 2018, 2019). In this vein, authors have shown the relationship of entrepreneurship with digitalisation (González-Padilla et al., 2023; Lull et al., 2022), with social media (Olanrewaju et al., 2020) and with the internationalisation of firms (Baier-Fuentes et al., 2019). Likewise, this data revolution is also disrupting application fields associated with entrepreneurship, including innovation, industry, and business management (Cockburn et al., 2018). In the words of Obschonka and Audretsch (2020), “AI and Big Data might not only enrich and transform future entrepreneurship research, but they might also transform at least some aspects of the actual real-world phenomena that entrepreneurship researchers usually study when they try to understand determinants and effects of the entrepreneurial process” (p.532).

Furthermore, Big Data has been recognised as creating new opportunities for entrepreneurship and sustainable development, especially in the context of social entrepreneurship (Zulkefly et al., 2021). The emergence of Big Data offers organisations unprecedented opportunities to gain and maintain a competitive advantage, which is crucial in the entrepreneurial landscape (Wiener et al., 2020).

A recent example of the use of Big Data for entrepreneurs is the case of ChatGPT, a large language model chatbot developed by OpenAI that opened to the public on November 30, 2022 (Gordijn & Have, 2023). The model is trained on a massive volume of text data (Big Data) and can learn and give users appropriate responses. How an entrepreneur can use ChatGPT to their advantage is a question that can be posed to the chatbot itself. In the bot’s own words, “ChatGPT is useful for entrepreneurs in many ways because it can do a wide range of tasks and give them a lot of information and ideas”. More specifically, ChatGPT provides several support options for entrepreneurs, such as market research, business planning, customer service assistance, production of business material or training and development programs (Rahaman et al., 2023).

The contribution of Big Data to business development has yet to be thoroughly examined in relation to entrepreneurship (Akhtar et al., 2019). To address this research gap, we conducted a systematic literature review on the concepts of Big Data, entrepreneurship, and their connections in the management field. The bibliometric review is a commonly used method for recognising keywords, their relationships to each other and the citations of the articles published within a defined timeframe (Zhong et al., 2015, 2016).

As mentioned above, numerous studies have delved into Big Data and entrepreneurship individually, but the interconnection between the two concepts still has plenty of potential for further study. Therefore, this paper aims to give an overview of the past, present and future research directions on Big Data and entrepreneurship by performing a systematic bibliometric analysis of the research articles that explore both topics. The main value of this study lies in an in-depth bibliometric analysis that provides information on the scientific production (quantity, quality and impact) on Big Data and entrepreneurship across different journals, authors and countries, and that can serve as a theoretical basis for future researchers when formulating hypotheses and selecting variables for quantitative models.

To achieve this purpose, the paper is organised as follows. The next section provides a theoretical background of Big Data, entrepreneurship, and the connection between the two. Subsequently, the research methodology adopted for the systematic literature review is presented. The study is based on 472 articles from the Web of Science (WoS) and 168 articles from Scopus, which yielded a list of 301 articles after manual screening. Afterwards, we illustrate the significant findings of the review, analyse research topics, data sources, and data collection methods, and provide clear reporting and visualisation of the main results. The importance of the findings is then discussed. Finally, we highlight significant conclusions.

Theoretical background

In 2021, the overall amount of data generated worldwide was estimated to be around 79 zettabytes; by 2025, this amount is expected to double. Managing this volume of data efficiently and securely has therefore become a real challenge for any type of organisation that uses data regularly, in fields such as business, scientific research, health, architecture and engineering, among others.

Therefore, the collection of massive and diverse data sets that require advanced techniques and technologies to capture, store, distribute, manage, and analyse information can be conceptualised as Big Data (Gandomi & Haider, 2015). Although the concept of Big Data may still be ambiguous and lacks consensus among researchers, a close study of the existing notions of the term shows that its definition can be centred on the three factors that appear most often in the academic and business literature: Information, Technologies and Business Impact.

The adequate use of information in companies is a critical matter, even more so in recent decades, when organisations manage vast amounts of information on their customers, suppliers, operations, or industrial machines. Information management is critical, and technologies that enable storing and processing large volumes of data through powerful computational processing are relevant in a Big Data context. In this sense, the Big Data phenomenon is driven by technological advances and the exponentially increasing number of devices connected to the Internet.

Even though the concept of Big Data has been explored from the three points of view previously mentioned, there are many more studies based on the informational and technical aspects than on the business impact. The explosion of Big Data has transformed how organisations manage and effectively compete (Gangwar, 2018). In this vein, Big Data can be understood as a revolution for decision-making processes, increasing organisational performance and generating new competitive advantages (Davenport, 2014; Raguseo, 2018). In the same direction, the use of Big Data can positively impact the management of organisations in terms of improving customer service, identifying new products and services, and enhancing new strategic directions (Gangwar, 2018). In other words, “the new business challenges in the B2B sector are determined by connected ecosystems, where data-driven decision-making is crucial for successful strategies” (Saura et al., 2021, p.161).

Further specifying how the efficient management of the Big Data environment can help companies improve their business, many authors have stated in which specific part of the business processes the use of Big Data techniques can be efficient. Lee (2017), for instance, establishes six technical and managerial processes in which Big Data development positively impacts management. Șerban (2017) analyses how companies need to manage Big Data to their business advantage and how it impacts sustainability to grow and profit while benefiting society. Other authors specify the positive impact of Big Data on the improvement of the company in financial terms, such as Begenau et al. (2018), who argue that the efficient use of Big Data analysis could improve investors’ forecasts and reduce equity uncertainty, enabling large firms to grow larger. In the same vein, Hasan et al. (2020) show how Big Data influences different business sectors, more specifically in the financial sector, and Raguseo et al. (2020) provide empirical evidence of the positive business impact of Big Data analytics on firm profitability. It is precisely in these terms of improving business performance that entrepreneurship acquires particular relevance (Faisha et al., 2024).

Entrepreneurship has been deeply analysed in recent decades, and several papers have defined it and highlighted its importance (Minniti & Lévesque, 2010; Chaston & Scott, 2012; Hameed & Irfan, 2019; Vladasel et al., 2021). More concretely, many authors have advanced the hypothesis that an entrepreneurial environment can improve business performance. Wiklund and Shepherd (2005) suggest that an entrepreneurial orientation can positively impact small business performance. Examining the relationship between entrepreneurship, access to capital, and environmental dynamism, Georgellis et al. (2000) investigate how entrepreneurial behaviour, characterised by innovation and the willingness to take risks, could positively affect business performance. Moreover, they propose entrepreneurship as one of the three main competencies that predict business performance. Similarly, Haltiwanger (2022) discusses the impact of entrepreneurship on business performance, connecting entrepreneurship with the launch of start-ups. Finally, with a very similar point of view, Teruel-Sánchez et al. (2021) highlight the identification of factors that positively influence business performance as one of the most critical entrepreneurial capacities.

As with Big Data, the concept of entrepreneurship admits a wide range of possible orientations. In this sense, the recent literature connects entrepreneurship with digitalisation (González-Padilla et al., 2023), social media (Olanrewaju et al., 2020; Wilk et al., 2021), internationalisation of firms (Baier-Fuentes et al., 2019), ecological sustainability (Zeng, 2018) or pro-activeness, risk-taking and innovativeness (Davis et al., 1991; Knight, 2000). These findings suggest that Big Data and Entrepreneurship influence each other and are connected, but that they have been poorly explored as a whole. This article aims to fill this gap by showing an overall picture of the crossroads between Big Data and Entrepreneurship. The next section describes the methodology of the work.

Methodology

We conducted a Systematic Literature Review following the PRISMA approach to assess the existing link between Big Data and entrepreneurship, as well as the different interactions (Page et al., 2021). Academic articles indexed in the Web of Science (WoS) and Scopus databases were selected and processed. Afterwards, we observed different aspects of knowledge we could extract from the available information.

Data sampling

The most comprehensive comparison between WoS and Scopus was performed by Vieira and Gomes (2009). They found that WoS was a more curated repository of academic articles and was more focused on English-language journals. Scopus, on the other hand, was found to have articles with fewer citations and a higher number of journals in specific fields, such as Health. Scopus was also found to cover a wider range of journal languages.

More recent studies partially confirm the findings of Vieira and Gomes (Singh et al., 2021; Waltman, 2016). This is essential, since the study by Vieira and Gomes could be deemed old, especially considering that the Scopus database began in 2004, only five years before their study was published.

We decided to include academic articles from both WoS and Scopus in our literature review by searching for the terms Big Data and entrepreneurship. Initially, we performed a search on WoS for records that contained both the topics “Big Data” and “entrepreneur*” (in the latter term, the asterisk means that all words starting with entrepreneur, such as entrepreneurial or entrepreneurship, were included). A list was created in WoS with the results, containing 472 scientific papers.

Another search was performed in Scopus. All the articles that included both “Big Data” and “entrepreneur*” as keywords were selected. A list of the 168 resulting papers was created. All searches, as well as the final compilation of the list, were conducted in May 2023.

Data processing

The whole process of selecting the final papers appears in Fig. 1. The lists were merged into a spreadsheet document with a row per article. The information included the title, the abstract and the keywords, among many other fields. After duplicates were removed, 620 papers remained. Two authors individually assessed whether each article addressed both Big Data and entrepreneurship, voting on whether the article was relevant and dealt with the topics. There were three voting options: the article should be included, the article should not be included, or it was unclear whether the article should be included. In order to vote, the authors reviewed the title, abstract, and keywords of each article.
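The merge and de-duplication step described above could be sketched as follows. This is only an illustration under stated assumptions, not the procedure the authors used: the paper does not say how duplicates were detected, so matching on DOI (falling back to a normalised title) is an assumption, and the names `normalise_title`, `record_key` and `merge_records` as well as the sample records are hypothetical.

```python
import re


def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def record_key(record: dict) -> str:
    """Assumed matching rule: use the DOI when present, otherwise the title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return f"doi:{doi}"
    return f"title:{normalise_title(record.get('title', ''))}"


def merge_records(*sources):
    """Merge several exported lists, keeping the first record seen for each key."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record_key(record), record)
    return list(merged.values())


# Hypothetical exports (not real records from the review).
wos = [
    {"title": "Big Data and Entrepreneurship", "doi": "10.1/ABC"},
    {"title": "Paper two", "doi": ""},
]
scopus = [
    {"title": "Big data & entrepreneurship", "doi": "10.1/abc"},
    {"title": "Paper Two!"},
    {"title": "Paper three"},
]
print(len(merge_records(wos, scopus)))  # 3
```

A rule like this misses a duplicate when only one of the two exports carries the DOI, which is one reason a manual check of the merged spreadsheet remains useful.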

Fig. 1 Selection process of articles for the review

After each author had reviewed and classified the papers, those considered suitable by both researchers were added directly to the final list (275 articles), while those deemed unsuitable by both were discarded (137 articles). The remaining 208 articles, which had been considered suitable by only one of the authors or had raised doubts about their classification, were jointly reviewed by both authors, and a final vote was cast on whether to select each paper. A final list of 382 articles was accepted.
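The two-reviewer screening rule can be expressed compactly. The sketch below is a minimal illustration rather than the authors' tooling: the vote labels and the names `triage` and `summarise` are assumptions introduced here, and the sample votes are made up.

```python
from collections import Counter

# Hypothetical labels for the three voting options described in the text.
INCLUDE, EXCLUDE, UNCLEAR = "include", "exclude", "unclear"


def triage(vote_a: str, vote_b: str) -> str:
    """Route one article based on two independent reviewer votes."""
    if vote_a == INCLUDE and vote_b == INCLUDE:
        return "accepted"       # suitable according to both reviewers
    if vote_a == EXCLUDE and vote_b == EXCLUDE:
        return "discarded"      # unsuitable according to both reviewers
    return "joint_review"       # disagreement or doubt: decided together


def summarise(votes):
    """Count how many articles follow each route."""
    return Counter(triage(a, b) for a, b in votes)


# Made-up votes for four articles.
sample = [
    (INCLUDE, INCLUDE),
    (EXCLUDE, EXCLUDE),
    (INCLUDE, UNCLEAR),
    (UNCLEAR, UNCLEAR),
]
print(dict(summarise(sample)))
```

With the counts reported in the text, the three routes add up to the de-duplicated total: 275 + 137 + 208 = 620.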

The initial WoS list of 472 articles was reduced by removing the papers that had not been accepted. The selected papers from Scopus were then added to the list. Some of the accepted articles, specifically 81 papers, were not available in WoS. However, the information about each article in WoS was more extensive and complete, including Keywords Plus (Garfield & Sher, 1993), which are keywords generated automatically from the references that an article cites. Furthermore, when the authors randomly searched for articles missing from WoS, they found that some had been retracted. Thus, the list of 301 papers available in WoS was used as the basis for the literature review. The process of the selection of publications can be seen in Fig. 1.

The final list was exported from WoS and then imported into the Bibliometrix software (Aria & Cuccurullo, 2017); the general information can be seen in Table 1. The 301 articles presented no missing information about the title, authors, journals, publication year, and total citations. Fewer than 5% of articles had no abstract (specifically, eight papers), no corresponding author (nine papers) or no cited references (five papers). In addition, 19% of papers did not include the DOI, 20% lacked keywords, and 34% lacked Keywords Plus.

Table 1 Descriptive analysis, showing the main information about the articles

The most relevant articles, sorted by the number of global and local citations, were examined in depth according to their purpose and main concepts. Specifically, the formula used to compute the relevance index and to weigh the importance of the documents was:

$$ \frac{1 + LC}{GC} \cdot \frac{GC}{1 + YSP} $$

where LC is the number of citations received among the selected documents; GC is the number of citations received in general; YSP accounts for the number of years since the document was published.

In this way, both local and global citations were taken into account. The formula also offsets the advantage that older documents have from accumulating citations over a longer period.
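As a worked illustration of this weighting, the sketch below evaluates the index for made-up inputs. It assumes the formula is read as the product of (1 + LC)/GC and GC/(1 + YSP); the function name `relevance_index` and the example values are hypothetical and are not taken from the reviewed dataset. The two factors are kept separate to mirror the text, even though they could be simplified.

```python
def relevance_index(lc: int, gc: int, ysp: float) -> float:
    """Relevance index under the assumed reading of the formula.

    lc:  citations received among the selected documents (local)
    gc:  citations received in general (global)
    ysp: years since the document was published
    """
    if gc <= 0:
        # The ratio (1 + LC) / GC is undefined when GC is zero.
        raise ValueError("gc must be positive")
    if ysp < 0:
        raise ValueError("ysp must not be negative")
    local_share = (1 + lc) / gc      # first factor of the formula
    age_adjusted = gc / (1 + ysp)    # second factor of the formula
    return local_share * age_adjusted


# Made-up document: 6 local citations, 100 global citations, published 2 years ago.
print(f"{relevance_index(lc=6, gc=100, ysp=2):.2f}")  # 2.33
```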

Results

General information

The timespan was from 2013 until May 2023, with 224 document sources (e.g., journal, book, proceedings). The annual growth rate during those years was 27.1%. The growth for each year may be seen in Fig. 2. The average number of citations per document was 12.8. The rest of the relevant general information about the dataset may be seen in Table 1. In Table 2, the top ten journals in terms of published papers are displayed. The journal with the highest number of publications about Big Data and entrepreneurship was Technological Forecasting and Social Change, with 11 articles, followed by the Journal of Business Research and Mobile Information Systems, with seven articles each. Four journals had five articles published each. Mathematical Problems in Engineering had four articles. Six journals had three articles. Thirty-four journals had two articles, and 193 accounted for one article each.

Table 2 Journals with the highest number of documents
Fig. 2 Documents published per year

Most relevant papers: in-depth analysis

The top 30 most relevant documents, as defined in the Methodology section, are presented in Table 3, with in-depth information about their purpose and main concepts. For example, Ciampi et al. (2021) was the document with the highest relevance: it had received more than 100 global citations and 6 local citations since its publication.

Table 3 Most relevant documents in the literature review

Impact and citations across journals

Technological Forecasting and Social Change had the highest h-index, with a value of 11. However, it had only 213 local citations, fewer than the Journal of Business Research (h-index 4), which had the highest number of local citations (344), and the Strategic Management Journal (242). The distribution of local citations per source was as follows: 13 sources had at least 100 local citations; 19 had between 50 and 100; 39 had between 25 and 50; and 6334 had fewer than 25. The ten most locally cited sources can be seen in Table 4.
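For readers less familiar with the h-index mentioned above, a source has an index of h when h of its documents have received at least h citations each. The following minimal sketch shows the calculation; the function name `h_index` and the citation counts are illustrative assumptions, not values from the dataset.

```python
def h_index(citations):
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank        # the top `rank` documents all have >= rank citations
        else:
            break           # later documents have even fewer citations
    return h


# Made-up citation counts for the documents of one journal.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

This also shows why a journal with many moderately cited documents can reach a high h-index while another journal, with a few very highly cited documents, collects more citations overall.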

Table 4 Sources with the most local citations

Relevant authors and their countries

Table 5 shows the top ten most relevant authors according to the number of articles and fractionalised articles (i.e., divided by the number of authors per document). As can be seen in the table, Dr Martin Obschonka was the author with the highest number of local citations and the one with the highest fractionalised citations among the authors with three articles. In general, 91 authors were cited at least once, while 648 received no citations.
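Fractionalised counting, as described above, credits each author with 1/n of a document written by n authors. The sketch below illustrates the idea under that reading; the function name `author_counts` and the author lists are hypothetical and do not reproduce Table 5.

```python
from collections import defaultdict


def author_counts(documents):
    """Return (full, fractional) article counts per author.

    documents: iterable of author-name lists, one list per document.
    """
    full = defaultdict(int)
    fractional = defaultdict(float)
    for authors in documents:
        unique = list(dict.fromkeys(authors))   # drop repeated names, keep order
        if not unique:
            continue                            # skip documents without authors
        share = 1 / len(unique)                 # each author's share of the document
        for name in unique:
            full[name] += 1
            fractional[name] += share
    return dict(full), dict(fractional)


# Made-up corpus: author A signs alone, with one co-author, and with three co-authors.
docs = [["A"], ["A", "B"], ["A", "B", "C", "D"]]
full, fractional = author_counts(docs)
print(full["A"], fractional["A"])  # 3 1.75
```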

Table 5 Most relevant authors, according to the number of articles and fractionalised frequency. The local citation count for each author is also shown

Regarding authors’ countries, those with the highest production over time, as can be seen in Fig. 3 and Table 6, were China (173 articles), the U.S.A. (65 articles), Italy (57 articles), the United Kingdom (46 articles), India (20 articles) and Russia (20 articles). The total number of citations per country follows a different pattern: the most cited country is Italy, with 582 citations, followed by the United Kingdom (437 citations), the U.S.A. (428 citations), Cyprus (414 citations) and Australia (240 citations). China ranked sixth, with 226 citations.

Table 6 Most cited countries
Fig. 3 Top 5 highest publishing countries, accumulated publications

The corresponding authors’ countries were also examined. Even though this information overlapped considerably with the previous data about countries, it gave us additional insights. Specifically, the corresponding authors’ countries were, in descending order of the number of documents: China, Italy, the U.S.A., the U.K., Australia, Russia and Germany, as seen in Fig. 4. Most importantly, we know the number of publications with at least one co-author from another country. Thus, for each corresponding author’s country, we can state how much collaboration there was with other countries. This is shown in Table 7. The ratio of multiple-country documents to single-country documents showed a very different result. Among the top 10 corresponding authors’ countries, those with the highest share of multiple-country documents were, in order, France, Australia, the United Kingdom and India, with at least half of their publications involving more than one country. Then came, also ordered from higher to lower collaboration across countries, Italy, Germany, Russia, the U.S.A., Spain, and finally China, where around 11% of publications were collaborations between countries.
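The collaboration measure discussed above can be illustrated with a short sketch. It assumes that a document counts as a Multiple Country Publication (MCP) when its authors come from more than one country and as a Single Country Publication (SCP) otherwise; the function name `collaboration_by_country` and the sample records are hypothetical, not data from Table 7.

```python
from collections import defaultdict


def collaboration_by_country(documents):
    """Tally SCP and MCP documents per corresponding author's country.

    documents: iterable of (corresponding_country, all_author_countries) pairs.
    """
    tally = defaultdict(lambda: {"SCP": 0, "MCP": 0})
    for corresponding, countries in documents:
        kind = "MCP" if len(set(countries)) > 1 else "SCP"
        tally[corresponding][kind] += 1
    return {
        country: {
            **counts,
            # Share of this country's documents written with other countries.
            "MCP_ratio": counts["MCP"] / (counts["SCP"] + counts["MCP"]),
        }
        for country, counts in tally.items()
    }


# Made-up records: (corresponding author's country, countries of all authors).
records = [
    ("China", ["China"]),
    ("China", ["China", "Italy"]),
    ("China", ["China"]),
    ("China", ["China", "China"]),
    ("France", ["France", "Italy"]),
]
print(collaboration_by_country(records))
```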

Table 7 Corresponding authors’ countries. Across country collaboration index
Fig. 4 Corresponding authors’ countries. SCP: Single Country Publication. MCP: Multiple Country Publication

Interestingly, the country with the highest average number of citations per document was Cyprus (with 420 citations), since Dr. Spyros Makridakis, who is a Cypriot, was the author with the highest (global) citation figure (see Table 8).

Table 8 Most cited articles

An overview of the most relevant authors

As already mentioned, Obschonka was the author with the highest number of articles and local citations. The first article to be published (Obschonka, 2017) dealt with how psychological Big Data (e.g., behavioural data based on millions of subjects) lets researchers pose new questions about entrepreneurship, e.g., regional variations in the personality traits that lead to a stronger or weaker entrepreneurial mindset. The second piece of research (Obschonka et al., 2020) was practical, applying Machine Learning to publicly available social media Big Data to study how personality traits affected U.S. entrepreneurship regionally. The third article (Obschonka & Audretsch, 2020) may be seen as an introduction to a Special Issue, in which many questions are proposed about how Big Data may impact entrepreneurship research, based on readily available conceptual work by different researchers.

Makridakis had the highest number of citations from publications outside our database (i.e., global citations). Compared to Obschonka’s aggregated 121 global citations, Makridakis received more than 400 citations for one single piece of research (Makridakis, 2017). We could thus state that Makridakis’s work is generally well established. Its topic is also broader, since it reviews how Artificial Intelligence and Big Data may impact society and firms in the near future. This broader perspective would explain why the author’s ratio of local citations over global citations is lower than Obschonka’s.

Conceptual structure: factorial analysis of keywords

A Multiple Correspondence Analysis was performed on the keywords defined by the authors. The aim of this type of Factorial Analysis is to show the relationship between words in the selected documents by finding dimensions that separate keywords or bring them closer together depending on their proximity. The more articles two keywords share, the nearer they will appear. This can be represented graphically. Specifically, the representation for our case can be seen in Fig. 5. Five clusters that show different fields have been identified. Even though the clusters are not clear cut, they mainly include (1) the use of Big Data by academics to help future entrepreneurs find business opportunities that may not be covered at the moment; (2) the optimisation of logistics processes; (3) social applications arising from new opportunities offered by data, such as open data; (4) Big Data as a source for collaborative work; and, finally, (5) a body of research that focuses on novel business models centred on Big Data that entrepreneurs can exploit.
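The notion of proximity described above rests on how often two keywords appear in the same article. The sketch below only counts those co-occurrences; it is not the factorial analysis itself, nor the implementation used by Bibliometrix. The function name `keyword_cooccurrence` and the keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count, for every keyword pair, the number of documents containing both."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Normalise case and whitespace so spelling variants are merged.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        pairs.update(combinations(unique, 2))
    return pairs


# Made-up author keywords for three documents.
corpus = [
    ["Big Data", "Entrepreneurship"],
    ["big data", "entrepreneurship", "innovation"],
    ["Innovation"],
]
counts = keyword_cooccurrence(corpus)
print(counts[("big data", "entrepreneurship")])  # 2
```

Pairs with higher counts correspond to keywords that a factorial map would tend to place closer together.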

Fig. 5 Conceptual structure map with author-defined keywords

Some documents were representative of the different clusters. Specifically, the article with the most citations in cluster 1 was Frizzo-Barker J (2020), Int. J. Inform. Manage. (Frizzo-Barker et al., 2020) with 166 citations. It consisted of a Systematic Review of the implications of blockchain in the business landscape. The papers that contributed the most in cluster 1 were Huang XM, (2022), Front. Psychol.; Zeng J, (2019), Manage. Decis.; Wang Y, (2023), J. Enterp. Inf. Manag.; Wang TD, (2022), Front. Psychol.; Ciampi F, (2021), J. Bus. Res.; and Lin CC, (2019), J. Bus. Res.

Cluster 2, which included keywords such as firm performance, dynamic capabilities, data analytics, decision-making, and entrepreneurial orientation, was represented by one single document, Akhtar P, (2019), Brit. J. Manage.

Regarding citations, cluster 3 (comprising keywords such as Data Science, Information Technology, Future and Framework) was led by Dubey et al. (2020). This document was among the ones that contributed the most to the cluster, along with Vitari C, (2020), Int. J. Prod. Res.; and Kummitha RKR, (2019), Technol. Forecast. Soc. Finally, clusters 4 and 5 did not have representative documents.

Conceptual structure: strategic themes

A map with the most strategic themes in the field is shown in Fig. 6. This map was created by automatically classifying author keywords, exposing the main themes, and classifying them in a quadrant with four regions (Cobo et al., 2011).

Fig. 6 Thematic map

Basic themes mainly included the most significant cluster, with the theme around Big Data, entrepreneurship, Artificial Intelligence, and innovation as the most representative keywords. Other basic themes included the green cluster around innovation performance, the pink cluster with text mining and SMEs, and two less dense clusters (the purple cluster about intellectual capital and the lilac one about environmental uncertainty).

Motor themes included a red cluster with digital transformation, digitalisation, collaboration, digital marketing, digital technologies, etc.; a grey cluster with the keywords ICT, smart cities, entrepreneurial ecosystems and social capital, among others; a brown cluster with the keywords entrepreneurial orientation, Big Data analytics capabilities, digital technology, dynamic capabilities, etc.; a pink cluster with the keywords education, research, supply chain management and additive manufacturing; a blue cluster with data mining, analytics, case study, data analytics, open innovation, sentiment analysis, etc.; and an aquamarine cluster that includes digital economy and digital platforms.

The niche themes included a red cluster with Big Data analytics capability, contingency theory, financial performance, market performance and resource-based view; a blue one with international entrepreneurship, Business Intelligence, international business and international management; a dark green cluster with college students, Big Data era and social network.

In the middle, between the motor and the niche themes, with the highest level of relevance, there was a pale green cluster with personal data, privacy, quantified self and surveillance.

Finally, in the emerging or declining themes quadrant, the most representative cluster was the one that combined the themes of innovation and entrepreneurship, which showed a decline in interest over time. A pale purple cluster represented a highly relevant topic, fuzzy set Qualitative Comparative Analysis, which has been an emerging theme during the last few years.

Discussion

In the ever-evolving business and technology landscape, the interplay between Big Data and entrepreneurship has become a focal point of discussion and exploration (Obschonka & Audretsch, 2020; Prüfer & Prüfer, 2020; Makridakis, 2017). The integration of Big Data analytics into entrepreneurial ventures has ushered in a new era of opportunities and challenges. This discussion delves into the intricate relationship between Big Data and entrepreneurship, as pointed out by the present review of the existing literature. We will examine how each influences the other and the broader implications for the business ecosystem.

One of the most prominent impacts of Big Data on entrepreneurship is its role as a catalyst for innovation, fostering a culture of continuous improvement and adaptation within entrepreneurial ventures. In this sense, Big Data can be seen as part of the organisational dynamic capabilities (Lin & Kunnathur, 2019). Specifically, entrepreneurs can leverage Big Data analytics to identify gaps in the market, emerging trends, opportunities to improve operational performance and unmet consumer needs (Ciampi et al., 2021; Dubey et al., 2020; Gnizy, 2019). This, in turn, facilitates the development of innovative products and services that have the potential to disrupt existing markets, through open innovation among other means (Saura et al., 2023). The agility to adapt and innovate becomes a defining factor for entrepreneurial success in an environment driven by Big Data (Ranjan & Foropon, 2021).

Another key element in the Big Data environment is detecting entrepreneurial traits. Some research papers have focused on detecting these traits through Big Data (Obschonka et al., 2020; von Bloh et al., 2020). Both studies drew on the vast amount of information available in social media and news to detect regional differences in entrepreneurial qualities, although the results were not conclusive in the case of von Bloh et al. (2020). Obschonka et al. (2020), however, were able to find relationships between the personality estimates obtained through 1.5 billion tweets and the entrepreneurial personality profiles. This suggests that information on entrepreneurial traits deduced from Big Data could be as potent as that obtained through questionnaires.

The vast amount of data generated and collected enables entrepreneurs to make informed, data-driven decisions, thus transforming the decision-making processes (Trabucchi & Buganza, 2019). This not only minimises risks but also enhances the precision of strategic planning. Entrepreneurs can harness insights from customer behaviour, market trends, and competitor analysis to pivot and adapt their business strategies dynamically.

Entrepreneurial ventures thrive when they understand their customers’ needs and preferences. Big Data plays a pivotal role in this aspect by providing a comprehensive view of customer behaviour and preferences (Mariani & Nambisan, 2021). Through advanced analytics, entrepreneurs can tailor their products and services to meet specific customer demands, leading to increased customer satisfaction and loyalty. The ability to personalise offerings based on data insights is a significant competitive advantage in today’s dynamic business landscape.

While the integration of Big Data offers immense potential for entrepreneurs, it comes with its own set of challenges, particularly in the realm of data security and privacy. Entrepreneurs need to navigate the complexities of handling sensitive customer information responsibly. Data breaches and privacy concerns can significantly impact the reputation of an entrepreneurial venture. Privacy issues that arise when governments use AI strategies provide examples akin to the concerns discussed here (Saura et al., 2022). Striking the right balance between utilising Big Data for business insights and safeguarding customer privacy is an ongoing challenge that entrepreneurs must address. Security and privacy have not been a focal point in the research pieces we studied. Thus, we propose this as an interesting avenue for future research. In fact, many entrepreneurs may unfairly take advantage of opportunities based on irresponsible management of personal data, given that legislation usually moves slower than technological advances.

Conversely, entrepreneurship also influences the development and evolution of Big Data technologies. The demand for innovative solutions prompts entrepreneurs to invest in creating tools and platforms that facilitate the collection, storage, and analysis of large datasets. This reciprocal relationship between entrepreneurship and Big Data technologies contributes to a cycle of continuous improvement and evolution, with entrepreneurs driving the demand for more sophisticated data-driven tools. In the literature, this topic has not gained traction, since large companies or communities are the ones that readily attract the attention of both media and academia.

The intersection of Big Data and entrepreneurship fosters collaborative ecosystems where knowledge sharing becomes a cornerstone of success. One interesting paper (Elia et al., 2020) proposes this, based on a real case of a multi-national company that established a Virtual Brand Community. Entrepreneurs can benefit from shared insights, industry benchmarks, and best practices derived from Big Data analytics. This collaborative spirit creates a supportive environment where emerging entrepreneurs can learn from the experiences of others and adapt their strategies accordingly, ultimately contributing to the overall growth of the entrepreneurial ecosystem.

To sum up, the symbiotic relationship between Big Data and entrepreneurship is reshaping the business landscape. The infusion of data-driven decision-making, innovation, and personalised customer experiences empowers entrepreneurs to navigate an increasingly complex and competitive market. However, challenges such as data security and privacy underscore the need for responsible utilisation of Big Data. As entrepreneurs continue to shape and be shaped by Big Data technologies, the ongoing dialogue between these two realms will undoubtedly define the future of business and innovation.

Conclusions

This study presented a review that identified 301 academic contributions addressing both Big Data and entrepreneurship. We analysed those documents in depth and individually studied the most relevant ones, showing their main contributions and purpose. Systematic bibliometric tools were applied to extract knowledge about the research corpus.

We discovered that some authors stand out, such as Obschonka and Makridakis, with Obschonka’s influence being more locally focused and Makridakis earning the highest global citations. Country-wise, we found that a global competition is apparent, particularly between two leading economies, China and the United States of America, which have the highest number of publications. We found that the UK leads international collaborations across countries, and Italian and British authors receive the most citations.

Theoretical implications

The results of the current study carry the following theoretical implications. The conceptual structure showed the most representative previous and basic research themes, focused on Big Data, innovation, Artificial Intelligence, and entrepreneurship. These themes play a pivotal role in entrepreneurial endeavours, with current examples such as ChatGPT embodying the synergy of these concepts. We detected that certain scholars draw a clear distinction between social and economic impacts, considering them as entirely separate entities. We found that the innovation and entrepreneurship theme showed an apparent decline, particularly in 2016–2020, while fuzzy-set Qualitative Comparative Analysis appeared as an emerging theme. Thus, at a theoretical level, forthcoming research can employ the methodology outlined in this study as a foundation for fresh concepts in knowledge generation and extraction of insights from the confluence of topics such as Big Data, innovation and Artificial Intelligence with entrepreneurship.

Furthermore, the themes uncovered in this study can be translated into operationalized variables and investigated using quantitative models aiming for statistical significance. Therefore, while this study is mainly exploratory, it lays the groundwork for future quantitative research on Big Data and entrepreneurship. Our focus was not hypothesis testing, but rather the identification of variables to formulate future empirical hypotheses. Moreover, the issues delineated in this study can serve as a reference point for future research efforts, which may involve building statistical models and substantiating theoretical variables related to Big Data and entrepreneurship. With these variables at their disposal, fellow researchers can leverage the insights gleaned from this study to formulate research questions and objectives, corroborate hypotheses, or design questionnaires.

Finally, an interesting gap was found: how entrepreneurship affects Big Data at a technical level is a field that has not been explored. This field is of great interest for academics, since Big Data advances because innovators and entrepreneurs need new tools that can analyse more unstructured data, from different sources, at a higher rate.

Practical implications

The findings of this study are highly practical. Big Data fosters companies’ competitiveness, but it requires deep strategic changes. Businesses need an entrepreneurial orientation so that they can take advantage of Big Data capabilities. Big Data can also help entrepreneurs detect new opportunities. Furthermore, Big Data may be used as a source of competitive advantage and as part of the business model.

There is also wide evidence that Big Data is effectively employed to analyse the most sought-after entrepreneurial characteristics. Universities and Business Schools are already leveraging this to shape their curricula, ensuring the education of entrepreneurs is effective. This is particularly crucial for educational institutions that are slow to adapt their teaching content, as proficiency in Big Data might soon become a necessity, potentially posing a threat to these institutions.

In a more general vein, the findings provide valuable insights based on the conceptual map. The rising importance of Big Data analytics as a prominent niche theme underscores the increasing significance of data analytics. This emphasises the point that Big Data analytics empowers companies and entrepreneurs to enhance their understanding of information for better decision-making. A notable recurring theme focused on social and sustainable entrepreneurship was detected, indicating its well-established and integral position within the research corpus. This type of entrepreneurship directly impacts society in diverse ways, including areas such as smart cities, urban environments, and technology (Han, 2024). In addition, the results offer valuable insights into the optimal structuring and promotion of information and development, support for entrepreneurial initiatives, the organisation and promotion of ideas, the structuring and organisation of teams, and the role of sustainability and technology within entrepreneurship. In summary, the results offer valuable insights into the relationship between Big Data and entrepreneurial activity, showing both new trends and how society may take advantage of the existing literature.

Limitations and future research

Some limitations are present in this study. For instance, a manual selection had to be made from the 640 documents that passed the search criteria, possibly leading to bias. This was minimised by a blinded selection by two researchers and a subsequent examination of the documents that had not reached an initial consensus. In addition, we found an inherent problem in the corpus: as already hinted at, there appears to be a race for the highest number of publications, creating a prevalence of superficial exploration rather than substantial advances, as suggested by the large number of publications without any citations. This also coincides with the surge in publications during the last few years, as noted by some researchers (Goel & Faria, 2007; Ioannidis et al., 2023). Related to the previous limitation, some authors consider that tools such as ChatGPT will soon provide extensive literature reviews and propose research gaps to be covered (Rana, 2023).

The study suggests that future research should delve deeper into exploring the interplay between entrepreneurship and other disruptive technologies like Artificial Intelligence (AI), Business Analytics, Industry 4.0, and digitalisation. Another field that should be studied is the privacy concerns about Big Data related to entrepreneurship. As previously commented, we detected that certain scholars draw a clear distinction between social and economic impacts, considering them as entirely separate entities. However, an alternative perspective acknowledges that economic impact can be encompassed within the broader social impact category due to its alignment with the mission and overall influence of business. Future research in this domain could beneficially extend its focus beyond discerning the distinctions and similarities between economic and social impacts, aiming to explore a holistic perspective encompassing both dimensions. Finally, this study represents an exploratory inquiry that generated qualitative findings. Consequently, there is a need for subsequent quantitative research to rigorously test the relations unearthed herein.