Introduction

The accumulation of knowledge can be a catalyst that drives nations economic productivity. The relationship between them can emerge when workers’ human capital increases due to the technical training that accompanies the adoption of new technology for creating new products and services. Consequently, enhancing workers’ human capital also generate multiplier effects on innovation, competitiveness, economic development or economic growth (Solarin and Yen 2016; Hatemi-J et al. 2016; Inglesi-Lotz and Pouris 2013; Inglesi-Lotz et al. 2014; Zaccaria et al. 2016). Discussions about the role of scientific productivity in this virtuous circle are already present in the literature that exploits interdisciplinary approaches to understand macroeconomics (Jaffe et al. 2013). Several scholars suggest the use of different indicators to evaluate the impact of scientific research on nations economic performance: the number of published academic papers, the number of academic papers per capita, participation of the papers published by a country in relation to the world publication, or the number of citations that each paper receives (Kumar et al. 2016; Javed and Liu 2018). As a result, empirical evidence shows the adequacy of these indicators to describe scientific productivity and forecast nations economies (Pouris and Pouris 2009; Inglesi-Lotz et al. 2014).

Another non-monetary approach to understand nations production was proposed by Hidalgo and Hausmann (2009) who coined the so-called term “economic complexity index” (ECI). The development of this index relies on the premise that the economic complexity of a country is higher when its exports are more diverse and have fewer competitors (i.e., the authors used the concepts of diversity and ubiquity to differentiate nations’ economies on these two dimensions). This index was intended as an additional measure to traditional ones, such as GDP per capita (Tacchella et al. 2012; Sweet and Maggio 2015; Hausmann and Hidalgo 2011; Felipe et al. 2012). As a matter of fact, after the publication of ECI, new literature emerged to explore the impact of economic complexity on economic outcomes (Hartmann et al. 2017; Brito et al. 2018).

However, the relationship between economic complexity and scientific productivity remains little explored so far. Our goal here is to provide some preliminary analyses to shed lights in filling this gap. In the first place, we direct ourselves to examine the existence of this relationship. Secondly, we carry on to explore which areas of knowledge best explain this link. Finally, we evaluate the stability of this relationship by discriminating nations according to their levels of development. The organization of this paper is as follows: in section “Methodology and data”, we provide the methodological aspects of our analytic technique and the construction of the necessary data to implement this methodology; in section “Results”, we present the results; in section “Discussion” we discuss the implications of our findings and provide further insights for future research.

Methodology and data

In the first place, we performed bivariate analysis to compare the predictive power of the different areas of knowledge on the economic complexity of countries. To this end, we used the following three methods. We analyzed Spearman correlations between different scientific disciplines and economic complexity; then, we used a simple regression model by ordinary least square (OLS); and, with these results, we filtered out those areas that have a moderate to high correlation, have a positive regression coefficient, highly significant and a relatively high coefficient of determination (\(R^{2}\)). We used Clarke’s test for non-nested models (Clarke 2007), which like the F test, is the standard method for selecting models by testing the significance of differences in \(R^{2}\).

Then, we used panel data techniques to further examine the relationship between economic complexity and scientific productivity by exploiting the structure of the data, evaluating the unobserved heterogeneity and the endogeneity present in the variables. We express our specification as follows:

$$\begin{aligned} \hbox {ECI}_{i,t}=\alpha \times \hbox {ECI}_{i,t-1}+\beta \times \hbox {SP}_{i,t}+ X_{i,t}\times \gamma + \mu _{i}+ \delta _{t}+ \epsilon _{i,t} \end{aligned}$$
(1)

where \(\hbox {ECI}_{i,t}\) is the economic complexity index for country i in period t; \(\hbox {SP}_{i,t}\) is scientific productivity; \(X_{i,t}\) is a vector of control variables; \(\mu _{i}\) are the country-specific effects that are not observed and \(\delta _{t}\) includes the temporary effects that have an impact on countries. To deal with these temporal effects, we included a set of time dummies for all the regressions.

We employed two estimation techniques, Pooled OLS and System GMM, to account for alternative treatments of information regarding the type of Scientific Productivity. Despite the difficulties presented by Pooled OLS, we ran this model to contrast the robustness of its results with those of the system GMM estimator. The estimation of Eq. 1 through Pooled OLS faces three problems for its identification. The first problem is the specific characteristics of the countries, which could be correlated with other regressors, making it impossible to estimate the parameters consistently. Panel data models offer the possibility to avoid this problem by treating these individual characteristics as time-invariant and eliminating them through transformations.

The second problem is the endogeneity present in some variables. As usual, the instrumental variables approach can be used. The problem here is the difficulty of finding valid and robust instruments. However, a more general strategy to deal with these problems is to use the difference GMM estimator which was developed by Arellano and Bond (1991). This estimator transforms the variables to eliminate the fixed effects and, later, the endogenous variables are instrumented in levels with the lags of the variables. However, the difference GMM estimator suffers from sample bias when the number of years is small and the dependent variable shows a high degree of persistence (Alonso-Borrego and Arellano 1999). Bond et al. (2001) recommended the System GMM estimator developed by Arellano and Bover (1995) and Blundell and Bond (1998) to obtain more consistent estimates. System GMM estimator uses the lagged values in levels as instruments for the transformed variables in the equation of first differences, as Arellano and Bond (1991) does, but adds the lagged differences to instrumenting the levels of the endogenous variables. This improves the efficiency of the estimates and prevents sample bias. However, the gain of asymptotic efficiency comes with a cost. The number of instruments tends to increase exponentially with the number of periods. This proliferation of instruments can lead to different problems such as a matrix of large estimated variances, downward bias in the standard errors in the two-stage estimators, weakens the over-identification test and over-adjustment of the endogenous variables.

A third problem of estimating Eq. 1 through pooled OLS is that the delayed value of the dependent variable is correlated with the fixed effects on the error term. The independence of \(\hbox {ECI}_{i,t - 1}\) and the error is a necessary condition for the consistency of OLS. If these variables are correlated, then OLS inflates the estimated coefficient of \(\hbox {ECI}_{i,t - 1 }\) by attributing predictive power to it that belongs to the countries’ fixed effect. Besides, if the number of lags is small, then problems of endogeneity arise, and the influence of countries fixed effects could increase. The solution of this third problem is to instrument \(\hbox {ECI}_{i,t - 1 }\) as any other similar endogenous variable and use system GMM as mentioned above (Roodman 2009b).

Following Roodman (2009a), we employed System GMM estimator to estimate the parameters of our models, and to prevent instrument proliferation we followed the rule that the number of instruments does not exceed the number of groups. The consistency of the estimations relies on the fulfillment of the conditions of orthogonality (i.e., the residuals are not serially intercorrelated and the regressors are exogenous). We verified these assumptions by using two tests: the Hansen’s overidentification test, to check the validity of the instruments, and the test for serial correlation AR(2).

To obtain data of economic complexity, we relied on the index proposed by Hidalgo and Hausmann (2009) from MIT’s Observatory of Economic Complexity (https://atlas.media.mit.edu/en/) (Simoes and Hidalgo 2011). As we indicated in the previous section, this indicator uses export data to characterize the products of the countries in terms of diversity and ubiquity. Diversity describes the national production through the breadth of the products spectrum. Ubiquity takes into account the number of competitors of the underlying products. In this sense, the economic complexity of country is higher if the diversity of its products is higher than those of other countries and there are few competitors for these products.

We approximated the scientific productivity of countries through the number of publications in all the scientific disciplines for the period 2003–2014, using data from SCImago Journal and Country Rank (www.scimagojr.com). Initially, we used publications per capita (Jaffe et al. 2013) and the Revealed Comparative Advantages Index (RCA) (Guevara and Mendoza 2013; Guevara et al. 2016). The first is simply the logarithm of the ratio between the number of publications and the population of each country. Meanwhile, RCA is defined as:

$$\begin{aligned} \hbox {RCA}_{i,j}=\frac{x_{i,j}}{X_{i}}\Big{/}\frac{x_{a,j}}{X_{a}} \end{aligned}$$
(2)

where \(x_ {i, j}\) are publications of the country i that publishes in the journal of area j; \(X_{i}\) is the total of publications in country i; \(x_ {a, j}\) is the total of publications of area j of all the countries analyzed a; \(X_{a}\) is the total of publications from all countries a.

The control variables that we used to characterize the countries economic complexity are: (i) patent applications (residents) per capita (Pper) taken from World Bank indicators (https://data.worldbank.org/indicator); (ii) an index of Human Capital (HC), based on years of schooling and returns to education from the Penn World Table (PWT) (www.rug.nl/ggdc/productivity/pwt), version 9.0 (Feenstra et al. 2015; (iii) GDP per capita (GDPper) from PWT; (iv) an institutional indicator, Corruption Perception Index (CPI) taken from Transparency International (www.transparency.org). CPI ranges from 0 (very corrupt) to 100 (highly transparent countries); (v) population size from PWT. Excepting ECI and CPI, all variables are in logarithms.

Results

Table 1, shows the Spearman correlations between ECI and GDP per capita with the scientific productivity (SP) in each discipline for the period 2003–2014, as well as ECI and GDP per capita with the Revealed Comparative Advantage Index (RCA). The correlations varied ostensibly both in sign and magnitude. For example, we observed a positive correlation between ECI and the scientific productivity per capita in disciplines such as Agriculture and Biological Sciences or Energy, and a negative correlation between ECI and the RCA for these same disciplines. Furthermore, the magnitude of the correlations also differed substantially, SP always showed higher correlations with ECI and GDP per capita than those of RCA. Such variations show that RCA is more sensitive than SP to the annual variability of countries and disciplines, which might be due to the failure of the index as either a reliable cardinal or ordinal measure of a country’s revealed comparative advantage (see further details in Yeats 1985).

Table 1 Spearman correlations of scientific productivity and economic complexity index and gross domestic product per capita

By taking \(\rho >= 0.6\) as a cut-off criterion, we found that biology, computer science, chemistry, engineering, and exact sciences were those that best correlate with the economic complexity of countries. Table 2 shows a bivariate OLS regression model, where the dependent variable is ECI and the independent one is each scientific discipline. The results confirm the divergences between SP and RCA both in sign and magnitude, as well as in their statistical significance. Given the exploratory nature of this work, we decided to take as a reference an \(R^2\) cut-off criterion equal to 0.1 for several reasons. First, we noted that a higher cut-off criterion in both SP and RCA (e.g., 0.2 or higher) would end up leading us concluding that few disciplines act as predictors of ECI. Second, the majority of disciplines showed coefficients of determination below 0.2 for RCA indexes, so we set up the threshold in 0.1 so as to preserve consistency with similar previous findings (Jaffe et al. 2013). Finally, we rely on the statements of Breiman and Friedman (1982) who suggested 0.1 as the minimum and reliable recommended cut-off criterion to identify meaningful predictors. Authors as Falk and Miller (1992) consider that an \(R^{2}\) less than 0.1 provides very little information, so the relationships between variables have a very low predictive level. Thus, we found that chemistry, computer science, health sciences, engineering, biology and exact sciences proved to be the best predictors of ECI.

Table 2 Bivariate analysis of scientific productivity- OLS

We selected the disciplines that best explain the economic complexity of countries by identifying their higher associations with ECI in both Spearman correlations and OLS regressions: Biochemistry, genetics and molecular biology, computer science, engineering, environmental sciences, health professions, materials science, mathematics, neuroscience, physics and astronomy, and veterinary. Then, we used Clarke’s test for non-nested models with the aim of determining which of the models is preferable to explain ECI. In Table 3, we found that the best models were those involving biochemistry, engineering, materials science, mathematics, and physics and astronomy (based on Clarke’s test for a \(p \le 0.05\)). In summary, the economic complexity of countries seems to be better explained by the scientific productivity of those areas related to basic and exact sciences and engineering.

Table 3 Tests for non-nested model selection

However, one would expect that richer countries are likely to publish relatively more in specific disciplines compared to poorer ones. Figure 1 shows an analysis of correlations between ECI and scientific productivity for 2014, by discriminating the income level of countries. This figure clearly shows that poorest countries were less complex and made fewer research efforts in publications for basic and exact sciences and engineering. However, exceptional countries also exist. For example, Australia and Japan behaved differently. Australia is highly productive in research outputs but has a relatively low economic complexity. Japan showed the opposite situation with one of the highest indexes of economic complexity and a relative low scientific production (within developed countries). Although Australia has a world-renowned higher education system, a large part of the products traded in this country come from the Asian market (mainly China). Japan has a highly industrialized productive system that allows it to show a relatively high economic complexity index.

Fig. 1
figure 1

a Scattergram of total scientific productivity and ECI, b scattergram of scientific productivity in biochemistry and ECI, c scattergram of scientific productivity in engineering and ECI, d scattergram of scientific productivity in material science and ECI, e scattergram of scientific productivity in mathematics and ECI, f scattergram of scientific productivity in physics and astronomy and ECI

To proceed with a thorough analysis, we performed a pooled OLS regression model that allowed us to control other phenomena. Table 4 shows different specifications alternating the disciplines selected in the bivariate analysis. Columns 1–6 include control variables such as human capital (HC), innovation (Pper), socio-economic (GDPper, POP), and institutional (CPI) conditions. In each model, the scientific productivity was positive and highly significant as a predictor of the economic complexity of countries, with an explained variance ranging from [0.74 \(\le R^2 \le\) 0.77]. Human capital was positive and highly significant to explain the economic complexity of nations. Likewise, richer countries were statistically more complex. Innovation, which we approximated through patents per capita, is positively and highly significant in each of these models. Besides, in some models corruption perception index proved to be non-significant for predicting economic complexity. Finally, population size seems to be an important factor to ECI.

Table 4 Economic complexity (ECI) and Scientific productivity – Pooled OLS regression models

Next, in Table 5 we tackled the problems of unobserved heterogeneity and endogeneity through a System GMM estimator. To avoid the proliferation of instruments (Roodman 2009a), we followed the rule of keeping the number of instruments less than or equal to the number of groups. For different models, the lag of ECI is taken as a predetermined variable, and HC, Pper and GDPper as endogenous variables. In addition, we used the Windmeijer’s correction (Windmeijer 2005) for standard errors of statistical estimates. When we take the total scientific productivity of countries (SP) we see that it explained relatively well their economic complexity (positive sign, \(p-value <0.10\), Hansen test\(> 0.10\) and AR(2) \(> 0.10\)). But, we wonder, which scientific disciplines explain better this relationship? Table 5 shows that Biochemistry, Genetics & Molecular Biology and Materials Science, failed to explain the economic complexity of countries. In addition, surprisingly, control variables ceased to be significant and in some cases presented the wrong sign.

Table 5 Economic complexity (ECI) and Scientific productivity – System GMM

Next, we analyzed if previous results remained unchanged by splitting the sample by countries levels of income and development. In Tables 6 and 7 we used the average per capita income sample as a cut-off criterion to classify countries into low- and high-income nations. Then, we ran a set of Pooled OLS regression models (see Table 6).

Table 6 Economic complexity (ECI) and Scientific productivity—Pooled OLS by Income

For the group of low-income countries, scientific productivity failed to explain ECI, by revealing both positive and negative significant and non-significant relationships. In these nations, as shown in Fig. 1, there was no clear relationship between these two variables. In contrast, for the high-income countries, these relationships remained positive and highly significant, regardless of the discipline. We confirmed these results in a dynamic analysis using system GMM (Table 7). These results showed how the level of income and development of nations associates with notable differences in their economic complexity. Poorest countries need a minimum threshold of scientific productivity to impact their economic complexity significantly. Rich nations, however, enjoy increases in the economic complexity of their production due to increases in scientific productivity of basic sciences and engineering.

Table 7 Economic complexity (ECI) and Scientific productivity—System GMM by Income

Discussion

An examination of the relationship between economic complexity (Hidalgo and Hausmann 2009) and scientific productivity (Jaffe et al. 2013) was the aim of this work. As science and wealth have always been connected (Jaffe 2009; Erfanian and Neto 2017), those analyses that are intended to unveil how this relationship works are not only a well-deserved intellectual motivation but a matter of policy-makers’ interest (Inglesi-Lotz and Pouris 2013; Inglesi-Lotz et al. 2014; Babić et al. 2016; Kutlača et al. 2015; Wong and Fung 2017).

We believe that the relationship between economic complexity and scientific productivity is one of the core elements of the so-called “knowledge-based economy” that uses knowledge to create tangible and intangible values (Nguyen and Pham 2011). However, as this last term is missing in the approaches of economic complexity (Hidalgo and Hausmann 2009) and scientific productivity (Jaffe et al. 2013), few studies discuss its implications in the literature. Nguyen and Pham (2011), for example, have shown the critical role of scientific outputs (i.e., scientific productivity) in knowledge-based economies such as the ones of South East Asian (ASEAN) countries. In these nations, economic and social stability were regarded as potential factors that explain why some countries are more productive than others. By including corruption perception index as a variable that mirrors economic and social (in)stability (Correa and Jaffe 2015), in our analyses, we have shown that these factors are neither related to economic complexity nor the scientific productivity of countries.

What seems to be important for explaining economic complexity is the scientific disciplines where countries compete (Inglesi-Lotz and Pouris 2013). Our results showed that productivity in basic sciences and engineering are the ones with stronger correlations with economic complexity (Jaffe et al. 2013). Arguably, this fact might reflect the so-called “triple helix model” where university-industry-government capabilities interact. This interaction, in turn, stimulate and generate innovation and strategies for economic growth of firms that exploit this infrastructure to create synergistic economic effects (Chung and Park 2014). Here, an illustrative classification of research types is in order. On the one hand, scientific research products are those intended to generate new knowledge, especially the theoretical knowledge that allows the advance of scientific disciplines. We might say that these products are fundamentally produced and funded by universities. On the other hand, research and development (R&D) products aim to solve real-world problems based on well-known scientific principles. These R&D products are produced and funded by industries and government agencies [e.g., projects targeting technology transfer and commercialization purposes (Baek et al. 2018)]. One way of understanding the idea of a triple helix is as a form of general research policies that regulate how scientific and R&D products coexist. Their coexistence, however, not only rely on scientific requirements per se, but necessarily by social and economic factors, and needs and possibilities of each country (Vinkler 2018).

A well-known case that illustrates the idea above is the set of European post-communist countries whose research policy was to increase the number of their research products rather than enhancing the quality of them. According to Jurajda et al. (2017) such policy distracted their limited resources away from internationally more competitive research. Increasing the number of scientific papers in different disciplines is undoubtedly a necessary step for achieving the required diversification that leads nations to scientific competitiveness worldwide, but this diversification works with a large number of citations and a higher effort on R&D projects that require a higher percentage expense concerning countries GDP (Suarez 2014; Cimini et al. 2014; Jaffe 2011).

Our findings are linked with the ideas above when we try to understand the relationship between the economic complexity of nations and their scientific productivity. However, such connection proved to be clear and stable only for high-income countries while low-income economies did not exhibit this same pattern. Arguably, developing countries need to overcome several institutional hurdles (e.g., lack of public and private funding agencies, underdeveloped scientific institutions, insufficient investment on research and development projects) before they can reach the benefits of the triple helix that exists in wealthier nations. Our results should be seen in perspective with those reported by Solarin and Yen (2016) who found that research outputs had a positive impact on economic growth, in both developing and developed countries. It is important to recall that economic growth does not measure the productive knowledge required to develop the capacity to make a larger variety of products of increasing complexity (Hausmann et al. 2012). The scarce scientific productivity of low-income countries can be deemed as a systematic indicator of weak or non-existent productive knowledge.

We believe that future research should focus on the following four interrelated aspects. Firstly, it is important to quantify the proximity of the industrial sector with basic sciences and engineering and compare its proximity to other areas of knowledge. As patents are more common in basic sciences and engineering but rare in disciplines such as economics, psychology, social sciences or liberal arts, it might be possible that ECI is more sensitive to this proximity. In contrast, the productivity of social sciences might impact ECI in a non-trivial manner through, for example, public policies. Examining the indirect effects of social sciences and other disciplines on ECI is a well-deserved venture. Second, it is worth diving deep on the relationship between economic complexity and scientific productivity in low-income economies, by emphasizing the impact of public and private investment on R&D projects. Third, our work should be replicated with a more extended temporal series. For example, by merging the series analyzed by Jaffe and colleagues between 1982 and 2013 (Jaffe et al. 2013), with a more recent series (between 2014 and 2018) further studies could test the robustness and reliability of our results. Finally, in order to deepen our findings, an analysis might be done on the control variables, such as the prevailing human capital of nations, as captured by modern techniques (Laverde-Rojas et al. 2019), or including another pool of institutional metrics such as public spending in R&D (González and Pazó 2008), economic freedom index, innovation index (Jaffe et al. 2013), among others.