1 Introduction

Based on the results of the foresight project SCETIST (Skulimowski 2013) and a Delphi study on future development trends of knowledge platforms performed within the recent Horizon 2020 project MOVING (Köhler and Skulimowski 2019), this paper aims to provide insight into the future of e-science. The focus is on three specific aspects of this perspective: the emergence of new research tools related to global expert systems (GESs), researcher communication with computers through brain–computer interfaces (BCIs), and the role of researchers in shaping the holistic knowledge development systems that will emerge over the next few decades.

The aims of the aforementioned foresight projects include making recommendations to R&D and ICT policymakers, while pointing out prospective ICT development and research trends relevant to individual researchers and research teams. The foresight time horizon was 2025, with an impact analysis of selected anticipated technological breakthroughs up to 2030. Some of the project results related to e-science are presented in Skulimowski (2016b); the results on the emergence of GESs are published in Skulimowski (2013), while the relation to artificial autonomous decision systems (AADSs) is discussed in Skulimowski (2014b, 2016b).

A diverse spectrum of methods was applied to elaborate the technological and social scenarios and forecasts. The predominant ones were bibliometric analyses, extrapolation Delphi surveys (Skulimowski 2019), the group building of a hierarchical state-space model of information society evolution (Skulimowski et al. 2013), and anticipatory networks (Skulimowski 2014a).

For the purposes of e-science foresight, the computer-assisted multi-round expert Delphi questionnaire (cf. e.g. Skulimowski et al. 2013, 2019), combined with expert panel meetings and the outcomes of bibliometric and patentometric research, proved most useful within the overall project. The analysis of expert responses was combined with an information retrieval strategy covering the open Web and major bibliographic databases. Dedicated procedures were elaborated for fusing quantitative and qualitative knowledge and for providing recommendations to the ICT industry and policymakers. A trust and competence factor system was used to compensate for the impact of diverse expert biases and competences: each survey respondent was assigned a vector of trustworthiness coefficients for the particular subject areas of the Delphi exercise. A weighted combination of individual responses, with the coordinates of the trustworthiness vector serving as weights, was applied wherever appropriate to account for differences in respondents' credibility.
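As a minimal illustration of this weighting scheme, the sketch below aggregates the replies to a single question using each respondent's trustworthiness coefficient for the question's subject area. The function name, array shapes, and fallback rule are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def weighted_delphi_estimate(replies, trust, area):
    """Aggregate replies to one question, weighting each respondent by the
    coordinate of his/her trustworthiness vector for the question's area.

    replies : (n_respondents,) numerical answers
    trust   : (n_respondents, n_areas) trustworthiness vectors
    area    : index of the subject area the question belongs to
    """
    w = trust[:, area]
    if w.sum() == 0.0:                 # degenerate case: fall back to plain mean
        return float(np.mean(replies))
    return float(np.average(replies, weights=w))

# Example: three respondents, trust vectors over two subject areas
replies = np.array([30.0, 45.0, 60.0])                     # e.g. forecast shares in %
trust = np.array([[0.9, 0.2], [0.5, 0.8], [0.7, 0.6]])
print(weighted_delphi_estimate(replies, trust, area=0))    # trust-weighted mean
```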

Section 2 outlines certain basic ICT/AI development trends that may influence future research tools. The roles played by AI-based learning platforms (AILPs) and GESs will gain importance when fusing ever-growing information flows, culminating in a deeper automatic data refinery before the data are presented to researchers. GESs will be capable of processing “big data” into “big knowledge”. New knowledge fusion methods will be developed, such as hybrid and scenario-based anticipatory networks (Skulimowski 2014a) and e-science foresight (Skulimowski 2016b), including combinations of forecasts (Elliott and Timmermann 2004) or recommendations (Skulimowski 2017a). Finally, Sect. 3 presents the results of the Delphi surveys on the prospects of information systems, which were conducted for the SCETIST and MOVING projects (Skulimowski et al. 2013; Köhler and Skulimowski 2019). We show that different technological trends will have a synergetic impact on e-science, with AI-based tools and approaches playing a major role. New tools will make research conducted by humans more efficient, allowing predefined goals to be reached faster and more accurately.

Recommendations that may be useful to R&D policymakers, artificial intelligence researchers, and innovative companies will be presented in Sect. 4. We will also explore the relationship between BCIs and the future methodology of storing and processing scientific information in GESs and AILPs. Moreover, Sect. 4 discusses the opportunities, challenges, and threats posed by the development of AI tools and how BCIs could be used to quickly overcome the problem of accessing big data streams and knowledge repositories.

2 Integration of Future Research Tools in Global Expert Systems

GESs were originally intended as a generalization of large-scale expert surveys and intelligent digital libraries (Leidig and Fox 2014), capable of merging heterogeneous information. They were defined in Skulimowski (2013, p. 582) as “all knowledge sources, sensors, databases, repositories, and processing units, regardless of whether they are human, artificial, animal, or hybrid, provided that they are all mutually connected and endowed with … the usual expert system functionalities.” The nodes of a GES are designated as “users”, and each GES has a specific user hierarchy. Moreover, a GES must offer each user an efficient information management system providing “knowledge transfer on immediate demand” (ibid.).

The growing coverage of scientific information by search engines, with an increasing share of open access resources, further enhances the capabilities of autonomous information retrieval, which is the basis of the GES paradigm. In the e-science context, the rationale for introducing GESs is to determine rules and principles for the design of knowledge-based systems capable of gathering and processing big scientific data, information, and knowledge at different stages of verification and refinery. The access of autonomous webcrawlers and other GES tools to paid or sensitive information sources may be ensured with automatic subscription passwords or automatic micropayments, and may be facilitated by distributed ledger technologies such as the Linux Foundation's Hyperledger Fabric blockchain (Thakkar et al. 2018). It is also assumed that researchers will continue the trend of uploading the results of their work to public open access repositories such as researchgate.net, zenodo.org, or academia.edu.

The development of GESs and the simultaneous emergence of AILPs will ensure similar progress in learning approaches (Skulimowski 2019). It has also been argued (Skulimowski 2013) that GESs may play an important role in solving the human–computer convergence problem, which concerns the AILPs as well. The following Internet development trends supporting the above claims were identified in Skulimowski (2013, 2014b):

  • growing integration of heterogeneous information sources (ISs);

  • increasing interconnection of knowledge units, online and offline;

  • increasing sophistication of information processing within each knowledge unit;

  • growing availability of sensor and other scientific measurement data, including information from the Internet of things (IoT);

  • growing need to apply big data technologies in scientific information processing driven by the overall growth of the amount of information available online;

  • the emergence of common standards for scientific information management (Jeffery et al. 2014).

The above trends are amplified by the qualitative and quantitative refinement of the information stored and processed online, as well as by the growing availability of learning content, which is fed to AILPs and boosts their development.

The usability of online information for scientific purposes depends upon how well it is structured and accessible via search engines. For instance, the percentage of all data stored on the open Web and indexed by the search engine Google rose from 1% in January 2007 to 6% in January 2010 and exceeded 10% in January 2012. This estimated ratio was preserved until at least 2019. At the same time, the estimated amount of information available online rose to 800 exabytes (10¹⁸ B) in 2009 and 1.3 zettabytes (10²¹ B) in 2013. According to the Delphi survey in Skulimowski et al. (2013), question [I.8], it is expected to rise to 1.6 zettabytes in 2020, reach 3.5 zettabytes in 2025, and about 7 zettabytes in 2030. Recent Internet metrics data yield the value of 2 zettabytes of information contained in indexed Web sites as of 2019, which does not deviate much from the Delphi forecasts of 2012–2013 (Skulimowski et al. 2013). The same survey provided replies to the question of whether the information available online is really useful to scientists; the results are presented in Sect. 3.

The number of Web sites exceeded 1700 million in 2016, then slightly declined and rose again to 1730 million in 2019 (Mill provides the value of 1.27 × 10⁹ as of December 2019). Only about 15% of all Web sites are active. They are hosted in about 360 million registered domains. Forecasts of a further increase until 2025 and beyond diverge considerably, depending on whether exclusively machine-operated and machine-used (M2M) sites in the Internet of things are included. Estimates vary between 3 and 50 billion sites in 2025. The number of Web pages indexed by Google and Bing rose to 6.27 × 10¹² in January 2020. When the tools offered by search engines become sufficiently sophisticated, this system of interconnected Web sites may become a real GES with strong analytic capacities.

Another salient trend shaping the future of e-science is the emergence of a new form of collaborative learning (Köhler and Skulimowski 2019), facilitated and made more efficient by AILPs. This trend supports collaborative research, the overall growth of the collective intelligence of research teams (Mohamed et al. 2013), and their fusion in GESs. Although in the mid-term future the intellectual capacity of scientists may be outperformed by autonomous “global brain”-type analytic engines (Heylighen 2017), using GESs and AILPs as composite tools for learning and research will keep scientists aligned with the progress of autonomously performed research. In addition, the “explainable AI” paradigm (Xu et al. 2019), when commonly applied, can use combined GESs and AILPs as tools to make the results of any kind of autonomous research available in a comprehensible form to any GES/AILP user.

Internet-based information supply chains of constantly growing size and complexity necessitate new approaches to designing search-and-survey procedures and to delegating more of this design work to autonomous agents. In a creative decision process (Skulimowski 2011), the user defines an initial subset of ISs according to some criteria, assigns them trust or credibility coefficients (Gligor and Wing 2011), and activates the procedure that transforms the selected ISs into autonomous agents with capabilities similar to those of the user. The procedure runs recursively from the initial ISs, so that second-stage ISs are selected and activated. This allows the agents to pursue the search autonomously and simultaneously, until a prescribed stack level or the desired retrieval goal is achieved. A creativity-stimulating content-based search and recommendation approach has been investigated within the recent Horizon 2020 project (Skulimowski 2017a). The design of GES knowledge provision procedures must ensure that the reply to each query is given at a specified level of trust. When trust coefficients φi, 0 ≤ φi ≤ 1, are assigned to each source of information available to a GES, the resulting trust τ(q) in the information retrieved in reply to a query q can be higher than the trust in any of its individual sources.
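The cited papers do not prescribe a specific fusion formula for τ(q); the minimal sketch below illustrates one standard rule (noisy-OR fusion of independently corroborating sources) under which the combined trust indeed exceeds that of any single source. The function name is hypothetical.

```python
from functools import reduce

def fused_trust(phis):
    """Noisy-OR fusion: trust in a reply corroborated by several independent
    sources with trust coefficients 0 <= phi_i <= 1. The result is never
    lower than the trust in the most reliable single source."""
    return 1.0 - reduce(lambda acc, phi: acc * (1.0 - phi), phis, 1.0)

print(fused_trust([0.6, 0.5, 0.4]))   # 0.88 > 0.6, the best single source
```

Any monotone fusion rule with this property would serve the same purpose; noisy-OR is merely the simplest common choice under the independence assumption.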

Autonomous management of complex queries processed by a GES is a multicriteria combinatorial optimization problem (Skulimowski 1994). The order of queries from different users and the sequence of information sources to be contacted can be assessed from the point of view of precision, recall, and other information retrieval measures, such as timeliness. The GES operation proposed in Skulimowski (2013) is based on a snowball principle: the node that generated a query activates other units until the desired information is found. The following principles of query processing in a GES were defined in Skulimowski (2013):

  (a) Each knowledge unit Kj activated by another one, Ki, with a query qij either returns the information specified by qij to Ki or passes to (b).

  (b) If the query qij can only be partly answered by Kj, the latter unit modifies it to qjk to ask for the missing information. Thus, Kj activates further knowledge units Kk1, …, Kkn(k) with the query qjk in the order specified as a solution of the search optimization problem proposed in Skulimowski (1994). The resulting information search strategy minimizes the number of repeated activations of the same knowledge unit.

  (c) The procedure (b) recursively activates further units. Each unit Kj activated by Ki fuses the information received from the units it has activated and returns it to Ki. All activated units are deactivated after the information requested in qij is gathered.

As previously mentioned, the above procedure is a special case of a multicriteria search strategy optimization problem, where the resulting strategy maximizes the amount of information gathered in the least time, at minimum effort for all activated units, and at minimum cost for the initial unit. Such a search strategy may be endowed with a certain level of free will and may be designed to fulfill the definition of a creative decision process (cf. Skulimowski 2011).
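A schematic rendering of steps (a)–(c) may help fix the ideas. In the sketch below, queries are modeled as sets of required facts, the multicriteria activation order of Skulimowski (1994) is abstracted into a fixed neighbour list, and all class and method names are hypothetical illustrations rather than part of the cited design.

```python
class KnowledgeUnit:
    """Toy knowledge unit holding a set of facts (hypothetical illustration)."""
    def __init__(self, name, facts, links=()):
        self.name, self.facts, self.links = name, set(facts), list(links)

    def answer(self, query):
        """Step (a): return the known part of the query and the residual part."""
        return query & self.facts, query - self.facts

    def neighbours(self, residual):
        """Stand-in for the multicriteria activation order of Skulimowski (1994)."""
        return self.links

    def fuse(self, a, b):
        """Merge two partial replies."""
        return a | b


def snowball_query(unit, query, asked=None):
    """Schematic rendering of the snowball procedure (a)-(c)."""
    asked = asked if asked is not None else set()
    asked.add(unit)                                   # activate this unit
    reply, residual = unit.answer(query)              # step (a)
    if residual:                                      # step (b): partial answer only
        for nxt in unit.neighbours(residual):
            if nxt in asked:
                continue                              # no repeated activations
            reply = unit.fuse(reply, snowball_query(nxt, residual, asked))  # step (c)
            residual = query - reply
            if not residual:                          # query fully answered
                break
    return reply                                      # passed back to the activator

# Toy run: K1 knows "a"; its neighbours contribute "b" and "c"
k3 = KnowledgeUnit("K3", {"c"})
k2 = KnowledgeUnit("K2", {"b"}, links=[k3])
k1 = KnowledgeUnit("K1", {"a"}, links=[k2, k3])
print(snowball_query(k1, {"a", "b", "c"}))            # -> {'a', 'b', 'c'}
```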

A natural question that appears when projecting the future of GESs is whether science is capable of accommodating any kind of future AI technology for research purposes, and how this can be achieved. From a purely economic standpoint, the role of AADSs in e-science will grow, encompassing new areas of intellectual activity and the replacement of human researchers. The execution of a complex Web search strategy by an intelligent autonomous web crawler is a real-life example of such empowerment. The development of GESs will challenge users with a growing complexity of queries, a growing amount of gathered information, and the need to comprehend the search workflow. Recipients who reject useful information for lack of an appropriate explanation of its provenance (Malaverri et al. 2013) lose the reply, but they may prefer to do so in order to avoid infringing cybersecurity rules.

3 Results of the Delphi Survey on e-Science Tools and Factors

This section highlights a sample of the Delphi survey results (Skulimowski et al. 2013). This survey, based on the novel “Extrapolation Delphi” principle, was performed twice: first within the above-cited project and again during its durability period. Specifically, we present the results concerning the future development of advanced expert systems, heading toward advanced GESs, which were the subject of questions in survey Section 11, titled “Future prospects of knowledge base, expert systems, information streams and decision support systems integration” (Skulimowski et al. 2013). Out of the 36 questions in this survey section, we present the replies to the five most relevant to this article's topics.

3.1 Delphi Survey Background and Scope

The survey results are presented in tables, which provide the basic statistical characteristics of the replies, together with Delphi-specific expert consensus measures and a cluster analysis (von der Gracht 2012). The latter is then used to construct the development scenarios of the investigated information systems. The survey respondents were requested to estimate certain numerical development indicators for four time horizons: 2015 (a forecast in 2013 and an estimate in 2016), 2020, 2025, and 2030 (forecasts). The following indicators were calculated for all replies and for all time horizons:

  • the average value, standard deviation, left and right semideviations,

  • the median, 1st and 3rd quartile and four quintiles,

  • the interquartile range (IQR), defined as the difference between the third and first quartile,

  • the interquintile range (IQVR), defined as the difference between the fourth and first quintile,

  • Hartigans’ dip test of unimodality (Hartigan and Hartigan 1985); if unimodality was rejected, the replies were clustered and the number of reply clusters was determined,

  • the Shapiro–Wilk (log) normality test, applied to replies either directly or to their logarithms when the question touched upon growth ratios.

The consensus indicators IQR and IQVR should be normalized, for example by dividing them by the maximum data range $R := r_{\max} - r_{\min}$ after eliminating the outliers. Then, the consensus is defined by one or both of the inequalities

$${\text{IQR}}/R \le \eta_{1}, \qquad {\text{IQVR}}/R \le \eta_{2},$$

where $\eta_k$, $k = 1, 2$, are certain threshold values and $\eta_1 \le \eta_2$. Clearly, given the same threshold value, the IQVR provides the stronger consensus test, since the interquintile range always contains the interquartile range. A positive result of the Shapiro–Wilk normality test indicates a potentially unimodal distribution of replies and rejects the hypothesis that there is more than one cluster of replies.
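For illustration, the following sketch computes the IQR- and IQVR-based consensus indicators for a single set of replies. The threshold values are placeholders, not those used in the cited survey, and outlier elimination is assumed to have been done beforehand.

```python
import numpy as np
from scipy import stats

def consensus_indicators(replies, eta1=0.25, eta2=0.35):
    """IQR/IQVR consensus measures for one question and time horizon.
    eta1 and eta2 are placeholder thresholds (eta1 <= eta2)."""
    r = np.asarray(replies, dtype=float)      # outliers assumed removed already
    q1, q3 = np.percentile(r, [25, 75])       # 1st and 3rd quartiles
    v1, v4 = np.percentile(r, [20, 80])       # 1st and 4th quintiles
    iqr, iqvr = q3 - q1, v4 - v1
    R = r.max() - r.min()                     # maximum data range
    if R == 0:                                # identical replies: trivial consensus
        return {"IQR": 0.0, "IQVR": 0.0, "consensus": True}
    consensus = (iqr / R <= eta1) or (iqvr / R <= eta2)
    w, p = stats.shapiro(r)                   # Shapiro-Wilk normality test
    return {"IQR": iqr, "IQVR": iqvr, "IQR/R": iqr / R,
            "IQVR/R": iqvr / R, "consensus": consensus, "shapiro_p": p}

# Hypothetical replies drawn from the [0:100] pick list
print(consensus_indicators([10, 15, 20, 20, 25, 25, 30, 35, 40, 60]))
```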

The statistical analysis was first performed under the hypothesis that the replies should be weighted according to the respondents’ self-assessed certainty, in combination with a self-assessed credibility coefficient of individual replies and an automatically assigned individual expert competence score. This score was computed by the Delphi support system (Skulimowski 2017), based on previous survey participation and the record of publications, research projects, and other achievements in the question-related area. It has been observed (Skulimowski 2016a) that for most survey questions there was no significant difference between the statistical indicators for weighted and non-weighted responses. This observation also applies to the consensus measures and indicates that the expert group’s ability to estimate the future evolution of indicator values was homogeneous. Therefore, in this section we present, for each question, the analysis variant that yields the smaller statistical error (in terms of the standard deviation) for the majority of forecasting horizons; the sum of errors was the decisive factor when the numbers of dominating values at different horizons were equal. Out of the five questions selected for this section, only the replies to question 11.8 (Table 4) exhibited smaller errors when analyzed without weighting coefficients.

The survey in the project SCETIST (Skulimowski 2013) consisted of two rounds and was conducted in 2012 and 2013. There was also a post-project update round with the same participants, questions, and Delphi support software. The respondents could select the questions to answer according to their competences. Therefore, of the over 100 respondents, the number of those replying to the questions in Section 11 varied between 43 and 48 in the first and second rounds.

3.2 The Future Use of Information Systems for e-Science—The Results of the Delphi Survey

The first of the survey outcomes presented in this paper is a basic statistical analysis of question 11.1a, which concerns the forecasted share of scientists who consider online information to be fully representative of their research areas. It is shown in Table 1.

Table 1 Estimated share φ1 [in %] of researchers considering the online information widely available through browsers and search engines as fully representative in their areas of scientific research. Analysis of the replies to question No. 11.1a in Skulimowski et al. (2013), weighted with the combined trust/competence coefficients of respondents

The above question did not distinguish between research areas, so the replies only provide a rough estimate merging the humanities, engineering, etc. Nevertheless, it shows the average share of such researchers almost doubling between 2015 and 2030, while the mean square ex-ante forecast error rose only by about 20% and the relative error decreased considerably. All but one (2025) of the reply sets for the estimation (2015) or forecasting (2020, 2025, 2030) horizons were considerably irregular and did not pass the weighted Shapiro–Wilk normality test. However, all value distributions were unimodal and concentrated in one cluster.

Let us note that all quantiles (quartiles, quintiles, median) and consequently, the consensus measures, are integers because the respondents select their replies from the standard integer pick list [0:100]. The same list was used for all questions in Section 11 of the survey where the replies were to be provided in %.

Table 2 shows the breakdown of the verified and raw quantitative information available on the Web for the same estimation/forecasting horizons.

Table 2 Amount of processed and verified quantitative knowledge available online (in % of all quantitative information available). Replies to question no. 11.2 weighted with combined trust/competence coefficients

The respondents estimated the amount of trustworthy information (i.e., knowledge) to comprise about one-fifth of all quantitative information available, which can hardly be seen as an optimistic estimate. The forecast for 2030 (about 40% of refined information) presumes the emergence of a new data refinery mechanism; this share is almost double the estimate for the present state of the Internet. Nevertheless, the share of unverified Web information will still be close to the larger part of the golden proportion (ca. 61.8%), which is an indication of the power of disinformation and fake data. In the first two rounds, the question referred to knowledge in general, irrespective of whether it was quantifiable. Based on the respondents’ postulates, the question for the follow-up round was formulated more precisely, but without a statistically significant impact on the outcomes. A characteristic feature of the above replies is a smaller-than-usual difference between the IQR and IQVR consensus measures, which indicates a relatively large number of equal replies between the 1st quartile and the 1st quintile as well as between the 3rd quartile and the 4th quintile.

The next question (11.3) assumed the emergence of a next generation of Wolfram Alpha, i.e., an expert system capable of providing informed replies to virtually any query. This question concerned a quantitative characteristic of a future GES’s capability to reach the existing information, namely, its maximum recall value relative to the query provided by the system user. Replies equal to “0” expressed a respondent’s disbelief that such software would ever be created (Table 3).

Table 3 The share of information available on the Web that can be processed by advanced expert software (GES) capable of analyzing heterogeneous data (quantitative economic information, multimedia, publications, video streaming) and providing GES users with informed replies to any given question (in % of available information used for this purpose)

Unlike in the case of the two previous questions, the replies to question 11.3 indicate a sharp rise in the GES search range, from an initial estimate of about 2% to 27% in 2030, with a high yet relatively decreasing uncertainty, as expressed by the standard deviation and semideviations.

A problem symmetrical to the one above was considered in question 11.8 (Skulimowski 2013); namely, we investigated Internet users’ attitudes to searching for solutions to their problems on the Web. The analysis of the replies is given in Table 4.

Table 4 Answers to problems, questions, and queries of all kinds (translations, spelling, definitions, geographical information, graphical object finding, legislation, etc.) that will be sought online: in % of all queries from users with Internet access (mobile or landline); unweighted

A predominance of solving problems through access to online information is not a surprise. Actually, the above characteristics may be biased by a relatively high share of elderly people who have Internet access via their mobile phones but use it sparingly. The most recent research performed within the project (Skulimowski 2019) yields considerably higher estimates for 2025 and 2030, reaching more than 90% of all queries.

The last set of results presented in this section concerns the emergence of qualitatively new capabilities and phenomena in GESs, manifesting themselves in the solution of previously intractable problems or in answers to unresolved questions. Namely, the integration of knowledge on the Internet will allow a new level of quality in solving the problems presented by GES users, specifically intractable ones, and in providing replies to queries that are beyond the reach of contemporary information processing methods (Table 5).

Table 5 The share in % of problems and queries that will be solved more adequately by GESs, compared to the solutions and replies provided by human experts (question 11.9)

Both the uncertainty expressed by the standard deviation and semideviations and the consensus indicators IQR and IQVR for question 11.9 are relatively lower than in the case of the two previous forecasts. Fitting the above replies with a logistic curve (Skulimowski 2017b), we can calculate the expected time when the majority of problems and queries will be better solved by GESs, namely the year 2037 (a sketch of this computation is given at the end of this section). This year can thus be regarded as a kind of singularity (Skulimowski 2014b), albeit in a limited sense. To conclude this section, let us note that reaching a consensus need not be the ultimate goal of a Delphi survey. Usually, if the unimodality test is negative, a lack of consensus indicates the existence of several clusters of replies. If this is not the case and the IQR or IQVR values are rather high, while growing more slowly than the trend investigated by the survey, there is a common expectation of a certain trend or event among the survey respondents, albeit with high uncertainty regarding its time of occurrence.
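Returning to the logistic extrapolation mentioned above, the sketch below fits a logistic curve to Delphi reply values and reads off the 50% crossing year. The reply values used here are hypothetical placeholders, since the actual data appear in Table 5; with the survey data, the authors obtain the year 2037.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, k, t0):
    """Logistic share curve saturating at 100%; t0 is the 50% crossing year."""
    return 100.0 / (1.0 + np.exp(-k * (t - t0)))

# Placeholder reply values for question 11.9; the real values are in Table 5
years = np.array([2015.0, 2020.0, 2025.0, 2030.0])
shares = np.array([5.0, 12.0, 22.0, 35.0])

(k, t0), _ = curve_fit(logistic, years, shares, p0=(0.1, 2035.0))
print(f"Majority of problems better solved by GESs around {t0:.0f}")
```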

4 Discussion and Conclusions

The results of the Delphi survey presented in Sect. 3 provide clues, arising from expert judgments, regarding the amount of information available online and its use for e-science purposes until 2030. It is expected that by 2030, the corresponding information retrieval tools will reach a level sufficient to provide virtually all necessary scholarly information to researchers. Furthermore, within a similar time frame, GESs are expected to outperform human experts in solving complex knowledge processing tasks.

Another AI trend that may have a relevant impact on e-science is the development of brain–computer interfaces (BCIs) and their deployment in enhancing research, in joint use with GESs and AILPs, and in intelligent decision support systems. The results of a Delphi survey on BCIs are presented in Skulimowski (2014b, 2016b); here, we briefly summarize these findings. By definition, in a BCI, outward information is retrieved by recognizing the brain’s electromagnetic neural activity, while for the inward transfer direction, a BCI triggers the neural circuits directly (Brunner et al. 2011; Jiang et al. 2019). The best transmission rates and qualities have been obtained with invasive BCIs based on intracranial implants, but the greatest hope for enhancing human capabilities is placed on non-invasive BCIs, such as wearable devices used to retrieve EEG or fMRI signals. They are expected to facilitate efficient bidirectional communication with GESs (Zhang et al. 2013) as well as direct communication between human brains, called hyperinteraction (Grau et al. 2014; Jiang et al. 2019). The ability of a BCI to directly connect researchers’ brains with powerful expert systems will speed up the progress in global data integration provided by GESs. It will also increase the efficiency of scientific collaboration (Leidig and Fox 2014; Shi et al. 2017) and the use of AILPs. The positive effect of BCIs on researchers who obtain efficient and instant access to big research data may partly compensate for the negative impact of the data explosion. However, whether e-science can fully exploit the capabilities of emerging advanced AI tools and technologies such as AILPs, GESs, and BCIs to increase the quality and efficiency of scientific research remains an open question.

The analysis of the full set of SCETIST Delphi survey replies resulted in the derivation of three human–AI interaction scenarios (cf. Skulimowski 2014b, 2016a, b). Here, we adjust them slightly to provide conditional responses to the above question. The full and beneficial use of AI defines the optimistic scenario of human–AADS interaction, while the negative response is associated with the pessimistic scenario, often referred to as the AI threat problem. The foresight results presented in Skulimowski (2014b) suggest that the main condition deciding between the positive and negative scenarios is the capability of future BCIs to provide a direct interface to GESs and to facilitate the creative processes of GES users.

In the optimistic scenario, the growing empowerment of AADSs will be compensated for by the ability of human supervisors and authorized users to control them directly with BCIs. This scenario is backed by results of the Delphi survey presented in Sect. 3, which suggest that GESs and AILPs supported by high-performance BCIs and enhanced reality will ensure control over advanced AI technologies. Further results of the Delphi survey on the development of artificial creativity and creativity support systems performed in SCETIST (Skulimowski 2016a) highlight the importance of coupling human users with GESs and AILPs via BCIs to stimulate their creative abilities.

The pessimistic scenario presumes that a growing share of human creative activity, specifically in research, will be replaced by AADSs due to the ever-growing complexity of research and decision problems to be solved along with increasingly large data volumes. In this scenario, AADSs will specify goals, criteria and constraints, target quality and the scope of applicability of solutions. Human researchers will only perform auxiliary and assistive roles.

In the third, neutral scenario, technological development generally slows down in the face of various setbacks. In this case, the AADS/human competition problem will be deferred to a more distant future, beyond the 2030 horizon of the foresight studies presented here.

In conclusion, the results of recent foresight studies highlight the relevance of development trends in selected advanced AI technologies for future e-science, e-learning, and e-research. According to the outcomes of the research projects (Skulimowski et al. 2013; Köhler and Skulimowski 2019), the areas of intensive ICT/AI development efforts that can be of utmost relevance for e-science are GESs driven by autonomous web crawlers and dedicated decision support systems, creativity support systems capable of stimulating or at least preserving human creative abilities, and bidirectional non-invasive BCIs providing direct links to GESs and other researchers to efficiently tackle large amounts of scientific data.