1 Introduction

To follow the evolution of productive economic activities, R&D processes of economic production were investigated in the early 1990s [13, 15, 23], with a particular focus on the circulation of information and interactions among agents (people, enterprises and institutions). While the main objective of this literature was to study the capacity of national economic systems to expand, this approach fostered the analysis of the interactive dynamics that characterise the bottom-up development of these processes at a micro-scale [15, 28] and at a meso-scale [6, 24, 30]. Research on network dynamics and technology development also prompted institutional policy makers to assess these relations.

With a view to comprehending the interactive dynamics that can unleash the emergence of bottom-up meso-structures, the techno-economic segment (hereinafter TES) analytical approach aims to analyse specific interconnected techno-economic systems and to describe their dynamics [29]. Targeting also non-official heterogeneous data sources, and studying the complex underlying worldwide networks of agents, locations and technologies, TES refers to largely integrated technological (and non-technological) domains in the ICT industry that usually do not correspond to industrial or product classifications. To this end, an integrated method is developed and presented here for the R&D part only of the interconnected system, with the following stages. The flowchart of the methodological approach is presented in Fig. 1.

A. ETL process: “Extract-Transform-Load” (ETL) for data collection related to a specific technology, data preprocessing, and keyword extraction to reduce text dimensionality.

B. Network structure: multilayer three-level network development for unveiling latent relationships based on apparent relationships.

C. Community detection: for analysis of strongly interacting groups of agents.

D. Topic modelling: for technological areas’ identification, evolution, and semantic analysis.

Fig. 1. Schematic representation of the methodological approach to capture a technology’s pathway. The dashed line illustrates the prospective semantic analysis within the resulting communities.

The present first TES analysis covers the photonics industrial segment. Photonics is identified as one of the six important Key Enabling Technologies (KETs) for Europe’s competitiveness [9, 34]. The technological domain of photonics covers computational methods and applications regarding the generation, detection and management of light. The methods include the conversion of sunlight to electricity for renewable energy production, and the applications range over photodiodes, LEDs, lasers and other electronic components and equipment used in a plethora of fields such as ICT, life sciences and industrial manufacturing [34].

In the following sections, only the stages devoted to analysis are presented, namely stages B and C as the first analytical step, and stage D as the second step. In Step 1 of the analysis (stages B and C), economically relevant R&D activities (EU-funded projects and patents) are analysed. As established innovation theory treats interactions as a critical factor in the exaptation process, and hence in the evolution of the agent-artefact space [20,21,22], a network of agents is created. The first dimension of the network represents (a) the interactions that occurred among agents through co-participation in specific types of economic activities. Then, in order to address and represent latent relationships, a second dimension regarding (b) geographical proximity and a third regarding (c) the use of similar technological elements are added to the network. The three dimensions are selected so as to include the three aspects that certain works on socio-economic complex systems [21, 26] and on bio-chemical complex systems [3] discuss as relevant to the formation of bottom-up meso-structures. Correspondingly, these aspects are: (a) activity/metabolism, (b) proximity/localization and (c) functional behaviour/information (Footnote 1). In order to investigate these dimensions and uncover key agents and activities, a multi-layer network (MLN) is formed, in which each layer represents one of the dimensions (Footnote 2). Then, to identify groups of agents that, due to proximity in the three dimensions, are more likely to interact intensively, the Infomap algorithm for MLN is implemented [11]. Communities are thus mapped, indicating ongoing interactions within and, by means of the resulting overlaps, also between the detected groups.

In Step 2 of the analysis (stage D), the captured dynamics are further studied, as they represent potential factors that affect the entire interconnected system. In order to unveil technological subdomains with method-oriented and application-oriented activities, a semantic analysis of the techno-economic research field of photonics is pursued. The analysis considers the textual information of the corpus of documents describing the economic R&D activities that were identified in the previous step and used to structure the MLN. Topic modelling is implemented to unveil technological subdomains associated with methods (e.g. chemistry) and implementations (e.g. photonic devices), which subsequently reveal potential semantic connections among the agents. Topics are essentially summaries of the most pertinent themes of a large collection of documents, so they can rapidly communicate the subjects on which the agents are working. Agents grouped into the same topic do not necessarily collaborate. However, the fact that they are active in the same topic indicates a potential collaborative or competitive relationship. Both kinds of relationship foster innovation and suggest vibrant innovation processes, and hence promising technological sub-areas. A generative three-level hierarchical Bayesian model, namely Latent Dirichlet Allocation (LDA), is employed to identify the topics underlying the collected documents. The knowledge of a technological domain, which was previously held only by experts, is thereby approximated through this statistical, hence unbiased, method that extracts the documents’ thematic content as topics [32]. A topic is a probability distribution over terms, which arises from the underlying semantic structure of the terms found in documents [4, 25, 32]. Each document is composed of one or several topics, and each topic is a combination of terms from several documents.
This bag-of-words assumption is common to natural language statistical models [1, 4, 32]. Latent connections between agents are thence uncovered, as part of the natural language processing analysis. The implementation of the model is tested on the techno-economic research field of photonics with worldwide activities.

The methodological background of the network structure and community detection, and of the topic modelling, is further analysed in Subsects. 2.1 and 2.2 respectively. Subsequently, the results of the two-step analysis are presented in Sect. 3. With this last part it is possible to further investigate the co-appearance of agents and activities in topological groups, and the presence of similar content in topic groups. These patterns can subsequently provide indications for the evolution of the technology. In conclusion, the presented multidimensional methodology enables the interpretation of the latent relationships among the agents and of their interactions.

2 Methodological Background

2.1 Step 1: Multilayer Network Structure and Community Detection

The set of considered economic agents is structured based on two of the worldwide most important activities of developments in productive processes. The first type of considered activities are participations in European Projects (FP7 and H2020), while the second is patent applications. The first embodies the concept of R&D processes, i.e. qualitatively new economic processes of production. The second type addresses differently R&D processes, since the included actions fix technological combinations that may not be yet formally established. After considering these two activities, the focus is set to those involved in the field of photonics. In this way, the set of agents that develop activities regarding the evolution of photonic-related technologies, is identified. Following the aim of the present study, the introduced approach focuses on interactions and information exchanges that form a network populated by identified agents. Economic agents are represented as nodes of the network, and a multidimensional space of relationships is modeled by placing different types of connections in different layers of the MLN. From a conceptual point of view, the following conjectures are established in order to justify the consideration of (i) co-participations in economic activities (first layer), (ii) common geographical location provenance (second layer), and (iii) common use of technological terms (third layer).

The first conjecture is made in order to capture relationships that actually occurred among agents. The hypothesis is that agents, by co-participating in the same activity (i.e. one FP7 project, one H2020 project, or one patent), exchange relevant information. Therefore, when co-participations occur, the agents involved are considered to be connected. Since the analysis aims to capture the structure of interactions for technological development, the dimension regarding information exchange necessarily represents the starting point of this work. This is the first network layer that is defined.

The second essential considered dimension addresses geographical information, as agents can establish deep connections with the place where they locate their activities. More precisely, common socio-economic habits/traditions and economic processes’ integration can emerge among agents rooted in a same geographic area. This phenomenon, which favours the flourishing of fruitful and innovative interactions, is identified in economic literature with the concept of ‘industrial district’ [2, 7]. Therefore, connections are established when agents are located in the same region (one granularity level under the national level). The second dimension of the analysis is so justified, and the second layer of network is then structured.

The third dimension is conceptualised based on the conjecture that agents, moving towards the development of a specific technological field, adopt specific technology-related terms. As R&D activities are considered here, the relevant terms extracted from the descriptions of these activities (Footnote 3) are used to investigate the presence of convergent paths, or semantic proximity, among agents in terms of their technological developments [10, 12, 17]. Therefore, a connection is established among all the agents using the same term (regardless of the activity). This last dimension, aimed at investigating agents’ qualitative orientations, is represented by the third layer of the MLN.

Agents’ interactions in three different dimensions are thus represented by means of an MLN with three layers, each hosting one of the described types of interaction. Three additional points regarding the structure of the MLN should be noted. First, each layer is initially set as a bipartite network, with agents (first-type nodes) connected to other nodes (second-type) whose nature depends on the layer (Footnote 4); each layer is then transformed into its one-mode projection onto the first-type nodes, i.e. agents. Therefore, in the final MLN only ‘agent-to-agent’ connections are present. Secondly, all the connections are undirected and their weight is computed on the basis of the number of shared second-type nodes (Footnote 5). Finally, inter-layer connections are added to the intra-layer connections considered so far. Therefore, the three ‘state-nodes’ of agent X, i.e. \(X_a\), \(X_b\) and \(X_c\), where a, b and c indicate the layers, are always connected [11].
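The one-mode projection of a single layer can be sketched as follows. This is only an illustration under stated assumptions: the function name `project_layer`, the toy agents and the project labels are hypothetical, and the weight is taken literally as the count of shared second-type nodes, which may differ from the exact weighting the authors apply. Inter-layer links between state-nodes are not shown.

```python
from collections import defaultdict
from itertools import combinations


def project_layer(memberships):
    """Project a bipartite layer onto its agents.

    memberships: dict mapping each agent to the set of second-type nodes
    (projects/patents, regions or terms) it is attached to.
    Returns {(agent_a, agent_b): weight}, where weight is the number of
    second-type nodes the two agents share (assumed weighting).
    """
    agents_by_item = defaultdict(set)
    for agent, items in memberships.items():
        for item in items:
            agents_by_item[item].add(agent)

    weights = defaultdict(int)
    for agents in agents_by_item.values():
        # Every pair of agents attached to the same item gets one more unit of weight.
        for a, b in combinations(sorted(agents), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical toy activity layer: agents X and Y share two projects, Z joins one.
activity_layer = {
    "X": {"proj1", "proj2"},
    "Y": {"proj1", "proj2"},
    "Z": {"proj2"},
}
activity_edges = project_layer(activity_layer)
print(activity_edges)
```

The same function could be applied unchanged to the geographical and technological layers by swapping projects for regions or extracted terms.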

After the MLN has been structured, the Infomap community detection algorithm for MLN is implemented [11]. This algorithm uses a two-level binary description based on Huffman coding and on the map equation [14, 18, 27], and its implementation in socio-economic complex networks is discussed in [26]. As this algorithm minimises the description length of a random flow that circulates throughout the considered network, the resulting communities can be interpreted as groups of agents with intense information exchange. Regarding the settings of its implementation, the weights of the three layers are balanced by taking the density of the layers into account. More specifically, for each layer the ratio between (i) the density of the densest layer and (ii) the density of that layer (Footnote 6) is used as a multiplication factor for the weight of each connection belonging to that layer. In addition, the Infomap algorithm is set to detect overlapping communities, i.e. agents can belong to more than one community. As a consequence, an interconnected structure of communities can emerge.
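The density-based balancing of layer weights can be illustrated with a short sketch. It assumes the usual density of an undirected simple graph (realised edges over possible edges) and that all layers share the same set of agents; the precise density definition used by the authors is not reproduced here, and the function names and toy layers are hypothetical.

```python
def density(n_nodes, n_edges):
    """Density of an undirected simple graph (assumed definition)."""
    possible = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible if possible else 0.0


def layer_factors(layers, n_nodes):
    """Multiplication factor per layer: density of the densest layer / own density."""
    dens = {name: density(n_nodes, len(edges)) for name, edges in layers.items()}
    d_max = max(dens.values())
    # Empty layers have no edges to rescale, so they are skipped.
    return {name: d_max / d for name, d in dens.items() if d > 0}


def rescale(layers, n_nodes):
    """Return a copy of the layers with every edge weight multiplied by its layer factor."""
    factors = layer_factors(layers, n_nodes)
    return {
        name: {edge: w * factors[name] for edge, w in edges.items()}
        for name, edges in layers.items()
        if name in factors
    }


# Hypothetical toy MLN on four agents: a sparse activity layer and a complete term layer.
toy_layers = {
    "activity": {("A", "B"): 2, ("C", "D"): 1},
    "terms": {("A", "B"): 1, ("A", "C"): 1, ("A", "D"): 1,
              ("B", "C"): 1, ("B", "D"): 1, ("C", "D"): 1},
}
factors = layer_factors(toy_layers, n_nodes=4)
balanced = rescale(toy_layers, n_nodes=4)
```

In this toy case the sparse layer is three times less dense than the complete one, so its edge weights are multiplied by roughly three while the densest layer is left unchanged.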

2.2 Step 2: Topic Modelling

An emerging technology may trigger interactions across several technological fields and geographical locations. Moreover, agents’ spatial and economic process-related relations convey heterogeneous characteristics to the agents, which may be direct or indirect. The transversal distribution of direct or latent interactions, jointly with the heterogeneous profiles of agents, raises the issue of how to effectively detect the complex network of a technology, the technological subdomains, and most importantly the evolution of an emerging technology.

The convergence towards the common use of a set of terms in order to communicate creates a thematic group, or topic. The agents using this topic adopt a common language to communicate and exchange information [10, 17], and this creates topic relations. Thence, the topic relations can be directly apparent when the agents collaborate in an activity and use a common set of terms, or else a topic, to communicate.

The combination of topological relations and topic relations in the structure of a complex network is a subject that has not been thoroughly studied. Relevant literature presents various methods and reports reasonable performance [12, 35]. This final part of the suggested approach aims to associate and interpret all the aforementioned information, so as to render comprehensive information on innovative technological pathways that will allow policy makers and R&D managers to make timely and effective decisions. In order to capture the pathway of the technology, its subdomains should be identified, so as to assess their persistence, disappearance or emergence through interdisciplinary activities. To this end, the thematic content of the research and patenting activities that represent the interconnected system’s development is considered through topic modelling.

The technology’s subdomains are approached as topics, based on a number of documents that represent the activities performed within a technology. This conjecture is drawn from the idea that every collected activity, hence document, has one or multiple thematic subjects. To obtain an overview of the most salient thematic subjects of all the documents, these subjects should be grouped so as to provide the most pertinent topics of the entire collection. Previously, defining the topics of a real-world corpus for technology mining relied on expert assessment, which moreover requires further refinement of the analysis [8]. To avoid this subjective approach, a generative hierarchical Bayesian mixed-membership model for discrete data, namely Latent Dirichlet Allocation (LDA) [5, 19], is employed to identify the topics underlying the collected documents. The assertion is that one or multiple topics are formed based on the co-occurrence of terms (the probability distribution of terms) in a document. The LDA model, as a mixed-membership model, returns the most probable topics for each document, and then for the entire corpus. Each document may thus consist of a mixture of topics, as the model allows a document to belong to several topics simultaneously, which also happens in the real world. Correspondingly, a topic is a mixture of terms from several documents [4, 25, 32]. Accordingly, the hypotheses made are the following:

  • A technology is approximated by a number of collected activities, which is assumed to be representative of the complex network of this technology.

  • Each detected activity (patent, project) is represented by a document.

  • Each document is represented by a set of terms (or else words, hereinafter interchangeably used).

  • A mixture of terms creates a topic, regardless of the document to which it belongs.

  • A term may be assigned to one or more topics. This non-exclusive assignment of a term to a topic is sustained by the observation that in a natural language words do not belong to one topic only.

  • The order of terms in a document and the order of documents in a corpus are disregarded (exchangeability assumption) [1, 5, 32]. So each document may be a mixture of topics, which are mixtures of words (bag-of-words assumption) [5, 32].

  • Each topic represents a combination of subdomains or one subdomain of the technology.
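As a small illustration of the exchangeability hypothesis listed above, the sketch below reduces a document to term counts, so that two documents containing the same terms in a different order become indistinguishable. The token lists and the helper name `bag_of_words` are hypothetical and serve only to make the assumption concrete.

```python
from collections import Counter


def bag_of_words(tokens):
    """Represent a document by its term counts only, discarding term order."""
    return Counter(tokens)


# Two hypothetical documents with the same terms in a different order.
doc_a = ["laser", "fiber", "laser", "sensor"]
doc_b = ["sensor", "laser", "fiber", "laser"]

bow_a = bag_of_words(doc_a)
bow_b = bag_of_words(doc_b)
print(bow_a == bow_b)  # the two representations coincide
```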

Probabilistic Generative Model Notation. To formalise the aforementioned, let the basic unit be a term, or word w, which is an element of the dictionary of unique words, \(W=\{w_1,...,w_V\}\). A document, which is a sequence of words, is denoted by \(d=\{d_1,d_2,...,d_N\}\). A corpus, which is a set of documents, is denoted by \(D=\{d^1 _1,d^2 _2,...,d^M _N\}\). The generative process that the LDA model assumes for each document in a corpus is outlined in the seminal article of Blei et al. [5].
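For orientation, the generative process can be sketched in a few lines. The sketch assumes the smoothed variant of the model, in which both the topic-word and the document-topic distributions receive symmetric Dirichlet priors, and it fixes the document length instead of drawing it at random. The function names, the toy vocabulary and the prior values are hypothetical and are not taken from the study.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(n_docs, doc_len, n_topics, vocab,
                    topic_word_prior, doc_topic_prior, seed=0):
    """Generate a toy corpus following the LDA generative story."""
    rng = random.Random(seed)
    # One distribution over the vocabulary per topic.
    phi = [sample_dirichlet(rng, topic_word_prior, len(vocab))
           for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Topic mixture of this document.
        theta = sample_dirichlet(rng, doc_topic_prior, n_topics)
        doc = []
        for _ in range(doc_len):
            k = rng.choices(range(n_topics), weights=theta)[0]  # choose a topic
            w = rng.choices(vocab, weights=phi[k])[0]           # choose a word from it
            doc.append(w)
        corpus.append(doc)
    return corpus, phi


toy_vocab = ["laser", "fiber", "sensor", "diode"]
toy_corpus, toy_phi = generate_corpus(
    n_docs=3, doc_len=5, n_topics=2, vocab=toy_vocab,
    topic_word_prior=0.5, doc_topic_prior=0.5, seed=1,
)
```

Inference runs this story in reverse: given only the words, it recovers plausible topic-word and document-topic distributions.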

Since a topic model assumes a latent structure to approach the entirety of set of words as a computable distribution over K topics, to identify the topic distribution of the corpus and the words assigned to each topic, the learning process of the LDA is then implemented. More specifically, for a topic index k, the goal is to compute the word-topic distribution P(w|k), namely the most important words w of a topic k, and the topic-document distribution P(k|d), namely the prevailing topics of a document, so as to eventually converge to a robust set of topics in the corpus and compute the P(k|w). Analytically, P(w|k) is the probability of a word \(w_i\) assigned to a topic \(k_{i}=j\), represented by a set of multinomial distributions \(\phi \) over the W dictionary of words, so that \(P(w_{i}|k_{i}=j)= \phi ^{j}_{w}\). Accordingly, P(k) is the probability distribution of topics within a document d, represented by a set of multinomial distributions \(\theta \) over the K topics, so that a w in d is assigned to topic \(k_i=j\), based on \(P(k=j)= \theta ^{d}_{j}\).

However, defining the probabilities of word instances per topic and the set of topics in a corpus involves enumerating a very large discrete state space and computing a probability distribution (P(k|w)) on that space, which requires the computation of P(w|k) through \(\phi \) and \(\theta \). So \(\phi \) and \(\theta \) are estimated through LDA and prior probability distributions, or else priors. Let \(\alpha \) and \(\beta \) be the parameters of the priors, called hyperparameters, each of them having a single value. The Dirichlet priors are conjugate priors to the multinomial distribution. Namely, as the prior distributions of the multinomial parameters are Dirichlet, \(\phi \sim Dirichlet(\alpha )\) and \(\theta \sim Dirichlet(\beta )\), the posterior distributions are Dirichlet as well, which makes it possible to compute the joint distribution \(P(w,k)=P(w|k) \cdot P(k)\) by integrating out \(\phi \) and \(\theta \) separately. Integrating out \(\phi \) yields P(w|k), and integrating out \(\theta \) yields P(k).

The posterior distribution \(P(k|w)=\frac{P(w,k)}{ \sum _k {P(w,k)} }\) has to be evaluated indirectly, because the denominator involves a state space too large to enumerate. It is therefore approximated by sampling from the target distribution with Markov Chain Monte Carlo (MCMC) [16, 32]. The target distribution is approximated by a Markov chain, and samples are then drawn from this chain through Gibbs sampling, which does not require tuning. All the conditional distributions of the target distribution are sampled, and a sequence of observations used to approximate the latent variables of the LDA is returned. Finally, for each sample from the estimated posterior distribution P(k|w), \(\phi \) and \(\theta \) can be estimated [16].
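A minimal sketch of a collapsed Gibbs sampler of this kind is given below, to make the sampling step concrete. It is not the implementation used in the study: the function name, the neutral prior names `topic_word_prior` and `doc_topic_prior` (standing for the symmetric hyperparameters of the priors on \(\phi \) and \(\theta \)), their values, the number of iterations and the toy corpus are all assumptions, and only the last sample is used to estimate the two distributions.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, topic_word_prior=0.01,
              doc_topic_prior=0.1, n_iter=100, seed=0):
    """docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (phi, theta) estimated from the final Gibbs sample."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # token counts per topic
    z = []                                              # topic of every token
    for d, doc in enumerate(docs):                      # random initialisation
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional of the token's topic given all other assignments.
                weights = [
                    (n_kw[j][w] + topic_word_prior)
                    / (n_k[j] + vocab_size * topic_word_prior)
                    * (n_dk[d][j] + doc_topic_prior)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    phi = [[(n_kw[k][w] + topic_word_prior)
            / (n_k[k] + vocab_size * topic_word_prior)
            for w in range(vocab_size)] for k in range(n_topics)]
    theta = [[(n_dk[d][k] + doc_topic_prior)
              / (len(doc) + n_topics * doc_topic_prior)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    return phi, theta


# Hypothetical toy corpus over a vocabulary of four word ids.
toy_docs = [[0, 0, 1, 1], [2, 2, 3, 3], [0, 1, 0, 1]]
phi_hat, theta_hat = gibbs_lda(toy_docs, n_topics=2, vocab_size=4, n_iter=50)
```

In practice several samples taken after a burn-in period would typically be averaged, and the number of topics would be chosen by a separate model-selection step.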

Topic Distances and Visualisation. Interdisciplinarity and relevance of the ensued technology-related topics is subsequently pursued, based on inter-topic distances. The hypothesis is that if two topics appear close in distance and their topic content is not thematically relevant, this points to a potential emergence of a new subdomain from the activities of these subdomains in a subsequent time period. The relevance of the topics is computed through a weight parameter, the topic prevalence with Principal Components Analysis (PCA), and the inter-topic distances with Jensen-Shannon divergence. The interdisciplinarity aspect points to potential generation of new subdomains in a subsequent time period, based on the common use of terms from co-participation to activities subdomains that were assigned together to a topic.
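The inter-topic distance mentioned above can be illustrated with a short sketch of the Jensen-Shannon divergence between two topic-word distributions. The natural logarithm is assumed, and the two toy distributions are hypothetical; the subsequent projection of the distance matrix onto two dimensions is not shown.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Symmetric divergence between two distributions over the same vocabulary."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic-word distributions over a four-term vocabulary.
topic_a = [0.70, 0.20, 0.05, 0.05]
topic_b = [0.60, 0.25, 0.10, 0.05]
topic_c = [0.05, 0.05, 0.20, 0.70]

print(jensen_shannon(topic_a, topic_b))  # small: the two topics use similar terms
print(jensen_shannon(topic_a, topic_c))  # larger: the two topics use different terms
```

The divergence is zero for identical distributions and reaches its maximum, log 2 under the natural logarithm, for distributions that share no terms.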

Let the relevance metric be the relevance of a term w to a topic k given a weight parameter \(\lambda \), with \(0 \le \lambda \le 1\). This weight parameter determines the weight given to the probability of a term under a topic relative to its lift, both measured on a log scale. The lift is computed as the ratio of a term’s probability within a topic to its marginal probability within the corpus. A value of \(\lambda =1 \) results in a ranking of terms according to their topic-specific probability, and a value of \(\lambda =0 \) in a ranking solely according to their lift. The lift decreases the appearance of globally (in the corpus) frequent terms in high-ranking places. However, rare terms that occur in very few topics or in a single topic could appear in high rankings, decreasing the interpretability of a topic if the term is very rare [31, 33].
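The relevance metric can be written as \(\lambda \log P(w|k) + (1-\lambda ) \log \frac{P(w|k)}{P(w)}\), where the second logarithm is the log-lift. The sketch below applies it to two hypothetical terms to show how the ranking flips between the two extreme values of \(\lambda \); the term names and probabilities are invented for illustration only.

```python
import math


def relevance(p_w_given_k, p_w, lam):
    """Relevance of a term to a topic for a weight lam between 0 and 1."""
    lift = p_w_given_k / p_w
    return lam * math.log(p_w_given_k) + (1 - lam) * math.log(lift)


def rank_terms(terms, lam):
    """terms: dict term -> (P(w|k), P(w)). Returns terms sorted by decreasing relevance."""
    return sorted(terms, key=lambda t: relevance(*terms[t], lam), reverse=True)


# Hypothetical terms: "optical" is frequent everywhere, "waveguide" is specific to the topic.
toy_terms = {
    "optical": (0.10, 0.08),
    "waveguide": (0.05, 0.006),
}

ranking_by_probability = rank_terms(toy_terms, lam=1.0)
ranking_by_lift = rank_terms(toy_terms, lam=0.0)
```

With \(\lambda =1\) the globally frequent term comes first, while with \(\lambda =0\) the topic-specific term does; intermediate values trade off the two criteria.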

3 Results and Discussion

Communities. The identification of cohesive groups of agents, namely communities, may assist in recognising recursive elements regarding agents’ interactions and to shape and design policy interventions targeted to specific interest groups. The resulting communities are formed by agents more likely to share information in any of the three considered dimensions: activity, location and technology. After the analysis of Infomap output, communities c.1 and c.2 have been removed from the analysis as they were grouping together all the players participating in EU projects (c.1), and all the patenting players (c.2). Some descriptive statistics regarding the detected communities are presented in the Appendix A. Through the identity of the agents in each community, the community’s profile is explored. In addition, the information regarding the structure of communities, i.e. how communities generate a structured network by having agents in common, is considered. Figure 2 immediately reveals which communities are the most connected and with which ones they share agents. Further analysis would entail the identification of the technological subdomains in which the communities are involved.
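How overlapping communities give rise to a network of communities can be sketched as follows. The community labels reuse the naming style of the text, but the memberships are invented, and the link weight is assumed to be the plain count of agents that two communities have in common.

```python
from itertools import combinations


def community_links(communities):
    """communities: dict mapping a community label to its set of agents.
    Returns {(label_1, label_2): number of shared agents} for overlapping pairs."""
    links = {}
    for (a, members_a), (b, members_b) in combinations(sorted(communities.items()), 2):
        shared = len(members_a & members_b)
        if shared:
            links[(a, b)] = shared
    return links


# Hypothetical memberships: one agent belongs to two communities, one community is isolated.
toy_communities = {
    "c.5": {"agent1", "agent2", "agent3"},
    "c.7": {"agent3", "agent4"},
    "c.4": {"agent5"},
}
links = community_links(toy_communities)
isolated = [c for c in toy_communities if not any(c in pair for pair in links)]
```

Communities that never appear in a link are the isolated ones, and the counts could serve as link widths in a layout of the community network.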

Fig. 2. Structure of Photonics communities of agents detected by the Infomap algorithm, in a Fruchterman-Reingold force-based layout. The largest components and some isolated communities are presented here. Each node represents a group of agents, i.e. a community. The size of the nodes is proportional to the number of agents belonging to the communities. The labels indicate the 24 most relevant communities (by amount of Infomap flow). Communities c.1 and c.2 have been removed from the analysis as they were grouping together all the players participating in EU projects (c.1), and all the patenting players (c.2). The width of the links is proportional to the number of agents simultaneously belonging to the two communities involved.

The community detection analysis in the multilayer network of R&D in photonics reveals the presence of over one hundred communities. Many communities are isolated, as their agents are not involved in other communities. This is the case for a large set of very small communities made of agents that patented alone, or that participated in few EU projects with a low number of participants. When the communities’ structure is observed from a country perspective, some elements emerge. While Japanese agents are mainly grouped in community c.3 alone, Chinese agents populate four large communities, namely c.4, c.5, c.7 and c.10. Two of them (c.5 and c.7) are connected (as they have agents in common), while the remaining ones are independent. The connection involving c.5 and c.7 reveals the presence of collaborations or technological convergence between the agents of these two communities. Conversely, c.4 and c.10 do not have relevant convergences with other large communities and they have only smaller neighbouring communities. Therefore, the two countries with the highest propensity to develop patents show two community structures that are very different from each other. The intertwined single structure made of multiple communities observed in Fig. 2 is mainly populated by European agents. The communities reveal the presence of supranational patterns involving in different ways the countries of the Union. Some of these communities are dominated by agents in a single country, e.g. c.6 (almost all in France), c.13 (the majority in the United Kingdom), c.22 and c.24 (the majority of agents in Germany). Then, there are many heterogeneous communities composed of agents from different countries. Some of them are mainly made of agents from two countries, like c.8, which includes nine agents from Germany and seven from Italy, c.17, which includes six agents from Germany and six from Italy, and c.12, which includes seven agents from the United Kingdom and six from Italy. Some communities present a larger mix, as they include agents from many different countries. This is the case of c.11, a medium-size community whose 21 agents participated in 14 FP7 projects, 5 H2020 projects and 1 patent.

Topic Identification. Another potential grouping of agents is examined under the assumption that the technological subdomains do not operate in isolation from one another: since they may exchange applications and methods, they present relationships that may create thematic topics. These topics, ensuing from interdisciplinary combinations of the subdomains of the technology’s system, are identified through topic modelling. Some of the subdomains found within one or more topics are, for instance, fiber optics and waveguides, photonic devices, optical materials and light pulses (see Fig. 3 and Appendix B). The potential overlap of topics sharing subdomains is studied through the thematic distance between the topics. This analysis provides insights into the plausible emergence of new specialised topics, or into possible combinations of several topics into a single one due to the application of common methods.

Fifteen topics are identified in the photonics technological landscape over the period 2008 to 2016. The topics are mapped in an interactive representation based on their thematic distance. A snapshot of this map is presented in Fig. 4. To inquire each topic’s prevalence over the corpus of the technology’s activities, all the resulting topics are ranked in descending order based on their occurrence frequency (Fig. 3). Moreover, the topics are classified according to the kind of use these topics are more associated with. The activities that characterise each topic are classified as follows: (i) only application-oriented, (ii) only research method-oriented, or (iii) both. A more detailed presentation of the contents of each topic per (i), (ii) or (iii) is provided in Appendix B. The colours of the bars are used to highlight this classification. The most frequently occurring topic is the multidisciplinary topic 2, which contains three interdisciplinary subdomains: fiber optics and waveguides, general optics, and lasers. All detected subdomains demonstrate activities in applications. It is followed by the topic 3 related to the subdomains of photonic devices (applications-oriented subdomain), and of physical foundations (methods-oriented subdomain). This reveals that the basic physical methods are intensively investigated for the development of photonic devices. The third most frequently occurring subdomain is regarding optical materials, such as display devices and LEDs, and it is identified only in topic 6. Subdomains regarding applications occur more frequently than subdomains regarding methodological aspects in the photonics interconnected system. This implies that the most significant evolution is expected mainly by the interactions within the subdomains regarding applied research. Furthermore, the detected combinations of subdomains within each topic indicate the potential upcoming paths of development for the considered technology.

Fig. 3. Occurrence of photonics topics in R&D activities per type of topic, 2008–2016.

Fig. 4. Inter-topic distance map for R&D activities in photonics, 2008–2016 (left). Top-30 most relevant terms in topic 2 (right). Red bars indicate frequency of occurrence of a term within the topic, and blue bars frequency of occurrence of the term across all topics of the corpus. Note: For the visualisation the LDAvis R package is used. (Color figure online)

Topic Distances. In Fig. 4, the topics resulting from the topic model are illustrated after a PCA, in order to identify which topics, over the entire period 2008–2016 in the photonics R&D activities, are closer to each other. When the observed topics appear thematically very close to each other, e.g. the topic consisting of subdomains regarding photonic devices employing physical foundations methods (topic 3) and the topic on optical materials (topic 6), or when a topic extensively overlaps thematically with another, e.g. the topics of photonic devices applications employing methods from chemistry and biology (topic 13), of lasers for optical sensing employing methods from biology (topic 14) and of fiber optics and waveguides applications employing methods from quantum optics and physical foundations (topic 9), it is likely that their evolution will be intertwined. Namely, when topics are close in thematic distance, they will eventually either merge into a new single multidisciplinary topic including stronger relations between their corresponding subdomains, or diverge and create more specialised and independent topics, whose subdomains will not demonstrate interdisciplinary relations. In the first case, the destination becomes common, and in the second the origin remains common.

The terms of each topic are displayed on the side of the map, offering a comparative ranking of the most relevant terms within the topic and across all the topics. Hence, the terms that describe the subdomains appearing most pertinently in the topic are depicted by red bars in Fig. 4, and the most relevant terms of the topic in the entire photonics domain, namely the terms that are relevant for other topics too, are depicted by blue bars. The illustration is given here for the highest-ranked topic (topic 2), but the observations are made for all topics.

Topic Dynamics. To delineate the performance of the observed technological topics, the popularity of each topic, namely the topic’s probability of occurrence based on the collected technology-related documents, is analysed on a yearly basis over the period from 2008 to 2016 (Fig. 5). However, given the short time period between its establishment and the end of the considered study period, definitive conclusions cannot be drawn. Moreover, it is noted that the number of collected activities during the last three years of the study period presents a downward trend (ranging from 1.6 to 2.7 times fewer activities between 2014 and 2016 than the average number of activities between 2008 and 2013), due to delays in the registration of the patenting activities.

Fig. 5.

Topic evolution based on topic-document probability in Photonics R&D activities, 2008–2016.

The popularity of the multidisciplinary topic with subdomains regarding applications of fiber optics and waveguides, general optics and lasers (topic 2) is among the highest in the photonics interconnected system. This is corroborated by the high occurrence of the topic, as mentioned in the previous indicator. Moreover, a 15% mean increase in occurrence is shown during the period 2014–2016 in comparison to 2008–2013. A similar interpretation can be proposed for the frequently occurring topic of optical materials applications (topic 6): its mean occurrence increases by 4.8% during the period 2014–2016, after a drop in 2013. The topic of photonic devices and physical foundations methods also occurs rather frequently (topic 3, 11% mean occurrence during 2008–2016). In particular, it stabilises after a period of fluctuation and a final increase in 2012. The low mean popularity of the technological topic on quantum optics (topic 11, 3% mean occurrence during 2008–2016), with a marginal peak in 2014, confirms the topic’s lower ranking in terms of occurrence that is found in the intertopic map. This variation is marginal, and any trend may be concealed by the aforementioned delays in registration and uptake of decisions.
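The yearly popularity and the comparison between two periods can be sketched as below, assuming that popularity is the mean topic-document probability over the documents of each year. Whether the changes reported above are relative or absolute is not specified here; the sketch computes a relative change. The function names and the toy data are hypothetical.

```python
import numpy as np

def yearly_topic_popularity(doc_topic, years):
    """Mean topic probability per year: rows are sorted years, columns are topics."""
    doc_topic = np.asarray(doc_topic, dtype=float)
    years = np.asarray(years)
    unique_years = np.unique(years)
    popularity = np.vstack(
        [doc_topic[years == y].mean(axis=0) for y in unique_years]
    )
    return unique_years, popularity

def relative_change(unique_years, popularity, topic, early, late):
    """Relative change of a topic's mean yearly popularity between two inclusive year ranges."""
    in_early = (unique_years >= early[0]) & (unique_years <= early[1])
    in_late = (unique_years >= late[0]) & (unique_years <= late[1])
    base = popularity[in_early, topic].mean()
    return float((popularity[in_late, topic].mean() - base) / base)

# Toy document-topic probabilities (rows sum to 1); values are illustrative only.
toy_years = [2008, 2008, 2013, 2014, 2016]
toy_doc_topic = [
    [0.2, 0.8],
    [0.4, 0.6],
    [0.3, 0.7],
    [0.5, 0.5],
    [0.4, 0.6],
]
unique_years, popularity = yearly_topic_popularity(toy_doc_topic, toy_years)
change = relative_change(unique_years, popularity, 0, (2008, 2013), (2014, 2016))
```

Averaging the yearly means weighs each year equally, so a year with few registered activities, such as the last years of the study period, counts as much as a year with many; this is one reason why the recent variations should be read with caution.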

4 Conclusions and Perspectives

The heterogeneity and spatial dimensions of agents and activities within emerging technologies, the complex interactions that develop due to these factors, and the lack of correspondence to standard classifications (industrial or products) demand a methodology that identifies the characteristics of a technology and the evolution of their relationships. Through community detection and semantic analyses, the presented methodology detects not only the established R&D applications of the photonics technology based on collaborations of agents, but most importantly the latent relations of agents that appear in homogeneous groups due to spatial or thematic interactions. The analysis of these dynamic groups’ attributes offers valuable insights, which would not be apparent without the proposed method. The identified technological pathway and subdomains, in conjunction with the yielded observations, provide valuable suggestions for the development of timely and reliable policies.

The perspectives of this ongoing study comprise automatic detection of the corpus to improve the robustness of the topics, and semantic analysis within the resulting communities. In particular, the presence of multiple topics within the same community, or the presence of the same specific topic in distant and otherwise unrelated communities, will be examined. Lastly, the TES methodology will be implemented for the entire complex network of the technology, so as to cover both R&D and non-R&D activities.