Forecasting emerging technologies using data augmentation and deep learning

Deep learning can be used to forecast emerging technologies based on patent data. However, it requires a large amount of labeled patent data as a training set, which is difficult to obtain due to various constraints. This study proposes a novel approach that integrates data augmentation and deep learning methods to overcome the lack of training samples when applying deep learning to forecast emerging technologies. First, a sample data set was constructed using Gartner’s hype cycle and multiple patent features. Second, a generative adversarial network was used to generate many synthetic samples (data augmentation) to expand the scale of the sample data set. Finally, a deep neural network classifier was trained with the augmented data set to forecast emerging technologies, and it could predict up to 77% of the emerging technologies in a given year with high precision. This approach was used to forecast emerging technologies in Gartner’s hype cycles for 2017 based on patent data from 2000 to 2016. Four of the six emerging technologies were forecast correctly, showing the accuracy and precision of the proposed approach. This approach enables deep learning to forecast emerging technologies with limited training samples.


Introduction
Forecasting emerging technologies is important for governments and enterprises to identify strategic opportunities in the face of technological change. Existing forecasting studies use either normative or extrapolative methods, with the latter mainly involving analyses of bibliometrics and patents. For example, Daim et al. (2006) used patents and curve fitting techniques to forecast emerging technologies; Chang et al. (2010) proposed an approach employing bibliometrics and patent network analysis to forecast emerging technologies; and Breitzman and Thomas (2015a) assessed the value of patents by considering inventors' team size, in order to identify high-value emerging technologies. However, these approaches were mostly concerned with bibliometric/patent indicators and ignored the rich text information contained in these patents. In addition to these studies, other recent studies attempted to explore text mining and cluster analysis. For instance, Chang and Breitzman (2009) used the clustering of patents to identify emerging and high-impact technology clusters and trends; Chiavetta and Porter (2013) proposed the basic idea of tech mining to forecast emerging technologies using text and data mining; Choi and Jun (2014) developed a Bayesian model for patent clustering to forecast emerging technologies; Breitzman and Thomas (2015b) proposed the "Emerging Clusters Model" based on patent citations to identify emerging technologies across multiple patent systems; and Zhou et al. (2019a, b) developed a framework through citation network and topology clustering to reveal the convergence process of scientific knowledge to forecast emerging technologies. All these approaches utilized unsupervised learning to probe valuable text-based data.
However, unsupervised learning methods cannot incorporate external domain knowledge during the machine learning process, and the results need to be professionally interpreted by domain experts, who are usually scarce, costly, and sometimes biased.
As a remedy, supervised learning methods can generate forecasting results by embedding external knowledge into the model using labeled samples. Some recent studies have explored the use of supervised learning. For example, Kreuchauff and Korzinov (2015) developed a support vector machine model based on robotics patents to detect the early development of an emerging technology in patent data. Kyebambe et al. (2017) used labeled data based on new classes established in the United States Patent Classification (USPC) system to train supervised learners to forecast emerging technologies. Lee et al. (2018) employed a feed-forward multilayer neural network to capture the complex nonlinear relationships between the input and output indicators to identify emerging technologies in early stages. Zhou et al. (2019a, b) developed a semi-supervised topic clustering model and generated a sentence-level semantic technological topic description to identify emerging technologies. Supervised learning requires high-quality labeled samples to prevent overfitting and ensure the accuracy of the forecasting model. However, large labeled samples are difficult to obtain.
As an advanced supervised learning process, deep learning has a relatively complex model structure and exhibits better performance (Liu et al. 2019). Some recent studies have explored the applications of deep learning in bibliometrics and patent analysis, such as patent classification, citation classification (Hassan et al. 2018), and natural language processing (Zhang et al. 2018). These studies showed that deep learning exhibits superior performance and great potential for forecasting emerging technologies compared to the traditional supervised learning methods. However, these studies used large-scale labeled sample data to fully optimize the model parameters and achieve superior performance. Existing studies suggest that the sample size of a dataset significantly affects deep learning performance (Goodfellow et al. 2016). However, large training samples (e.g., emerging technologies in history) are difficult to obtain due to data/resource constraints. To cope with this issue, we utilize a generative adversarial network (GAN) method, which recently emerged in computer science, as a data augmentation method to enlarge the data scale for emerging technology samples.
Compared with basic deep learning, integrated GAN-based deep learning can help to develop new approaches to address the lack of emerging technology samples. A GAN consists of two deep-architecture functions for the generator and the discriminator, which can learn simultaneously from the training data in an adversarial fashion (Radford et al. 2015). Recent studies have shown that a GAN can effectively augment training samples. For example, Fiore et al. (2019) used a GAN to generate synthetic illicit transaction records and merged them into an augmented training set to improve the effectiveness of credit card fraud detection, and Pascual et al. (2017) proposed a speech-enhancement framework based on GAN. Prior to these publications, Santana and Hotz (2016) proposed an approach for generating images with the same distribution as real driving scenarios. In summary, GANs provide an opportunity to overcome the lack of training samples when applying deep learning to forecast emerging technologies. This paper, therefore, proposes a novel approach that integrates data augmentation and deep learning to forecast emerging technologies. First, we built training and testing sets by labeling emerging technology (ET) and non-emerging technology (NET) samples, collecting patent data for each technology, and extracting patent features. Second, a data augmentation method based on a GAN was employed to generate a large amount of synthetic data to train the forecasting model using a deep learning classifier. Finally, we evaluated the performance of the forecasting model with a testing set. We adopted Gartner's emerging technology hype cycles (GETHC) and the Thomson Innovation patent database to forecast technologies that emerged in 2017. The results show that our approach could forecast an ET one year before it emerged with high precision.
Our proposed approach overcomes the problem of lacking emerging technology samples by combining a GAN with basic deep learning, and the integrated new model was proven to be effective given limited training samples in patents.
This paper is organized as follows. In "Related work" section, related work is presented, and then the research process and methodology are explained in "Methodology" section. Guidelines for the implementation and evaluation of our approach are presented in "Results" section. Finally, our conclusions are provided in "Conclusions" section.

Forecasting ET based on supervised learning
Supervised learning uses a set of samples with known categories to optimize the parameters of a classifier, enabling the classifier to accurately fit the relationships between the features of data samples and the sample categories (Jung and Pedram 2010). Supervised learning approaches include Support Vector Machines (SVMs), Naive Bayes (NB), and Random Forests (RFs). Supervised learning has the following advantages over unsupervised learning: (1) by learning from labeled samples, the classifier can effectively introduce external knowledge, which increases reusability and external scalability; and (2) the trained classifier can automatically and quickly assign a sample's category, which greatly reduces the degree of manual participation and prevents the subjective biases it can cause (Love 2002; Kyebambe et al. 2017). However, compared with unsupervised learning, supervised learning suffers from the major limitation that it relies on many high-quality labeled samples (Zhu et al. 2006).
According to Kyebambe et al. (2017), using supervised learning to forecast emerging technologies requires the forecasting problem to be transformed into the construction of classifiers. First, historical emerging technologies are selected as labeled samples, and the time when they began to emerge is determined. Second, the historical ET data prior to emergence are collected, and the corresponding data features are extracted. Third, a classifier is built to discover the relationships between the historical data features and the ET or NET categories. Finally, the classifier is used to forecast, based on the historical data for a certain technology, whether that technology will become an ET in the future (shown in Fig. 1). The general process of constructing the classifier mainly has three steps: labeling samples, constructing data sets, and training and testing classifiers (Fu and Aliferis 2010; Kong et al. 2017; Lee et al. 2018).

Deep learning in bibliometrics
Deep learning, developed by Hinton and Salakhutdinov (2006), has become the key technology of big data intelligence (Zhuang et al. 2017) and has led to major breakthroughs in many fields. Recent studies have also explored the value of deep learning in bibliometrics. Li et al. (2018) proposed an effective patent classification algorithm based on deep learning to solve the large-scale and multiclass patent classification problem and suggested that deep learning has several advantages in large-scale patent classification, including being free of handcrafted features, utilizing straightforward models, and being easy to implement without tedious feature engineering compared with traditional supervised learning algorithms. Hassan et al. (2018) compared deep learning and the classical statistical supervised learning models for classifying the importance of a citation using the same dataset, and the results showed that deep learning with all 64 features had a higher accuracy than SVMs and RFs using the 29 best features. This study also proved the modeling power of deep learning with complex features and large-scale data. Zhang et al. (2018) utilized word embedding, as one such application of deep learning in natural language processing, to map words from vocabulary to vectors and created a method to discover the latent semantics in large-scale text. This study showed the superior performance of deep learning in handling topic extraction tasks in large-scale text data.
The existing studies in bibliometrics have shown that deep learning, as the core of the big data intelligence method, exhibits strong modeling power and better performance than classical statistical supervised learning methods, and thus, it holds great potential for forecasting emerging technologies. However, existing studies in bibliometrics require large-scale labeled sample data to fully optimize the model parameters and achieve superior performance. A previous study suggested that the quality of a dataset can significantly affect deep learning (Goodfellow et al. 2016). Since there is limited historical emerging technology data, existing methods for forecasting emerging technologies using deep learning can easily overfit during the training process, lowering the effectiveness of the forecasting. In other words, under the existing framework, the performance of the deep learning model is determined by the scale and quality of the labeled data. With a small sample size, the high performance of deep learning is generally restricted.

Data augmentation in small-scale samples
In previous studies, to address the insufficiency of the data scale, several oversampling methods were proposed. Their main advantage is that they are self-sufficient. In the early stage, the training set can be enlarged by duplicating the training examples of the minority class if the examples of different classes are imbalanced or by creating a new data set by adding artificial noise (DeRouin et al. 1991). Chawla et al. (2002) proposed a classic oversampling method known as the synthetic minority oversampling technique (SMOTE), which involves the creation of a synthetic minority class data set. Barua et al. (2014) proposed a majority-weighted minority sampling technique with the aim of generating valid synthetic samples. The existing research has mainly focused on imbalanced learning in which better performances can be achieved by adding oversampling instances to the minority class data set. However, data sets in many realms remain insufficient rather than imbalanced in every class. With the development of artificial intelligence and deep learning, GANs have provided opportunities to create new approaches to solve the problem of a lack of samples.
A GAN is a powerful type of generative model (Wang et al. 2017) introduced in 2014 (Goodfellow et al. 2014). The GAN consists of two deep-architecture functions for the generator and the discriminator, which, in contrast to oversampling techniques, learn simultaneously from the training data in an adversarial fashion (Radford et al. 2015). In the learning process, the generator captures the potential distribution of the real data and generates synthetic samples, while the discriminator discriminates between real samples and synthetic samples as accurately as possible. Recent work has shown that GANs can be applied successfully. Hwang et al. (2018) used a GAN for disease prediction and compared the predictive performance of an auxiliary classifier GAN (AC-GAN) with existing models, such as SVMs and adaptive boosting (AdaBoost), which are widely used in research involving medical data. Their results showed that the combination of stacked autoencoders and AC-GAN performed significantly better than existing algorithms for the problem of disease prediction. Fiore et al. (2019) trained a GAN to output mimicked minority-class examples, which were then merged with training data into an augmented training set to improve the classification effectiveness of credit card fraud detection. Zhu et al. (2018) proposed several evaluation methods using three benchmark data sets to validate GAN performance. The empirical results showed that a 5-10% increase in classification accuracy could be obtained after employing GAN-based data augmentation. GANs have been successfully applied in many fields, and they have the potential to solve the problem of limited samples in forecasting emerging technologies.

Methodology
Overall process
Figure 2 shows the overall process of the proposed approach. Given the complexities involved, the proposed approach was designed to be executed in five discrete steps: ET and NET sample labeling, data set construction, data augmentation, forecasting model construction, and evaluation. A detailed discussion of how we performed each step is presented below.

Labeling ET and NET samples
In this study, GETHCs were used to identify ET and NET samples. The GETHC was proposed by Gartner (www.gartner.com), which was established in 1979 as the world's first information technology research and analysis company. The GETHC aims to describe a specific stage of development of an emerging technology (Jun 2012). "Appendix 1" presents the GETHCs from 2008 to 2017.
The foundation of deep learning to forecast emerging technologies is to find the relationships between emerging technologies and their historical data. Thus, when we label the ET and NET samples, we must specify the time when a technology became an ET or NET. Knowing the specific time point, we can find the historical data corresponding to each sample. The GETHC released yearly can effectively solve this problem. According to the characteristics of the GETHC, if a technology entered the GETHC in a certain year, this indicates that this technology began to gain attention and showed the characteristics of emerging technology in this year. If a technology dropped out of the GETHC in a certain year, this technology was in last year's GETHC and disappeared in this year's GETHC, which indicates that this technology's influence began to decline and its characteristics as an emerging technology began to disappear. In our method, a technology that entered the GETHC for the first time in a certain year is labeled as an ET sample for that year, and a technology that dropped out of the GETHC in a certain year is labeled as a NET sample for that year. Technology evolves dynamically over time. A technology may enter the GETHC and become an ET in a certain year, and as the technology evolves, it may disappear from the GETHC and become a NET after several years. Thus, the same technology may be labeled as an ET sample in 1 year and may be labeled as a NET sample in another year.
The following are examples of labeling ET and NET samples. Cloud Computing entered the GETHC of 2008 for the first time, which indicated that Cloud Computing began to garner attention and show the characteristics of an emerging technology in 2008. Thus, we labeled Cloud Computing as an ET sample in 2008. As the technology evolved, Cloud Computing dropped out of the GETHC of 2015, which means this technology was in the GETHC of 2014 and disappeared from the GETHC of 2015. This indicates that Cloud Computing's influence began to decline, and its characteristics as an emerging technology began to disappear in 2015. Thus, we labeled Cloud Computing as a NET sample in 2015. The same method was used for labeling ET and NET samples in other years.
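The labeling rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_samples`, the dictionary layout, and the placeholder technologies "Tech A" and "Tech B" are hypothetical, the years supplied are assumed to be consecutive, and the earliest list is used only as a baseline because first entry cannot be judged without an earlier list.

```python
def label_samples(hype_cycles):
    """Label technologies as ET or NET from yearly hype-cycle membership.

    hype_cycles maps a year to the set of technology names listed that year.
    Returns (technology, year, label) tuples in chronological order.
    """
    years = sorted(hype_cycles)
    # Assumption: the earliest list is a baseline only, since a "first entry"
    # cannot be detected without an earlier list to compare against.
    seen = set(hype_cycles[years[0]])
    labels = []
    for prev_year, year in zip(years, years[1:]):
        previous, current = hype_cycles[prev_year], hype_cycles[year]
        # ET: the technology appears on the list for the first time.
        for tech in sorted(current - seen):
            labels.append((tech, year, "ET"))
        # NET: the technology was listed last year and is missing this year.
        for tech in sorted(previous - current):
            labels.append((tech, year, "NET"))
        seen |= current
    return labels


# Hypothetical toy lists, not actual hype-cycle contents.
toy_cycles = {
    2001: {"Tech A"},
    2002: {"Tech A", "Tech B"},
    2003: {"Tech B"},
    2004: set(),
}
labels = label_samples(toy_cycles)
```

In this toy input, "Tech B" receives an ET label in the year it first appears and a NET label in the year it disappears, mirroring the Cloud Computing example.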

Data set construction
After labeling ET and NET samples, patents for the corresponding technology were retrieved from the patent database. We subsequently extracted patent features and used them to create a patent feature vector for each ET and NET sample. Finally, all the patent feature vectors of the ET and NET samples were used to construct the training set and testing set for the training and evaluation of the forecasting model.

Patent data collection
When collecting historical patent data corresponding to each ET and NET sample from the patent database, patent data should be collected according to the time point when each sample became an ET or a NET. The purpose of the proposed method is to forecast whether a certain technology will become an emerging technology one year later. Therefore, we need the deep learning model to find the relationships between an ET or NET sample that becomes an ET or a NET in year T and all the corresponding historical patent data until year T − 1. We collected historical patent data until year T − 1 for each ET and NET sample in year T. The patent data were collected from the Thomson Innovation (TI) patent database, which contains comprehensive and high-quality patent data.
According to the sample labeling method in "Labeling ET and NET samples" section, a technology may be labeled as an ET sample and a NET sample in different years. In that case, the historical patent data collected for the technology as an ET sample differ from those collected for it as a NET sample, because the years in which it became an ET and a NET differ. For the ET sample in year T, we collect historical patent data until year T − 1, and for the NET sample in year T, we collect historical patent data from the year it became an ET sample to year T − 1.
The following are examples of collecting patents. Cloud Computing is used as an example for the labeling of ET and NET samples. This technology was labeled as an ET sample in 2008. Thus, we collected the corresponding patents until 2007 for Cloud Computing. As the technology evolved, Cloud Computing was labeled as a NET sample in 2015. Thus, we collected the corresponding patents from 2008 to 2014 for Cloud Computing. The same method was used to collect corresponding patents of each ET and NET sample in other years.
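As a rough illustration of the collection rule, the sketch below computes the window of years for a labeled sample. The names `collection_window`, `in_window`, and `patent_year` are hypothetical, and the text does not say which patent date (filing, publication, or grant) defines the year, so that choice is left open. An open lower bound is represented by `None`; treating a NET sample without a known ET year as unbounded is an assumption of this sketch.

```python
def collection_window(label, year, et_year=None):
    """Return (start_year, end_year), both inclusive; start None means unbounded.

    label   -- "ET" or "NET"
    year    -- the year T in which the sample received its label
    et_year -- for a NET sample, the year the same technology became an ET
               (assumption: None leaves the lower bound open)
    """
    if label == "ET":
        return (None, year - 1)
    if label == "NET":
        return (et_year, year - 1)
    raise ValueError("label must be 'ET' or 'NET'")


def in_window(patent_year, window):
    """Check whether a patent dated patent_year falls inside the window."""
    start, end = window
    return (start is None or patent_year >= start) and patent_year <= end


# The Cloud Computing example from the text.
et_window = collection_window("ET", 2008)
net_window = collection_window("NET", 2015, et_year=2008)
```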

Patent feature selection
Patent feature selection is crucial to this study, because forecasting accuracy greatly depends on the relevance of the patent features for emerging technologies. Earlier reports have presented a variety of patent features to capture the characteristics of emerging technologies. This study employed a total of seven features from previous studies to capture the key characteristics of emerging technologies, which are divided into five sub-categories (shown in Table 1): low-novelty, science-intensity, growth speed, scope and coverage, and development effort and capabilities (Lee et al. 2018). These features can be extracted from patent databases immediately after the relevant patents are collected.
(1) Low-novelty Earlier studies have explored the relationship between patent novelty and backward citations. Some studies suggested that the breadth of patent backward citations can capture the absorption of previous technological innovations from many different fields and that the novelty of a patent increases with the breadth of the backward citations. Some other studies suggested that patent backward citations capture the related prior publications and patents in the patent applications and that patents with large numbers of backward citations have a relatively low novelty and low monetary value (Harhoff et al. 2003; Lee et al. 2018). In this study, we consider that the number of backward citations tends to directly measure the number of related prior publications and patents, whereas the breadth of the backward citations requires other metrics.
(2) Growth speed Since the growth speed cannot be fully observed at early stages of technology development without continuous monitoring and surveillance, the technology cycle time (TCT) was employed in this study, which captures the degree of newness of prior knowledge or the pace of technology progress (Bierly and Chakrabarti 1996; Kayal and Waters 1999) in this sub-category.
(3) Science-intensity Earlier studies have suggested that more scientific knowledge contained in the patented invention may lead to the development of more innovative and influential technology (Cozzens et al. 2010; Day and Schoemaker 2000). The number of nonpatent citations (Trajtenberg 1990) was employed in this study to capture the closeness to scientific knowledge.
(4) Scope and coverage This sub-category contains three patent features to represent the scope and coverage of patents. The first feature is the number of claims, which indicates the scope of the legal protection conferred by patent claims (Lanjouw and Schankerman 2004). The second feature is the family patent size, which corresponds to the number of times the same invention has been developed in different countries and represents the economic and technical importance of the invention (OuYang and Weng 2011). The third feature is the number of International Patent Classification (IPC) codes, which represents the scope of the technological fields of a patent (Lee et al. 2009).
(5) Development capabilities Many empirical studies have found that there is a significant positive relationship between forward citations and development capabilities (Lerner 1994; Martin 1995). This study employs forward citations, which capture the development capabilities.
At this point, each ET and NET sample comprises many patents. However, for the subsequent data augmentation and deep learning analysis, we must represent each ET and NET sample as a single feature vector that has the same features as the individual patents. As such, drawing on previous research (Kyebambe et al. 2017), for a given ET or NET sample, we compute the feature value by taking the arithmetic mean of values for the constituent patents. For example, if an ET or NET sample comprises 100 patents, we first compute the seven features of each patent, after which we compute the average of each feature for all 100 patents, and finally we construct a 7-dimensional feature vector representing this sample with the average value of each feature.
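The averaging step can be sketched as follows. The feature keys in `FEATURES` and the function name `sample_feature_vector` are hypothetical labels chosen for this illustration, and the two toy patents carry made-up values.

```python
# Hypothetical keys for the seven patent features described in the text.
FEATURES = (
    "backward_citations",
    "technology_cycle_time",
    "non_patent_citations",
    "claims",
    "family_size",
    "ipc_count",
    "forward_citations",
)


def sample_feature_vector(patents):
    """Average each feature over all patents belonging to one ET or NET sample."""
    if not patents:
        raise ValueError("a sample needs at least one patent")
    return [sum(p[name] for p in patents) / len(patents) for name in FEATURES]


# Two made-up patents; every feature is 2.0 in the first and 4.0 in the second.
toy_patents = [
    {name: 2.0 for name in FEATURES},
    {name: 4.0 for name in FEATURES},
]
vector = sample_feature_vector(toy_patents)
```

The result is one 7-dimensional vector per sample, regardless of how many patents the sample contains.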

Training set and testing set construction
The feature vector of each ET and NET is calculated according to the patent features described in "Patent feature selection" section. We obtain a set of patent feature vectors labeled as ET or NET. To precisely evaluate the performance of the forecasting model, the set of patent feature vectors is randomly divided into a training set and a test set at a ratio of 7:3. Due to the small scale of the training samples, we utilize a data augmentation method based on a GAN to generate a large number of synthetic samples to train the forecasting model and improve the forecasting accuracy. In the next section, we briefly explain the process used for data augmentation based on the GAN.
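A minimal sketch of the 7:3 random split is shown below. The text does not state whether the split was stratified by class or which random seed was used, so this version shuffles all samples together with an arbitrary seed; `split_train_test` and the toy samples are hypothetical.

```python
import random


def split_train_test(samples, seed=0):
    """Randomly divide samples into training and test sets at a 7:3 ratio."""
    shuffled = list(samples)
    # Assumption: a plain (non-stratified) shuffle with a fixed seed.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder samples with alternating labels.
toy_samples = [("sample %d" % i, "ET" if i % 2 else "NET") for i in range(10)]
train_set, test_set = split_train_test(toy_samples)
```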

Data augmentation based on GAN
To solve the problem of a lack of emerging technology samples, the approach of data augmentation using a GAN is proposed to enlarge the data scale of emerging technology samples. After the ET and NET training sets are built, we use the original ET and NET training sets to build the corresponding GAN and generate synthetic samples. The workflow of generating ET or NET synthetic samples based on GAN involves two steps (shown in Fig. 3):
(1) The GAN is trained on the ET or NET training set for several thousand iterations. Once the loss functions of the generator and discriminator converge, the generator begins to produce the original synthetic samples.
(2) The trained generator of the GAN is used to generate the synthetic samples and the discriminator is used to filter these samples. In the synthesized ET or NET samples created by the GAN generator, samples that fool the discriminator are selected as the final synthesized samples. According to the adversarial idea in GANs (McDaniel et al. 2016), the generator attempts to generate synthetic samples that can fool the discriminator while the discriminator tries to distinguish between real samples and synthetic samples. This means that when ET or NET synthetic samples were discriminated as real by the discriminator, the synthetic samples were more akin to the distribution of the real ET or NET training sets.
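The filtering in step (2) can be sketched as a simple accept-or-reject loop. Everything named here is hypothetical: `generate_filtered`, the toy generator and discriminator, the Gaussian noise, and the 0.5 threshold are stand-ins, since the text does not give the rule used to decide that a sample "fools" the discriminator.

```python
import random


def generate_filtered(generator, discriminator, n_wanted, noise_dim=4,
                      threshold=0.5, seed=0, max_draws=10000):
    """Keep only synthetic samples that the discriminator scores as real."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_draws):
        if len(kept) >= n_wanted:
            break
        # Draw a noise vector and map it to a candidate sample.
        z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        candidate = generator(z)
        # Assumption: a score above the threshold means "judged real".
        if discriminator(candidate) > threshold:
            kept.append(candidate)
    return kept


# Toy stand-ins for a trained generator and discriminator.
def toy_generator(z):
    return [abs(v) for v in z]


def toy_discriminator(sample):
    return 1.0 if sum(sample) < 4.0 else 0.0


synthetic = generate_filtered(toy_generator, toy_discriminator, n_wanted=20)
```

In practice the two callables would wrap the trained networks; the loop itself does not depend on how they are implemented.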
Training of the GAN involves finding the parameters of a discriminator (D) that maximize its classification accuracy and finding the parameters of a generator (G) that maximally confuse the discriminator. The cost of training is evaluated using a value function, defined in Eq. 1, that depends on D and G. During training, G and D play a minimax game with the value function; D and G are updated, and the iteration continues until a Nash equilibrium is achieved. In greater detail, D(s) is the probability that s comes from the real data, G(z) is the synthetic sample that is generated by the generator from noise z, and D(G(z)) is the probability that the synthetic sample is discriminated as real by the discriminator. Equation 1 is as follows:
$$\min_G \max_D V(D, G) = \mathbb{E}_{s \sim p_{\text{data}}(s)}\left[\log D(s)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$
where p_data denotes the distribution of the real samples and p_z denotes the distribution of the input noise.
The hyperparameters of the GAN include the hyperparameters of the generator and discriminator. The generator and discriminator of the GAN both have deep neural network (DNN) structures. The input of the generator is white noise, the number of input units is equal to the dimension of the white noise, the output is the synthetic sample, and the number of output units is equal to the number of selected patent features, which is seven. The number of hidden layers in the network and the optimal number of units per layer must be experimentally determined. The input of the discriminator is a real or synthetic sample, and the output is the category of the real or synthetic samples. The number of input units is equal to the number of selected patent features, which is seven, and the output unit is one no-activation-function unit. The optimal number of hidden layers in the neural network and the number of units per layer must be experimentally determined.
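To make the value function concrete, the sketch below estimates it from discriminator outputs, assuming Eq. 1 takes the standard form of Goodfellow et al. (2014), that is, the mean of log D(s) over real samples plus the mean of log(1 − D(G(z))) over synthetic samples. The function name `gan_value` and the probabilities are made up for illustration.

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator probabilities.

    d_real -- D(s) for a batch of real samples
    d_fake -- D(G(z)) for a batch of synthetic samples
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from synthetic outputs 0.5 everywhere.
confused = gan_value([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates the two batches well yields a larger value.
sharp = gan_value([0.9, 0.8], [0.1, 0.2])
```

The discriminator tries to push this value up while the generator tries to push it down; when the discriminator outputs 0.5 for every sample, the value equals −2 log 2.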

Forecasting model construction based on DNN classifier
After the augmentation of the original ET and NET samples, the GAN will generate a large number of synthetic samples. To make full use of the advantages of big data, we constructed a DNN classifier based on deep learning to forecast emerging technologies. DNN classifiers based on deep learning are complex and have larger model capacities. After extensive training on large-scale labeled samples, they can exhibit superior performance (Goodfellow et al. 2016). Meanwhile, the multilayered neural network structure of a DNN can learn the multilevel abstract features of sample data in which high-level features are constructed by low-level-feature combinations, which can more effectively express the distribution characteristics of the data and produce a better learning result than the classical statistical supervised learning model (Bengio and Lecun 2007). The construction of the DNN classifier includes training and testing. First, the DNN classifier is trained with many synthetic ET and NET samples generated by the GAN. The DNN classifier is subsequently tested with an independent subset of real ET and NET samples. Testing DNN classifiers with real independent ET and NET samples simulates the real forecasting environment and can effectively reflect the general performance of a classifier. The input of the DNN classifier is a synthetic sample, and the output is the ET or NET classification. The number of input units is equal to the number of selected patent features, which is seven. The number of output units is equal to the number of categories, which is two. The number of neural network layers and the number of units per layer must be determined experimentally.
When using the trained DNN classifier to forecast emerging technologies, it is necessary to collect the corresponding patent data for the technology to be forecasted and extract the patent features to construct the feature vector. By inputting the vector of one technology to be forecasted into the DNN classifier, the DNN classifier can directly and automatically forecast whether the technology will become an emerging technology in the next year.

Evaluation
To test the performance of the DNN classifier, we used three classification metrics based on a confusion matrix (Table 2): accuracy, F1, and G-mean. Accuracy is the proportion of predictions that were correct, F1 is the harmonic mean of the precision and recall, and the G-mean indicates the geometric mean of the recall (Sun et al. 2007). Accuracy, F1, and G-mean are defined as follows:
$$\text{Accuracy} = \frac{TN + TP}{TN + TP + FP + FN} \tag{2}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
$$\text{G-mean} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}}$$
In these equations, TP, TN, FP, and FN are the number of true positive samples, true negative samples, false positive samples, and false negative samples, respectively. Further, recall and precision are defined as follows:
$$\text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}$$
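The three metrics can be computed directly from the confusion-matrix counts, as in the sketch below. It assumes the usual reading of G-mean as the geometric mean of the recall on the positive class and the recall on the negative class; the function name and the counts are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, F1, and G-mean from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Recall on the negative class, often called specificity.
    negative_recall = tn / (tn + fp) if tn + fp else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    g_mean = math.sqrt(recall * negative_recall)
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}


# Made-up counts for illustration only.
metrics = classification_metrics(tp=3, tn=5, fp=1, fn=1)
```

Unlike accuracy, the G-mean drops to zero whenever one class is never predicted correctly, which makes it informative when the two classes differ in size.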

Analysis result of the proposed approach
Based on the GETHCs from 2008 to 2017, 57 ET and 48 NET samples were extracted. Tables 3 and 4 list the samples of each ET and NET. We retrieved and downloaded the patents for all the ET and NET samples and calculated the patent feature vector for each technology to construct the data set. The details of the data set are shown in "Appendix 2". In "Appendix 3", patent feature descriptive statistics are reported. The results of the descriptive statistics showed that different patent features have different distributions, and the same patent features between ET and NET samples also had different distributions. In "Appendix 4", from the results of Pearson correlation analysis, Number of IPC had significant positive correlations with Number of claims and Family patent size. These three features were used to represent the scope and coverage of the patent. The correlation analysis results showed that the sub-categories were not completely independent and had certain correlations. However, there were no strong correlations between the different sub-categories, with the correlation coefficients not exceeding 0.6. Since the multi-layer nonlinear structure of deep learning does not require strict independence of features (Valmadre et al. 2017), we assumed that the correlations between the selected features would not have a significant impact on the performance of the deep learning.
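For reference, the Pearson correlation coefficient used in this kind of feature check can be computed as in the sketch below. The function name `pearson` and the toy columns are hypothetical and are not taken from the appendices.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up feature columns: the second is an exact multiple of the first,
# the third decreases as the first increases.
ipc_count = [1.0, 2.0, 3.0, 4.0]
claims = [2.0, 4.0, 6.0, 8.0]
other = [4.0, 3.0, 2.0, 1.0]
r_positive = pearson(ipc_count, claims)
r_negative = pearson(ipc_count, other)
```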
According to the data augmentation method, we first used the ET and NET samples in the training set to train the corresponding GAN and then used the trained GAN to generate synthetic samples. The GAN consisted of two deep-architecture functions, the generator and the discriminator, and many hyperparameters could influence its performance. The number of layers in the generator and discriminator was a fundamental hyperparameter. Too few layers would hinder the ability of the network to build a representation at a level of abstraction that adequately captures the data complexity, and too many layers would complicate the training substantially and likely result in overfitting (Fiore et al. 2019). As a reasonable tradeoff, networks with two and three hidden layers were tested in the generator and discriminator. The optimal parameter was determined by the convergence value of the loss function of the generator and discriminator. Since training and tuning a GAN is an expensive operation, we conducted a limited number of experiments, in which 4, 8, 12, 16, and 32 nodes with two and three layers were tested. Through this series of experiments, the best-performing hyperparameters were determined. The generator had two hidden layers containing four ReLU units, seven softmax units were used as the output layer, and the dimension of the noise vector z was set to four. The discriminator also had two hidden layers containing four ReLU units, and one unit without an activation function was used as the output layer. The ET and NET GANs had the same hyperparameters. In each iteration of the GAN training, the discriminator was first updated 100 times, and then the generator was updated once. The GAN development environment was TensorFlow 1.1 with Python 3.5.2, and the GAN was trained on a GPU. After the synthetic samples were generated, they were used to train the DNN classifier. The DNN classifier was subsequently tested using the test set samples.
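To make the reported architecture concrete, the following sketch builds the generator and discriminator shapes in plain Python and shows the 100-to-1 update schedule. It is only a forward-pass illustration under stated assumptions: each hidden layer is assumed to have four units, the weights are random and untrained, no loss or gradient update is implemented, and all function names are hypothetical. The study itself used TensorFlow, which is not reproduced here.

```python
import math
import random

NOISE_DIM, HIDDEN_UNITS, N_FEATURES = 4, 4, 7
DISCRIMINATOR_STEPS = 100  # discriminator updates per generator update


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    peak = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_mlp(rng, sizes):
    """Random (untrained) weights for consecutive layer sizes."""
    return [
        ([[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(layers, x, final_activation=None):
    """ReLU on hidden layers; optional activation on the output layer."""
    for index, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if index < len(layers) - 1:
            x = relu(x)
    return final_activation(x) if final_activation else x


def training_schedule(n_iterations):
    """Yield which network is updated at each step of the alternating schedule."""
    for _ in range(n_iterations):
        for _ in range(DISCRIMINATOR_STEPS):
            yield "discriminator"
        yield "generator"


rng = random.Random(0)
generator = init_mlp(rng, [NOISE_DIM, HIDDEN_UNITS, HIDDEN_UNITS, N_FEATURES])
discriminator = init_mlp(rng, [N_FEATURES, HIDDEN_UNITS, HIDDEN_UNITS, 1])

z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
fake = forward(generator, z, softmax)      # seven-dimensional synthetic feature vector
score = forward(discriminator, fake)[0]    # single linear output, no activation
print(len(fake), round(sum(fake), 6), score)
```

The softmax output constrains each synthetic feature vector to be non-negative and sum to one, so this shape presumes the seven patent features are scaled accordingly before training.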
Hyperparameters for the DNN classifier were determined empirically, using a procedure similar to that used for the GAN. We again conducted a limited number of experiments, in which 4, 8, 12, 16, and 32 nodes with two and three hidden layers were tested. The optimal parameter was determined by the accuracy, F1, and G-mean. Through this series of experiments, the best-performing hyperparameters of the DNN classifier were determined. The classifier had two hidden layers, each containing 32 ReLU units. Two softmax units were used as the output layer, the dimension of the classifier's input was seven, and cross entropy was used as the loss function. The number of iterations was set to 3000. The DNN classifier's development environment was TensorFlow 1.1 with Python 3.5.2, and the classifier was trained on a GPU.
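The search space described above is small enough to enumerate directly. The sketch below lists the ten candidate architectures and counts their trainable parameters, assuming fully connected layers with biases and the same width in every hidden layer (both assumptions, since the text does not spell them out). The helper names are hypothetical.

```python
from itertools import product

N_INPUTS, N_OUTPUTS = 7, 2          # seven patent features in, two softmax units out
NODE_OPTIONS = (4, 8, 12, 16, 32)   # candidate hidden-layer widths
LAYER_OPTIONS = (2, 3)              # candidate numbers of hidden layers


def layer_sizes(n_hidden_layers, n_nodes):
    """Layer widths from input to output for one candidate architecture."""
    return [N_INPUTS] + [n_nodes] * n_hidden_layers + [N_OUTPUTS]


def count_parameters(sizes):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


grid = list(product(LAYER_OPTIONS, NODE_OPTIONS))
for n_hidden_layers, n_nodes in grid:
    n_params = count_parameters(layer_sizes(n_hidden_layers, n_nodes))
    print(f"{n_hidden_layers} hidden layers x {n_nodes} units: {n_params} parameters")

# Selected configuration: two hidden layers of 32 ReLU units.
selected_params = count_parameters(layer_sizes(2, 32))
```

Under these assumptions the selected network has 1378 parameters, an order of magnitude more than the 105 labeled samples, which illustrates why the real samples alone are too few to train it.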
Existing studies on data augmentation have shown that the number of synthetic samples significantly affects the performance of supervised learning models (Natten 2017; Fiore et al. 2019). To evaluate the effect of the synthetic ET and NET training sample size on the performance of the DNN classifier, we trained the classifier with different numbers of synthetic ET and NET samples and evaluated it on the test samples using the accuracy, F1, and G-mean. Starting from 100 synthetic training samples of each class, 100 more synthetic samples were generated each time to train the DNN classifier, and the changes in the accuracy, F1, and G-mean are shown in Fig. 4. When the number of ET and NET synthetic samples exceeded 1000, the accuracy, F1, and G-mean no longer increased but fluctuated within a certain range. In other words, when the synthetic ET and NET training sample size was 1000, the performance of the DNN classifier began to converge, and an effective DNN classifier for forecasting ETs could be obtained. Table 5 shows the forecasted results for the DNN classifier on the test set data when the number of synthetic ET and NET samples generated by the GAN was 1000. Of the 14 ETs in the test set, four were wrongly identified as NETs, so 71% of the ETs were identified correctly. Of the 17 NETs, three were wrongly identified as ETs, so 82% of the NETs were identified correctly. In total, seven of the 31 technologies in the test set were wrongly forecasted, indicating that the forecast model could forecast ETs with an accuracy of 77% 1 year before their emergence.
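As a check on these figures, the test-set counts can be substituted into the metric definitions. The arithmetic below takes ET as the positive class, which is an assumption about the orientation of the confusion matrix. With 14 ETs of which four were missed and 17 NETs of which three were misclassified, TP = 10, FN = 4, TN = 14, and FP = 3. The values are derived from these counts only and are not quoted from Table 5.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{10 + 14}{31} = \frac{24}{31} \approx 0.774, \\
\text{Recall}_{\mathrm{ET}} &= \frac{TP}{TP + FN} = \frac{10}{14} \approx 0.714, \\
\text{Recall}_{\mathrm{NET}} &= \frac{TN}{TN + FP} = \frac{14}{17} \approx 0.824, \\
\text{Precision}_{\mathrm{ET}} &= \frac{TP}{TP + FP} = \frac{10}{13} \approx 0.769.
\end{aligned}
```

The 71% and 82% figures therefore correspond to the class-wise share of correctly identified ETs and NETs, and the 77% figure corresponds to the overall accuracy.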

Evaluation of proposed approach
To evaluate the effectiveness of the proposed method, we ran comparison experiments with three statistical supervised learning classifiers: SVM, NB, and RF. These classic classifiers were chosen because they generally exhibit higher classification accuracies and better general performances than other statistical supervised learning classifiers. The results of the comparative experiments in the two categories of ET and NET are shown in Table 6. For the same data set, the accuracy, F1, and G-mean of the SVM, NB, and RF classifiers were all lower than those of our proposed GAN-DNN, showing that the forecasting quality of the classical supervised learning classifiers was lower than that of the combined GAN-based data augmentation and DNN-based forecasting model proposed in this study. These results indicate that our approach yields an effective forecasting model based on the GAN and DNN classifier without large-scale ET and NET samples.

Validation of forecasting model
To validate the forecasting effect of the proposed method in a real environment, the model must be trained on the data available at a given time and then forecast whether a technology will become an emerging technology in the future. Therefore, we applied the method to the data available in 2016 to make predictions for 2017 and validated the forecasting effect against the real data for 2017. According to the ET and NET sample labeling method in the "Labeling ET and NET samples" section, whether a technology will become an ET or NET in the next year is measured by whether the technology will enter the GETHC for the first time or disappear from the curve in the next year.
In this paper, we used the historical data from 2000 to 2016 to train the forecasting model to forecast whether a technology would become an emerging technology in 2017, and used the actual 2017 labels to validate the forecasting effect. First, all the ET and NET samples from 2000 to 2017 were labeled from the GETHC according to the proposed method. Next, the patent feature vectors corresponding to the ET and NET samples from 2000 to 2016 were used to train the forecasting model. The ET and NET samples from 2017 were not used in the training. Finally, the forecasting effect of the forecasting model was validated with the ET and NET samples labeled in 2017 from the GETHC. The model parameters of the GAN and DNN classifier were consistent with those in the "Analysis result of the proposed approach" section, and the number of synthetic ET and NET samples was also 1000.
For 2017, six ET samples and two NET samples were labeled from the GETHC. The six ETs were 5G, deep learning, edge computing, cognitive computing, digital twin, and deep reinforcement learning. The two NETs were 802.11ax and micro data centers. 5G and cognitive computing were incorrectly forecasted, and the other four ETs were correctly forecasted by the proposed method. The forecasting results for the ET samples in 2017 showed that our method based on GAN and DNN could forecast emerging technologies 1 year before they emerged with high precision and few samples.
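The hold-out design used in this validation, training on years up to 2016 and testing on 2017, can be sketched as follows. The 2017 technology names and outcomes come from the text above; the two training-year entries are placeholders, and the `Sample` class and helper functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    name: str
    year: int     # year to which the ET/NET label applies
    is_et: bool   # True for an ET sample, False for a NET sample


def temporal_split(samples, holdout_year):
    """Train on every sample before the hold-out year; test on that year only."""
    train = [s for s in samples if s.year < holdout_year]
    test = [s for s in samples if s.year == holdout_year]
    return train, test


def et_hit_rate(test, predicted_et_names):
    """Share of true ETs in the test set that the model also forecast as ETs."""
    true_ets = [s for s in test if s.is_et]
    hits = sum(1 for s in true_ets if s.name in predicted_et_names)
    return hits / len(true_ets)


samples = [
    Sample("placeholder ET", 2016, True),      # stands in for the 2000-2016 samples
    Sample("placeholder NET", 2016, False),
    Sample("5G", 2017, True),
    Sample("deep learning", 2017, True),
    Sample("edge computing", 2017, True),
    Sample("cognitive computing", 2017, True),
    Sample("digital twin", 2017, True),
    Sample("deep reinforcement learning", 2017, True),
    Sample("802.11ax", 2017, False),
    Sample("micro data centers", 2017, False),
]

train, test = temporal_split(samples, holdout_year=2017)

# ETs reported as correctly forecasted in the text.
forecast_as_et = {
    "deep learning",
    "edge computing",
    "digital twin",
    "deep reinforcement learning",
}
hit_rate = et_hit_rate(test, forecast_as_et)
print(f"{hit_rate:.2f}")
```

Splitting by year rather than at random keeps every 2017 label out of the training data, which is what makes the result a forecast rather than an in-sample fit.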

Conclusions
A novel approach for forecasting emerging technologies using data augmentation and deep learning was proposed in this study. The essence of this proposed approach was to integrate data augmentation (GAN) and deep learning, which enabled deep learning to effectively forecast emerging technology with limited training samples. Specifically, this paper constructed a sample data set of emerging technologies from the GETHC and TI patent database, and subsequently used a GAN to augment the sample data set and construct a forecasting model based on DNN classifiers. The test results showed that the forecasting accuracy reached 77% when the synthetic sample size was 1000. Finally, this approach was used to forecast technology in 2017. Four of the six emerging technologies were correctly forecasted. This verified that the model could, given limited samples, forecast emerging technologies 1 year before they emerged with high precision.
The contributions of this research are twofold. First, this study contributes to the technology forecasting literature by proposing a novel approach that advances the basic deep learning method for forecasting emerging technology. In previous research, large-scale labeled sample data was required to fully optimize the parameters of the deep learning model and obtain superior performance compared with other traditional supervised learning methods. Our proposed approach used a GAN to overcome the problem of lacking training samples, and the integrated model proved to be effective even without a large set of patent-based training samples. Second, from a practical perspective, the proposed approach is more effective than previous unsupervised methods because it embeds external knowledge into the forecasting model through a deep learning classifier. Once a forecasting model based on a deep learning classifier has been constructed, forecasting results can be obtained on a real-time basis without requiring extra work for experts' interpretation, which is usually less efficient and may lead to significant biases in technology forecasting.
The main objective of this paper was to use a GAN to overcome the problem of lacking training samples. Thus, we selected patent features that were simple and allowed the effectiveness of the proposed method to be verified directly. In this study, the patent features we explored were all external features that had better consistency in theory, and they were selected through a review of prior literature. The empirical results showed that all these patent features had strong correlations with emerging technologies. Compared to the external features used in this paper, internal semantic patent features based on text mining and semantic analysis may elicit patent information more deeply and comprehensively. However, extracting them requires more complex methods and would increase the uncertainty of the feature extraction stage, making it harder to verify the effectiveness of the proposed method. Thus, in this study, we chose not to consider internal semantic patent features; however, it would be valuable to explore this concept in future research.