A smooth dynamic network model for patent collaboration data

The development and application of models, which take the evolution of networks with a dynamical structure into account are receiving increasing attention. Our research focuses on a profile likelihood approach to model time-stamped event data for a large-scale network applied on patent collaborations. As event we consider the submission of a joint patent and we investigate the driving forces for collaboration between inventors. We propose a flexible semiparametric model, which allows to include covariates built from the network (i.e. collaboration) history.


Introduction
The analysis of network data has seen increasing interest in the recent years.Many network data thereby contain a dynamic structure, be it the development of network ties over time or observations of the network at different time points.Such data structures have led to numerous extensions of classical variant and time-independent network statistics based on the history.This approach is applied in Vu et al. (2011) to a dynamic egocentric model for citation networks based on a multivariate counting process.The models are based on the Relational Event Model (REM) from Butts (2008) who introduced a flexible likelihood-based framework for social actions (also called relational event).This approach includes the event history in modeling the behavior and its complex dependence structure.While Butts (2008) focuses on modeling the rate of relational events, Brandes et al. (2009) extend this framework to weighted events where the influence of the quality of actions is in the center of interest.
In the cited papers above, all covariate effects are included linearly in the model.We propose a semiparametric approach for modeling the covariates in a more flexible way.We follow the idea of penalized spline smoothing as proposed in Ruppert et al. (2003) (see also Eilers and Marx, 1996;Ruppert et al., 2009).The basic idea is to replace linear functions by spline based functions and to achieve smoothness, a penalty is imposed on the spline coefficient.Penalized spline smoothing can be considered as the state-ofthe-art smoothing technique where we refer to Wood (2017) for a general discussion in the framework of (generalized) regression models.
The paper is organized as follows.First, in Section 2 we introduce the patent data set of the application with some basic ideas and descriptive statistics.Then, in Section 3, we outline the general framework of the structure of the underlying patent data set, give a short introduction into the notation and motivate the construction of the covariates from the network history.Then we take a closer look on inference and how the model based on a profile likelihood approach can be estimated with linear covariates and its extensions to penalized spline smoothing.We give a brief outlook on computational issues, before we apply in Section 4 these techniques to the example data set.Finally, we summarize the most important issues.

Patent data
We will first introduce the patent data in detail before describing the model in the next section.We consider all patent applications submitted to the European Patent Office (EPO) and the German Patent and Trademark Office (Deutsches Patent-und Markenamt, DPMA), which listed at least one inventor with an address on German territory between 2000 and 2013.This selection should yield a comprehensive database of all inventions filed in patent applications by German inventors.It is possible that some inventors may have submitted applications directly to patent offices of other countries, but in practice such cases are extremely rare, since the invention would not enjoy patent protection in the inventors home country.The data were extracted from the PATSTAT database of the European Patent Office (version October 2018).The data consist of patented inventions from different technological areas.For each patent we have information about the submission day and for most inventors geographic coordinates of their registered home address at the time of submission is given.We assume that the inventor location stays the same until new information due to new patent submissions is given.
We focus on four technological areas -basic communication processes (105), IT-methods (107), analysis of biological material (111) and food chemistry (118) -with different numbers of inventors, patents and therefore network densities.Some exploratory information is provided in table summarizes the selected inventor networks for the whole time period and Figure 8 in the Appendix explicitly visualizes two of them.The basic communication processes technological area has the highest number of patents, but quite less inventor pairs applying for a patent during the observed time period of 14 years.In this area the number of single ownership patents is three till six times higher than for the other fields.On average over all technological areas the inventors applied for 1.7 patents, whereas one person in area 111 (analysis of biological material ) is involved in 83 patents.
The number of involved inventors per patent varies between one and 93, with an overall average of 2.7 inventors.
As time stamp we choose the earliest filing date, which is aggregated on a monthly basis.To adjust for incomplete data, we select only patents from the full years 2000 till the end of 2013, resulting in 168 months.We are interested in inventors that jointly apply for patents.Therefore, we only include inventors with at least one joint patent.Note, that there are of course single ownership patents in the data sets if the inventor has other joint patents.
Beside the number of inventors (about three to five thousand), also the number of patents in total and patents with single ownerships vary in the four technologies.Noticeable is that the number of observed inventor pairs applying for a patent during the observed time period of 14 years is quite small in comparison to the possible number of pairs (N (N − 1)/2).In other words the density of the networks is small.Furthermore, in all areas the mean number of patents per inventor is quite low.Therefore and due to content-related reasons, we restrict the actor sets to sets of active inventors in a period of three years.To do so, we split the data sets from above into four periods of three years starting from the beginning of 2002.We will analyse each time interval separately.The first two years of data from 2000 to the end of 2001 are used as "burn-in" for the covariates.We include only active inventors in the option set.An active inventor is defined as a person with at least one patent within the observed time period of three years (e.g.inventor 4 or 7 in Figure 1), or at least one patent within and one beyond the time period (e.g.inventor 6 or 8 in Figure 1), or at least one patent before and one after the time period (e.g.inventor 5 in Figure 1).We want to point out, that the covariates are based on a five years retrospective interval, meaning that the inventors' history beyond the five years is ignored in the calculation of the covariates.
In the application we focus on a model with four network specific covariates: the overall sum of patents of each inventor pair within the considered inventors' history ("patents ij "), the number of joint patents per pair ("joint patent"), the number of inventors that hold a joint patent with inven- tor i or j ("2-star ") and the number of inventors that jointly hold a patent with i and j ("triangle").These numbers differ for the technological areas and time periods.Table 2 gives an overview for the covariate ranges due to the variation between the different time periods.Furthermore, we see that the number of inventors varies between 753 and 1466 depending on the technological area and selected time period.The realized number of edges in the considered networks at the end of the observation period ranges between 1188 and 6306.Figure 2 visualizes two networks of the food chemistry area (118) for two selected time periods.The covariates are the number of patents of i and j ("patents ij "), the number of joint patents of i and j ("joint patent"), the number of inventors that hold a joint patent with i or j ("2-star ") and the number of inventors that jointly hold a patent with i and j ("triangle").
3 Poisson process network model for count data

Model description
We motivate the model by directly referring to our data example.Let Z r be a patent indexed with a running number r = 1, ..., R. Each patent can be defined through the following attributes: • t r = time point at which patent r was successfully submitted For a set of actors (inventors) A = {1, ..., N } we define with Y (t) ∈ R N ×N the matrix valued Poisson process counting the number of (joint) patents.
To be specific, let Y ij (t) = cumulated number of joint patents of inventor i and j up to five years at time t = #{r : (i, j) ∈ I r , t r ≤ t, r = 1, ..., R} for i, j = 1, ..., N , where Y ii (t) defines the number of patents of inventor i including single ownership patents.We are primarily interested in the number of joint patents of a retrospective history of five years and observe the process at the time points where one or more joint patents have been successfully submitted.We define with Y ij,d = Y ij (t (d) ) the evolving process, where t (1) , t (2) , . . ., t (m) is the discretized version of time at which patents have been submitted.We model the intensity of the above process as where λ 0 (t) is the baseline hazard of the process and x ij (t) is the covariate process, which will be defined in the following section.We assume for sim-plicity that both, the baseline hazard as well as the covariate process are piecewise constant between the observed time points, that is This leads to the log-likelihood function where C d is the index set of events at time point t (d) , and O d is the "option" set, that is the set of inventor pairs that could submit a joint patent.This option set can be regarded as the set of inventors who are able to work together.In our application this restriction occurs from being in the same technological area and being an active inventor like defined above in the description of the data.Maximizing the above likelihood with respect to λ 1 , . . ., λ m yields and inserting this in (2) provides the profile log-likelihood omitting all constant terms.In principle and based on the Poisson process we observe at each time point a single patent submission only, resulting in maybe more edges at t if more than two inventors are involved in a patent submission.In practice, however, the time points are discretized so that at each discrete valued time point t (d) we may observe more than just one submitted patent.Note that this may be caused by one (or more) patent submitted at the same discrete time point but with more than two inventors as patent holders or by more than one patent submitted at time point t (d) by different inventor pairs.Let Y d = (Y ij,d ) be the process network matrix.The profile log-likelihood (3) is also obtained if we assume that the probability for a single change where 1 ij refers to an increment of 1 in entry Y ij,d and x ij,d is a vector of covariates calculated from the previous process matrix only a single patent was submitted by i and j at time point t (d) , we obtain where O d being an inventor tuple from the "option" set.If |C d | > 1 we approximate (4) with We can now easily derive the log-likelihood from equation ( 3) and obtain the score function Defining In the survival model context formula (5) is also known as Breslow approximation (see Breslow, 1974).

Covariates
The covariate vector x ij,d is built from the network history and exogenous covariates.We describe the network related covariates first.First, we take the total number of patents of inventor i and j at time point t (d−1) .That is In the application we refer to it as "patents ij ".Moreover, the number of previous "joint patents" of inventor i and j is included as covariate, which is calculated through Furthermore, a so-called 2-star statistic ("2-star ") is included, which expresses the number of inventors that hold a joint patent with inventor i or j.This is obtained through A common choice in network analysis are also "triangle" statistics.This counts the number of inventors that jointly hold a patent with i and j: We restrict our analysis to these four structural covariates, which are visualized in Figure 3.As exogenous quantity in our application we include the inventor-pairspecific distance in kilometers, that is where s i,d are the geocoordinates of the address of inventor i and s j,d accordingly and || • || denotes the Euclidean distance.We assume that the inventors do not move until new location information on the basis of submitting a new patent becomes available.Due to only few data points, distances over 1000 kilometers are set to 1000 kilometers.

Semiparametric Estimation
We now extend the model towards penalized smoothing techniques to obtain more flexibility.We therefore replace the linear predictor η ij,d = x ij,d β in (3) through the additive nonparametric setting and λ 0 (t) = exp(m 0 (t)) being a smooth term of the time.Here m(•) are smooth but otherwise unspecified functions, which extend the linear effects.
To achieve identifiability of the model we postulate m (p) (0) = 0 for p = 1, 2, . . ., P. To estimate the unknown functions we employ B-splines and replace m(x) by where B k (•) is a K dimensional B-spline basis (see de Boor, 1978;Wood, 2017).
For simplicity of notation we now replace the index pair (i, j) by a single index l running from 1 to n = N •(N −1)

2
. Consequently, we can rewrite (2) , . ..) provides the final notation.In the estimation we need to take the constraint m (p) (0) = 0 for the covariates built from network history into account.This constraint means that there is no effect to estimate if the independent variables are zero.We make use of this point constraint, which is an alternative to the sum-to-zero identifiability constraints for smooth terms, which is used for the exogenous covariate.
With this notation we can reformulate the profile likelihood in (3) as following: where 1 C d is a vector defined as is a vector of ones of length n and B d is a matrix of n × P • K.
Following Eilers and Marx (1996) we use a large number of knots but regularize the estimation by introducing a roughness penalty (see also Ruppert et al., 2003Ruppert et al., , 2009)).This leads to the penalized smooth log-likelihood where K(λ) is a second-order penalty matrix.The smoothing parameter vector λ penalizes large differences in adjacent basis coefficients.Details are provided in the Appendix C.

Computational issues
In principle, computation is straight forward, because we can derive the corresponding likelihood function and its derivatives.Nevertheless, we have a huge option set of inventors pairs for each time point.A data set with N inventors results in N (N − 1)/2 times T time points and therefore in about 18 million data points for e.g.N = 1000 inventors and T = 38 months.
For estimating the parameters, we need to maximize the profile loglikelihood of equation ( 3) for the linear framework and the penalized smooth log-likelihood (8) for semiparametric estimation.To do so, we can make use of the flexible toolbox available in the package mgcv (see Wood, 2011, for further information) in the software R (R Core Team, 2017).This becomes possible since a Cox proportional hazards model can be estimated as a linear or additive Poisson model in case of smooth predictors (see Whitehead, 1980).To do so, an ordinary time-to-event data representation is necessary (see Tutz et al., 2016).At each event time t (d) an artificial response variable y ij,d for every inventor pair from the option set is included with y ij,d = 1 if a patent was submitted at time t (d) or y ij,d = 0 if not.The generalized Poisson model can than be fitted to that artificial data with the time variable as intercept and including offsets o ij,d .Friedman et al. (1982) showed that with this data structure the log-likelihood resembles the log-likelihood of the Poisson regression model, which we utilize here.
With this simple data transformation we can take advantage of the whole machinery of mgcv with the methods and algorithms for model extensions like including spatial, random, or nonlinearly time-varying effects (see Bender et al., 2018).Furthermore, a variety of automated smoothing approaches including constraints can be used.The estimation of the smoothing parameters are solved by using the Un-Biased Risk Estimator (UBRE) criterion.

Data analysis 4.1 Linear estimation
We first estimate a model with linear effects for the four technology areas for the different time periods.All models include the above mentioned structural covariates "patents ij ", "joint patent", "2-star ", and "triangle", and the exogenous covariate "distance [100 km] ".The quite small standard errors can be explained with the huge data set.Figure 4 compares the estimates for the four considered time periods.The different technology areas show more or less the same behaviour.The biggest effect can be seen for the variable joint patent.The more joint patents two inventors have, the more likely they collaborate in the future.The estimates for 2-star and triangle are quite small.A negative or quite small effect results for the number of patents of inventor i and j.This means that a lot of own patents (with other inventors or even single inventor patents) reduce the potential of new joint patents.
The distance in 100 kilometers has a negative effect on the patents meaning that inventors with regional proximity are collaborating more likely.

Semiparametric estimation
This subsection visualizes the estimated smooth effects for our model with linear effects replaced by spline based fits for "patents ij ", "joint patent", "2-star ", "triangle", and "distance".In Figure 5 the model is estimated exemplary for the technological area food chemistry (118) and the second time period.We see that the sum of patents of inventor i and j has a negative effect, whereas the number of joint patents has a positive and stronger effect.This means that if the inventors have already submitted several own patents (with other inventors or even single inventor patents) their affinity of being involved in new patents decrease.On the other hand, if the inventor pair has already joint patents in the past, they are more likely to work together in future.Both effects are bounded.The effects of the structural statistics like the number of inventors that hold a joint patent with inventor i or j or the number of inventors that jointly hold a patent with i and j do not show a significant tendency.Moreover, the geodesic distance of two inventors plays an important rule.There is a larger positive effect for small distances, which decreases with increasing distance.For distances larger than 250 kilometres the effect is almost zero or negative.This means that if there is a certain distance between the inventors, it does not matter how many kilometers exactly.same for all periods, while there is a steep increase at the beginning, which then becomes bounded.In period three and four the effect decreases and increases, respectively, at the end of the observation period.This should not be interpreted too strictly as the frequency of more than 10 joint patents is quite low.We can see similar behaviours for the other areas (see Appendix).

Model validation
To evaluate the model performance, we select the third time period of food chemistry area (118).This data set is divided into a training set, which contains the first two years and a test data based on the last year of that time period.We estimate our model on the training set and predict the probability of present ties in the test data.The predicted probabilities are most of the time almost zero and have a mean of 0.0000069.The maximal value is 0.05.This result is not surprising as the network is very sparse, only a proportion of 0.0001 patents are present.This is visualized in Figure 7 in the left panel, where the predicted values are plotted separately by the true observed ties.The (red) dashed added line shows the optimal cut point for the classification into present and absent ties for the test year and is based on sensitivity and specificity.The right panel shows the precision recall curve for the test set.Recall, or also called true positive rate (sensitivity), is defined as the fraction of present patents that are correctly predicted by the model.Precision measures the fraction of predicted present patents that actually occur.This measurement is useful in model validation, like in this case, where the number of existent ties are quite small in comparison to all possible ties (see Maalouf and Trafalis, 2011).The visualized precision recall curve in Figure 7 has a maximal value around 0.13 and drops down to zero quite fast.This indicates that an optimal cut point for the predicted probabilities needs to be quite low.

Conclusion
In this paper we propose a flexible approach to model large-scale dynamic network data with structural and exogenous covariates.Our approach is based on a profile likelihood method exploiting well-established estimation routines.We applied this idea to a large data set of patents submitted jointly by inventors from Germany between 2000 and 2013.We showed advantages of including covariates in a semiparametric and therefore flexible way.The results show the driving forces in collaboration of inventors and demonstrate their behaviour over time.The models could be fitted with standard software and therefore invite to be used in other data constellations as well.Table 3: Summary statistics of covariates for different areas for the time period of 14 years.The covariates are the number of patents of i and j ("patents ij "), the number of joint patents of i and j ("joint patent"), the number of inventors that hold a joint patent with i or j ("2-star "), and the number of inventors that jointly hold a patent with i and j ("triangle").

Appendix C: Technical Details
The second-order difference penalty matrix can be defined as

Figure 1 :
Figure 1: Outline of data management.The time period from 2000 till the end of 2013 is divided in four periods (2002 − 2004, 2005 − 2007, 2008 − 2010 and 2011 − 2013) of three years each.The data is aggregated on a monthly grid.The years 2000 and 2001 are used as a burn-in time for the covariates.If e.g. the observed time period from the beginning 2005 till the end of 2007 is selected, only active inventors, like inventor 4 − 8 are included in the option set.Inventors with a patent structure visualized with 'x' in this figure, are excluded.

Figure 2 :
Figure 2: Visualization of two time periods of the inventor network for food chemistry (118).Vertex size represent nodal degree.Colouring is transparent to better examine the clusters.The layout uses maximal connected components and applies the layout separately.

Figure 3 :
Figure3: Visualization of covariates from network history of a toy network graph: Number of patents of inventor i and j with x (1),ij,d = 6 + 8 (black edges), including self-loops (single ownership patents) and multiple patents (first panel).Number of joint patents of inventor i and j with x (2),ij,d = 2 (black edges), counting the number of edges of i and j (second panel).Number of inventors that hold a joint patent with inventor i or j with x (3),ij,d = 3 + 6 (black nodes), counting the joint inventors k and m twice and counting k twice because of two previous joint patents between k and j (third panel).Number of inventors that jointly hold a patent with i and j with x (4),ij,d = 1 • 2 + 1 • 1 (black nodes), counting k twice because of a multi-patent (fourth panel).

Figure 4 :
Figure 4: Estimates for different covariates, technological areas and time periods.For each of the four areas and four covariates we have four estimates for the time periods with the corresponding errorbars (standard error × 2).

Figure 5 :
Figure 5: Estimated smooth effects for food chemistry (118) area and second time period.

Figure 6
Figure6visualizes the positive effects of "joint patent" for the four time periods.Each time period lasts 36 months.The tendency of the effects is the

Figure 6 :
Figure 6: Estimated smooth effects for "joint patent" of food chemistry (118) area and different time periods.

Figure 7 :
Figure 7: Predicted versus true present and absent ties (patent or no patent) where the (red) dashed line shows the optimal cut point for the classification (left panel).Precision and recall curve (right panel).
dimension [P • K × P • K] and [K × K], respectively.P is the number of covariates.The second-order penalty matrix K can be derived fromK (p) = D T 2 D 2 where D 2 = D 1 D 2−1 is a recursively obtained difference matrix with (K − 1) × K].The corresponding derivatives to apply the Newton-Raphson algorithm are straight forward:s pen (u) = s(u) − (K(λ)) u J pen (u) = J(u) − K(λ)

Table 1 .
The

Table 1 :
Summary statistics of different technological areas for the time period of 14 years.The statistics are summarized and averaged over time.

Table 2 :
Summary statistics for different technological areas and time periods.There are shown the ranges for the different time periods per area of number of nodes, edges and covariates.