To analyse the publication activities of German firms, we draw on the Scopus database provided by Elsevier. We further use the Mannheim Enterprise Panel generated by the ZEW Mannheim, and the German Patent Office’s patent database. The three datasets are matched and aggregated at the firm-year level. The final dataset comprises yearly information on firms’ publishing and patenting activities by industry and size. We add aggregate numbers on the population of all German firms from the official Business Register of Germany to our trend analyses.
In using Scopus to identify scientific firm articles, we follow Simeth and Cincera (2016). Scopus is the largest abstract and citation database of peer-reviewed literature (Footnote 1). It comprises information on scientific journals, books and conference proceedings. We use the disambiguation of articles, letters, notes, reviews, and conference proceedings from Rimmert et al. (2017) to identify publications with at least one author affiliated with a German firm. We define these publications as firm publications and extract information on their authors’ affiliations, their author composition, their citations and their type of research. The disambiguation of Rimmert et al. (2017) has been developed by the Bibliometric Group of Bielefeld University since 2008 in the context of the German Competence Centre for Bibliometrics and is comprehensively described in Winterhager et al. (2014). It is a semi-automatic procedure that detects text patterns in authors’ affiliation information and combines them with additional data on the German science system. The procedure identifies eight classes referring to publications from firms, two different university types, four different kinds of non-university research institutes, departmental research of federal or state ministries, and others (Footnote 2). We access the disambiguation for Scopus via the German Competence Centre for Bibliometrics (Winterhager et al., 2014). We extract 89,849 publications that have at least one author affiliated with a firm and were published between 2005 and mid-2017.
We enrich this data with information on firms’ industry classes and employment numbers. For this, we draw on the Mannheim Enterprise Panel. This dataset contains the data pool of the largest German credit rating agency, Creditreform e.V., and has been maintained by the ZEW Mannheim since 1992. It includes information on various firm characteristics for approximately 8.8 million active and closed firms located in Germany (Footnote 3). It is the most comprehensive firm-level database in Germany next to the official Business Register of the Federal Statistical Office and provides a representative picture of the German corporate landscape (Bersch et al., 2020). The panel covers around 90% of the entire population of active firms in Germany and is the sampling frame for the official German Community Innovation Survey of the European Commission (Bersch et al., 2014). Firms are defined as legally independent enterprises, that is, the smallest legally independent unit that keeps its own accounts for commercial or tax law reasons. Thus, the dataset covers parent companies and their subsidiaries as individual units. Creditreform updates the dataset and adds new firms on a half-yearly basis. It retrieves its information from (i) different official registers, (ii) print and internet media, (iii) business reports, and (iv) own investigations based on client requests.
To match Scopus records and firm information, we extract affiliation names and addresses from our sample of publications (Footnote 4). We aggregate them to 47,124 unique name-address combinations, many of which are similar and differ only slightly in their spelling. For instance, there are only 22,757 unique names and 11,080 unique city-street combinations. The name-address combinations are matched to the entire Mannheim Enterprise Panel covering the period 1992 to 2017. Name-address combinations are either matched exactly or, if the exact affiliation from Scopus is not found in the firm panel, matched to the most similar name-address combination using the ZEW Search Engine. The ZEW Search Engine is a text analysis software developed at ZEW and frequently used to combine the Mannheim Enterprise Panel with other datasets such as the PATSTAT database of the European Patent Office or the public funding database PROFI of the German Ministry of Education and Research (Bersch et al., 2014). The software matches keywords in the name and address information of both datasets, with rare words receiving a higher weight than frequently used ones (Footnote 5). A more detailed description of its algorithm is provided in Doherr (2017). To avoid mismatches arising from the automated procedure, we also manually check all matches. We match 99.6% of all combinations to 2459 enterprise panel entries. This corresponds to matching 95.8% of all extracted firm publications, with each publication allocated to 1.2 firms on average. Publications are attributed to firms via the generated affiliation-firm match. If a publication has two or more authors from the same or from different German firms, the publication is attributed once to each participating firm.
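The idea of weighting rare keywords more heavily than frequent ones can be illustrated with a minimal sketch. It is not the ZEW Search Engine algorithm, which is documented in Doherr (2017); it assumes a simple inverse-document-frequency weight, and the function names and the three example records are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase a name-address string and split it into keyword tokens."""
    return text.lower().replace(",", " ").split()

def idf_weights(records):
    """Assumed weighting: log(N / document frequency) + 1, so rare words weigh more."""
    n = len(records)
    doc_freq = Counter()
    for rec in records:
        doc_freq.update(set(tokenize(rec)))
    return {tok: math.log(n / count) + 1.0 for tok, count in doc_freq.items()}

def best_match(query, records, weights):
    """Return (record, score) for the record whose shared keywords carry the most weight."""
    query_tokens = set(tokenize(query))
    best, best_score = None, 0.0
    for rec in records:
        shared = query_tokens & set(tokenize(rec))
        score = sum(weights.get(tok, 0.0) for tok in shared)
        if score > best_score:
            best, best_score = rec, score
    return best, best_score

# Hypothetical firm panel entries (name, street, city).
panel = [
    "Alpha Chemie GmbH, Hauptstrasse 1, Mannheim",
    "Beta Pharma GmbH, Hauptstrasse 7, Mannheim",
    "Gamma Optik AG, Bergweg 3, Jena",
]
weights = idf_weights(panel)
match, score = best_match("Beta Pharma, Hauptstr 7, Mannheim", panel, weights)
```

In this toy example the frequent token "mannheim" contributes less to the score than the rare tokens "beta" and "pharma", so the slightly misspelled affiliation is still linked to the second panel entry. Such automated matches would still require the manual check described above.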
The patent database stems directly from the German Patent Office and covers the patent applications received from 1896 to 2017. Inter alia, the database contains information on the names and addresses of patent applicants. The match between the Mannheim Enterprise Panel and the patent database is directly provided by the ZEW Mannheim and is also based on its text analysis software (Doherr, 2017). Instead of author affiliations, the matching uses patent applicant names and addresses.
The information from all three datasets is aggregated to the firm-year level. For our analysis, we restrict the sample to publishing firms within industries covered by the European Community Innovation Surveys (Footnote 6). The surveys are used to estimate official statistics on the business enterprise sector’s innovativeness and thus cover the same target population as our examination (Footnote 7). Furthermore, we focus on citable items (see Garfield, 1979; Moed, 2005) and thus exclude conference proceedings. The reason is that conference proceedings in many cases list only the presenting author (Michels & Fu, 2014). Therefore, they cannot be reliably attributed to all their authors and, in addition, understate joint publications. Finally, we do not consider the pre-economic-crisis years before 2008. Our investigated period therefore covers the years 2008 to 2016. After applying these three restrictions, our panel comprises 1647 firms that publish 43,063 firm publications in full counts. We use full counts at the firm level for our analysis, as our paper focuses on the participation of firms in scientific publishing. Full counting and fractional counting provide information from different perspectives: full counting captures participation, whereas fractional counting captures contribution (Moed, 2005). The reason is that fractional counting assigns less weight to the collaborative publications of a firm, whereas full counting assigns the same weight to each publication. Fractional counting is therefore particularly suitable for measuring the contribution of German firms to the entirety of German scientific publications. Full counting, however, is more suitable for our analysis of the participation of German firms in scientific publishing and its development.
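The difference between the two counting methods can be made concrete with a small sketch. It assumes one common variant of fractional counting, in which a publication with n distinct participating organisations contributes 1/n to each of them; the function name and the example data are hypothetical.

```python
from collections import defaultdict

def count_publications(pub_orgs):
    """Return (full, fractional) publication counts per organisation.

    pub_orgs maps a publication id to the non-empty set of distinct
    organisations appearing on that publication.
    """
    full = defaultdict(float)
    fractional = defaultdict(float)
    for orgs in pub_orgs.values():
        share = 1.0 / len(orgs)  # assumed variant: equal split across organisations
        for org in orgs:
            full[org] += 1.0          # participation: every publication counts once
            fractional[org] += share  # contribution: collaborative work counts less
    return dict(full), dict(fractional)

# Hypothetical example: firm_a publishes alone once and twice in collaboration.
pubs = {
    "p1": {"firm_a"},
    "p2": {"firm_a", "firm_b"},
    "p3": {"firm_a", "univ_x"},
}
full, frac = count_publications(pubs)
```

Here firm_a participates in three publications under full counting but is credited with only two under fractional counting, because its two collaborative publications each carry a weight of one half.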
The yearly publication volume of a firm is calculated as the number of articles, letters, notes, and reviews published in a given year by authors affiliated with the firm, as identified by our matching of Scopus and the Mannheim Enterprise Panel.
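As a minimal sketch of this aggregation, the following code counts citable items per firm and year in full counts. The record layout (publication id, firm id, year, document type) and the document type labels are assumptions made for illustration.

```python
from collections import Counter

# Document types counted as citable items; the label strings are assumed.
CITABLE_TYPES = {"article", "letter", "note", "review"}

def yearly_volume(records):
    """Count citable publications per (firm, year), once per publication and firm."""
    seen = set()
    volume = Counter()
    for pub_id, firm_id, year, doc_type in records:
        key = (pub_id, firm_id)
        if doc_type not in CITABLE_TYPES or key in seen:
            continue  # skip non-citable items and repeated authors of the same firm
        seen.add(key)
        volume[(firm_id, year)] += 1
    return volume

# Hypothetical author-level records.
records = [
    ("p1", "f1", 2010, "article"),
    ("p1", "f1", 2010, "article"),  # second author from the same firm
    ("p1", "f2", 2010, "article"),  # co-author from another firm
    ("p2", "f1", 2010, "conference proceeding"),
    ("p3", "f1", 2011, "review"),
]
volume = yearly_volume(records)
```

The publication p1 is counted once for each of the two participating firms, and the conference contribution p2 is excluded.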
Applied and basic research publications
The yearly publication volume of each firm is classified into basic and applied research publications. We use the journal-level classification of Boyack et al. (2013) differentiating between (i) applied science, (ii) art and humanities, (iii) general, (iv) economic and social sciences, (v) natural sciences and (vi) health sciences. The classification is based on an automatic identification of keywords related to the different classes within the abstracts and titles of publications in a journal. We define publications in journals of the first category “applied science” as applied research and all publications of the remaining five categories as basic research. 18% of the aggregate firm publication volume cannot be classified into applied or basic research as their journal is not classified.
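The mapping from journal categories to research types can be sketched as follows. The category labels repeat the six classes listed above, but their exact spelling in the underlying classification is an assumption, as are the function names and the example data.

```python
from collections import Counter

# Labels follow the six classes named in the text; exact strings are assumed.
APPLIED_CATEGORY = "applied science"
BASIC_CATEGORIES = {
    "art and humanities",
    "general",
    "economic and social sciences",
    "natural sciences",
    "health sciences",
}

def research_type(journal_category):
    """Map a journal-level category to 'applied', 'basic', or 'unclassified'."""
    if journal_category == APPLIED_CATEGORY:
        return "applied"
    if journal_category in BASIC_CATEGORIES:
        return "basic"
    return "unclassified"  # the journal is missing from the classification

def type_shares(journal_categories):
    """Return the share of publications falling into each research type."""
    counts = Counter(research_type(c) for c in journal_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.items()}

# Hypothetical journal categories of four publications; None marks an unclassified journal.
sample = ["applied science", "natural sciences", "health sciences", None]
shares = type_shares(sample)
```

In this toy sample, one quarter of the publications is applied, one half is basic, and one quarter remains unclassified.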
Publications with domestic co-author groups
We use the disambiguation of Rimmert et al. (2017) to identify the publications of firms that are co-authored with German academia and other German firms. German academia is defined as all identified university and non-university institute types. 32% of the aggregate firm publication volume cannot be attributed, as these publications also include affiliations other than German academia or firms. We limit our co-author composition classes to domestic collaboration, as we cannot reliably distinguish between co-authors from academia, firms or other organizations for countries other than Germany.
Highly cited publications
Highly cited publications refer to the 10% most cited publications worldwide (Waltman & Schreiber, 2013). To account for field differences and time trends, we follow Schmoch et al. (2016) and identify highly cited publications separately for each year and Scopus science field.
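A minimal sketch of a year- and field-specific top 10% indicator is given below. It uses a simple rank cut-off in which ties at the threshold are broken by input order; this is a simplification and not the tie-handling approach discussed by Waltman and Schreiber (2013). The function name, the record layout and the example data are hypothetical.

```python
from collections import defaultdict

def flag_highly_cited(pubs, top_percent=10):
    """Return the ids of publications in the top share of their (year, field) group.

    pubs is a list of dicts with the keys 'id', 'year', 'field' and 'citations'.
    """
    groups = defaultdict(list)
    for pub in pubs:
        groups[(pub["year"], pub["field"])].append(pub)
    top = set()
    for members in groups.values():
        ranked = sorted(members, key=lambda p: p["citations"], reverse=True)
        # Integer ceiling of top_percent % of the group size.
        k = (len(ranked) * top_percent + 99) // 100
        top.update(p["id"] for p in ranked[:k])
    return top

# Hypothetical data: two fields with very different citation levels.
physics = [
    {"id": f"phys{i}", "year": 2010, "field": "physics", "citations": i}
    for i in range(10)
]
medicine = [
    {"id": f"med{i}", "year": 2010, "field": "medicine", "citations": 10 * i}
    for i in range(10)
]
top = flag_highly_cited(physics + medicine)
```

In this example the physics publication with 9 citations is flagged as highly cited, whereas a medicine publication with 80 citations is not, which illustrates why the threshold is computed within each year and field.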
We define firms with 500 or more employees as large. The information is taken directly from the Mannheim Enterprise Panel. Firms with at least one patent application at the German Patent Office according to the provided matching are defined as patenting firms.
The yearly numbers of active firms in Germany within our target industries are extracted from the aggregate numbers of the official German Business Register. We differentiate between the entire population of firms, large firms, firms in different high-tech manufacturing industries and firms in different knowledge-intensive service industries.
Table 1 provides descriptive statistics on our constructed balanced firm-year panel dataset and briefly describes the variables used to examine aggregate trend statistics in the following section. Table 2 displays descriptive statistics on firms’ publication volume for different industry subsamples. Technical details about the sample and short descriptions of how the variables are generated are included as notes below the tables and figures. For all publication variables, our main analysis focuses on full counts.
Table 1 shows that the average yearly publication volume of a firm that published at least once between 2008 and 2016 is 2.92, whereas the largest observed yearly publication volume amounts to 337 publications. Moreover, the average yearly publication volume in basic research journals of 1.66 is more than twice as high as the average yearly publication volume in applied research journals of 0.74. In addition, firms publish more in cooperation with German academia than alone or with other German firms, or with both. The average publication volume of highly cited publications is 0.59 and thus makes up around 20% of a firm’s entire publication volume. 57% of the firms in the sample applied at least once for a patent at the German Patent Office, and 29%, just under a third, employ 500 persons or more.
Table 2 describes the publication volumes of publishing firms per industry. It shows that the average publishing firm in technology-intensive manufacturing and knowledge-intensive services produces a higher number of publications than firms in less technology- and knowledge-intensive industries. Firms in high-tech manufacturing publish the most, with 5.51 publications per year on average. Firms in medium-high-tech manufacturing rank second with 4.33 publications per year on average. In medium-low- and low-tech manufacturing, publishing firms show a much lower publication volume of 0.86 and 0.74 publications per year, respectively. Publication volumes are higher again among firms in knowledge-intensive service industries. While not reaching the level of high-tech and medium-high-tech manufacturing, they publish between 2.42 and 2.84 publications per year. Publishing firms in other industries produce on average 2.10 publications per year.