Drivers of article processing charges in open access

Large publishing companies have long dominated scientific publishing, which has led to high subscription fees and restricted access to scientific knowledge. In the digital era, unrestricted access appears feasible because the cost of publishing should be low. Under open access, it is no longer readers and libraries who pay subscription fees, but scientific organizations and authors themselves who pay to have their articles published. The data show a tremendous variance in article processing charges (APC) across journals, which cannot be explained by costs. One explanatory variable could be reputation, but it accounts for less than 5% of the variance in average APC. This study sheds light on the various determinants of APC. Based on data from the OpenAPC Initiative, the Directory of Open Access Journals, and three different datasets of Web of Science, we employ ANOVAs and multivariate regressions. The results show that market power plays an important role in explaining APC, inter alia through market concentration, the market position of individual publishers (publisher size), and the choice of a hybrid publishing model.


Introduction
With profit margins of nearly 40% among the most successful publishers, academic publishing is one of the most lucrative businesses (Delamothe and Smith 2004; Smith 2018), although publishers' services are largely restricted to technical support in turning a scientific paper into a published article, such as printing, binding, and distributing it (Delamothe and Smith 2004). The main part of the knowledge production value chain is provided by the research community itself. From the original submission to the eventual publication, a scientific manuscript usually undergoes a thorough quality control process involving research community members in three different roles: (1) as authors, they document their research in papers which they submit to the editor of an academic journal; (2) as editors, they screen submissions and decide on their eligibility for peer review; and (3) as reviewers, they provide a quality judgement on the paper, on the basis of which editors decide whether to reject, to publish, or to let authors revise and resubmit their work. More often than not, papers undergo several rounds of a time-consuming revise-and-resubmit cycle before they are eventually accepted (Heinemann 2015). Once accepted, the paper is proofread, typeset, and finalized for publication, a process in which community members are again involved.
It is a peculiarity of academia that no monetary remuneration is paid to the actual suppliers of productive services. Researchers participate in the peer-review process not for money, but for the good of science (Bergstrom 2001; Freda et al. 2009; Smith 2018; Wellington and Nixon 2005) and for their reputation. Whether as an author publishing in a reputable journal, or as an editor or reviewer of a renowned journal, any academic achievement of this kind helps improve one's reputation (Dasgupta and David 1994). On the other hand, the research community has to pay for access to publications. Usually, university libraries make the corresponding arrangements, so that researchers do not pay directly. Researchers may not even be aware of the total amount of subscription fees paid by their local library. They tend to be more interested in the reputation of the respective academic journals than in the actual costs involved in publishing. In other words, publishers and researchers do not compete on equal terms: publishers seek monetary profits, researchers seek reputation (Grebel 2011). Although there is little doubt that reputation is a common denominator, the design of the academic publishing market gives reason to believe that publishers set prices above the marginal returns in reputation to researchers. One indication corroborating this hypothesis is the situation at university libraries. Libraries complain that most journal subscriptions go to a few big publishers. They indicate that 50% of their budget is consumed by 10% of the journal titles in their inventory (Bensman 2011).
With the advent of digitization, however, publishers run the risk of losing ground in their business. Apart from the fact that the academic supply chain has basically always been in the hands of the research communities themselves, digitization has opened up new ways of publishing. The process of publication has become less complicated and more efficient. Print copies have become obsolete, and marketing and distributing articles across the world is now possible at almost zero marginal cost (Armstrong 2015). It seems inevitable that established publishers will lose ground: open access journals should gradually oust traditional publishers, and subscription prices should fall.
However, none of this has happened. Despite the significantly lower printing, binding, and circulation costs in the digital era (Armstrong 2015), subscription prices have not fallen. On the contrary, they have been growing in recent years. Likewise, the Open Access (OA) business model, intended to fight high subscription fees (McCabe and Snyder 2005), has not lived up to its promise.
As more and more OA journals enter the market, offering a potentially promising alternative, established publishers do not remain inactive. They respond with their own OA strategies, offering open access to at least some of their articles, a business model which has become known as the hybrid OA model. With this hybrid OA model, established publishers try to counter the competitive pressure of so-called "gold OA" journals, i.e. journals which exclusively offer open access. In both models APC are charged, i.e. a price for publishing an article. Assuming that the cost-reducing effect of digitization, together with the emergence of new gold OA journals, has increased competitive pressure among publishers, the price of publishing open access, i.e. the APC, should have fallen rather than increased. To date, a decrease in APC has not been observed. On the contrary, it seems that hybrid journals may even manage to increase their APC. APC vary substantially among the different OA types, and the pricing structure remains quite obscure.
Against this background, the aim of this paper is to examine the driving factors of APC. We do not claim to evaluate the efficiency of any OA model, but rather to quantify the differences in APC between them. We calculate various descriptive statistics, perform ANOVAs, and estimate a linear regression model to decompose APC according to its drivers. The data we use originate from the OpenAPC Initiative, a Bielefeld University Library project which releases a repository of APC information, the Directory of Open Access Journals (DOAJ), and, additionally, three datasets from Web of Science (WoS): the Journal Citation Reports (JCR) Journal Impact Factor database, the Essential Science Indicators (ESI) database, and the general journal information. Our results suggest that, ceteris paribus, journal reputation, the market power of publishers, and the market concentration of disciplines show significantly positive correlations with APC. Most strikingly, we identify a surcharge of 1,482 USD, half of the average APC, which traditional publishers can impose on their prices. The results support the hypothesis that market power plays a large role in setting APC.
The paper is organized as follows. The second section reviews the existing literature on APC. The third section summarizes the arguments from industrial organization theories and communication theories and proposes the hypotheses of this paper. The fourth section presents the data we employ and introduces the methods used. In the fifth section, we present the results of our descriptive and analytical analyses. The final section discusses the results, compares them to previous studies, reflects on the limitations of the present study, and offers some concluding remarks.

Building up article processing charge
Current studies focus on three perspectives, i.e. journal reputation, publisher size, and disciplines. Journal reputation, journal impact, and quality are often used interchangeably in the literature. These studies employ quantitative analyses with sample sizes ranging from around 100 to 2000 journals. Most of them apply descriptive statistics and compare means, accompanied by scatter plots and histograms, while some also use correlation analyses (Wang et al. 2015; Yuen et al. 2018; Solomon 2014, 2015; Hagenhoff et al. 2008; Pinfield et al. 2016; Romeu et al. 2014; Solomon and Bjoerk 2012a; Pinfield et al. 2017; Solomon and Bjoerk 2012b). More sophisticated data analyses are conducted by Smith et al. (2016) and Schoenfelder (2018). Smith et al. (2016) apply univariate regression without controls, whereas Schoenfelder (2018) conducts a multivariate regression analysis including journal quality, open access types, big publishers, and disciplines. To define big publishers, Schoenfelder (2018) assumes that the six most frequent publishers in her dataset represent the big and traditional publishers. To study disciplines, she classifies the journals into four subject fields and tests them as dichotomous variables. However, no explanation is given of the classification rules used.
Journal impact, quality, and reputation share similar definitions and measurements. Smith et al. (2016) and Schoenfelder (2018) measure journal reputation and quality with the journal impact factor. Solomon and Bjoerk (2012a), Bjoerk and Solomon (2015), and Pinfield et al. (2017) define journal quality as journal impact. Solomon and Bjoerk (2012b), Yuen et al. (2018), Romeu et al. (2014), Bjoerk and Solomon (2014), and Wang et al. (2015) study journal impact directly. Most of them employ the Journal Impact Factor (JIF) from the Journal Citation Reports (JCR) of Web of Science (Romeu et al. 2014; Solomon and Bjoerk 2012a; Wang et al. 2015). Pinfield et al. (2017) adopt the Field-Weighted Citation Impact (FWCI) derived from Scopus. Solomon and Bjoerk (2012b) apply two impact measures, namely the JIF from Web of Science and the SCImago Journal and Country Rank (SJR) from Scopus. Solomon (2014, 2015), Smith et al. (2016), and Schoenfelder (2018) use the Source Normalized Impact per Paper (SNIP) based on Scopus data. Yuen et al. (2018) compare the correlations between APC and six indexes of journal impact: two H-indexes (one based on the Scopus database and the other on the Google Scholar database), the Scopus SJR, the Eigenfactor, the Article Influence Score (AIS), and the JIF from JCR. In almost all studies, irrespective of the indicator employed, journal impact is shown to have a positive relationship with APC; the exception is Romeu et al. (2014), who claim that there is no correlation between journal impact and the APC of hybrid OA journals in physics. In addition, JIF, SJR, and AIS correlate strongly with each other (Yuen et al. 2018).
Obviously, most big publishers are traditional subscription publishers, but the two groups are not identical. However, they are frequently mentioned together, and the difference between the two has barely been investigated. In the studies of Pinfield et al. (2016, 2017), the top 10 publishers which have received the most APC (in the data contributed by UK higher education institutions) are listed, and it is clear that most spots are occupied by traditional subscription publishers. Pinfield et al. (2016) list the amount of APC spent by the institutions with each publisher. Among the top 10 publishers with the most revenue, the five with the highest mean APC are traditional publishers. By grouping the publishers from DOAJ into different types and sizes, Solomon and Bjoerk (2012b) demonstrate that big publishers tend to charge higher APC. Bjoerk and Solomon (2014) break down the full OA journals into different APC price ranges and show that journals belonging to traditional subscription publishers charge more than those of full OA publishers, which have come into existence only in the digital age. By including six big and traditional publishers as dummies in a regression analysis, Schoenfelder (2018) shows that the coefficients of most big, traditional, and for-profit publishers are significantly positive, implying that they charge higher APC.
The connection between APC and disciplines is studied from different perspectives in the literature. However, most studies do not explain which guideline is used to group the disciplines into broader subject fields. Solomon and Bjoerk (2012b) classify the journals and articles into six subject fields and compare the average APC. They assume that disciplines display more APC expenditure and higher average APC if more grant funding is available in those disciplines (Solomon and Bjoerk 2012b). Solomon and Bjoerk (2012a) categorize journal APC into seven discipline clusters and list, for each cluster, the shares of different funding sources. They show that journals in disciplines with a higher share of grant and institutional funding charge higher APC (Solomon and Bjoerk 2012a). Hagenhoff et al. (2008) apply the discipline categorization of JCR to the data from Ulrich's International Periodicals Directory and Web of Science. They then descriptively compare the average fee per article across disciplines and find that publication fees vary across disciplines with the circulation cost of the printed editions (Hagenhoff et al. 2008).
Compared with the available literature, our paper contributes, from a theoretical point of view, perspectives which have been ignored so far: it sheds light on the market power of publishers, the market concentration of disciplines, the age of publishers, and the interaction between market power and publisher age as determinants of APC. This study is the first of its kind to take the intensity of market competition within disciplines into account. It also applies more sophisticated econometric methods and exploits a bigger sample, which combines several different datasets.

Theoretical background
To understand how the publishing business works, it is necessary to understand why scientific organizations and scientists supply the fruits of their labor without charging publishers and why they are even willing to pay journals for publishing their papers. The main reason for this phenomenon is the desire and the need of scientists and scientific organizations to gain reputation. Scientific reputation is the essential reward for their work. It is a concept applicable to both individuals (researchers) and institutions (e.g. universities and journals) (Eisenegger 2005). Personal and institutional reputation are closely linked and depend on each other. The institutional reputation of a scientific organization is a source of personal reputation for its members, and the same is true of the reputation of a scientific journal, which supports its authors in acquiring personal reputation. For instance, rankings based on journal reputation (however measured) provide an important career mechanism for scientists. Therefore, scientists have strong incentives to work at respected universities and to publish in journals with a high reputation.
In return, the institutional reputation of a university or a journal results from the individual reputation of its members and contributors. If recognized scientists serve as editors and reviewers of a journal and established scientists publish their work in it, the reputation of this journal will increase. It is this reciprocal process that affirms the structures and the logic of the system. A theoretical perspective explaining these phenomena combines modern sociology of action with aspects of system theories (inter alia, Esser 1999; Giddens 1997; Schimank 2000). These theories explain how structures arise and develop through individual actions and how the same structures in turn shape the actions of the individuals who created them.
Finally, it is reasonable for researchers with limited time to prefer articles from prestigious journals when they search for relevant information, because these journals are more likely to provide articles of excellent scientific quality. Hence, it seems logical that such journals attract more readers to their papers and that, as a result, the papers published there are cited more frequently. Consequently, the number of researchers who try to publish their work in a reputable journal will increase. If the number of submissions increases, the journal can and must be stricter in evaluating the articles, and the probability that the selected ones are of outstanding quality grows, further enhancing the journal's reputation. Subsequently, the number of submissions will increase again, selectivity will rise, and so on. This illustrates how the Matthew effect intensifies inequalities between journals and why, under normal conditions, reputation is self-reinforcing (Merton 1968; Dasgupta and David 1994; Irmer 2011).
Quantitatively measurable factors of reputation, such as the number and outlets of publications, the number of citations, prizes received, the affiliated academic department, or the reputation of coauthors, are all signals tied to the author (Van Dalen and Henkens 2005). However, reputation refers to the entire personality of a scientist. Bourne and Barbour (2011) list characteristics of appropriate conduct that help to build and maintain scientific reputation, including, inter alia, honesty, respect, kindness, responsibility, fairness, scientific accuracy, accountability, and fidelity. In the case of journal reputation, similar criteria might be relevant. Journal reputation may decrease if the quality and accuracy of the reviews are low, if decisions of the editorial board are perceived as unfair, if the journal is slow to respond, or if editors or reviewers treat submitters impolitely. Even though these qualitative dimensions of reputation are notoriously difficult to measure, ignoring them might lead to inappropriate estimates of reputation. Nevertheless, if these soft elements of reputation are effective, they should, at least in the long run, affect the hard dimensions accordingly. If a journal fails to meet the soft criteria, scientists, and especially those with a high reputation, will look for other journals in which to publish their papers. Furthermore, if scientists with a high reputation no longer submit their papers, the quality of published papers will decrease, eventually affecting citations and impact. Consequently, a downward spiral of a journal's reputation might be observed, provided that the reputation mechanism on the soft dimensions actually works.
Altogether, higher reputation means greater value for authors and, therefore, higher demand accompanied by a higher willingness to pay APC on the part of authors and sponsoring institutes (McCabe 2000, 2002; Dewatripont et al. 2007; Harvie et al. 2013). This leads to our first hypothesis:

Hypothesis 1
The higher the journal reputation, the higher the APC.
As already stated in the introduction, journals from some publishers are capable of imposing a surcharge without offering corresponding reputation (Bjoerk and Solomon 2014; Pinfield et al. 2016; Romeu et al. 2014). There are thus still other, unexplained factors behind APC.
Subscription fees keep growing despite decreasing costs due to digitization (McCabe and Snyder 2005). Furthermore, subscription fees for journals of for-profit commercial publishers are significantly higher than those for journals of non-profit publishers. Moreover, the price of the former has increased approximately twentyfold during the last three decades, whereas the price of the latter has "only" increased eightfold in the same period (Bergstrom 2001; McCabe and Snyder 2018, pp. 301-302, 332-334). Notably, this development has only been reported for subscription fees (targeting readers) and not for submission fees (targeting authors). If subscription prices of non-profit journals may serve as a proxy for journal costs, these price differences can be interpreted as demonstrating a considerable and significantly increasing markup (McCabe and Snyder 2018).
In economics, supracompetitive rents (i.e. profits exceeding those that could be earned in a competitive market) rest on market power. With effective competition, profitable price increases are limited by (1) the reaction of competitors, who undercut the price and draw consumers away from the price-increasing company, and (2) the reaction of consumers, who stop consuming the good because of the price increase (i.e. choose the outside option). The market for academic publishing displays favorable conditions for exercising market power with regard to both restrictions. (1) Somewhat simplified, the first restriction depends on market concentration, i.e. the existence of competitors of similar size, with similar capacities and innovative competences. Thus, it is not surprising that, as reported in the literature, publishers tend to charge higher subscription prices if market concentration is higher (Harvie et al. 2013) or when they possess bigger portfolios of journal titles across multiple disciplines, for instance following mergers among academic publishers (McCabe 2000, 2004; Jeon and Menicucci 2006; Solomon and Bjoerk 2012b). Generally, the market for academic publishing has experienced considerable concentration during the last decades (Rubinfeld 2004, 2005; Larivière et al. 2015; Pinfield et al. 2016). (2) The probability of customers choosing the outside option can be described by the price elasticity of demand. In the market for academic publishing, subscriptions typically come from institutional subscribers, in particular libraries and academic institutes. These customers typically display a relatively inelastic demand for journal subscriptions, which offers an advantageous condition for big publishers to exercise their market power (McCabe 2000).
Bundling strategies employed by the oligopolistic publishers are a further factor reinforcing market power vis-à-vis institutional subscribers. Publishers bundle fast-selling and slow-selling journals as well as print and electronic versions of the same journals. They offer their journal titles only within bundles (and not separately) and compile bundles specifically for individual institutional customers (such as libraries). On the one hand, this allows the publishers to profitably exploit their market power, as each library is forced to pay for titles that are unattractive to its readers in order to get access to attractive titles (forced bundling or tying), and is also forced to buy print copies along with online access. On the other hand, this practice creates (strategic) entry barriers against newcomers in academic publishing, as it seeks to fully exploit the demand capacities of the institutional customers (Delamothe and Smith 2004; Nevo et al. 2005). As such, bundling practices are both an expression of market power and a factor that generates more market power. While market power is acknowledged by many studies of the market for academic publishing (inter alia, Bergstrom and Bergstrom 2004; Chressanthis 1994, 1993; Edlin and Rubinfeld 2004; Nevo et al. 2005; Phillips and Phillips 2002), the variation of market power among publishers depending on their size and portfolio is mostly neglected in the theoretical literature. In oligopolistic markets with heterogeneous firms (e.g. size, portfolio) and goods [e.g. discipline, (perceived) quality, etc.], market power may occur in different strengths among the oligopolists (rather than resting with just one dominant company as in homogeneous markets). Thus, next to the overall existence of market power driving up subscription prices through bargaining power vis-à-vis institutional customers like libraries, this market power will differ among publishers.
Depending inter alia on their size, some publishers will enjoy higher market power and enforce higher markups than other publishers, so that prices are supracompetitive but differ between the publishers. There is not one single price in the market.
Open access changes the game in the sense that the reader is no longer charged, either directly or, more often, indirectly through libraries etc. (zero subscription prices). Instead, authors pay an APC. While authors also face higher concentration at the publisher level, as more and more academic journals belong to the same narrow oligopoly of publishers, this may not directly cause publishers to raise submission fees. This may be due to asymmetric indirect network effects if publishers of academic journals are viewed as two-sided platforms between readers and authors (Jeon and Rochet 2010; Armstrong 2015; Snyder 2007, 2018), i.e. more accessible papers in academic journals benefit subscribers (or readers) more strongly than more subscribers (= more potential readers) benefit authors. Moreover, authors may react more price-elastically to submission fees than subscribers do to subscription fees. However, open access does jeopardize the market power rents that big commercial academic publishers earn from exploitative subscription bundles. Consequently, the powerful publishers have incentives to maintain their profitable business model either by blocking open access options or, if this proves politically impossible, by charging high APC (Morrison et al. 2015; Pinfield et al. 2016, 2017) with a so-called hybrid OA strategy, which is mostly offered by established publishers. This strategy serves two purposes: keeping authors with a lower willingness to pay for open access in the old business model (Collins 2005) and extracting supracompetitive rents from those who strongly prefer open access (Snyder 2005, 2007). Hence, the hybrid OA strategy is an instrument used by established publishers to maintain their profits, and it can also be considered an indicator of market power.
If publishers follow these incentives, bigger publishers (with more market power through bigger bundles of journals) can be expected to charge higher APC than smaller publishers, because their incentive to delay the open access innovation should be particularly strong (they have more to lose), and a substantial surcharge should be observed under the hybrid model. This theoretical consideration leads to hypothesis 2:

Hypothesis 2
The greater publishers' market power along with a hybrid strategy, the higher the APC.
Another possible explanation of differences in APC may be that publishers also enjoy market power towards authors. A major difference, however, is that publishers cannot sell cross-disciplinary bundles to authors. Instead, single journals compete for authors within disciplinary submarkets, i.e. a microbiologist will choose among academic journals in microbiology and an economist among those in economics. Therefore, market power vis-à-vis authors is not determined by the overall portfolio of the publisher. Instead, the concentration in the disciplinary submarket matters. The possibility of finding close substitutes for a given journal from an author's perspective varies from discipline to discipline, creating significant differences in profit margins across disciplines (Dewatripont et al. 2007). If an author can easily switch to another journal that is a close substitute and submit the paper there, then publishers cannot raise APC as much as when the author has no choice but to submit to a specific journal. First, the size of the submarket matters, i.e. how many journals are available in it. Second, market concentration plays a crucial role, i.e. how many of the journals belong to the same publisher. If the five journals to which an author can realistically submit all belong to the same publisher, then this publisher enjoys considerable market power. Third, the submarkets are heterogeneous as well, i.e. the degree of closeness of substitutes differs between journals. According to modern oligopoly theory, this implies that market power rents may already exist if a publisher in such a submarket owns a pair of journals that are particularly close substitutes (unilateral oligopoly effects). Accordingly, mergers of publishers may raise APC if they bring journals that are particularly close substitutes in any disciplinary submarket under the same ownership. Thus, we derive our hypothesis 3:

Hypothesis 3
In disciplines where the market is more concentrated, average APC are higher.
Academic publishing is a business with long and strong traditions and, consequently, any change to traditional business models faces considerable status quo preferences and, thus, skepticism (Collins 2005; Hunter 2005; Oppenheim 2008; Resnick and Belluz 2019; Suber 2003). Many journals of traditional publishers, which used to base their business on the subscription model, now offer hybrid OA options to authors (Laakso et al. 2011; Morrison et al. 2015). Because academic publishers are rooted in the subscription model and accustomed to its lucrative profits, they may view the OA process as endangering the foundations of their business (Suber 2003; Storbeck 2018; Van Noorden 2012); this applies especially to the more traditional publishers, irrespective of market power. Traditional publishers would then be particularly reluctant to embrace new models like OA and would prefer to continue with the status quo. As a consequence, their OA pricing strategies may aim at making the OA option not too attractive for authors in order to delay the change, which is perhaps perceived as unavoidable, for as long as possible (Collins 2005). Similar behavior by publishers has been observed in the market for printed books versus e-books (Budzinski and Koehler 2015). If status quo biases determine APC, then traditional publishers should charge higher APC than younger, less traditional publishers, beyond any exercise of market power. Therefore, our hypothesis 4 reads:

Hypothesis 4
The older and more traditional the publishers, the higher the APC they levy.

Data processing
To test our hypotheses, we use data on APC, publisher market power, OA type (gold or hybrid), publisher age, discipline market concentration, and journal reputation. To control for the size of the market, as suggested by Dewatripont et al. (2007) and Coomes et al. (2017), we include the total number of articles by journal and the type of publisher (for-profit or non-profit). Overall, we include four types of data (OA journal data, JIF data, data on disciplines, and data on publishers) drawn from seven datasets: (1) the OpenAPC Initiative, (2) the Directory of Open Access Journals (DOAJ), (3) the JCR JIF database, (4) the Essential Science Indicators (ESI) database, (5) Web of Science (WoS) journal information, (6) the Top 50 Revenue List of publishers, and (7) publisher foundation year and profit type.
Furthermore, we had to check and correct inconsistencies. We detected merger and acquisition activities as well as other kinds of collaboration among publishers. The corrections, including typos, were made manually by consulting the homepages of publishers and journals, including their news announcements. A further requirement was to convert the data into purchasing power parity (PPP). The APC in DOAJ are measured in US Dollars, those in the OpenAPC Initiative in Euro. Using 2017 PPP rates from the OECD, all values were converted to US Dollars. We also decided to drop 128 journals due to the lack of reliable PPP information.
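The conversion step can be illustrated with a minimal Python sketch. It assumes the common convention that a PPP rate is expressed in national currency units per US dollar, so that dividing a local amount by the rate yields PPP-adjusted US dollars; the text does not spell out the exact procedure. The function name, the country codes, and the rates below are hypothetical placeholders and not actual OECD figures. Records without a usable rate are dropped, mirroring the exclusion of journals that lack reliable PPP information.

```python
def to_ppp_usd(amount_local, ppp_rate):
    """Convert a local-currency amount to PPP-adjusted US dollars.

    ppp_rate: national currency units per US dollar (assumed convention).
    Returns None when no usable rate is available.
    """
    if ppp_rate is None or ppp_rate <= 0:
        return None
    return amount_local / ppp_rate


# Placeholder rates for two fictitious countries; NOT actual OECD values.
PPP_RATES = {"XX": 0.8, "YY": None}

# Hypothetical APC records in local currency.
records = [
    {"journal": "Journal 1", "country": "XX", "apc_local": 1600.0},
    {"journal": "Journal 2", "country": "YY", "apc_local": 1200.0},
]

converted = []
for record in records:
    usd = to_ppp_usd(record["apc_local"], PPP_RATES.get(record["country"]))
    if usd is not None:  # drop records without reliable PPP information
        converted.append({"journal": record["journal"], "apc_ppp_usd": usd})
```

In this toy example only "Journal 1" survives the conversion, with an APC of about 2,000 PPP-adjusted US dollars.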
In line with Dewatripont et al. (2007), we assume that the soft elements of reputation affect the hard dimension in the long run, which eventually affects a journal's impact factor. Thus we rely on the JIF as a proxy for journal reputation. There are two editions of the JCR JIF database: the Social Science Citation Index (SSCI), covering mostly social science disciplines, and the Science Citation Index Expanded (SCIE), which primarily covers the natural sciences.
To classify disciplines, we use the classification of the ESI (https://clarivate.libguides.com/esi) provided by Clarivate Analytics. 6 This allows us to categorize the journals into 22 research disciplines. Multidisciplinary journals are classified according to the dominant subject of the citations and references at the article level. A paper is classified as multidisciplinary only when no predominant discipline can be ascertained after two rounds of allocation. 7 Besides the collection of journals, the dataset offers the total citation counts of each discipline. Like Dewatripont et al. (2007), we use citation shares of publishers by discipline to calculate publishers' market shares. Market concentration is measured by the Herfindahl-Hirschman Index (HHI), i.e. the sum of the squared citation shares of all publishers within a discipline. Correspondingly, it ranges from 0 (no market concentration) to 1 (perfect concentration).
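The HHI definition above can be written out in a few lines of Python. This is a minimal sketch of the index as defined in the text, not the code used for the analysis; the function name, publisher labels and citation counts are made up for illustration.

```python
from collections import defaultdict


def hhi_by_discipline(rows):
    """Compute the HHI per discipline from (discipline, publisher, citations) rows."""
    cites = defaultdict(lambda: defaultdict(float))
    for discipline, publisher, citations in rows:
        cites[discipline][publisher] += citations

    result = {}
    for discipline, by_publisher in cites.items():
        total = sum(by_publisher.values())
        # Sum of squared citation shares of all publishers in the discipline.
        result[discipline] = sum((c / total) ** 2 for c in by_publisher.values())
    return result


# Toy citation counts (invented for illustration).
rows = [
    ("Chemistry", "P1", 50), ("Chemistry", "P2", 30), ("Chemistry", "P3", 20),
    ("Economics", "P1", 25), ("Economics", "P2", 25),
    ("Economics", "P3", 25), ("Economics", "P4", 25),
]
hhi = hhi_by_discipline(rows)
```

With four publishers of equal size, the index equals 1/4, which shows why a discipline with many similarly sized publishers ends up close to 0.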
Concerning big publishers, the Top 50 Revenue List of publishers in 2017 provides a publisher ranking by revenue. 8 Eight of the top 50 publishers, out of 295 publishers ranked in that Publishers Weekly edition, also appear in our dataset (i.e. Cambridge University Press, Elsevier, Georg Thieme Verlag, Informa, 9 Oxford University Press, Springer, Wiley and Wolters Kluwer). Together, they account for more than 60% of the total number of citations. We will call these eight publishers "big publishers" hereafter. The number of articles for the year 2017 is extracted from the general journal information of WoS. Information about the year of establishment and the profit type of publishers (for-profit or non-profit) was collected manually from publisher homepages. This includes the information about publishers' age, which we calculated as the difference between 2017 and the founding year. After merging the seven datasets, we end up with a unique dataset of 22 disciplines and a total of 3,793 journals. By including datasets such as DOAJ and the OpenAPC Initiative, we hope to overcome, at least to some extent, systematic biases of data collection and sampling. A further advantage of this combination is that we can cover the two major types of OA journals, gold and hybrid ones. It should also be stressed that our data do not contain any "predatory journals". 10 This is due to the scrutiny of the DOAJ initiative and the fact that the Web of Science does not report any journal of this kind, either.
5 We regress the log-transformed journal-level APC on years using the OpenAPC Initiative dataset for 2005 to 2017. The hybrid APC is found to increase by 5.24% each year.
6 Previously known as Thomson Reuters Intellectual Property & Science. http://www.bpeasia.com/news/161003-thomson-reuters.
7 For more details, please see: http://archive.sciencewatch.com/about/met/; https://clarivate.libguides.com/esi.
8 https://www.publishersweekly.com/pw/by-topic/industry-news/publisher-news/article/78036-pearson-is-still-the-world-s-largest-publisher.html.
9 The parent company of Taylor & Francis.

Methods
The data analysis is conducted with Stata. First, we present the descriptive statistics of all variables, namely APC, big publisher, publisher age, HHI, JIF, hybrid (indicating that the journal is a hybrid journal), number of articles, and profit (indicating that the publisher is a for-profit publisher). Second, we present histograms of the share of journals in each discipline and by publisher, followed by a bar graph of the HHI of all disciplines. Then, we discuss the two-way table of publisher size (big publishers vs. other publishers) and JIF, as well as the three-way table of JIF, publisher size and publisher age. Finally, a correlation analysis is carried out, followed by multivariate OLS regressions with APC as the dependent variable.
Table 1 shows the summary statistics of the variables. APC, JIF, publisher age, HHI and number of articles are continuous variables, while big publisher, hybrid and profit are binary variables. The variable publisher age contains 20 observations fewer than the rest, because the foundation years of these 20 small publishers could not be found online. There are 206 journals which are not contained in the 2017 WoS database. This is why we lose a further 206 titles, as they lack any information about the number of articles. Three journals published only 5 articles in 2017. The difference in age among publishers is quite large. The youngest publisher was founded in 2017, the oldest in 1518. The standard deviation is almost 85 years. The Herfindahl-Hirschman Index (HHI), as already pointed out above, could theoretically range from 0 to 1. In this case, the maximum value is 0.21, which indicates that there are disciplines with market concentration. 71% of the journals are from big publishers, while 78% are hybrid OA titles. The number of titles per discipline varies from 15 to 615. Among the 22 ESI disciplines, 16% of journals belong to Clinical Medicine, distinguishing it from the other disciplines (see Fig. 1). The category of Social Sciences accounts for almost 12%, making it the second biggest discipline. Multidisciplinary and Space Science have the fewest journals, at only 0.4% each. For the Multidisciplinary category, this is due to the classification principle of ESI, which tries to code each journal into a single discipline other than Multidisciplinary. Apart from these four subject fields, the titles are distributed rather evenly among the remaining 18 categories.

Descriptive statistics
The shares of journal titles of different publishers are displayed in Fig. 2. Elsevier appears most frequently in the dataset with 24% of the titles. Springer has the second highest share with 18%. Wiley is third with 14%, followed by Informa with 9%. Cambridge University Press and Wolters Kluwer share around 1%. The share of the Georg Thieme Verlag amounts to only 0.05%. Journals of smaller publishers still make up a considerable share, 29% in total. In contrast to big publishers, which cover almost all disciplines, 11 small publishers in general cover fewer disciplines.
The HHI of disciplines varies from 0.02 to 0.21, i.e. from low to moderately concentrated. Among the three most concentrated disciplines in our data, namely Multidisciplinary, Chemistry and Space Science (depicted in Fig. 3), Multidisciplinary and Space Science have the fewest journals and the fewest publishers. Multidisciplinary has only 15 journals from 10 publishers and an HHI of 0.21. Space Science, with an HHI of 0.17, contains 17 journals from 9 publishers. Chemistry, with an HHI of 0.18, consists of 207 journals from 20 publishers. Most of the disciplines have a moderate HHI, ranging from 0.1 to 0.18. The two least concentrated disciplines are Social Sciences and Economics and Business, 12 with an HHI of 0.03 and 0.02, respectively. Social Sciences comprise 439 journals from 44 publishers, while Economics and Business has 90 journals from 13 publishers.
In the next step, we decompose the data into several groups, as reported in Table 2. Taking quartiles of JIF and classifying publishers into big and other publishers renders eight groups. 13 Ascending from the group with the lowest to the group with the highest JIF, average APC increase. This holds for big publishers as well as for the remaining publishers (Others). Moreover, average APC are consistently higher for big than for other publishers. The F-statistics in the last column indicate the respective significance levels. Accordingly, there are significant differences between big and other publishers, ranging between 768 and 1,477 USD. The largest difference can be detected for the subgroup of publishers with the lowest JIF. A further decomposition sheds light on several age groups. We split the journals into three age groups: publishers not older than 137 years, publishers older than 137 but not older than 175 years, and publishers older than 175 years; JIF again is partitioned into three categories (quartiles) and publishers are distinguished between big and other publishers. Adding age as a third category allows us to differentiate 18 different subgroups as documented in Table 3. The numbers in parentheses indicate the rank of average APC within the subgroups of big publishers and other publishers, respectively.
Table 2 Average APC of journals at four JIF levels: big publishers and others (*** p < 0.001, ** p < 0.01, * p < 0.05)
For big publishers, the row with the highest average APC is that of the old publishers, except for the groups of young publishers with a high and medium JIF, which take ranks 1 and 3. For other publishers, the highest APC belongs to the medium-aged, high-JIF category. The columns with the highest JIF contain APC ranked 1, 2, 3, and 6. A Spearman rank test between big publishers and other publishers does not suggest that the subgroups of both can be ranked in the same order. In addition, an ANOVA shows that JIF significantly influences APC for the subgroups big-young, big-medium, other-young, and other-old (p-value < 1%). For big publishers older than 175 years (the big-old group), no significant effect could be detected. What we may conclude from this table is that, in general, APC increase with the impact factor. But old publishers that managed to grow big charge the highest APC even without having a high JIF. In the following, we will further investigate the interplay between those determinants in a multivariate regression.
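The decomposition by JIF quartile and publisher size that underlies Table 2 can be sketched in Python as follows. The journal records are toy values invented for illustration, and the helper name is hypothetical; the sketch only shows how the eight group means are formed, not the F-tests reported in the table.

```python
from statistics import mean, quantiles


def jif_quartile(jif, cuts):
    """Return 1..4 depending on how many quartile cut points the JIF exceeds."""
    return 1 + sum(jif > c for c in cuts)


# Toy records: (JIF, big publisher?, APC in USD), invented for illustration.
journals = [
    (0.5, True, 2500), (0.8, False, 900), (1.2, True, 2700), (1.5, False, 1200),
    (2.1, True, 2900), (2.6, False, 1700), (3.4, True, 3200), (5.0, False, 2300),
]

# Three cut points splitting the JIF distribution into four quartiles.
cuts = quantiles([jif for jif, _, _ in journals], n=4)

# Collect APC values per (quartile, publisher size) cell.
groups = {}
for jif, big, apc in journals:
    key = (jif_quartile(jif, cuts), "big" if big else "other")
    groups.setdefault(key, []).append(apc)

# Average APC per cell, ordered by quartile and publisher size.
table = {key: mean(values) for key, values in sorted(groups.items())}
```

Adding the publisher age group as a third element of the key would give the finer subgroups of Table 3 in the same way.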

Regression results
Before we start with regressions, Table 4 reports the pairwise correlations between variables. The highest correlation is shown between hybrid and APC, which amounts to 0.56, (Morrison et al. 2015; Pinfield et al. 2017), big publishers tend to be established publishers with longer histories (Edlin and Rubinfeld 2004) presenting also the main group of publishers choosing a hybrid-OA strategy that allows them to charge higher APC (Laakso et al. 2011; Morrison et al. 2015). This is tantamount to saying that reputable journals have the scope to charge extra (McCabe 2000, 2002; Dewatripont et al. 2007; Harvie et al. 2013), or that dominant publishers can charge higher APC on average (Morrison et al. 2015; Pinfield et al. 2016, 2017). Furthermore, JIF, the commonly used proxy for reputation, correlates positively with APC. 14 Now, we calculate partial correlation coefficients by running several regression models, the results of which are presented in Table 5. APC serves as the dependent variable. Independent variables are introduced sequentially to check the robustness of the coefficients. In model 1, we start with JIF as the first independent variable to be tested. It shows a significant effect on APC. This holds true across all six models in the table. Model 2 includes big publisher, and model 3 publisher age. All coefficients turn out to be positive and highly significant. Model 3 additionally contains dummies for disciplines. In model 4, we introduce HHI, which is constant within each discipline and therefore collinear with the discipline dummies. Hence, we cannot include both HHI and discipline dummies. Model 4 only includes HHI, and its coefficient is also positive and significant. Incorporating the information on whether a journal belongs to a for-profit or non-profit publisher, together with the information on gold or hybrid OA and the number of articles, in model 5 reveals that the effects of profit and hybrid on APC are significant and positive.
As we measure the respective variables in USD, the coefficient of the variable hybrid suggests that journals with a hybrid-OA strategy on average charge 1,467 USD more in APC than non-hybrid ones. The descriptive statistics above suggested that the size of publishers and their age matter in terms of APC if both coincide. Consequently, we introduce an interaction term between big publisher and publisher age (model 6). However, the estimate does not indicate a large impact of this interaction on APC. With respect to the goodness of fit, model 6 reports the highest adjusted R², which makes it our preferred model.
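To make the role of the interaction term concrete, the following Python sketch fits an OLS model with a big publisher × publisher age interaction on synthetic, noise-free data. It is an illustration under stated assumptions, not the Stata code behind Table 5: the data are generated from the coefficients quoted in the text for the constant, JIF, big publisher and the interaction, the direct age effect is set to zero as a simplification, and the remaining regressors of model 6 (HHI, profit, hybrid, number of articles) are omitted. The function names are hypothetical.

```python
from itertools import product


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def design_row(jif, big, age):
    """Regressors: constant, JIF, big publisher, publisher age, big x age."""
    return [1.0, jif, float(big), float(age), float(big) * age]


# Coefficients used to generate the synthetic APC (assumption: age effect = 0).
TRUE = [768.1, 132.5, 447.6, 0.0, 1.13]

# Synthetic grid of journals: every combination of JIF, size and age.
grid = list(product([1.0, 2.5, 4.0], [0, 1], [30, 150, 200]))
X = [design_row(jif, big, age) for jif, big, age in grid]
y = [sum(b * x for b, x in zip(TRUE, row)) for row in X]

beta = ols(X, y)  # recovers TRUE up to floating-point error
```

Because the interaction column is zero for all non-big publishers, its coefficient captures how the age gradient differs for big publishers only.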
Collecting all estimates in model 6 allows us to reconstruct the average APC according to the identified determinants. The constant term of 768.1 USD reflects the base price for an open access publication. From a technical perspective, we may interpret this amount as the pure production cost of processing an article for an open access publication. With a one-unit increase in the impact factor (JIF) of the journal, APC increase by 132.5 USD. Concerning publisher size and age, the direct effect of publisher age is not significant. It only counts in combination with being a big publisher, i.e. big pub. × pub. age. It then increases average APC by a moderate 1.13 USD. The compound effect of being a big publisher (big pub. + big pub. × pub. age) amounts to 447.6 USD + 1.13 USD × pub. age. In other words, a big publisher of mean age (= 153 years) can charge, ceteris paribus, 620.5 USD more in APC. At the mean (HHI = 0.09), the discipline's concentration accounts for 117 USD in APC. In case a journal belongs to a for-profit publisher, a premium of 133.3 USD is charged. Most strikingly, running a hybrid-OA strategy adds 1,482 USD. Adding all components yields an average APC of 3,415 USD for an open access publication with a big publisher of middle age, average impact factor, profit orientation, and average market concentration of the respective discipline, pursuing a hybrid-OA strategy. In relative terms, 43% of the resulting APC consist of returns to the hybrid model.
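Two of the figures in this paragraph can be retraced with simple arithmetic. The short Python sketch below uses only the numbers quoted in the text; the variable names are chosen for illustration.

```python
# Figures as quoted in the text for model 6 (USD).
BIG_PUBLISHER = 447.6   # direct effect of being a big publisher
BIG_X_AGE = 1.13        # interaction effect per year of publisher age
HYBRID = 1482.0         # premium of a hybrid-OA strategy
MEAN_AGE = 153          # mean publisher age in years
TOTAL_APC = 3415.0      # reconstructed average APC stated in the text

# Compound effect of being a big publisher of mean age.
big_premium = BIG_PUBLISHER + BIG_X_AGE * MEAN_AGE

# Share of the reconstructed APC attributable to the hybrid model.
hybrid_share = HYBRID / TOTAL_APC

print(f"big-publisher premium at mean age: {big_premium:.1f} USD")  # 620.5 USD
print(f"hybrid share of reconstructed APC: {hybrid_share:.0%}")     # 43%
```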

Caveats and conclusion
The aim of this paper was to evaluate the impact of the drivers of APC. Market power of publishers, the concentration of disciplines, the age of publishers, the reputation of journals, and last but not least the underlying business model of publishers (hybrid or gold OA) were the determinants under investigation. By and large, we could corroborate our hypotheses: journal reputation, market power of publishers, the hybrid model, and the concentration of disciplines increase APC, although publisher age only appears to play a role if publishers managed to become one of the big publishers.
In comparison to the existing studies, especially with respect to Schoenfelder (2018), which is closest to our paper, we extended her perspective by incorporating the role of market power in the evolution and proliferation of open access publishing. In addition, we clarified the distinction between big and traditional publishers. In the literature, the two terms are often blurred in usage (Pinfield et al. 2016, 2017; Schoenfelder 2018). Not all traditional publishers have grown big. Our results show that only some of the traditional publishers have grown big and thus managed to charge extra on APC. What adds to the surcharge is the concentration of disciplines, which makes it easier for publishers to exploit their market power.
Also with regard to the effect of reputation, our results are in line with the existing literature. As reputation is the actual "good" a researcher is striving for when choosing a certain journal for publication, it should be reflected in the APC. This is the case in our study as in others such as Solomon and Bjoerk (2012a, b), Solomon (2014, 2015), Smith et al. (2016), Schoenfelder (2018), Yuen et al. (2018), Romeu et al. (2014), Pinfield et al. (2017) and Wang et al. (2015). Yet, the magnitude of its effect is surprising. A one-unit increase in reputation, as usually measured by the journal impact factor in the literature, adds only 132.5 USD to average APC. This is less than 5% of average APC (2,987 USD, see Table 1). A journal with an average reputation of 2.77 (see Table 1) contributes 367 USD (132.5 USD × 2.77 JIF), equivalent to 12% of average APC. Bearing in mind that reputation is the principal concern of researchers, little of the APC is accounted for by reputation. The reasons for this discrepancy can be manifold: (a) the latent variable reputation is inadequately captured by the journal impact factor as a proxy (measurement problem), (b) publishers are able to charge significantly more due to their market power, or (c) a combination of both. According to our study, market power takes a large share in APC. Our analyses of the price-increasing effects of market concentration, the choice of the hybrid strategy, and the size of the publishers all support the hypothesis that market power inflates APC. 15 On the other hand, it is pointless to argue that the reputation of a scholar is congruent with his/her publication-weighted sum of journal impact factors. Highly cited articles in not so high-ranked journals may contribute more to a single researcher's reputation than a non-cited article in a top journal, subject to the differences that may exist between scientific disciplines.
In other words, journal reputation might not be the only crucial determinant when choosing a certain journal for publishing one's research work. Apart from a journal's reputation, the speed and quality of the review process or the time to publication in a journal could also enter the researcher's decision-making process. This, in turn, may explain why journal reputation takes a fairly low share in average APC, compared to the impact the hybrid-OA model has on APC.
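The shares attributed to reputation in the discussion above follow from three figures quoted in the text (the JIF coefficient of model 6 and the means from Table 1). A short Python check, with variable names chosen for illustration:

```python
JIF_COEF = 132.5    # USD per one-unit increase in JIF (model 6)
MEAN_JIF = 2.77     # average JIF (Table 1)
MEAN_APC = 2987.0   # average APC in USD (Table 1)

# Share of average APC explained by a one-unit increase in JIF.
per_unit_share = JIF_COEF / MEAN_APC

# Contribution of a journal with average JIF, in USD and as a share of average APC.
jif_contribution = JIF_COEF * MEAN_JIF
contribution_share = jif_contribution / MEAN_APC

print(f"one JIF unit: {per_unit_share:.1%} of average APC")          # 4.4%
print(f"average JIF: {jif_contribution:.0f} USD, {contribution_share:.0%}")  # 367 USD, 12%
```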
With a share of almost 50% in average APC (2,987 USD, see Table 1), the hybrid-OA strategy contributes most to the APC compared to all remaining factors. All big publishers pursue the hybrid-OA strategy in at least some of their journals. They can be considered established and experienced publishers, with the youngest being 30 years old and the rest older than 137 years. With respect to market power, aside from considering market concentration within disciplines as an indicator, it stands to reason that the hybrid-OA strategy is also prone to the exploitation of market power. Certainly, our results do not provide evidence of causal links, but they reveal an obvious imbalance between the drivers of APC, which does not reflect the intuition that reputation would be the traded good between publishers and researchers striving to successfully disseminate their scientific insights in a reputable journal.
A word on the limitations of our study: the DOAJ data only provides cross-sectional data. Merging the data comes at the cost of a loss of information. The available time dimension only allowed us to extrapolate some missing values. Hence, we did not have the chance to shed light on the underlying causal links between APC and its drivers. With respect to the latter, the study may also suffer from an omitted variable bias, because other potential determinants of APC have not been taken into account. Solomon and Bjoerk (2012a, b) argue that the availability of funding between disciplines also influences APC. Along similar lines, there are research institutes (i.e. universities or academic societies) which financially support OA journals in their APC. As a consequence, the resulting APC is underestimated. As the percentage of these journals is only about 2% among gold OA journals (Morrison et al. 2015), this, however, should be of minor concern in our study.
We controlled for unobserved heterogeneity using discipline dummies or the discipline-specific market concentration variable HHI, respectively. The discipline-specific fixed effects nevertheless need to be investigated further in future research, despite the fact that collecting relevant information might be a difficult task and data protection efforts may impede the attempt to do so. Neither authors nor institutions are required to report the information to an easy-to-access repository such as DOAJ or OpenAPC.
Leaving aside the caveats to be tackled in future research, our results, which disclose a 50% surcharge on hybrid-OA publishing, strongly support the hypothesis that academia runs the risk of not taking advantage of the cost-reducing opportunities inherent to digitization. Via a hybrid-OA strategy, big publishers may nonetheless be able to sustain their comfortable profit situation, leveraging their existing market power from the subscription-based to the open-access publishing era.