1 Introduction

Quarkonium production is a paradigm case study for the understanding of hadron formation, but the theoretical description of its basic mechanisms remains a challenge [1]. Both the attractiveness and the complexity of the problem reside in the quark-antiquark (\(Q\overline{Q}\)) binding process, whose degrees of freedom manifest themselves in the observed spectrum of bound states of different masses and spin–angular-momentum properties; it remains unclear how they are related to the momentum distributions measured for the different states. Non-relativistic QCD (NRQCD) [2] and other approaches to quarkonium production (e.g., colour-singlet and colour-evaporation models [3]) are based on a pivotal postulate: the long-distance effects related to bound-state formation can be factorized out in the calculation of the cross sections. In this way, short-distance \(Q\overline{Q}\) cross sections (SDCs), containing the dependence on transverse momentum (\(p_{\mathrm{T}}\)), rapidity (y), and pp centre-of-mass collision energy (\(\sqrt{s}\)), can be calculated perturbatively [4,5,6,7,8], independently of the long-distance matrix elements (LDMEs), which are proportional to the probability of the transition from the \(^{2s+1}l_j^{c}\) pre-resonance \(Q\overline{Q}\), with \(c=1\) (colour singlet) or 8 (colour octet), to the final-state observable \(^{2S+1}L_J\) quarkonium. Here, s, l, and j (S, L, and J) denote the spin, orbital angular momentum, and total angular momentum of the primordial \(Q\overline{Q}\) state (quarkonium), respectively.

The LDMEs (in practice, free theory parameters) are considered constant: they depend on the initial \(Q\overline{Q}\) quantum numbers and on the bound-state properties (quarkonium mass M, quark mass \(m_Q\), velocity of the heavy quark in the quarkonium rest frame v, etc.), but not on the “laboratory” variables \(p_{\mathrm{T}}\), y, and \(\sqrt{s}\), nor on the collision system. The factorization postulate is believed to be valid in the limit of heavy quark mass, for which the time of creation of the \(Q\overline{Q}\) pair, \(\varDelta t \sim 1/m_Q\), is distinctly smaller than the bound-state formation time, \(\sim \) 1 fm, but it has not yet been demonstrated from QCD principles [1]. Also because of inconsistencies between the NRQCD predictions and the measurements, especially concerning polarization [9, 10], it is often argued that the validity of factorization might be restricted to specific kinematic domains (high \(p_{\mathrm{T}}\)) or to the (heavier) bottomonium family.

The analysis reported in this paper addresses the problem in a data-driven way, comparing the patterns of momentum, mass, and collision-energy dependences measured for the “reference” case of inclusive Drell–Yan production with the corresponding quarkonium patterns. It uses mid-rapidity pp LHC data and only relies on general kinematic scaling properties of the production cross sections.

2 Dimensional scaling in kinematics of particle production

The \(p_{\mathrm{T}}\) and y double-differential cross section for the inclusive production of a given particle in pp collisions can be written as

$$\begin{aligned} \frac{\mathrm{d} ^2 \sigma }{\mathrm{d} p_{\mathrm{T}} \, \mathrm{d} y } = \int \frac{\mathrm{d} ^2{\hat{\sigma }}}{\mathrm{d} p_{\mathrm{T}} \, \mathrm{d} \hat{y}}\, P(x_1)\, P(x_2)\, \mathrm{d} x_1\, \mathrm{d} x_2 , \end{aligned}$$
(1)

where \({\hat{\sigma }}\) is the cross section of the parton-initiated process, P(x) are the parton density functions of the proton, \(x_{1}\) and \(x_{2}\) are the proton momentum fractions carried by the colliding partons, and \(\hat{y}\) is the particle rapidity in the parton-parton centre-of-mass frame, \({\hat{y}} = y - \frac{1}{2} \ln ({x_1}/{x_2})\). An implicit delta function sets the particle’s \(p_{\mathrm{T}}\) to the measured one. The cross section depends on the pp collision energy, \(\sqrt{s}\), through the relation \(x_1 x_2 = (\sqrt{{\hat{s}}}/\sqrt{s})^{2}\), where \(\sqrt{{\hat{s}}}\) is the parton-parton collision energy, a function of \(p_{\mathrm{T}}\), \({\hat{y}}\), and M:

$$\begin{aligned} \sqrt{ {\hat{s}} }/M= & {} {\hat{E}}/M + {\hat{p}}/M , \nonumber \\ {\hat{E}}/M= & {} \sqrt{1 + ({p_{\mathrm{T}}}/{M})^2} \, \cosh {\hat{y}} , \nonumber \\ {\hat{p}}/M= & {} \sqrt{\sinh ^2{\hat{y}} + ({p_{\mathrm{T}}}/{M})^2\cosh ^2{\hat{y}}} . \end{aligned}$$
(2)
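
As a minimal numerical sketch of Eq. 2 (not part of the original analysis; the function names are ours), the relations can be evaluated directly in units of M. The sketch also illustrates two consequences that follow from Eq. 2: the mass-shell relation \(({\hat{E}}/M)^2 - ({\hat{p}}/M)^2 = 1\) and the minimum of \(\sqrt{{\hat{s}}}/M\) at \({\hat{y}} = 0\), equal to \(\xi + \sqrt{1+\xi ^2}\) with \(\xi = p_{\mathrm{T}}/M\).

```python
import math

def parton_kinematics(xi, y_hat):
    """Return (E/M, p/M, sqrt(s_hat)/M) from Eq. 2, with xi = pT/M."""
    e_over_m = math.sqrt(1.0 + xi**2) * math.cosh(y_hat)
    p_over_m = math.sqrt(math.sinh(y_hat)**2 + xi**2 * math.cosh(y_hat)**2)
    return e_over_m, p_over_m, e_over_m + p_over_m

def sqrt_s_hat_min_over_m(xi):
    """Value of sqrt(s_hat)/M at y_hat = 0, where it is smallest for fixed xi."""
    return xi + math.sqrt(1.0 + xi**2)

# Example: xi = 2 at two partonic rapidities
for y_hat in (0.0, 0.7):
    e, p, rs = parton_kinematics(2.0, y_hat)
    print(f"y_hat={y_hat}: E/M={e:.4f}, p/M={p:.4f}, sqrt(s_hat)/M={rs:.4f}")
```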

We start by discussing inclusive Drell–Yan lepton-pair production. After integrating over the lepton emission angles in the dilepton rest frame (and over the degrees of freedom of the recoil system), the kinematics in the rest frame of the colliding partons is fully described by \(p_{\mathrm{T}}\), \({\hat{y}}\), and M. Since the dimensionality of the differential partonic cross section is \([\mathrm {energy}]^{-3}\), we can write

$$\begin{aligned} \frac{\mathrm{d} ^2{\hat{\sigma }}}{\mathrm{d} p_{\mathrm{T}} \, \mathrm{d} {\hat{y}}} = \frac{1}{M^3} \; f(\xi ,{\hat{y}}), \end{aligned}$$
(3)

f being a dimensionless function of the dimensionless variables \({\hat{y}}\) and \(\xi \), where we use the definition \(\xi \equiv p_{\mathrm{T}}/M\) to have more compact equations. When the mixture of production mechanisms does not change with M, the partonic cross section, calculated at any arbitrary reference kinematic point \((\xi ^\star ,{\hat{y}}^\star )\), scales as \(M^{-3}\). This property can be tested using Drell–Yan cross sections, measured in a range of masses sufficiently smaller than the Z boson mass to stay unaffected by the strongly mass-dependent interference of \(\gamma \)- and Z-exchange mechanisms. The M scaling of the observable cross section depends on the pp collision energy via the parton densities, an effect that can be singled out in a purely data-driven way at LHC energies and mid rapidity, given the small average \(x_{1,2}\).

For illustration, we parametrize P(x) as

$$\begin{aligned} P(x) = A \cdot x^{-B} \cdot q(x) , \end{aligned}$$
(4)

explicitly showing the factor describing the low-x behaviour and globally representing the remaining factors by the function q. Typical examples are \(q(x) = (1-x)^C\, (1 + D \sqrt{x} + E x)\) and \(q(x) = (1-x)^C\, (1 + D x^E)\), both tending to unity in the small-x limit. Therefore,

$$\begin{aligned} P(x_1)\, P(x_2)= & {} A^2 \frac{1}{(x_1 \, x_2)^B}\, q(x_1)\, q(x_2) \nonumber \\= & {} A^2 \left( \frac{\sqrt{s}}{M}\right) ^{2B} \left( \frac{\sqrt{{\hat{s}}}}{M}\right) ^{-2B} q(x_1)\, q(x_2) . \end{aligned}$$
(5)
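
The rearrangement in Eq. 5 can be checked numerically. The sketch below is only an illustration: it uses a toy density of the form of Eq. 4 with \(q(x) = (1-x)^C\), and the parameter values A, B, and C are hypothetical, not fitted to any parton density set.

```python
import math

def q(x, C=5.0):
    """Toy large-x factor q(x) = (1 - x)^C (hypothetical C)."""
    return (1.0 - x) ** C

def pdf(x, A=1.0, B=1.2, C=5.0):
    """Toy parton density of Eq. 4: P(x) = A * x^-B * q(x) (hypothetical A, B, C)."""
    return A * x ** (-B) * q(x, C)

def momentum_fractions(sqrt_s_over_m, sqrt_shat_over_m, y0):
    """x1 and x2 from sqrt(s_hat)/sqrt(s) and the rapidity y0 of the parton system."""
    root_tau = sqrt_shat_over_m / sqrt_s_over_m
    return root_tau * math.exp(y0), root_tau * math.exp(-y0)

def pdf_product_direct(sqrt_s_over_m, sqrt_shat_over_m, y0):
    """Left-hand side of Eq. 5: P(x1) * P(x2)."""
    x1, x2 = momentum_fractions(sqrt_s_over_m, sqrt_shat_over_m, y0)
    return pdf(x1) * pdf(x2)

def pdf_product_eq5(sqrt_s_over_m, sqrt_shat_over_m, y0, A=1.0, B=1.2):
    """Right-hand side of Eq. 5, with the sqrt(s)/M power law made explicit."""
    x1, x2 = momentum_fractions(sqrt_s_over_m, sqrt_shat_over_m, y0)
    return (A**2 * sqrt_s_over_m ** (2 * B) * sqrt_shat_over_m ** (-2 * B)
            * q(x1) * q(x2))

lhs = pdf_product_direct(2000.0, 8.0, 0.3)
rhs = pdf_product_eq5(2000.0, 8.0, 0.3)
print(lhs, rhs)
```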

Moreover,

$$\begin{aligned} \mathrm{d} x_1 \mathrm{d} x_2 = 2 \left( \frac{\sqrt{s}}{M}\right) ^{-2} \left( \frac{\sqrt{{\hat{s}}}}{M}\right) ^{2} \frac{{\hat{E}}/M}{{\hat{p}}/M} \, \frac{\xi }{1+\xi ^2} \, \mathrm{d} \xi \, \mathrm{d} {{\hat{y}}_0} ~ , \end{aligned}$$
(6)

where \({\hat{y}}_0 = y - {\hat{y}} = \frac{1}{2} \ln ({x_1}/{x_2})\) is the rapidity of the system of colliding partons. Using the previous relations and releasing the \(\xi \) integration in Eq. 1, we obtain the pp (“dressed”) version of the partonic cross section, Eq. 3,

$$\begin{aligned} \frac{\mathrm{d} ^2 \sigma }{ \mathrm{d} p_{\mathrm{T}} \,\mathrm{d} y } = \frac{1}{M^3} \left( \frac{\sqrt{s}}{M}\right) ^{b} F\left( \xi , y; \frac{\sqrt{s}}{M} \right) , \end{aligned}$$
(7)

where \(b = 2B - 2\) and

$$\begin{aligned} F\left( \xi , y; \frac{\sqrt{s}}{M} \right) = \int g\left( \xi , \, y-{\hat{y}}_0\right) q(x_1)\, q(x_2) \, \mathrm{d} {\hat{y}}_0 . \end{aligned}$$
(8)
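
The Jacobian factor of Eq. 6 can be cross-checked with a finite difference. The sketch below is our own check, not taken from the analysis. It relies on the fact that the change of variables from \((x_1, x_2)\) to \((x_1 x_2, {\hat{y}}_0)\) has unit Jacobian, so that only the derivative of \(x_1 x_2 = {\hat{s}}/s\) with respect to \(\xi \) at fixed \({\hat{y}}\) needs to be compared with the analytic factor.

```python
import math

def _e_and_p(xi, y_hat):
    """E/M and p/M from Eq. 2."""
    e = math.sqrt(1.0 + xi**2) * math.cosh(y_hat)
    p = math.sqrt(math.sinh(y_hat)**2 + xi**2 * math.cosh(y_hat)**2)
    return e, p

def tau(xi, y_hat, sqrt_s_over_m):
    """x1 * x2 = s_hat / s, written with sqrt(s_hat)/M and sqrt(s)/M."""
    e, p = _e_and_p(xi, y_hat)
    return ((e + p) / sqrt_s_over_m) ** 2

def dtau_dxi_eq6(xi, y_hat, sqrt_s_over_m):
    """Analytic factor multiplying d(xi) d(y0) in Eq. 6."""
    e, p = _e_and_p(xi, y_hat)
    return (2.0 * sqrt_s_over_m ** -2 * (e + p) ** 2
            * (e / p) * xi / (1.0 + xi**2))

def dtau_dxi_numeric(xi, y_hat, sqrt_s_over_m, h=1e-6):
    """Central finite difference of tau with respect to xi at fixed y_hat."""
    return (tau(xi + h, y_hat, sqrt_s_over_m)
            - tau(xi - h, y_hat, sqrt_s_over_m)) / (2.0 * h)

analytic = dtau_dxi_eq6(3.0, 0.5, 2000.0)
numeric = dtau_dxi_numeric(3.0, 0.5, 2000.0)
print(analytic, numeric)
```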

The function g absorbs all the factors depending on \(\xi \) and \({\hat{y}}\), including those contained in \(\sqrt{ {\hat{s}} }/M\), \({\hat{E}}/M\), and \({\hat{p}}/M\) (Eq. 2). F is a fully experimentally determinable dimensionless function of the dimensionless scaling variables \(\xi \) and y. It also depends on \(\sqrt{s}\) and M, but always through the \(\sqrt{s}/M\) ratio, via the terms \(q(x_{1,2}) = q( \sqrt{{\hat{s}}}/\sqrt{s} \,\, e^{\pm {\hat{y}}_0})\) and the integration limits \(\pm \ln (\sqrt{s}/\sqrt{{\hat{s}}_{\mathrm{min}}}\,) = \pm \ln (\sqrt{s}/M) \times {(1 - {\ln (\xi +\sqrt{1+\xi ^2})} \,/\, {\ln (\sqrt{s}/M)} )}\). The sensitivity of F to \(\sqrt{s}/M\) decreases with increasing \(\sqrt{s}\) and towards mid rapidity. We normalize F over the \((\xi , y)\) range of the analysis and absorb its integral \(I(\sqrt{s}/M)\) into a numerical redefinition of the b exponent (always possible to a very good approximation over a small range of \(\sqrt{s}\) values, as in our analysis of \(\sqrt{s}=7\) and 8 TeV data),

$$\begin{aligned} \frac{\mathrm{d} ^2 \sigma }{ \mathrm{d} p_{\mathrm{T}} \,\mathrm{d} y } = \frac{1}{M^3} \left( \frac{\sqrt{s}}{M}\right) ^{\beta } {{\mathcal {F}}}\left( \xi , y; \frac{\sqrt{s}}{M} \right) , \end{aligned}$$
(9)

with \(\int {{\mathcal {F}}} \,\mathrm{d} \xi \, \mathrm{d} y = 1\). The \((p_{\mathrm{T}}/M, y)\)-integrated cross section becomes \(\sigma = M^{-2} \, (\sqrt{s}/M)^{\beta }\) and its measurement at two or more \(\sqrt{s}\) values provides a determination of \(\beta \).

These considerations lead to the following proposition. In a domain where Drell–Yan production mechanisms do not interfere in varying proportions, the joint distribution of the variables \(p_{\mathrm{T}}/M\), y, and \(\sqrt{s}/M\) is universal: its shape does not depend on M. At any given kinematic point \(((p_{\mathrm{T}}/M)^\star ,y^\star )\) and \(\sqrt{s}\), the \(\mathrm{d} \sigma / \mathrm{d} p_{\mathrm{T}} \) cross section scales like \(M^{-(3+\beta )}\), where \(\beta \) expresses the increase of the integrated cross section with \(\sqrt{s}\). The scaling of the cross section with M, integrated over \(p_{\mathrm{T}}\), was previously discussed in Ref. [11], but using process-specific arguments rather than completely general dimensional-analysis considerations.
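
As a short worked step based on Eq. 9 (our rearrangement, assuming that \({{\mathcal {F}}}\) evaluated at the reference point carries no residual mass dependence), the scaling at fixed \(\sqrt{s}\) between two masses \(M_1\) and \(M_2\) can be written as

$$\begin{aligned} \left. \frac{\mathrm{d} ^2 \sigma }{\mathrm{d} p_{\mathrm{T}} \,\mathrm{d} y } \right| _{\xi ^\star , y^\star } \propto \frac{1}{M^3} \left( \frac{\sqrt{s}}{M}\right) ^{\beta } = (\sqrt{s})^{\beta } \, M^{-(3+\beta )} \quad \Rightarrow \quad \frac{\sigma ^\star (M_2)}{\sigma ^\star (M_1)} = \left( \frac{M_2}{M_1}\right) ^{-(3+\beta )} , \end{aligned}$$

where \(\sigma ^\star \) is shorthand (introduced only here) for the differential cross section at the reference point \((\xi ^\star , y^\star )\).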

Fig. 1 Mid-rapidity Drell–Yan differential cross sections as a function of the dilepton mass, as measured by ATLAS [12] and CMS [13, 14]

We will now consider the Drell–Yan \(\mathrm{d} \sigma /\mathrm{d} M\) differential cross sections measured by ATLAS [12] and CMS [13, 14], in pp collisions at \(\sqrt{s}=7\) and 8 TeV, shown in Fig. 1 for the \(M < 50\) GeV range. It can be recognized from the previous discussion that \(\mathrm{d} \sigma /\mathrm{d} M\) has the same \(M^{-3}\,(\sqrt{s}/M)^{\beta }\) scaling behaviour as \(\mathrm{d} \sigma /\mathrm{d} p_{\mathrm{T}} \) at fixed \(p_{\mathrm{T}}/M\) and y. Simultaneously fitting all four data sets to a single \(M^{-\alpha _{\mathrm {DY}}}\) function, only considering point-to-point uncorrelated uncertainties in order to determine the shape parameter with maximum significance, gives a remarkably precise result: \(\alpha _{\mathrm {DY}} = 3.63 \pm 0.03\). Fitting each data set individually gives: \(3.60 \pm 0.05\) (CMS 7 TeV), \(3.63 \pm 0.05\) (CMS 8 TeV), \(3.75 \pm 0.07\) (ATLAS 7 TeV, 2010) and \(3.60 \pm 0.07\) (ATLAS 7 TeV, 2011).
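
As a simple consistency check of the numbers quoted above, the four individual fit results can be combined with an inverse-variance weighted average. This is not the simultaneous fit performed in the analysis, only a sketch that treats the four results as independent Gaussian measurements.

```python
import math

# (alpha_DY, uncertainty) from the individual fits quoted in the text
INDIVIDUAL_FITS = {
    "CMS 7 TeV": (3.60, 0.05),
    "CMS 8 TeV": (3.63, 0.05),
    "ATLAS 7 TeV, 2010": (3.75, 0.07),
    "ATLAS 7 TeV, 2011": (3.60, 0.07),
}

def inverse_variance_average(measurements):
    """Weighted mean and its uncertainty for independent (value, sigma) pairs."""
    weights = [1.0 / sigma**2 for _, sigma in measurements]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, measurements)) / total
    return mean, 1.0 / math.sqrt(total)

mean, unc = inverse_variance_average(list(INDIVIDUAL_FITS.values()))
print(f"alpha_DY (weighted average) = {mean:.3f} +/- {unc:.3f}")
```

Under these assumptions the average is compatible with the simultaneous-fit result of \(3.63 \pm 0.03\).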

The \(\beta \) exponent, reflecting the \(\sqrt{s}\) dependence, can be derived from the ratio between the cross sections measured at 8 and 7 TeV, directly reported by CMS [14], leading to \(\beta = 0.73 \pm 0.15\), where the uncertainty is obtained assuming a relative uncorrelated luminosity uncertainty of 2%. The resulting \(\alpha -\beta \) difference, \(2.90 \pm 0.15\), is perfectly consistent with 3, as expected.
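
The extraction of \(\beta \) from a cross-section ratio can be sketched as follows, using \(\sigma (8\,\mathrm{TeV})/\sigma (7\,\mathrm{TeV}) = (8/7)^{\beta }\) at fixed M. We assume here that the 2% relative uncertainty applies to the ratio itself. The ratio used in the example is illustrative, back-computed from the quoted \(\beta = 0.73\), and is not a value taken from Ref. [14].

```python
import math

def beta_from_ratio(ratio, rel_unc, sqrt_s_high=8.0, sqrt_s_low=7.0):
    """Exponent beta and its uncertainty from ratio = (sqrt_s_high/sqrt_s_low)^beta.

    rel_unc is the relative uncertainty of the ratio (assumption: 2% in the text).
    """
    log_energy_ratio = math.log(sqrt_s_high / sqrt_s_low)
    beta = math.log(ratio) / log_energy_ratio
    beta_unc = rel_unc / log_energy_ratio  # first-order error propagation
    return beta, beta_unc

# Illustrative ratio, back-computed from beta = 0.73 (not a measured number)
illustrative_ratio = (8.0 / 7.0) ** 0.73
beta, beta_unc = beta_from_ratio(illustrative_ratio, rel_unc=0.02)
print(f"beta = {beta:.2f} +/- {beta_unc:.2f}")
```

The small lever arm, \(\ln (8/7) \simeq 0.13\), is what turns a 2% uncertainty on the ratio into an uncertainty of about 0.15 on \(\beta \).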

The Drell–Yan cross sections so far reported by the LHC experiments are integrated over \(p_{\mathrm{T}}\). The future availability of \(p_{\mathrm{T}}\)- and \(p_{\mathrm{T}}/M\)-differential measurements will allow the realisation of more detailed and accurate tests. It is worth noting that the existence of these simple and general kinematic scaling rules has been ignored in all the published analyses of LHC data, an unfortunate situation because of the resulting loss in physics value and also because they can be very useful experimental tools. In fact, detector and trigger acceptances drastically sculpt the reconstructed dilepton distributions, especially in the low momentum and low mass regions. Verifying that the measured \(p_{\mathrm{T}}/M\) and M spectra, after efficiency corrections, etc., satisfy the expected dimensional scaling constitutes a powerful cross-check of the analysis procedure and a validation of the reported systematic uncertainties.

3 Dimensional scaling in quarkonium production

Moving now to our main case study, quarkonium production, we write the differential cross section for the inclusive production of a given S-wave state as

$$\begin{aligned} \frac{\mathrm{d} ^2 \sigma }{\mathrm{d} p_{\mathrm{T}} \, \mathrm{d} y}= & {} m_Q^{-3} \left( \frac{\sqrt{s}}{M}\right) ^{\beta } \sum _i \frac{\mathcal {L}_i(m_Q, M, \xi , y, \sqrt{s}/M)}{m_Q^3} \nonumber \\&\quad \times \; \mathcal {F}_i (m_Q, M, \xi , y, \sqrt{s}/M) , \end{aligned}$$
(10)

where the overall factor \(m_Q^{-3}\) matches the global dimensionality of the observable. The only specification of the functions \(\mathcal {L}_i\) is that they have dimension \([\mathrm {energy}]^3\), formally compensated by the \(m_Q^3\) denominators, while the \(\mathcal {F}_i\) are dimensionless shape functions (defined with \(\int {{\mathcal {F}}} \,\mathrm{d} \xi \, \mathrm{d} y = 1\), as in the Drell–Yan case). The \(\sqrt{s}/M\) power-law factor represents the modification from partonic to observable level, as from Eqs. 3 to 9. The coefficient \(\beta \) is the same as for Drell–Yan production. From the previous discussion, we can obtain a precise evaluation of its value at \(\sqrt{s} \simeq 7\) or 8 TeV as the difference between the average experimental mass scaling exponent \(\alpha _{\mathrm {DY}}\) and the expected bare-cross-section scaling exponent: \(\beta = \alpha _{\mathrm {DY}} - 3 = 0.63 \pm 0.03\).

The expression above, Eq. 10, is built in analogy to the NRQCD factorized expansion. In this limit, \(\mathcal {F}_i\) and \(\mathcal {L}_i\) encode the information on, respectively, the hard scattering process producing a \(Q\overline{Q}\) pair, of mass \(\simeq 2\, m_Q\), and its ensuing long-distance transition to the final bound state \(\mathcal {Q}\), of mass M. In NRQCD the LDMEs \(\mathcal {L}_i\) are independent of laboratory kinematics and only depend on degrees of freedom naturally defined in the \(\mathcal {Q}\) rest frame (M, \(m_Q\), v, quantum numbers, spectroscopic energy levels, etc.), while the kinematic dependence is contained in the SDCs, corresponding to \(m_Q^{-6} \, (\sqrt{s}/M)^{\beta } \, \mathcal {F}_i\), where the index i indicates one specific \(Q\overline{Q} (^{2s+1}l_j^{c}) \rightarrow \mathcal {Q}(^{2S+1}L_J)\) production channel.

In this formulation, a consequence of the factorization hypothesis is that each \(\mathcal {F}_i\) function should be independent of M, being, in particular, the same for all directly produced \(^3S_1\) charmonia and bottomonia. In fact, it is important to note that the mass difference between the \(\mathcal {Q}\) and \(Q\overline{Q}\) states plays no role in the \(p_{\mathrm{T}}/M, {\hat{y}}\) dependence of \(\mathcal {F}_i\), given that, similarly to the physical transitions between states of the same quarkonium family [15], the \(Q\overline{Q} \rightarrow \mathcal {Q}\) transition preserves the ratio between laboratory-vector-momentum and mass (and, therefore, both \(p_{\mathrm{T}}/M\) and \({\hat{y}}\)).

Fig. 2 Mid-rapidity prompt quarkonium cross sections in pp collisions at \(\sqrt{s} = 7\) TeV (left) and 13 TeV (right), as measured by ATLAS (red markers) [16,17,18] and CMS (blue and green markers) [19,20,21] (top panels), and after scaling up all the normalizations to match those of the \(\mathrm{J}/\psi \) cases (middle panels). For each collision energy, the curves represent a single universal empirical function, of shape determined by a simultaneous fit to all data points of \(p_{\mathrm{T}}/M > 2\) and normalizations specific to each quarkonium state. The bottom panels show the pulls between each data point and the fitted function

While these notations have been chosen to accommodate the NRQCD factorization expansion as a limiting case, Eq. 10, with generic \(\mathcal {L}_i\) and \(\mathcal {F}_i\) both redundantly depending on the relevant variables, represents a fully general template of the cross section, with no prejudice on its physical scaling properties.

Very interesting physical indications transpire from the unforeseen simplicity of the trends measured by ATLAS and CMS, as discussed in the next paragraphs.

The first experimental input to our considerations is the set of quarkonium cross sections measured at mid-rapidity (\(|y| \lesssim 2\)) by the ATLAS and CMS experiments, shown in the top panels of Fig. 2 for \(\sqrt{s} = 7\) TeV (left) and 13 TeV (right). All the measurements, from the \(\mathrm{J}/\psi \) to the \(\varUpsilon \)(3S), scale with \(p_{\mathrm{T}}/M\) in a state-independent way, at least for not very low \(p_{\mathrm{T}}\) (\(p_{\mathrm{T}}/M \gtrsim 2\)). This “universality” is well illustrated by the middle panels, which clearly show that, after rescaling the state-specific normalizations to those of the \(\mathrm{J}/\psi \), the cross section shapes become indistinguishable from each other. The goodness of this universal scaling can be better appreciated by looking at the bottom panels, which show (on a linear scale) the pull distributions, i.e. the differences between each data point and the universal fitted function, divided by the measurement uncertainty. No systematic trends are seen in the pull distributions and the observed deviations are fully compatible with statistical fluctuations. The cross section fits consider correlations between the luminosity uncertainties in each data set and the pulls, evaluated to check the consistency with one common shape, are calculated excluding such uncertainties.
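
The pull definition used above can be sketched in a few lines. The shape function below is a hypothetical stand-in, since the empirical function used in the fit is not specified here, and the data points are synthetic. The uncertainties entering the pulls are meant to be the point-to-point ones, excluding the luminosity uncertainties.

```python
def shape(xi, norm=1.0, power=5.0):
    """Hypothetical stand-in for the universal pT/M shape (not the fitted function)."""
    return norm * (1.0 + xi**2) ** (-power / 2.0)

def pulls(points, model):
    """Pull of each (xi, value, uncertainty) point: (value - model(xi)) / uncertainty."""
    return [(value - model(xi)) / unc for xi, value, unc in points]

def chi_square(points, model):
    """Sum of squared pulls."""
    return sum(p * p for p in pulls(points, model))

def synthetic_point(xi, n_sigma, rel_unc=0.05):
    """Synthetic point displaced from the model by n_sigma standard deviations."""
    unc = rel_unc * shape(xi)
    return (xi, shape(xi) + n_sigma * unc, unc)

points = [synthetic_point(3.0, 1.0), synthetic_point(5.0, -2.0), synthetic_point(8.0, 0.0)]
print(pulls(points, shape), chi_square(points, shape))
```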

A slightly broader \(p_{\mathrm{T}}/M\) distribution is observed at the higher energy, as shown in Fig. 3-top. The fact that the distinct P- and S-wave states show a compatible kinematic scaling is discussed in Ref. [22]. Here we focus on the precisely measured cross sections of the closely related \(^3S_1\) states. Their universal \(p_{\mathrm{T}}/M\) scaling, at a given energy, indicates that the same mixture of processes (or one common dominating process) describes the production of all these states.

The second experimental indication for our following discussion is the mass scaling of the cross section from charmonium to bottomonium. By exploiting the \(p_{\mathrm{T}}/M\) scaling we can determine it without relying on model-dependent extrapolations to low \(p_{\mathrm{T}}\): it is sufficient to consider the fitted \(p_{\mathrm{T}}/M\) distributions at an arbitrary \(p_{\mathrm{T}}/M = (p_{\mathrm{T}}/M)^\star \). We consider for each family the meson closest to the ground state, \(\mathrm{J}/\psi \) and \(\varUpsilon \)(1S), at the two energies (Fig. 3-bottom).

To obtain the yield of directly produced \(\mathrm{J}/\psi \) mesons, we subtract from the fitted prompt-\(\mathrm{J}/\psi \) normalization those of the \(\chi _{c1}\), \(\chi _{c2}\) and \(\psi \mathrm{(2S)}\) states (always considered at the same \(p_{\mathrm{T}}/M\) value: as mentioned before, \(p_{\mathrm{T}}/M\) is transferred unchanged from mother to daughter), weighted by the branching fractions [23] of the respective feed-down processes. The result is a \(\mathrm{J}/\psi \) direct-production fraction of 31.9%. For the \(\varUpsilon \)(1S) we assume a direct-production fraction of \((50 \pm 10)\%\). Later in this paper we will need the corresponding fractions for the \(\varUpsilon \)(2S) and \(\varUpsilon \)(3S), which we assume to be 60 and 70%, respectively, also with \(\pm 10\)% uncertainties. These values were estimated from \(\varUpsilon \)(nS) yield ratios measured by ATLAS [17] and CMS [20], complemented by LHCb data on the (large) \(\chi _b\mathrm{(mP)} \rightarrow \varUpsilon \mathrm{(nS)}\) feed-down fractions [24].

Fig. 3 Shape comparison between the 7 and 13 TeV \(p_{\mathrm{T}}/M\)-differential quarkonium cross sections (top) and normalization comparison between the \(\mathrm{J}/\psi \) and \(\varUpsilon \)(1S) at the two energies (bottom)

The resulting mass scaling, measured as

$$\begin{aligned} \frac{\left. \mathrm{d} \sigma / \mathrm{d} p_{\mathrm{T}} (\varUpsilon (\mathrm {1S}))\right. _{\xi = \xi ^\star } }{\left. \mathrm{d} \sigma / \mathrm{d} p_{\mathrm{T}} (\mathrm {J}/\psi )\right. _{\xi = \xi ^\star } } = \left( \frac{m_b}{m_c}\right) ^{-\alpha } , \end{aligned}$$
(11)

does not show a significant energy dependence: \(\alpha = 6.6 \pm 0.1\) and \(6.5 \pm 0.1\) at 7 and 13 TeV, respectively, with \(2\, m_c \simeq M(\eta _c\mathrm{(1S)}) = 2.984\) GeV and \(2\, m_b \simeq M(\eta _b\mathrm{(1S)}) = 9.389\) GeV [23]. Subtracting the \(\beta \) value mentioned above, \(0.63\pm 0.03\), measured in Drell–Yan production at 7 and 8 TeV, we find that the partonic-level differential cross section changes as \(\simeq m_Q^{-6}\) between the charmonium and bottomonium families.
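
The arithmetic behind this statement can be sketched as follows. Eq. 11 is inverted to obtain \(\alpha \) from a cross-section ratio, and \(\beta \) is then subtracted, with the two uncertainties added in quadrature (an assumption of this sketch: they are treated as uncorrelated). The input ratio in the example is illustrative, back-computed from \(\alpha = 6.6\), and is not a measured value.

```python
import math

TWO_M_C = 2.984  # GeV, eta_c(1S) mass as quoted in the text
TWO_M_B = 9.389  # GeV, eta_b(1S) mass as quoted in the text
BETA, BETA_UNC = 0.63, 0.03  # from the Drell-Yan analysis

def alpha_from_ratio(ratio):
    """Invert Eq. 11: ratio = (m_b / m_c)^(-alpha)."""
    return -math.log(ratio) / math.log(TWO_M_B / TWO_M_C)

def partonic_exponent(alpha, alpha_unc, beta=BETA, beta_unc=BETA_UNC):
    """alpha - beta, with uncertainties combined in quadrature (assumed uncorrelated)."""
    return alpha - beta, math.hypot(alpha_unc, beta_unc)

illustrative_ratio = (TWO_M_B / TWO_M_C) ** (-6.6)  # not a measured number
print(alpha_from_ratio(illustrative_ratio))
for label, alpha in (("7 TeV", 6.6), ("13 TeV", 6.5)):
    value, unc = partonic_exponent(alpha, 0.1)
    print(f"{label}: partonic exponent = {value:.2f} +/- {unc:.2f}")
```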

These experimental facts constrain and specify the functions \(\mathcal {L}\) and \(\mathcal {F}\) appearing in Eq. 10. The independence of the \(p_{\mathrm{T}}/M\) scaling on either M or \(m_Q\), at a given energy, indicates that the \(p_{\mathrm{T}}/M\) and \((M,m_Q)\) dependences do not “mix”: \(\mathcal {L} \; \times \; \mathcal {F}\) = \(\mathcal {L}(m_Q, M, \sqrt{s}/M) \; \times \; \mathcal {F} (\xi , y, \sqrt{s}/M )\) (further studies considering forward-rapidity, low-\(p_{\mathrm{T}}\) data will assess the combined \(p_{\mathrm{T}}/M, y\) dependence of \(\mathcal {F}\), here effectively a function of only \(p_{\mathrm{T}}/M\)). In other words, there is experimental evidence that, at mid rapidity and not too low \(p_{\mathrm{T}}\), it is possible to describe quarkonium production “factorizing” short- and long-distance effects, defined as the dependences on, respectively, the laboratory momentum of the detected meson (\(\mathcal {F}\)) and the specific bound-state properties (\(\mathcal {L}\)).

The measured \(m_Q^{-6}\) scaling of the differential cross section at a fixed \(p_{\mathrm{T}}/M\) (and y) further specifies \(\mathcal {L}\). By precisely equating the explicit \(m_Q\) dependence of the expression in Eq. 10 (coming from the overall factor and the denominators of the \(\mathcal {L}\) terms, with \(\mathcal {F}\) now taken independent of \(m_Q\)), this result leaves no margin for a dependence of \(\mathcal {L}\) on \(m_Q\), if not counterbalanced by a dependence on M and/or \(\sqrt{s}\): \(\mathcal {L} = \mathcal {L}(m_Q / M, m_Q / \sqrt{s})\). However, given that no significant difference in mass scaling is observed at 7 and 13 TeV, an \(m_Q / \sqrt{s}\) dependence would be experimentally indistinguishable from the analogous global energy scaling of all quarkonium cross sections already accounted for by the \(\beta \) power law in Eq. 10. It is therefore always possible to define the long-distance factors as \(\sqrt{s}\)-independent, thereby reducing them to functions of the kind \(\mathcal {L} = \mathcal {L}(m_Q / M)\), as is illustrated by the discussion starting in the next paragraph.

It should be clear from these considerations that such \(\mathcal {L}\) functions do not coincide with the LDMEs of NRQCD: instead of being defined by setting an energy scale within the theory, they are built on the basis of dimensional-analysis and data patterns, becoming universal, experimentally definable quantities. Apart from differences in operative definitions, we note that the remarkably simple picture of S-wave quarkonium production emerging from the data mirrors well the primary concepts of NRQCD factorization and of universality of the long-distance bound-state formation effects. A future verification of the corresponding experimental patterns in different collision systems can further probe these fundamental concepts.

By adopting now the guiding idea of factorization of short- and long-distance effects, as supported by data, we can analyse in more detail the mass scaling, using all S-wave states.

Fig. 4 Mid-rapidity direct production cross sections of \(^3S_1\) quarkonia, at \(\sqrt{s} = 7\) TeV and \(p_{\mathrm{T}}/M = 7\). In the top panel, the (single) black line is the result of a fit to the five cross sections, after extrapolating them (red and blue lines) to \(\sigma (2\,m_Q)\). The bottom panel shows the common trend of the cross sections, normalized to the extrapolated values, for the five quarkonium states

Figure 4-top shows the values of the direct production cross sections of the S-wave quarkonia, for one specific and arbitrarily chosen point, \(p_{\mathrm{T}}/M = 7\), as derived from the \({{\mathcal {F}}}(p_{\mathrm{T}}/M)\) functions shown in the top panel of Fig. 2. Given the universality of the function, a different \(p_{\mathrm{T}}/M\) reference point only leads to an overall vertical rescaling of the points in Fig. 4, not affecting the mass dependence itself. For the \(\mathrm{J}/\psi \) and \(\varUpsilon \)(nS), the prompt cross sections were scaled down by the direct-production fractions mentioned earlier. The figure only shows the 7 TeV pattern; completely analogous results are found in the 13 TeV data (apart from a significantly different global normalization scale), as discussed hereafter.

From the \(\mathrm{J}/\psi \) to the \(\varUpsilon \)(3S) we observe, besides the global drop by three orders of magnitude, a “fine structure”: within each quarkonium family, the cross section decreases faster than from one family to the other. In the factorization perspective, the global drop represents the short-distance scaling, reflecting the change from \(M(Q\overline{Q}) \simeq 2\, m_c\) to \(\simeq \) \(2\, m_b\). The steeper change within each family reflects the M-dependent \(Q\overline{Q} \rightarrow \mathcal {Q}\) transition probability \(\mathcal {L}(m_Q, M)\). The factorized interpretation is represented by the fit curves shown in the figure. First (red and blue curves) the charmonium and bottomonium cross sections are extrapolated, respectively, to \(2\, m_c\) and \(2\, m_b\) (as defined above). From the resulting \(\sigma (2m_b) \,/\, \sigma (2m_c)\) ratio, the short-distance charmonium-to-bottomonium mass scaling (black curve) is determined as \(\propto m_Q^{-\alpha _{Q\overline{Q}}}\), with \(\alpha _{Q\overline{Q}} = 6.63 \pm 0.08\), practically identical to the previously obtained result, considering only the \(\mathrm{J}/\psi \) and the \(\varUpsilon \)(1S), leading to the \(m_Q^{-6}\) partonic-level scaling discussed above.

Here we focus on the long-distance mass dependence. The bottom panel of Fig. 4 shows the dependence of the \(\sigma (\mathcal {Q})/\sigma (2\,m_Q)\) ratio on \(M/(2\,m_Q)\): one common power-law, \(\sigma (\mathcal {Q})/\sigma (2\,m_Q) = [M/(2\,m_Q)]^{-(9.7 \pm 0.3)}\), describes very well both the \(\psi \) and the \(\varUpsilon \) data points.
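
The factorized picture can be illustrated numerically with the two power laws quoted above: the short-distance ratio \((m_b/m_c)^{-\alpha _{Q\overline{Q}}}\) and the long-distance factor \([M/(2\,m_Q)]^{-9.7}\). In this sketch only central values are used. The quarkonium masses are approximate Particle Data Group values that we supply for illustration; they are not quoted in the text.

```python
TWO_M_C = 2.984      # GeV, eta_c(1S) mass as quoted in the text
TWO_M_B = 9.389      # GeV, eta_b(1S) mass as quoted in the text
ALPHA_QQ = 6.63      # short-distance exponent (central value from the text)
LD_EXPONENT = 9.7    # long-distance exponent (central value from the text)

# Approximate masses in GeV (assumption of this sketch, not given in the text)
STATES = {
    "J/psi": (3.0969, TWO_M_C),
    "psi(2S)": (3.6861, TWO_M_C),
    "Upsilon(1S)": (9.4603, TWO_M_B),
    "Upsilon(2S)": (10.0233, TWO_M_B),
    "Upsilon(3S)": (10.3552, TWO_M_B),
}

def short_distance_ratio():
    """sigma(2 m_b) / sigma(2 m_c) from the m_Q^(-alpha) scaling."""
    return (TWO_M_B / TWO_M_C) ** (-ALPHA_QQ)

def long_distance_factor(mass, two_m_q):
    """sigma(Q) / sigma(2 m_Q) from the common power law in M / (2 m_Q)."""
    return (mass / two_m_q) ** (-LD_EXPONENT)

FACTORS = {name: long_distance_factor(m, ref) for name, (m, ref) in STATES.items()}

print(f"short-distance ratio: {short_distance_ratio():.2e}")
for name, factor in FACTORS.items():
    print(f"{name}: long-distance factor = {factor:.3f}")
```

The short-distance ratio comes out between \(10^{-4}\) and \(10^{-3}\), in line with the global drop by about three orders of magnitude mentioned above, while the long-distance factor decreases from state to state within each family.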

Interestingly, as shown in Fig. 5, this monotonic dependence can be seen as a tight correlation of the long-distance factors to the quarkonium binding energy, defined as \(2M(D^0) - M(\psi \mathrm{(nS)})\) or \(2M(B^0) - M(\varUpsilon \mathrm{(nS)})\) and calculated with mass values from Ref. [23]. As a function of this variable we see another “universal” trend: the data points are well described by the parametrization \(\sigma (\mathcal {Q})/\sigma (2\,m_Q) \propto E_{\mathrm {binding}}^{\delta }\), common to charmonium and bottomonium and identical at 7 TeV (\(\delta = 0.63 \pm 0.02\)) and 13 TeV (\(\delta = 0.63 \pm 0.04\)). The central values of the \(\delta \) and \(\beta \) parameters are identical by coincidence.
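
The binding-energy variable and the corresponding scaling can be sketched as follows. The meson masses are approximate Particle Data Group values supplied here as an assumption (they are not quoted in the text), and only the central value \(\delta = 0.63\) is used. The last lines compare, for the \(\psi \mathrm{(2S)}\) relative to the \(\mathrm{J}/\psi \), the binding-energy law with the mass power law of exponent \(-9.7\).

```python
# Approximate masses in GeV (assumption of this sketch, not given in the text)
M_D0, M_B0 = 1.8648, 5.2796
MASSES = {
    "J/psi": (3.0969, M_D0),
    "psi(2S)": (3.6861, M_D0),
    "Upsilon(1S)": (9.4603, M_B0),
    "Upsilon(2S)": (10.0233, M_B0),
    "Upsilon(3S)": (10.3552, M_B0),
}
DELTA = 0.63  # central value quoted in the text

def binding_energy(mass, open_flavour_mass):
    """E_binding = 2 M(D0) - M(psi(nS)) or 2 M(B0) - M(Upsilon(nS)), in GeV."""
    return 2.0 * open_flavour_mass - mass

def relative_factor(state, reference, delta=DELTA):
    """Ratio of long-distance factors of two states from the E_binding^delta law."""
    e_state = binding_energy(*MASSES[state])
    e_ref = binding_energy(*MASSES[reference])
    return (e_state / e_ref) ** delta

for name, (mass, threshold_mass) in MASSES.items():
    print(f"{name}: E_binding = {binding_energy(mass, threshold_mass):.4f} GeV")

from_binding = relative_factor("psi(2S)", "J/psi")
from_mass = (MASSES["psi(2S)"][0] / MASSES["J/psi"][0]) ** (-9.7)
print(from_binding, from_mass)
```

With these inputs, the two parametrizations give very similar \(\psi \mathrm{(2S)}\)-to-\(\mathrm{J}/\psi \) factors, as expected if both describe the same data points.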

Fig. 5 Correlation between the long-distance factor \(\sigma (\mathcal {Q})/\sigma (2\,m_Q)\) and the quarkonium binding energy, at 7 and 13 TeV

This result gives further support to the idea that the dependence of the cross sections on the bound-state masses is a “factorizable” long-distance effect, independent of the laboratory momentum.

We have until now considered only S-wave states. Given the absence of detailed \(\chi _b\) cross section data at mid rapidity, we can put to test the assumption that the “universal” \(\sigma (\mathcal {Q})/\sigma (2\,m_Q) \propto E_{\mathrm {binding}}^{\delta }\) dependence of the long-distance factors, with \(\delta \simeq 0.63\), can be extended to the P-wave states, even if with a different normalization constant multiplying \(E_{\mathrm {binding}}^{\delta }\) to account for a dependence on the orbital angular momentum. In this way, \(\chi _c\) data come to constrain the \(\chi _b\)(nP) cross sections, through the relation \(\sigma (\chi _{bJ}\mathrm{(nP)}) = [ E_{\mathrm{binding}}(\chi _{bJ}\mathrm{(nP)}) / E_{\mathrm{binding}}(\chi _{cJ}) ]^{\delta } \sigma (\chi _{cJ})\), corresponding to the assumption that there is no dependence on J, n, or quark flavour.

Using branching-ratio measurements from Ref. [23], the feed-down structure of quarkonium production can be fully predicted. The result is presented in Table 1. It should be kept in mind that these results correspond to the \(p_{\mathrm{T}}/M > 2\) region. The predicted feed-down fractions of \(\varUpsilon \)(nS) production from \(\chi _b\)(nP) states are in reasonable agreement with forward-rapidity LHCb measurements merging the \(J=1\) and \(J=2\) signals [24], considered for \(p_{\mathrm{T}}/M > 2\).

Table 1 Feed-down fractions between the several quarkonia, for \(p_{\mathrm{T}}/M > 2\). The first uncertainty reflects the precision of our fit, while the second reflects the branching fractions

4 Summary and discussion

At the current level of experimental precision, mid-rapidity LHC proton-proton data for inclusive charmonium and bottomonium production are well described by a simple parametrization reflecting a universal (i.e. state-independent) scaling with two variables: the shapes of the \(p_{\mathrm{T}}\) distributions of all states become one common shape as a function of \(p_{\mathrm{T}}/M\) and the normalization of this shape scales in a simple monotonic way with \(E_{\mathrm {binding}}\), at least for the S-wave states. While the cross section shape as a function of \(p_{\mathrm{T}}/M\) depends on \(\sqrt{s}\), the normalization scaling with \(E_{\mathrm {binding}}\) is found to be identical at 7 and 13 TeV.

Having a simple empirical parametrization faithfully describing the proton-proton measurements is certainly very useful for model-independent studies of quarkonium production, especially when considering more complex collision systems such as proton-nucleus and nucleus-nucleus collisions. It is also very convenient for the tuning of Monte Carlo simulations. More importantly, the analysis reported in this paper, exclusively based on non-trivial (albeit potentially misleadingly simple) dimensional analysis arguments, reveals significant experimental evidence supporting the idea that quarkonium production can be understood as being factorized between short- and long-distance phases. This data-driven result mirrors very well the general concept of NRQCD factorization into process-dependent SDCs and universal LDMEs. More generally, seeing evidence of factorization in the LHC data is an important step towards establishing that the \(Q\overline{Q}\) pairs are produced uncorrelated from the rest of the event, allowing us to experimentally probe how they evolve and bind into the observable spectra of quarkonium states.

The experimental verification of the factorization ansatz can be extended by including in the analysis the available forward-rapidity LHCb measurements. Such a study, currently ongoing, has a higher order of complexity: since the \(p_{\mathrm{T}}/M\) distributions show a significant y dependence, the determination of scaling patterns is no longer an effectively one-dimensional problem as in the case where only mid-rapidity data are considered. Further tests, which may become possible in the future, include the study of the production of quarkonia associated with specific final state particles and the search for simplicity patterns in different kinds of collisions.

More data on the production of P-wave quarkonia are needed, especially in the bottomonium family: the \(\chi _b\) absolute cross sections have not yet been measured at mid rapidity, and the forward rapidity results, reported relative to \(\varUpsilon \)(nS) production, do not distinguish between the \(J = 0,1\), and 2 states. It is reassuring, however, to see that those forward rapidity measurements are in reasonable agreement with the pattern of \(\chi _{bJ}\mathrm{(nP)} \rightarrow \varUpsilon \) feed-down fractions that we obtained assuming that the simple \(E_{\mathrm {binding}}\) scaling pattern can be extended to the \(\chi \) states, an assumption that follows very naturally from the work presented in this paper. The detailed feed-down structure of the charmonium and bottomonium families we have determined in this work needs to be tested by new measurements, filling gaps in the experimental picture. It cannot be excluded that future measurements of direct \(\chi _{cJ}\) and \(\chi _{bJ}\) production will show that the scaling with binding energy is not the same for the P- and S-wave states. Such a result would still be very interesting. For example, the difference in scaling trends, when known, could be used to see if new resonances, such as the X(3872), align better with the S- or the P-wave mass trends, thereby contributing to the understanding of their nature.

A complete account of all the feed-down fractions is particularly important in quantitative analyses of the quarkonium production and suppression patterns observed in nucleus-nucleus collisions. It has long been argued, in particular, that the observed suppression of the \(\mathrm{J}/\psi \) signal, both from pp to heavy-ion collisions and from peripheral to central nuclear collisions, might be dominantly (or fully) caused by the suppression of the \(\psi \mathrm{(2S)}\) and \(\chi _c\) excited states, more loosely bound and, hence, easier to dissolve in the hot medium created in those interactions. Such hypotheses can only be reliably and quantitatively tested if the hierarchy of feed-down relations between states is taken into account. This computation is now feasible, assuming the global \(E_{\mathrm {binding}}\) scaling that leads to the values collected in Table 1 for pp collisions, complemented by a model for its modification in nucleus-nucleus collisions. Such studies will finally address, and possibly provide evidence for, the concept of sequential quarkonium suppression, an ideal signature of quark-gluon plasma formation [25] that applies to the directly-produced states, after the subtraction of the feed-down contributions.