1 Introduction

The aim of this paper is to study the evolution of private equity profitability across time and regions and, more specifically, whether shocks to the series have transitory or permanent effects, looking at both aggregated data and data disaggregated by region. We focus on four specific areas: US, Europe, Asia/Pacific and Rest of the World, along with the “Total” data. For this purpose, we use methodologies based on the concept of fractional integration, employing up-to-date techniques in time series analysis.

Assessment of the type of shock is normally performed by unit root tests (Dickey and Fuller 1979; Phillips and Perron 1988 and others). In the case that the process has no unit roots, it is supposed to be stationary and hence exhibiting reversion to the mean (the lagged level, pre-shock, will drive the reversion to the mean). However, should it have unit roots, the process does not revert to the mean, the shock having a permanent effect on the series. In this article, we depart from these classical methods by using fractional integration, which is more flexible and general in the sense that it allows fractional degrees of differentiation and mean reversion takes place as long as the differencing parameter is significantly smaller than 1. As later explained in the manuscript, this will allow us to consider flexible approaches, including, for example, nonstationary though mean reverting processes if the differencing parameter is in the range [0.5, 1). Thus, the main objective of the paper is to determine if shocks in profitability of private equity have transitory or permanent effects, and we determine this by estimating the degree of differentiation of the series from a fractional viewpoint.

The structure of the paper is as follows: Sect. 2 presents a historical context of the profitability of private equity. Section 3 deals with the methodology employed in the paper. Section 4 displays the dataset and the main empirical results, while Sect. 5 concludes the paper.

2 Literature review

Private Equity (PE) has been mainly studied from these four perspectives: a) profitability (i.e. performance against a benchmark), b) key factors to select companies to invest in, c) valuation of PE funds, and d) interaction with limited partners (LP).

Research on profitability has yielded contradictory results, ranging from -6% (Phalippou and Gottschalg 2009) to +32% (Cochrane 2005). Regarding the benchmarks, Steger (2017) argues for an index that excludes large-capitalization companies, such as the Russell 2000 Index, given that a substantial part of private equity funds invest in small or mid-sized companies. Furthermore, several measures have been employed to track returns: IRR (Internal Rate of Return), TVPI (Total Value to Paid-In, or “Money Multiple”) and PME (Public Market Equivalent). TVPI is defined by Phalippou and Gottschalg (2009) as the sum of all cash distributions plus the latest Net Asset Value (NAV) (which serves as a proxy for future cash flows), divided by the sum of all drawdowns. The PME approach, documented in Kaplan and Schoar (2005), is calculated as the sum of all discounted cash outflows of the fund (distributions to investors) over the sum of the discounted cash inflows (capital paid in), where the total return of the S&P 500 Index is used as the discount rate.
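The two multiples just defined can be illustrated with a minimal sketch. The cash flows, index levels and final NAV below are hypothetical values chosen only for illustration, and the function names `tvpi` and `ks_pme` are not taken from any of the cited papers. The sketch assumes that the final NAV is treated as a distribution at the last date and that each cash flow is compounded forward with the index, which yields the same ratio as discounting every flow back to the first date.

```python
from typing import Sequence


def tvpi(distributions: Sequence[float], drawdowns: Sequence[float],
         final_nav: float) -> float:
    """Total Value to Paid-In: (distributions + latest NAV) / drawdowns."""
    return (sum(distributions) + final_nav) / sum(drawdowns)


def ks_pme(distributions: Sequence[float], drawdowns: Sequence[float],
           index_levels: Sequence[float], final_nav: float = 0.0) -> float:
    """PME in the spirit of Kaplan and Schoar (2005).

    Each cash flow dated t is compounded to the final date with the
    total-return index (factor index_T / index_t). The ratio of compounded
    distributions to compounded drawdowns equals the ratio of discounted
    flows, because the common factor cancels.
    """
    last = index_levels[-1]
    fv_dist = sum(c * last / i for c, i in zip(distributions, index_levels))
    fv_draw = sum(c * last / i for c, i in zip(drawdowns, index_levels))
    # Assumption: the final NAV is valued as a distribution at the last date.
    return (fv_dist + final_nav) / fv_draw


# Hypothetical fund observed at four dates (illustrative numbers only).
drawdowns = [100.0, 50.0, 0.0, 0.0]
distributions = [0.0, 0.0, 80.0, 120.0]
index_levels = [100.0, 110.0, 121.0, 133.1]  # public index, 10% per period
final_nav = 30.0

print(round(tvpi(distributions, drawdowns, final_nav), 4))
print(round(ks_pme(distributions, drawdowns, index_levels, final_nav), 4))
```

A PME above 1 in this sketch would indicate that the hypothetical fund returned more than the same cash flows invested in the public index.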

Gompers and Lerner (2000) and Gompers et al. (2005) studied the organizational structure and performance of various venture capital funds. They found a strong positive relationship between the degree of specialization by individual venture capitalists at a firm and the firm’s success. They also concluded that experienced funds outperformed inexperienced funds, and that small and inexperienced funds are the main drivers of low performance in private equity funds.

Phalippou and Gottschalg (2009) describe a typical fund as having a life of ten years, which can be extended to thirteen, and as reporting quarterly a Net Asset Value that reflects the value of on-going investments, which are basically non-tradable. They also note that two different assumptions have been made concerning the treatment of the final NAVs. The first and most frequent one treats the final NAV as a cash inflow of the same amount at the end of the sample period. That is, NAVs are assumed to be an unbiased assessment of the market value of a fund (e.g., Kaplan and Schoar 2005, and industry benchmarks). The second one only computes cash flows (e.g., Ljungqvist and Richardson 2003), which is applicable to “mature” funds and to follow up “on-going” funds (median IRR takes 8 years to turn positive).

As stated by Brown et al. (2016), the adoption of SFAS 157 has not prevented PE firms from manipulating NAVs in two directions: some inflate NAVs during times when fundraising activity is likely to occur, whereas top-performing funds under-report returns, which is a way to insure against future bad luck that could make them appear as though they are NAV manipulators.

As for the behavior over time of the profits reaped by PE, research has focused on the phenomenon of persistence rather than on presenting static models to explain their evolution over time.

Kaplan and Schoar (2005) find persistence not only between two consecutive funds, but also between the current fund and the second previous fund (unlike mutual funds, where persistence, when it exists, is driven by underperformance rather than overperformance). Moreover, the results suggest a statistically and economically strong persistence in private equity, particularly for Venture Capital funds. However, the persistence seems to have been declining according to Brown et al. (2016) and Korteweg and Sorensen (2017).

Ang et al. (2018) state that the structure and nature of the data are limited, which makes it particularly difficult to evaluate their time series properties. To assess PE returns, they construct an index for separate classes, which shows that their cycles are not highly correlated. This suggests that a diversified strategy across sub-asset classes of PE may be beneficial. Moreover, the authors’ index exhibits negligible serial dependence, in contrast to industry indices. This contrast is consistent with industry indices being smoothed by a conservative appraisal process or by a delayed and partial adjustment to market prices, which often arises in illiquid asset markets (see, e.g., Geltner 1991, and Ross and Zisler 1991).

More recently, Harris et al. (2020), using ex post or most recent fund performance (as of June 2019), confirm the findings on persistence overall as well as for pre-2001 and post-2000 funds.

3 Methodology

As mentioned earlier, the main objective of this paper is to determine whether shocks in the series have permanent or transitory effects. For this purpose, the most standard approach is the use of unit root procedures, widely employed to determine whether the series of interest is stationary I(0) (and thus with shocks being temporary) or nonstationary I(1) (in which case shocks will be permanent). Within this methodology, the ADF test (Dickey and Fuller 1979) is the most widely used procedure, though other more robust methods were later developed, including Phillips and Perron (1988), Kwiatkowski et al. (1992), Elliott et al. (1996) and Ng and Perron (2001), among others. Nevertheless, all these methods have the drawback that they consider only two potential scenarios, I(0) and I(1), and do not take into account fractional degrees of differentiation. This is important, since many authors have shown that the above-mentioned procedures have extremely low power if the true data generating process is fractionally integrated. Classical references here are Diebold and Rudebusch (1991), Hassler and Wolters (1994) and Lee and Schmidt (1996). Thus, in this paper we use an I(d) modelling framework of the following form:

$$(1-B)^{d} x_t = u_t, \quad t = 1, 2, \ldots,$$
(1)

where B is the backshift operator (i.e., Bxt = xt−1) and where ut is integrated of order 0, or I(0), defined as a covariance (second-order) stationary process for which the infinite sum of the autocovariances is finite. Thus, ut may be a white noise process, but it might also display a weakly autocorrelated (e.g., ARMA) structure.
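To make the operator in (1) concrete, the following minimal sketch expands (1 − B)^d through its binomial weights and applies it to a series. It assumes that pre-sample values are zero (truncation at the start of the sample), which is a common convention but is not stated in the text. The function names are illustrative. The same recursion with −d gives the impulse responses of the inverse filter: they decay to zero when d < 1 and stay at 1 when d = 1, which is the sense in which shocks are transitory or permanent.

```python
from typing import List, Sequence


def frac_diff_weights(d: float, n: int) -> List[float]:
    """First n coefficients of (1 - B)^d: pi_0 = 1, pi_k = pi_{k-1} (k-1-d)/k."""
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(x: Sequence[float], d: float) -> List[float]:
    """Apply (1 - B)^d to x, assuming x_t = 0 before the sample starts."""
    w = frac_diff_weights(d, len(x))
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]


def impulse_responses(d: float, n: int) -> List[float]:
    """Coefficients of (1 - B)^(-d): effect of a unit shock after k periods."""
    return frac_diff_weights(-d, n)


# d = 1 reproduces ordinary first differences (first value kept as is).
print(frac_diff([1.0, 3.0, 6.0, 10.0], 1.0))

# For a fractional d the shock dies out slowly; for d = 1 it never does.
print([round(v, 3) for v in impulse_responses(0.4, 6)])
print(impulse_responses(1.0, 6))
```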

The estimation of d is conducted via the Whittle function in the frequency domain and is implemented through a test statistic derived in Robinson (1994), which is shown there to be the most efficient method in the Pitman (1936) sense against local departures from the null. We use a simple version of his method that is based on the following model,

$$y_t = \beta^{T} z_t + x_t; \quad (1-B)^{d} x_t = u_t, \quad t = 1, 2, \ldots,$$
(2)

where zt is a vector of deterministic terms that may include a constant and a linear trend among other terms, and ut is supposed to be I(0). Based on this set-up, Robinson (1994) proposed testing the null hypothesis:

$$H_o: d = d_o,$$
(3)

for any real value do, thus including values in the stationary range (do < 0.5) as well as in the nonstationary one (do ≥ 0.5). In addition, another advantage of this approach is that its limit distribution is standard normal, and this holds independently of the regressors used in zt, the values of do, and the specific structure of the I(0) error term ut.
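Since the test can be run over a grid of do values, a confidence interval for d is formed by the values that are not rejected. The sketch below shows one way such an interval might be read in terms of the ranges used in this paper (d = 0, 0 < d < 0.5, 0.5 ≤ d < 1, d ≥ 1). The function `classify_d`, its labels and the example intervals are hypothetical and are not taken from the tables.

```python
def classify_d(lower: float, upper: float) -> str:
    """Interpret a confidence interval [lower, upper] for the parameter d."""
    if upper < 0:
        return "anti-persistent (d < 0)"
    if lower <= 0:
        # Zero lies inside the interval: short memory cannot be rejected.
        return "I(0) not rejected"
    if upper < 0.5:
        return "stationary long memory"
    if lower >= 0.5 and upper < 1:
        return "nonstationary, mean reverting"
    if upper < 1:
        # Interval straddles 0.5 but excludes 1: shocks are still transitory.
        return "long memory, mean reverting, stationarity unclear"
    return "mean reversion not supported"


# Hypothetical intervals, for illustration only.
for interval in [(-0.20, 0.10), (0.10, 0.40), (0.30, 0.70), (0.80, 1.20)]:
    print(interval, "->", classify_d(*interval))
```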

Employing alternative parametric methods (e.g., Sowell 1992) or even semiparametric ones (Shimotsu and Phillips 2005, 2006), the results were qualitatively very similar to those reported in this paper.

4 Data and empirical results

Quarterly data (available up to Q3-21) were retrieved from Cambridge Associates LLC (CA), hosted in the Eikon-Reuters database, selecting first the whole world and all kinds of assets, and second a breakdown by geographical area into four regions (United States, Europe, Asia/Pacific and Rest of World), in addition to All the World. Together they produce a table with 744 records, of which 713 were retained after missing values were excluded.

The chosen metric was “Pooled IRR”, instead of relying on other IRRs (average or weighted) or TVPI. Since we work with the entire database from CA, the start periods for the analysis match those of the database, which vary depending on the geographical area and class of investment.

Table 1 gathers the starting dates and number of observations for each region along with maximum and minimum IRR values. Descriptive statistics are reported in Table 2. We see that the median quarterly IRR for the overall PE industry was 3.15% during the period spanning from 1981Q2 to 2021Q3 (3.22% in the United States, 3.78% in Europe, 1.79% in Asia/Pacific). The time series plots and their corresponding histograms are displayed in the Appendix. We observe that skewness to the right (positive index) is present in almost all the areas (including the total world aggregate), with the exception of the Rest of World, whereas the four regions (and also total world) show a leptokurtic distribution (index > 3, indicating that the values are largely concentrated around the mean). Shapiro–Wilk tests reject the hypothesis that the pooled IRR stems from a normal distribution (histograms overlaid with the normal distribution are shown in Appendix: Histograms). As for randomness, runs tests indicate that only the returns from Europe are consistent with a random process.
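The shape indices and the randomness check mentioned above can be sketched as follows. This is only an illustration under stated assumptions: moments are the population (biased) versions, kurtosis is reported on the scale where the normal distribution equals 3, and the runs test is the Wald-Wolfowitz version that cuts the series at its median with a normal approximation, which may differ from the variant actually used for Table 2. The Shapiro–Wilk test is not reproduced here, and the sample series are hypothetical.

```python
import math
from statistics import mean, median
from typing import Sequence, Tuple


def _central_moment(x: Sequence[float], order: int) -> float:
    m = mean(x)
    return sum((v - m) ** order for v in x) / len(x)


def skewness(x: Sequence[float]) -> float:
    """Positive values indicate a longer right tail."""
    return _central_moment(x, 3) / _central_moment(x, 2) ** 1.5


def kurtosis(x: Sequence[float]) -> float:
    """Pearson kurtosis; values above 3 indicate a leptokurtic shape."""
    return _central_moment(x, 4) / _central_moment(x, 2) ** 2


def runs_test(x: Sequence[float]) -> Tuple[float, float]:
    """Runs test about the median: returns (z statistic, two-sided p-value)."""
    med = median(x)
    above = [v > med for v in x if v != med]  # observations equal to the median are dropped
    n1 = sum(above)
    n2 = len(above) - n1
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    expected = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    variance = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical series: a strictly alternating one has too many runs to be random.
alternating = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
print(runs_test(alternating))
print(skewness([1.0, 2.0, 3.0, 4.0, 5.0]), kurtosis([1.0, 2.0, 3.0, 4.0, 5.0]))
```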

Table 1 Starting dates and maximum and minimum IRRs
Table 2 Descriptive data by Area-Type of Asset

In the empirical application, we consider that xt in (1) can be the errors in a regression model incorporating an intercept and a linear time trend,

$$y_t = \beta_0 + \beta_1 t + x_t; \quad t = 1, 2, \ldots,$$
(4)

where β0 and β1 denote the unknown coefficients of these deterministic terms. In other words, the estimated model is:

$$y_t = \beta_0 + \beta_1 t + x_t; \quad (1-B)^{d} x_t = u_t, \quad t = 1, 2, \ldots,$$
(5)

and we report the estimates of the differencing parameter d under three different scenarios: i) first, we consider the case with no deterministic components, i.e., assuming that β0 and β1 are both set equal to 0 a priori in Eq. (5); ii) then, we only include a constant, so β1 = 0; and iii) finally, both coefficients, β0 and β1, are freely estimated from the data along with d. In addition, we make different assumptions with respect to the error term ut in (5). Thus, in Table 3, we suppose ut is a white noise process; in Table 4, ut is allowed to be autocorrelated; however, instead of imposing a given parametric model, we use the exponential spectral approach of Bloomfield (1973), which is non-parametric in the sense that no explicit functional form is specified for ut; the model is defined only through its spectral density function, which is very similar (in logs) to the one produced by AR structures. Finally, in Table 5, and based on the quarterly structure of the data, a seasonal AR(1) process is adopted.
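For reference, the two autocorrelated specifications for ut can be written out explicitly. The notation below (τ_r, m, σ², φ, ε_t) is introduced here for illustration and does not appear in the original text, and the number of terms m is not stated in this section. The exponential spectral model of Bloomfield (1973) is usually written through the spectral density

$$f(\lambda; \tau) = \frac{\sigma^2}{2\pi} \exp\left( 2 \sum_{r=1}^{m} \tau_r \cos(\lambda r) \right), \quad -\pi < \lambda \le \pi,$$

so that the log-spectrum is a short cosine series in λ, which is what makes it close to the log-spectrum of AR processes. With quarterly data, a seasonal AR(1) process would take the form

$$u_t = \phi\, u_{t-4} + \varepsilon_t, \quad t = 1, 2, \ldots,$$

where ε_t is white noise and stationarity requires |φ| < 1.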

Table 3 Empirical results based on the assumption of white noise errors
Table 4 Empirical results based on the assumption of autocorrelated (Bloomfield) errors
Table 5 Empirical results based on the assumption of seasonally autocorrelated errors

Starting with the results based on white noise errors, in Table 3, the first thing we observe is that the time trend is not required in any single case, and the intercept coefficient is significant only for Europe. More importantly, focusing on the degree of integration, we observe that the values of d range from -0.08 in Europe to 0.43 in the USA. The null hypothesis of short memory or I(0) behavior cannot be rejected for Europe, although it is rejected in the remaining cases in favor of long memory (d > 0) or fractional integration, this value being 0.26 for Asia–Pacific, 0.35 for Rest of the World, and 0.41 for Total, the latter being clearly influenced by the large value obtained for the USA. It should be noted that only for Europe and Asia/Pacific are the estimates of d clearly within the stationary region, since the upper bounds of their confidence intervals are still below 0.5. For the remaining three series (United States, Rest of the World and Total), the confidence bands include values both below and above 0.5.

If we allow for autocorrelation, first using the exponential spectral model of Bloomfield (1973), (Table 4) we notice first that the time trend coefficient is now statistically significant for Europe and Asia–Pacific, in the former case with a negative coefficient and in the latter with a positive one (see lower part of the table). With respect to the order of integration, the value is negative for Europe and Asia–Pacific, where the I(0) hypothesis cannot be rejected along with the Rest of the World (d = 0.03). However, for Total and the USA, the coefficient is significantly positive supporting once more the hypothesis of long memory (the estimated value of d is equal to 0.28 for Total and 0.30 for the USA). Note here that for United States and Total, the confidence intervals are very wide including values of d outside the stationary region (d ≥ 0.5). Finally, if seasonal autoregressions are permitted, in Table 5, the results are very similar to those based on white noise errors (Table 3) finding no evidence of time trends; I(0) behavior for the case of Europe and long memory (d > 0) in all the other cases, especially for the US data.

As a robustness check, we also use two widespread semiparametric estimation methods, the log-periodogram estimator (Geweke and Porter-Hudak 1983) and the local Whittle estimation approach of Künsch (1987) (Table 6). In both cases, a bandwidth parameter between 0 and 1 specifying the number of Fourier frequencies must be set, for which we follow Weijie et al. (2021), who propose the interval (0.58, 0.67) for the GPH estimator for a sequence length of 100, and (0.59, 0.68) when the length is 300. The results shown in Table 6 are consistent with those reported across Tables 3, 4 and 5, with evidence of long memory being found in all cases except for Europe. Performing a parametric approach based on Haslett and Raftery (1989), the results are once more consistent with the previous ones, and long memory is found in all cases except for Europe (see Table 7).
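The log-periodogram regression can be sketched in a few lines. The version below is a simplified illustration, not the code behind Table 6: it assumes that the bandwidth parameter acts as an exponent, so that the number of Fourier frequencies is the integer part of n raised to that power (0.65 here), it uses no trimming or tapering, and the simulated series are hypothetical. The estimate of d is the least-squares slope of the log-periodogram on −log(4 sin²(λ/2)).

```python
import cmath
import math
import random
from typing import Sequence


def gph_estimate(x: Sequence[float], power: float = 0.65) -> float:
    """Log-periodogram (GPH) estimate of d using m = int(n ** power) frequencies."""
    n = len(x)
    m = int(n ** power)
    xbar = sum(x) / n
    ys, zs = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n  # j-th Fourier frequency
        dft = sum((x[t] - xbar) * cmath.exp(-1j * lam * t) for t in range(n))
        periodogram = abs(dft) ** 2 / (2.0 * math.pi * n)
        ys.append(math.log(periodogram))
        zs.append(-math.log(4.0 * math.sin(lam / 2.0) ** 2))
    zbar, ybar = sum(zs) / m, sum(ys) / m
    # Ordinary least squares slope of ys on zs is the estimate of d.
    num = sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
    den = sum((z - zbar) ** 2 for z in zs)
    return num / den


# Hypothetical data: white noise (d = 0) and its cumulative sum (d = 1).
random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(512)]
random_walk = []
level = 0.0
for shock in white_noise:
    level += shock
    random_walk.append(level)

d_noise = gph_estimate(white_noise)
d_walk = gph_estimate(random_walk)
print(round(d_noise, 2), round(d_walk, 2))
```

With only a few dozen frequencies the estimate is noisy, which is one reason the choice of bandwidth matters for the robustness results.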

Table 6 Robustness tests of parameter “d” (GPH and local Whittle, for bandwidth = 0.65)
Table 7 Estimation of d based on optimal ARFIMA case

Impulse responses for each region cannot be properly calculated, since there is no explicit model in the case of autocorrelated errors (or when the estimates are obtained semiparametrically). Nevertheless, as an approximation, we have computed the half-life of shocks under the assumption of an AR(1) structure. Results are displayed in Table 8. They are consistent with our previous results in that half-lives are smaller in the geographical areas with a lower degree of persistence (0.17 quarters in Europe versus 0.99 quarters elsewhere).
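Under an AR(1) approximation, the half-life has a closed form. The sketch below assumes the usual definition, the number of periods after which the effect of a unit shock has fallen to one half, which for a coefficient ρ in (0, 1) gives ln(0.5)/ln(ρ). The function names are illustrative, and the coefficients are backed out from the two half-lives quoted above rather than taken from Table 8.

```python
import math


def half_life(rho: float) -> float:
    """Periods needed for a shock to halve in x_t = rho * x_{t-1} + e_t."""
    if not 0.0 < rho < 1.0:
        raise ValueError("the half-life is defined here only for 0 < rho < 1")
    return math.log(0.5) / math.log(rho)


def implied_rho(periods: float) -> float:
    """AR(1) coefficient that would produce a given half-life."""
    return 0.5 ** (1.0 / periods)


# AR(1) coefficients implied by half-lives of 0.99 and 0.17 quarters.
for quarters in (0.99, 0.17):
    print(quarters, round(implied_rho(quarters), 4))
```

A half-life below one quarter corresponds to a coefficient below 0.5, so both figures describe shocks that fade within the first quarter under this approximation.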

Table 8 Calculation of Half-Life cycles assuming an AR(1) structure

5 Conclusions

Our results based on fractional integration confirm the stationarity of PE returns as measured by the “Pooled IRR” series, though showing evidence of long memory behavior in all series except for Europe. The USA displays the highest degree of persistence, followed by the Rest of the World and Asia, while the order of integration for Europe is close to 0 by all methods employed.

The main finding of this paper underpins the idea that shocks will have long-lasting impacts in all regions except for Europe, most remarkably in the case of the US. This may be a competitive advantage in the case of shocks ignited by innovation or, on the other hand, entail lingering adverse economic effects in the face of supply/demand issues. To dig into the causes, the same analysis should be carried out by kind of asset, which could reveal a different composition of investment by geographical area. Furthermore, a long memory process casts doubt on the independence of the returns and would support the idea of GPs smoothing the reported profits, yet it might also be the result of better performance, as mentioned by Kaplan and Schoar (2005). Also, the short memory of PE returns in Europe (in the sense of a lack of strong persistence) raises the question of benchmarking its performance against the United States (despite the average IRR being somewhat higher in Europe, its standard deviation is also higher, which leads to a lower mean-to-standard-deviation ratio).

In this regard, industry practices should evolve towards greater transparency, without ruling out making some of them legally binding. First, this industry lags in IT in comparison with some other financial sectors. A wider embrace of digitization would enable superior statistical handling of data, including what-if analyses. Second, the attempts by supervisory entities (such as the SEC) to broaden disclosure rules should be adopted. This twofold objective could be better grasped from the reporting devoted to Environmental, Social, and Governance (ESG) standards.

On the other hand, this study can gain in granularity if applied to (disaggregating) the factors shaping the excess returns of PE (namely, illiquidity premium, management by the GP, leverage, and risk adjustment). Moreover, fractional integration employed in analyzing the performance of public stock markets can be extended to private equity indexes, and more specifically when they are homogenized in terms of “public market equivalent” (Long and Nickels 1996). From a methodological viewpoint, the model can be extended to allow for non-linear structures including, for example, Chebyshev polynomials in time (Cuestas and Gil-Alana 2016), Fourier transform functions (Gil-Alana and Yaya 2021) or even neural networks (Yaya et al. 2021), all of them within the context of fractional integration. These lines of research will be developed in future papers.