Introduction

A number of studies have already addressed customer lifetime value (CLV), including factors which affect it and models which calculate it and predict it. Incidentally, these models also allow scholars to analyse the impact of the said factors (Blattberg et al. 2001; Gupta et al. 2004; Pfeifer et al. 2005; Reinartz and Kumar 2000; Rust et al. 2000, 2011; Zhang et al. 2015). Many studies focus on the technical construction and financial side of CLV i.e. how to calculate and predict CLV as accurately as possible (Morisada et al. 2019; Pfeifer et al. 2005; Zhang et al. 2015). A comparatively lesser focus has been on factors which affect CLV, with such studies generally utilising questionnaires (Liu and Shih 2005; Lee et al. 2006; Qi et al. 2012, 2015; Tukel and Dixit 2013; Dachyar et al. 2019; Segarra-Moliner and Moliner-Tena 2022; Amendah et al. 2023). Easley (2000) reports that studies utilising questionnaires, and conclusions based on results obtained thereby, have not been verified using transaction data. He considers such verification a necessity if those conclusions are to be considered valid.

This provides opportunities for research based on real data collected by businesses, which may reveal actual customer behaviour. In contrast to data collected through surveys, which are subject to bias and distortion because customers (as respondents) may not reply truthfully (Bolarinwa 2015), transaction data collected by businesses do not suffer from such limitations, as they capture customers’ actual decisions and behaviour.

Transaction data refer to events (sequences of events) whose main purpose is to provide information about a transaction and can be analysed to better understand the activities of the enterprise (Hand 2018). Transaction data are very high-frequency data where all transactions are recorded, and these data contain valuable information about the time between transactions as well as underlying information about the type of trades (Hautsch & Pohlmeier 2001).

Empirical studies based on such data remain scarce (Donkers et al. 2007; Batislam et al. 2007; Platzer and Reutterer 2016; Chamberlain et al. 2017), and few of those that exist focus on online purchase behaviour. Most of these, in turn, work with data from a single enterprise only (Jašek et al. 2019). The lack of e-commerce studies in this context may lead to theory and practice being based on knowledge obtained in the offline domain, which may not be fully applicable in e-commerce, because customer behaviour differs between online and offline environments in various respects (Kaur and Kochar 2018; Higueras-Castillo et al. 2023).

Consumer behaviour can be defined as “the totality of consumers’ decisions with respect to the acquisition, consumption, and disposition of goods, services, time, and ideas by human decision-making units” (Hardesty and Bearden 2009). The diffusion of the internet has significantly changed consumer behaviour, as consumers now find it easier to search, share, and buy products online (Tseng and Wei 2020).

Moreover, customer behaviour changes within the e-commerce context itself, such as in relation to the Covid-19 pandemic (Bazi et al. 2022; Hashem 2020). As such, knowledge obtained directly from e-shops before or up to the year 2022 may not be applicable to the present times. Therefore, we focussed our research exclusively on the period after the Covid-19 pandemic, as evident from the data collected, and did not examine the period before or during the Covid-19 pandemic (it is therefore not a comparative study).

In the existing literature focussed on Customer Lifetime Value (CLV), several studies have identified gaps in current knowledge that require attention. Lin et al. (2017) emphasised the inadequate consideration of CLV in general. Additionally, Li et al. (2021) indicated a significant gap in understanding customer behaviour and value within the e-commerce domain. Addressing this issue, Marmol et al. (2021) highlighted the necessity of introducing new insights to enhance resource allocation based on empirical data. Dandis et al. (2022) similarly underscored a substantial gap in understanding CLV and encouraged researchers to expand current theoretical knowledge. Specifically, Amendah et al. (2023) urged researchers to fill existing gaps in this area by combining financial and non-financial factors when analysing CLV predictors, a call that our research addresses.

Most available studies investigate a single e-shop (Batislam et al. 2007; Donkers et al. 2007; Kumar et al. 2009; Singh et al. 2009; Zhang et al. 2010; Chan et al. 2011; Kumar & Pansari 2016; Chiang and Yang 2018; Yoseph et al. 2019) and do not categorise e-shops in any way. Our study differs in this regard. We used data from four different e-shops, which we labelled as SOEFIOR based on their characteristics, thereby homogenising the sample. As such, our research contributes new knowledge from a heretofore insufficiently researched area despite the low number of e-shops in the sample, making it a possible starting point for follow-up research.

E-commerce and shopping in e-shops is an area which has seen rapid development in the last decade (Aldaej 2019; Zhou et al. 2017), accelerated by the onset of Covid-19 across all sectors of commerce (including clothing, footwear, and groceries) (Higueras-Castillo et al. 2023; Ingaldi & Brožová 2020). This makes it important to research e-shops and the factors which affect their success (here operationalised as CLV).

The primary research gap lies in understanding CLV using both financial and non-financial data, focussing on both customer behaviour, including CLV, and firm (economic) behaviour. This research is conducted on multiple enterprises in the post-Covid-19 pandemic context.

The goal of this paper is to identify factors that affect the CLV of e-shops. At the theoretical level, we use a literature review to construct a model of factors that affect CLV. At the practical level, we provide recommendations which, if followed, may increase the CLV and long-term performance of e-shops.

The paper is structured as follows: We lead with a review of relevant literature focussed on the construction of CLV and potential factors which affect it. We then introduce the methods, describe our sample and how we constructed CLV and the factors which make up the proposed model, and present hypotheses based on which the theoretical model is constructed. This is followed by results, which form the basis for a discussion and finally conclusions including managerial implications and research limitations.

Theoretical framework

Defining and measuring CLV

CLV is defined as “the net profit or loss to a firm obtained from a customer over the entire life of transactions between the two parties” (Jain and Singh 2002 in Morisada et al. 2019). CLV takes into account the time value of money via discounting, i.e. the word “value” in CLV corresponds to a financial understanding of value as the net present value of a given asset (Pfeifer et al. 2005). From the financial perspective, therefore, we can define CLV as “the present value of the future cash flows attributed to the customer relation” (Pfeifer et al. 2005). Another possible definition is “the net present value of the cash flows attributed to the relationship with a customer, and therefore reflects a customer's future profitability” (Gupta et al. 2004; Zhang et al. 2015). Glady et al. (2009) offer a similar perspective, defining CLV as “the expected sum of discounted future earnings where a customer generates a margin for each period”. It is thus apparent that CLV is based on the financial yield each customer brings to the business, including all potential future yields, net of the costs (usually all costs, including future ones) of servicing the customer.

CLV can be measured based on customer profitability (CP), which refers to the revenues less the costs that one particular customer generates over a given period (Chang et al. 2012). CP is usually measured as “the gross contribution margin, that is, sales revenue less all product-related expenses for all products sold to an individual customer during one particular period” (Bauer et al. 2003; Wang & Spiegel 1994 in Chang et al. 2012). CP measures profitability of a database of customers based only on interactions of those customers with the business, meaning that the customers’ spending with competitors remains unknown (Rust et al. 2011).

CLV can be constructed as cumulated CP. In general, cumulated CP is further discounted (Rust et al. 2011; Chang et al. 2012). The discount factor is based on retention rate, which is an estimate of future profits the customer will bring to the business (Reinartz and Kumar 2000; Chang et al. 2012). This corresponds to the above definition of CLV. Customer retention (and its correct construction) is very important due to its significant effect on business value (Gupta et al. 2004). However, authors do not agree on how retention rate should be calculated (Jain and Singh 2002), meaning that a number of models exist in this context (Blattberg et al. 2001; Gupta et al. 2004; Reinartz and Kumar 2000; Rust et al. 2000).

Factors affecting CLV

Due to the focus of this research, the following text will contain a list and characterisation of the empirical variables measured in commerce (especially e-commerce), which may potentially affect CLV.

In this context, an often-mentioned concept is customer loyalty. Its positive effect on CLV was found by a number of authors (Venkatesan and Kumar 2004; Liu and Shih 2005; Kumar et al. 2009; Singh et al. 2009; Zhang et al. 2010; Qi et al. 2012, 2015; Kumar and Pansari 2016; Barbu and Tiganoaia 2018; Sifa et al. 2018; Segarra-Moliner and Moliner-Tena 2022; Amendah et al. 2023). There are also studies which question the effect of loyalty on CLV. For example, Marshall (2010) disproved this effect in his study of factors affecting CLV, though this was not in the e-commerce context.

Loyalty is a deeply rooted commitment, where customers continue to purchase their preferred product or service in the future despite situational circumstances and competitors’ marketing activities aimed at influencing customers to change the vendor (Oliver 1999). Identifying and quantifying loyalty is a difficult endeavour. Scholars use both attitudinal and behavioural criteria in their attempts (Oliver 1999; Zeithaml 2000). Loyalty can be represented through behaviour as the number of repeated purchases or as the relative volume of purchases (Tellis 1988); and through attitude as intention for repeated purchases (Ganesh et al. 2000), intention to recommend to others (Mattila 2001), likelihood of switching vendor, and likelihood of another purchase (Selnes and Gonhaug 2000). In this paper, we represent loyalty through the number of repeated purchases i.e. behaviourally.

Another factor of CLV was identified by Chiang and Yang (2018), who assert that CLV is affected by customer characteristics such as openness, enthusiasm, and peacefulness. Benefits offered by the vendor, credibility, and customer satisfaction further increase CLV and should be an area of primary focus of businesses (Dandis et al. 2022). According to Amendah et al. (2023), the time that has passed since the purchase also positively affects CLV, and an analysis of this metric allows predicting CLV development and subsequently allocating the resources necessary for marketing activities aimed at relevant customers.

Donkers et al. (2007), Petersen et al. (2009), and Yoseph et al. (2019) proved a positive effect of RFM factors—recency (time passed since last purchase), frequency (number of purchases), and monetary value (sum of sales to the customer). They also emphasise that focussing only on customer retention is not enough and recommend using multiple variables. Their study was done on data from a company in the insurance sector, which means that it is based on contract relationships with customers.

Further factors may include information about the path of access, i.e. the channel which brought the customer to the e-shop. The predictive ability of such variables was described, for example, by Chamberlain et al. (2017) and Jašek et al. (2019), who included information about marketing channels involved in the purchase process (search engines, product compare engines, e-mailing, social media, etc.) in their model in addition to the date of the purchase, the number of purchased products, and sales. The positive effect of search engines (specifically Google) and related paid search campaigns is reported by Chan et al. (2011). Customers who access the e-shop through this channel have the highest CLV, as they already have purchase intentions, which quickly translate into a purchase. This lowers transaction costs for the business compared to other acquisition channels. Internet search engines may also increase the effectiveness of marketing tools used on other platforms such as Facebook, which would otherwise be negligible (Šrámeková and Vilhanová 2012).

Access through directly inputting the URL into a web browser (direct access) is relatively seldom used by customers in e-commerce (Lostakova et al. 2012). Here, knowing the internet address is key. Sales per customer obtained through direct access are lower than through Google, but higher than through Facebook and other channels (Šrámeková and Vilhanová 2012).

Specifics of the e-commerce environment

The main specifics of the e-commerce environment include comfort (Rita et al. 2019) and time savings (Andrews and Currim 2004). Authors also claim that the online environment provides consumers with quick and easy access to information about products, allowing them to compare and decide which product best matches their preferences (Alba et al. 1997; Degeratu et al. 2000; Kaur and Kochar 2018). Furthermore, consumers may choose from products from all over the world (Brynjolfsson et al. 2011) and face significantly lower barriers when switching vendors (Burnham et al. 2003; Mutum et al. 2014; Yen 2011). Put simply, customers are one click of the mouse away from visiting a different vendor and making their purchase there (Gommans et al. 2001).

Another specific feature of the e-commerce environment lies in the trust that customers put in vendors (Liu et al. 2013; Mittal 2013). Transactions take place at a distance and customers cannot touch the products or try them out, which increases uncertainty (Horvath et al. 2021) and creates certain purchase-related risks (Ingaldi and Brožová 2020), such as when providing personal information. This increases the importance of the trust which the vendor needs to gain from the customer (Chetioui et al. 2021); otherwise, the customer will simply make a purchase elsewhere (Reichheld and Schefter 2000).

Available technology provides online stores with greater opportunities to collect personal data about customers and their behaviour. Vendors may use these data for marketing purposes, to adjust supply, or to optimise the purchasing process based on customer needs (Song and Liu 2020).

Materials and methods

Statistical methods

Multilevel models, also referred to as hierarchical models or mixed-effects models, have emerged as a robust and flexible framework for analysing complex data structures characterised by multiple levels of nesting. These models serve as an extension of traditional regression models, particularly valuable when dealing with data that exhibit a hierarchical organisation, such as individuals nested within groups or repeated measurements over time.

One of the distinguishing features of multilevel models is their ability to account for the variability in coefficients across different levels of the hierarchy. Unlike fixed-effect models, where coefficients remain constant across all groups, multilevel models allow for the estimation of varying intercepts and slopes for explanatory variables. This flexibility allows different groups to exhibit distinct relationships between variables. In a varying-intercept model applied to grouped data, each group is assigned its own intercept.

In our study, we utilised a three-level nested framework to structure our analysis. At the first level, we focussed on transactions as the basic units of our examination. However, a single customer could generate multiple transactions across time. To reflect this hierarchical nature of our data, we introduced a second hierarchical level into our model. Thus, the model operates at the first level with transactions and at the second level with customers. However, the presence of transactions from four distinct e-shops adds an additional layer of complexity. To account for the variability stemming from these different e-shop platforms, we introduced a third level into our framework. This third level corresponds to the diverse e-shop environments. In essence, our three-level framework accommodates the hierarchy of transactions within customers and further incorporates the influence of various e-shop platforms.

The explanatory variables such as session_duration, fb_cpc, google_organic, and others are exclusively linked with transactions. For that reason, explanatory variables are incorporated into the first level of the hierarchy only and our final model is a three-level hierarchical model with varying intercepts and fixed slopes of explanatory variables at the first level of the hierarchy.

We can express the model using the following mathematical notation, where the index i refers to transactions, j to customers, and k to e-shops. Y represents the outcome variable CLV, X represents a vector of explanatory variables, and \(\varepsilon\), u, and \(\upsilon\) represent random effects at the transaction, customer, and e-shop levels, respectively.

Level 1 (transaction level)

  • Parameters at this level are denoted by \(\beta\) letters.

  • The explanatory variables X are included exclusively at this level; the intercept is random and the slope is fixed.

    $$Y_{ijk} = \beta_{0jk} + \beta_{1jk} X_{ijk} + \varepsilon_{ijk}$$

Level 2 (customer level)

  • Parameters at this level are denoted by \(\gamma\) letters.

  • There are no explanatory variables at this level.

    $$\begin{gathered} \beta_{0jk} = \gamma_{00k} + u_{0jk} \hfill \\ \beta_{1jk} = \gamma_{10k} \hfill \\ \end{gathered}$$

Level 3 (e-shop level)

  • Parameters at this level are denoted by \(\vartheta\) letters.

  • There are no explanatory variables at this level.

    $$\begin{gathered} \gamma_{00k} = \vartheta_{000} + \upsilon_{00k} \hfill \\ \gamma_{10k} = \vartheta_{100} \hfill \\ \end{gathered}$$

Thus, the three-level model with explanatory variables at the first level with random intercept and fixed slopes has the following form:

$$Y_{ijk} = \vartheta_{000} + \upsilon_{00k}+u_{0jk}+\vartheta_{100}\cdot X_{ijk} +\varepsilon_{ijk}.$$
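To make the structure of the combined equation more tangible, the following minimal Python sketch draws synthetic transactions from a three-level random-intercept model with a single explanatory variable. It is an illustration only: the function name `simulate_clv`, all parameter values, and the distributional choices (normal random effects, a uniform X) are assumptions made here and are not taken from the study. The sketch also shows that building the intercept level by level gives the same outcome as the combined equation.

```python
import random


def simulate_clv(n_shops=4, customers_per_shop=50, max_transactions=3,
                 theta_000=100.0, theta_100=2.0,
                 sd_shop=10.0, sd_customer=5.0, sd_residual=1.0, seed=0):
    """Draw transactions from a three-level random-intercept model.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for k in range(n_shops):
        v_k = rng.gauss(0.0, sd_shop)              # e-shop effect (upsilon_00k)
        for j in range(customers_per_shop):
            u_jk = rng.gauss(0.0, sd_customer)     # customer effect (u_0jk)
            for i in range(rng.randint(1, max_transactions)):
                x = rng.uniform(0.0, 10.0)         # one explanatory variable X_ijk
                eps = rng.gauss(0.0, sd_residual)  # transaction-level error

                # Level-wise form: build the intercept from level 3 upwards.
                gamma_00k = theta_000 + v_k
                beta_0jk = gamma_00k + u_jk
                y_levels = beta_0jk + theta_100 * x + eps

                # Combined form: the single-equation version of the model.
                y_combined = theta_000 + v_k + u_jk + theta_100 * x + eps

                rows.append({"shop": k, "customer": j, "transaction": i,
                             "x": x, "y": y_combined, "y_levels": y_levels})
    return rows


rows = simulate_clv()
```

Each simulated row shares its e-shop effect with all transactions of that e-shop and its customer effect with all transactions of that customer, which is the source of the within-group correlation that the multilevel model is meant to capture.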

In general, multilevel models can extend various regression frameworks, including classical linear regression, generalised linear models (GLMs) for non-normal data distributions, or logistic regression for binary outcomes. For more information about multilevel models, see Finch et al. (2019).

Parameter estimates were computed with the lme function from the R package nlme.

The sample

The sample consists of data from four SOEFIOR e-shops. These businesses are based on the rented e-shop platform Shoptet. They generate a yearly turnover of under 2 million CZK (approx. 83 thousand EUR) with approx. 2000 customers, employ fewer than 10 employees and have been on the market for less than 5 years. None of them have a physical point of sale, nor do they produce their own products. They keep a stock of each product they offer. They do not include investors, subsidies, lenders, or other third parties in their financing, meaning they finance their ventures with internal resources.

The e-shops were deliberately chosen to maximise comparability. They share the same operational model and cater to shopping goods based on Porter (1974) or heterogeneous pre-selection products according to Parshukov et al. (2020), targeting a similar customer base despite offering varying product selections.

Each of the e-shops uses Google Ads, Seznam, Facebook, Instagram and product compare engines to acquire customers, and Google Analytics to collect and analyse data. The e-shops do not bind customers using any form of membership or continuous contract-based relationship. Customers may make their purchase at any time, in any amount, with any frequency.

The sample includes 5267 customers, 7254 transactions, and 10,404,406 CZK (approx. 433 thousand EUR) in revenue. All data were collected during the period after the end of the restrictions associated with the Covid-19 pandemic between May 2022 and May 2023. We calculated CLV for each individual customer and for each e-shop (see Table 1).

Table 1 Characteristics of the sample of SOEFIOR e-shops

For each e-shop, we calculated average purchase frequency per customer (APF—number of purchases in the e-shop divided by number of customers), average purchase value (APV—total value of all purchases within the timeframe divided by number of purchases), and average CLV of the top and bottom 10% of customers ranked by their total CLV, see Table 2.

Table 2 Average value of select variables in SOEFIOR e-shops
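The per-shop indicators described above can be sketched in a few lines of Python. This is a hedged illustration rather than the procedure used in the study: the function name `shop_summary`, the tuple layout of the input, the way a customer's total CLV is obtained (summing over that customer's transactions), and the rounding of the 10% cut-off are all assumptions, and the data are invented.

```python
def shop_summary(transactions):
    """Summarise one e-shop.

    transactions: list of (customer_id, purchase_value, clv) tuples.
    """
    n_purchases = len(transactions)
    customers = {customer_id for customer_id, _, _ in transactions}
    total_value = sum(value for _, value, _ in transactions)

    apf = n_purchases / len(customers)   # purchases divided by customers
    apv = total_value / n_purchases      # total purchase value divided by purchases

    # Assumption: a customer's total CLV is the sum over their transactions.
    clv_by_customer = {}
    for customer_id, _, clv in transactions:
        clv_by_customer[customer_id] = clv_by_customer.get(customer_id, 0.0) + clv

    ranked = sorted(clv_by_customer.values(), reverse=True)
    # Assumption: 10% of customers, rounded down, but at least one customer.
    n_decile = max(1, len(ranked) // 10)
    return {
        "apf": apf,
        "apv": apv,
        "top10_avg_clv": sum(ranked[:n_decile]) / n_decile,
        "bottom10_avg_clv": sum(ranked[-n_decile:]) / n_decile,
    }


# Hypothetical data: ten customers, one of whom (c9) buys twice.
example = [(f"c{i}", 100.0 * (i + 1), 10.0 * (i + 1)) for i in range(10)]
example.append(("c9", 1000.0, 100.0))
summary = shop_summary(example)
```

With eleven purchases by ten customers, the sketch yields an APF of 11/10 and an APV equal to the total purchase value divided by eleven.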

We then calculated the average CLV for customers in groups based on the number of purchases per customer. There were three groups of customers with one, two, and three and more transactions, see Table 3.

Table 3 Average CLV based on number of transactions in SOEFIOR e-shops
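The grouping of customers by number of transactions can be illustrated with the following minimal Python sketch. The function name `average_clv_by_purchase_count`, the input layout, and the summation of transaction-level CLV into a customer total are assumptions made for illustration; the data are invented.

```python
from collections import defaultdict


def average_clv_by_purchase_count(transactions):
    """Average customer CLV in the groups '1', '2', and '3+' transactions.

    transactions: iterable of (customer_id, clv) pairs, one per transaction.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for customer_id, clv in transactions:
        counts[customer_id] += 1
        totals[customer_id] += clv   # assumption: customer CLV is a sum

    groups = defaultdict(list)
    for customer_id, n in counts.items():
        label = "3+" if n >= 3 else str(n)
        groups[label].append(totals[customer_id])

    return {label: sum(values) / len(values) for label, values in groups.items()}


# Hypothetical data: customers a and d buy once, b twice, c three times.
example_pairs = [("a", 10.0), ("b", 20.0), ("b", 30.0),
                 ("c", 5.0), ("c", 5.0), ("c", 5.0), ("d", 30.0)]
group_means = average_clv_by_purchase_count(example_pairs)
```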

Variable definition and research question construction

The model includes a total of 19 explanatory variables, of which 18 describe customer purchasing behaviour during transactions and 1 represents cumulative number of customer transactions; CLV is the dependent variable. The variables are constructed from the data obtained by Google Analytics and are standard, in line with the constructs reported in the literature (see below). The number and structure of these variables are limited only by the amount of usable data that companies collect on customers.

The variables are commonly used in marketing research, as evidenced by, among others, the following studies, which we have mapped to specific variables for clarity: device_category (Morisada et al. 2019; Wang et al. 2015), days_to_transaction (Yılmaz Benk et al. 2022; Yen 2011; Moe 2003), session_duration (Jabr et al. 2023), source of visit (Jašek et al. 2019; Matthews 2016; Chan et al. 2011), landing page (Viswanathan and Swaminathan 2017; Kantola 2014; Ash et al. 2012; Gofman et al. 2009), cum_transactions (Kwarteng et al. 2017; Kvíčala and Starzyczná 2021; Saprikis 2021).

The model, built around the relationships examined in the research questions formulated below, aims to address the main research gap: understanding CLV through both financial and non-financial data. Research questions 1–4 and their respective variables are directed towards comprehending customer behaviour and CLV, while questions 5–6 and their variables aim to elucidate resource allocation within firms using empirical data.

As a behavioural variable, CLV is constructed from customer transaction data. The basic model of CLV is described in Esmaeili and Tarokh (2013), who accounted for the discount rate in their model. By contrast, the discount rate was not included in the CLV calculation by, for example, Tarokh and Sekhavat (2006) or Nikkhahan et al. (2011). Leaving the discount rate out is more appropriate in cases where future discount rates are uncertain. Average retention rate may also be included in the calculation as an indicator of the probability that the customer will make a purchase in the future (Fader and Hardie 2010; Kahreh et al. 2014; Jin 2023).

CLV, our dependent variable, can be calculated using various methods (Borle et al. 2008; Singh and Jain 2013; AboElHamd et al. 2020). One of these methods includes volume of revenues from a customer as well as margin and retention rate (Berger and Nasr 1998; Kumar and George 2007; Chang et al. 2012). We will use this method in our research. The method is as follows:

$${\text{CLV}}_{ij} = R_{i} \cdot {\text{GM}}_{j} \cdot \left( {1 + r_{j} } \right)$$

\(\text{CLV}_{ij}\) is the customer lifetime value related to the i-th transaction (within the whole sample of all transactions) in j-th e-shop; \(R_{i}\) represents revenues per i-th transaction; \(\text{GM}_{j}\) represents gross margin of j-th e-shop; and \(r_{j}\) is retention rate of j-th e-shop.

The calculation of CLV is thus the product of the revenue from a customer’s transaction and the margin of the e-shop, which accounts for costs related to marketing, purchase of merchandise, and the transaction itself. This product is then multiplied by one plus the retention rate of the e-shop.
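As a minimal illustration of the formula above, the following Python sketch computes the CLV attached to a single transaction. The function name `transaction_clv`, the range check on the retention rate, and the numerical values are assumptions introduced for illustration and do not come from the sample.

```python
def transaction_clv(revenue, gross_margin, retention_rate):
    """CLV of one transaction: revenue * gross margin * (1 + retention rate).

    revenue: revenue of the transaction (e.g. in CZK)
    gross_margin: gross margin of the e-shop as a fraction (0.30 means 30%)
    retention_rate: retention rate of the e-shop as a fraction
    """
    if not 0.0 <= retention_rate <= 1.0:
        raise ValueError("retention_rate is expected to lie between 0 and 1")
    return revenue * gross_margin * (1.0 + retention_rate)


# Hypothetical values: a 1000 CZK transaction in an e-shop with a
# 30% gross margin and a 20% retention rate.
example_clv = transaction_clv(1000.0, 0.30, 0.20)
```

For these hypothetical inputs the formula gives 1000 × 0.30 × 1.20 = 360 CZK. Because the gross margin and the retention rate are defined per e-shop, two transactions with the same revenue receive the same CLV within one e-shop but may differ across e-shops.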

The following independent variables were obtained from outputs provided by the Google Analytics service. As such, the metrics are defined based on this tool (for more details see https://ga-dev-tools.google/dimensions-metrics-explorer/).

The first investigated variable is the cumulative number of transactions (cum_transactions), which represents loyalty (in the behavioural sense, see above). Studies have confirmed a positive effect of behavioural loyalty on CLV (Sifa et al. 2018; Yoo et al. 2020) and on financial performance (Phan et al. 2020), which is closely related to CLV. It is therefore reasonable to expect that behavioural loyalty will also have a positive effect on CLV of the investigated SOEFIOR e-shops, which leads to the following research question:

RQ1:

Does behavioural loyalty exhibit a positive association with Customer Lifetime Value?

The next investigated variable is device_category. We distinguish between three categories of this variable based on the device used by the customer to make a purchase. These are mobile, tablet, and desktop, with desktop serving as the reference level (Wang et al. 2015). Communication through multiple channels (omni-channel) is currently a key priority which affects business performance (Melero et al. 2016) and needs to be optimised (Verhoef et al. 2015). It is thus reasonable to conclude that the device_category variable may have an effect on CLV. That being the case, it would be appropriate to also find the magnitude and direction of the effects that categories of this variable have on CLV compared to its reference level. Studies by Wang et al. (2015) and Morisada et al. (2019) suggest that purchasing via mobile (and, in the case of Morisada et al. 2019, via tablet) should increase CLV compared to purchasing via desktop in SOEFIOR e-shops. This leads to two research questions:

RQ2a:

Does purchasing from a mobile device predict higher Customer Lifetime Value compared to purchasing from a desktop?

RQ2b:

Does purchasing from a tablet device predict higher Customer Lifetime Value compared to purchasing from a desktop?

The variable days_to_transaction represents the delay between a customer first accessing the website and a purchase being made, measured in days. If the purchase happened on the day of the first visit, the value is 0; if on the next day, the value is 1; and so on. We included this variable based on inspiration from Moe (2003) and Yılmaz Benk et al. (2022). The effect of days to transaction is generally considered rather negative (Yen 2011), which reflects how easy it is to switch e-shops (Lee 2001; Mutum et al. 2014). It is therefore reasonable to expect a negative effect of days to transaction on CLV of SOEFIOR e-shops, leading to the following research question:

RQ3:

Does Customer Lifetime Value exhibit a negative association with the value of “Days_to_transaction”?
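For readers who wish to reconstruct a days-to-transaction measure from raw dates, the following Python sketch counts whole calendar days between the first visit and the purchase. In the study the value is taken directly from Google Analytics, so the function `days_to_transaction` below is only a hypothetical stand-in that assumes calendar-day counting.

```python
from datetime import date


def days_to_transaction(first_visit, purchase):
    """Whole calendar days between the first visit and the purchase."""
    delta = (purchase - first_visit).days
    if delta < 0:
        raise ValueError("the purchase cannot precede the first visit")
    return delta


# Hypothetical dates within the observation period.
same_day = days_to_transaction(date(2022, 5, 10), date(2022, 5, 10))
next_day = days_to_transaction(date(2022, 5, 10), date(2022, 5, 11))
```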

The next variable is session_duration. This represents the average time spent by the customer on the website during the visits preceding a transaction. It is measured in seconds (Venkatesan and Kumar 2004). The duration of the visit has a positive effect on repeat visits to the website and purchases (Jabr et al. 2023), which leads us to expect its positive effect on CLV. This leads to the research question:

RQ4:

Is there a positive association between the amount of time a customer spends on the website during a single visit to an e-shop and Customer Lifetime Value?

The source of visit variable represents the channel which brought the customer to the e-shop. This variable has 8 levels, with google_cpc as the reference level. We distinguish between paid and unpaid channels, which is in line with the previous studies, such as those by Chan et al. (2011), Matthews (2016), or Jašek (2015). The paid channels in this research include the following:

  • google_cpc: paid Google campaigns (reference level)

  • seznam_cpc: paid campaigns on the search engine Seznam

  • fb_cpc: paid campaigns on Facebook

  • compare_engines: paid campaigns on product compare engines

The unpaid channels in this research include the following:

  • google_organic: Google searching, where the customer clicks on an organic search result

  • seznam_organic: Seznam searching

  • direct: direct input of the URL address in a web browser

  • other_medium: other unpaid channels
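Entering a categorical variable with a reference level into a regression is commonly done through indicator (dummy) variables, one for each non-reference level. The Python sketch below illustrates this for the source of visit. It is an assumption about the mechanics rather than a description of the software used in the study, the helper name `dummy_code` is hypothetical, and the level labels are written in a normalised lowercase spelling.

```python
SOURCE_LEVELS = [
    "google_cpc", "seznam_cpc", "fb_cpc", "compare_engines",
    "google_organic", "seznam_organic", "direct", "other_medium",
]


def dummy_code(value, levels, reference):
    """Return 0/1 indicators for every level except the reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value}")
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference}")
    return {level: int(value == level) for level in levels if level != reference}


# A visit from the reference channel has all indicators equal to zero.
reference_row = dummy_code("google_cpc", SOURCE_LEVELS, reference="google_cpc")
# A visit from Facebook ads switches on exactly one indicator.
facebook_row = dummy_code("fb_cpc", SOURCE_LEVELS, reference="google_cpc")
```

Under this coding, each estimated coefficient expresses the difference in expected CLV between the given channel and the reference channel, which is why the eight levels give rise to seven comparisons.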

The search engines seznam.cz and google.cz are among the most used search engines in the Czech Republic (Kopecký et al. 2020; Vondráčková 2020) and are actively used by the surveyed companies. We chose google_cpc as the reference level because previous studies have shown its effect to be positive and the most significant (Chan et al. 2011). This influence increases when other channels (e.g. Facebook) are used in interaction with google_cpc (Šrámeková and Vilhanová 2012). Based on these studies (Chan et al. 2011; Šrámeková and Vilhanová 2012), it is reasonable to expect a positive effect of the channels mentioned above on CLV, albeit a weaker one than that of google_cpc. This leads to the following partial research questions:

RQ5a:

Does running a paid campaign on the Seznam search engine predict a lower Customer Lifetime Value (CLV) compared to running a paid campaign on Google?

RQ5b:

Does running a paid campaign on Facebook predict a lower Customer Lifetime Value (CLV) compared to running a paid campaign on Google?

RQ5c:

Does running a paid campaign on a product compare engine predict a lower Customer Lifetime Value (CLV) compared to running a paid campaign on Google?

RQ5d:

Does an unpaid search on Google (google_organic) predict a lower Customer Lifetime Value compared to running a paid campaign on Google?

RQ5e:

Does an unpaid search on the Seznam search engine (seznam_organic) predict a lower Customer Lifetime Value compared to running a paid campaign on Google?

RQ5f:

Does entering the URL directly predict a lower Customer Lifetime Value compared to running a paid campaign on Google?

RQ5g:

Do other typically unpaid channels predict a lower Customer Lifetime Value compared to running a paid campaign on Google?

In digital marketing, a landing page is a standalone web page created for the purposes of a marketing or advertising campaign. It is the place where the user “lands” after clicking on a link in an e-mail or an ad from Google, YouTube, Facebook, Instagram, or similar places on the web (Korniichuk and Boryczka 2021). The relationship between CLV and a specific landing page could be moderated by the performance of that landing page. This performance can be expressed through its conversion rate, which rises when customers are led to a landing page with suitable content (Brecht et al. 2021; Korniichuk and Boryczka 2021); through the performance of advertising (Angeloni and Rossi 2021); or through the profitability of the resources used, which increases when customers are referred to better-performing landing pages (Dobrakowski et al. 2022; Almestarihi et al. 2024).

The variable landing page is also categorical with 5 levels. The reference level is the home page (hp)—the main page of the current e-shop (Croitoru et al. 2014; Brecht et al. 2021). The landing_page variable has been used in marketing research by e.g. Gofman et al. (2009), Ash et al. (2012), Kantola (2014) and Viswanathan and Swaminathan (2017), who argue that the first page accessed by the customer has an effect on the subsequent purchasing process, where it can increase or decrease the likelihood of conversion, and effectiveness of spent resources. This may translate into an effect on CLV. In our research, landing_page has the following levels:

  • hp: the home page, the reference level.

  • product: the page containing details about the product.

  • basket: the shopping basket, for cases such as the customer interrupting their purchase and later coming back directly to the basket.

  • category: the page containing the selected product category.

  • other: other pages of the e-shop, such as the blog or contacts.
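The role of hp as the reference level can be illustrated with a minimal Python sketch of dummy (treatment) coding. The function name `dummy_code` is hypothetical and is not taken from the study's analysis code; the sketch only shows why each coefficient for a non-reference level is read as a contrast against the home page.

```python
# Illustrative sketch only: treatment (dummy) coding of landing_page with
# "hp" as the reference level. Names are hypothetical, not the study's code.
LEVELS = ["hp", "product", "basket", "category", "other"]
REFERENCE = "hp"


def dummy_code(value, levels=LEVELS, reference=REFERENCE):
    """Return one 0/1 indicator per non-reference level."""
    if value not in levels:
        raise ValueError(f"unknown landing_page level: {value!r}")
    return {level: int(value == level) for level in levels if level != reference}


# The reference level maps to all zeros, so its effect is absorbed by the
# intercept and every other coefficient is a difference from "hp".
print(dummy_code("hp"))
print(dummy_code("product"))
```

Under this coding, a negative coefficient for product would mean a lower predicted CLV than for an otherwise identical transaction that started on the home page.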

The above authors (Gofman et al. 2009; Ash et al. 2012; Kantola 2014; Viswanathan and Swaminathan 2017) also asserted that a specialised web page which corresponds to the current needs and preferences of the customer, or the phase of the purchase process, has a positive effect on realisation of the desired customer action, which can include a purchase, registration, or providing contact information. Guiding the customer to a specific landing page can therefore be considered more effective in comparison to guiding them to the home page, which by its nature contains general information. This leads to the following research questions:

RQ6a:

Does landing on the product detail page predict a higher Customer Lifetime Value compared to landing on the home page?

RQ6b:

Does landing on the basket page predict a higher Customer Lifetime Value compared to landing on the home page?

RQ6c:

Does landing on a page containing a selected product category predict a higher Customer Lifetime Value compared to landing on the home page?

RQ6d:

Does landing on other pages of the website predict a higher Customer Lifetime Value compared to landing on the home page?

The research questions RQ1-RQ6d will be addressed through the estimated coefficients obtained from hierarchical regression models. Our research questions were posed with specific directions (positive or negative associations), guiding our interpretation of p-values to align with these directions. We set the significance level to 5% for our analysis.

Results

The structure of our dataset exhibits a hierarchical nature (transactions at the first level, customers at the second level, and e-shops at the third level), thus prompting consideration of hierarchical modelling. The key inquiry concerns whether this hierarchy enhances model performance.

Even when considering only the second level of customers, a two-level hierarchical model of CLV without explanatory variables and with only a random intercept yielded an intra-class correlation coefficient (ICC) of 0.292.

$${\text{ICC}} = \frac{\sigma_{u}^{2}}{\sigma_{u}^{2} + \sigma_{\varepsilon}^{2}} = \frac{981.25^{2}}{981.25^{2} + 1528.28^{2}} \approx 0.292$$

This indicates that more than 29% of the variance in CLV is attributed to the second level of the model. This underscores the rationale for employing a hierarchical approach and elucidates why the classical regression model based solely on transactions (the first level) is inadequate. However, the estimate of the variance among transactions, \(\sigma_{\varepsilon }^{2} = 1528.28^{2}\), indicates a substantial amount of unexplained variability, prompting our endeavour to enhance the model's explanatory capacity. Thus, we subsequently incorporated a third level of e-shops along with additional explanatory variables.

The full three-level hierarchical model of CLV with varying intercepts and fixed slopes of explanatory variables at the first level of the hierarchy resulted in the following estimates of standard deviations: \(\sigma_{\upsilon } = 480.7\) at the e-shop level, \(\sigma_{u} = 719.6\) at the customer level, and \(\sigma_{\varepsilon } = 1053.5\) at the transaction level. The estimates at all three levels are of approximately the same order of magnitude. However, the standard deviation at the transaction level still remains partially unexplained, which can be attributed to the natural variance within the data. Table 4 reports the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), showing that our full model achieves the best (smallest) values.
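The variance decomposition behind these figures can be reproduced with a short Python sketch. It uses only the standard deviations reported in the text; the function name `variance_shares` is hypothetical, and the shares for the full model are an illustration derived here, not values reported by the study. They describe the variance left after the explanatory variables have been taken into account.

```python
# Illustrative sketch: share of total variance attributable to each level,
# computed from the standard deviations reported in the text.
def variance_shares(sds):
    """Map each level name to its variance divided by the total variance."""
    variances = {level: sd ** 2 for level, sd in sds.items()}
    total = sum(variances.values())
    return {level: var / total for level, var in variances.items()}


# Two-level null model (customer and transaction levels only).
null_model = variance_shares({"customer": 981.25, "transaction": 1528.28})

# Full three-level model with explanatory variables.
full_model = variance_shares(
    {"e-shop": 480.7, "customer": 719.6, "transaction": 1053.5}
)

print(round(null_model["customer"], 3))  # ICC of the null model
for level, share in full_model.items():
    print(level, round(share, 3))
```

The customer share of the null model reproduces the ICC of about 0.292. For the full model, the same arithmetic gives roughly 0.124 for e-shops, 0.279 for customers, and 0.597 for transactions.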

Table 4 Comparison of individual models quality

The second-best model is a classical non-hierarchical regression model with explanatory variables. To assess its performance against our full model and determine whether the full model demonstrates significant improvement, we conducted a likelihood ratio test. The resulting test statistic of 3607.7 (df1 = 20, df2 = 18) with a p-value of less than 0.001 confirmed that the full three-level model, incorporating random intercepts and explanatory variables at the first level, is significantly better than the classical non-hierarchical regression model.
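A minimal Python sketch shows why a statistic of this size yields a p-value that prints as zero. It assumes that df1 = 20 and df2 = 18 are the parameter counts of the two models, so that the test has 20 - 18 = 2 degrees of freedom, and it uses the ordinary chi-square reference distribution; both points are assumptions made for illustration, not details confirmed by the text.

```python
import math

# Illustrative sketch: p-value of a likelihood ratio test under a chi-square
# reference distribution. ASSUMPTION: "df1 = 20, df2 = 18" are the parameter
# counts of the two models, so the test has 20 - 18 = 2 degrees of freedom.
lr_statistic = 3607.7
df = 20 - 18

# For exactly 2 degrees of freedom the chi-square survival function has the
# closed form exp(-x / 2); this shortcut does not hold for other df values.
assert df == 2
p_value = math.exp(-lr_statistic / 2)         # underflows to 0.0 in float64
log10_p = (-lr_statistic / 2) / math.log(10)  # order of magnitude of p

print(p_value)
print(round(log10_p, 1))
```

Working on the log scale avoids the underflow and makes clear that the p-value is far below any conventional significance level.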

Following the confirmation that the full three-level model outperformed the classical regression model, we will proceed to interpret its explanatory variables in further detail, with the parameter estimates listed in Table 5.

Table 5 Variables predicting customer lifetime value

Variables marked in bold indicate significance based on the two-sided p-value criterion. However, because the research questions are directional, the corresponding one-sided p-values are smaller than the two-sided values whenever the estimate lies in the hypothesised direction.
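The relation between two-sided and one-sided p-values for a directional question can be sketched in Python as follows. The function `one_sided_p` and the example p-value of 0.08 are hypothetical and are not taken from Table 5; the conversion assumes a symmetric test statistic such as t or z.

```python
# Illustrative sketch: converting a two-sided p-value into a one-sided one for
# a directional question. Valid for symmetric test statistics (t or z).
def one_sided_p(two_sided_p, estimate, hypothesised_sign):
    """hypothesised_sign is +1 for 'higher CLV' and -1 for 'lower CLV'."""
    if hypothesised_sign not in (1, -1):
        raise ValueError("hypothesised_sign must be +1 or -1")
    if estimate * hypothesised_sign > 0:
        return two_sided_p / 2   # estimate lies in the hypothesised direction
    return 1 - two_sided_p / 2   # estimate points the other way


# Hypothetical two-sided p-value of 0.08 with a positive estimate.
print(one_sided_p(0.08, estimate=12.0, hypothesised_sign=+1))
print(one_sided_p(0.08, estimate=12.0, hypothesised_sign=-1))
```

An estimate in the hypothesised direction halves the p-value, whereas an estimate in the opposite direction can never support the directional question, however small its two-sided p-value.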

The cumulative number of transactions (cum_transactions) predicts the highest increase in CLV. The positive estimate indicates that as the number of transactions by a customer increases, their CLV increases, with each transaction contributing an additional 747.7 CZK (31.2 EUR) to CLV. The answer to RQ1 confirms a positive association between behavioural loyalty and CLV.

Another variable that is significantly and positively associated with CLV is accessing the e-shop’s website directly (direct), compared to paid campaigns on Google (google_cpc). Direct access contributes 220.4 CZK (9.2 EUR) more than access through paid Google campaigns. This association contradicts previous research findings (e.g. Chan et al. 2011). The answer to RQ5f is that direct URL input does not predict lower CLV than a paid campaign on Google.

Days_to_transaction exhibits an association with CLV that contradicts the previous research findings (e.g. Yen 2011). We observed that the longer the time from first access until transaction (in days), the higher the CLV. With each day preceding a transaction, CLV increases by 5.5 CZK (0.23 EUR). This implies that the answer to RQ3 is that CLV is not negatively associated with the value of “Days_to_transaction”.

The variable session_duration predicts the smallest increase in CLV that is still statistically significant. The longer the duration, the higher the CLV. Each second spent on the website increases CLV by 0.11 CZK (0.005 EUR).

The answer to RQ4 is that time spent on the website by the customer within a single e-shop visit is positively associated with CLV.

We found that three variables are significantly negatively associated with CLV. One of them is accessing the e-shop through Facebook (fb_cpc) compared to access through paid Google campaigns (google_cpc). Access through Facebook is associated with a CLV that is 136.2 CZK lower than access through paid Google campaigns. The answer to RQ5b is that a paid campaign on Facebook predicts a lower CLV than a paid campaign on Google.

Within the variable device_category, accessing the e-shop on mobile (device_categorymobile) predicts a smaller CLV compared to the reference level, accessing the e-shop on desktop (device_categorydesktop). Transactions made with mobile access show a CLV that is 82.8 CZK (3.5 EUR) lower than transactions made with desktop access. The results show a different direction than previous research asserted (Wang et al. 2015). Thus, the answer to RQ2a is that we did not find evidence to support the notion that higher CLV is associated with mobile access rather than desktop access.

The final variable that predicts a decrease in CLV is landing directly on the product detail page of the e-shop (product) compared to landing on the e-shop’s home page. The results suggest that landing directly on the product detail page is associated with a CLV that is 88 CZK (3.7 EUR) lower than landing on the home page, which is a different direction than previous research asserted (Ash et al. 2012, p. 29). Thus, our results support the existence of an influence of the landing page on CLV, as suggested by previous studies that link the landing page to the conversion rate (Brecht et al. 2021; Korniichuk and Boryczka 2021) or to the profitability of allocated resources (Angeloni and Rossi 2021; Dobrakowski et al. 2022; Almestarihi et al. 2024), although for the product detail page the direction of this influence is the opposite of the expected one.

The answers to RQ6a to RQ6d are therefore as follows:

We did not find evidence that landing on the product detail page, the basket page, a page containing a selected product category, or other pages of the website predicts higher CLV than landing on the home page.

Variables included in research questions RQ5a, RQ5c, RQ5d, RQ5e, RQ5g, which were related to various marketing channels (compared to google_cpc), were not statistically significant in their association with CLV.
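To illustrate how the estimates quoted above combine, the following Python sketch sums the fixed-effect contributions for a hypothetical transaction. The coefficients are those reported in the text (in CZK); the dictionary keys, the function `contribution`, and the example profile are assumptions made for illustration. The intercept and the random effects are omitted, so the result is a difference relative to the reference profile and not a predicted CLV.

```python
# Illustrative sketch: combining the fixed-effect estimates quoted in the text
# (in CZK). The intercept and random effects are omitted, so the result is a
# difference relative to the reference profile, not a predicted CLV.
COEFFICIENTS = {
    "cum_transactions": 747.7,       # per transaction
    "direct": 220.4,                 # vs. google_cpc
    "days_to_transaction": 5.5,      # per day
    "session_duration": 0.11,        # per second
    "fb_cpc": -136.2,                # vs. google_cpc
    "device_categorymobile": -82.8,  # vs. desktop
    "product": -88.0,                # landing page, vs. hp
}


def contribution(profile):
    """Sum coefficient * value over the variables present in the profile."""
    return sum(COEFFICIENTS[name] * value for name, value in profile.items())


# Hypothetical transaction: direct access on mobile, landing on a product
# page, with a 300-second session; the reference profile is
# google_cpc / desktop / hp with all quantitative variables at zero.
example = {
    "direct": 1,
    "device_categorymobile": 1,
    "product": 1,
    "session_duration": 300,
}
print(round(contribution(example), 1))
```

For this hypothetical profile the positive contribution of direct access and session duration outweighs the negative contributions of mobile access and the product landing page by about 82.6 CZK.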

Despite failing to answer several research questions generated based on literature, we succeeded in identifying some surprising associations between the investigated variables. This led us to create a graphical model, visualised in Fig. 1, which contains all the (two-sided p-value) statistically significant associations that we have found.

Fig. 1

The final model of factors associated with the CLV of SOEFIOR e-shops

Discussion

A common issue in marketing research on the e-commerce environment is the lack of data. The amount of data provided by companies is highly limited, which is evident from the number of companies involved in existing studies. We analysed the current literature to find out how many companies were included in previous research. Most of the analysed studies involved just one company (Qi et al. 2015; Safari et al. 2016; Vanderveld et al. 2016; Chamberlain et al. 2017; Chiang and Yang 2018; Xue et al. 2021; Hoang et al. 2022; Kumaran et al. 2022; Yılmaz Benk et al. 2022; Dandis et al. 2023), some studies compared data from 3 different companies (Dachyar et al. 2019; Segarra‑Moliner and Moliner‑Tena 2022) and just one study included 6 different companies (Jašek et al. 2018). Our dataset, which contains four different e-shops, therefore covers more companies than most previous studies.

E-commerce has significantly altered the landscape for SMEs, enhancing their competitiveness against larger firms, thus reshaping the dynamics of the entire market (He 2021). Consequently, e-shops in the SOEFIOR category can now vie with larger enterprises. The available technologies in e-commerce offer advanced opportunities for gathering customer data and understanding their behaviour. Retailers can leverage this information for marketing activities, customising their offerings, and optimising the shopping experience to meet customer needs (Song and Liu 2020). Customers now have a plethora of options when selecting a retailer and comparing offers. Therefore, it is paramount to comprehensively understand the buying process and the multitude of factors influencing it (Luo 2022). Our research aims to identify variables crucial for understanding the buying process and customer needs, including those significant for predicting Customer Lifetime Value (CLV). Through the analysis of available data and the formulation of appropriate conclusions, the capacity to promptly respond to current trends and adapt product design accordingly is significantly bolstered. This ensures an adequate depth and breadth of product range, competitive pricing, effective distribution methods, and impactful marketing communication strategies—all elements of the marketing mix (He 2021).

The positive effect of the number of transactions on CLV has been reported by the following studies: Foster et al. (1996), Gurǎu and Ranchhod (2002), Jašek et al. (2018), and Sifa et al. (2018), while Bult and Wansbeek (1995) and Zhang et al. (2010) are more careful in their conclusions and recommend using experiments to test causal relationships. The association between CLV and customer loyalty (measured by the number of repeat purchases) can thus be confirmed, i.e. we confirm a very strong association between customer loyalty and CLV, including in the context of SOEFIOR e-shops.

We found that accessing the e-shop’s website by directly inputting the URL into a web browser (direct access) predicted significantly higher CLV than access through a paid campaign on Google. Direct access reduces acquisition cost, as in this case the customer arrives “for free”: their arrival through this channel does not cost the e-shop any resources. It should be noted, however, that direct access must be preceded by some contact or communication with the e-shop. The channel requires the customer to input the URL, which means that either they need to have known it already, or the address must have appeared in “suggested searches”, meaning the customer must have accessed the e-shop beforehand.

This finding of a positive effect of direct access on CLV is in contrast to the findings of Chan et al. (2011), who consider Google to predict higher CLV compared to other channels. It also contradicts Šrámeková and Vilhanová (2012), who add that access through Google predicts higher revenues compared to direct access. Direct access is also used significantly less by customers than other channels, which is in accordance with Lostakova et al. (2012), who add that the reason for this is customers’ use of product comparison engines and Google itself. The results of our research correspond to the findings of Matthews (2016), who found that direct access is a more effective marketing channel than Google regarding conversion volume.

Compared to google_cpc, facebook_cpc has a negative influence on CLV, meaning businesses should focus primarily on acquisition through google_cpc. In practice, however, each channel fulfils its own role in the purchase process. The primary purpose of facebook_cpc is to reach out to a target audience based on select characteristics (Semerádová and Weinlich 2019; Yang et al. 2019; Zulaikha et al. 2020), where a customer is contacted, brought to the e-shop and introduced to products without necessarily having an intention to purchase. Google_cpc, on the other hand, is search-based, meaning that the potential customer searches using phrases and keywords related to the products available in the e-shop. In these cases, the potential customer has a purchase need i.e. is ready to make a purchase, which increases their willingness to spend (Chan et al. 2011).

The positive effect of session duration on CLV is in accordance with the results of Venkatesan and Kumar (2004), Shäfer and Kummer (2013) or Kukkonen (2018), though Kumar et al. (2004) argue that this association is not clear-cut i.e. it may not be a causal relationship, where longer stay on the website will increase CLV. Moreover, Venkatesan and Kumar (2004), and Kukkonen (2018) do not consider this association to be a direct one, since in their view this factor has a positive association with the value of the e-shop’s visitor, which in turn has a positive association with CLV. Our research revealed a direct, though weak, association in this case. This may be due to the fact that a longer session does not lead to greater CLV for each customer. On the other hand, if the e-shop cannot or does not wish to segment customers based on this criterion, keeping the customer on the website for longer may be a better strategy, since it is more likely to increase CLV than other strategies.

If the e-shop engages in remarketing activities, targeting visitors who have yet to make a transaction, it generates costs which are not included in the CLV calculation, meaning that the effect of days to transaction and session duration may even be negative (Yen 2011). It is also the case that the online environment allows customers to leave the e-shop with a single click of the mouse (Gommans et al. 2001; Lee 2006), which interrupts the purchase process and reduces the likelihood of a purchase being made, reducing CLV. These are further reasons for the weak positive associations between these variables and CLV which we found. They indicate different customer behaviour and appropriateness of segmentation. On the other hand, we confirm that increasing days to transaction and session duration predict higher CLV, making it desirable to keep the values of these variables as high as possible. Keeping the customer on the e-shop’s website for as long as possible through marketing activities remains a universal strategy.

We confirm that the type of a consumer’s visit is associated with marketing channels as well as with their session and purchase history with the e-shop (Pallant et al. 2017). Our research confirms that purchases being made (CLV and its growth) are associated with longer sessions, in addition to the effects of direct access (which has a positive association compared to the reference level google_cpc) and of accessing the e-shop through paid campaigns on Facebook (which has a negative association compared to the reference level google_cpc).

We found that using the e-shop on mobile is associated with lower CLV than using it on desktop, which is in contrast with the findings of Wang et al. (2015) and Morisada et al. (2019). Thus, if the customer is ready to make a purchase, it is more desirable that the purchase is made on desktop. This does not, however, mean that it is necessary to acquire customers using only this category of devices. In the case where a customer visits the e-shop more than once, the sessions preceding the one where a purchase was made may be on mobile. This study did not examine the effectiveness of device categories in pre-purchase sessions.

A customer beginning their purchase session by landing on the product detail page of the e-shop predicts lower CLV compared to landing on the home page. This is in contrast with the findings of select authors (Gofman et al. 2009; Ash et al. 2012; Kantola 2014; Viswanathan and Swaminathan 2017). Our findings suggest that bringing the customer to the home page may be better for CLV development than bringing them to the product detail page, with no significant difference compared to other pages. The fact that the product detail page predicts lower CLV than the home page makes the home page an appropriate primary destination to which e-shops bring customers through their online marketing campaign channels.

In e-commerce research, Wang et al. (2022) explored variables including stream duration, which is similar to our session_duration, and indicated the presence of potential confounding variables that may not be fully identified. This underscores the reality that even with extensive datasets, confounding variables such as customer characteristics and attitudes can influence outcomes and remain challenging to fully ascertain, as also highlighted by Bhattacharyya and Bose (2020). Moreover, Song and Shin (2024) emphasise the vast array of potential confounding variables, urging researchers to recognise their potential impact, to incorporate this acknowledgment into the research limitations, and to require that specific conditions be met.

Conclusion

In our efforts to examine CLV, we took a slightly different approach from other studies, focussing on external associations of the examined variables with CLV instead of focussing on incorporating variables into CLV calculation. We performed this research on three levels (transaction, customer, and e-shop), where transactions were the basic level (of customers in the timeframe). This research was also focussed on comparing levels of categorical variables in their prediction of CLV. We provided a visualisation of the model in the results section for the sake of greater clarity.

Of the quantitative variables, which were individually associated with CLV, behavioural loyalty (measured as the number of transactions) had the greatest association, which is logical considering the way CLV is constructed. It is closely associated with the number of transactions through revenues, costs, and retention rate. Another variable associated with CLV is the number of days it takes to make a transaction: the more days, the greater the CLV. Session duration has a similarly positive, though weaker, association with CLV.

When it comes to the categorical variable device category, we found that using the e-shop i.e. making a purchase on mobile predicts lower CLV compared to transactions made on desktop. The comparison of channels through which customers access e-shops led to the finding that accessing the e-shop by directly inputting the URL in a web browser predicts higher CLV compared to access through a paid campaign on Google (google_cpc), and access through a paid campaign on Facebook predicts lower CLV compared to access through a paid campaign on Google.

Examination of the other categorical variable landing page revealed that CLV is lower in cases where the customer lands on the product detail page compared to cases where they land on the e-shop’s home page.

Managerial implications

For businesses that operate similar e-shops, we would recommend they focus on maximising the number of transactions, which has a significant positive association with CLV. Furthermore, they should focus on creating a situation where customers access their e-shop by directly inputting the URL, rather than through search engines such as Google, through Facebook, or through similar platforms. For this to be possible, steps must be taken to ensure that customers know the e-shop’s address. Whatever channel customers use to access the e-shop, it is important that they land on the home page, not on the product detail page. It is no less important to keep the (potential) customer on the e-shop’s pages, whether it is in the context of session duration or time to transaction. It does not seem problematic if the customer thinks about the purchase and makes it only after some time passes; what matters is that they make the purchase in the select e-shop. Another possible recommendation is to make customers use the e-shop on desktop rather than mobile. It is reasonable to assume that comfort, ease of navigation, and ease of use play a role here (due to the larger display and the option to use a mouse).

From the perspective of days to transaction and session duration, it is better not to force the customer to make a quick transaction. On the contrary, the possibility of the customer abandoning the transaction process makes it desirable to allow the customer to think about and consider everything they need. This is due to the marketing-related costs associated with abandoned transaction processes, which are negatively associated with CLV.

Despite the difference between google_cpc and facebook_cpc in their association with CLV, we recommend using both, as this expands the range of the communication mix and allows reaching out to new potential customers, acquainting them with the products on offer, and creating or stimulating demand.

Businesses can prioritise device_categorydesktop over device_categorymobile by directing their marketing communication primarily towards desktop devices and excluding mobile devices from targeting altogether. Desktop versions of e-shops are more user-friendly, more transparent, and easier to control, making it a more positive shopping experience, resulting in greater satisfaction.

E-shops have a wide range of options when it comes to their landing page. In most situations, it is easy to make customers land on specific pages, be it online ads, newsletters, or search results. Based on our data, we recommend bringing customers to the e-shop’s home page, where they can look over the products on offer and decide where their next steps will lead.

Limitations

The research is limited by its focus on small and generally newly starting e-shops offering low-turnover products unavailable in physical stores and reselling products they keep in stock. The number of investigated e-shops is also small (although the e-shops are comparable and in this respect the sample is representative, at least of the population of those SOEFIOR e-shops which have descriptive statistics similar to those outlined in the “Sample” section), which is related to the unwillingness of businesses to provide their data for research purposes, although this number keeps growing and provides a majority share of currently available studies. The conclusions and recommendations are thus limited to a specific set of e-shops (SOEFIOR). In the future, it would be appropriate to expand the research set or to focus also on other types of e-shops, whether with convenience goods (food and non-food) or larger e-shops (in terms of number of employees, width of assortment, size of turnover, etc.).

Another limitation is the use of transactional data, which does not provide a measure of subjective purchase motives or customer satisfaction. Google Analytics also limits the research possibilities to a certain extent, as it is only able to deal with simple data related to either the customer, the e-shop, or the goods and it is not always possible to link this data, which further limits the number of useful variables. Another limitation of the research is its design and the statistical methods used, which do not provide the possibility of confirming and verifying causal relationships.

In our study, it is crucial to acknowledge the potential influence of confounding variables on the variables under examination, particularly when these effects are not explicitly accounted for. Confounding variables encompass a wide range of factors, including personal characteristics of customers, their shopping behaviour patterns, attitudes, preferences, and current financial or life situations, among others. However, obtaining data on these types of variables typically requires a different research approach, which may not be readily feasible for companies. Consequently, the variables used in our analysis, being readily accessible to firms, may not fully capture the nuanced complexities introduced by confounders.