1 Introduction

Global digital ad spending is predicted to exceed 626 billion USD in 2023 (Cramer-Flood, 2023), emphasizing the role digital advertising holds for businesses. Advertisers have an array of marketing channels at their disposal to bolster their businesses. However, businesses must still determine the effectiveness of each channel, which requires resources (Bleier & Eisenbeiss, 2015).

While choosing marketing channels is critical, understanding the various traffic sources (how customers find and access websites) offers valuable insights (Wymbs, 2011). Users can visit websites through many traffic sources, such as organic search, referral traffic, and others. One of the most prominent web traffic sources is direct traffic, which refers to visitors who arrive at a website by directly entering the website’s URL into their browser or clicking on a bookmark, without being redirected from another website or digital platform. This type of traffic accounts for 55% of website traffic for online businesses globally (Bianchi, 2022). In addition, one study indicates that conversion rates (CVRs) from direct traffic are higher than those from other sources (Holmes, 2023).

To navigate this complex environment, advertisers utilize different data strategies (Bhargava et al., 2020). Many of these strategies are provided by marketing technology, known as Martech. Martech consists of an ecosystem of over 8000 companies (Brinker & Baldwin, 2020). The predominant solution utilized by advertisers is digital analytics services (Baltes, 2017). Advertisers use similar solutions to define which media mix is ideal for their business (Berman, 2020). While digital analytics tools are widely utilized, there is still uncertainty about optimizing advertising spend across numerous channels, especially when a significant portion of web traffic comes via direct sources. Our study bridges this gap, examining the relationship between website traffic and purchases originating from direct traffic sources.

Using an observational method, we find a strong association between website visits from various channels and purchases from direct traffic sources. Although similar studies have used the same methodology, they have not focused on direct traffic sources, the largest source of website traffic globally. Furthermore, similar studies often overlook the model evaluation process, on which we elaborate in detail.

This research seeks to contribute to the body of literature by improving the understanding of digital marketing channel dynamics, particularly their influence on direct traffic. It provides practical insights for marketing practitioners and managers, clarifying the interplay between channels and the prevalence of direct traffic. Our findings also explore reasons behind varying CVRs across channels. Academics might find our methods useful to test these insights and formulate new theories.

2 Theoretical background

The present section explores the literature surrounding marketing synergies, web attribution, and the consumer journey.

2.1 Marketing synergies and cross-media

Marketing synergies are the mutual enhancement of various marketing activities’ impacts (Naik & Peters, 2009). Marketing synergies have been a research subject for decades, as understanding the effects of various marketing components could lead to more efficient marketing strategies (Chang & Thorson, 2004; Havlena et al., 2007; Naik & Raman, 2003).

The interactive effects of various media can enhance the overall performance of campaigns and improve key performance indicators (KPIs), ultimately influencing consumers’ buying decisions (Chang & Thorson, 2004; Edell & Keller, 1989). In digital marketing, Naik and Raman (2003) state that understanding cross-media interactions is critical. They argue that ignoring such interactions can lead to inaccurate allocation of marketing resources. This echoes the findings of Havlena et al. (2007), who note that inconsistencies in consumer response across channels can complicate the task of managing marketing synergies. The synergy between website traffic (from various channels) and direct traffic sources has not been explored, leaving a gap in the current understanding of marketing synergies within the digital realm.

2.2 Web attribution

Web attribution is a fundamental component of digital marketing, offering insights into which channels, touchpoints, and actions lead to a desired outcome on a website. Kitts et al. (2010) elaborate on the relationship between T.V. and web transactions, pointing out the fundamental problem of web attribution. Adequate attribution allows marketers to understand how different marketing activities contribute to the consumer’s journey, leading to informed decision-making and resource allocation (Kannan et al., 2016). Attribution models have evolved from a single-touchpoint approach to multiple-touchpoint models, mainly due to the complexity of the digital landscape and consumer behavior (Verhoef et al., 2015). Earlier models attributed the entire outcome to just one interaction (Dalessandro et al., 2012). These models, however, fail to consider the contribution of other interactions.

More recent attribution models are data driven. These models leverage advanced analytics and machine learning algorithms to dynamically allocate credit based on patterns and trends in the data (Shao & Li, 2011). A significant challenge in attribution is the issue of cross-platform and cross-device tracking (Brookman et al., 2017). Consumers often interact with brands across various devices and platforms, making it difficult to track their journey accurately. Privacy regulations and technical challenges further complicate this task (Lambrecht & Tucker, 2013). Measuring how one channel can impact the performance of another channel is a priority for advertisers (Anderl et al., 2016). Despite significant advancements in web attribution, there remains a gap in understanding the intricate relationships between channels. Our research aims to delve into this complex relationship.

2.3 Consumer journey

The consumer journey is the sequence of interactions or touchpoints a consumer has with a brand before reaching a specific outcome, such as purchasing (Hamilton & Price, 2019). This dynamic journey varies significantly across individuals, contexts, and industries, primarily influenced by available channels (Edelman, 2010). In the digital era, the consumer journey is reshaped by online channels, ranging from social media platforms and search engines to company websites and email campaigns (Neslin & Shankar, 2009). That change has also impacted users’ consumption (Malter et al., 2020).

Website traffic plays a significant role in this modern consumer journey. A website acts as a central hub that provides information about products or services and as a platform for transaction completion (Bucklin & Sismeiro, 2009). Among the available website traffic sources, direct sources are often associated with higher purchase intent, as they usually yield consumers who have already familiarized themselves with the brand (Kakalejčík et al., 2020). Our study contributes to understanding the impact of indirect digital marketing channels within the consumer journey, mainly how they influence direct traffic purchases.

3 Hypothesis development

In today’s digital landscape, understanding digital marketing channels and their influences on consumer behavior is pivotal. Some might argue that advertisements have minimal impact on conversions “but are likely to stimulate visits through other advertising channels” (Xu et al., 2014). Researchers have tried to build models and explain this dynamic behavior (Moe & Fader, 2004). To our knowledge, there is a gap in explaining the relationship between digital marketing channels and direct traffic sources.

The first hypothesis of our study contrasts with previous studies, such as those by Plaza (2011), Omidvar et al. (2011), Milano et al. (2011), and Awichanirost and Phumchusri (2020), which used a similar methodology and data but focused on different variables (both dependent and independent). “Direct” traffic is characterized by visitors who reach a website without traceable referral information (Google Analytics Help Center, 2023; Skow, 2023). Studies (e.g., Kakalejčík et al., 2020) have underscored the role of direct traffic in shaping online sales but not in relation to website traffic. Inspired by these insights, our study posits a similar correlation:

H1: Website sessions initiated through digital marketing channels are positively associated with purchases generated via direct traffic sources.

While our first hypothesis involves the relationship between website visits and direct purchases, our second hypothesis is about the nature of this relationship. Existing research indicates that certain marketing aspects, such as branding, may positively impact the consumer journey (Oh et al., 2020). Milano et al. (2011) identify the effect of specific marketing channels on websites. Other researchers (Cui et al., 2022) demonstrate the impact of mobile app usage on offline store visits. Therefore, we go deeper into identifying which marketing channels have the most significant effect:

H2: Among the business channel mix, which marketing channels (website sessions) have the highest association with purchases from direct traffic sources?

4 Methods

Researchers employ experimental research methods to establish and understand causal relationships. However, implementing such methodologies can sometimes be difficult or impossible (Johnson, 2023). Therefore, nonexperimental approaches are essential. Due to the nature of this study, an experimental method was not feasible. We cannot randomize or control users visiting the website from all marketing channels for 6 months. Consequently, an observational study was developed to address the hypotheses.

4.1 Data

The data represent website performance from an e-commerce website operating in the European Union. The business requested to remain anonymous but shared access to their Google Analytics (G.A.) account with us. During the data collection, the website received website traffic from 85 countries (globally) and purchases were made from 38 of these. For this study, 89,394 purchases were analyzed from 17 European Union countries.

Although the business received orders from 38 countries, we decided to focus on Europe (17 countries), as the number of purchases outside Europe was insignificant.

We gathered time-series data using Google Analytics for 180 days from August 2022 to January 2023. Following the cleaning process, the final report consists of a traffic report with sources, medium breakdown, and a day-by-day analysis. The file contains 10,742 rows of data.

The final dataset consists of 7 columns. In the first one, we have Date. Then, we find the purchases from the direct traffic source (purchases.direct), which act as the response variable (y). The following 5 columns represent daily website sessions from digital traffic sources.

4.2 Model

Digital marketing researchers can benefit from linear regression models to establish relationships among variables (Leeflang & Wittink, 2000). By employing a linear regression model, we can quantify the extent to which variables such as sessions from marketing channels are associated with purchases from direct traffic sources. Therefore, the model is as follows:

$$y={\beta}_0+{\beta}_1\ {x}_1+{\beta}_2{x}_2+{\beta}_3{x}_3+\dots +{\beta}_p{x}_p+\varepsilon$$

where y is the response variable (direct purchases), x1,...,xp are the several channels performing as explanatory variables, βj are the regression coefficients, and ε is the random error, which is of zero expectation and is independent of the variables xj (James et al., 2013).
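As a minimal sketch of how the coefficients in this model can be estimated, the following Python example fits an ordinary least squares regression by solving the normal equations. It is an illustration only: the helper names (`solve_linear_system`, `fit_ols`) and the small synthetic dataset are assumptions introduced here, not the study's code or data, and a statistics package would normally be used instead.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [beta_0, beta_1, ..., beta_p] minimizing the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data generated from y = 2 + 0.5*x1 + 1.5*x2.
sessions = [[10, 1], [20, 3], [30, 2], [40, 5], [50, 4]]
purchases = [2 + 0.5 * x1 + 1.5 * x2 for x1, x2 in sessions]
betas = fit_ols(sessions, purchases)
```

Because the synthetic response contains no noise, the fit recovers the generating coefficients up to floating-point error; with real data, ε absorbs the unexplained variation.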

5 Results

Graph 1 shows the plots generated in R between the dependent variable and several independent variables. The plots showcase the observed relationship, and although further analysis is needed, there is a linear association between the variables.

Graph 1: Dependent and independent variables (plot)

In total, 16 models were created and evaluated. The best-performing model, model 10, offers the highest R-squared (0.5344), and all predictor coefficients are statistically significant (p < 0.001). As shown below, model 10 consists of 3 variables. According to the results, website traffic from Google paid, price comparison engines, and email marketing activities helps describe purchases from direct traffic sources. The final model takes the form:

$$\textrm{Direct}\ \textrm{Purchases}={\beta}_0+{\beta}_1\textrm{Google}\ \textrm{Paid}+{\beta}_2\textrm{Price}\ \textrm{Comparison}+{\beta}_3\textrm{Email}+\varepsilon$$

The statistically significant relationship between direct purchases and our independent variables remains evident after rescaling the initial model’s variables by a factor of 100 to facilitate the interpretation of the coefficients (James et al., 2013). At −24.24, the intercept remains not statistically significant (t = −1.277, p = 0.203). However, all predictors retain their statistical significance, with their coefficients now representing the change in direct purchases per 100-unit increase in each of the respective session types: “google_paid_sessions” (β = 3.10, t = 4.789, p < 0.001), “price_aggregators_sessions” (β = 13.36, t = 10.75, p < 0.001), and “email_sessions” (β = 10.13, t = 4.772, p < 0.001).
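A brief sketch of why rescaling changes the coefficients but not their significance, shown for a single predictor x1 (the same argument applies to each session variable): dividing a predictor by 100 multiplies its coefficient by 100, and the standard error scales by the same factor, so the t statistic is unchanged.

$$y={\beta}_0+{\beta}_1{x}_1+\varepsilon ={\beta}_0+\left(100\,{\beta}_1\right)\frac{x_1}{100}+\varepsilon$$

Under this reading, the reported coefficient of 3.10 per 100 Google paid sessions corresponds to roughly 0.031 direct purchases per single session.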

An increase of 100 sessions in “google_paid_sessions” is associated with 3.1 more direct purchases, a 100-unit increase in “price_comparison” with 13.36 more direct purchases, and a 100-unit increase in “email_sessions” with 10.13 more direct purchases. Moreover, the model’s R-squared value is 0.614, indicating that approximately 61% of the variance in direct purchases can be explained by our model. The adjusted R-squared, at 0.6072, is close to the unadjusted value, suggesting that the fit is not an artifact of the number of predictors. Finally, the overall significance of the model was confirmed by a highly significant F-statistic (F = 90.16, p < 2.2e-16), reinforcing the robustness of these predictors and their scaled coefficients in explaining direct purchases.
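The fit statistics above are linked by standard formulas, which the following Python sketch illustrates. The function names are introduced here for illustration, and the sample size `N_ASSUMED = 174` is a hypothetical value: the number of observations remaining after outlier exclusion is not reported in the text, so the results should be read only as a plausibility check.

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2: float, n: int, p: int) -> float:
    """Overall F-statistic for the null that all p slope coefficients are zero."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


R2 = 0.614        # R-squared as reported in the text (rounded)
P = 3             # number of predictors in model 10
N_ASSUMED = 174   # hypothetical: the post-exclusion sample size is not reported

adj = adjusted_r_squared(R2, N_ASSUMED, P)
f_stat = f_statistic(R2, N_ASSUMED, P)
print(f"adjusted R-squared: {adj:.4f}, F: {f_stat:.2f}")
```

Under that assumed sample size, both values land close to the reported 0.6072 and 90.16; small differences are expected because the reported R-squared is rounded.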

Leveraging the linear regression model and the average daily sessions from each channel, we can estimate the combined association of the predictors with direct purchases. Specifically, the website receives an average of 4510 daily sessions from these channels (“google_paid_sessions,” 3157; “price_comparison_sessions,” 631.4; and “email_sessions,” 721.6 sessions). We can calculate the predicted number of daily direct purchases associated with these channels by applying the respective scaled coefficients. This equates to an estimated total of (3157/100)*3.1027 (from “google_paid_sessions”) + (631.4/100)*13.3616 (from “price_comparison_sessions”) + (721.6/100)*10.1282 (from “email_sessions”), resulting in 98, 84, and 73 direct purchases, respectively. Hence, the three marketing channels jointly describe approximately 255 direct purchases per day.
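The arithmetic in this paragraph can be reproduced with a short Python sketch. The coefficients and session averages are the values reported in the text; the dictionary and variable names are chosen here for illustration, and the intercept is omitted, as in the calculation above.

```python
# Scaled coefficients: direct purchases per 100 sessions (from the text).
coefficients_per_100 = {
    "google_paid_sessions": 3.1027,
    "price_comparison_sessions": 13.3616,
    "email_sessions": 10.1282,
}

# Average daily sessions per channel (from the text).
avg_daily_sessions = {
    "google_paid_sessions": 3157.0,
    "price_comparison_sessions": 631.4,
    "email_sessions": 721.6,
}

# Sessions are divided by 100 because the coefficients are per 100 sessions.
contributions = {
    channel: avg_daily_sessions[channel] / 100 * coefficients_per_100[channel]
    for channel in coefficients_per_100
}
total_direct_purchases = sum(contributions.values())

for channel, value in contributions.items():
    print(f"{channel}: {value:.1f}")
print(f"total: {total_direct_purchases:.1f}")
```

Rounded to whole purchases, the per-channel figures and their total match the 98, 84, 73, and 255 given in the text.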

These results could be attributed to a specific consumer behavior pattern observed in digital marketing: users initially exposed to an advertisement via one of these three marketing channels often return to the website days later through a direct traffic source to complete a purchase. This behavior significantly underpins the regression coefficients observed in our model, illustrating the influential role these three marketing channels play in direct purchases.

Indeed, data from the business’s G.A. corroborate this narrative. They show that 8.45% of users visit the website for the first time directly. This implies that the vast majority of users (over 90%) make their initial visit through other channels. Furthermore, the direct purchase CVR is 10.08%, the email CVR is 3.48%, the Google Organic CVR is 1.43%, and the price comparison table CVR is 8.66%. The high CVR of direct visits could mean that users have been exposed to the business in the past and are therefore ready to complete a purchase directly.

5.1 Model evaluation

When linear regression models are used, diagnostics for the model are essential to check the validity of the model. Therefore, standardized residuals and fitted values were checked to determine the model assumptions. Then, we assessed multicollinearity and autocorrelation (Sheather, 2009).

Graph 2 shows that there are outliers in the model. To identify them, we use standardized residuals (Sheather, 2009). Observations with standardized residuals outside the range from −2 to 2 are considered outliers and were therefore excluded. The refitted model revealed that all three predictors were significantly associated with y. Graph 3, without the outliers, demonstrates a better distribution of fitted values versus residuals. Similarly, the Q–Q plot has improved.
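The outlier screen can be sketched in Python as follows, using the common rule of thumb that standardized residuals outside the interval from −2 to 2 flag outliers. The function names, residuals, and leverage values below are hypothetical and serve only to illustrate the computation; they are not taken from the study's data.

```python
import math
from typing import List, Sequence


def standardized_residuals(
    residuals: Sequence[float], leverages: Sequence[float], n_predictors: int
) -> List[float]:
    """Compute r_i = e_i / (s * sqrt(1 - h_ii)), with s the residual standard error."""
    n = len(residuals)
    dof = n - n_predictors - 1  # degrees of freedom, accounting for the intercept
    s = math.sqrt(sum(e * e for e in residuals) / dof)
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverages)]


def keep_mask(std_resid: Sequence[float], cutoff: float = 2.0) -> List[bool]:
    """True for observations to keep, False for those flagged as outliers."""
    return [abs(r) <= cutoff for r in std_resid]


# Hypothetical residuals from a one-predictor fit; they sum to zero, as OLS
# residuals do when the model has an intercept. Leverages are set equal for
# simplicity and sum to p + 1 = 2.
example_residuals = [-0.7, -0.6, -0.8, -0.5, 6.0, -0.9, -0.6, -0.7, -0.5, -0.7]
example_leverages = [0.2] * 10

std = standardized_residuals(example_residuals, example_leverages, n_predictors=1)
mask = keep_mask(std)
kept = [e for e, keep in zip(example_residuals, mask) if keep]
```

In this illustration only the observation with residual 6.0 exceeds the cutoff; the model would then be refitted on the remaining observations.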

Graph 2: Fitted vs. residuals and normal Q–Q plot, model 10

Graph 3: Fitted vs. residuals and normal Q–Q plot, model 10 without outliers

For multicollinearity, we use the variance inflation factor (VIF). The results were not concerning, with x1: 1.275175, x2: 1.092773, and x3: 1.276781, indicating no multicollinearity (Akinwande et al., 2015). Regarding autocorrelation, we use the Durbin–Watson test, which yields D-W: 1.546667 with a reported p value of 0. Because the statistic lies below 2 and the test rejects the null hypothesis of no autocorrelation, the residuals show positive first-order autocorrelation. To correct this, we apply the Cochrane–Orcutt transformation (Dufour et al., 1980). The Durbin–Watson statistic increased to 2.08401 with a p value of 0.6763, indicating no remaining evidence of autocorrelation in the model’s residuals.
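To make these diagnostics concrete, the Python sketch below shows the quantities behind them. It inverts the VIF definition, VIF = 1/(1 − R²), to recover the share of each predictor's variance explained by the other predictors, computes the Durbin–Watson statistic from a residual series, and applies the quasi-differencing step used in a Cochrane–Orcutt correction. The function names are assumptions made for illustration, and the relation DW ≈ 2(1 − ρ) is a standard approximation rather than the estimate the authors obtained.

```python
from typing import List, Sequence


def implied_r_squared(vif: float) -> float:
    """Invert VIF = 1 / (1 - R_j^2) to get the auxiliary-regression R_j^2."""
    return 1.0 - 1.0 / vif


def durbin_watson(residuals: Sequence[float]) -> float:
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e * e for e in residuals)
    return num / den


def quasi_difference(series: Sequence[float], rho: float) -> List[float]:
    """Cochrane-Orcutt style transform z*_t = z_t - rho * z_(t-1); drops the first point."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# VIF values reported in the text.
reported_vifs = {"x1": 1.275175, "x2": 1.092773, "x3": 1.276781}
aux_r2 = {name: implied_r_squared(v) for name, v in reported_vifs.items()}

# Approximate first-order autocorrelation implied by the reported DW statistic.
rho_from_dw = 1.0 - 1.546667 / 2.0

for name, value in aux_r2.items():
    print(f"{name}: auxiliary R-squared ~ {value:.3f}")
print(f"implied rho ~ {rho_from_dw:.3f}")
```

In a full Cochrane–Orcutt procedure, the response and every predictor would be passed through `quasi_difference` with an estimated ρ, the model refitted on the transformed series, and the steps repeated until ρ stabilizes.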

The next step is to understand whether the results are driven by seasonality. Seasonality presents a challenge in linear regression models because it introduces a systematic pattern of variation in the data (Chatfield & Xing, 2019). As Hyndman and Athanasopoulos (2018) highlighted in their book, incorporating time series in linear models is helpful. Although the time series linear model (TSLM) was not used for this model, we incorporated week into the model to address seasonality. This variable was added to the model not to improve the model but rather to confirm that the results were not driven by seasonality:

$$\textrm{Direct}\ \textrm{Purchases}={\beta}_0+{\beta}_1\textrm{Google}\ \textrm{Paid}+{\beta}_2\textrm{Price}\ \textrm{Comparison}+{\beta}_3\textrm{Email}+{\beta}_4\textrm{Weeks}+\varepsilon$$

Week 1 represents the first week of the dataset, and week 26 represents the last week of the dataset. This model reports a negative coefficient estimate for “Week Number” (−2.532899), which is statistically significant (p = 0.011106). This implies that direct purchases decrease by approximately 2.53 units for each additional week, holding the session variables constant. Therefore, seasonality does not positively impact y.
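One way to construct such a week variable is sketched below in Python. It assumes that weeks are counted as consecutive 7-day blocks starting from the first day of the dataset; the text does not state whether the authors used this scheme or calendar weeks, and the function name is hypothetical.

```python
def week_number(day_index: int) -> int:
    """Map a zero-based day index to a one-based week number (7-day blocks)."""
    return day_index // 7 + 1


# 180 daily observations, indexed 0..179.
weeks = [week_number(d) for d in range(180)]
first_week, last_week = weeks[0], weeks[-1]
days_in_last_week = weeks.count(last_week)
```

Under this assumption the 180 days span weeks 1 to 26, with the final week covering only the last five days.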

6 Discussion and conclusion

Our research contributes to three primary areas: academic literature, managerial practices, and marketing practitioners. From an academic perspective, our study bridges a gap in the current body of knowledge by exploring the relationship between web traffic from various digital channels and direct traffic purchases. This area has been relatively underexplored, despite direct traffic accounting for 55% of global web traffic (Bianchi, 2022) and showing a higher CVR than other web traffic sources (Holmes, 2023). This advances our understanding of how multiple marketing channels can contribute to direct traffic purchases, thereby extending the work of previous studies such as that of Awichanirost and Phumchusri (2020) and Milano et al. (2011), who focused on the effects of sessions on unique visitors and the influence of online social media, respectively.

Our study outlines the importance of a more holistic approach to understanding web purchases from specific channels for managers. Li and Kannan (2014) claim that data-driven models can capture correlations similar to those described by us. However, there are certain limitations to data-driven models, as noted by Kannan et al. (2016). Similar to the challenges Chagniot et al. (2020) raised for last-click attributions, we highlight the importance of an approach beyond silo thinking concerning various web channels. As a result, managers may allocate their digital marketing resources accordingly.

Finally, for marketing professionals, our work highlights an existing issue, revealing how specific digital channels might interrelate with other channels, affecting the outcome of direct traffic purchases. The relationship of specific channels with KPIs has been described by Kireyev et al. (2016), who examine the interplay between display ads and search, as well as by Plaza (2011) and Omidvar et al. (2011), who emphasize measuring website performance. None of this literature explores the relationship between website sessions and purchases from direct traffic sources. Therefore, we offer a roadmap for professionals to uncover that relationship for their media mix.

Our study has limitations inherent to using regression analysis for marketing research (Simon & Goes, 2013). Given the unique circumstances under which the relationships among variables occur, it is challenging to extrapolate theories from correlational findings. We also acknowledge the potential issue of directionality (Asamoah, 2014). However, our data show that most first-time visitors (over 90%) originate from various nondirect traffic sources, validating their role as independent variables. Finally, we recognize the potential influence of a third variable (Lovett & Staelin, 2016). Nevertheless, any change in sessions from a marketing channel is typically business-controlled, diminishing the likelihood of extraneous influences.

Future research should expand on our work. First, researchers should establish causal links between website sessions from various marketing channels and direct traffic purchases. This could be achieved by treating marketing investments as a treatment and then measuring the impact on direct purchases. Additionally, future work should seek to create a universal theory around our hypotheses. Such a theory would provide a broader, more holistic view of how diverse digital marketing channels influence direct traffic purchases and potentially other channels. Finally, future studies should explore relationships similar to the ones we discovered but within other marketing channels.

In summary, we discovered associations between the variables, and 61% of the variance was explained by three marketing channels (Google ads, price comparison engines, and email marketing). The predictors can be associated with 255 daily purchases from direct sources and could potentially impact purchases from direct traffic sources. Although future research is necessary to prove the causality of this relationship, our study contributes to understanding the complex relationships between digital marketing channels.