1 Introduction

Deploying targeted communication to reach very specific prospects and customers has always been one of the main challenges of marketing. This is particularly true for a business-to-business (B2B) context, where marketers must identify individuals who actually have decision-making authority in an organization with potentially thousands of employees. To execute their targeted branding and advertising campaigns, B2B companies have therefore progressively used digital channels: An eMarketer report (Ryan, 2020) estimates that B2B marketers spent $8.14 billion on digital ads in 2020, up 22.6% from the previous year.Footnote 1

The increase in digital ad spending in both B2B and business-to-consumer (B2C) settings has been fuelled by the emergence of ‘off-the-shelf’ segments that marketers can now buy from information aggregators, publishers and digital platforms to target a variety of customer audiences. For example, B2C marketers tend to purchase demographic, behavioral or intent data for distinct categories (e.g. ‘in market for cars’). In a similar way, B2B marketers can buy specifically tailored segments (e.g. ‘health-care professionals’ or ‘IT decision-makers’) derived from firmographics or synthesized name lists based on data partnerships. Collectively, audience segments of this type, created from multiple sources and provided by external parties, are referred to as ‘third-party data.’

While third-party data has created multiple new targeting opportunities for marketers and a flourishing market for data brokers, recent studies demonstrate that there is high performance heterogeneity across leading audience data vendors and networks, and that the average profile accuracy is often disappointing (Trusov et al., 2016; Neumann et al., 2019). Neumann et al. (2019) further recommend considering the odds of naturally finding the target audience in the population (without targeting), and show that the extra costs of buying third-party data are often not justified for the most common B2C segments.

However, despite an emerging literature stream on digital profiling and targeting, several critical questions remain unanswered. First, Neumann et al. (2019) focus on the most popular B2C segments (e.g. gender or ‘sports interest’) where the odds of finding the right individual by chance are still quite high (i.e. 50% for gender or 67% for sports interest, page 923). The cost-benefit trade-off for targeting in such scenarios may be unfavourable for third-party data usage, but most marketers will wonder: How does third-party data fare (economically) for more specific audiences that are rarely found in the population and harder to define?

Second, the existing profiling and data-targeting studies do not distinguish between different types of third-party data. Some vendors offer deterministic profile data, while others use probabilistic inference to create or enrich profile attributes (De Bruyn & Otter, 2022).

Third, while prior studies demonstrate considerable variance in data-targeting performance, little is known about why. What factors influence successful identification of individuals of interest beyond algorithmic inference issues? In particular, marketers need to know to what degree targeting outcomes are influenced by inaccurate [inference of] profiles versus correct profiles that were assigned to the wrong individual in the media buying process (De Bruyn & Otter, 2022).

Finally, all marketers – B2B and B2C – face the challenge of how best to communicate with the desired audiences when third-party targeting becomes less feasible due to the phasing out of third-party web cookies. This raises the question of which alternative targeting methods are the most promising in a privacy-centred world.

Our research addresses these four research gaps. We first use a large-scale field study in the United States to analyze the accurate reach of purely deterministic data segments versus (black-box) probabilistic data segments for a B2B context. Specifically, we focus on the tech sector — the largest and fastest growing B2B category in 2020 (Lebow, 2021) — and investigate the proportion of IT decision-makers (“ITDMs”) that can be reached when relying on third-party data targeting on a digital publisher network. To establish the efficacy of third-party targeting for context, we benchmark the performance results of two deterministic and three probabilistic data segments from leading data brokers with random prospecting on the same digital publisher network (our baseline without targeting).

Second, while we find that deterministic data segments perform marginally better than probabilistic data segments, our results show that both types of these third-party audiences do not lead to better outcomes than random prospecting for our context. Since deterministic data is not subject to problematic probabilistic inference as shown in prior studies (Trusov et al., 2016; De Bruyn & Otter, 2022), we also analyze our deterministic data in more detail. We find that essential information pieces of prospects, such as the job function or location, are not correct for the majority of people.

Third, we investigate what factors influence the correct digital identification of ITDMs and individuals in general. We present suggestive evidence that poor third-party data targeting can be linked to wrong cookie-identifier matching, which is a fundamental step when using deterministic data segments. We further show that the likelihood of identifying an ITDM depends on people’s background, firm size, and digital footprint, such as availability of online signals and use of devices.

Fourth, we use the results of our findings on factors affecting identification of ITDMs to examine the performance of potential alternative targeting strategies that would rely on first-party publisher network data. Such a strategy has the advantage of not relying on third-party cookies and is less prone to the identified issues around possibly obsolete or flawed deterministic data and cookie-matching. In particular, we show that targeting based on first-party demographic data and content-interest (contextual) information that publishers could gather themselves from consumers (with consent) outperforms specifically-designed third-party segments at finding ITDMs (both in terms of efficacy and efficiency).

Our work offers three main contributions. First, to our knowledge, this is one of the first papers to study digital advertising for targeting of B2B audiences, using one of the most popular B2B categories (information technology). In such situations, where the odds of reaching the right individuals preclude the use of mass marketing, managers are highly dependent on successful targeting and require guidance on the most promising digital strategies.

Second, our research contributes to the academic literature on data quality and success drivers of digital profiling and targeting (Murthi & Sarkar, 2003; Trusov et al., 2016; Lin & Misra, 2022). We present evidence for digital targeting issues beyond probabilistic inference and show that fragmentation bias does not only affect marketing measurement and attribution, but also branding and advertising campaigns (Lin & Misra, 2022). Our investigation into probabilistic and deterministic data segments, and into the people- and process-specific factors that generally affect targeting campaigns, offers practical guidelines for marketers and media planners.

Third, our research presents useful insights into a world of ad targeting without third-party cookies. Marketers and regulators require knowledge of effective alternatives to current targeting practices. The driving forces are the current privacy initiatives from leading technology companies, such as Apple’s Intelligent Tracking Prevention or the phasing out of third-party cookies by many internet browsers, which have created and continue to create an environment where the currently dominant targeting tactics based on third-party data become less and less feasible.

Our paper is structured as follows. We first review the literature on customer targeting and then outline our research method for investigating ITDMs. We then summarize the descriptives for our sample as well as the reach-accuracy analyses for our initial targeting field tests (Study 1). Next, we present an investigation of deterministic data accuracy using two additional studies. Study 2 validates the deterministic segment information using other available online sources, while Study 3 examines the effect of the profile-cookie matching process on campaign outcomes in another field test with profile onboarding services. In Study 4, we investigate which people and process factors help describe being an ITDM using correlational analysis. This is followed by our final test (Study 5) of alternative demographic and content-related targeting tactics based on publisher first-party data. We conclude by presenting a brief benefit-cost analysis and a discussion of why our results matter to marketing managers and policymakers, including possible limitations and future research directions.

2 Contribution to the existing literature

Targeting diverse customer groups with differential promotional and advertising elements has played a key role in tactical marketing plans for a long time. Therefore, it is no surprise that there is a multitude of studies documenting the benefits of targeting for marketing communications (Chen et al., 2001; Narayanan & Manchanda, 2006; Goldfarb, 2014). Following the early work on why marketers may want to use targeting for communication efforts, a literature stream on the efficiency of customer targeting and customized content related to digital marketing emerged (Bucklin & Sismeiro, 2003; Ansari & Mela, 2003; Park & Fader, 2004; Manchanda et al., 2006; Hauser et al., 2009). This body of work expanded subsequently into all aspects of technology-driven targeting, such as contextual targeting (Goldfarb & Tucker, 2011), behavioral targeting (Summers et al., 2016), keyword targeting (Li et al., 2016), mobile targeting (Luo et al., 2014; Andrews et al., 2016; Chen et al., 2017) and re-targeting (Lambrecht & Tucker, 2013; Johnson et al., 2017).

During the last decade, another wave of papers emerged, examining the specific pitfalls and opportunities of collecting and selling data based on consumers’ browsing behavior. Bergemann and Bonatti (2015) develop an analytical data pricing model for selling the information encoded in third-party web cookies. Trusov et al. (2016) demonstrate that consumer profiling can be biased because many ad networks and data aggregators only obtain a partial view of customers’ behavior across websites, leading to incorrect inferences about people’s true website visiting behavior. The authors develop a probabilistic method of using cookie data to improve user profiling via a bias correction, allowing online businesses to make profile predictions when limited information is available. In a similar vein, De Bruyn and Otter (2022) present a Bayesian profiling approach to augmenting customer segments based on aggregate data. Coey and Bailey (2016) and Lin and Misra (2022) focus on a different aspect of online data issues and shed light on the consequences of identity fragmentation for successful customer identification and the resulting bias. Miller and Skiera (2017) and Johnson et al. (2020) investigate the value of cookies for publishers. Neumann et al. (2019) examine the accuracy of consumer profile data and document a strong heterogeneity in data accuracy, even among leading data providers for consumer ad targeting.

The slowly growing literature on digital segments and consumer profiling has focused on consumer-related segments and broad business-to-consumer (B2C) contexts. However, marketers often wish to use data targeting to reach very specific audiences and find customers who can influence or make purchase decisions. This is particularly true for business-to-business (B2B) settings, as the odds of finding the right person (e.g. key business decision makers in organizations) naturally in the population tend to be much lower than the odds of finding people with general consumer attributes (e.g. ‘sports interest’).

The seminal studies of Trusov et al. (2016) and De Bruyn and Otter (2022) revolve around probabilistic inferences of profile attributes, but less is known about the performance of deterministic data segments as well as other potential error sources of profiling and targeting. Most importantly, to leverage any established profile attribute of a person for media campaigns, the attribute information needs to be matched to the right individual using some identifier (e.g. web cookies). Prior research has demonstrated the far-reaching consequences of consumer identity fragmentation for marketing attribution and incrementality analyses, a problem that has been termed ‘fragmentation bias’ (Lin & Misra, 2022).

Finally, all marketers need to be aware that the trend towards privacy protection has led to the phasing out of third-party cookies by many key internet browsers, such as Mozilla Firefox, Apple Safari and Brave. Google, whose browser ‘Chrome’ has a market share of more than 65% (Nagpal, 2021), will also cease supporting third-party tracking (Google, 2021b, a). As a result, many of the third-party data targeting tactics that marketers use today will not be feasible anymore.

In sum, essential questions about efficient digital advertising tactics still remain unanswered:

  1. How effective is third-party data targeting for audiences that are less common to find in the population?

  2. How effective is deterministic versus probabilistic data?

  3. What factors influence data targeting performance beyond probabilistic inference abilities?

  4. What are the most promising targeting tactics for a future without third-party cookies and limited customer tracking?

Our study addresses these four questions in five studies that build on each other. Specifically, our work focuses on IT decision-makers (ITDMs), a widely used audience for which many data vendors have created specific ‘off-the-shelf’ third-party segments that marketers can buy.Footnote 2 The popularity and availability of ITDM segments allow us to investigate comparable audience data from multiple vendors and for both deterministic and probabilistic data segments. In the next section, we outline our empirical focus and methodological details.

3 Study 1: examining third-party audiences for ITDM

The research objective of the first study is to shed light on the efficacy of ITDM third-party, ‘off-the-shelf’ audience segments for a typical online advertising campaign. To be able to conduct a field test in which we can examine ITDM audiences in a realistic setting, we collaborated with a globally leading IT-product and -service brand in the United States from September 2019 until April 2020. The campaign goal for our research context was to reach the appropriate B2B audience (i.e. ITDM) in the domestic market using digital media for branding and prospecting purposes.

3.1 Audience data types: deterministic versus probabilistic data

We selected five of 10 of the most reputable third-party B2B data brokers which offer specific ITDM audience data for the U.S. While we cannot reveal the identities of the data vendors – we will only refer to vendors A, B, C, D, E – it should be noted that the five selected firms are globally leading providers of B2B audience data, such as Oracle, Bombora, V12 Data, LiveRamp, DemandScience or 180bytwo.Footnote 3

Two of the selected data brokers in our study, vendors A and B, provided so-called deterministic data for ITDMs. Deterministic data clearly identifies a person, for example through names or email addresses, and is typically directly provided by consumers. In our context, we obtain detailed ITDM lists from the two brokers, including name, location, role, e-mail and/or cellphone information for about 24 million people (e.g. Joe Average, Software Procurement Specialist at Delta, 5 East Ave, NY 10014, J.Average@delta.com).

Because deterministic data refers to a person’s offline profile and includes some personal identifier (mobile number, email or name with postal addresses) but the ecosystem of media-buying relies on online identifiers (e.g. web cookies), one needs to link these two types of identifiers. In other words, media campaigns using deterministic-data segments also require an ‘onboarding’ process (also called ‘activation’) that matches the respective offline profile to web cookies or device identifiers that can be used to recognize the people of interest online. In our case, we want to be able to show ads to the users deemed as ITDM prospects once they visit a publisher website that is part of our campaign. Onboarding services are offered by companies with a large online reach and rich databases that allow linking different identifiers at scale. Often this onboarding is done via customer data or data management platforms (e.g. ZeoTap or Lotame), which have been designed to manage all marketing data for online campaigns and are integrated with demand side platforms (i.e., software to buy ads online, such as The Trade Desk). Many demand side and ad buying platforms also offer onboarding of offline data without the need to first use a specialised platform just for data ingestion purposes. For example, Google and Facebook have their own login data as well as a sufficient market reach that allow them to provide onboarding services for deterministic data in order to buy ads from their inventory using non-cookie identifiers. In addition, there are third-party onboarding solutions that are available across several platforms and just offer identity matching as a standalone service (e.g. Signal or LiveRamp). We will return to the specific influence of the onboarding process on accurate targeting in Study 3, but for now merely wish to mention this essential process step for online targeting based on deterministic data.

The other three data vendors (C, D, E) in our field test provided probabilistic ITDM audiences. In their cases, we did not obtain any personal information, but only the respective cookies that the three vendors mark as belonging to the ITDM segments. Probabilistic data is based on different signals (e.g. IP location, operating system or page views) that are synthesized from various online sources and analyzed using proprietary black-box methods. Cookies are then grouped into segments based on the likelihood that a user possesses a certain attribute. The exact classification method is considered a trade secret and normally not made public. We highlight that probabilistic segments could in theory have some deterministic data elements that were used to infer the desired attribute that is offered to marketers. However, the key characteristic of probabilistic data segments is that the attribute of interest to marketers is still statistically derived (e.g. a vendor obtains different pieces of information and infers, using machine learning or heuristics, that a user is male), while deterministic data segments do not require this attribute-creation step (e.g. someone indicated in a database to be male and a broker obtained access to this database).

3.2 Validation method and instrument design

Because of the campaign goal of branding among likely prospects, the key metric for our test campaign is ‘reach,’ that is, whether the target audience of ITDMs, and not someone else, is exposed to display ads on a specific website that is part of a media campaign. To quantify reach of the right customers, we measure the proportion of actual ITDMs among the sample of targeted web cookies for different targeting tactics we pursue in our digital campaigns.

To identify the proportion of ITDMs for each of the five purchased ITDM segments and a baseline, we use a survey design and user-reported data to find out about people’s ITDM responsibilities. This process is in line with previous work investigating the behavior and types of online users who were reached or are reachable in online campaigns (Trusov et al., 2016; Neumann et al., 2019). Self-reported data can be subject to errors, but using alternative digital metrics like click-through rates would not tell us whether the right individuals with IT responsibilities were reached. Any consumer may click on an ad, but that consumer might not have any influence or purchasing power in an organization, making that B2B ad unsuccessful.

The nature of a B2B setting, where multiple stakeholders tend to be involved, also requires a flexible definition of ITDMs, which is a vague label summarizing several business functions, often spread across different stakeholders. Therefore, we cannot employ a single binary measure that simply asks whether someone is an ‘IT decision-maker.’ Instead, we ask people three questions about possible IT responsibilities in their company. The three questions, developed in collaboration with IT consultants and operations specialists, cover the most essential functions of IT procurement:

  1. IT needs identification: “I identify needs for new IT products in my company/department.”

  2. IT vendor selection: “I select/shortlist vendors for IT purchases in my company.”

  3. IT contract responsibility: “I sign contracts/make financial decisions for IT purchases in my company.”

Since employees’ involvement in IT purchases is rarely a single job and often involves team work, we ask respondents to rate how much each of the three tasks is their responsibility, using a five-point scale (“This is my main responsibility/significant portion of my role/half of my role/minor portion of my role/not part of my role.”)

We also aim to examine factors that can influence the data quality of digital segments and describe our outcome of interest, which is why we collect data on several covariates. These cover basic demographic information (e.g., age tier, gender); work-related characteristics and firmographics (e.g., firm size, job role); and variables related to digital footprint (e.g., having a work laptop, online behavior, the browser linked to the cookie/used to complete the survey)Footnote 4 and content interest (which online topics someone reads). More information on the questionnaire can be found in Appendix A.

3.3 Field test methodology

In the case of niche or highly specific audiences (such as being an ITDM), it is not always possible to obtain the natural distribution of the audience characteristic in the population to establish a baseline comparison for reaching the desired people, as Neumann et al. (2019) did to assess the targeting effectiveness of data providers. Moreover, marketers can rarely reach the entire population of a country leveraging online media. Instead, marketers select a group of publishers and apps for an online campaign where they wish to show their ads. Thus, another plausible baseline for comparing the efficacy of any targeting tactic is simply the outcome that can be achieved randomly (without targeting) for a given set of publishers where one wishes to buy media placements. We can then compare the proportion of reached target users (ITDMs) across different audience data vendors for the same publisher group against the benchmark of using no targeting (i.e., not buying third-party audience data in addition to the media placements). This approach allows controlling for the unique online population of the selected publishers of the ad campaign, which is often likely to differ from the U.S. census data.

The chosen baseline for our field test campaign is a selection of websites and apps in a digital publisher network. The publisher partners include a diverse group of content providers of all sizes, such as Time Magazine, Design Home, basketball-GM.com, JoyBits and Glu Mobile. We highlight that such a true field test of audience targeting always allows only relative comparisons of efficacy in relation to the baseline (the natural reach that can be achieved by the chosen publishers for the media campaign). In other words, we benchmark our five ITDM targeting tactics by establishing the outcomes above and beyond what the network of publishers would achieve without buying additional ITDM data segments (which would also incur additional data costs).

To be able to validate the reach of the baseline (random prospecting) and the reach of ITDM audiences (vendors A to E) on the digital publisher network, we need an advanced technological setup that only works if two conditions are met. First, the publisher network needs to be able to recognize which cookie was tagged as an ITDM by any of the five vendors. To achieve this, the publisher network needs to be able to sync their own cookies with as many cookies of the ITDM data vendors as possible (the cookies serve as identifiers and must be matched). But since our digital test publisher network does not have any direct integration with the respective data brokers, the advertiser (our IT-product and service brand) needs to step in as an intermediary and add a tracking pixel of the digital publisher network to their campaign tracking code. This step allows the digital publisher network to sync their cookies with any cookies the advertiser can access during their media campaign. Next, the advertiser buys audience data from each ITDM vendor for as many websites as possible in an online advertising campaign. Thus, the advertiser helps our test publisher network identify the cookies that belong to the total pool of cookies or users that each of the five vendors can access and has deemed to be ITDMs.Footnote 5 We highlight that this initial cookie-tagging campaign is independent of the ultimate test campaign (i.e. targeting people on our selected digital publisher network) that we use for our reach validation. The latter is a sample of each data broker’s cookie pool across all their publishing partners across the U.S. That is, only those vendor-tagged users who first were reached in the initial advertising campaign and then visit the content providers of our chosen network will be eligible for our survey. This step ensures that we can establish the reach of each vendor only for our chosen publishers where we wish to run (and test) the ultimate ad campaign.
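The eligibility logic described above can be read as a mapping followed by a set intersection. The following is a minimal sketch under simplifying assumptions, not the actual setup used in the field test: the function name, the argument names and the idea of a single dictionary holding the pixel-sync matches are all hypothetical and chosen only for illustration.

```python
def eligible_for_survey(vendor_tagged, sync_table, network_visitors):
    """Return publisher-network cookie IDs eligible for the validation survey.

    All three arguments are hypothetical stand-ins for illustration:
      vendor_tagged    -- set of advertiser-side cookie IDs that a vendor marked
                          as ITDM and that were reached in the tagging campaign
      sync_table       -- dict mapping advertiser-side cookie IDs to
                          publisher-network cookie IDs (result of the pixel sync)
      network_visitors -- set of publisher-network cookie IDs seen on the
                          chosen publishers
    """
    # Keep only vendor-tagged cookies that could be synced to a network cookie.
    synced = {sync_table[c] for c in vendor_tagged if c in sync_table}
    # Of those, only users who later visit the chosen publishers are eligible.
    return synced & network_visitors
```

Under this reading, a vendor-tagged cookie that was never synced, or a synced user who never visits one of the chosen publishers, simply drops out of the validation sample.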

Second, the respective publishers of our validation campaign need to be able to show surveys to the users who visit the app or website. To be able to serve our validation survey to the tagged or random users of the digital publisher network, we rely on an online measurement company that is integrated with the backend of the network and uses ‘surveywall technology’ to collect user information. Surveywall technology enables finding the same user via web cookies across integrated publishers and works as follows.

The partners “lock” content which is only made accessible to visitors who complete a survey. Popular content locking options include ebooks, audio or article content, free Wifi-access, or an “ad-free” experience on the website. Users who were previously shown ads and hence were marked as ITDMs by data vendors were then shown our ITDM survey in the surveywall once these users visited a partner of the network.Footnote 6

In contrast to the five purchased ITDM segments, for our baseline group, we served our survey randomly to online users of the network partners. It should be noted that the integration of the measurement company with publishers means we can replicate what the ad-serving process would look like for a real publisher network without actually serving an ad. This mirrors some of the motivation behind the Ghost Ads methodology (Johnson et al., 2017), which similarly was focused on saving media costs. All survey answers were collected anonymously.

Table 1 Descriptive statistics of sampled online users and their digital behavior

3.4 Descriptive statistics of field test sample

We were able to collect 1,249 responses across our five ITDM segments and 600 responses for our baseline group via the publisher network. Table 1 summarizes the most important descriptive statistics for our sample, cross-tabulated for the six groups. First, we find that between 3.7% and 8% of the respondents are senior executives, between 9.5% and 16.2% are managers, between 7.1% and 11.9% are associates/analysts, between 2.1% and 7.1% have an entry-level position, between 1.2% and 7.1% are interns/casuals/part-time workers, between 1.8% and 11.0% are students and between 48.3% and 67.3% indicated that they were not employed at the time of the survey.

Overall, the sample distribution of job roles appears reasonable, but the high proportion of ‘currently not employed’ (including our baseline) seems to stand out. While we did not specifically track ‘retirees,’ examining age tiers for this group suggests that many respondents may be retirees. This seems to be at least the case for the five purchased B2B segments, where a large percentage is 65 or older.

In addition, it is likely that several respondents may not want to reveal information about their job function or seniority and select ‘currently not employed’ to protect their privacy in this regard.Footnote 7 We acknowledge that there is a potential survey response bias that is unavoidable for this type of research and campaign goal.

Since our main interest lies in benchmarking the relative reach of various digital ad targeting tactics (in comparison to the baseline), our conclusions about which targeting method performs best are unlikely to be completely explained by differences in possible survey response biases.Footnote 8

In terms of area of work, up to 9.9% of respondents indicated that they work in IT and between 15.4% and 25% in operations (for information on the other areas see Appendix B). For company sizes, we observe a relatively even distribution between small (1-99 employees), mid-sized (100-999 employees) and large companies (>999 employees). Likewise, our sample has a fair gender balance, with men representing 30.9%-48.7% across all six groups. This small skew toward women in our sample could be because women are typically more often online (Lambrecht & Tucker, 2019). As mentioned earlier, in terms of age, the five purchased ITDM segments (A-E) skew older, with more than half of the sample being 55 or older (vs. 36.2% for the baseline). We speculate that this age group pattern is an artifact of the objective to find our desired audience of business decision-makers, as commercial responsibility normally increases with seniority.

Regarding their digital footprint and online behavior, we find most respondents (54.5%-76.2%) use the internet or social media daily, around 24.8%-28.6% shop online daily and around 39.2%-48.1% play mobile games daily.Footnote 9 Our browser statistics, which are tracked and not elicited, suggest that between 7.1% and 28.4% appear to use anti-tracking browsers for the ITDM segments versus 49.8% for the baseline group.Footnote 10 Anti-tracking browsers would be Mozilla, Safari or other browsers which do not allow third-party cookies.Footnote 11 The observed browser differences therefore provide evidence that the baseline group consists of random online users who can be reached through the first-party cookies of the digital publishers. We further find that 15.5%-42.1% have a laptop provided for work, up to 25.0% use their private laptop for work, up to 20.9% have a mobile phone provided for work and 12.5%-30.7% use their private mobile phone for work. This results in up to 7.4% of people having two laptops and 3.3% having two mobiles they use for work. Between 12.7% and 31.2% also indicate that they do not spend much time online for work. Finally, 15%-22.3% read business content every week and 10%-16.3% read business content (for information on other content interests, see Appendix B).

3.5 Results: reach analysis of ITDM segments and random prospecting

Next we analyze how our five segments and the baseline differ in their reach metrics for ITDMs. Table 2 summarizes the descriptive statistics for our three outcome metrics: IT needs identification, vendor selection, and contract signing. We can see that across all five examined segments, over 83.5% of respondents have no IT product or purchase responsibilities as indicated by our three measured questions. All of these users were associated with cookies that were marked and sold as ITDM to advertisers. The two deterministic data segments based on prospect lists (A,B) perform slightly better than the other three probabilistic data segments (C, D, E), but still barely better than the baseline alternative.

Table 2 Summary of outcome measures for IT decision-making responsibilities

To shed more light on our specific challenge to find ITDMs and not people with related job functions, we also generated two ITDM classifications for each user based on consumers’ stated IT responsibilities: 1) We formally define an ITDM as a person who has at least some responsibility for any of the three key IT operations functions we measured (even if only a minor portion of the job role). 2) In addition, we distinguish an ITDM from a ‘major ITDM’ (MITDM). An MITDM is a person who has at least some responsibility for all three key functions (even if only a minor portion of the job role).
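The two definitions can be read as an "any of three" versus "all of three" rule over the measured responsibilities. The following is a minimal sketch of that reading; the class and field names are hypothetical labels for the three survey questions, not the survey's actual variable names.

```python
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    # Hypothetical field names: True means "at least some responsibility".
    needs_identification: bool
    vendor_selection: bool
    contract_signing: bool


def _flags(response: SurveyResponse) -> tuple:
    return (
        response.needs_identification,
        response.vendor_selection,
        response.contract_signing,
    )


def is_itdm(response: SurveyResponse) -> bool:
    """ITDM: some responsibility for ANY of the three functions."""
    return any(_flags(response))


def is_mitdm(response: SurveyResponse) -> bool:
    """MITDM: some responsibility for ALL three functions."""
    return all(_flags(response))


example = SurveyResponse(needs_identification=True, vendor_selection=False, contract_signing=False)
print(is_itdm(example), is_mitdm(example))
```

Under this reading every MITDM is also an ITDM, which is consistent with the MITDM proportions always being the smaller of the two.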

Table 2 summarizes the results of our field studies for these two classifications. We find that the three probabilistic data segments have only 9.4%-11.0% ITDMs and 4.8%-6.4% MITDMs in their samples, which is lower than the proportions in our baseline group (16% and 6.8% for ITDMs and MITDMs, respectively). The two deterministic data segments perform better, with ITDM proportions of 18.2% and 16.1% and MITDM proportions of 8.7% and 11.1% (vendor A and B, respectively).

To examine whether the observed differences in sample proportions for ITDMs and MITDMs are statistically significant, we conduct two-tailed Fisher’s exact tests (FET). We find statistically significant differences for ITDM proportions across our five purchased B2B segments (p=0.005, FET), but not for MITDMs (p=0.315, FET). Furthermore, we do not find significant differences in ITDM or MITDM proportions among the three probabilistic-data vendors (p=0.884 and p=0.920, respectively, FET) and the deterministic-data vendors (p=0.757 and p=0.531, respectively, FET). We therefore pool our reach numbers across vendors and segment types into “Baseline”, “Probabilistic data segments” (vendors C,D,E), “Deterministic data segments” (vendors A,B) and “Segment average without baseline” (vendors A,B,C,D,E). Figure 1 summarizes the average reach results for the four segment types, including confidence intervals. Comparing all ITDM segments (five segments pooled) with the baseline does not result in any statistically significant differences for our two ITDM classifications (ITDM and MITDM). When comparing the deterministic with the probabilistic data segments, we find that deterministic outperforms probabilistic for both ITDMs and MITDMs (p<0.001 and p=0.053, respectively, FET). However, the proportion comparison between deterministic data segments and our publisher-network baseline again yields no significant differences (p=0.366 for ITDM and p=0.1738 for MITDM, FET). Likewise, we find no differences between probabilistic data segments and the baseline for MITDMs (p=0.558, FET), while the former even result in lower ITDM proportions than the baseline (p=0.005, FET). In Appendix C, we show that these results also hold for the original three outcome variables and are independent of our chosen ITDM/MITDM definition.
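For readers unfamiliar with the test, the following is a minimal sketch of a two-tailed Fisher’s exact test on a 2x2 table (segment by ITDM status), written with the standard library only. The counts in the usage lines are hypothetical and chosen purely for illustration; they are not the counts behind the p-values reported above.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x hits in row 1, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as likely as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical counts: 30 ITDMs among 300 respondents in one segment
# versus 48 ITDMs among 300 respondents in another.
p_value = fisher_exact_two_sided(30, 270, 48, 252)
print(f"two-sided p = {p_value:.4f}")
```

The "at most as likely" rule used here is one common definition of the two-sided p-value; statistical packages may differ slightly in how they treat ties.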

Fig. 1 Average reach results and confidence intervals for different B2B targeting

Overall, our findings reveal that relying on deterministic data segments is more effective than using probabilistic data segments for identifying ITDMs via digital ads. Nevertheless, even the deterministic data segments are not statistically better at reaching our target market than our baseline segment based on random serving. Is this finding unique to our chosen publisher network, which could have a higher share of business readers than other publishers or the U.S. population?

We have not found an official statistic for the number of ITDMs in the US population. However, we can easily obtain a plausible estimate for existing ITDMs to understand how different our publisher network partner may be from the adult population of our study context. According to the latest Census data, there are about 32.6 million businesses in the U.S.,Footnote 12 each of which must have at least one person responsible for any IT purchases. Given the adult population is about 258.3 million, this would correspond to a lower bound of 12.6%. This is well in the range of our two ITDM definitions, being 6.8% and 16.0%.
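As a back-of-the-envelope check, the lower bound is simply the ratio of the two figures quoted above. It implicitly assumes exactly one responsible person per business and that no person is responsible for more than one business:

$$\begin{aligned} \frac{32.6 \text{ million businesses}}{258.3 \text{ million adults}} \approx 0.126 = 12.6\% \end{aligned}$$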

While our type of validation field test only allows interpreting relative results as it depends on the publishers where a campaign is assessed, we can see that our results appear to be in line with the reach we would expect for random user selection.

4 Examining error sources of deterministic data segments

The finding that the average probabilistic (black-box) ITDM segment yields unsatisfactory performance expands the findings of Neumann et al. (2019) from B2C to a B2B (and similar) context where it is very challenging to find the right customer by chance. However, while results are slightly better when using deterministic data segments, we find a disappointingly low proportion of actual ITDMs in our field tests for this type of data too.

Two critical factors mainly influence the success of finding the right customers for deterministic data: 1) the accuracy of the actual profile information of the synthesized lists and 2) the accurate matching of the purchased profiles to web cookies against which media can be bought online. Without access to the back end of the data-broker process, it is not possible to examine exactly how much each factor contributes to a specific campaign. However, we subsequently present two further empirical tests to investigate whether each of the two factors is problematic.

4.1 Study 2: validating deterministic profile information

For the first test, we validate the profile information that is the foundation of the deterministic data segments. This test can also be seen as a robustness check for the survey-based reach results in Study 1 for the two deterministic data segments. In our case, we have the detailed profile information (i.e. names, email/cellphone number, location and job role) for each person from two different data vendors. To examine the accuracy of each piece of information, we took a random sample of 884 people from vendor A and 470 people from vendor B from the same media campaign lists that were used for the digital campaigns.

To identify whether a person could be found online, we first used a Google Search. This was followed by searches on LinkedIn, ZoomInfo, Rocket Reach and Signal Hire if we could not find the name of a listed person in Google. We then used LinkedIn to investigate the location and job function of matched people.Footnote 13 We find name and company matches of 67.5%-70.9%, while 20.2%-24.8% of the search queries result in a person match with a different company entry (Fig. 2). Next we checked the job function/provided title as well as the location of the profiled individuals of the deterministic data lists. We find that only 16.3%-17.0% of the job titles and 21.5%-26.1% of the location information resulted in a match. We can only speculate about the underlying reasons, one of which could simply be outdated data. Some errors could also stem from the use of IP addresses for tracking; these may be incorrect because companies use firewalls or because some private information may be confounded.

Independent of the underlying reason, our deterministic profile information validation suggests a significant level of inaccuracy, which seems fairly constant across the two vendors. Of particular concern to our goal of reaching ITDMs are the likely incorrect job functions: The average match rate of 16.3%-17.0% is close to the 16.1%-18.2% ITDM proportion uncovered by our survey results for deterministic data segments (Table 2).

Fig. 2 Deterministic profile information validation (mean and confidence intervals)

4.2 Study 3: profile-cookie matching tests

The profile information of deterministic data is only one possible source of error for digital campaigns. As described earlier, any deterministic profile also needs to be matched to other identifiers of publishers or ad networks to be able to buy media against it. This matching process, which is carried out by some onboarding service provider (see Section 3.3), is another potential source of errors for online targeting using deterministic data segments. Specifically, identity fragmentation bias (Lin & Misra, 2022), the fact that a single user has multiple devices and may use different browsers, could create challenges in matching the right person to identifiers. Put differently, the quality of matching different identifiers depends on not linking the wrong pieces together. Otherwise the wrong people are reached even though the original attribute was correct.

We next carry out a field test to explore whether the onboarding process itself could have contributed to the poor reach results of our deterministic data segments in our ITDM branding campaign. For privacy reasons, we cannot contact a web cookie through our publisher surveywall and validate the identities of the targeted cookies by asking: “Are you Joe Average from Chicago?” However, most platforms providing onboarding services for media targeting allow uploading user lists (in hashed format) and return the match rates from their database. We carry out two different match tests using these features and our ITDM prospect lists from the US.

For the first test, we upload 100,000 profiles (Name, Location, e-mail and, if available, cellphone number) to four selected platforms: Google, Facebook and two leading onboarding services that are typically used for third-party data targeting.Footnote 14 We then examine in a second test what happens when we upload and intend to match the same 100,000 profiles but this time the uploaded lists lack a clear identifier and leave open the option of multiple matches to real people. To achieve this, we remove the address and mobile number, while we add five digits to the username of the email addresses (e.g. Jane.Doe@work.com to Jane.Doe99987@work.com). We remind the reader that the email address will be hashed such that the onboarding platform will not be able to recognize any patterns in our modified emails and should be unlikely to find any matches in their database using emails as key identifier (unless someone actually uses the email ‘Jane.Doe99987@work.com’). If there is no e-mail match between the uploaded list and the onboarder’s database, then the only way to provide some match would be just using the name (which should be ambiguous). The results of this test using modified lists are summarized in the top rows of Table 3. We find that Google matches 12.3%, Facebook 20.0%, ‘onboarder 1’ 75.2% and ‘onboarder 2’ 55.3% of the original profiles. For the modified profile data, Google, Facebook and ‘onboarder 2’ report less than 1% matches, while ‘onboarder 1’ finds 74.0% matches.
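The reason a hashed, modified email should not match can be illustrated with a short sketch. It assumes that the platform normalizes emails (trimming and lowercasing) and hashes them with SHA-256. Both are assumptions made for illustration, since the text above does not specify the hashing scheme used by the four platforms.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize an email (trim, lowercase) and return its SHA-256 hex digest.

    The normalization and the choice of SHA-256 are illustrative assumptions.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


original = "Jane.Doe@work.com"
modified = "Jane.Doe99987@work.com"  # five digits appended to the username

h_original = hash_email(original)
h_modified = hash_email(modified)

# The digests differ, so a lookup keyed on the hashed email cannot link the
# modified record to the original person.
print("hashes equal:", h_original == h_modified)
# Share of hex positions that coincide by chance; no pattern survives hashing.
same = sum(x == y for x, y in zip(h_original, h_modified))
print(f"{same} of {len(h_original)} hex characters coincide")
```

Because a cryptographic hash scrambles even a small change in the input, any match reported for the modified list would have to come from another field, such as the name.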

Next we repeat the same two-step procedure using 100,000 profiles based on the company’s global customer relationship management (CRM) system. We additionally remove the last name for the modified data to further increase match difficulty and lower the odds of finding a single correct match. The results are summarized in the bottom rows in Table 3. We find that Google now reports 44.0% matches, Facebook 62.0%, ‘onboarder 1’ 79.2% and ‘onboarder 2’ 76.5% of the actual profiles. For the modified masked profiles (where we created wrong/fake emails), Google and Facebook report less than 1% matches again, while ‘onboarder 1’ and ‘onboarder 2’ still find 22.5% and 3.8% matches, respectively. Thus, the global CRM test agrees with, and confirms a similar pattern as, our test on US ITDM prospect lists. While Google and Facebook show the results we would expect (our modified data results in virtually no matches), the two onboarders still report some matches. Between the two onboarders we benchmarked, we also find strong differences in the match rates for the modified data. It is concerning that ‘onboarder 1’ still reports a large percentage of matches for profiles that should not be linkable to one single person. While this is only suggestive evidence that the user profiles are likely to be often mismatched in a campaign relying on some onboarding services, we see that the matchmaking process is subject to errors and, depending on the chosen provider, will contribute to poor digital targeting results.

Table 3 Summary of profile matching tests

5 Study 4: what features help finding IT decision-makers?

Study 1 illustrates that the custom ITDM segments are no more helpful for finding the desired audience than random prospecting from a digital publisher network. This result raises the question of how to best reach ITDMs via digital channels. Some covariates we measured in our survey can be used as robustness checks for our ITDM outcome metric (e.g. being a senior executive), while others are typical customer characteristics, some of which companies could access or buy for targeting purposes too (e.g. demographics). We next explore which features help find online users who are ITDMs. For this analysis, we first focus on demographic and firmographic features and then on digital behavior covariates.

5.1 Demographic and firmographic features

We investigate the associations of classic demographic and firmographic features with the likelihood of an online user being an (M)ITDM using the entire sample of 1,849 consumers for these analyses. Since we have individual data on each person i and whether someone is an (M)ITDM, we perform logistic regressions that determine the probability of being an (M)ITDM for our measured features:

$$\begin{aligned} Probability([M]ITDM_i \mid x_{i}) = \frac{\exp (x'_{i}\beta )}{1+\exp (x'_{i} \beta )} , \end{aligned}$$
(1)
$$\begin{aligned} x'_{i}\beta = \beta _0 + Demographics_{i} \beta _A + Firmographics_{i} \beta _B, \end{aligned}$$
(2)

where \(\beta _0\) is the model’s constant and \(Demographics_{i}\) is a row vector of fixed effects for different demographic consumer attributes, such as ‘age 18-25’ or ‘gender,’ while the row vector \(Firmographics_{i}\) captures fixed effects about the person’s background with respect to the job and company, such as firm size, job role and industry. We then estimate five different models with different groups of variables, four reduced model specifications and one full model (with all covariates included), using ‘ITDM’ and ‘MITDM’ as the dependent variable, respectively.
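To make Eqs. 1 and 2 concrete, the following minimal sketch evaluates the logistic probability for a dummy-coded profile. The feature names and coefficient values are hypothetical placeholders chosen for illustration; they are not estimates from Table 4.

```python
import math


def itdm_probability(beta0: float, betas: dict, x: dict) -> float:
    """Eq. 1: exp(x'b) / (1 + exp(x'b)), with x'b built as in Eq. 2."""
    # Features missing from x are treated as 0 (the reference category).
    linear_index = beta0 + sum(betas[name] * x.get(name, 0) for name in betas)
    return math.exp(linear_index) / (1.0 + math.exp(linear_index))


# Hypothetical coefficients, NOT the estimates reported in the paper.
beta0 = -2.0
betas = {"age_55_64": 0.5, "male": 0.4, "small_firm": 0.6, "works_in_it": 1.2}

reference = itdm_probability(beta0, betas, {})
profile = itdm_probability(beta0, betas, {"male": 1, "works_in_it": 1})
odds_ratio_it = math.exp(betas["works_in_it"])

print(f"reference category: {reference:.3f}")
print(f"male, works in IT:  {profile:.3f}")
print(f"odds ratio for working in IT: {odds_ratio_it:.2f}")
```

Exponentiating a coefficient gives the odds ratio for that fixed effect relative to its reference category, which is the usual way to read such logit estimates.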

Table 4 summarizes the results of our demographic and firmographic features for our ten estimated models. First, we find that working in IT/operations (columns 1 and 6) and being a senior executive or manager (columns 2 and 7) significantly increase the likelihood of being either an ITDM or MITDM. We interpret these results, which show the theoretically expected positive association, as a robustness and data validity check.

We next focus on age and gender. We find that being older or a man is associated with a greater likelihood of being an ITDM or MITDM (columns 3, 8). In particular, being 35-64 (55-64) years old significantly increases the odds of being an ITDM or MITDM. This finding appears logical, as corporate responsibilities tend to grow with seniority. This effect disappears for people over 64, probably because of retirement. Further evidence is available in Appendix E. While not shown here, logit models suggest no significant association between gender and being a senior executive, but a significant association between being male and being a manager in our data. Hence, the gender effect may reflect the well-documented disparity across gender in IT (Lambrecht & Tucker, 2019) as well as some potential gender bias in being promoted to manager roles. Finally, we find that people working for small and mid-sized companies are more likely to be an (M)ITDM than those working for large firms (columns 4 and 9). This finding appears plausible, given that the odds of being a person with ITDM responsibilities are simply smaller at larger firms. If we include all the features in a model (columns 5 and 10), gender, firm size, work function (IT/operations) and seniority (manager/executive) are the covariates with the most precise associations with being an (M)ITDM.

Table 4 Firmographic and demographic features of IT decision-makers

5.2 Digital behavior characteristics

Next we investigate the associations between behavioral characteristics regarding web-surfing and technology preferences and the likelihood of an online user being an (M)ITDM. We again perform logistic regressions (in line with Eq. 1), but this time we regress the probability of being an (M)ITDM on the following features:

$$\begin{aligned} x'_{i}\beta = \beta _{00} + DeviceCharacteristics_{i} \beta _C + OnlineBehavior_{i} \beta _D, \end{aligned}$$
(3)

where \(\beta _{00}\) is the model’s constant and \(DeviceCharacteristics_{i}\) is a row vector of fixed effects for different device characteristics, such as the types of PCs, browsers and mobiles used by an individual, whereas the row vector \(OnlineBehavior_{i}\) pertains to a user’s online behavior, such as the types of content that someone reads, or their indicated frequency of key activities (e.g. mobile gaming or social media usage). We again estimate five different models with different groups of variables, four reduced model specifications and one full model (with all covariates included) for ITDM and MITDM as dependent variables, respectively.

Table 5 Digital behavior characteristics of IT decision-makers

Table 5 summarizes our digital behavior characteristics for our ten models. We find first that using an anti-tracking browser has a significant and negative correlation with the likelihood of an online user being an MITDM (column 6), which does not hold when all covariates enter the model (column 10). We further test an interaction effect of using an anti-tracking browser and an indicator for probabilistic data segments. Probabilistic data segments have a main (correlational) effect with a negative sign, although this effect only reaches statistical significance for one ITDM analysis (column 1). This result suggests that probabilistic data segments are not always worse than random prospecting or deterministic data segments but that the classification likelihood strongly depends on the characteristics of the online user.

Interestingly, we find a negative interaction effect of probabilistic data segments and anti-tracking browser, which reaches statistical significance for ITDMs only (see columns 1 and 5).Footnote 15 This finding suggests that tracking challenges affect probabilistic data segments more strongly regarding the likelihood of identifying an ITDM. This is expected, given that the creation of probabilistic data segments heavily relies on available online signals from third-party cookies that must be enabled in web browsers.

In addition, we find that having a personal laptop or phone that is used for work, or a specifically provided work phone or laptop, increases the likelihood of online users being an ITDM or MITDM (columns 2, 7). If people have two devices (phones or laptops), then we find a negative association, which is in line with the concept of identity fragmentation bias (Lin & Misra, 2022). That is, online users who use multiple devices are more likely to be classified incorrectly. This effect only reaches statistical significance for the ITDM analyses (columns 1 and 5). The finding that many effects have the expected sign but do not reach a high precision in the estimates may be linked to the reduced power given by the limited number of respondents who classify as MITDM and are online users exposed to probabilistic targeted ads within our publisher network.

When looking at general online behavior (columns 3, 8), daily online shopping/daily internet use is positively correlated with being an (M)ITDM. This finding could be a behavioral characteristic or be linked to the available digital footprint needed to build user profiles. Being an (M)ITDM is negatively correlated with daily social media use or mobile gaming, although these effects only reach significance for the models just looking at behavioral variables for daily mobile gaming (columns 3 and 8) and three of four models for daily social media usage (columns 5, 8 and 10). Speculatively, key decision-makers plausibly lack the time to engage daily in mobile gaming or social media. Reading business content weekly is positively correlated with being an (M)ITDM (columns 4, 9); reading travel or technology content is positively correlated with being an ITDM. When including all variables, having or using a laptop/phone for work, daily online shopping and reading business content appear to be the most precise behavioral descriptive covariates.Footnote 16

6 Study 5: alternative targeting tactics

Our analysis of customer features describing the association with ITDM likelihood provides some possible alternative characteristics that could be used for digital targeting. Of course, not every firmographic proxy or customer covariate for our target audience can be bought at scale from data vendors or easily be collected. However, our analyses suggest two other types of targeting criteria that publishers can use to build their own segments and then offer them to potential advertisers: (1) demographics (such as age and gender) and (2) an interest in content such as technology or business. Similar to our random cookie selection for our baseline benchmark, we can use the digital publisher network integration of our measurement company to examine how using the two criteria (demographics and content-interest) would fare as a targeting mechanism for an ITDM campaign.

Moreover, we use a first-party data targeting approach for our next field test. We target only cookies whose self-reported data meet our alternative targeting criteria.Footnote 17 Because our measurement company is fully integrated with the websites and apps of the digital publisher network, it can collect data on visitors and can later recognize and select/target users with certain criteria (provided they do not delete their cookies). In other words, for our study we can again mimic the targeting process in the publisher network without actually serving an ad to the user behind the cookie that meets our criterion (but show our validation survey instead of an ad). This approach, in which a publisher collects information directly from its own users and then builds segments based on this information, can be carried out by any publisher. Such a first-party data approach reduces risks of poor data quality found in third-party demographic or interest-based data that were bought from data aggregators (Neumann et al., 2019).

6.1 First-party demographic and content-interest targeting for ITDMs

To investigate the efficacy of the two identified targeting alternatives in reaching ITDMs, we conduct a new field test. We target only cookies with an age tier of 45-64 years of age and who are maleFootnote 18 in one test and those who indicate they regularly read business or technology content in another test.

Our additional field test resulted in survey responses from a sample of 333 online users for demographic targeting and 103 for content interest. We again compare the proportions of (M)ITDMs with the results from our previous tests (see Fig. 3). We find that targeting by age/gender results in an average of 23.4%/16.2% ITDMs/MITDMs, whereas selecting users by technology- or business-content interest leads to 28.2%/14.6% and 41.8%/22.8% ITDMs/ MITDMs, respectively. Chi-squared tests suggest that these proportion differences in frequencies of observed (M)ITDMs are all statistically significant (see Table 6).
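As an illustration of such a proportion comparison, the sketch below computes a Pearson chi-squared test for a 2x2 table with the standard library only. The counts are back-calculated from rounded percentages (23.4% of the 333 demographic-targeting respondents, and 16.0% of a baseline group of 600), so they are approximations introduced here for illustration; the output should not be expected to reproduce Table 6.

```python
import math


def chi2_2x2(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple:
    """Pearson chi-squared statistic (no continuity correction) and p-value, 1 d.f."""
    a, b = hits_a, n_a - hits_a
    c, d = hits_b, n_b - hits_b
    n = n_a + n_b
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Approximate counts reconstructed from rounded percentages (assumption):
# 78 ITDMs of 333 (demographic targeting) vs. 96 ITDMs of 600 (baseline).
stat, p_value = chi2_2x2(78, 333, 96, 600)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")
```

The same function can be applied to any pair of targeting methods once the underlying counts are known.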

The field test suggests that these two alternative targeting tactics to reach ITDMs outperform the probabilistic data segments, random prospecting and the deterministic data segments. We carry out two robustness checks to validate the findings using different samples.

Fig. 3 Reach comparison for different targeting methods (means and confidence intervals)

Table 6 ITDM reach for demographic and content-interest targeting

6.2 Robustness checks for alternative targeting results

As a first robustness check, we repeat our analysis of how targeting alternatives fare using the random sample of 600 people that made up the baseline group. We can mimic our targeting tactics around age/gender and content interest by filtering only for those users who would meet the criteria of being male and 45-64 years of age or, alternatively, of reading business/technology content. The results for this second, independent sample are summarized in Fig. 4 and illustrate again that demographic and content-interest targeting based on the publisher’s own data lead to greater ITDM and MITDM reach.

Fig. 4 Targeting methods replication for random baseline group (means and confidence intervals)

For our second robustness check, we replicate our analysis of how targeting alternatives fare in comparison to probabilistic and deterministic data segments as well as the baseline using only the subsample of respondents who were employed (n=710). The results are summarized in Fig. 5 and confirm the previous findings about the superior performance of demographic and content-interest targeting based on publisher first-party data.

Fig. 5 Reach comparison for different targeting methods and the sample of ‘employed’ (means and confidence intervals)

7 Cost-benefit analysis

In Study 5, we have shown that alternative targeting approaches outperform both the publisher network baseline and purchasable ‘off-the-shelf’ ITDM segments, including deterministic data segments, in reaching ITDM. A remaining question is whether the revealed performance differences are captured by the market prices for audiences and which segment provides the best benefit-cost ratio. We therefore compare the data costs that are directly associated with buying that segment relative to their benefits (Table 7). These costs are typically added as cost per mille (CPM) to a media campaign in addition to media and technology costs. Because our baseline in all our tests represents the propensity to find ITDM across our publisher network and not a purchasable data segment, we cannot use this as our benchmark for the benefit-cost analysis. Instead, we use the probabilistic ITDM segment, which is also the most common type of ITDM audiences bought by advertisers, as our anchor for comparing benefit-cost ratios of our targeting types. We synthesized typical CPM ranges for our four types of targeting based on external sources (Nylen, 2018), a leading data management platform and the input from the media team from the IT service provider. Probabilistic ITDM data costs range from US$1.50-3.00, deterministic data from $2.00-3.00, content-interest/ contextual data from $0.29-1.33 and demographic first-party data from $1.00-1.50. Using the middle values of the range for our back-of-the-envelope estimation results in a CPM of $2.25 for the probabilistic data segments, a $2.75 CPM for deterministic data segments, 81 cents for contextual/content-interest targeting and $2.50 for demographic first-party data (for two attributes each with a middle value of $1.25). We then estimate different multipliers for CPM costs and (M)ITDM using the results from Table 3. 
This procedure leads to a benefit-cost ratio (with third-party probabilistic as anchor = 1) of 9.35 (8.66) for contextual/content-interest targeting, and 1.42 (1.23) for deterministic data targeting to reach ITDM (MITDM). Hence, considering both market costs and reach from our study, we find that contextual/content-interest targeting is roughly nine times more (cost) efficient than the probabilistic ITDM segments used as the anchor, and also clearly more efficient than the deterministic segments. Demographic first-party data targeting is also about twice as (cost) efficient as the ITDM segments. While we chose probabilistic data segments as a basis for our analysis, we highlight that, given the cost differences, the same conclusion can be drawn when using a deterministic data segment as an anchor for the calculations.
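One plausible reading of this procedure is that the benefit-cost ratio divides a reach multiplier by a cost multiplier, both taken relative to the anchor segment. The sketch below follows that reading. The CPM midpoints are those quoted above, while the probabilistic reach of 10% is only a placeholder inside the reported 9.4%-11.0% range, so the result approximates rather than reproduces Table 7.

```python
def benefit_cost_ratio(reach: float, cpm: float, anchor_reach: float, anchor_cpm: float) -> float:
    """Reach multiplier divided by cost multiplier, relative to the anchor segment."""
    reach_multiplier = reach / anchor_reach
    cost_multiplier = cpm / anchor_cpm
    return reach_multiplier / cost_multiplier


# CPM midpoints quoted in the text (US$).
CPM = {"probabilistic": 2.25, "deterministic": 2.75, "content_interest": 0.81, "demographic": 2.50}

# Illustrative ITDM reach shares. The demographic value is the 23.4% reported
# in Study 5; the probabilistic value is an ASSUMED placeholder, not the
# pooled figure used for Table 7.
REACH = {"probabilistic": 0.10, "demographic": 0.234}

ratio = benefit_cost_ratio(
    REACH["demographic"], CPM["demographic"],
    REACH["probabilistic"], CPM["probabilistic"],
)
print(f"demographic vs. probabilistic anchor: {ratio:.2f}")
```

By construction the anchor segment has a ratio of 1, and a ratio above 1 means more ITDM reach per dollar of data cost than the anchor.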

Table 7 Benefit-cost analysis for ITDM and MITDM targeting options

8 Discussion and conclusion

In this paper, we investigate in five studies how best to reach IT decision-makers (ITDMs) for digital advertising campaigns when the campaign goal is prospecting or branding for a very specific audience. We benchmark several targeting tactics that marketers can use for this objective of reaching the ‘right’ customers.

For our context, we find that deterministic data segments outperform probabilistic data segments, but are not significantly better than the random prospecting baseline from our digital publisher network. Moreover, all our probabilistic data segments performed worse than the baseline comparison from the publisher network in our tests. Thus our findings expand the work of Neumann et al. (2019) on third-party (probabilistic black-box) targeting from common B2C segments to one of the most important B2B segments, where the odds of naturally finding the right customer in the population tend to be low. Furthermore, to the best of our knowledge, our study presents the first evidence that even deterministic data segments are problematic and may not perform better than using no targeting (random prospecting).

After investigating what factors help find an ITDM online, we conduct further benchmarking tests to see how alternative targeting tactics based on first-party publisher data would fare in comparison to the third-party ‘off-the-shelf’ ITDM segments. We demonstrate that such publisher-data targeting based on age tiers and gender or based on technology-/business-content interest outperforms the random baseline and the custom, specifically-created segments (both deterministic and probabilistic) at reaching ITDMs. Our findings appear robust across all (parametric and nonparametric) analyses we carried out. When further considering data costs, we find that content-interest targeting, which can be regarded as a form of contextual targeting, appears to be not only the most effective but also the most (cost-)efficient tactic to reach ITDMs via digital advertising.

What explains the poor results for the ITDM segments in our tests? In total, we document three possible mechanisms that account for our results. First, we find evidence of identity fragmentation bias (Lin & Misra, 2022), which can affect profiling in two ways. On the one hand, original user profiles may be matched to the wrong individual in the media-buying process. This part of fragmentation bias strongly affects deterministic data segments, which must be matched (=onboarded) to identifiers of websites (typically web cookies) to be able to execute targeted online ad campaigns. On the other hand, it is also possible that fragmentation bias affects providers of probabilistic data segments. Users with multiple devices or browsers may generate mixed or incomplete signals if a data vendor does not obtain a unified view of a single person (Trusov et al., 2016). As a result of missing critical signals of individuals or incorrectly assuming signals belong to the same person, a data vendor applying probabilistic methods to customers’ online browsing behavior may make wrong inferences about a profile attribute.

Second, we show that being an ITDM depends on the digital footprint users leave and their online behavior. For example, some people use browsers that do not allow tracking, which strongly affects the performance of probabilistic data segments. We also find that browsing online or shopping online daily correlates with the likelihood of being an ITDM. This finding may be partly due to such behavior being characteristic of ITDMs and partly due to these users providing more online signals, which allow their behavior or background to be identified or modeled more accurately.Footnote 19

Third, we find low match rates for many pieces of information from the deterministic data segments (in particular, location and job function) when validating these with other well-known databases based on first-party data (e.g. LinkedIn). This finding suggests that the provided contact details may be outdated or incorrectly associated. Incorrect user-specific information primarily affects providers of deterministic data segments.

Our findings on the most effective and efficient digital targeting tactics for reaching ITDMs have important implications for marketing practice. Mozilla Firefox and Apple’s Safari have already made policy changes that disable third-party cookies by default. Google has announced that it will phase out third-party tracking in the Chrome browser in 2024 (Gonzales, 2022). Thus, many current targeting solutions that rely on third-party cookies are unlikely to be feasible options in the future. Using deterministic data segments may still be possible with individual platforms, but it is controversial in terms of privacy considerations, can be expensive and is still subject to several issues, such as incorrect information or fragmentation bias. In contrast, we show that content-interest (contextual) targeting performs well and is also relatively less intrusive. Demographic targeting based on first-party data collected with consent may also be more compliant with existing consent-based privacy regimes. Given our results on the varying performance of onboarding platforms, marketers are also advised to perform due diligence on the exact matching methods used and to carry out their own tests.

Our study has limitations. First, we investigate targeting tactics to reach IT decision-makers through digital advertising. It would be interesting to see how other very specific segments fare. Second, we use self-reported data from surveywalls. We acknowledge that this data may suffer from inaccuracies and be subject to typical survey response biases that are often unavoidable. Given the nature of our research question and the complexity of B2B purchase processes, there is no obvious viable single metric that can be used as a source of ground truth. Moreover, robustness checks and our different complementary research methods suggest that our findings and conclusions are unlikely to be explained by survey response bias. Third, our tests for targeting alternatives mimicked a targeting procedure in the field by filtering and selecting respective profiles within the publisher network of our campaign. While the respective users were online and could have been served an actual ad, the publisher network did not offer either of the two tested targeting tactics when our study was conducted.Footnote 20 However, the data stems from users of a digital network of popular content providers and publishers, which could easily collect such data to build and sell the described segments using our approach. Consumers may provide the required data with consent to their publishers of interest in exchange for access to content. Content-interest targeting, the most efficient tactic in our tests, is also often referred to as contextual targeting. Publishers can easily create such segments from the browsing history of the first-party cookies or signed-in users on their platform. We can see publishers adopting this strategy.

For example, the New York Times and News Corp report that they have successfully created their own audience segments using their first-party data (including collected survey data) or by building a partner network (Newscorp Press Release, 2019; NYT Open Team, 2020).

One of the most crucial findings in our study relates to the impact of onboarders’ performance when third- or second-party data needs to be matched with the different identifiers of media sellers. This process step seems prone to errors due to fragmentation bias. In contrast, audience information generated by publishers (i.e., first-party data segments) is less likely to suffer from errors caused by linking profiles to the wrong person, because the identifiers used to create the audience information and to leverage it for media buying are the same. In other words, as long as advertisers rely on first-party data from the publisher [network] where they run their targeted campaigns, additional onboarding or identifier matching between different networks is not necessary.