Introduction

Since Arrow’s [1] seminal work, uncertainty and information asymmetry have been recognized as two main features of the health care market. Uncertainty arises mainly in relation to the incidence of illness and the effectiveness of treatments provided by physicians. Patients generally possess limited information, relative to physicians, about their own health conditions, treatments provided by physicians and their likely benefits, physicians’ quality, and other treatment options [2, 3]. This information asymmetry allows physicians to act as agents for their patients and creates economic incentives for physicians to induce demand for their medical services so long as it brings extra benefits to them even when the costs outweigh patients’ benefits [4, 5].

While uncertainty and information asymmetry still characterize current health care markets,Footnote 1 the rapid diffusion of the Internet has allowed individuals to access a large variety of health information at practically no monetary cost and low opportunity (time) cost [6], which has potentially had diverse impacts on the demand for health care. On the one hand, the extended access to health information can help patients to improve their position in the relationship with physicians and to evade physician-induced demand. It can also act as a cost-effective substitute for medical information from physicians. On the other hand, it reduces uncertainties concerning patients’ health investment decisions and thus increases health care demand for risk-averse patients. Health information can also raise the demand for health care if it enhances patients’ appreciation of the benefits from health care or raises their concerns about their health status. Due to these opposing impacts, the net effects of increased availability of online health information on the demand for health care are ambiguous.

Despite the importance of information in the health care market, empirical evidence on the effects of online health information on the demand for health care is scarce and mixed, with recent studies examining the effects of Internet-based health information reporting either positive or non-significant impacts on health care demand [7,8,9]. This paper empirically examines the relationship between Internet-based health information and the demand for health care, specifically the demand for physician services, and contributes to this line of the literature in two important ways. First, while previous empirical studies have mainly focused on the United States and on periods before the extensive diffusion of the Internet, this study examines data collected from the 28 European Union states in 2014, a period after the extensive diffusion of the Internet with ubiquitous access through smartphones. Second, while the media complementarity hypothesis [10] suggests that individuals seeking health information online tend to seek comparable information on other, conventional (non-Internet) media, previous studies have not considered the potential impacts of concurrent seeking of offline health information when examining the effect of online health information on health care demand. Thus, it is not clear how much of the reported impact is attributable to information obtained from the Internet and how much to information from other, conventional media. To address this issue, we explore potential differences between online and offline health information in their association with the demand for physician services by distinguishing individuals who seek health information exclusively from offline sources from those who seek it online and potentially offline too.

Our results indicate that individuals’ health status and sociodemographic factors affect online and offline health information seeking patterns in a similar manner. Furthermore, while the demand for physician services is significantly and positively associated with offline health information seeking, it has no significant association with online health information seeking, indicating that the net association with online health information seeking, after controlling for potential concurrent offline health information seeking, would be even weaker. This result stands in stark contrast to previous studies reporting a significantly positive effect of online health information. Our results suggest that fostering the availability of health information on the Internet will extend online health information seeking to a wider population, yet this will be associated with little or no change in the demand for physician services in the short term. However, it can reinforce the unequal distribution of health information and create even greater variation in individuals’ health management skills and their health care benefits in the long term.

The rest of the paper is organized as follows. The next section discusses the roles of health information in health care investment decisions and reviews previous empirical findings on the effects of health information seeking on the demand for health care, with particular emphasis on those focusing on health information from the Internet. The third and fourth sections describe the data and introduce three econometric models that measure the differential effects of online and offline health information seeking on health care demand. The fifth section presents the results of the empirical analysis. The sixth section concludes the paper with a synopsis of the results.

Effects of health information on health care demand

Conceptual framework

Theoretically, a rational individual will determine the optimal level of health care investment by equating the marginal benefits and marginal costs of health care. However, in the absence of perfect information, individuals do not fully appreciate the true benefits of health care. Health information helps them to make better health care investment decisions by reducing the uncertainty of their returns. It can also alter an individual’s preferences and enhance his/her appreciation of good health relative to other goods. Acquiring new health information, however, incurs monetary and non-monetary (time) costs. Thus, an individual determines the optimal level of health information seeking by equating the marginal benefits and costs of seeking health information from various sources. This decision is made simultaneously with the health care investment decision, as the benefits of health information emanate at least partly from enhancing the benefits of health care.

The rapid diffusion of the Internet has facilitated fast access to health information at reduced cost and multiplied the volume of available information [11, 12]. This cost reduction can promote health information seeking among individuals with low expected benefits from the acquired information, which potentially creates two opposing impacts on the demand for health care. First, health information increases the demand for health care as it reduces uncertainty about health care benefits and enhances consumers’ appreciation of health and wellbeing. For example, acquiring health information prior to visiting physicians can enhance patients’ benefits from their visits [13, 14]. Ultimately, this could lead patients to create excess demand for medical services from their physicians [15].Footnote 2 Health information can also act as a complement to the services provided by medical professionals. For example, patients with insufficient medical knowledge might seek physicians’ support to understand information obtained from the Internet or other media. In another case, patients might use the Internet to gain further insights about the diagnoses and treatments recommended by physicians [6, 16]. In both cases, the cost reduction in health information will increase the demand for both health information and physician services.

Second, additional health information can decrease the demand for health care if it improves patients’ position in their relationship with physicians and thus allows them to evade physician-induced demand. Health information can also act as a substitute for medical services provided by physicians when patients face high monetary or opportunity costs of gathering information from physicians [6]. Such would be the case for uninsured individuals and those with high travel costs to their usual health center. With these opposing impacts, the net effects of increased availability of online health information on the demand for health care are ambiguous and vary among individuals due to the heterogeneity in the costs and benefits of the acquired information.

Empirical work on health information’s impacts on health care demand

Health economists have long tried to empirically estimate the effects of health information on the demand for health care and determine whether positive or negative effects prevail. Empirical research has shown mixed evidence. For example, using direct measures of health information based on individuals’ health knowledge, Kenkel [17] reports that consumer health information increases the probability of physician visits while it has no significant effect on the number of visits. Hsieh and Lin [2] confirmed the positive effect on the demand for preventive care among elders. Dwyer and Liu [18] also found a positive effect, but they identified weak yet negative effects on the use of physician services and emergency rooms among individuals with low trust of doctors. Contrary to these results, Schmid [3] reports a significant negative effect.

More recently, a few studies have investigated how health information from the Internet affects the demand for health care. Lee [8] reports that online health information seeking had positive and unidirectional impacts on the demand for two types of medical services: medical information and physical treatment. Suziedelyte [9] confirms the positive effect of online health information seeking on the number of physician visits. In contrast, Beck et al. [7] report survey results in which 88.6% of young online health information seekers self-report that the use of the Internet for health matters has not changed the frequency of their medical consultations.Footnote 3

The inconsistency of previously reported results on the effect of health information on the demand for health care can be attributed to several factors, two of which are particularly relevant to our analysis. First, previous studies analyzed data from early stages of the diffusion of Internet services, when collecting the intended information was still costly due to the high cost of Internet connection and the limited availability of reliable health information. This high cost possibly limited online health information seeking to individuals expecting high benefits from the intended information, leading previous studies to report positive impacts of online information on the demand for health care. Second, the media complementarity hypothesis [10] suggests that individuals seeking information on the Internet also seek comparable information on other, conventional media. The hypothesis has received strong empirical support for the case of health information [21,22,23,24,25].Footnote 4 However, no studies have explicitly considered the effects of offline health information seeking on the demand for health care when examining those of online information seeking. The subsequent sections develop models to address these limitations, analyzing data from a period after the extensive diffusion of the Internet, in particular of broadband services.

Data

Data analyzed in this study are sourced from the Flash Eurobarometer 404 “European Citizens’ Digital Health Literacy” coordinated by the European Commission [28].Footnote 5 The survey collected data from 26,566 individuals aged 15 years and older about their use of Internet resources to manage their health. It was conducted through phone interview in the 28 Member States of the European Union over September 18–20, 2014.Footnote 6 Data include detailed information about respondents’ frequencies of health information search and physician visits, their health conditions, and sociodemographic features.

Table 1 lists the variables used in the analysis and provides their descriptive statistics. The frequency of health information seeking in the last 12 months is originally measured in the questionnaire on an ordinal scale from 1 (never) to 6 (once a week or more often). Importantly, the survey distinguishes between Internet-based and offline health information resources.Footnote 7 However, while the frequency of online health information search (HIi) was asked of all sampled individuals, that of offline search (HIni) was asked only of individuals who never sought health information online. The empirical models developed in the subsequent section address potential sample selection bias resulting from this survey design. As an alternative measure, we also construct a multinomial variable to classify respondents into three groups according to their health information seeking: (i) non-seekers, individuals who do not seek health information either offline or online (\(HIS = 0\)); (ii) online health information seekers, who seek health information online and do not reveal their offline health information seeking (they might or might not seek health information offline) (\(HIS = 1\)); and (iii) offline health information seekers, who only seek health information offline and not online (\(HIS = 2\)). For the second key variable, the demand for health care is proxied by the frequency of physician visits in the last 12 months (PV), which is originally measured in the questionnaire on an ordinal scale from 1 (never) to 4 (6 times or more). This is a conventional measure in the literature.
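The three-group classification described above can be illustrated with a short sketch. The function name `classify_his` and the use of `None` for an unasked offline item are illustrative assumptions, not part of the survey or of our estimation code; the sketch only encodes the rule that category 1 on the six-point scale means "never" and that HIni is observed only when HIi equals 1.

```python
from typing import Optional

NEVER = 1  # lowest category of the 1-6 frequency scale ("never")


def classify_his(hii: int, hini: Optional[int]) -> int:
    """Return HIS: 0 = non-seeker, 1 = online seeker, 2 = offline-only seeker.

    hii  : frequency of online search (1-6), asked of every respondent.
    hini : frequency of offline search (1-6), asked only when hii == 1;
           pass None when the item was not asked (assumed encoding).
    """
    if not 1 <= hii <= 6:
        raise ValueError("HIi must be on the 1-6 scale")
    if hii > NEVER:
        # Online seekers: their offline seeking is not observed.
        return 1
    if hini is None or not 1 <= hini <= 6:
        raise ValueError("HIni (1-6) is required when HIi == 1")
    return 2 if hini > NEVER else 0


print([classify_his(1, 1), classify_his(4, None), classify_his(1, 3)])  # [0, 1, 2]
```

Note that group 1 is deliberately coarse: it cannot separate respondents who search only online from those who search both online and offline.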

Table 1 Description of variables

For factors influencing the demand for health care and health information, three variables relating to individuals’ health conditions are considered: respondents’ self-assessed health status (Health), coded in four ordered categories from 1 (very bad) to 4 (very good); a binary variable indicating whether the respondent suffers from a long-term health issue (LONGILL); and the frequency of physical activity (Sport), coded in six ordered categories from 1 (never) to 6 (5 times a week or more). Respondents’ self-assessed health measures might not accurately reflect their true health condition. However, they are considered to be relevant for individuals’ health information seeking and health investment decisions, and thus are commonly used in the literature [2, 3, 17]. For all three variables, better health is expected to relate inversely to the frequency of physician visits and health information seeking, as, a priori, individuals in poor health are expected to receive higher benefits from health care services and health information than those in better health [6]. Nonetheless, the inverse health information law posits that the use of health information relates inversely to the needs of the population [7, 29, 30]. Hone et al. [31] and Vicente and Madden [32] provide empirical support for this law, reporting that individuals in poorer health seek health information online less frequently than those in better health, due to their lower ability to search online.

Respondents’ sociodemographic factors are included in both the health information search and physician visitation equations: age (AGE), gender (FEMALE), employment status (Employment) in three categories (employed, unemployed, inactive), household size (ONEADULT), type of locality (Area) in three categories (rural, small, large), and country of residence (Country). These variables are commonly considered in the literature as determinants of health information seeking and health care demand. In particular, individuals who are younger, married, and female tend to seek health information more frequently on the Internet (see for example, [6, 13, 33,34,35]).

As for the factors determining the demand for health care, previous evidence suggests a positive relationship between availability of health care service and health care demand. In particular, high physician density implies more competition which could strengthen incentives for physician-induced demand and lower health care costs [3, 18, 36]. Thus, data on the density of medical doctors per resident at NUTS2 level (DENSITY) are collected from Eurostat [37].

In the literature, the price of medical services is often approximated by patients’ health insurance coverage [17]. Unfortunately, our dataset provides no information about respondents’ insurance status. The exclusion of the price variable might raise a concern about potential omitted variable bias. However, such bias would be negligible for the case of the European Union where public health coverage is practically universal.Footnote 8 Furthermore, our regression equations include country-specific effects as well as other variables such as employment status, density of medical doctors, and size of residence area, which together will capture the remaining cross-country variation in health care coverage and differences in travel and time cost of visiting a physician.

As for the demand for health information, unequivocal evidence exists that education is among the most important determinants of individuals’ health knowledge and health information demand [2, 3, 17, 18, 39, 40]. Health information on the Internet in particular varies substantially in quality and reliability; thus, its benefits depend largely on individuals’ previous experience and ability to assess information quality [6, 13, 32]. Education improves such ability and lowers learning and searching costs. Thus, we include respondents’ educational attainment (Education), measured on a five-point ordinal scale, in our models of online and offline health information seeking.Footnote 9

Both online and offline health information seeking can also be affected by Internet accessibility. Availability of a high-speed Internet connection reduces the cost of acquiring health information online relative to other sources and hence encourages online seeking while potentially discouraging offline seeking. For example, Costa-Font et al. [13] report a positive correlation between the frequency of online health information search and Internet penetration. Accordingly, our empirical models of online and offline health information seeking include the regional (NUTS2) Internet penetration rate (INTERNET), measured as the percentage of households with Internet access.Footnote 10

Empirical model

The extended sample selection model

Empirical models to examine the effect of health information seeking on the demand for health care need to address two issues. The first is sample selection: the Eurobarometer survey collected data on offline health information search only for individuals who never sought health information online. This survey design creates a setting similar to Heckman’s [42] sample selection model; yet the dependent variables in the sample selection and activity equations (respondents’ online and offline health information search, respectively) are both measured on ordinal scales in our setting. An ordinal dependent variable in the activity equation makes the model non-linear, for which the standard two-step approach proposed by Heckman [42] yields biased and inconsistent estimates. Greene and Hensher [43] suggest estimating selection and activity equations jointly by the method of maximum likelihood when the two equations are specified as binary and ordered probit models, respectively. Their approach is extended directly to our case where the two equations form a bivariate ordered probit model.

The second issue, as commonly considered by previous empirical studies, is the endogeneity of health information variables in the health care demand equation [17, 18]. This issue arises for two reasons. First, online and offline health information seeking and physician visitation are treated as simultaneously determined in our models. This treatment is consistent with the way the data were recorded: the survey collected data on the frequencies of health information seeking and physician visits over the same twelve-month period, and not on the detailed sequence in which these activities were conducted over that period. Second, health information seeking and physician visitation are potentially affected by unobserved common factors such as individuals’ preferences for health, health care, time, and risk [6, 17, 18]. For example, an individual who values good health is likely to exhibit high demand for both health information and health services. Also, an individual with a high time cost might search for health information as a substitute for the comparable services provided by physicians. Risk-averse individuals might consume more health services and information than risk lovers to reduce their health risk; yet, it is also possible that they perceive physician-provided services as more (or less) reliable than self-collected health information.

To address the endogeneity issue, we estimate each of the online and offline health information search equations simultaneously with the physician visitation equation. Thus, for the selection equation, a system of two ordered probit equations is specified for the frequencies of online health information search (HIi) and physician visitation (PV):Footnote 11

$$\begin{aligned} HIi_{i}^{*} =\, & {\mathbf{X}}_{i} \beta_{11} + {\mathbf{Z}}_{1,i} \beta_{12} + \varepsilon_{1,i} \\ PV_{i}^{*} =\, & {\mathbf{X}}_{i} \beta_{31} + {\mathbf{Z}}_{3,i} \beta_{32} + \beta_{33} HIi_{i}^{*} + \varepsilon_{3,i} \\ \end{aligned}$$
(1)

where the observed ordered categorical variable \(HIi_{i}\) for individual i takes a value \(j_{1}\) (\(j_{1} = 1,...,6\)) if the associated latent variable \(HIi_{i}^{*}\) takes a value within the thresholds (\(c_{{1,j_{1} - 1}} < HIi_{i}^{*} < c_{{1,j_{1} }}\)) with \(c_{1,0} = - \infty\) and \(c_{1,6} = + \infty\). Similarly, \(PV_{i} = j_{3}\,(j_{3} = 1,...,4)\) if \(c_{{3,j_{3} - 1}} < PV_{i}^{*} < c_{{3,j_{3} }}\) with \(c_{3,0} = - \infty\) and \(c_{3,4} = + \infty\). The vector \({\mathbf{X}}_{i}\) represents the set of covariates that affect the two latent variables, \(HIi_{i}^{*}\) and \(PV_{i}^{*}\), whereas \({\mathbf{Z}}_{1,i}\) and \({\mathbf{Z}}_{3,i}\) represent the set of covariates that uniquely affect online health information search and physician visitation, respectively. The inclusion of the vector \({\mathbf{Z}}_{1,i}\) assures the identification of the model.Footnote 12
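To make the observation rule concrete, the following sketch maps a latent value to its ordered category and computes the category probabilities implied by a single ordered probit equation with a standard normal error. The threshold and index values are hypothetical numbers chosen for illustration, not estimates from this study, and the helper names are assumptions.

```python
import math
from bisect import bisect_left
from typing import List, Sequence


def norm_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def observed_category(latent: float, cuts: Sequence[float]) -> int:
    """Return j (1-based) such that c_{j-1} < latent < c_j.

    `cuts` holds only the interior thresholds c_1 < ... < c_{J-1};
    c_0 = -inf and c_J = +inf are implicit.
    """
    return bisect_left(cuts, latent) + 1


def category_probs(index: float, cuts: Sequence[float]) -> List[float]:
    """P(y = j) = Phi(c_j - index) - Phi(c_{j-1} - index) for j = 1..J."""
    cdf = [0.0] + [norm_cdf(c - index) for c in cuts] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


pv_cuts = [-0.5, 0.4, 1.2]              # hypothetical thresholds for the 4-category PV
print(observed_category(0.7, pv_cuts))  # 3
print(category_probs(0.3, pv_cuts))     # four probabilities that sum to one
```

In the bivariate system (1), the two equations are linked through correlated errors and the latent regressor, so the joint likelihood uses bivariate rather than univariate normal probabilities; the sketch above covers only the univariate building block.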

For individuals not seeking health information online (HIi = 1), the outcome equations are formed by two ordered probit equations, corresponding to the frequencies of offline health information search (HIni) and physician visitation as follows:

$$\begin{aligned} \left. {HIni_{i}^{*} } \right|_{{HIi_{i} = 1}} \,=\, & {\mathbf{X}}_{i} \beta_{21} + {\mathbf{Z}}_{2,i} \beta_{22} + \varepsilon_{2,i} , \\ \left. {PV_{i}^{*} } \right|_{{HIi_{i} = 1}}\, =\, & {\mathbf{X}}_{i} \beta_{41} + {\mathbf{Z}}_{3,i} \beta_{42} + \beta_{43} HIni_{i}^{*} + \varepsilon_{4,i} , \\ \end{aligned}$$
(2)

where \(HIni_{i} = j_{2}\,(j_{2} = 1,...,6)\) if \(c_{{2,j_{2} - 1}} < HIni_{i}^{*} < c_{{2,j_{2} }}\) and \(PV_{i} = j_{4}\,(j_{4} = 1,...,4)\) if \(c_{{4,j_{4} - 1}} < PV_{i}^{*} < c_{{4,j_{4} }}\) with \(c_{2,0} = c_{4,0} = - \infty\) and \(c_{2,6} = c_{4,4} = + \infty\). The vector \({\mathbf{Z}}_{2,i}\) contains the set of covariates that are unique to the offline health information search equation.

In Eqs. (1) and (2), the matrix of common exogenous variables includes respondents’ sociodemographic features and health status X = {AGE, FEMALE, ONEADULT, Employment, Area, Country, Health, Sport, LONGILL}. It does not include individuals’ educational attainment (Education). This is standard practice in the literature [2, 3, 17, 18] and presumes that education may affect health care demand indirectly through respondents’ propensity for health-oriented practices and health information seeking. These indirect effects are controlled for by including, on the right-hand side of the physician visitation equation, covariates such as health conditions, sport and physical activity (as measures of respondents’ past and current health-oriented practices), employment status, and health information seeking.Footnote 13 The equation-specific factors include \({\mathbf{Z}}_{1} =\){Education, INTERNET}, \({\mathbf{Z}}_{2} =\){Education}, and \({\mathbf{Z}}_{3} =\){DENSITY}.

The vector of error terms is assumed to be distributed as follows:Footnote 14

$$\left( {\begin{array}{*{20}c} {\varepsilon_{1} } \\ {\varepsilon_{2} } \\ {\varepsilon_{3} } \\ {\varepsilon_{4} } \\ \end{array} } \right)\sim N\left( {\left( {\begin{array}{*{20}c} 0 \\ 0 \\ 0 \\ 0 \\ \end{array} } \right),\left( {\begin{array}{*{20}c} 1 & {\rho_{12} } & {\rho_{13} } & {\rho_{14} } \\ {\rho_{12} } & 1 & 0 & {\rho_{24} } \\ {\rho_{13} } & 0 & 1 & 0 \\ {\rho_{14} } & {\rho_{24} } & 0 & 1 \\ \end{array} } \right)} \right) = N(0,{{\varvec{\Sigma}}})$$
(3)

Under this normality assumption, the coefficients in the two bivariate ordered probit models are estimated jointly by the method of maximum likelihood, using the likelihood function derived in the Appendix.
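As a small numerical aid, the sketch below assembles the correlation matrix in (3) from four free correlations, keeping the zero restrictions on \(\rho_{23}\) and \(\rho_{34}\), and checks that a candidate parameter vector yields a valid (positive definite) matrix. The function names and the trial values are illustrative assumptions; this is not the estimation routine used for the results reported here.

```python
import math
from typing import List

Matrix = List[List[float]]


def build_sigma(r12: float, r13: float, r14: float, r24: float) -> Matrix:
    """Correlation matrix of (eps1, eps2, eps3, eps4) with rho23 = rho34 = 0."""
    return [
        [1.0, r12, r13, r14],
        [r12, 1.0, 0.0, r24],
        [r13, 0.0, 1.0, 0.0],
        [r14, r24, 0.0, 1.0],
    ]


def is_positive_definite(m: Matrix) -> bool:
    """Try a Cholesky factorization of a symmetric matrix.

    The factorization succeeds exactly when the matrix is positive
    definite; only the lower triangle of `m` is read.
    """
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


print(is_positive_definite(build_sigma(0.3, 0.2, 0.1, 0.25)))  # True
print(is_positive_definite(build_sigma(0.9, 0.9, 0.0, 0.0)))   # False
```

A check of this kind is useful inside a likelihood maximizer because not every combination of correlations in (-1, 1) produces an admissible matrix, as the second call shows.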

The specification in (1) and (2) allows the coefficients for X and \({\mathbf{Z}}_{3}\) in the physician visitation equations to differ between seekers and non-seekers of online health information. Alternatively, a more restrictive specification sets these coefficients to be identical between the two groups, \(\beta_{31} = \beta_{41}\) and \(\beta_{32} = \beta_{42}\), which simplifies the physician visitation equations in (1) and (2) as follows:

$$PV_{i}^{*}\, = \,{\mathbf{X}}_{i} \beta_{31} + {\mathbf{Z}}_{3,i} \beta_{32} + I(HIi_{i} > 1)\left( {\beta_{33} HIi_{i}^{*} + \varepsilon_{3,i} } \right) + I(HIi_{i} = 1)\left( {\beta_{43} HIni_{i}^{*} + \varepsilon_{4,i} } \right),$$
(4)

where I(\(\cdot\)) is the indicator variable that takes a value 1 if its argument is true and 0 otherwise.
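The regime switching in (4) can be read as follows: every respondent shares the same index \({\mathbf{X}}_{i} \beta_{31} + {\mathbf{Z}}_{3,i} \beta_{32}\), and the indicator selects which information term and error enter. The sketch below spells this out; the function and argument names are hypothetical and the numbers are arbitrary illustrations.

```python
def pv_latent_restricted(
    common_index: float,  # X_i * beta_31 + Z_{3,i} * beta_32, shared by both groups
    hii: int,             # observed online search category (1 = never)
    hii_star: float,      # latent online search propensity
    hini_star: float,     # latent offline search propensity
    beta33: float,
    beta43: float,
    eps3: float,
    eps4: float,
) -> float:
    """Latent physician-visit propensity under the restricted specification (4)."""
    if hii > 1:
        # Online seekers: I(HIi > 1) = 1, so the online term and eps3 enter.
        return common_index + beta33 * hii_star + eps3
    # Non-seekers online: I(HIi = 1) = 1, so the offline term and eps4 enter.
    return common_index + beta43 * hini_star + eps4


online = pv_latent_restricted(0.5, 3, 1.0, 9.9, 0.25, 0.5, 0.0, 0.0)
offline = pv_latent_restricted(0.5, 1, 9.9, 2.0, 0.25, 0.5, 0.0, 0.0)
print(online, offline)  # 0.75 1.5
```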

Alternative models for robustness checks

We also consider two alternative models to assess the robustness of the estimates of the extended sample selection model. The first alternative specifies a bivariate ordered probit model for the frequencies of online health information search and physician visitation, without modelling offline health information search, as follows:

$$\begin{aligned} HIi_{i}^{*} \,=\, & {\mathbf{X}}_{i} \alpha_{11} + {\mathbf{Z}}_{1,i} \alpha_{12} + v_{1,i} \\ PV_{i}^{*} \,=\, & {\mathbf{X}}_{i} \alpha_{21} + {\mathbf{Z}}_{3,i} \alpha_{22} + \alpha_{23} HIi_{i}^{*} + v_{2,i} \\ \end{aligned}$$
(5)

Unlike the extended sample selection models specified in (1) through (4), model (5) does not address the potential effects of concurrent seeking of offline health information on physician visitation. This is the approach commonly used by previous studies [8, 9]. The two equations in (5) are jointly estimated by the method of maximum likelihood.

For the second alternative, we construct a two-equation model, using a multinomial measure of respondents’ health information seeking activity (HIS):

$$\begin{aligned} HIS_{i} \,=\, & {\mathbf{X}}_{i} \gamma_{11} + {\mathbf{Z}}_{2,i} \gamma_{12} + u_{1,i} , \\ PV_{i}^{*} \,=\, & {\mathbf{X}}_{i} \gamma_{21} + {\mathbf{Z}}_{3,i} \gamma_{22} + \gamma_{23} I(HIS_{i} = 1) + \gamma_{24} I(HIS_{i} = 2) + u_{2,i} , \\ \end{aligned}$$
(6)

where the health information search and physician visitation equations are modelled as a multinomial probit and an ordered probit model, respectively.

Model (6) is estimated by the two-stage residual inclusion method which yields biased yet consistent estimates of the coefficients [44]. Under this method, the health information equation is estimated first, by the method of maximum likelihood; then, the physician visitation equation is estimated by including, on the right-hand side of the equation, the observed two indicator variables (\(I(HIS_{i} = j)\), j = 1 and 2) and the residuals from the estimated health information equation in the first stage.
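The second stage described above can be written out schematically. The notation below is an assumption introduced only for illustration: \(\hat{r}_{j,i}\) denotes a first-stage residual, taken here to be the raw residual (observed indicator minus fitted probability), and \(\delta_{1}\), \(\delta_{2}\), and \(\tilde{u}_{2,i}\) are symbols that do not appear elsewhere in the paper. Other residual definitions are possible.

$$\begin{aligned} \hat{r}_{j,i} \,=\, & I(HIS_{i} = j) - \hat{P}(HIS_{i} = j), \quad j = 1,2, \\ PV_{i}^{*} \,=\, & \left[ {\text{regressors of the physician visitation equation in (6)}} \right] + \delta_{1} \hat{r}_{1,i} + \delta_{2} \hat{r}_{2,i} + \tilde{u}_{2,i} . \\ \end{aligned}$$

Under this reading, a joint test of \(\delta_{1} = \delta_{2} = 0\) would serve as a check on the endogeneity of the health information seeking indicators.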

Results

Extended sample selection model—health information search equations

Table 2 shows the estimates of the extended sample selection model specified in (1) through (4).Footnote 15 As to the use of the Internet to search for health information (column 1 in Table 2), practically all the explanatory variables are statistically significant at the 1% level. Specifically, the estimated coefficients are significantly negative for age and living alone. These results are as expected: older people are less likely to seek health information online, as they generally show low rates of Internet use due to technology anxiety and limited ability to deal with new technologies, and living alone means less need to seek health information for other household members. Results also suggest that being active in the labor market, having more education, and living in regions with high Internet penetration rates are all positively related to online health information seeking. Women also seek health information online more frequently than men. This is consistent with a common finding in the literature that women tend to take charge of their households’ health matters [7, 17, 46].

Table 2 Maximum likelihood estimates of extended sample selection model

Concerning health status, individuals with a long-term illness tend to seek health information online more frequently than those without a long-term illness. At the same time, self-assessed health status is positively related to the frequency of online health information seeking, i.e., respondents reporting a better health status search for health information online more frequently than those describing their health status as very bad. These seemingly contradictory results are not surprising given that previous research also reports mixed or weak evidence on the link between individuals’ health condition and information demand (e.g., [2, 3, 17, 18]). A plausible interpretation of this counterintuitive result can be sought in the heterogeneity of individuals’ health interests. Specifically, those in better health tend to look for information on general health and wellbeing, whereas those in worse health tend to seek information specific to a particular illness or treatment [31]. Accordingly, the estimated positive coefficient of self-assessed health status might reflect a major use of the Internet by healthy individuals to gather general health information. This interpretation also aligns with the positive coefficients estimated for the frequency of sports practice (Sport).

In column 2 of Table 2, the estimates of the offline health information search equation among non-online seekers indicate similar patterns to online search behaviour. Although age is expected to affect offline search less negatively than online search, its coefficient estimate keeps a negative sign. Previous studies also recognize a complex relationship between age and health information [17] and report inconclusive empirical evidence [2, 3, 18]. A non-monotonic relationship between health condition and information seeking is also observed in the offline setting, with positive coefficients estimated for both long-term illness and individuals’ self-assessed health status.

Overall, the similarity of the results obtained for the sociodemographic determinants of online and offline health information seeking implies that fostering Internet usage or increasing availability of online health information potentially exacerbates inequalities in health information access, rather than alleviating them. It also supports indirectly the media complementarity hypothesis, i.e., individuals seeking health information on the Internet also seek comparable information on conventional media. Finally, the relationship between respondents’ health status and information seeking is non-monotonic and complicated, due possibly to the heterogeneity in individuals’ health interests. This result suggests that empirical support for the inverse information law is sensitive to the measures of health status and types of health information.

Extended sample selection model—physician visitation equations

Columns 3 and 4 of Table 2 show the results for the physician visitation equations. The first thing to notice is that online and offline health information search relate to physician visits differently: the estimated coefficient is negative and not significant for online search, whereas it is positive and significant for offline search. As previously explained, the two physician visitation equations refer to different populations: the equation for offline health information search covers individuals who did not seek health information on the Internet, whereas the equation for online search covers those who sought health information online and possibly on non-Internet sources as well. Thus, the two coefficients can be interpreted, respectively, as the net association of offline information search with the demand for physician services (conditional on not seeking health information online), and the gross association of online search. The significant positive coefficient for offline health information search suggests that individuals seeking health information offline visit physicians more frequently than those not seeking health information offline. Several explanations are possible for this complementary relationship. First, seeking health information offline before or after visiting physicians could enhance the benefits from physician services by increasing patients’ appreciation of these services or by reducing their uncertainty. These increased (expected) benefits would in turn lead the better-informed patients to demand more physician services. Second, health information could increase patients’ anxiety about their health condition and hence make them resort more to support and advice from physicians. In relation to the long-standing hypotheses in the health economics literature, the positive relationship is in line with the patient-induced supply hypothesis, while disagreeing with the physician-induced demand hypothesis.

By contrast, the estimated coefficient for online health information search is not statistically significant. This coefficient represents the sum of the direct (or net) association between online health information search and the demand for physician services, and the possible indirect link through concurrent offline searches. Given the significantly positive coefficient of offline search, this indirect link would be positive if online and offline health information seeking were complements and negative if they were substitutes. Our dataset does not allow us to test for the complementarity of online and offline searches explicitly, yet previous studies commonly support the complementarity of health information from various media [21,22,23,24,25]. If this were the case, the indirect link through concurrent offline search would be positive and the net association of online search would have to be strongly negative for the gross association to be non-significant yet negative. Were online and offline health information substitutes, the indirect link would be negative, and the net association of online search would be positive yet weaker than the net association of offline search. The magnitude of the indirect link and the net association of online search fall between the above two extreme cases if online and offline health information are imperfect substitutes or complements. In either case, our result provides no support for either the patient-induced supply or physician-induced demand hypotheses for online health information search.
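
The decomposition described above can be written out as a heuristic identity. The notation below is ours and is not taken from the estimated model: it assumes a linear approximation (the actual equations are nonlinear), it assumes that the net association of offline search estimated among non-online seekers also applies to online seekers, and it introduces a hypothetical parameter \delta for the association between online and offline seeking.

```latex
% Heuristic decomposition (illustrative notation, linear approximation)
% \beta^{gross}_{on} : estimated coefficient of online search
% \beta^{net}_{on}   : direct association of online search with visits
% \beta_{off} > 0    : net association of offline search with visits
% \delta             : association of online with concurrent offline seeking
%                      (\delta > 0 complements, \delta < 0 substitutes)
\begin{align*}
  \beta^{gross}_{on} &= \beta^{net}_{on} + \delta\,\beta_{off}, \\
  \beta^{net}_{on}   &= \beta^{gross}_{on} - \delta\,\beta_{off}.
\end{align*}
% Complements (\delta > 0):  \beta^{net}_{on} < \beta^{gross}_{on} \le 0.
% Substitutes (\delta < 0):  \beta^{net}_{on} = \beta^{gross}_{on} + |\delta|\,\beta_{off},
%   which is positive when |\delta|\,\beta_{off} > |\beta^{gross}_{on}|
%   and smaller than \beta_{off} when |\delta| \le 1.
```

Under these assumptions, a gross coefficient close to zero is compatible with a negative net association when the media are complements and with a small positive one when they are substitutes, which is the range of cases discussed in the text.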

Previous evidence on the effect of health information on the demand for health care is scarce and mixed, with two recent studies reporting significantly positive estimates for the online case [8, 9]. While various factors can contribute to the difference between our results and previous evidence, the most plausible one is a substantial reduction in the cost of information seeking on the Internet. Both Lee [8] and Suziedelyte [9] analyzed data from 2008 and earlier, that is, from periods before the extensive diffusion of the Internet. In these periods, online health information seeking was still costly due to high Internet access cost, low connection speed, and limited availability of reliable health information, and hence it was limited to individuals expecting high benefits from the acquired information. An extensive diffusion of the Internet, in particular broadband services, has reduced the cost of information seeking substantially and promoted online health information seeking among those with low expected benefits from the available information.Footnote 16 These benefits emanate at least partly from better appreciation of health care services including physician visits. Thus, extending online health information seeking to those with low expected benefits weakens a positive or complementary relationship between online health information and physician visits. Our analysis also indicates that individuals with good health status seek health information more frequently and visit physicians less frequently than those with worse health status. This suggests the possibility that the reduction in online search cost has promoted online seeking of general health information by healthy individuals more than that of specific information about illnesses and treatments by unhealthy individuals. These changes weaken the link between online health information seeking and health care utilization as measured by the frequency of physician visits.
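
The attenuation argument above can be illustrated with a stylized simulation. This is only a sketch under assumptions of our own and not the model estimated in this paper: each simulated person has a hypothetical expected benefit between 0 and 1, searches online only if that benefit exceeds the search cost, and searching raises visits in proportion to the benefit. The function name `seeker_gap` and all parameter values are illustrative.

```python
import random
from statistics import mean


def seeker_gap(search_cost, n=10_000, gain=2.0, seed=0):
    """Mean difference in visits between online seekers and non-seekers.

    Hypothetical set-up: a person searches online only if the expected
    benefit exceeds the search cost, and searching raises physician
    visits by gain * benefit.
    """
    rng = random.Random(seed)
    seekers, non_seekers = [], []
    for _ in range(n):
        benefit = rng.random()           # expected benefit, uniform on [0, 1)
        baseline = rng.gauss(3.0, 1.0)   # visits unrelated to information
        if benefit > search_cost:
            seekers.append(baseline + gain * benefit)
        else:
            non_seekers.append(baseline)
    return mean(seekers) - mean(non_seekers)


# Costly search: only high-benefit individuals search online.
gap_costly = seeker_gap(search_cost=0.8)
# Cheap search: low-benefit individuals search online as well.
gap_cheap = seeker_gap(search_cost=0.2)

print(f"gap with costly search: {gap_costly:.2f}")
print(f"gap with cheap search:  {gap_cheap:.2f}")
```

In this set-up the average gap in visits between seekers and non-seekers shrinks when the search cost falls, because the group of seekers then includes people whose visits respond little to the information. The sketch leaves out the second mechanism mentioned in the text, namely healthy individuals who search more and visit less.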

For the remaining variables, the estimated coefficients mostly confirm previous findings in the literature [3, 17, 18]. Women are found to visit physicians more frequently than men, while individuals with a long-term illness or bad health status visit physicians more frequently than those in a better health condition. Residing in a large town is associated with more visits, which is not surprising because higher population density means closer proximity and hence better access to medical services than in small towns and rural areas. High physician density means better availability of medical services and also more competition among physicians, which can lower service costs and strengthen incentives for physician-induced demand. As for employment status, both the employed and unemployed visit physicians less frequently than those inactive in the labor market, which possibly reflects the higher opportunity (time) costs of the former groups than the latter. The coefficient for living alone is negative, suggesting that individuals living alone visit physicians less frequently, possibly because they do not need to accompany other household members, in particular children. However, the coefficient is significant only for non-seekers of online health information. The coefficient for age is negative and significant only for online health information seekers. This counterintuitive result is common in the literature [3, 17, 18, 47] and possibly attributable to a high correlation between age and health status measures: older people tend to be in worse health condition, and after controlling for the effects of health condition, age itself has a weak, negative effect on physician visits. It might also reflect the fact that older people tend to utilize other types of medical services, such as health services at retirement homes, and hence visit physicians less often than younger people [3, 47].
Finally, the frequency of sport practice is positively related to physician visits, but the estimated coefficients are not significant for most categories. Two competing effects might be in play: while physical exercise improves health condition (hence reducing physician visits), it also increases the probability of an injury (thus increasing visits).

Columns 5–8 of Table 2 summarize the estimation results of the restricted model, in which the coefficients of sociodemographic and health status variables (X) and variables specific to the physician visitation equation (Z3) are constrained to be identical in the two physician visitation equations. This restriction does not qualitatively alter the estimates. Most importantly, the estimated coefficient remains positive and significant for offline health information search and not significant for online search.

Robustness based on alternative model specifications

Tables 3 and 4 present the results of the two alternative models: the bivariate ordered probit model specified in Eq. (5), which evaluates the association of online health information seeking with physician visitation without addressing that of offline seeking, and the two-equation model specified in Eq. (6), which utilizes a multinomial measure of health information seeking. For both models, the estimation results are generally consistent with those obtained for the extended sample selection model. Most importantly, in the estimated physician visitation equations the coefficient is significantly positive for offline health information search, while it is not significant for online search. Accordingly, the results are robust against changes in model specification, estimation method, and construction of the variables for health information seeking. The coefficient estimates for other control variables also confirm the results of the extended sample selection model, and are consistent with previous findings in the literature.

Table 3 Maximum likelihood estimates of the bivariate ordered probit model
Table 4 Two-stage residual inclusion estimates of the two-equation model with multinomial health information seeking (HIS)

Conclusions

The present analysis has examined the link between health information and the demand for physician services. Specifically, the analysis has distinguished individuals seeking health information exclusively on offline sources from those seeking online and possibly offline too. Using an extended sample selection model that addresses both the sample selection issue created by the survey design and the endogeneity of the health information variables, our analysis has elucidated the determinants of online and offline health information seeking and how they relate to the demand for physician services.

The empirical analysis for the 28 European Union Member States has revealed that sociodemographic factors shape online and offline health information seeking in similar ways. Specifically, those who are female, younger, better educated, in the labor market, living in urban areas, and with a long-term illness, are more likely to seek health information both online and offline. At the same time, individuals’ self-assessed health status is positively related to health information seeking. These results support the media complementarity hypothesis and suggest that enhancing the availability of online health information potentially exacerbates, rather than alleviates, the unequal distribution of health information traditionally observed in the offline environment.

Concerning the demand for health care, the extended sample selection model shows different results for online and offline health information seeking. Offline seeking has a significantly positive net association with the frequency of physician visits, that is, offline health information seekers visit physicians more frequently than non-seekers. Several explanations are possible for this complementary relationship between health information and physician services. First, seeking health information offline before or after visiting physicians could enhance the benefits of physician services, which in turn could lead to an increased demand for physician services. Second, additional information could increase individuals’ anxiety about their health condition and hence make them resort more to support and advice from physicians. In relation to the long-standing hypotheses in the literature, the positive link is in line with the patient-induced supply hypothesis, while it disagrees with the physician-induced demand hypothesis.

For online health information, the gross association with physician visitation is not significant. Our result contrasts with previous findings by Lee [8] and Suziedelyte [9], who report a significantly positive impact of online health information on health care demand. A plausible explanation for the difference in the reported results lies in the extensive diffusion of the Internet, which has substantially reduced the cost of online information seeking. This cost reduction might have extended online health information seeking to individuals with low expected benefits from the acquired information, and hence attenuated the link between online health information seeking and the demand for physician services. Our analysis has also revealed that healthy individuals seek health information more often and visit physicians less often than unhealthy individuals. The reduction in the cost of online information seeking has potentially promoted health information seeking among healthy individuals, which weakens the average link between online health information and the demand for physician services.

Some directions for further research can be derived from our analysis. First, the supporting evidence for the media complementarity hypothesis raises a concern that the Internet could reinforce the traditional unequal access to health information. In particular, the Internet possibly helps individuals in a better health condition to acquire more information and consequently make better health care investment decisions than those in a worse condition. The types of health information sought would also vary by health condition, with individuals in good health seeking information on general health and those in bad health seeking information specific to particular illnesses and treatments. Hence, strategies need to be designed to assist individuals excluded from health information to access appropriate health information online. To help design these strategies, future research should further unveil the complex association among individuals’ health status, their access to health information and types of desired health information, and how these factors affect their health care investment decisions.

Second, our analysis suggests that further reduction in the cost of health information acquisition (e.g., via improved availability of reliable health information) would extend online health information seeking to wider groups within populations. However, health information seeking is only weakly linked to health care demand for existing online seekers, and this link would be even weaker for the new online seekers. Thus, further promotion of online health information would be associated with little or no change in the cost of the health care system in the short term, unlike the implications of previous studies reporting significantly positive effects of online health information. Furthermore, online health information can improve general health knowledge of the broader population and their ability to self-manage health issues, which could improve their health condition and hence relieve the cost pressure on the health care system in the long term. Future research should explore these relationships and quantify their relative magnitudes.

Finally, while our analysis has found no significant association between online health information seeking and demand for physician services, it has considered only one type of health service. It is possible that health information affects other types of health care services differently, such as those specific to mental health, as well as other aspects such as the duration and time efficiency of physician visits [48, 49]. Of particular importance is whether health information helps patients to choose appropriate types and levels of health care. For instance, increased physician visits by better informed patients can mean more efficient use of medical services if they reduce unnecessary use of emergency services. The literature on the effect of health information on the efficient use of health care services is still at an early stage, and the scarce empirical evidence reports mixed results.Footnote 17 Future research should extend the scope of analysis to potentially heterogeneous impacts of online (and offline) health information on broader types of health care services and examine whether it improves patients’ choice of appropriate health care services.