Treatment Effect Estimation Under Unknown Interference

Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14646)

Abstract

Causal inference is a powerful tool for effective decision-making in various areas, such as medicine and commerce. For example, it allows businesses to determine whether an advertisement influences a customer's decision to buy the advertised product. The influence of an advertisement on a particular customer is considered the advertisement's individual treatment effect (ITE). This study estimates ITEs from data in which units are potentially connected. In this setting, the outcome of a unit can be influenced by the treatments assigned to other units, a phenomenon known as interference, which leads to inaccurate ITE estimates if ignored. Existing methods for ITE estimation that address interference rely on knowledge of the connections between units. However, these methods are not applicable when this connection information is missing, for example due to privacy concerns, a scenario known as unknown interference. To overcome this limitation, this study proposes a method that designs a graph structure learner, which infers the structure of interference by imposing an \(L_0\)-norm regularization on the number of potential connections. The inferred structure is then fed into a graph convolutional network to model the interference received by units. We carry out extensive experiments on several datasets to verify the effectiveness of the proposed method in addressing unknown interference.


Notes

  1. Note that we assume the edges of the interference graph to be directed.

  2. As the information of unit i itself is always important for computing the level of interference, we set \(\hat{z}_{ii}\) to 1.

References

  1. Aronow, P.M., Samii, C.: Estimating average causal effects under general interference, with application to a social network experiment. Ann. Appl. Stat. 11, 1912–1947 (2017)

  2. Bhattacharya, R., Malinsky, D., Shpitser, I.: Causal inference under interference and network uncertainty. In: Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (2019)

  3. Chen, Y., Wu, L., Zaki, M.: Iterative deep graph learning for graph neural networks: Better and robust node embeddings. In: Advances in Neural Information Processing Systems, vol. 33, pp. 19314–19326 (2020)

  4. Forastiere, L., Airoldi, E.M., Mealli, F.: Identification and estimation of treatment and interference effects in observational studies on networks. J. Am. Stat. Assoc. 116(534), 901–918 (2021)

  5. Gretton, A., Bousquet, O., Smola, A., Schölkopf, B.: Measuring statistical dependence with Hilbert-Schmidt norms. In: Proceedings of the 16th International Conference on Algorithmic Learning Theory, pp. 63–77 (2005)

  6. Guo, R., Li, J., Li, Y., Candan, K.S., Raglin, A., Liu, H.: Ignite: a minimax game toward learning individual treatment effects from networked observational data. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 4534–4540 (2021)

  7. Guo, R., Li, J., Liu, H.: Learning individual causal effects from networked observational data. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 232–240 (2020)

  8. He, R., McAuley, J.: Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th International Conference on World Wide Web, pp. 507–517 (2016)

  9. Hudgens, M.G., Halloran, M.E.: Toward causal inference with interference. J. Am. Stat. Assoc. 103(482), 832–842 (2008)

  10. Johansson, F., Shalit, U., Sontag, D.: Learning representations for counterfactual inference. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 3020–3029 (2016)

  11. LaLonde, R.J.: Evaluating the econometric evaluations of training programs with experimental data. Am. Econ. Rev. 76(4), 604–620 (1986)

  12. Li, Q., Wang, Z., Liu, S., Li, G., Xu, G.: Deep treatment-adaptive network for causal inference. Int. J. Very Large Data Bases, 1–16 (2022)

  13. Lin, X., Zhang, G., Lu, X., Bao, H., Takeuchi, K., Kashima, H.: Estimating treatment effects under heterogeneous interference. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds.) ECML PKDD 2023. LNCS, vol. 14169, pp. 576–592. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-43412-9_34

  14. Liu, L., Hudgens, M.G.: Large sample randomization inference of causal effects in the presence of interference. J. Am. Stat. Assoc. 109(505), 288–301 (2014)

  15. Louizos, C., Welling, M., Kingma, D.P.: Learning sparse neural networks through \({L}_0\) regularization. In: Proceedings of the 6th International Conference on Learning Representations (2018)

  16. Ma, J., Wan, M., Yang, L., Li, J., Hecht, B., Teevan, J.: Learning causal effects on hypergraphs. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1202–1212 (2022)

  17. Ma, Y., Tresp, V.: Causal inference under networked interference and intervention policy enhancement. In: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, vol. 130, pp. 3700–3708 (2021)

  18. Nabi, R., Pfeiffer, J., Charles, D., Kıcıman, E.: Causal inference in the presence of interference in sponsored search advertising. Front. Big Data 5 (2022)

  19. Rakesh, V., Guo, R., Moraffah, R., Agarwal, N., Liu, H.: Linked causal variational autoencoder for inferring paired spillover effects. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1679–1682 (2018)

  20. Raudenbush, S.W., Schwartz, D.: Randomized experiments in education, with implications for multilevel causal inference. Annu. Rev. Stat. Appl. 7(1) (2020)

  21. Rubin, D.B.: Randomization analysis of experimental data: the Fisher randomization test comment. J. Am. Stat. Assoc. 75(371), 591–593 (1980)

  22. Schnitzer, M.E.: Estimands and estimation of COVID-19 vaccine effectiveness under the test-negative design: connections to causal inference. Epidemiology 33(3), 325 (2022)

  23. Shalit, U., Johansson, F.D., Sontag, D.: Estimating individual treatment effect: generalization bounds and algorithms. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 3076–3085 (2017)

  24. Smith, J.A., Todd, P.E.: Does matching overcome LaLonde’s critique of nonexperimental estimators? J. Econometrics 125(1–2), 305–353 (2005)

  25. Sävje, F., Aronow, P.M., Hudgens, M.G.: Average treatment effects in the presence of unknown interference. Ann. Stat. 49(2), 673–701 (2021)

  26. Tchetgen, E.J.T., VanderWeele, T.J.: On causal inference in the presence of interference. Stat. Methods Med. Res. 21(1), 55–75 (2012)

  27. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: Proceedings of the 6th International Conference on Learning Representations (2018)

  28. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations (2017)

  29. Ye, Y., Ji, S.: Sparse graph attention networks. IEEE Trans. Knowl. Data Eng. (2021)

Acknowledgements

This study was supported by JSPS KAKENHI Grant Number 20H04244.

Author information

Correspondence to Xiaofeng Lin.

Appendices

A Identifiability of the Expectation of Potential Outcomes

Here, we show that the expectation of potential outcomes \(Y_i^{t}(\boldsymbol{s}_i)\) can be identified from observed data. To this end, we require the following assumptions.

Inspired by Chen et al. [3], who learn graph structures from the features of units, we make the following assumption.

Assumption 1 (A1). The unknown structure \(\boldsymbol{A}^\mathrm{{itf}}\) can be discovered from \(\boldsymbol{X}\) and \(\boldsymbol{T}\) using a graph structure learner \(\text {GSL}(\cdot )\), i.e., \(\boldsymbol{A}^\mathrm{{itf}}=\text {GSL}(\boldsymbol{X},\boldsymbol{T})\).
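
To make A1 concrete, the sketch below shows one way such a GSL could be built: pairwise edge scores from covariates and treatments, sparsified with hard-concrete \(L_0\) gates in the spirit of Louizos et al. [15]. This is a minimal illustration under our own assumptions; the class name, layer sizes, and gate hyperparameters are ours and not taken from the paper's UNITE implementation.

```python
import torch
import torch.nn as nn

class GraphStructureLearner(nn.Module):
    """Sketch of a GSL: scores directed pairwise connections from unit
    covariates X and treatments T, then sparsifies them with
    hard-concrete L0 gates (following Louizos et al. [15])."""

    def __init__(self, d_in, d_hid=64, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.embed = nn.Linear(d_in + 1, d_hid)  # covariates + own treatment
        self.src = nn.Linear(d_hid, d_hid)       # sender role
        self.dst = nn.Linear(d_hid, d_hid)       # receiver role
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, X, T):
        h = torch.tanh(self.embed(torch.cat([X, T.unsqueeze(-1)], dim=-1)))
        # Asymmetric scores, since edges are assumed directed (footnote 1).
        log_alpha = self.src(h) @ self.dst(h).T
        if self.training:  # stochastic hard-concrete gates
            u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + log_alpha) / self.beta)
        else:              # deterministic gates at test time
            s = torch.sigmoid(log_alpha)
        z = (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)
        eye = torch.eye(z.shape[0], device=z.device)
        z = z * (1 - eye) + eye  # set \hat{z}_{ii} = 1 (footnote 2)
        # Expected L0 penalty: probability that each gate is non-zero.
        l0 = torch.sigmoid(
            log_alpha - self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        ).mean()
        return z, l0
```

The returned penalty `l0` would be added to the training loss with a weight (such as the \(\lambda\) reported in Appendix C) to penalize the number of inferred connections.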

Similar to existing studies on addressing interference [16, 17], we assume that interference can be aggregated, as stated in the following assumption.

Assumption 2 (A2). There exists an aggregation function \(\psi (\cdot )\) that aggregates the information of the other units over the graph \(\boldsymbol{A}^\mathrm{{itf}}\) and outputs the interference summary \(\boldsymbol{s}_i\), i.e., \(\boldsymbol{s}_i=\psi (\boldsymbol{T}_{-i},\boldsymbol{X}_{-i},\boldsymbol{A}^\mathrm{{itf}})\).
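
For concreteness, one possible instantiation of \(\psi (\cdot )\) is a single GCN-style layer [28] over the inferred adjacency. The sketch below is an assumption of ours, not the paper's UNITE architecture; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class InterferenceAggregator(nn.Module):
    """Sketch of the aggregation function psi in A2: one GCN-style layer
    that pools neighbours' covariates and treatments over A_itf."""

    def __init__(self, d_in, d_s=32):
        super().__init__()
        self.proj = nn.Linear(d_in + 1, d_s)

    def forward(self, X, T, A_itf):
        # Row-normalise so each unit averages over its in-neighbours.
        deg = A_itf.sum(dim=1, keepdim=True).clamp(min=1e-8)
        msg = torch.cat([X, T.unsqueeze(-1)], dim=-1)  # per-unit [X_j, T_j]
        return torch.relu(self.proj((A_itf / deg) @ msg))  # row i gives s_i
```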

We extend the neighbor interference assumption [4] to the networked interference [17].

Assumption 3 (A3). For any unit i, any \(\boldsymbol{T}_{-i},\boldsymbol{T}'_{-i}\), any \(\boldsymbol{X}_{-i},\boldsymbol{X}'_{-i}\), and any \(\boldsymbol{A}^\mathrm{{itf}},\boldsymbol{A}^\mathrm{{itf'}}\): if \(\boldsymbol{s}_i=\boldsymbol{s}'_i\), i.e., \({\psi }(\boldsymbol{T}_{-i},\boldsymbol{X}_{-i},\boldsymbol{A}^\mathrm{{itf}}) = {\psi }(\boldsymbol{T}'_{-i},\boldsymbol{X}'_{-i},\boldsymbol{A}^\mathrm{{itf'}})\), then \(Y^t_i(S_i=\boldsymbol{s}_i) = Y^t_i(S_i=\boldsymbol{s}'_i)\) holds.

We adopt a consistency assumption similar to that of the existing study on interference [4].

Assumption 4 (A4). \(Y_i^{\text {obs}}=Y_i^{t_i}(S_i=\boldsymbol{s}_i)\) on the graph \(\boldsymbol{A}\) for a unit i under \(t_i\) and \(\boldsymbol{s}_i\).

We adopt an unconfoundedness assumption similar to that of the existing study on addressing interference [4].

Assumption 5 (A5). There is no hidden confounder. For any unit i, given the covariates, the treatment assignment and output of the aggregation function are independent of potential outcomes, i.e., \(T_i,S_i \perp \!\!\! \perp Y_i^1(\boldsymbol{s}_i),Y_i^0(\boldsymbol{s}_i) \vert X_i\).

We now prove the identifiability of the expectation of potential outcomes \(Y_i^{t}(\boldsymbol{s}_i)\) (for \(t=1\) or \(t=0\)) as follows:

$$\begin{aligned} &\mathbb {E}[Y_i^{\text {obs}} \mid X_i=\boldsymbol{x}_i, T_i=t, X_{-i}=\boldsymbol{X}_{-i}, T_{-i}=\boldsymbol{T}_{-i}, X=\boldsymbol{X}, T=\boldsymbol{T}] \\ =\ &\mathbb {E}[Y_i^{\text {obs}} \mid X_i=\boldsymbol{x}_i, T_i=t, X_{-i}=\boldsymbol{X}_{-i}, T_{-i}=\boldsymbol{T}_{-i}, A^\mathrm{{itf}}=\boldsymbol{A}^\mathrm{{itf}}] && (\text {A1}) \\ =\ &\mathbb {E}[Y_i^{\text {obs}} \mid X_i=\boldsymbol{x}_i, T_i=t, S_i=\boldsymbol{s}_i] && (\text {A2}) \\ =\ &\mathbb {E}[Y^t_i(\boldsymbol{s}_i) \mid X_i=\boldsymbol{x}_i, T_i=t, S_i=\boldsymbol{s}_i] && (\text {A3, A4}) \\ =\ &\mathbb {E}[Y^t_i(\boldsymbol{s}_i) \mid X_i=\boldsymbol{x}_i]. && (\text {A5}) \end{aligned}$$

This derivation implies that, once the unknown interference is properly modeled, the expectation of the potential outcomes can be estimated from observed data.

B HSIC

The Hilbert-Schmidt independence criterion (HSIC) [5] is calculated as follows:

$$\begin{aligned} \textrm{HSIC}(\boldsymbol{B},\boldsymbol{C})= \frac{1}{(n-1)^2}\textrm{tr}(\boldsymbol{K}\boldsymbol{H}\boldsymbol{L}\boldsymbol{H}), \boldsymbol{H} = \boldsymbol{I_n} - \frac{1}{n}\boldsymbol{1_n}\boldsymbol{1_n}^\mathrm{{T}}, \end{aligned}$$
(2)

where \(\boldsymbol{B} \in \mathbb {R}^{n \times d_1}\) and \(\boldsymbol{C} \in \mathbb {R}^{n \times d_2}\) denote two different matrices or vectors, \(\boldsymbol{I_n}\) is the identity matrix, \(\boldsymbol{1_n}\) is a vector of all ones, and \({\cdot }^\mathrm{{T}}\) denotes the transpose. \(\boldsymbol{K}\) and \(\boldsymbol{L}\) are Gaussian kernels applied to \(\boldsymbol{B}\) and \(\boldsymbol{C}\), respectively:

$$\begin{aligned} K_{ij}=\exp \left( -\frac{\Vert \boldsymbol{b}_i-\boldsymbol{b}_j\Vert ^2_2}{2}\right) ,\quad L_{ij}=\exp \left( -\frac{\Vert \boldsymbol{c}_i-\boldsymbol{c}_j\Vert ^2_2}{2}\right) , \end{aligned}$$
(3)

where \(\boldsymbol{b}_i\) (or \(\boldsymbol{b}_j\)) and \(\boldsymbol{c}_i\) (or \(\boldsymbol{c}_j\)) denote the i-th (or j-th) rows of \(\boldsymbol{B}\) and \(\boldsymbol{C}\), respectively.
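
For reference, a minimal NumPy sketch of Eqs. (2) and (3); the function names are ours, not from the paper's implementation.

```python
import numpy as np

def gaussian_kernel(M):
    # Pairwise squared Euclidean distances between rows of M, Eq. (3).
    sq = np.sum(M ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)
    return np.exp(-np.maximum(d2, 0.0) / 2.0)

def hsic(B, C):
    # Empirical HSIC with Gaussian kernels, Eq. (2).
    n = B.shape[0]
    K, L = gaussian_kernel(B), gaussian_kernel(C)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Example: HSIC is close to zero for independent inputs.
rng = np.random.default_rng(0)
print(hsic(rng.normal(size=(100, 5)), rng.normal(size=(100, 3))))
```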

C Implementation Details

Following Ma et al. [17], the entire \(\boldsymbol{X}\) and \(\boldsymbol{T}\) are given during the training, validation, and testing phases. However, only the observed outcomes of the units in the training set are provided during the training phase.

The three hidden layers of the representation layers of UNITE have (128, 64, 64) dimensions for the Jobs, Amazon\(^-\), and Amazon\(^+\) datasets. The hidden layer of the GSL of UNITE is 128-dimensional for the Jobs dataset and 64-dimensional for the Amazon\(^-\) and Amazon\(^+\) datasets. The GCN layers of UNITE have (64, 32) dimensions for the Jobs dataset and (64, 64, 32) dimensions for the Amazon\(^-\) and Amazon\(^+\) datasets. The prediction networks of UNITE have (128, 64, 64) dimensions for the Jobs dataset and (128, 64, 32) dimensions for the Amazon\(^-\) and Amazon\(^+\) datasets.

In addition, we train our models on an NVIDIA RTX A5000 GPU. UNITE uses the Adam optimizer with 500 training iterations for the Jobs dataset and 2,000 training iterations for the Amazon\(^-\) and Amazon\(^+\) datasets. The learning rate is set to \(\lambda _{\text {lr}}= 0.0005\) for the Jobs dataset and \(\lambda _{\text {lr}}= 0.001\) for the Amazon\(^-\) and Amazon\(^+\) datasets, and the weight decay is set to \(\gamma =0.01\) for the Jobs dataset and \(\gamma =0.001\) for the Amazon\(^-\) and Amazon\(^+\) datasets.

We use grid search to tune hyperparameters, selecting values based on the results on the validation set. The training batch size of UNITE is the full batch of training units for the Jobs dataset and is searched from \(\{128,256,512,1024\}\) for the Amazon\(^-\) and Amazon\(^+\) datasets. The \(\beta _1\) and \(\beta _2\) of UNITE are searched from \(\{0.01,0.05,0.1,0.15,0.2\}\) for the Jobs dataset and from \(\{0.1,0.5,1.0,1.5,2.0\}\) for the Amazon\(^-\) and Amazon\(^+\) datasets. The \(\lambda \) of UNITE is 0.0005 for the Jobs dataset, 0.02 for the Amazon\(^-\) dataset, and 0.01 for the Amazon\(^+\) dataset. Dropout is applied to UNITE, with the dropout rate searched from \(\{0.0,0.1,0.5\}\). Moreover, after 400 iterations, early stopping is applied for the Amazon\(^-\) and Amazon\(^+\) datasets to avoid overfitting. UNITE-KG uses the same hyperparameters as UNITE but sets \(\lambda =0\).

All the baseline methods use their default hyperparameters or search for hyperparameters within the ranges suggested in the literature. To avoid overfitting, all the baseline methods apply early stopping on the Amazon\(^-\) and Amazon\(^+\) datasets.

For the Amazon\(^-\) and Amazon\(^+\) datasets, as the outcome values span a wide range, z-score normalization is applied during training and testing.
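
A minimal sketch of this normalization, under our assumption that the training-set statistics are reused at test time:

```python
import numpy as np

def zscore(y, y_train):
    # Standardise outcomes with training-set statistics so the
    # wide-ranged Amazon outcomes are on a comparable scale.
    mu, sigma = y_train.mean(), y_train.std()
    return (y - mu) / sigma
```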

D Ablation Experiments

To investigate the importance of the different regularization terms, we conduct ablation experiments on the Jobs and Amazon\(^-\) datasets. Table 2 shows the results. We observe that removing HSIC\(_{\phi _{1}}\), HSIC\(_{\phi _{2}}\), or the \(L_0\) regularization results in performance degradation in ITE estimation, which verifies that these regularization terms are important for the ITE estimation of UNITE. In addition, the results on the Jobs dataset show performance degradation in the interference-free ITE and ATE estimation when HSIC\(_{\phi _{2}}\) is removed, implying that HSIC\(_{\phi _{2}}\) is important for interference-free ITE and ATE estimation.

Table 2. Results of the ablation experiments on the Jobs and Amazon\(^{-}\) datasets.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lin, X., Zhang, G., Lu, X., Kashima, H. (2024). Treatment Effect Estimation Under Unknown Interference. In: Yang, D.-N., Xie, X., Tseng, V.S., Pei, J., Huang, J.-W., Lin, J.C.-W. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_3

  • DOI: https://doi.org/10.1007/978-981-97-2253-2_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2252-5

  • Online ISBN: 978-981-97-2253-2

  • eBook Packages: Computer Science (R0)
