Treatment Effect Estimation Under Unknown Interference

Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14646)

Abstract

Causal inference is a powerful tool for effective decision-making in various areas, such as medicine and commerce. For example, it allows businesses to determine whether an advertisement influences a customer's decision to buy the advertised product. The influence of an advertisement on a particular customer is considered the advertisement's individual treatment effect (ITE). This study estimates ITEs from data in which units are potentially connected. In this setting, the outcome of a unit can be influenced by the treatments assigned to other units, a phenomenon known as interference, which leads to inaccurate ITE estimates if ignored. Existing methods for ITE estimation that address interference rely on knowledge of the connections between units. However, these methods are not applicable when this connection information is missing, for example due to privacy concerns, a scenario known as unknown interference. To overcome this limitation, this study proposes a method that designs a graph structure learner, which infers the structure of interference by imposing an \(L_0\)-norm regularization on the number of potential connections. The inferred structure is then fed into a graph convolutional network to model the interference received by units. We carry out extensive experiments on several datasets to verify the effectiveness of the proposed method in addressing unknown interference.


Notes

  1. Note that we assume the edges of the interference graph to be directed.

  2. As the information of unit i itself is always important for computing the level of interference, we set \(\hat{z}_{ii}\) to 1.

References

  1. Aronow, P.M., Samii, C.: Estimating average causal effects under general interference, with application to a social network experiment. Ann. Appl. Stat. 11, 1912–1947 (2017)

  2. Bhattacharya, R., Malinsky, D., Shpitser, I.: Causal inference under interference and network uncertainty. In: Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (2019)

  3. Chen, Y., Wu, L., Zaki, M.: Iterative deep graph learning for graph neural networks: Better and robust node embeddings. In: Advances in Neural Information Processing Systems, vol. 33, pp. 19314–19326 (2020)

  4. Forastiere, L., Airoldi, E.M., Mealli, F.: Identification and estimation of treatment and interference effects in observational studies on networks. J. Am. Stat. Assoc. 116(534), 901–918 (2021)

  5. Gretton, A., Bousquet, O., Smola, A., Schölkopf, B.: Measuring statistical dependence with Hilbert-Schmidt norms. In: Proceedings of the 16th International Conference on Algorithmic Learning Theory, pp. 63–77 (2005)

  6. Guo, R., Li, J., Li, Y., Candan, K.S., Raglin, A., Liu, H.: Ignite: a minimax game toward learning individual treatment effects from networked observational data. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 4534–4540 (2021)

  7. Guo, R., Li, J., Liu, H.: Learning individual causal effects from networked observational data. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 232–240 (2020)

  8. He, R., McAuley, J.: Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th International Conference on World Wide Web, pp. 507–517 (2016)

  9. Hudgens, M.G., Halloran, M.E.: Toward causal inference with interference. J. Am. Stat. Assoc. 103(482), 832–842 (2008)

  10. Johansson, F., Shalit, U., Sontag, D.: Learning representations for counterfactual inference. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 3020–3029 (2016)

  11. LaLonde, R.J.: Evaluating the econometric evaluations of training programs with experimental data. Am. Econ. Rev. 76(4), 604–620 (1986)

  12. Li, Q., Wang, Z., Liu, S., Li, G., Xu, G.: Deep treatment-adaptive network for causal inference. Int. J. Very Large Data Bases, 1–16 (2022)

  13. Lin, X., Zhang, G., Lu, X., Bao, H., Takeuchi, K., Kashima, H.: Estimating treatment effects under heterogeneous interference. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds.) ECML PKDD 2023. LNCS, vol. 14169, pp. 576–592. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-43412-9_34

  14. Liu, L., Hudgens, M.G.: Large sample randomization inference of causal effects in the presence of interference. J. Am. Stat. Assoc. 109(505), 288–301 (2014)

  15. Louizos, C., Welling, M., Kingma, D.P.: Learning sparse neural networks through \({L}_0\) regularization. In: Proceedings of the 6th International Conference on Learning Representations (2018)

  16. Ma, J., Wan, M., Yang, L., Li, J., Hecht, B., Teevan, J.: Learning causal effects on hypergraphs. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1202–1212 (2022)

  17. Ma, Y., Tresp, V.: Causal inference under networked interference and intervention policy enhancement. In: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, vol. 130, pp. 3700–3708 (2021)

  18. Nabi, R., Pfeiffer, J., Charles, D., Kıcıman, E.: Causal inference in the presence of interference in sponsored search advertising. Front. Big Data 5 (2022)

  19. Rakesh, V., Guo, R., Moraffah, R., Agarwal, N., Liu, H.: Linked causal variational autoencoder for inferring paired spillover effects. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1679–1682 (2018)

  20. Raudenbush, S.W., Schwartz, D.: Randomized experiments in education, with implications for multilevel causal inference. Annu. Rev. Stat. Appl. 7(1) (2020)

  21. Rubin, D.B.: Randomization analysis of experimental data: the Fisher randomization test comment. J. Am. Stat. Assoc. 75(371), 591–593 (1980)

  22. Schnitzer, M.E.: Estimands and estimation of COVID-19 vaccine effectiveness under the test-negative design: connections to causal inference. Epidemiology 33(3), 325 (2022)

  23. Shalit, U., Johansson, F.D., Sontag, D.: Estimating individual treatment effect: generalization bounds and algorithms. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 3076–3085 (2017)

  24. Smith, J.A., Todd, P.E.: Does matching overcome LaLonde’s critique of nonexperimental estimators? J. Econometrics 125(1–2), 305–353 (2005)

  25. Sävje, F., Aronow, P.M., Hudgens, M.G.: Average treatment effects in the presence of unknown interference. Ann. Stat. 49(2), 673–701 (2021)

  26. Tchetgen, E.J.T., VanderWeele, T.J.: On causal inference in the presence of interference. Stat. Methods Med. Res. 21(1), 55–75 (2012)

  27. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: Proceedings of the 6th International Conference on Learning Representations (2018)

  28. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations (2017)

  29. Ye, Y., Ji, S.: Sparse graph attention networks. IEEE Trans. Knowl. Data Eng. (2021)

Acknowledgements

This study was supported by JSPS KAKENHI Grant Number 20H04244.

Author information

Correspondence to Xiaofeng Lin.

Appendices

A Identifiability of the Expectation of Potential Outcomes

Here, we show that the expectation of potential outcomes \(Y_i^{t}(\boldsymbol{s}_i)\) can be identified from observed data. To this end, we require the following assumptions.

Inspired by Chen et al. [3], who learn graph structures from the features of units, we make the following assumption.

Assumption 1 (A1). The unknown structure \(\boldsymbol{A}^\mathrm{{itf}}\) can be discovered from \(\boldsymbol{X}\) and \(\boldsymbol{T}\) using a graph structure learner \(\text {GSL}(\cdot )\), i.e., \(\boldsymbol{A}^\mathrm{{itf}}=\text {GSL}(\boldsymbol{X},\boldsymbol{T})\).
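
To make A1 concrete, the sketch below shows one way such a GSL could be built: pairwise edge scores from covariates and treatments, sparsified with hard-concrete \(L_0\) gates in the spirit of Louizos et al. [15]. This is a minimal illustration under our own assumptions; the class name, layer sizes, and gate hyperparameters are ours and not taken from the paper's UNITE implementation.

```python
import torch
import torch.nn as nn

class GraphStructureLearner(nn.Module):
    """Sketch of a GSL: scores directed pairwise connections from unit
    covariates X and treatments T, then sparsifies them with
    hard-concrete L0 gates (following Louizos et al. [15])."""

    def __init__(self, d_in, d_hid=64, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.embed = nn.Linear(d_in + 1, d_hid)  # covariates + own treatment
        self.src = nn.Linear(d_hid, d_hid)       # sender role
        self.dst = nn.Linear(d_hid, d_hid)       # receiver role
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, X, T):
        h = torch.tanh(self.embed(torch.cat([X, T.unsqueeze(-1)], dim=-1)))
        # Asymmetric scores, since edges are assumed directed (footnote 1).
        log_alpha = self.src(h) @ self.dst(h).T
        if self.training:  # stochastic hard-concrete gates
            u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + log_alpha) / self.beta)
        else:              # deterministic gates at test time
            s = torch.sigmoid(log_alpha)
        z = (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)
        eye = torch.eye(z.shape[0], device=z.device)
        z = z * (1 - eye) + eye  # set \hat{z}_{ii} = 1 (footnote 2)
        # Expected L0 penalty: probability that each gate is non-zero.
        l0 = torch.sigmoid(
            log_alpha - self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        ).mean()
        return z, l0
```

The returned penalty `l0` would be added to the training loss with a weight (such as the \(\lambda\) reported in Appendix C) to penalize the number of inferred connections.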

Similar to existing studies on addressing interference [16, 17], we assume that interference can be aggregated, as stated in the following assumption.

Assumption 2 (A2). There exists an aggregation function \(\psi (\cdot )\) that aggregates the information of the other units over the graph \(\boldsymbol{A}^\mathrm{{itf}}\) and outputs the interference summary \(\boldsymbol{s}_i\), i.e., \(\boldsymbol{s}_i=\psi (\boldsymbol{T}_{-i},\boldsymbol{X}_{-i},\boldsymbol{A}^\mathrm{{itf}})\).
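
For concreteness, one possible instantiation of \(\psi (\cdot )\) is a single GCN-style layer [28] over the inferred adjacency. The sketch below is an assumption of ours, not the paper's UNITE architecture; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class InterferenceAggregator(nn.Module):
    """Sketch of the aggregation function psi in A2: one GCN-style layer
    that pools neighbours' covariates and treatments over A_itf."""

    def __init__(self, d_in, d_s=32):
        super().__init__()
        self.proj = nn.Linear(d_in + 1, d_s)

    def forward(self, X, T, A_itf):
        # Row-normalise so each unit averages over its in-neighbours.
        deg = A_itf.sum(dim=1, keepdim=True).clamp(min=1e-8)
        msg = torch.cat([X, T.unsqueeze(-1)], dim=-1)  # per-unit [X_j, T_j]
        return torch.relu(self.proj((A_itf / deg) @ msg))  # row i gives s_i
```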

We extend the neighbor interference assumption [4] to the networked interference [17].

Assumption 3 (A3). For any unit i, any \(\boldsymbol{T}_{-i},\boldsymbol{T}'_{-i}\), any \(\boldsymbol{X}_{-i},\boldsymbol{X}'_{-i}\), and any \(\boldsymbol{A}^\mathrm{{itf}},\boldsymbol{A}^\mathrm{{itf'}}\): if \(\boldsymbol{s}_i=\boldsymbol{s}'_i\), i.e., \({\psi }(\boldsymbol{T}_{-i},\boldsymbol{X}_{-i},\boldsymbol{A}^\mathrm{{itf}}) = {\psi }(\boldsymbol{T}'_{-i},\boldsymbol{X}'_{-i},\boldsymbol{A}^\mathrm{{itf'}})\), then \(Y^t_i(S_i=\boldsymbol{s}_i) = Y^t_i(S_i=\boldsymbol{s}'_i)\) holds.

We adopt a consistency assumption similar to that of the existing study on interference [4].

Assumption 4 (A4). \(Y_i^{\text {obs}}=Y_i^{t_i}(S_i=\boldsymbol{s}_i)\) on the graph \(\boldsymbol{A}\) for a unit i under \(t_i\) and \(\boldsymbol{s}_i\).

We adopt an unconfoundedness assumption similar to that of the existing study on addressing interference [4].

Assumption 5 (A5). There is no hidden confounder. For any unit i, given the covariates, the treatment assignment and output of the aggregation function are independent of potential outcomes, i.e., \(T_i,S_i \perp \!\!\! \perp Y_i^1(\boldsymbol{s}_i),Y_i^0(\boldsymbol{s}_i) \vert X_i\).

We now prove the identifiability of the expectation of potential outcomes \(Y_i^{t}(\boldsymbol{s}_i)\) (for \(t=1\) or \(t=0\)) as follows:

$$\begin{aligned} &\mathbb {E}[Y_i^{\text {obs}} \mid X_i=\boldsymbol{x}_i, T_i=t, X_{-i}=\boldsymbol{X}_{-i}, T_{-i}=\boldsymbol{T}_{-i}, X=\boldsymbol{X}, T=\boldsymbol{T}] \\ =\ &\mathbb {E}[Y_i^{\text {obs}} \mid X_i=\boldsymbol{x}_i, T_i=t, X_{-i}=\boldsymbol{X}_{-i}, T_{-i}=\boldsymbol{T}_{-i}, A^\mathrm{{itf}}=\boldsymbol{A}^\mathrm{{itf}}] && (\text {A1}) \\ =\ &\mathbb {E}[Y_i^{\text {obs}} \mid X_i=\boldsymbol{x}_i, T_i=t, S_i=\boldsymbol{s}_i] && (\text {A2}) \\ =\ &\mathbb {E}[Y^t_i(\boldsymbol{s}_i) \mid X_i=\boldsymbol{x}_i, T_i=t, S_i=\boldsymbol{s}_i] && (\text {A3, A4}) \\ =\ &\mathbb {E}[Y^t_i(\boldsymbol{s}_i) \mid X_i=\boldsymbol{x}_i]. && (\text {A5}) \end{aligned}$$

This derivation implies that, once the unknown interference is properly modeled, the expectation of the potential outcomes can be estimated from observed data.

B HSIC

The Hilbert-Schmidt independence criterion (HSIC) [5] is calculated as follows:

$$\begin{aligned} \textrm{HSIC}(\boldsymbol{B},\boldsymbol{C})= \frac{1}{(n-1)^2}\textrm{tr}(\boldsymbol{K}\boldsymbol{H}\boldsymbol{L}\boldsymbol{H}), \boldsymbol{H} = \boldsymbol{I_n} - \frac{1}{n}\boldsymbol{1_n}\boldsymbol{1_n}^\mathrm{{T}}, \end{aligned}$$
(2)

where \(\boldsymbol{B} \in \mathbb {R}^{n \times d_1}\) and \(\boldsymbol{C} \in \mathbb {R}^{n \times d_2}\) denote two different matrices or vectors, \(\boldsymbol{I_n}\) is the identity matrix, \(\boldsymbol{1_n}\) is a vector of all ones, and \({\cdot }^\mathrm{{T}}\) denotes the transpose. \(\boldsymbol{K}\) and \(\boldsymbol{L}\) are Gaussian kernels applied to \(\boldsymbol{B}\) and \(\boldsymbol{C}\), respectively:

$$\begin{aligned} K_{ij}=\exp \left( -\frac{\Vert \boldsymbol{b}_i-\boldsymbol{b}_j\Vert ^2_2}{2}\right) ,\quad L_{ij}=\exp \left( -\frac{\Vert \boldsymbol{c}_i-\boldsymbol{c}_j\Vert ^2_2}{2}\right) , \end{aligned}$$
(3)

where \(\boldsymbol{b}_i\) (or \(\boldsymbol{b}_j\)) and \(\boldsymbol{c}_i\) (or \(\boldsymbol{c}_j\)) denote the i-th (or j-th) rows of \(\boldsymbol{B}\) and \(\boldsymbol{C}\), respectively.
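
For reference, a minimal NumPy sketch of Eqs. (2) and (3); the function names are ours, not from the paper's implementation.

```python
import numpy as np

def gaussian_kernel(M):
    # Pairwise squared Euclidean distances between rows of M, Eq. (3).
    sq = np.sum(M ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)
    return np.exp(-np.maximum(d2, 0.0) / 2.0)

def hsic(B, C):
    # Empirical HSIC with Gaussian kernels, Eq. (2).
    n = B.shape[0]
    K, L = gaussian_kernel(B), gaussian_kernel(C)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Example: HSIC is close to zero for independent inputs.
rng = np.random.default_rng(0)
print(hsic(rng.normal(size=(100, 5)), rng.normal(size=(100, 3))))
```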

C Implementation Details

Following Ma et al. [17], the entire \(\boldsymbol{X}\) and \(\boldsymbol{T}\) are given during the training, validation, and testing phases. However, only the observed outcomes of the units in the training set are provided during the training phase.

The three hidden layers of the representation layers of UNITE have (128, 64, 64) dimensions for the Jobs, Amazon\(^-\), and Amazon\(^+\) datasets. The hidden layer of the GSL of UNITE is 128-dimensional for the Jobs dataset and 64-dimensional for the Amazon\(^-\) and Amazon\(^+\) datasets. The GCN layers of UNITE have (64, 32) dimensions for the Jobs dataset and (64, 64, 32) dimensions for the Amazon\(^-\) and Amazon\(^+\) datasets. The prediction networks of UNITE have (128, 64, 64) dimensions for the Jobs dataset and (128, 64, 32) dimensions for the Amazon\(^-\) and Amazon\(^+\) datasets.

In addition, we train our models on an NVIDIA RTX A5000 GPU. UNITE uses the Adam optimizer with 500 training iterations for the Jobs dataset and 2,000 training iterations for the Amazon\(^-\) and Amazon\(^+\) datasets. The learning rate is set to \(\lambda _{\text {lr}}= 0.0005\) for the Jobs dataset and \(\lambda _{\text {lr}}= 0.001\) for the Amazon\(^-\) and Amazon\(^+\) datasets, and the weight decay is set to \(\gamma =0.01\) for the Jobs dataset and \(\gamma =0.001\) for the Amazon\(^-\) and Amazon\(^+\) datasets.

We use grid search to tune hyperparameters, selecting values based on the results on the validation set. The training batch size of UNITE is the full batch of training units for the Jobs dataset and is searched from \(\{128,256,512,1024\}\) for the Amazon\(^-\) and Amazon\(^+\) datasets. The \(\beta _1\) and \(\beta _2\) of UNITE are searched from \(\{0.01,0.05,0.1,0.15,0.2\}\) for the Jobs dataset and from \(\{0.1,0.5,1.0,1.5,2.0\}\) for the Amazon\(^-\) and Amazon\(^+\) datasets. The \(\lambda \) of UNITE is 0.0005 for the Jobs dataset, 0.02 for the Amazon\(^-\) dataset, and 0.01 for the Amazon\(^+\) dataset. Dropout is applied to UNITE, with the dropout rate searched from \(\{0.0,0.1,0.5\}\). Moreover, after 400 iterations, early stopping is applied for the Amazon\(^-\) and Amazon\(^+\) datasets to avoid overfitting. UNITE-KG uses the same hyperparameters as UNITE but sets \(\lambda =0\).

All the baseline methods use their default hyperparameters or search for hyperparameters within the ranges suggested in the literature. To avoid overfitting, all the baseline methods apply early stopping on the Amazon\(^-\) and Amazon\(^+\) datasets.

For the Amazon\(^-\) and Amazon\(^+\) datasets, as the outcome values span a wide range, z-score normalization is applied during training and testing.
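
A minimal sketch of this normalization, under our assumption that the training-set statistics are reused at test time:

```python
import numpy as np

def zscore(y, y_train):
    # Standardise outcomes with training-set statistics so the
    # wide-ranged Amazon outcomes are on a comparable scale.
    mu, sigma = y_train.mean(), y_train.std()
    return (y - mu) / sigma
```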

D Ablation Experiments

To investigate the importance of the different regularization terms, we conduct ablation experiments on the Jobs and Amazon\(^-\) datasets. Table 2 shows the results. We observe that removing HSIC\(_{\phi _{1}}\), HSIC\(_{\phi _{2}}\), or the \(L_0\) regularization results in performance degradation in ITE estimation, which verifies that these regularization terms are important for the ITE estimation of UNITE. In addition, the results on the Jobs dataset show performance degradation in the interference-free ITE and ATE estimation when HSIC\(_{\phi _{2}}\) is removed, implying that HSIC\(_{\phi _{2}}\) is important for interference-free ITE and ATE estimation.

Table 2. Results of the ablation experiments on the Jobs and Amazon\(^{-}\) datasets.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lin, X., Zhang, G., Lu, X., Kashima, H. (2024). Treatment Effect Estimation Under Unknown Interference. In: Yang, D.-N., Xie, X., Tseng, V.S., Pei, J., Huang, J.-W., Lin, J.C.-W. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_3

  • DOI: https://doi.org/10.1007/978-981-97-2253-2_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2252-5

  • Online ISBN: 978-981-97-2253-2

  • eBook Packages: Computer Science (R0)
