Context-specific independence in graphical log-linear models

Nyman, Henrik; Pensar, Johan; Koski, Timo; Corander, Jukka

doi:10.1007/s00180-015-0606-6

Original Paper
Published: 15 July 2015

Computational Statistics, Volume 31, pages 1493–1512 (2016)

Henrik Nyman¹,
Johan Pensar¹,
Timo Koski² &
…
Jukka Corander³

Abstract

Log-linear models are popular workhorses for analyzing contingency tables. A log-linear parameterization of an interaction model can be more expressive than a direct parameterization based on probabilities, leading to a powerful way of defining restrictions derived from marginal, conditional and context-specific independence. However, parameter estimation is often simpler under a direct parameterization, provided that the model enjoys certain decomposability properties. Here we introduce a cyclical projection algorithm for obtaining maximum likelihood estimates of log-linear parameters under an arbitrary context-specific graphical log-linear model, which need not satisfy criteria of decomposability. We illustrate that lifting the restriction of decomposability makes the models more expressive, such that additional context-specific independencies embedded in real data can be identified. It is also shown how a context-specific graphical model can correspond to a non-hierarchical log-linear parameterization with a concise interpretation. This observation can pave the way for further development of non-hierarchical log-linear models, which have been largely neglected due to their believed lack of interpretability.

References

Corander J (2003) Labelled graphical models. Scand J Stat 30:493–508
Corander J, Gyllenberg M, Koski T (2006) Bayesian model learning based on a parallel MCMC strategy. Stat Comput 16:355–362
Corander J, Ekdahl M, Koski T (2008) Parallel interacting MCMC for learning of topologies of graphical models. Data Min Knowl Disc 17:431–456
Csiszár I (1975) $I$-divergence geometry of probability distributions and minimization problems. Ann Probab 3(1):146–158
Csiszár I, Matúš F (2003) Information projections revisited. IEEE Trans Inf Theory 49(6):1474–1490
Edwards D, Havránek T (1985) A fast procedure for model search in multidimensional contingency tables. Biometrika 72(2):339–351
Eriksen PS (1999) Context specific interaction models. Technical report, Department of Mathematical Sciences, Aalborg University, Aalborg
Friedman N, Goldszmidt M (1996) Learning Bayesian networks with local structure. In: Proceedings of the twelfth annual conference on uncertainty in artificial intelligence, pp 252–262
Golumbic MC (2004) Algorithmic graph theory and perfect graphs, 2nd edn. Elsevier, Amsterdam
Helsingin Sanomat (2011) HS:n vaalikone 2011. http://www2.hs.fi/extrat/hsnext/HS-vaalikone2011.xls. Visited 19 Aug 2014
Højsgaard S (2003) Split models for contingency tables. Comput Stat Data Anal 42:621–645
Højsgaard S (2004) Statistical inference in context specific interaction models for contingency tables. Scand J Stat 31:143–158
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, London
Lauritzen SL (1996) Graphical models. Oxford University Press, Oxford
Nyman H, Pensar J, Koski T, Corander J (2014) Stratified graphical models—context-specific independence in graphical models. Bayesian Anal 9(4):883–908
Nyman H, Xiong J, Pensar J, Corander J (2015) Marginal and simultaneous predictive classification using stratified graphical models. Adv Data Anal Classif. doi:10.1007/s11634-015-0199-5
Pensar J, Nyman H, Koski T, Corander J (2015) Labeled directed acyclic graphs: a generalization of context-specific independence in directed graphical models. Data Min Knowl Disc 29(2):503–533
Rudas T (1998) A new algorithm for the maximum likelihood estimation of graphical log-linear models. Comput Stat 13:529–537
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Whittaker J (1990) Graphical models in applied multivariate statistics. Wiley, Chichester

Acknowledgments

The authors would like to thank the editor and the anonymous reviewers for their constructive comments and suggestions on the original version of this paper. H. N. and J. P. were supported by the Foundation of Åbo Akademi University, as part of the Grant for the Center of Excellence in Optimization and Systems Engineering. J. P. was also supported by the Magnus Ehrnrooth Foundation. J. C. was supported by the ERC Grant No. 239784 and Academy of Finland Grant No. 251170. T. K. was supported by a grant from the Swedish Research Council VR/NT.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, Åbo Akademi University, Turku, Finland
Henrik Nyman & Johan Pensar
Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden
Timo Koski
Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
Jukka Corander

Corresponding author

Correspondence to Henrik Nyman.

Appendices

Appendix 1

Definition 5

Decomposable SG. Let (G, L) constitute an SG with G being chordal. Further, let $E_{L}$ denote the set of all stratified edges, $E_{C}$ the set of all edges in the maximal clique C, and $E_{\mathcal {S}}$ the set of all edges in the separators of G. The SG is defined as decomposable if

$$\begin{aligned} E_{L}\cap E_{\mathcal {S}} = \varnothing , \end{aligned}$$

and

$$\begin{aligned} E_{L} \cap E_{C} = \varnothing \quad \text { or } \bigcap _{\{\delta ,\gamma \}\in E_{L}\cap E_{C}} \{\delta ,\gamma \} \ne \varnothing \quad \text { for all } C \in \mathcal {C}(G). \end{aligned}$$

An SGM where (G, L) constitutes a decomposable SG is termed a decomposable SGM.
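The two conditions of Definition 5 can be checked mechanically once the maximal cliques and separators of the chordal graph are known. The following is a minimal sketch under that assumption; the function name `is_decomposable_sg` and the representation of cliques, separators and stratified edges as plain node sets are illustrative choices, not part of the original paper.

```python
def is_decomposable_sg(cliques, separators, stratified_edges):
    """Check the two conditions of Definition 5.

    cliques, separators: iterables of node sets of a chordal graph G
    (assumed to be supplied by the caller).
    stratified_edges: iterable of two-node sets, i.e. the set E_L.
    """
    strat = [frozenset(e) for e in stratified_edges]

    # Condition 1: no stratified edge lies inside a separator.
    for sep in separators:
        sep = set(sep)
        if any(edge <= sep for edge in strat):
            return False

    # Condition 2: within every maximal clique, the stratified edges are
    # either absent or all share at least one common node.
    for clique in cliques:
        clique = set(clique)
        in_clique = [edge for edge in strat if edge <= clique]
        if in_clique and not frozenset.intersection(*in_clique):
            return False
    return True
```

For example, with cliques {1, 2, 3} and {2, 3, 4} and separator {2, 3}, the stratified edges {1, 2} and {1, 3} pass both conditions because they avoid the separator and share node 1, whereas a stratified edge {2, 3} fails the first condition.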

Appendix 2

Derivation of the parameters in Eq. (6).

We will here give a more detailed explanation of how $\hat{\theta }_{0,0} = \hat{P}(X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega }, X_{\delta } = 0, X_{\gamma } = 0)$ is derived. It is generally possible to use the factorization

$$\begin{aligned}&P(X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega }, X_{\delta } = 0, X_{\gamma } = 0) \\&\quad = P(X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega }) \, P(X_{\delta } = 0, X_{\gamma } = 0 \mid X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega }). \end{aligned}$$

When considering a probability distribution where $\delta $ and $\gamma $ can be dependent, it is generally not true that $P(X_{\delta }, X_{\gamma }) = P(X_{\delta }) P(X_{\gamma })$. A standard result, see e.g. Whittaker (1990), states that for a distribution where two variables are dependent the ML projection to the set of distributions where the variables are independent is obtained by calculating the product of the marginal probabilities of the two variables. This implies, in our case, creating a new distribution $\hat{P}$ according to

$$\begin{aligned}&\hat{P}(X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega }, X_{\delta } = 0, X_{\gamma } = 0) \\&\quad = P(X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega }) \, P(X_{\delta } = 0 \mid X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega }) \\&\qquad \cdot \, P(X_{\gamma } = 0 \mid X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega }). \end{aligned}$$

Using the earlier introduced notations this corresponds to setting

$$\begin{aligned} \hat{\theta }_{0,0}= & {} \hat{P}(X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega } , X_{\delta } = 0, X_{\gamma } = 0)\\= & {} P(X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega }) P(X_{\delta } = 0 \mid X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta , \gamma \}}}, X_{\varOmega } = x_{\varOmega }) \\&P(X_{\gamma } = 0 \mid X_{L_{\{\delta ,\gamma \}}} = x_{L_{\{\delta ,\gamma \}}}, X_{\varOmega } = x_{\varOmega }) \\= & {} (\theta _{0,0}+\theta _{0,1}+\theta _{1,0}+\theta _{1,1}) \cdot (\theta _{0,0}+\theta _{0,1}) / (\theta _{0,0}+\theta _{0,1}+\theta _{1,0}+\theta _{1,1}) \\&\cdot \,(\theta _{0,0}+\theta _{1,0}) / (\theta _{0,0}+\theta _{0,1}+\theta _{1,0}+\theta _{1,1}) \\= & {} (\theta _{0,0}+\theta _{0,1}) \cdot (\theta _{0,0}+\theta _{1,0}) / (\theta _{0,0}+\theta _{0,1}+\theta _{1,0}+\theta _{1,1}). \end{aligned}$$

The other parameters $\hat{\theta }_{0,1}$, $\hat{\theta }_{1,0}$, and $\hat{\theta }_{1,1}$ can be derived in a similar fashion.
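As a small numerical illustration of the derivation above, the sketch below computes all four projected parameters at once. It assumes the four values $\theta_{i,j}$ are stored as a nested list `theta[i][j]`, with the first index referring to $X_{\delta}$ and the second to $X_{\gamma}$; the function name `project_to_independence` is hypothetical.

```python
def project_to_independence(theta):
    """Return theta_hat[i][j] = (row sum i) * (column sum j) / total.

    theta[i][j] is the joint probability of X_delta = i, X_gamma = j
    together with one fixed outcome of the remaining variables, so the
    total is the probability of that outcome and need not equal 1.
    """
    total = sum(sum(row) for row in theta)
    row_sums = [theta[i][0] + theta[i][1] for i in range(2)]  # sums over X_gamma
    col_sums = [theta[0][j] + theta[1][j] for j in range(2)]  # sums over X_delta
    return [[row_sums[i] * col_sums[j] / total for j in range(2)]
            for i in range(2)]


theta = [[0.1, 0.2], [0.3, 0.4]]
theta_hat = project_to_independence(theta)
# theta_hat[0][0] equals (0.1 + 0.2) * (0.1 + 0.3) / 1.0, up to rounding
```

The projection leaves the total mass and both one-dimensional margins unchanged; only the dependence between the two variables within the context is removed.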

Appendix 3

Proposal functions used for model optimization.

The search for the optimal stratified graph is conducted using two separate Markov chains. One Markov chain is used to traverse different underlying graphs. A second chain is used to identify the optimal set of strata given the underlying graph. Combining these two searches will ultimately result in the discovery of the optimal SG.

Using the proposal function defined in Algorithm 1 and running a sufficient number of iterations, we can be assured of finding the optimal set of strata for any chordal graph.

Algorithm 1

Proposal function for finding optimal strata for a chordal graph.

Let G denote the underlying graph. By $L_A$ we denote all possible instances that can be added to any stratum of G. If $L_A$ is empty no strata may be added to G and the algorithm is terminated. L denotes the current state with L being empty in the starting state.

1. Set the candidate state $L^* = L$.
2. Perform one of the following steps.
   2.1. If L is empty, add a randomly chosen instance from $L_A$ to $L^*$.
   2.2. Else if $\{L_A {\setminus } L\}$ is empty, remove a randomly chosen instance from $L^*$.
   2.3. Else, with probability 0.5, add a randomly chosen instance from $\{L_A {\setminus } L\}$ to $L^*$.
   2.4. Else remove a randomly chosen instance from $L^*$.
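The steps of Algorithm 1 can be sketched in a few lines of code. This is an illustrative sketch only: the function name `propose_strata`, the representation of instances as sortable, hashable labels, and the convention of returning `None` when $L_A$ is empty are assumptions made here, not details given in the paper.

```python
import random


def propose_strata(L, L_A, rng=random):
    """Return a candidate set of instances L* following Algorithm 1.

    L: current set of instances; L_A: all instances that may be added.
    Instances are assumed to be sortable, hashable labels (assumption).
    Returns None when L_A is empty, i.e. the algorithm terminates.
    """
    L, L_A = set(L), set(L_A)
    if not L_A:
        return None

    candidate = set(L)                      # step 1
    addable = sorted(L_A - L)               # instances not yet in L
    if not L:                               # step 2.1
        candidate.add(rng.choice(sorted(L_A)))
    elif not addable:                       # step 2.2
        candidate.remove(rng.choice(sorted(candidate)))
    elif rng.random() < 0.5:                # step 2.3
        candidate.add(rng.choice(addable))
    else:                                   # step 2.4
        candidate.remove(rng.choice(sorted(candidate)))
    return candidate
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible. Each proposal differs from the current state by exactly one instance.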

Using this proposal function the optimal set of strata can be found for any underlying graph and we can proceed to the search for the best underlying graph. The proposal function in Algorithm 2 is used for this task.

Algorithm 2

Proposal function used to find the optimal underlying chordal graph.

The starting state is set to be the graph containing no edges. Let G denote the current graph with $G_L = (G, L)$ being the stratified graph with underlying graph G and optimal set of strata L.

1. Set the candidate state $G^* = G$.
2. Randomly choose a pair of nodes $\delta $ and $\gamma $. If the edge $\{\delta , \gamma \}$ is present in $G^*$ remove it, otherwise add the edge $\{\delta , \gamma \}$ to $G^*$.
3. While $G^*$ is non-chordal repeat steps 1 and 2.

The resulting candidate state $G^*$ is used along with the corresponding optimal set of strata $L^*$ to form the stratified graph $G^*_L = (G^*, L^*)$ which is used when calculating the acceptance probability.
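The graph proposal of Algorithm 2 can be sketched as follows. The paper does not say how chordality is tested; the sketch assumes a simple check based on repeatedly eliminating simplicial nodes (nodes whose remaining neighbours are pairwise adjacent), which succeeds exactly when the graph is chordal. The names `is_chordal` and `propose_graph` and the edge representation as `frozenset` pairs are hypothetical, and the search for the optimal strata $L^*$ of the candidate is not covered.

```python
import random
from itertools import combinations


def is_chordal(nodes, edges):
    """True if the undirected graph has a perfect elimination ordering."""
    adj = {v: set() for v in nodes}
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(nodes)
    while remaining:
        simplicial = None
        for v in remaining:
            nbrs = adj[v] & remaining
            # v is simplicial if its remaining neighbours form a clique
            if all(b in adj[a] for a, b in combinations(nbrs, 2)):
                simplicial = v
                break
        if simplicial is None:
            return False
        remaining.remove(simplicial)
    return True


def propose_graph(nodes, edges, rng=random):
    """Toggle one randomly chosen edge, retrying until G* is chordal.

    edges: set of frozenset pairs describing the current chordal graph.
    """
    pairs = [frozenset(p) for p in combinations(sorted(nodes), 2)]
    while True:
        candidate = set(edges)           # step 1: G* = G
        candidate ^= {rng.choice(pairs)} # step 2: remove or add the edge
        if is_chordal(nodes, candidate): # step 3: repeat if non-chordal
            return candidate
```

Starting from the path a, b, c, d, for instance, the proposal never returns the graph with the extra edge {a, d}, since that would close a chordless four-cycle.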

Appendix 4

Questions considered in parliament election data.

1. Since the mid-1990’s the income differences have grown rapidly in Finland. How should we react to this?
   0—The income differences do not need to be narrowed.
   1—The income differences need to be narrowed.
2. Should homosexual couples have the same rights to adopt children as heterosexual couples?
   0—Yes.
   1—No.
3. Child benefits are paid for each child under the age of 18 living in Finland, independent of the parents’ income. What should be done about child benefits?
   0—The income of the parents should not affect the child benefits.
   1—Child benefits should be dependent on parents’ income.
4. In Finland military service is mandatory for all men. What is your opinion on this?
   0—The current practice should be kept or expanded to also include women.
   1—The military service should be more selective or abandoned altogether.
5. Should Finland in its affairs with China and Russia more actively debate issues regarding human rights and the state of democracy in these countries?
   0—Yes.
   1—No.
6. Russia has prohibited foreigners from owning land close to the borders. In recent years, Russians have bought thousands of properties in Finland. How should Finland react to this?
   0—Finland should not restrict foreigners from buying property in Finland.
   1—Finland should restrict foreigners’ rights to buy property and land in Finland.
7. During recent years municipalities have outsourced many services to privately owned companies. What is your opinion on this?
   0—Outsourcing should be used to an even higher extent.
   1—Outsourcing should be limited to the current extent or decreased.
8. Currently, a system is in place where tax income from more wealthy municipalities is transferred to less wealthy municipalities. In practice this means that municipalities in the Helsinki region transfer money to the other parts of the country. What is your opinion of this system?
   0—The current system is good, or even more money should be transferred.
   1—The Helsinki region should be allowed to keep more of its tax income.

About this article

Cite this article

Nyman, H., Pensar, J., Koski, T. et al. Context-specific independence in graphical log-linear models. Comput Stat 31, 1493–1512 (2016). https://doi.org/10.1007/s00180-015-0606-6

Received: 10 September 2014
Accepted: 02 July 2015
Published: 15 July 2015
Issue Date: December 2016
DOI: https://doi.org/10.1007/s00180-015-0606-6
