
Marginal and simultaneous predictive classification using stratified graphical models

Abstract

An inductive probabilistic classification rule must generally obey the principles of Bayesian predictive inference, such that all observed and unobserved stochastic quantities are jointly modeled and parameter uncertainty is fully acknowledged through the posterior predictive distribution. Several such rules have recently been considered, and their asymptotic behavior has been characterized under the assumption that the observed features or variables used for building a classifier are conditionally independent given a simultaneous labeling of both the training samples and those of unknown origin. Here we extend the theoretical results to predictive classifiers that acknowledge feature dependencies, either through graphical models or through sparser alternatives defined as stratified graphical models. We show through experiments with both synthetic and real data that predictive classifiers encoding dependencies have the potential to substantially improve classification accuracy compared with both standard discriminative classifiers and predictive classifiers based solely on conditionally independent features. In most of our experiments, stratified graphical models show an advantage over ordinary graphical models.

Acknowledgments

The authors would like to thank the editor and the anonymous reviewers for their constructive comments and suggestions on the original version of this paper. H.N. and J.P. were supported by the Foundation of Åbo Akademi University, as part of the grant for the Center of Excellence in Optimization and Systems Engineering. J.P. was also supported by the Magnus Ehrnrooth Foundation. J.X. and J.C. were supported by ERC Grant No. 239784 and Academy of Finland Grant No. 251170. J.X. was also supported by the FDPSS graduate school.

Author information

Correspondence to Henrik Nyman.

Electronic supplementary material

Below are the links to the electronic supplementary material.

Supplementary material 1 (pdf 47 KB)

Supplementary material 2 (xls 291 KB)

Appendices

Appendix A: Proof of Theorem 1

To prove Theorem 1 it suffices to consider a single class \(k\) and a single maximal clique in \(G_L^k\). If the scores of the marginal and simultaneous classifiers are asymptotically equivalent for an arbitrary maximal clique and class, it automatically follows that the scores for the whole system are asymptotically equivalent. We start by considering the simultaneous classifier. The training data \({\mathbf {X}}^R\) and test data \({\mathbf {X}}^T\) are now assumed to cover only one maximal clique of an SG in one class. Writing \(\log S_{\text {sim}}({\mathbf {X}}^T \mid {\mathbf {X}}^R)\) out using (2), we get

$$\begin{aligned} \log S_{\text {sim}}({\mathbf {X}}^T \mid {\mathbf {X}}^R)&= \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \log \frac{\varGamma (\sum _{i = 1}^{k_{j}} \beta _{jil})}{\varGamma (n(\pi _{j}^{l}) + \sum _{i = 1}^{k_{j}} \beta _{jil})} \\&+ \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \sum _{i = 1}^{k_{j}} \log \frac{\varGamma (n(x_{j}^{i} \mid \pi _{j}^{l}) + \beta _{jil})}{\varGamma (\beta _{jil})}. \end{aligned}$$
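For concreteness, the score above can be evaluated directly from count tables. The following is a minimal Python sketch (the function name and array layout are our own illustration, not the authors' software):

```python
# A sketch of the clique-wise simultaneous log-score above: counts[j][l, i]
# holds n(x_j^i | pi_j^l) and beta[j][l, i] holds beta_{jil}, so each row l
# contributes one Dirichlet-multinomial log marginal likelihood term.
import numpy as np
from scipy.special import gammaln

def clique_log_score(counts, beta):
    total = 0.0
    for n_jli, b_jli in zip(counts, beta):
        b_sum = b_jli.sum(axis=1)           # sum_i beta_{jil}, one per row l
        n_parent = n_jli.sum(axis=1)        # n(pi_j^l), one per row l
        total += np.sum(gammaln(b_sum) - gammaln(n_parent + b_sum))
        total += np.sum(gammaln(n_jli + b_jli) - gammaln(b_jli))
    return total

# One binary variable with two parent configurations, symmetric priors.
counts = [np.array([[3.0, 1.0], [0.0, 2.0]])]
beta = [np.full((2, 2), 0.5)]
print(clique_log_score(counts, beta))
```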

Using Stirling’s approximation, \(\log \varGamma (x) \approx (x - 0.5) \log (x) - x\), this becomes

$$\begin{aligned}&\sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \left( \left( \sum _{i = 1}^{k_{j}} \beta _{jil} - 0.5 \right) \log \left( \sum _{i = 1}^{k_{j}} \beta _{jil} \right) - \sum _{i = 1}^{k_{j}} \beta _{jil} \right) \\&\qquad - \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \left( \left( n(\pi _{j}^{l}) + \sum _{i = 1}^{k_{j}} \beta _{jil} - 0.5 \right) \log \left( n(\pi _{j}^{l}) + \sum _{i = 1}^{k_{j}} \beta _{jil} \right) - n(\pi _{j}^{l}) - \sum _{i = 1}^{k_{j}} \beta _{jil} \right) \\&\qquad + \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \sum _{i = 1}^{k_{j}} \left( \left( n(x_{j}^{i} \mid \pi _{j}^{l}) + \beta _{jil} - 0.5 \right) \log \left( n(x_{j}^{i} \mid \pi _{j}^{l}) + \beta _{jil} \right) - n(x_{j}^{i} \mid \pi _{j}^{l}) - \beta _{jil} \right) \\&\qquad - \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \sum _{i = 1}^{k_{j}} \left( (\beta _{jil} - 0.5) \log (\beta _{jil}) - \beta _{jil} \right) \\&\quad = - \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \left( \left( \sum _{i = 1}^{k_{j}} \beta _{jil} - 0.5 \right) \log \left( 1 + \frac{n(\pi _{j}^{l})}{\sum _{i = 1}^{k_{j}} \beta _{jil}} \right) + n(\pi _{j}^{l}) \log \left( n(\pi _{j}^{l}) + \sum _{i = 1}^{k_{j}} \beta _{jil} \right) \right) \\&\qquad + \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \sum _{i = 1}^{k_{j}} \left( (\beta _{jil} - 0.5) \log \left( 1 + \frac{n(x_{j}^{i} \mid \pi _{j}^{l})}{\beta _{jil}} \right) + n(x_{j}^{i} \mid \pi _{j}^{l}) \log \left( n(x_{j}^{i} \mid \pi _{j}^{l}) + \beta _{jil} \right) \right) . \end{aligned}$$
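As a side note, the Stirling form above omits the constant \(0.5 \log (2\pi )\), which cancels in every ratio of gamma functions appearing in the score. A quick numerical check of this (a Python sketch with arbitrary arguments):

```python
# Check that the Stirling form log Gamma(x) ~ (x - 0.5) log(x) - x is accurate
# for log-ratios of gamma functions: the omitted constant 0.5*log(2*pi)
# cancels between numerator and denominator.
import numpy as np
from scipy.special import gammaln

def stirling(x):
    return (x - 0.5) * np.log(x) - x

a, b = 500.0, 480.0                    # arbitrary large arguments
print(gammaln(a) - gammaln(b))         # exact log of Gamma(a) / Gamma(b)
print(stirling(a) - stirling(b))       # Stirling approximation of the same
```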

For the marginal classifier we instead need to sum over each single observation \({\mathbf {X}}_h^T\). We use \(h(\pi _{j}^{l})\) to indicate whether the outcome of the parents of variable \(X_j\) belongs to group \(l\), and \(h(x_{j}^{i} \mid \pi _{j}^{l})\) to indicate whether the outcome of \(X_j\) is \(i\) given that the observed outcome of the parents belongs to \(l\). Observing that \(h(\pi _{j}^{l})\) and \(h(x_{j}^{i} \mid \pi _{j}^{l})\) are either 0 or 1, we get the result

$$\begin{aligned}&\log S_{\text {mar}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) = \sum _{h=1}^n \log P({\mathbf {X}}_h^T \mid {\mathbf {X}}^R) \\&\quad = \sum _{h=1}^n \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \log \frac{\varGamma (\sum _{i = 1}^{k_{j}} \beta _{jil})}{\varGamma (h(\pi _{j}^{l}) + \sum _{i = 1}^{k_{j}} \beta _{jil})} + \sum _{h=1}^n \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \sum _{i = 1}^{k_{j}} \log \frac{\varGamma (h(x_{j}^{i} \mid \pi _{j}^{l}) + \beta _{jil})}{\varGamma (\beta _{jil})} \\&\quad = - \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} n(\pi _{j}^{l}) \log \left( \sum _{i = 1}^{k_{j}} \beta _{jil} \right) + \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \sum _{i = 1}^{k_{j}} n(x_{j}^{i} \mid \pi _{j}^{l}) \log (\beta _{jil}). \end{aligned}$$
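The last equality relies only on the indicator values and the identity \(\varGamma (1+\beta ) = \beta \varGamma (\beta )\). A minimal numerical check of this reduction (a Python sketch with arbitrary values; not part of the paper):

```python
# Numerical check of the reduction used above: for an indicator h in {0, 1},
# log Gamma(h + b) - log Gamma(b) = h * log(b), since Gamma(1 + b) = b Gamma(b).
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)
b = rng.uniform(0.1, 5.0, size=10)     # arbitrary positive hyperparameters
h = rng.integers(0, 2, size=10)        # indicator outcomes h(.) in {0, 1}
assert np.allclose(gammaln(h + b) - gammaln(b), h * np.log(b))
```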

Considering the difference \(\log S_{\text {sim}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) - \log S_{\text {mar}}({\mathbf {X}}^T \mid {\mathbf {X}}^R)\) results in

$$\begin{aligned}&\log S_{\text {sim}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) - \log S_{\text {mar}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) \\&= - \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \left( \sum _{i = 1}^{k_{j}} \beta _{jil} - 0.5 \right) \log \left( 1 + \frac{n(\pi _{j}^{l})}{\sum _{i = 1}^{k_{j}} \beta _{jil}} \right) \\&\quad \,\, + \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \sum _{i = 1}^{k_{j}} (\beta _{jil} - 0.5) \log \left( 1 + \frac{n(x_{j}^{i} \mid \pi _{j}^{l})}{\beta _{jil}}\right) \\&\quad \,\, + \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \left( n(\pi _{j}^{l}) \log \left( \sum _{i = 1}^{k_{j}} \beta _{jil} \right) - n(\pi _{j}^{l}) \log \left( n(\pi _{j}^{l}) + \sum _{i = 1}^{k_{j}} \beta _{jil} \right) \right) \\&\quad \,\, + \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \sum _{i = 1}^{k_{j}} \left( n(x_{j}^{i} \mid \pi _{j}^{l}) \log \left( n(x_{j}^{i} \mid \pi _{j}^{l}) + \beta _{jil} \right) - n(x_{j}^{i} \mid \pi _{j}^{l}) \log (\beta _{jil}) \right) . \end{aligned}$$

We now make the assumption that all the limits of relative frequencies of feature values are strictly positive under an infinitely exchangeable sampling process of the training data, i.e. all hyperparameters \(\beta _{jil} \rightarrow \infty \) as the size of the training data \(m \rightarrow \infty \). Using the standard limit \(\lim _{y \rightarrow \infty } (1+x/y)^y = e^x\) results in

$$\begin{aligned}&\lim _{m \rightarrow \infty } \left( \log S_{\text {sim}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) - \log S_{\text {mar}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) \right) \\&\quad = - \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} n(\pi _{j}^{l}) + \sum _{j = 1}^{d} \sum _{l = 1}^{q_{j}} \sum _{i = 1}^{k_{j}} n(x_{j}^{i} \mid \pi _{j}^{l}) = 0. \end{aligned}$$

The final equality holds since \(\sum _{i = 1}^{k_{j}} n(x_{j}^{i} \mid \pi _{j}^{l}) = n(\pi _{j}^{l})\) for every \(j\) and \(l\).
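The convergence can also be checked numerically. Below is a small Python sketch for a single variable and parent configuration, under the proof's premise that the hyperparameters grow linearly in \(m\) with fixed, strictly positive relative frequencies (the specific counts and frequencies are arbitrary):

```python
# Illustration of Theorem 1 for a single variable and parent configuration:
# as beta_{jil} grows linearly in m, the simultaneous score approaches the
# marginal score -n(pi) log(sum_i beta_i) + sum_i n(x_i | pi) log(beta_i).
import numpy as np
from scipy.special import gammaln

n_test = np.array([6.0, 3.0, 1.0])          # test counts n(x_j^i | pi_j^l)
rel_freq = np.array([0.5, 0.3, 0.2])        # assumed limiting relative freqs
for m in (10, 100, 1_000, 10_000):
    beta = m * rel_freq                     # hyperparameters grow with m
    sim = (gammaln(beta.sum()) - gammaln(n_test.sum() + beta.sum())
           + np.sum(gammaln(n_test + beta) - gammaln(beta)))
    mar = -n_test.sum() * np.log(beta.sum()) + np.sum(n_test * np.log(beta))
    print(m, sim - mar)                     # difference tends to 0
```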

Appendix B: Proof of Theorem 2

This proof follows largely the same structure as the proof of Theorem 1 and covers the simultaneous score. It is assumed that the underlying graph of the SGM coincides with that of the GM; this is a reasonable assumption, since this property will hold for the SGM and GM maximizing the marginal likelihood as the size of the training data tends to infinity. Again, using the same reasoning as in the proof above, we consider only a single class \(k\) and a single maximal clique in \(G_L^k\). Additionally, it suffices to consider the score for the last variable \(X_d\) in the ordering, i.e. the variable corresponding to the node associated with all of the stratified edges, and a specific configuration \(l\) of the parents \(\Pi _d\) of \(X_d\). The equations for calculating the scores of the variables \(X_1, \ldots , X_{d-1}\) are identical under the GM and the SGM. If the asymptotic equivalence holds for an arbitrary parent configuration, it automatically holds for all parent configurations. Under this setting we start by looking at the score for the SGM,

$$\begin{aligned} \log S_{\text {SGM}}({\mathbf {X}}^T \mid {\mathbf {X}}^R)= \log \frac{\varGamma (\sum _{i = 1}^{k_{j}} \beta _{jil})}{\varGamma (n(\pi _{j}^{l}) + \sum _{i = 1}^{k_{j}} \beta _{jil})} + \sum _{i = 1}^{k_{j}} \log \frac{\varGamma (n(x_{j}^{i} \mid \pi _{j}^{l}) + \beta _{jil})}{\varGamma (\beta _{jil})}, \end{aligned}$$

which, using Stirling’s approximation and the same techniques as in the previous proof, equals

$$\begin{aligned}&- \left( \sum _{i=1}^{k_j}\beta _{jil}-0.5\right) \log \left( 1 + \frac{n(\pi _j^l)}{\sum _{i=1}^{k_j}\beta _{jil}}\right) - n(\pi _j^l) \log \left( n(\pi _j^l) + \sum _{i=1}^{k_j}\beta _{jil}\right) \\&\quad + \sum _{i=1}^{k_j} n(x_j^i \mid \pi _j^l) \log (n(x_j^i \mid \pi _j^l) + \beta _{jil}) + \sum _{i=1}^{k_j} (\beta _{jil} - 0.5) \log \left( 1+\frac{n(x_j^i \mid \pi _j^l)}{\beta _{jil}}\right) . \end{aligned}$$

When studying the GM score we need to consider each outcome in the parent configuration \(l\) separately. Let \(h\) denote such an outcome, and let \(q_l\) denote the total number of outcomes comprising \(l\). We then get the following score for the GM,

$$\begin{aligned} \log S_{\text {GM}}({\mathbf {X}}^T \mid {\mathbf {X}}^R)&= \sum _{h=1}^{q_l} \log \frac{\varGamma (\sum _{i = 1}^{k_{j}} \beta _{jih})}{\varGamma (n(\pi _{j}^{h}) + \sum _{i = 1}^{k_{j}} \beta _{jih})} \\&+ \sum _{h=1}^{q_l} \sum _{i = 1}^{k_{j}} \log \frac{\varGamma (n(x_{j}^{i} \mid \pi _{j}^{h}) + \beta _{jih})}{\varGamma (\beta _{jih})}. \end{aligned}$$

Using the same calculations as before, this equals

$$\begin{aligned} -&\sum _{h=1}^{q_l} \left( \sum _{i=1}^{k_j}\beta _{jih}-0.5\right) \log \left( 1 + \frac{n(\pi _j^h)}{\sum _{i=1}^{k_j}\beta _{jih}}\right) -\sum _{h=1}^{q_l} n(\pi _j^h) \log \left( n(\pi _j^h) + \sum _{i=1}^{k_j}\beta _{jih}\right) \\&\quad + \sum _{h=1}^{q_l}\sum _{i=1}^{k_j} n(x_j^i \mid \pi _j^h) \log (n(x_j^i \mid \pi _j^h) + \beta _{jih}) \\&\quad + \sum _{h=1}^{q_l}\sum _{i=1}^{k_j} (\beta _{jih} - 0.5) \log \left( 1 + \frac{n(x_j^i \mid \pi _j^h)}{\beta _{jih}}\right) . \end{aligned}$$

Considering the difference \(\log S_{\text {SGM}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) - \log S_{\text {GM}}({\mathbf {X}}^T \mid {\mathbf {X}}^R)\) we get

$$\begin{aligned}&- \left( \sum _{i=1}^{k_j}\beta _{jil}-0.5\right) \log \left( 1 + \frac{n(\pi _j^l)}{\sum _{i=1}^{k_j}\beta _{jil}}\right) +\sum _{i=1}^{k_j} (\beta _{jil}-0.5) \log \left( 1+\frac{n(x_j^i \mid \pi _j^l)}{\beta _{jil}}\right) \\&\quad - n(\pi _j^l) \log \left( n(\pi _j^l) + \sum _{i=1}^{k_j}\beta _{jil}\right) +\sum _{h=1}^{q_l} n(\pi _j^h) \log \left( n(\pi _j^h) + \sum _{i=1}^{k_j}\beta _{jih}\right) \\&\quad + \sum _{i=1}^{k_j} n(x_j^i \mid \pi _j^l) \log (n(x_j^i \mid \pi _j^l) + \beta _{jil}) -\sum _{h=1}^{q_l}\sum _{i=1}^{k_j} n(x_j^i \mid \pi _j^h) \log (n(x_j^i \mid \pi _j^h) + \beta _{jih}) \\&\quad + \sum _{h=1}^{q_l} \left( \sum _{i=1}^{k_j}\beta _{jih}-0.5\right) \log \left( 1 + \frac{n(\pi _j^h)}{\sum _{i=1}^{k_j}\beta _{jih}}\right) \\&\quad -\sum _{h=1}^{q_l}\sum _{i=1}^{k_j} (\beta _{jih} - 0.5) \log \left( 1 + \frac{n(x_j^i \mid \pi _j^h)}{\beta _{jih}}\right) . \end{aligned}$$

Under the assumption that \(\beta _{jil} \rightarrow \infty \) as \(m \rightarrow \infty \), the terms of the form \((\cdot - 0.5) \log (1 + \cdot )\), found on the first and the last two lines above, sum to 0 in the limit by the same standard limit as in the proof of Theorem 1. The remaining terms can be written as

$$\begin{aligned} \log \frac{\prod _{h=1}^{q_l} (n(\pi _j^h) + \sum _{i=1}^{k_j}\beta _{jih})^{n(\pi _j^h)}}{(n(\pi _j^l) + \sum _{i=1}^{k_j}\beta _{jil})^{n(\pi _j^l)}} - \sum _{i=1}^{k_j} \log \frac{\prod _{h=1}^{q_l} (n(x_j^i \mid \pi _j^h) + \beta _{jih})^{n(x_j^i \mid \pi _j^h)}}{(n(x_j^i \mid \pi _j^l) + \beta _{jil})^{n(x_j^i \mid \pi _j^l)}}. \end{aligned}$$

Noting that \(n(\pi _j^l) = \sum _{h=1}^{q_l} n(\pi _j^h)\) and \(n(x_j^i \mid \pi _j^l) = \sum _{h=1}^{q_l} n(x_j^i \mid \pi _j^h)\) we get

$$\begin{aligned} \sum _{h=1}^{q_l} n(\pi _j^h) \log \frac{ n(\pi _j^h) + \sum _{i=1}^{k_j}\beta _{jih}}{n(\pi _j^l) + \sum _{i=1}^{k_j}\beta _{jil}} - \sum _{i=1}^{k_j} \sum _{h=1}^{q_l} n(x_j^i \mid \pi _j^h) \log \frac{n(x_j^i \mid \pi _j^h) + \beta _{jih}}{n(x_j^i \mid \pi _j^l) + \beta _{jil}}. \end{aligned}$$

By investigating the definition of the \(\beta \) parameters in (3), in combination with the fact that the probability of observing the value \(i\) for variable \(X_j\), given that the outcome of the parents is \(h\), is identical for every outcome \(h\) comprising the group \(l\), we get the limits

$$\begin{aligned} \lim _{m \rightarrow \infty } \frac{ n(\pi _j^h) + \sum _{i=1}^{k_j}\beta _{jih}}{n(\pi _j^l) + \sum _{i=1}^{k_j}\beta _{jil}} = \lim _{m \rightarrow \infty } \frac{n(x_j^i \mid \pi _j^h) + \beta _{jih}}{n(x_j^i \mid \pi _j^l) + \beta _{jil}} = \zeta _{jh}. \end{aligned}$$

Consequently, as \(m \rightarrow \infty \), the difference \(\log S_{\text {SGM}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) - \log S_{\text {GM}}({\mathbf {X}}^T \mid {\mathbf {X}}^R)\) tends to

$$\begin{aligned}&\sum _{h=1}^{q_l} \left( n(\pi _j^h) \log \zeta _{jh} - \sum _{i=1}^{k_j} n(x_j^i \mid \pi _j^h) \log \zeta _{jh} \right) \\&\quad = \sum _{h=1}^{q_l} \left( n(\pi _j^h) \log \zeta _{jh} - n(\pi _j^h) \log \zeta _{jh} \right) = 0. \end{aligned}$$
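As with Theorem 1, the convergence can be checked numerically. The Python sketch below considers one merged stratum \(l\) with two parent outcomes that share the same conditional distribution; the proportional-to-counts form of the hyperparameters is our stand-in for the definition in (3), and the counts are arbitrary:

```python
# Illustration of Theorem 2 for one merged stratum l with two parent
# outcomes sharing the same conditional distribution p: as m grows, the
# SGM score (pooled row) and the GM score (one row per outcome) converge.
import numpy as np
from scipy.special import gammaln

def dm_row(n, b):
    """Dirichlet-multinomial log marginal likelihood of one count row."""
    return (gammaln(b.sum()) - gammaln(n.sum() + b.sum())
            + np.sum(gammaln(n + b) - gammaln(b)))

p = np.array([0.6, 0.4])                    # shared conditional distribution
w = np.array([0.7, 0.3])                    # relative frequencies of outcomes
n = np.array([[4.0, 2.0], [2.0, 2.0]])      # test counts n(x_j^i | pi_j^h)
for m in (10, 100, 1_000, 10_000):
    beta = m * w[:, None] * p[None, :]      # beta_{jih}, growing with m
    gm = sum(dm_row(n[h], beta[h]) for h in range(2))
    sgm = dm_row(n.sum(axis=0), beta.sum(axis=0))
    print(m, sgm - gm)                      # difference tends to 0
```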

Appendix C: List of political parties in the Finnish parliament

Table 4 lists the political parties elected to the Finnish parliament in the 2011 parliamentary elections.

Table 4 List of political parties that are members of the Finnish parliament

About this article

Cite this article

Nyman, H., Xiong, J., Pensar, J. et al. Marginal and simultaneous predictive classification using stratified graphical models. Adv Data Anal Classif 10, 305–326 (2016). https://doi.org/10.1007/s11634-015-0199-5
