Joint Linear Modeling of Mixed Data and Its Application to Email Analysis

Sankhya B

Abstract

We present a new model that allows experts in social networks to analyze network data. Specifically, we propose a joint random effects linear model for analysing longitudinal inflated [0,1]-support and inflated count response variables, allowing for non-ignorable missing values in the inflated [0,1]-support response. Inference is based on the posterior distribution of the unknowns given all available information, and a Monte Carlo EM algorithm is used to estimate the posterior distribution of the parameters. We also investigate the sensitivity of the results to the assumptions under a perturbation from missing at random to not missing at random, and study the influence of small perturbations of these elements on the posterior displacement. To show the applicability of the proposed model, we present results from analyzing the Enron email dataset and a student activity and profile dataset. In addition, we provide a new statistical monitoring approach for studying longitudinal social network datasets by considering attributes that are important in various applications. For this purpose, we give a complete definition of the responsiveness rate in social networks as a [0,1]-support variable.



Notes

  1. \(P_{Y_{(i,j)t}}^{INFBE_{k,l}}\), a probability measure on the measurable space \(({\Omega }, \mathfrak {B})\), is absolutely continuous with respect to the \(\sigma\)-finite measure \(\mu = \mu_L + \delta_k + \delta_l\), where \(\mu_L\) denotes the Lebesgue measure and \(\delta_c\) is a point mass at \(c\).
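
To make the role of the dominating measure concrete, the following is a minimal sketch of a density taken with respect to \(\mu = \mu_L + \delta_k + \delta_l\): it returns a probability mass at the two inflation points and a scaled beta density elsewhere on the unit interval. It is an illustration under stated assumptions, not the authors' implementation. The function names, the mixing weights `p_k` and `p_l`, the shape parameters `a` and `b`, and the default choice \(k = 0\), \(l = 1\) are all hypothetical and do not appear in the article.

```python
import math


def beta_pdf(y, a, b):
    """Density of a Beta(a, b) variable on the open interval (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))


def inflated_beta_density(y, p_k, p_l, a, b, k=0.0, l=1.0):
    """Density with respect to mu = Lebesgue + point mass at k + point mass at l.

    Assumed (hypothetical) parameterization:
      p_k, p_l : probability masses placed at the inflation points k and l
      a, b     : shape parameters of the continuous beta component
    """
    if p_k < 0 or p_l < 0 or p_k + p_l >= 1:
        raise ValueError("p_k and p_l must be non-negative and sum to less than 1")
    if y == k:
        return p_k  # mass contributed by the point mass at k
    if y == l:
        return p_l  # mass contributed by the point mass at l
    if 0 < y < 1:
        # continuous part, integrated against the Lebesgue measure
        return (1 - p_k - p_l) * beta_pdf(y, a, b)
    return 0.0


def total_mass(p_k, p_l, a, b, n=10000):
    """Integrate the density against mu: two point masses plus a midpoint rule."""
    h = 1.0 / n
    continuous = sum(
        inflated_beta_density((i + 0.5) * h, p_k, p_l, a, b) * h for i in range(n)
    )
    return p_k + p_l + continuous
```

Integrating this density against \(\mu\) means adding the two point masses to the Lebesgue integral of the continuous part, which is what `total_mass` approximates; the result should be close to one for any valid choice of the assumed parameters.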

Funding

Open access funding provided by Shahid Beheshti University.

Author information

Corresponding author

Correspondence to Ehsan Bahrami Samani.

Ethics declarations

Conflict of Interest

The authors declare that there is no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Samani, E.B., Tabrizi, E. Joint Linear Modeling of Mixed Data and Its Application to Email Analysis. Sankhya B 85, 175–209 (2023). https://doi.org/10.1007/s13571-023-00304-w

  • DOI: https://doi.org/10.1007/s13571-023-00304-w
