Abstract
Count data, recorded as non-negative integers, arise in many scientific fields; examples include the number of species in a habitat, the number of accidents at a junction, and the number of infected cells. Such data often contain zeros, sometimes in large proportion. The zero-inflated Bell family of distributions was recently introduced to handle an excess of zeros in count data. Model diagnosis is a crucial step in checking that a fitted model is appropriate for the data. Pearson and deviance residuals are commonly used to diagnose count models in practice, but it is widely acknowledged that these residuals are not normally distributed when applied to count data. In this study, we evaluate conventional diagnostic tools, namely Pearson and deviance residuals, alongside randomized quantile residuals (RQRs) for the Bell and zero-inflated Bell models, which have been proposed to address overdispersion and zero inflation, respectively. In the simulation study, the normality of the RQRs is assessed through the p-values of the Shapiro-Wilk test in order to detect overdispersion and zero inflation in Bell-type regression models. The findings indicate that RQRs are superior at detecting violations of distributional assumptions and that they can detect overdispersion and zero inflation under Bell-type models. In the application part of the study, a dataset on the number of infected blood cells illustrates the residual diagnostics of Bell-type regression models. Poisson, Bell, and negative binomial models, together with their zero-inflated versions, are fitted to this dataset. Model fit criteria are used to compare these count models in terms of both goodness of fit and residual diagnostics.
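The idea behind RQRs can be illustrated with a minimal sketch. It is not the authors' code (their R functions are in the repository listed under Data Availability); it assumes the Bell probability mass function P(Y = y) = θ^y exp(1 − e^θ) B_y / y!, with B_y the Bell numbers and mean θe^θ, and the usual definition of an RQR as Φ⁻¹(u) with u drawn uniformly between F(y − 1) and F(y). All function names, the truncation point `y_max`, and the simulation settings are hypothetical choices made for illustration. Because the Python standard library has no Shapiro-Wilk test, the sketch only summarizes the residuals by their mean and standard deviation; in practice the residuals would be passed to a normality test such as `scipy.stats.shapiro`.

```python
import bisect
import math
import random
from statistics import NormalDist, fmean, pstdev


def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max], computed with the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(n_max):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def bell_cdf_table(theta, y_max=80):
    """CDF of Bell(theta) on 0..y_max (y_max is an illustrative truncation)."""
    bells = bell_numbers(y_max)
    norm = math.exp(1.0 - math.exp(theta))
    cdf, total = [], 0.0
    for y in range(y_max + 1):
        total += norm * theta**y * (bells[y] / math.factorial(y))
        cdf.append(total)
    return cdf


def poisson_cdf_table(mu, y_max=80):
    """CDF of Poisson(mu) on 0..y_max, built with the pmf recurrence."""
    cdf, pmf, total = [], math.exp(-mu), 0.0
    for y in range(y_max + 1):
        if y > 0:
            pmf *= mu / y
        total += pmf
        cdf.append(total)
    return cdf


def sample_from_cdf(cdf, rng):
    """Inverse-CDF sampling: smallest y with F(y) >= u."""
    return min(bisect.bisect_left(cdf, rng.random()), len(cdf) - 1)


def rqr(y, cdf, rng):
    """Randomized quantile residual of count y under the model CDF."""
    lower = cdf[y - 1] if y > 0 else 0.0
    u = rng.uniform(lower, cdf[y])
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)


def run_demo(seed=1, n=2000, theta=1.0):
    """Simulate Bell counts, then compute RQRs under Bell and Poisson fits."""
    rng = random.Random(seed)
    bell_cdf = bell_cdf_table(theta)
    # Misspecified comparison model: Poisson with the same mean theta * e^theta.
    pois_cdf = poisson_cdf_table(theta * math.exp(theta))
    ys = [sample_from_cdf(bell_cdf, rng) for _ in range(n)]
    res_bell = [rqr(y, bell_cdf, rng) for y in ys]
    res_pois = [rqr(y, pois_cdf, rng) for y in ys]
    return {
        "bell_mean": fmean(res_bell),
        "bell_sd": pstdev(res_bell),
        "poisson_sd": pstdev(res_pois),
    }


if __name__ == "__main__":
    print(run_demo())
```

Under the correctly specified Bell model the residuals should look approximately standard normal, whereas under the equidispersed Poisson model the overdispersion of the simulated counts is expected to show up as a wider spread of the residuals. The true parameter is used here instead of a fitted one to keep the sketch short.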
Data Availability
The dataset used in the paper is available in the bellreg R package (as “cells”).
The generic R functions used in the paper are publicly available in the GitHub repository: https://github.com/haticeakdur/residualBell
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Akdur, H.T.K., Kilic, D. & Bayrak, H. Residual Diagnostic Methods for Bell-Type Count Models. Stat Biosci (2023). https://doi.org/10.1007/s12561-023-09406-5
DOI: https://doi.org/10.1007/s12561-023-09406-5