
The utility of item response modeling in marketing research

  • Methodological Paper, Journal of the Academy of Marketing Science

Abstract

Item response modeling (IRM/IRT) has been known to marketing scholars for a number of years. However, with the exception of some notable and important applications in international (cross-cultural) marketing and consumer behavior, even a cursory reading of marketing journals reveals a general lack of interest in applying IRM, despite the highly useful measurement-related information it can provide. To help remedy this paucity of adoption, we offer an application-oriented discussion of the utility of IRM for marketing and related business research, so that researchers can exploit the strengths and realize the benefits of this methodology in their empirical work. After a short discussion of the history of IRM, we focus on its fundamentals within a modern statistical framework based on the generalized linear model and the closely related nonlinear factor analysis. We then take up the major concepts of IRM, including the item characteristic curve, local independence, and dimensionality, as well as parameter estimation and information functions. The popular one- and two-parameter logistic models are discussed next, as is the issue of model selection. Several polytomous item response models are subsequently treated, followed by a discussion of multidimensional IRM and data illustrations of item response models using widely available software. References to exemplar marketing applications are provided along the way, and a discussion of the limitations of IRM concludes the article.


Notes

  1. The threshold parameter is typically called the "difficulty" parameter in achievement evaluation and educational research settings, a term we do not use in this paper, since it is not generally applicable in the marketing and business research settings dealt with here.

  2. A different derivation of the model definition in Eq. 12 was provided by Rasch (1960); it was developed within an ability evaluation context and is based on comparing an examined person's ability level to the "difficulty" of the item in question (see Footnote 1). An alternative, direct derivation of the 1PL-model, which makes no reference to any subject-matter context and uses only logistic regression considerations, can be found in Raykov (2014).

  3. This paper is based on the premise that items may be removed from a multi-component measuring instrument in order to achieve the constraint in Eq. 13, provided this is not associated with a loss in validity, since the Rasch model has clearly desirable properties. However, "specific objectivity" cannot substitute for validity, and hence it should not be pursued in cases where construct under-representation may result from dropping particular items from a measuring instrument in an attempt to satisfy the Rasch model and, specifically, its unique feature of uniform item discrimination power as reflected in Eq. 13.

  4. A sufficient statistic is a function of the available data with the property that the data likelihood (the probability of the data) can be calculated as soon as the value of that statistic is known. In other words, a statistic is sufficient with respect to a particular parameter if it extracts (holds) all the information about the parameter that is contained in the available dataset. (In the Rasch model, for instance, a respondent's total score is a sufficient statistic for his or her trait level.) ML estimators can be shown to be functions of sufficient statistics, hence the particular relevance of such statistics in ML-based estimation (for a formal definition of sufficiency see, e.g., Casella and Berger 2002).

  5. An informal explanation of the idea of the EM algorithm can be provided by considering the trait parameters (i.e., the individual θ's) as missing values, one per respondent in the current setting. If these values were observed, one could directly use ML to estimate the item parameters. Since the missing values are not observed, however, one can first obtain the expected likelihood with respect to an assumed distribution for the missing values (such as the normal), and then maximize that likelihood to obtain provisional item parameter estimates. One can then use the latter to obtain predictions of the missing values and, treating these predictions as known (i.e., as if they were "observed values"), furnish improved estimates of the item parameters, with which one obtains improved predictions of the missing values, then improved item parameter estimates, and so on until convergence (e.g., Thissen 1982). It can be shown that the likelihood increases at each step. The EM algorithm can be slow to reach convergence in some cases.
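     In generic symbols (a sketch only, not the paper's own notation; here $\beta$ denotes the item parameters, $\theta$ the latent traits treated as missing data, and $Y$ the observed responses), iteration $t$ of the algorithm consists of the two steps

     E-step: \( Q(\beta \mid \beta^{(t)}) = E_{\theta \mid Y,\, \beta^{(t)}}\!\left[\log L(\beta;\, Y, \theta)\right] \)

     M-step: \( \beta^{(t+1)} = \arg\max_{\beta}\, Q(\beta \mid \beta^{(t)}) \)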

  6. The general expressions for the elements of the matrix in Eq. 17, including for the 2PL-model, as functions of ability levels, item parameters, and model-implied probabilities of correct response, are given, for instance, in Baker and Kim (2004).

  7. For numerical indexes of essential unidimensionality (i.e., the degree of dominance of a major latent dimension over secondary ones) and how to obtain their point and interval estimates in an empirical setting, see, e.g., Raykov and Pohl (2013a, 2013b). Those authors also offer informal guidelines for interpreting the indexes, along with suggestions for when an instrument with a complex structure may be considered essentially unidimensional for some research questions in social science research (see also Stout 1990).

  8. Using the weighted least squares method of model fitting with a mean and variance correction (Muthén and Muthén 2012), one also obtains the root mean square error of approximation (RMSEA), a popular goodness-of-fit index that is in general less sensitive to sample size than the chi-square test statistic reported in this paragraph. The RMSEA for the fitted model here is 0, with a 90%-confidence interval of (0, .043), similarly indicating a tenable model and thus the plausibility of the unidimensionality hypothesis tested with it (e.g., Raykov and Marcoulides 2006).
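     In terms of the command file sketched in Appendix A, this estimator is requested by a one-line change of the ANALYSIS command (a minimal sketch; the rest of the file stays the same):

     ANALYSIS:  ESTIMATOR = WLSMV;   ! weighted least squares with mean and variance
                                     ! correction; output then includes the RMSEA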

  9. The p-value associated with the LRT used can be routinely calculated, for instance, with the following R command (e.g., Raykov and Marcoulides 2012):

     > 1 - pchisq(21.597 - 21.136, 5 - 1)

     or, more generally,

     > 1 - pchisq("LRT reduced model" - "LRT full model", "df full" - "df reduced")

     where ">" stands for the R prompt and each quoted placeholder is to be replaced by the corresponding numerical value: the LRT statistics associated with the Rasch and 2PL-models, respectively, and their degrees of freedom, all obtained from the Mplus output files as discussed in the main text (see also the Appendix, incl. Notes 1 and 2 to the first command file).

  10. The finally entertained 1PL-model in this example, being an NLFA model, is equivalent to a classical test theory-based model with equal loadings on the underlying common true score (e.g., Raykov and Marcoulides 2011). Thus, the last column of Table 3 shows that when classical test theory is used correctly, and in particular when models based on it are fitted to data in a way that properly accounts for the nature/scale of the items (viz. ordinal, and binary in this example), the standard error associated with the individual trait parameter (i.e., individual true score) estimates does depend on the true score/trait level, contrary to claims found particularly in the older IRT literature. (See Raykov and Marcoulides 2011 for other myths about classical test theory, as well as Raykov 2014.)

  11. The IIFs and TIF are also easily obtained with the readily available software IRTPRO (Cai et al. 2011). To this end, upon loading the data, select "Analysis" from the main toolbar, then "Unidimensional IRT" from the drop-down menu, and then "Items" in the middle of the window that opens. After highlighting and moving all analyzed items into the blank window on the right, pressing "Run" will fit the (default) 2PL-model. Then choose "Graph" under "Analysis" in the toolbar of the resulting output window to obtain the IIFs and TIF (check for them in the small window that opens in the upper-left corner). For further details on using IRTPRO, refer to Cai et al. (2011).

  12. Upon loading the data in IRTPRO, select "Analysis" from the main toolbar and then "Unidimensional IRT." Click on the "Items" tab in the central window that opens, and highlight and move all items of interest into the blank window on the right. Pressing "Run" will then fit the GR model, which is the default in this software for polytomous items.

References

  • Agresti, A. (2012). Categorical data analysis. New York: Wiley.

  • Baker, F. B. (2001). The basics of item response theory. Washington, DC: ERIC Clearinghouse on Assessment and Evaluation.

  • Baker, F. B., & Kim, S.-H. (2004). Item response theory: parameter estimation techniques. Boca Raton: Taylor & Francis.

  • Bartholomew, D. J., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: a unified approach. New York: Wiley.

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Reading, MA: Addison-Wesley.

  • Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.

  • Bock, R. D. (1997a). A brief history of item response theory. Educational Measurement: Issues and Practice, 16, 21–33.

  • Bock, R. D. (1997b). The nominal categories model. In W. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 32–49). New York: Springer.

  • Cai, L., Thissen, D., & du Toit, S. H. C. (2011). IRTPRO user’s guide. Lincolnwood, IL: Scientific Software International.

  • Casella, G., & Berger, J. O. (2002). Statistical inference (2nd ed.). Pacific Grove, CA: Duxbury.

  • Cox, E. P. (1980). The optimal number of response alternatives for a scale: a review. Journal of Marketing Research, 17, 407–422.

  • De Jong, M. G., Steenkamp, J.-B. E. M., Fox, J.-P., & Baumgartner, H. (2008). Using item response theory to measure extreme response style in marketing research: a global investigation. Journal of Marketing Research, 45, 104–115.

  • de Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal of Educational Statistics, 11, 183–196.

  • Diamantopoulos, A., Sarstedt, M., Fuchs, C., Wilczynski, P., & Kaiser, S. (2012). Guidelines for choosing between multi-item and single-item scales for construct measurement: a predictive validity perspective. Journal of the Academy of Marketing Science, 40, 434–449.

  • Ewing, M. T., Salzberger, T., & Sinkovics, R. R. (2005). An alternate approach to assessing cross-cultural measurement equivalence in advertising research. Journal of Advertising, 34, 17–36.

  • Fischer, G. (1974). Einführung in die Theorie psychologischer Tests [Introduction to the theory of psychological tests]. Bern: Huber.

  • Goldstein, H. (2011). Multilevel statistical models. London: Arnold.

  • Green, P. E. (1984). Hybrid models for conjoint analysis: an expository review. Journal of Marketing Research, 21, 155–170.

  • Kamakura, W. A., & Balasubramanian, S. (1989). Tailored interviewing: an application of item response theory for personality measurement. Journal of Personality Assessment, 53, 502–519.

  • Lancaster, T. (2000). The incidental parameter problem since 1948. Journal of Econometrics, 95, 391–413.

  • Lawley, D. N. (1943). On problems connected with item selection and test construction. Proceedings of the Royal Society of Edinburgh, 61, 273–287.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

  • Messick, S. (1995). Validity of psychological assessment: validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.

  • Muthén, L. K., & Muthén, B. O. (2012). Mplus user’s guide. Los Angeles: Muthén & Muthén.

  • Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135, 370–384.

  • Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1–32.

  • Ostini, R., & Nering, M. L. (2006). Polytomous item response theory models. Thousand Oaks: Sage.

  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.

  • Raykov, T. (2007). Reliability if deleted, not “alpha if deleted”: evaluation of scale reliability following component deletion. British Journal of Mathematical and Statistical Psychology, 60, 201–216.

  • Raykov, T. (2008). “Alpha if item deleted”: a note on loss of criterion validity in scale development if maximising coefficient alpha. British Journal of Mathematical and Statistical Psychology, 61, 275–285.

  • Raykov, T. (2012). Scale development using structural equation modeling. In R. Hoyle (Ed.), Handbook of structural equation modeling (pp. 472–492). New York: Guilford Press.

  • Raykov, T. (2014). A course in item response modeling (lecture notes). East Lansing, MI: Michigan State University, Measurement and Quantitative Methods.

  • Raykov, T., & Marcoulides, G. A. (2006). A first course in structural equation modeling (2nd ed.). Mahwah, NJ: Erlbaum.

  • Raykov, T., & Marcoulides, G. A. (2008). An introduction to applied multivariate analysis. New York: Taylor & Francis.

  • Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. New York: Taylor & Francis.

  • Raykov, T., & Marcoulides, G. A. (2012). Basic statistics: an introduction with R. New York: Rowman & Littlefield.

  • Raykov, T., & Marcoulides, G. A. (2014). On examining the underlying normal variable assumption in latent variable models with categorical indicators. Structural Equation Modeling (in press).

  • Raykov, T., & Pohl, S. (2013a). On studying common factor variance in multiple component measuring instruments. Educational and Psychological Measurement, 73, 191–209.

  • Raykov, T., & Pohl, S. (2013b). Essential unidimensionality in multiple component measuring instruments: a correlation decomposition approach. Educational and Psychological Measurement, 73, 581–600.

  • Raykov, T., Rodenberg, C., & Narayanan, A. (2015). On optimal shortening of psychometric scales. Structural Equation Modeling (in press).

  • Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.

  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, No. 17.

  • Samejima, F. (1997). Graded response model. In W. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 85–100). New York: Springer.

  • Schultz, C., Salomo, S., & Talke, K. (2013). Measuring new product portfolio innovativeness: how differences in scale width and evaluator perspectives affect its relationship with performance. Journal of Product Innovation Management, 30, 93–109.

  • Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.

  • Swaminathan, H., & Gifford, J. A. (1982). Bayesian estimation in the Rasch model. Journal of Educational Statistics, 7, 175–192.

  • Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.

  • Thissen, D. (1982). Marginal maximum-likelihood estimation for the one-parameter logistic model. Psychometrika, 47, 175–186.

  • Timm, N. H. (2002). Applied multivariate analysis. New York: Springer.

  • van der Linden, W., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York: Springer.

  • Verhelst, N., & Molenaar, I. W. (1988). Logit-based parameter estimation in the Rasch model. Statistica Neerlandica, 42, 273–295.

  • Wechsler, D. (1958). The measurement and appraisal of adult intelligence. Baltimore: Williams & Wilkins.

Author information

Corresponding author

Correspondence to Tenko Raykov.


Author note

We are indebted to the Editor, G. Tomas M. Hult, for the invitation to write this paper as well as for valuable discussions on the relevance of IRM in marketing research and his guidance during the process of developing the article. We are grateful to T. Asparouhov, D. M. Dimitrov, S. H. C. du Toit, M. Edwards, C. Lewis, B. O. Muthén, and M. D. Reckase for instructive advice, comments, and information on IRM and its applications, as well as to two anonymous Referees for critical comments on an earlier version of the paper that have contributed considerably to its improvement. Thanks are also due to R. Bowles for informative discussions on the use of IRM software.

Appendix

Mplus source code for testing unidimensionality, model selection, dimensionality examination in MIRM, and fitting a compensatory multidimensional item response model (a commonly applicable 2PL-model extension), with subsequent item and trait parameter estimation

  A.

    To test the assumption of unidimensionality (at present nearly routinely made in social research using UIRT), as well as to fit the popular 2PL-model, use the following command file.
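    The command file itself did not survive reproduction here; the following is a minimal sketch consistent with Notes 1–3 below, in which the data file name (scale.dat) and the item names (item1–item5) are hypothetical placeholders to be replaced with one's own:

    TITLE:     Testing unidimensionality and fitting the 2PL-model
    DATA:      FILE = scale.dat;          ! hypothetical text-only (ASCII) data file
    VARIABLE:  NAMES = item1-item5;       ! hypothetical item names
               USEVARIABLES = item1-item5;
               CATEGORICAL = item1-item5; ! items declared categorical/ordinal (binary)
    ANALYSIS:  ESTIMATOR = ML;            ! full-information ML; logit link (2PL framework)
    MODEL:     F1 BY item1-item5*;        ! all discrimination parameters freely estimated
               F1@1;                      ! trait variance fixed at 1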

    Note 1.

      In this and the following Mplus command files, clarifying comments are inserted after an exclamation mark (which signals the start of a comment in the remainder of a line). After stating the title and providing the name of the text-only (ASCII) data file with the TITLE and DATA commands, respectively, names are assigned to the columns/variables in the file with the VARIABLE command. The items to be analyzed are then selected with the USEVARIABLES subcommand and declared as categorical/ordinal (binary) with the CATEGORICAL subcommand. The 2PL-model framework is then selected (with no subsequent constraints; see Note 2 for possible ones) with the ANALYSIS command. That model is defined with the MODEL command (with the trait measured by the items denoted F1). Thereby, the variance of the trait is set at 1, as is commonly done in IRM, and all discrimination parameters are freely estimated (see also Table 2; see, e.g., Raykov and Marcoulides 2006 for an introduction to the syntax of Mplus).

    Note 2.

      To test whether the Rasch (1PL-) model fits the data, upon finding that the 2PL-model does, simply add the three symbols "(1)" to the end of the first line of the MODEL command (i.e., enter them just before the semicolon); this imposes the equality constraint on all item discrimination parameters.
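      With the hypothetical item names used in the sketch in Section A above, the modified first MODEL line would thus read:

      MODEL:     F1 BY item1-item5* (1);  ! "(1)" constrains all discriminations equal (Rasch/1PL)
                 F1@1;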

    Note 3.

      To estimate the trait parameters, add the following two commands at the end of the above input file:

      SAVEDATA:  SAVE = FSCORES;
                 FILE = MIRT_THETA_SCORES.DAT;

  B.

    To fit the graded response model (e.g., for Likert-type items), and in particular to test the overall fit of the model, use the following Mplus command file (see the input file above for an explanation of the commands used):
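    A minimal sketch of such a command file follows, again with hypothetical file and item names (likert.dat, item1–item5); with polytomous items declared CATEGORICAL and ML estimation with the default logit link, this setup corresponds to the logistic graded response model:

    TITLE:     Fitting the graded response model
    DATA:      FILE = likert.dat;         ! hypothetical text-only (ASCII) data file
    VARIABLE:  NAMES = item1-item5;       ! hypothetical Likert-type items
               USEVARIABLES = item1-item5;
               CATEGORICAL = item1-item5; ! items declared ordinal (polytomous)
    ANALYSIS:  ESTIMATOR = ML;            ! full-information ML; logit link
    MODEL:     F1 BY item1-item5*;        ! all discrimination parameters freely estimated
               F1@1;                      ! trait variance fixed at 1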

  C.

    To carry out MIRM/MIRT, as discussed in the multidimensional item response models section, one should in general proceed in two steps (see also the empirical illustration section): (1) exploration of the dimensionality of a given instrument/item set and generation of a corresponding hypothesis about its dimensionality, and (2) testing of this hypothesis and, if it is found plausible, estimation of the trait parameters. To this end, use the following pair of command files. (Individual trait parameter estimates, along with their standard errors, are found in the file 'MIRT-THETA-SCORES.DAT' for the example data used.)
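    A sketch of such a pair of command files is given below, under the purely illustrative assumptions of a hypothetical data file (mirt.dat) with ten items and a two-dimensional hypothesis generated in the first step, with items 1–5 loading on the first trait and items 6–10 on the second:

    ! ---- Command file 1: dimensionality exploration ----
    TITLE:     Step 1 - exploring the dimensionality of the item set
    DATA:      FILE = mirt.dat;           ! hypothetical text-only (ASCII) data file
    VARIABLE:  NAMES = item1-item10;      ! hypothetical item names
               USEVARIABLES = item1-item10;
               CATEGORICAL = item1-item10;
    ANALYSIS:  TYPE = EFA 1 3;            ! compare 1- through 3-factor solutions

    ! ---- Command file 2: testing the generated dimensionality hypothesis ----
    TITLE:     Step 2 - two-dimensional compensatory item response model
    DATA:      FILE = mirt.dat;
    VARIABLE:  NAMES = item1-item10;
               USEVARIABLES = item1-item10;
               CATEGORICAL = item1-item10;
    ANALYSIS:  ESTIMATOR = ML;            ! full-information ML; logit link
    MODEL:     F1 BY item1-item5*;        ! first trait (hypothesized loading pattern)
               F2 BY item6-item10*;       ! second trait
               F1@1; F2@1;                ! trait variances fixed at 1
    SAVEDATA:  SAVE = FSCORES;            ! individual trait parameter estimates
               FILE = MIRT-THETA-SCORES.DAT;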


Cite this article

Raykov, T., Calantone, R.J. The utility of item response modeling in marketing research. J. of the Acad. Mark. Sci. 42, 337–360 (2014). https://doi.org/10.1007/s11747-014-0391-8

