Common Statistical Methods for Primary and Secondary Analysis in Substance Abuse Research

  • Chapter
Research Methods in the Study of Substance Abuse

Abstract

This chapter presents statistical methods and issues commonly encountered in the design and analysis of substance abuse research studies. It begins with a general discussion contrasting primary and secondary data analysis, followed by an overview of study design from the perspective of primary research, including specification of hypotheses and planned analyses, sampling schemes, and power analysis. Next, study characteristics are described from the perspective of secondary analysis, with particular attention to characteristics that must be considered when choosing appropriate analytic methods and interpreting results. The statistical methods reviewed include various types of regression (linear regression, logistic regression, survival analysis); related topics such as moderators and mediators; multilevel models (for longitudinal or clustered observations); and latent variable modeling techniques, including structural equation modeling, latent class analysis, latent transition analysis, and growth mixture modeling. Finally, the chapter provides overviews of four special topics that are particularly important when using secondary data: multiplicity of hypotheses, combining data and results from multiple studies, missing data, and propensity scores. Where helpful, concepts and methods are illustrated using practical examples.

Acknowledgements

The writing of this chapter was supported by the National Institute on Drug Abuse, Center for Advancing Longitudinal Drug Abuse Research (CALDAR, P30 DA016383, PI: Hser).

Author information

Corresponding author

Correspondence to Adam King.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

King, A., Li, L., Hser, YI. (2017). Common Statistical Methods for Primary and Secondary Analysis in Substance Abuse Research. In: VanGeest, J., Johnson, T., Alemagno, S. (eds) Research Methods in the Study of Substance Abuse. Springer, Cham. https://doi.org/10.1007/978-3-319-55980-3_5

  • DOI: https://doi.org/10.1007/978-3-319-55980-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55978-0

  • Online ISBN: 978-3-319-55980-3

  • eBook Packages: Social Sciences (R0)
