Analyzing complex survey data: the Kauffman Firm Survey

Small Business Economics

Abstract

The Kauffman Firm Survey (KFS) was a panel study of new businesses that employed a complex sample design to collect key data about the dynamics of high-technology, medium-technology, and female-owned business entities. Complex sample designs of the type employed in the KFS typically have multi-frame sampling, stratification, non-response adjustment, and over-sampling components. Each of these design elements has been shown to enhance the efficiency with which researchers analyze and draw inferences from the available data. However, a complex sample design can also complicate data analysis, because units are not selected independently and selection probabilities vary. In this technical overview of the KFS, we describe the sampling method that was used in the panel survey. We examine how failing to take the probability-based weights into account affects the parameter estimates and the resulting standard errors. Using an empirical approach, we show why the features of a complex survey design, in particular stratification and weighting, must be taken into account during data analysis.
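
The effect of ignoring probability-based weights under over-sampling can be illustrated with a minimal Python sketch. The two strata, their sizes, and the outcome values below are hypothetical numbers chosen for illustration; they are not KFS data, and the function names are our own.

```python
# Hypothetical two-stratum population (illustrative numbers, NOT KFS data).
# The small "high_tech" stratum is over-sampled: 50 of 100 units are drawn,
# versus 50 of 900 units in the "non_tech" stratum.
strata = {
    "high_tech": {"N": 100, "n": 50, "y": 10.0},
    "non_tech": {"N": 900, "n": 50, "y": 2.0},
}


def build_sample(strata):
    """Return (value, weight) pairs, with weight = N_h / n_h,
    the inverse of the selection probability in stratum h."""
    sample = []
    for s in strata.values():
        weight = s["N"] / s["n"]
        sample.extend((s["y"], weight) for _ in range(s["n"]))
    return sample


def unweighted_mean(sample):
    """Naive mean that treats the sample as a simple random sample."""
    return sum(y for y, _ in sample) / len(sample)


def weighted_mean(sample):
    """Weighted mean: estimated total divided by estimated population size."""
    total = sum(w * y for y, w in sample)
    weight_sum = sum(w for _, w in sample)
    return total / weight_sum


sample = build_sample(strata)
population_mean = (
    sum(s["N"] * s["y"] for s in strata.values())
    / sum(s["N"] for s in strata.values())
)

print("unweighted:", unweighted_mean(sample))
print("weighted:  ", weighted_mean(sample))
print("population:", population_mean)
```

In this constructed example the unweighted mean is pulled toward the over-sampled stratum, while the weighted mean recovers the population mean.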

Notes

  1. The primary sampling units in the KFS are businesses and not owners.

  2. A sample frame is a list of elements of the population with appropriate contact information.

  3. “Starting from the third follow-up survey, a raking adjustment within the six sampling strata was used to achieve better precision” (KFS Fifth Follow-up Methodology Report, March 29, 2011).

  4. Cochran (1977) explains why stratification can increase the precision of the estimates relative to SRS: “If each stratum is homogeneous, in that the measurements vary little from one unit to another, a precise estimate of any stratum mean can be obtained from a small sample in that stratum. These estimates can be combined in a precise estimate for the whole population.”

  5. Even simple statistics, such as the mean, become non-linear in a complex survey.

  6. This notation is also applicable to other sample designs. For example, for a sample design without stratification, you can let $H = 1$; for a sample design without clusters, you can let $m_{hi} = 1$ for every $h$ and $i$.

  7. Researchers who want to study high-tech, medium-tech, or non-tech businesses separately should not split their sample using the technology and gender ownership sampling strata variable that Mathematica used to select the KFS sample. The primary industry of the business was confirmed or updated during every survey; thus, the sampling strata variable does not reflect the current primary industry classification for the business (Farhat and Robb 2014).

  8. In the very rare cases where the subpopulation is itself a stratum (so that the domain has a fixed sample size), eliminating the cases outside the subpopulation is not a problem.
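
Notes 5, 6, and 8 can be made concrete with a short Python sketch. It treats the weighted mean as a ratio of two estimated totals (which is why it is non-linear) and approximates its standard error by Taylor linearization for a stratified design. The sketch assumes that each business is its own primary sampling unit and ignores finite population corrections; the function name, the data, and the weights are hypothetical and are not taken from the KFS.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_se(y, w, stratum, domain=None):
    """Weighted mean and Taylor-linearized standard error.

    Assumptions (ours, for illustration): stratified design, every
    observation is its own PSU, with-replacement approximation (no fpc).
    `domain` is an optional list of 0/1 flags. Out-of-domain cases are
    kept with a zero contribution instead of being dropped.
    """
    if domain is None:
        domain = [1] * len(y)
    # The mean is a ratio of two estimated totals, hence non-linear.
    w_sum = sum(wi * di for wi, di in zip(w, domain))
    mean = sum(wi * di * yi for yi, wi, di in zip(y, w, domain)) / w_sum
    # Linearized score of each unit, grouped by stratum.
    scores = defaultdict(list)
    for yi, wi, di, h in zip(y, w, domain, stratum):
        scores[h].append(wi * di * (yi - mean) / w_sum)
    variance = 0.0
    for z in scores.values():
        n_h = len(z)
        if n_h < 2:
            raise ValueError("each stratum needs at least two sampled units")
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return mean, sqrt(variance)


# Hypothetical sample: stratum "A" is over-sampled (weight 2 versus 18).
y = [9.0, 11.0, 10.0, 10.0, 1.0, 3.0, 2.0, 2.0]
w = [2.0, 2.0, 2.0, 2.0, 18.0, 18.0, 18.0, 18.0]
stratum = ["A"] * 4 + ["B"] * 4

mean_all, se_all = weighted_mean_se(y, w, stratum)

# Subpopulation analysis: flag the domain rather than deleting cases.
in_a = [1 if h == "A" else 0 for h in stratum]
mean_a, se_a = weighted_mean_se(y, w, stratum, domain=in_a)

print(mean_all, se_all)
print(mean_a, se_a)
```

Because the domain in this example coincides with stratum A, the flagged analysis gives the same standard error as an analysis restricted to stratum A, which is the special case described in note 8. For a domain that cuts across strata the two approaches generally differ, and the flagged version is the safer default.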

References

  • Aday, L. A., & Cornelius, L. J. (2006). Designing and conducting health surveys: a comprehensive guide (3rd ed.). San Francisco, CA: Jossey-Bass.

  • Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York, NY: John Wiley and Sons.

  • Farhat, J. B., & Robb, A. (2014). Applied survey data analysis using Stata: the Kauffman Firm Survey data. Available at SSRN: http://ssrn.com/abstract=2477217

  • Haviland, A., & Savych, B. (2007). A description and analysis of evolving data resources on small business. RAND Corporation Working Paper No. WR-293-1-ICJ.

  • Kish, L. (1965). Survey sampling. New York: John Wiley and Sons.

  • Kish, L. (1987). Statistical design for research. New York: John Wiley and Sons.

  • Kish, L. (1992). Weighting for unequal pi. Journal of Official Statistics, 8(2), 183–200.

  • Kish, L. (1995). Survey sampling (Wiley Classics Library ed.). New York: Wiley and Sons.

  • Korn, E. L., & Graubard, B. I. (1995). Examples of differing weighted and unweighted estimates from a sample survey. The American Statistician, 49, 291–295.

  • Lee, E. S., & Forthofer, R. N. (2005). Analyzing complex survey data (2nd ed.). Thousand Oaks, CA: Sage.

  • Lohr, S. L. (2010). Sampling: design and analysis (2nd ed.). Boston: Brooks/Cole.

  • Marsden, P. V., & Wright, J. D. (Eds.). (2010). Handbook of survey research (2nd ed.). Bingley, UK: Emerald Publishing Group.

  • Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. International Statistical Review, 61, 317–337.

  • Pfeffermann, D., & Holmes, D. (1985). Robustness considerations in the choice of a method of inference for regression analysis of survey data. Journal of the Royal Statistical Society, Series A, 148, 268–278.

  • West, B. T., Berglund, P., & Heeringa, S. G. (2008). A closer examination of subpopulation analysis of complex sample survey data. The Stata Journal, 8(3), 1–12.

Author information

Correspondence to Joseph Farhat.

About this article

Farhat, J., Robb, A. Analyzing complex survey data: the Kauffman Firm Survey. Small Bus Econ 50, 657–670 (2018). https://doi.org/10.1007/s11187-017-9913-3

  • DOI: https://doi.org/10.1007/s11187-017-9913-3
