Abstract
The Kauffman Firm Survey (KFS) was a panel study of new businesses that employed a complex sample design to collect key data about the dynamics of high-technology, medium-technology, and female-owned businesses. Complex sample designs of the type used in the KFS typically combine multi-frame sampling, stratification, non-response adjustment, and over-sampling. Each of these design elements can increase the efficiency with which researchers analyze and draw inferences from the available data. However, a complex sample design also makes data analysis more complicated, because selections are not independent and units are selected with unequal probabilities. In this technical overview of the KFS, we describe the sampling method used in the panel survey. We examine how failing to account for the probability-based weights affects parameter estimates and their standard errors. Using an empirical approach, we show why it is important to account for stratification and weighting. This paper demonstrates the importance of taking the features of a complex survey design into account during data analysis.
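The effect of ignoring weights on a point estimate can be seen in a toy example. The sketch below assumes a made-up population of 900 non-tech and 100 high-tech firms, with 10 firms sampled from each group, so high-tech firms are over-sampled; the values and function names are hypothetical and are not KFS data or KFS code.

```python
# Toy illustration (hypothetical data, not KFS): over-sampling without weighting.

def unweighted_mean(ys):
    """Ordinary sample mean; implicitly treats every sampled firm as equally likely."""
    return sum(ys) / len(ys)

def weighted_mean(ys, ws):
    """Weighted mean sum(w*y) / sum(w), with w = 1 / selection probability."""
    return sum(w * y for y, w in zip(ys, ws)) / sum(ws)

# Hypothetical population: 900 non-tech firms and 100 high-tech firms.
# Sampling 10 firms from each group gives selection probabilities of
# 10/900 and 10/100, hence base weights of 90 and 10.
non_tech = [10, 12, 11, 9, 10, 13, 8, 11, 10, 6]       # sample mean 10
high_tech = [50, 55, 45, 60, 40, 52, 48, 50, 53, 47]   # sample mean 50

ys = non_tech + high_tech
ws = [90.0] * len(non_tech) + [10.0] * len(high_tech)

print(unweighted_mean(ys))     # 30.0: pulled toward the over-sampled group
print(weighted_mean(ys, ws))   # 14.0: matches 0.9 * 10 + 0.1 * 50
```

Only the weighted estimate restores the population mix of the two groups; the gap widens as the over-sampling becomes more extreme or the groups differ more.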
Notes
The primary sampling units in the KFS are businesses and not owners.
A sample frame is a list of elements of the population with appropriate contact information.
“Starting from the third follow-up survey, a raking adjustment within the six sampling strata was used to achieve better precision” (KFS Fifth Follow-up Methodology Report, March 29, 2011).
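Raking (iterative proportional fitting) repeatedly rescales the weights so that weighted totals match known control totals for one variable at a time, cycling through the variables until all margins agree. The sketch below applies it separately within each sampling stratum, as the quoted methodology note describes. The raking variables (`region`, `size`), the control totals, and the function names are hypothetical, since the note does not say which margins were used.

```python
from collections import defaultdict

def rake(rows, weights, controls, max_iter=100, tol=1e-10):
    """Rake weights to marginal control totals by iterative proportional fitting.

    rows:     list of dicts holding each respondent's categorical variables
    weights:  starting weights (e.g., base weights after nonresponse adjustment)
    controls: {variable: {category: control total}}; the totals for every
              variable should add up to the same population size
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in controls.items():
            # Current weighted total in each category of this variable.
            totals = defaultdict(float)
            for row, wk in zip(rows, w):
                totals[row[var]] += wk
            # Rescale so this variable's weighted margins equal the targets.
            for k, row in enumerate(rows):
                factor = targets[row[var]] / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[k] *= factor
        if max_change < tol:   # every margin already matched on this pass
            break
    return w

def rake_within_strata(rows, weights, controls_by_stratum):
    """Rake each sampling stratum independently against its own control totals."""
    out = list(weights)
    for h, controls in controls_by_stratum.items():
        idx = [k for k, row in enumerate(rows) if row["stratum"] == h]
        raked_h = rake([rows[k] for k in idx], [weights[k] for k in idx], controls)
        for k, wk in zip(idx, raked_h):
            out[k] = wk
    return out

# Hypothetical example: two strata, four respondents each, starting weight 10.
rows = [
    {"stratum": h, "region": r, "size": s}
    for h in ("A", "B") for r in ("N", "S") for s in ("small", "large")
]
base = [10.0] * len(rows)
controls_by_stratum = {
    "A": {"region": {"N": 30, "S": 20}, "size": {"small": 35, "large": 15}},
    "B": {"region": {"N": 25, "S": 35}, "size": {"small": 40, "large": 20}},
}
raked = rake_within_strata(rows, base, controls_by_stratum)
print(raked)
```

Raking within strata keeps each stratum's weighted total fixed at its own controls, so the adjustment cannot shift weight from one sampling stratum to another.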
Cochran (1977) explains why stratification can increase the precision of the estimates relative to SRS: “If each stratum is homogeneous, in that the measurements vary little from one unit to another, a precise estimate of any stratum mean can be obtained from a small sample in that stratum. These estimates can be combined in a precise estimate for the whole population.”
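Cochran's point can be checked numerically with the textbook design-variance formulas for the mean under simple random sampling without replacement, (1 - f) S²/n, and under stratified sampling, Σ_h W_h² (1 - f_h) S_h²/n_h. The sketch below uses a small hypothetical population with two internally homogeneous strata and proportional allocation; the values and function names are illustrative only.

```python
from statistics import variance

def srs_var_of_mean(population, n):
    """Design variance of the sample mean under SRS without replacement."""
    N = len(population)
    return (1 - n / N) * variance(population) / n   # (1 - f) * S^2 / n

def stratified_var_of_mean(strata, n_h):
    """Design variance of the stratified mean sum_h W_h * ybar_h.

    strata: {stratum: list of population values}
    n_h:    {stratum: sample size drawn by SRS within that stratum}
    """
    N = sum(len(values) for values in strata.values())
    total = 0.0
    for h, values in strata.items():
        W_h = len(values) / N            # stratum share of the population
        f_h = n_h[h] / len(values)       # sampling fraction within stratum h
        total += W_h ** 2 * (1 - f_h) * variance(values) / n_h[h]
    return total

# Hypothetical population: two strata whose values vary little internally.
strata = {
    "low": [10, 11, 9, 10, 12, 8],
    "high": [50, 52, 48, 51, 49, 50],
}
population = strata["low"] + strata["high"]

v_srs = srs_var_of_mean(population, n=4)
v_st = stratified_var_of_mean(strata, n_h={"low": 2, "high": 2})
print(v_srs, v_st)   # roughly 73.03 versus 0.33
```

Almost all of the population variance here lies between the strata, which stratification removes from the sampling error; with heterogeneous strata the gain would be much smaller.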
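Even simple statistics, such as the mean, become non-linear in a complex survey: the weighted mean is a ratio of two sample sums (the weighted total of the variable divided by the sum of the weights), so its variance must be approximated, for example by Taylor series linearization.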
This notation is also applicable to other sample designs. For example, for a sample design without stratification, you can let $H = 1$; for a sample design without clusters, you can let $m_{hi} = 1$ for every $h$ and $i$.
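Because the weighted mean is a ratio, its variance is usually approximated by Taylor linearization. The sketch below assumes the layout implied by this notation: strata h = 1, ..., H; sampled PSUs i = 1, ..., n_h within stratum h (n_h is our label); and elements j = 1, ..., m_hi within PSU i. It also assumes the common approximation that PSUs are drawn with replacement within strata. The function name and record format are hypothetical.

```python
from collections import defaultdict
from statistics import variance

def linearized_mean_and_var(records):
    """Weighted mean and its Taylor-linearized variance.

    records: list of (h, i, w, y) tuples giving stratum h, PSU i within
             stratum h, element weight w, and value y. Elements sharing
             (h, i) belong to the same PSU, so m_hi can be any size.
    Assumes PSUs are sampled with replacement within strata.
    """
    W = sum(w for _, _, w, _ in records)
    ybar = sum(w * y for _, _, w, y in records) / W   # a ratio: non-linear in the data
    # Linearized values, summed to PSU totals z_hi.
    z = defaultdict(float)
    for h, i, w, y in records:
        z[(h, i)] += w * (y - ybar) / W
    by_stratum = defaultdict(list)
    for (h, _), z_hi in z.items():
        by_stratum[h].append(z_hi)
    # Between-PSU variability within each stratum, summed over strata.
    var = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h} has fewer than 2 PSUs")
        zbar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z_hi - zbar) ** 2 for z_hi in zs)
    return ybar, var

# Special case from the note: H = 1 and m_hi = 1 (each PSU is one element),
# with equal weights. The variance then reduces to the familiar s^2 / n.
ys = [3, 5, 8, 4, 10]
srs_records = [(1, k, 1.0, y) for k, y in enumerate(ys)]
ybar, v = linearized_mean_and_var(srs_records)
print(ybar, v, variance(ys) / len(ys))   # last two agree up to rounding
```

The same function handles stratified and clustered records without modification; only the (h, i) labels change, which is the practical payoff of the general notation.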
Researchers who want to study high-tech, medium-tech, or non-tech businesses separately should not split their sample using the technology and gender ownership sampling strata variable that Mathematica used to select the KFS sample. Because the primary industry of each business was confirmed or updated in every survey, the sampling strata variable does not necessarily reflect the business's current primary industry classification (Farhat and Robb 2014).
In the rare case where the subpopulation coincides with a sampling stratum (so that the domain has a fixed sample size), eliminating the cases outside the subpopulation is not a problem.
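The point of this note can be illustrated with the domain-indicator approach discussed by West, Berglund, and Heeringa (2008): cases outside the subpopulation stay in the design and contribute zero, instead of being deleted. The sketch below compares the two approaches using Taylor linearization with PSUs treated as drawn with replacement within strata; the data and function names are hypothetical. When the domain cuts across PSUs, deleting cases changes the variance estimate; when the domain is exactly one stratum, the two approaches agree.

```python
from collections import defaultdict

def domain_mean_and_var(records, in_domain):
    """Domain mean and Taylor-linearized variance using the full sample design.

    records:   list of (h, i, w, y): stratum, PSU within stratum, weight, value
    in_domain: parallel list of booleans marking subpopulation membership
    """
    W_d = sum(w for (_, _, w, _), d in zip(records, in_domain) if d)
    ybar = sum(w * y for (_, _, w, y), d in zip(records, in_domain) if d) / W_d
    z = defaultdict(float)
    for (h, i, w, y), d in zip(records, in_domain):
        # Every sampled PSU keeps an entry; non-domain cases contribute zero.
        z[(h, i)] += (w * (y - ybar) / W_d) if d else 0.0
    by_stratum = defaultdict(list)
    for (h, _), z_hi in z.items():
        by_stratum[h].append(z_hi)
    var = 0.0
    for h, zs in by_stratum.items():
        if len(zs) < 2:
            raise ValueError(f"stratum {h} has fewer than 2 PSUs")
        zbar = sum(zs) / len(zs)
        var += len(zs) / (len(zs) - 1) * sum((zh - zbar) ** 2 for zh in zs)
    return ybar, var

def subset_mean_and_var(records, in_domain):
    """Naive approach: delete non-domain cases, then analyze what is left."""
    kept = [r for r, d in zip(records, in_domain) if d]
    return domain_mean_and_var(kept, [True] * len(kept))

# Hypothetical sample: 2 strata x 3 PSUs x 2 elements; (h, i, w, y, in_domain)
data = [
    (1, 1, 10, 4, True), (1, 1, 10, 6, False),
    (1, 2, 10, 5, True), (1, 2, 10, 7, False),
    (1, 3, 10, 3, False), (1, 3, 10, 8, False),
    (2, 1, 20, 9, True), (2, 1, 20, 2, False),
    (2, 2, 20, 6, True), (2, 2, 20, 5, True),
    (2, 3, 20, 7, True), (2, 3, 20, 1, False),
]
records = [row[:4] for row in data]
domain = [row[4] for row in data]

# Domain cutting across PSUs: PSU (1, 3) has no domain members.
print(domain_mean_and_var(records, domain))   # keeps PSU (1, 3) in the design
print(subset_mean_and_var(records, domain))   # PSU (1, 3) silently disappears

# Domain equal to a whole stratum: both approaches agree.
stratum_2 = [row[0] == 2 for row in data]
print(domain_mean_and_var(records, stratum_2))
print(subset_mean_and_var(records, stratum_2))
```

The point estimate is identical in both approaches; only the variance, and hence the standard error, depends on whether the deleted cases' PSUs remain part of the design.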
References
Aday, L. A., & Llewellyn, J. C. (2006). Designing and conducting health surveys: a comprehensive guide (3rd ed.). San Francisco, CA: Jossey-Bass.
Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York, NY: John Wiley and Sons.
Farhat, J. B., & Robb, A. (2014). Applied survey data analysis using Stata: the Kauffman Firm Survey data. Available at SSRN: http://ssrn.com/abstract=2477217
Haviland, A., & Savych, B. (2007). A description and analysis of evolving data resources on small business. RAND Corporation Working Paper No. WR-293-1-ICJ.
Kish, L. (1965). Survey sampling. New York: John Wiley and Sons.
Kish, L. (1987). Statistical design for research. New York: John Wiley & Sons, Inc.
Kish, L. (1992). Weighting for unequal pi. Journal of Official Statistics, 8(2), 183–200.
Kish, L. (1995). Survey sampling (Wiley Classics Library ed.). New York: Wiley and Sons.
Korn, E. L., & Graubard, B. I. (1995). Examples of differing weighted and unweighted estimates from a sample survey. The American Statistician, 49, 291–295.
Lee, E. S., & Forthofer, R. N. (2005). Analyzing complex survey data (2nd ed.). Thousand Oaks, CA: Sage.
Lohr, S. L. (2010). Sampling: design and analysis (2nd ed.). Boston: Brooks/Cole.
Marsden, P. V., & Wright, J. D. (Eds.). (2010). Handbook of survey research (2nd ed.). Bingley, UK: Emerald Publishing Group.
Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. International Statistical Review, 61, 317–337.
Pfeffermann, D., & Holmes, D. (1985). Robustness considerations in the choice of a method of inference for regression analysis of survey data. Journal of the Royal Statistical Society, Series A, 148, 268–278.
West, B. T., Berglund, P., & Heeringa, S. G. (2008). A closer examination of subpopulation analysis of complex sample survey data. The Stata Journal, 8(3), 1–12.
Cite this article
Farhat, J., Robb, A. Analyzing complex survey data: the Kauffman Firm Survey. Small Bus Econ 50, 657–670 (2018). https://doi.org/10.1007/s11187-017-9913-3