Sample
Following the implications of previous research, we selected a large-scale dataset from a reward-based platform, Kickstarter. The data was obtained with a self-programmed web crawler that collected a wide set of variables a project initiator can affect, covering projects from Kickstarter's start in 2009 to the end of 2016. Overall, 294,150 valid projects were retrieved. Film or video projects (19%, n = 54,525) are most frequent, followed by music (16%, n = 45,606), publishing (11%, n = 31,255), games (8%, n = 23,964), and technology (8%, n = 22,584). Projects have an average funding period of 34.32 days (SD = 13.10) and an average goal of 45,961.35 USD (SD = 1,139,705.09), and they are backed by 90.61 investors on average (SD = 782.16) who pledge a mean of 7,440.47 USD (SD = 75,234.47). This yields a mean success rate of 36.09% (SD = 48.02). Furthermore, projects provide on average 2.54 updates (SD = 4.55) and 7.60 rewards (SD = 4.83). On average, Kickstarter users posted 27.56 comments (SD = 1,010.63), 156.55 Facebook shares (SD = 1,391.13), and 37.89 Twitter tweets (SD = 646.29) per project. Creators had on average 820.26 Facebook friends (SD = 978.18). All descriptive information is consistent with previous research [11, 33].
Models
Non-linear effects can be incorporated into a variety of model types. However, incorporating non-linear effect terms into linear models, usually via polynomials, requires hypotheses about their nature (e.g., exponential, cubic) and successive significance testing. Since significance testing is not advisable in large-scale datasets [34], our approach is inherently exploratory. As there is no theoretical assumption about which variable follows which non-linear pattern, we chose a family of non-linear models that suits this exploratory approach. Generalized additive models (GAMs) [35] try to find segments with (unique) non-linear patterns and aggregate these segments into a continuous function. Recent developments [36] have improved GAMs substantially and eased their application to the present type of dataset. We selected two GAMs to estimate our models. The first GAM uses simple b-splines built from third-degree polynomials and fitted to data truncated at the 95th percentile to reduce the sensitivity to outliers. Hence, these models, termed “b-splines,” will show a strongly “smoothed” general trend among the variables. The second GAM uses low-rank isotropic smoothers with thin-plate penalties to avoid oversaturation. Hereafter termed “tp-splines,” these advanced GAMs will produce a less smoothed, more data-driven trend. To compare the results with linear models, we also fit a traditional linear regression model with maximum likelihood estimators, using the same percentile restriction as described above.
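To make the b-spline idea concrete, the following is a minimal sketch and not the estimation code used in this study. It assumes synthetic data (a heavy-tailed predictor with a saturating effect), an unpenalized cubic B-spline regression, three equally spaced interior knots, and truncation of the predictor at its 95th percentile; all function and variable names are hypothetical. A straight-line fit on the same truncated data is included for comparison.

```python
import math
import random


def percentile(values, q):
    """Percentile (q in [0, 100]) with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bspline_basis(x, knots, degree=3):
    """Evaluate every B-spline basis function at x (Cox-de Boor recursion).

    Assumes clamped knots: each boundary knot is repeated degree + 1 times.
    """
    n_basis = len(knots) - degree - 1
    # Degree 0: indicator of the half-open knot span that contains x.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    if x == knots[-1]:
        b[n_basis - 1] = 1.0  # close the last non-empty span on the right
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            term = 0.0
            if left > 0:
                term += (x - knots[i]) / left * b[i]
            if right > 0:
                term += (knots[i + d + 1] - x) / right * b[i + 1]
            nxt.append(term)
        b = nxt
    return b


def least_squares(rows, y):
    """Solve the normal equations (X'X) c = X'y by Gaussian elimination."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        coef[i] = (m[i][k] - sum(m[i][j] * coef[j] for j in range(i + 1, k))) / m[i][i]
    return coef


def rss(rows, y, coef):
    """Residual sum of squares of a fitted linear-in-parameters model."""
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2 for r, t in zip(rows, y))


# Synthetic stand-in for a heavy-tailed predictor (an assumption, not study data).
random.seed(0)
x_all = [random.expovariate(1 / 3.0) for _ in range(2000)]
y_all = [math.log1p(v) + random.gauss(0.0, 0.2) for v in x_all]

# Keep only observations up to the 95th percentile of the predictor.
cutoff = percentile(x_all, 95)
kept = [(v, t) for v, t in zip(x_all, y_all) if v <= cutoff]
x = [p[0] for p in kept]
y = [p[1] for p in kept]

# Clamped cubic knot vector with three equally spaced interior knots.
step = cutoff / 4
knots = [0.0] * 4 + [step, 2 * step, 3 * step] + [cutoff] * 4

spline_rows = [bspline_basis(v, knots) for v in x]
linear_rows = [[1.0, v] for v in x]
spline_coef = least_squares(spline_rows, y)
linear_coef = least_squares(linear_rows, y)


def predict_spline(v):
    """Fitted spline value at v (valid for 0 <= v <= cutoff)."""
    return sum(c * b for c, b in zip(spline_coef, bspline_basis(v, knots)))


print("RSS spline:", rss(spline_rows, y, spline_coef))
print("RSS linear:", rss(linear_rows, y, linear_coef))
```

Because straight lines are contained in the space spanned by cubic B-splines, the unpenalized spline fit cannot have a larger residual sum of squares than the linear fit on the same data. A penalized smoother such as the tp-spline would add a roughness penalty on top of this kind of basis expansion, which this sketch does not attempt.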
Our modeling resembles previous approaches [10, 11, 12, 37], i.e., we define crowdfunding success as the ratio of the amount pledged to the goal, and we add a variety of control variables such as the log of the goal, starting year, starting month, category, number of backers, and duration of the project. Further, we incorporate the numbers of updates, comments, rewards, Facebook friends, Facebook shares, and Twitter tweets as the focal variables that can be affected, at least to some degree, by project initiators, with (b-spline, tp-spline) or without (linear model) a non-linear term. The general notation is therefore (consecutive numbering of parameters β for all values X):
$$ {Y}_{\mathrm{success}}={\beta}_1\log \left({X}_{\mathrm{goal}}\right)+{\beta}_2{X}_{\mathrm{year}}+{\beta}_3{X}_{\mathrm{month}}+{\beta}_4{X}_{\mathrm{category}}+{\beta}_5{X}_{\mathrm{backers}}+{\beta}_6{X}_{\mathrm{duration}}+{\omega}_1\ast {\beta}_7{X}_{\mathrm{updates}}+{\omega}_2\ast {\beta}_8{X}_{\mathrm{comments}}+{\omega}_3\ast {\beta}_9{X}_{\mathrm{rewards}}+{\omega}_4\ast {\beta}_{10}{X}_{\mathrm{FB}\ \mathrm{friends}}+{\omega}_5\ast {\beta}_{11}{X}_{\mathrm{FB}\ \mathrm{shares}}+{\omega}_6\ast {\beta}_{12}{X}_{\mathrm{Twitter}\ \mathrm{tweets}} $$
Omega (ω) is the additional term that stands for either a b-spline (non-linear), a tp-spline (non-linear), or, in the linear model, the constant 1. Contrast categories are used for year (2009 to 2016), month (January to December), and category (15 categories ranging from art to theater). For the same reasons as explained above, we refrain from model comparison tests among nested models of the same type (for instance, against more parsimonious linear models) and from relying on significance tests.
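As an illustration of how the outcome and the control part of such a model could be assembled for a single project, consider the following minimal sketch. It assumes treatment (dummy) contrasts with the first level as the reference, which is only one possible contrast scheme and is not confirmed by the description above; the function names and the example values are hypothetical. The category factor is omitted here and would be coded in the same way from its 15 levels.

```python
import math

YEARS = list(range(2009, 2017))   # 2009 to 2016
MONTHS = list(range(1, 13))       # January to December


def treatment_contrasts(value, levels):
    """Dummy-code value against the first level, which serves as the reference."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1.0 if value == level else 0.0 for level in levels[1:]]


def control_row(goal_usd, pledged_usd, year, month, backers, duration_days):
    """Return (success ratio, control covariates) for one project."""
    success = pledged_usd / goal_usd  # ratio of amount pledged to goal
    covariates = (
        [math.log(goal_usd)]                    # log of the goal
        + treatment_contrasts(year, YEARS)      # 7 year dummies
        + treatment_contrasts(month, MONTHS)    # 11 month dummies
        + [float(backers), float(duration_days)]
    )
    return success, covariates


# Hypothetical project: 1,000 USD goal, 1,500 USD pledged, started January 2010.
success, covariates = control_row(1000.0, 1500.0, 2010, 1, 20, 30)
print(success, len(covariates))
```

The focal variables (updates, comments, rewards, Facebook friends, Facebook shares, Twitter tweets) would enter either as additional plain columns in the linear model or through a spline basis expansion in the two GAMs.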