Introduction

Self-Determination Theory (SDT; Ryan & Deci, 2017) is a widely used psychological framework which maintains that individuals will generally engage more optimally in activities when motivated for more self-determined reasons. Thus, individuals who engage in activities because they want to (i.e., autonomous motivation; Werner & Milyavskaya, 2019) would, according to SDT, be expected to achieve more favorable outcomes than individuals who engage in those same activities because they need to (i.e., controlling motivation; Ryan & Deci, 2017). These outcomes include greater task persistence, greater application of effort, and improved performance (Ryan & Deci, 2017). The rationale for these effects is that a strong sense of endorsement of an activity reflects the activity being of high value to the individual, thus enabling (i.e., motivating) high task engagement and an ability to more easily overcome adversity or challenges (Ryan & Deci, 2017). Given considerable experimental, cross-sectional, and longitudinal support showing that more autonomous motivation does indeed lead to these theorized outcomes (Murphy & Taylor, 2019; Ng et al., 2012; Ntoumanis et al., 2021; Teixeira et al., 2012; Wilson & Mack, 2009), practitioners tasked with facilitating their clients' task/activity engagement are strongly encouraged to create conditions that help their clients feel more self-determined (Ntoumanis et al., 2021).

A more self-determined motivational state can be elicited using various strategies (Reeve et al., 2008). For instance, providing individuals with an increased sense of choice, a strong rationale for engagement, or increased social support is effective in enhancing autonomous motivation for the activity (Legault & Inzlicht, 2013; Silva et al., 2010). Yet researchers have also shown interest in whether a more self-determined state (and, for theoretical reasons, a more controlling motivational state) can be elicited by more subtle means, because subtle manipulations of motivation may avoid some of the problems associated with their more explicit counterparts. For instance, explicit interventions to promote autonomous motivation often require extensive personnel and resources (e.g., Silva et al., 2010). In contrast, subtle interventions, whether in the form of posters, slogans, or embedded environmental structures, are often economical, less demanding of personnel resources, and have the potential to remain in place long-term (e.g., Papies, 2016). This utility would make subtle approaches, if found to be even marginally effective (i.e., with respect to magnitude) in promoting autonomous functioning, a valuable alternative or supplement to more explicit interventions.

Subtle interventions are often implemented via a process known as priming (e.g., Banting et al., 2011; Hodgins, 2008; Murphy & Taylor, 2020; Weinstein et al., 2010). This process involves exposing individuals to relevant situational stimuli, either supraliminally (i.e., individuals are unaware of the prime's potential influence) or subliminally (i.e., individuals are consciously unaware of the prime's existence), in order to bring about specific cognitive, affective, and/or behavioral responses. Thus, supraliminally exposing participants to printed words closely linked to the notion of ‘autonomy’ (e.g., ‘choice’, ‘freedom’, ‘volition’) in order to promote more autonomous task motivation (e.g., Murphy & Taylor, 2020) would be an instance where participants have been primed to elicit this outcome. Primes are theorized to function by activating relevant knowledge structures such that these structures have an increased chance of being used in the immediate period that follows to elicit specified outcomes (Molden, 2014). To date, exposing participants to autonomous and controlling motivation primes (which for ease we collectively refer to in this paper as ‘motivational primes’; in this and similar usage we do not refer to motivational primes in the non-SDT sense, e.g., Herbert & Kissler, 2010) has been found to promote theoretically expected differences in various outcomes, including implicit self-esteem (Hodgins et al., 2007), creativity (Weinstein et al., 2010), persistence during an exercise session (Ntoumanis et al., 2014), positive and negative affect, task enjoyment, and attitudes (Brown et al., 2016), and exercise intentions (Magaraggia et al., 2014). Impressively, these findings emerge across a wide range of experimental and participant characteristics (e.g., priming duration, athletes/students). Collectively, this research indicates that motivational primes are effective in eliciting distinct motivational states, which is highly encouraging with respect to their potential real-world application.

Nevertheless, it may be that the motivational priming effects within the extant literature are simply false positives (i.e., effects were found even though no true population effect exists), irrespective of the narrative supporting the efficacy of priming (e.g., Ryan & Deci, 2017). Indeed, motivational priming researchers may have engaged in p-hacking to find hypothesized effects. P-hacking is the practice of engaging in one or more behaviors (e.g., recruiting beyond a planned sample size, or including or excluding covariates after analysis) with the aim of producing statistically significant estimates (Bruns & Ioannidis, 2016). P-hacking is problematic because it increases the likelihood that significant effects will be found when no population effect exists; in other words, p-hacking inflates the Type I error rate (Bruns & Ioannidis, 2016).

Many psychologists once considered p-hacking an acceptable means of uncovering true population effects (Simmons et al., 2012). Thus, most psychologists likely had the best of intentions when they engaged in p-hacking (John et al., 2012; Simmons et al., 2012). As a result, p-hacking was prevalent during the period in which much of the motivational priming research was conducted (John et al., 2012; see Friese & Frankenbach, 2020, for a brief discussion of the extent of this prevalence). For instance, more than half of the American researchers surveyed about questionable research practices reported deciding whether or not to collect more data only after first examining whether the results were significant (John et al., 2012). Because p-hacking was prevalent in psychology at the time the motivational priming research was being conducted, it is plausible that some motivational priming researchers engaged in it; there is little indication that their research practices would have differed radically from those of researchers in broader psychology. Thus, the published literature may not present a true picture of the effectiveness of motivational priming.

Accordingly, the aim of the present study is to achieve some clarity on this point: to elucidate whether the autonomous and controlling motivation priming effects identified in the published literature contain evidential value. To do this, we conduct p-curve analysis on the autonomous and controlling motivation priming literature. P-curve analysis can help answer the focal question because the distribution of significant p-values from reported effects is sensitive to whether a true population effect exists (Simonsohn et al., 2014). Specifically, when a true population effect exists (e.g., a motivational prime does give rise to a theorized effect), small significant p-values (e.g., p = 0.01–0.02) should arise much more frequently than large significant p-values (e.g., p = 0.04–0.05), producing a right-skewed (i.e., positively skewed) p-value distribution. However, when a true population effect does not exist (e.g., an effect is only perceived to exist because of factors such as publication bias and p-hacking), small significant p-values should arise just as frequently as medium or large significant p-values, producing a flat p-value distribution. Interestingly, when intense p-hacking is conducted within a literature, large significant p-values can arise more frequently than small significant p-values, producing a left-skewed (i.e., negatively skewed) distribution (Simonsohn et al., 2015). A left-skewed distribution may occur because researchers engaged in p-hacking are often more interested in simply finding significance than in finding high levels of significance (Simonsohn et al., 2015); moreover, finding high significance when no population effect exists is much more difficult than only just achieving significance (Bruns & Ioannidis, 2016). P-curve analysis determines whether a literature contains evidential value by testing whether its p-values align with these expected frequency distributions (Simonsohn et al., 2014).
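
To make this logic concrete, the following toy simulation (a minimal sketch in Python; the effect size, sample size, and simulation count are illustrative assumptions, not values drawn from the reviewed studies) generates many two-sample t-tests under a null effect and under a true effect, then bins the significant p-values. The null condition yields an approximately flat distribution across 0–0.05, whereas the true effect yields the right-skewed distribution described above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def significant_pvalues(effect_d, n_per_group=50, n_sims=20000):
    """Simulate two-sample t-tests; return only the significant p-values."""
    group_a = rng.normal(effect_d, 1.0, (n_sims, n_per_group))
    group_b = rng.normal(0.0, 1.0, (n_sims, n_per_group))
    p = stats.ttest_ind(group_a, group_b, axis=1).pvalue
    return p[p < 0.05]

# d = 0.0: no population effect -> significant p-values are uniform (flat curve).
# d = 0.5: true population effect -> small p-values dominate (right skew).
for d in (0.0, 0.5):
    p_sig = significant_pvalues(d)
    counts, _ = np.histogram(p_sig, bins=np.arange(0.0, 0.051, 0.01))
    print(f"d = {d}: share of significant p-values per 0.01-wide bin:",
          np.round(counts / counts.sum(), 2))
```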

Methods

Pre-specified criteria were used to determine which autonomous and controlling motivation priming articles were eligible for p-curve analysis. A systematic search of the academic literature was conducted in March 2020 using Web of Science, with the following key terms: “priming” OR “prime” OR “primed”, AND “autonomy” OR “self-determination” OR “autonomous”. This search returned 708 articles. For an article to be considered for p-curve analysis, the following inclusion criteria needed to be met: (1) autonomy was experimentally manipulated using priming techniques not expected to exert immediate motivationally relevant effects upon exposure, but to cause downstream differences in motivational states (i.e., the manipulation should prime [make more accessible] cognitive structures that influence factors in the immediate period that follows), with ‘motivational states’ here referring to any behavioral, affective, or cognitive state expected to be sensitive to differences in motivation; for example, Banting et al. (2011) meet this criterion because their motivational prime, applied during an exercise session, was expected to shape participants’ subsequent cognitions and experiences and thereby influence their cycling duration and perceived exertion; (2) the manipulation was referred to as a ‘prime’ at least once in the article; (3) the article was published (this was important because published, not unpublished, research influences future theory and research; moreover, unlike some other meta-analytic techniques, p-curve analysis does not require unpublished studies to arrive at accurate conclusions; Simonsohn et al., 2014); (4) at least one significant priming effect was reported (p-curve analysis is conducted only on significant p-values; Simonsohn et al., 2014); (5) the article was not based upon data collected in a previous study; (6) the article provided the relevant test statistic (a test statistic is necessary to conduct p-curve analysis); and (7) the priming manipulation was not used in addition to another manipulation.

Using these inclusion criteria, two research assistants determined which articles were eligible for p-curve analysis. Both research assistants were fully briefed with respect to the relevant terms (e.g., autonomy) and criteria. Following the screening process, 24 articles were deemed to meet the inclusion criteria. Checking the reference lists of these included articles identified five more articles agreed to meet the inclusion criteria, bringing the total to 29. The research assistants could not agree on whether six further articles met the inclusion criteria; these were passed to the lead author, who judged one of them to meet the criteria (thus, 30 articles in total). Upon reviewing these 30 articles, the lead author then determined the number of studies eligible for p-curve analysis. From the 30 articles, 43 studies were initially identified and are included in the study database, available at https://osf.io/4xzea/?view_only=882718f6d23d4012b052950472f44a2d. Following a detailed examination, 13 of these studies were excluded: one study did not use p-values (Vail III et al., 2020); two studies reported a key p-value as significant that was not actually significant (Lu et al., 2017; Romero-Sánchez et al., 2019); six studies did not prime autonomous and controlling motivational states (Bargh et al., 2001; Custers & Aarts, 2005; Evans et al., 2014; Leander et al., 2011; Radel et al., 2013; Shah, 2003); two studies did not provide enough information to enable the calculation of a test statistic (necessary for conducting p-curve analysis; Milyavskaya & Koestner, 2011; Ntoumanis et al., 2014); one study included another manipulation in addition to the motivational prime (Hodgins et al., 2010); and one study did not actually test its stated hypothesis (upon testing it, we found the effect was non-significant; Radel et al., 2009). Thus, 30 studies were included in p-curve analysis (Fig. 1).

Fig. 1 PRISMA flowchart of the flow of information through the different phases of the review

Two p-curves (i.e., P-Curve 1 and P-Curve 2) were constructed using the P-Curve Online App 4.0 (http://www.p-curve.com/), with each p-curve based upon different effects extracted from the included research. Examining more than one p-curve strengthens conclusions: consistent results across p-curves based upon different effect-selection criteria provide stronger evidence than a single p-curve. The P-Curve Guide available on the p-curve website (http://www.p-curve.com/) details which effects from included studies should be used in each p-curve, and we followed these guidelines when selecting effects. The following criteria were also used: (1) only dependent variables (DVs) that tested a study's main hypothesis were included in either p-curve; (2) where more than one DV was used to test the main study hypothesis or hypotheses, the first DV presented in the Results section of that study was included in P-Curve 1, while the last DV presented in the Results section was included in P-Curve 2; and (3) although the p-curve guidelines state which effects should be extracted from each study given the research design (e.g., in a three-cell design with two experimental groups, P-Curve 1 should include the effect between experimental group 1 and control, and P-Curve 2 the effect between experimental group 2 and control), such effects were only included if the statistical output relating to the effect was explicitly stated in the study (e.g., if the guidelines indicate P-Curve 2 should include the simple effect between a controlling motivation prime and a neutral prime, this effect was only included if the corresponding test statistic was stated in text).

Retaining surprising or extreme effects (e.g., very large effect sizes) in p-curve analysis can bias results (Simonsohn et al., 2015). Thus, P-Curve 1 and P-Curve 2 were also constructed without studies that reported extreme effects. The p-curves that include these studies are those already described (i.e., P-Curve 1 and P-Curve 2); those that exclude them we refer to as 'P-Curve 1 Robust' and 'P-Curve 2 Robust'. Four studies were regarded as containing surprising motivational priming effects. Three reported a priming effect larger than might reasonably be expected: the first found that a strong autonomous prime nearly quadrupled the proportion of participants who cheated on a task relative to a weak autonomous prime (20% vs. 5.4%; Lu et al., 2017, Study 4), while the second (Keatley et al., 2014) and third (Friedman et al., 2010) reported motivational priming effects with Cohen's ds of 1.47 and 0.81, respectively. The final study (Radel et al., 2009) contained what we regarded as a surprising standard deviation: two conditions within the study had SDs of 100 s and 143 s, respectively, while the remaining condition had an SD of only 41 s. This was likely due to participants in this condition being stopped if they persevered too long, thus artificially reducing error variance (although the study does not specify that participants were stopped after a set period).

See Table 1 for the studies and test statistics included in each p-curve. In the present study, both 'full' and 'half' p-curve analyses were conducted. Whereas the full p-curve assesses all significant p-values entered for analysis (i.e., those ranging from 0 to 0.05), the half p-curve assesses only significant p-values below 0.025. The half p-curve reduces the impact of 'ambitious' p-hacking, whereby researchers avoid reporting effects that only just reach statistical significance (so as to avoid suspicion that those effects arose from p-hacking).
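
For readers unfamiliar with the mechanics, the sketch below illustrates the core computation underlying the full and half p-curve right-skew tests as we understand them from Simonsohn et al. (2014): each significant p-value is converted to a conditional 'pp-value' (uniform under the null of no effect), probit-transformed, and aggregated via Stouffer's method. The example p-values are hypothetical, and the online app performs additional steps (e.g., recomputing exact p-values from reported test statistics) that are omitted here.

```python
import numpy as np
from scipy import stats

def pcurve_right_skew_test(p_values, cutoff=0.05):
    """Stouffer-style right-skew test on significant p-values.

    Under the null of no true effect, significant p-values are uniform on
    (0, cutoff), so each conditional pp-value (p / cutoff) is uniform on
    (0, 1). A significantly negative aggregate Z indicates that small
    p-values are over-represented, i.e., evidential value.
    """
    p = np.asarray([x for x in p_values if 0 < x < cutoff])
    pp = p / cutoff                    # pp-values, uniform under the null
    z = stats.norm.ppf(pp)             # probit transform
    Z = z.sum() / np.sqrt(len(z))      # Stouffer's Z
    return Z, stats.norm.cdf(Z)        # one-tailed p for right skew

example_ps = [0.001, 0.004, 0.012, 0.021, 0.030, 0.047]  # hypothetical
print(pcurve_right_skew_test(example_ps, cutoff=0.05))    # full p-curve
print(pcurve_right_skew_test(example_ps, cutoff=0.025))   # half p-curve
```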

Table 1 Studies and test statistics included in p-curve analyses

Results

P-Curve 1 (full curve Z = − 3.05, p = 0.001; half curve Z = − 2.31, p = 0.01) and P-Curve 2 (full curve Z = − 3.79, p = 0.0001; half curve Z = − 3.68, p = 0.0001) indicate that the autonomous and controlling motivational priming literature contains evidential value (see Fig. 2); that is, they display a right-skewed distribution whereby smaller significant p-values arose more frequently than larger significant p-values. Substantive results were unchanged when surprising motivational priming effects were removed from the p-curve analyses (see Fig. 3; P-Curve 1 Robust: full curve Z = − 2.36, p = 0.009, half curve Z = − 1.90, p = 0.03; P-Curve 2 Robust: full curve Z = − 3.12, p = 0.001, half curve Z = − 3.13, p = 0.001).

Fig. 2 P-curves of statistically significant motivational priming effects. P-Curve 1 (top) and P-Curve 2 (bottom) include 33 statistically significant (p < 0.05) results, 21 (P-Curve 1) and 22 (P-Curve 2) of which were p < 0.025. No non-significant results were entered. The solid line shows the observed distribution of significant p-values (e.g., in both panels, 9% of significant p-values fell between 0.04 and 0.05). The dashed line shows the expected distribution if the studies contained evidential value (i.e., right skew). The flat dotted line shows the expected distribution if the studies contained no evidential value

Fig. 3 Robust p-curves of statistically significant motivational priming effects. P-Curve 1 Robust (top) and P-Curve 2 Robust (bottom) include 29 statistically significant (p < 0.05) results, 18 (P-Curve 1 Robust) and 19 (P-Curve 2 Robust) of which were p < 0.025. No non-significant results were entered. The solid line shows the observed distribution of significant p-values (e.g., in both panels, 10% of significant p-values fell between 0.04 and 0.05). The dashed line shows the expected distribution if the studies contained evidential value (i.e., right skew). The flat dotted line shows the expected distribution if the studies contained no evidential value

Discussion

Autonomous and controlling motivation primes have consistently been found to promote distinct motivational states (e.g., Banting et al., 2011; Brown et al., 2016), yet it remained possible that these findings were simply false positives and that no true population effect exists (John et al., 2012). In the present study, we aimed to gain some clarity on this point by p-curving the autonomous and controlling motivational priming literature, a meta-analytic technique that can elucidate whether reported effects signal a true population effect. Across all four p-curve analyses, our findings converge to show that the extant autonomous and controlling motivational priming literature does indeed contain evidential value.

This study arose because it was recognized that p-hacking was prevalent in psychology during the period in which the majority of autonomous and controlling motivational priming research was conducted (e.g., John et al., 2012), yet these motivational priming effects are still uncritically regarded as representing true population effects (e.g., Ryan & Deci, 2017; Weinstein et al., 2020). It was also recognized that meta-analytic tools capable of testing whether reported effects reflect a true population effect existed but had yet to be applied to motivational priming research. After p-curving the motivational priming literature, our results clearly show it to contain evidential value: small significant p-values were reported much more frequently than large significant p-values, a distribution that primarily results when a true population effect exists. This right-skewed distribution was consistent across p-curves that used different effects from the included studies (helping to rule out effect-selection bias as an explanation for these results), and it held after removing surprising results that could have biased our p-curves towards showing evidential value even when it is absent. In sum, these findings support narratives maintaining that the motivational priming literature contains evidential value (Ryan & Deci, 2017). They also allay concerns that these published findings simply reflect false positives, and they provide a firmer foundation for future research in this area.

It should be noted that our findings do not remove the possibility that p-hacking was conducted in some of this research, nor that publication bias partly shaped the current body of evidence. Indeed, considering the high prevalence of publication bias and p-hacking in broader psychology (Kühberger et al., 2014), it would be a surprise if these factors did not play some role. Furthermore, right-skewed p-value distributions can 'hide' p-hacking when the evidential value is particularly strong or when the p-hacking is relatively mild (Erdfelder & Heck, 2019; Lakens, 2015). Thus, if two, three, or four motivational priming effects arose from p-hacking, these would leave little impact on the overall p-value distribution when embedded amid research that was not p-hacked and contains evidential value (Lakens, 2015). What our findings do show, however, is arguably of greatest interest: p-hacking and publication bias are unlikely to be anything other than peripheral factors shaping the extant motivational priming literature.
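
As a rough illustration of this point, the toy simulation below (with distributions assumed purely for illustration, not modeled on any included study) mixes a few threshold-hugging p-values, of the kind p-hacking tends to produce, into a larger set of strongly right-skewed p-values, and shows that the aggregate right-skew statistic barely moves.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def stouffer_z(p_values, cutoff=0.05):
    """Aggregate right-skew Z for significant p-values (see earlier sketch)."""
    pp = np.asarray(p_values) / cutoff
    return stats.norm.ppf(pp).sum() / np.sqrt(len(pp))

# 27 p-values from studies of a true effect (strongly right-skewed by construction).
true_effect_ps = rng.uniform(0, 0.05, 27) * rng.uniform(0, 1, 27)
# 3 p-values hugging the significance threshold, as p-hacking tends to produce.
hacked_ps = rng.uniform(0.035, 0.05, 3)

print(stouffer_z(true_effect_ps))                               # strongly negative
print(stouffer_z(np.concatenate([true_effect_ps, hacked_ps])))  # still clearly negative
```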

That p-curve analysis does not provide watertight conclusions with respect to p-hacking has led some to conclude that a qualitative analysis of the literature should supplement it (Erdfelder & Heck, 2019). Which qualitative factors would signal p-hacking is unclear, particularly given that researchers who p-hack often construct their manuscripts to suggest that everything followed a predetermined plan. Yet the inclusion of covariates in analyses without a clear rationale may be revealing (Erdfelder & Heck, 2019), as covariates enable further tests of an effect without the researcher needing to shift focus to a different outcome variable (which can disrupt a study's narrative; Wicherts et al., 2016). Of the 48 covariates included in our p-curved effects (i.e., P-Curve 1), only 10 were accompanied by any rationale for inclusion. Also, in three instances the identical effect was examined across studies within a multistudy publication, but with different covariates included in each analysis (Pavey & Churchill, 2014; Weinstein et al., 2010, 2011); in these instances, the authors did not justify the covariates' inclusion or explain why they differed from one study to the next. These patterns, particularly the latter, are certainly suspicious and may well indicate p-hacking. On the other hand, it may be that the covariates were justified a priori, but that detailing these justifications in text was felt unnecessary.

It is tempting to conclude from our p-curve evidence that autonomous and controlling motivation primes are therefore effective in promoting distinct motivational states. This conclusion may well be correct, particularly in light of strong meta-analytic evidence that primes can influence behavior (Weingarten et al., 2016). However, our findings in isolation should not lead one to this conclusion: factors other than the motivational prime may explain why participants allocated to a priming condition responded in theoretically expected ways. One such possibility is that, in most studies, the researchers who collected the data were not blind to participants' condition (e.g., Banting et al., 2011; Brown et al., 2016) and may therefore have unknowingly biased their behavior towards favoring the alternative hypothesis, namely, that motivational priming effects exist. Such experimenter effects have been demonstrated in priming research: when experimenters were blind to participant condition, the priming effect could not be identified, yet in the same study a significant effect emerged when experimenters were not blind (Doyen et al., 2012). A further possibility is that, even in double-blind studies, motivational priming effects arose because participants became aware of the study hypotheses and behaved in line with them.

Conclusion

In conclusion, the present findings allay concerns that autonomous and controlling motivation priming effects may reflect false positives. P-curve analysis of the published data demonstrated a right-skewed p-value distribution indicative of a true population effect. These findings cannot eliminate the possibility that some individual effects arose from p-hacking, but they do inspire greater confidence in the efficacy of autonomous and controlling motivational priming.