Achievement-Motivated Behavior in Individual Sports (AMBIS-I)—Coach Rating Scale

The assessment of achievement motivation in the context of selection decisions in elite youth sports is associated with serious problems (e.g. socially desirable responses). In order to counteract such problems, an external rating scale for the assessment of the Achievement-Motivated Behavior in Individual Sports (AMBIS-I) from a coach’s perspective was constructed and checked for psychometric quality in three consecutive steps. The studies are based on four different German-speaking samples, including 101 experienced coaches from individual sports and 26 sport psychologists. Multiple phases of exploratory structural equation modeling, item removal and cross-validation unveiled a three-factorial model with 10 items displaying excellent fit indices, acceptable to good reliability, and evidence based on internal structure. Relationships with athletes’ performance level point to the instrument’s evidence for test-criterion relationship. These preliminary results are promising when considering the construction and show the potential of the economical coach rating scale for the scientifically sound assessment of observable achievement-motivated behavior in individual sports.

Talent research highlights the great importance of psychological variables for the successful development from a promising young athlete to a successful top athlete (Coetzee, Grobbelaar, & Gird, 2006; Johnston, Wattie, Schorer, & Baker, 2018; MacNamara, Button, & Collins, 2010). Achievement motivation, in particular, seems to play a critical role in talent development and subsequent success (e.g. Abbott & Collins, 2004; Coetzee et al., 2006; Zuber, Zibung, & Conzelmann, 2015). Unlike the different theories subsumed under the umbrella term achievement motivation, achievement motivation itself is rarely defined precisely (Elliot & Dweck, 2007). Elliot and Dweck (2007) therefore propose putting the construct of competence, "defined as a condition or quality of effectiveness, ability, sufficiency, or success" (Elliot & Dweck, 2007, p. 5), at the core of the achievement motivation literature.
Whereas the achievement motive initiates actions aimed at attaining competence, achievement goal orientations guide these actions towards certain goals. Two or three goal orientations are distinguished, the first two of which are called either task and ego orientation (Nicholls, 1984) or mastery and performance orientation (Ames & Archer, 1988). Task/mastery orientation is aimed at improving one's own skills, for which an internal standard of comparison is used. Ego/performance orientation, on the other hand, focuses on displaying one's own superiority over other people. The third goal orientation, competitiveness, is characterized by "the desire to enter and strive for success in sport competition" (Gill & Deeter, 1988, p. 200). It therefore goes along with a need to compete and to compare oneself with others in sporting competition. The concept of achievement goals has stimulated a very large number of studies in sport sciences (Duda, 2007). Regarding the relation between achievement goal orientations and achievement in youth elite sports, the most recent review (in soccer) revealed mixed evidence for the relation between ego/performance orientation and future performance, whereas the majority of studies considering task/mastery orientation found a positive relation with future success (Murr et al., 2018).
In Self-Determination Theory, the reasons for motivated actions are distinguished according to where their perceived locus of causality lies, or to what extent they are self-determined (Deci & Ryan, 1985). The resulting motivational type lies on a continuum extending from amotivation, a state with a complete absence of any motivation, through extrinsic motivation, and ending with intrinsic motivation as the most self-determined form of motivation (Ryan & Deci, 2000). Intrinsic motivation is characterised by pleasure in performing the activity itself. Extrinsic motivation, on the other hand, pertains to actions that are carried out because of their expected consequences, such as fame, honour or prize money. Four types of extrinsic motivation are postulated, which are characterised by increasingly high levels of self-determination or autonomy (for an overview, see Ryan & Deci, 2007). A high degree of self-determination has been shown to be associated with higher levels of performance in adolescents (Gillet et al., 2009; Gillet et al., 2012; Zuber et al., 2015). Conversely, low levels of self-determination appear to hamper a successful sports career in the sense of dropping out (Calvo, Cervelló, Jiménez, Iglesias, & Murcia, 2010; Pelletier, Fortier, Vallerand, & Brière, 2001; Sarrazin et al., 2002).
As a result of (a) the empirical research pointing to a positive association between achievement motivation and performance, and (b) experience from sporting practice indicating that motivational characteristics are highly valued by coaches (Christensen, 2009; Jokuschies, Gut, & Conzelmann, 2017), as well as by elite athletes and their parents (MacNamara et al., 2010), the assessment of achievement motivation has been integrated into multidimensional talent identification programs (TID; cf. Vaeyens, Lenoir, Williams, & Philippaerts, 2008; Germany: Feichtinger & Höner, 2014; Switzerland: Fuchslocher, Romann, Rüdisüli, Birrer, & Hollenstein, 2011). In the field of motor diagnostics, tests are generally regarded as providing a high degree of psychometric quality and usability (Zibung, Zuber, & Conzelmann, 2016). In contrast, the assessment of psychological variables such as achievement motivation for real selection decisions, as opposed to scientific research, is associated with some serious problems, which are described below.
To date, different methods to assess achievement motivation are in use. One can distinguish between (1) self-report questionnaires measuring the explicit achievement motive, (2) external assessment questionnaires (coach ratings) measuring the explicit achievement motive, and (3) projective tests of the implicit achievement motive.
Self-report questionnaires, the first method discussed here, have become increasingly available and have also been used in sport science (Feichtinger & Höner, 2014) owing to the high reliability and economical nature of self-reports. Questionnaires that have often been used in the context of achievement motivation in competitive sports are the Achievement Motivation Scale in Sport (Elbe, Wenhold, & Müller, 2005b) to assess hope for success and fear of failure, the Sport Orientation Questionnaire (Elbe, 2004; Gill & Deeter, 1988), which focuses on the achievement goal orientations, and the Sport Motivation Scale (Burtscher, Furtner, Sachse, & Burtscher, 2011; Pelletier et al., 1995; Pelletier, Rocchi, Vallerand, Deci, & Ryan, 2013) to measure self-determination. In addition to the advantages that have led to the frequent use of self-report questionnaires, disadvantages are also known: "Although this approach has clear advantages, such as its high psychometric quality and ease of analysis, it also has its disadvantages. Responses may be biased by the tendency to present oneself in a socially desirable light . . . " (Brunstein & Heckhausen, 2010, p. 138). The more obvious the purpose of the questionnaire, the more likely it is that the subject may intentionally or unintentionally influence the outcome, in either a positive or a negative direction (response bias). Particularly in personally significant situations, the risk of socially desirable responding appears to increase: "Socially desirable responding might be more likely to occur in contexts in which important consequences hinge on the testing outcomes" (Furr & Bacharach, 2014, p. 280). In the context of TID, in which the athletes wish to be selected for the higher squad, the tendency to give socially desirable answers becomes particularly important.
The athlete forms an opinion as to which answers or which characteristic expressions might have a positive influence on the selection, adjusting his/her answers in the corresponding direction. Assuring anonymity is recommended as one strategy for preventing or minimizing the bias of socially desirable responses (Furr & Bacharach, 2014). This is possible by using codes or pseudonyms for research purposes. Thus, it has been shown that under anonymous conditions, youth footballers tended to score lower on positively associated personality traits than players under personalized conditions (Feichtinger & Höner, 2014). However, as anonymity cannot be ensured in selection-relevant contexts, the use of self-assessment questionnaires for selection decisions should be viewed very critically.
As this problem of socially desirable answers in self-rating questionnaires is well known, the use of external assessments or a coach's rating, the second method discussed here, is recommended as a supplement to self-report questionnaires (Anshel & Lidor, 2009). However, this is associated with further methodological problems, as the external raters have to evaluate statements that are not directly accessible to them. Motives have low visibility, as they often refer to internal emotions and thoughts and can therefore prove difficult to observe (Connelly & Ones, 2010; Furr & Funder, 2010). Consequently, they have to be assessed with different indicators and using evidence from previous behavior as an aid, both of which are associated with the risk of memory effects (Buss & Craik, 1983). Accordingly, questionnaires on the external assessment of the achievement motive, such as those used by the Swiss National Olympic Committee (Swiss Olympic; Fuchslocher et al., 2011), should only be used with great caution in TID.
As a third method, the assessment of the implicit achievement motive is discussed. The implicit achievement motive is not directly accessible and hence less affected by social desirability. For its assessment, projective methods such as the Picture Story Exercise (PSE; Schultheiss & Pang, 2010) have been used. However, assessment tools like these are time-consuming and uneconomical, and therefore cannot be recommended in TID.
Neither the explicit nor the implicit achievement motive seems to be assessable without problems in the context of selection decisions using current diagnostic tools. Nevertheless, we must emphasize that the tools mentioned are not without use per se, but that these assessment difficulties arise specifically in the context of TID in sport. Instead of assessing motives, it is also conceivable to rely on recording concrete behavior, which, to put it very simplistically, results from the interaction between persons (e.g., motives) and situations. The advantage is that behavior can also be observed from the outside (e.g. by the coaches), and does not need to be investigated invasively. Johnson (1997) argues that observers are better judges of behavior, whereas for more covert cognitions and feelings, self-reports are most accurate. Likewise, it can be deduced from the general model of determinants and course of motivated action (Heckhausen & Heckhausen, 2010, p. 3) that achievement-motivated behavior predicts future behavior, and thus later performance, better than a simple motive assessed by self-report that ignores situational factors. In addition, it seems reasonable to focus on coaches' judgments, as they often form the basis for selection decisions in sport (Christensen, 2009). Coaches decide which athletes should make the leap to the next highest squad, where they will receive optimal support. For this decision, expert coaches can draw on many years of experience, and thus have many opportunities to compare different athletes: "Expert coaches can base their assessments of psychological characteristics on a representative sample of talented players they have worked with in the past . . . [and] are able to make inter-individual comparisons and use this knowledge to evaluate and predict a player's current and future potential" (Musculus & Lobinger, 2018, p. 2).
In this way, talent characteristics identified by coaches seem to be relevant for the prognosis of future achievements (Jokuschies et al., 2017). Therefore, a suitable new tool for assessing achievement motivation in the context of selection decisions in sports should consequently be based on coaches' ratings of achievement-motivated behavior.

Keywords
Coach rating scale · Youth sports · Talent identification · Test construction · Psychology

But what overt behaviors can one think of in terms of achievement-motivated behavior? Furr and Funder (2010) state that in general personality psychology we know a lot about persons, but much less about situations and behaviors. In sport, to the best of our knowledge, no theoretical approach exists that explicitly deals with achievement-motivated behavior. The concept that seems closest to us in this regard is self-regulation or volition. Volition is defined as ". . . metamotivational processes required to stabilize motivation in order to maintain the intended action, especially when rewards are not available immediately" (Elbe & Beckmann, 2006, p. 143). This is in line with the Rubicon Model of Action Phases (for an overview, see Achtziger & Gollwitzer, 2018), which assumes that motivational processes only lead to the decision to act, and that volitional processes are then needed to implement the favored behavior (Achtziger & Gollwitzer, 2018).
Those processes are responsible for initiating an action and maintaining it until the goal has been reached, and include cognitive, motivational and emotional control strategies for not giving up when things get difficult, not letting oneself be distracted, not losing one's confidence, and staying positive (Elbe & Beckmann, 2006). Because long-term goals must be pursued and high training loads have to be completed over the course of a sporting career, volition is of great importance for elite youth athletes (Baron-Thiene & Alfermann, 2015; Elbe, Szymanski, & Beckmann, 2005a). Volition or self-regulation can be assessed using self-rating questionnaires like the Volitional Components in Sport Questionnaire (Wenhold, Elbe, & Beckmann, 2008), which is a sport-specific adaptation of the general Volitional Components Questionnaire (Kuhl & Fuhrmann, 1998). An analysis of the items (e.g. "I am optimistic about most things in sports"; "In training, I often have to think about things that have nothing to do with what I'm doing") shows that this construct too, although it should be located closer to behavior than the achievement motives, is not observable from the outside.
In summary, despite the very rich tradition of theories on achievement motives, it is not possible to deduce concrete performance-motivated behavior in sports from theory, which is why we have opted for an inductive approach for test construction (Smith, Fischer, & Fister, 2003).

Present research
To achieve our goal of constructing a reliable, valid and economical tool for the scientifically sound assessment of achievement-motivated behavior in sports, applicable in coaches' everyday practice (Horvath & Röthlin, 2018), we built upon a combination of a prototype (Broughton, 1984) and an inductive (Smith et al., 2003) test construction strategy. The act-frequency approach, as a form of prototype approach, relies on the definitions of constructs elaborated by psychological laypersons (Buss & Craik, 1983), and is used for item construction and to provide evidence based on test content¹ (Studies 1 and 2). Despite the remarkable achievements in motivational psychology made in recent decades, little is known about observable achievement-motivated behaviors. To close this gap and to ensure that the tool is applicable within the practice of TID, in Study 1 (Act nomination) experienced coaches were asked about manifest achievement-motivated behavior in concrete situations (acts). In the subsequent phase (Study 2: Prototypicality rating), these acts were judged by other coaches and sport psychologists to assess evidence based on test content. In the third step (Study 3: Construction and initial validation), the final version of the rating scale was constructed and then checked for reliability and for evidence based on internal structure and on the relation to a relevant criterion. The absence of an a priori theory on achievement-motivated behavior means that no factor or subscale structure is hypothesized in advance. Once the items have been developed, the internal structure is uncovered, as is usual in an inductive test construction strategy (Smith et al., 2003). Because it must be assumed that the achievement-motivated behavior of individual and team athletes differs due to their different settings (e.g. higher autonomy in individual sports, individual vs. team training), this study focuses on individual sports only. The ethics committee of the Phil.-hum. Faculty of the University of Bern approved this study.

Study 1: Act nomination

Participants and procedure
The samples of the coaches were recruited directly through the sport federations via Swiss Olympic. The sporting directors of the sport federations categorized by Swiss Olympic in the categories 1 to 3 (of 5), according to their national importance and achievement potential, were asked to send lists of all their coaches at the 1st or 2nd level of education (professional training for elite sports or competitive sports). In total, we received the contact details of 438 coaches (15% female). For the studies in this publication, only German-speaking coaches were included. For act nomination, 36 coaches from 18 sport federations were randomly selected and invited to participate in Study 1. They received an informational letter with a link to the online survey. Overall, 20 coaches from 14 different sport federations (Mage = 46.0 ± 9.17 years) took part and filled in the online survey. They had an average experience of M = 16.15 ± 9.63 years as coaches in their sport.

Measures
In order to collect acts that the experienced coaches consider to represent achievement-motivated behavior, the following instructions were given in the online survey: "Please think of one or more athletes whom you consider to be particularly achievement motivated and with whom you are working at the moment, or whom you have trained in the past. Now name three individual actions in specific situations in which the achievement motivation of these athletes you are thinking of was clearly expressed."

Data analysis
In this survey, the participants generated a total of 67 situations. These were minimally modified by the first author, without changing the content of the respective statements (cf. Amelang, Herboth, & Oefner, 1991; Krüger & Amelang, 1995): Very long and detailed situations were shortened (e.g. "He has carried out regenerative activities [blackroll or stretching] without being asked to do so" → "He/she has carried out regenerative activities without being asked to do so"), or divided into two acts (e.g. "He is very ambitious, sets himself goals and reacts emotionally to their achievement or failure" → "He/she has set goals" and "He/she has reacted emotionally when he/she failed to reach a self-imposed goal"). Errors in grammar or spelling, as well as colloquial expressions, were corrected. Repeatedly listed acts were deleted. All the modifications were extensively discussed with the co-author to ensure that no unjustified modifications were made.

Results
Altogether, these adjustments resulted in 58 acts, which were included in the second study, where they were tested with regard to being prototypical of achievement-motivated behavior.

Study 2: Prototypicality rating
The goals of Study 2 were twofold. On the one hand, evidence based on test content of the acts was assessed by comparing psychological laypersons (expert coaches) with experts on achievement motivation (sport psychologists). On the other hand, the rated prototypicality of all acts with respect to achievement-motivated behavior was examined in order to avoid processing acts of particularly low prototypicality or inappropriate acts any further.

Participants and procedure
For the prototypicality rating, 40 coaches from 18 sport federations and 150 sport psychologists were invited to participate in Study 2. The coaches were randomly selected and contacted via the contact lists mentioned earlier. The sport psychologists were contacted via the member list of the Swiss Association of Sport Psychology (SASP). They received an informational letter with a link to the online survey. In the end, 21 coaches from 12 different sport federations (MAge = 41.48 ± 9.4 years; MExperience = 14.20 ± 5.67 years), who had not participated in Study 1, and 26 sports psychologists (MAge = 43.23 ± 10.14 years; MExperience = 9.42 ± 8.05 years) took part.

Measures
The participants were asked to rate the 58 acts that had been constructed in Study 1, based on the following instructions: Below you will see a series of acts that may describe achievement motivation to a greater or a lesser extent. For each of these behaviors, please rate on a scale from 1 (not at all) to 5 (very) to what extent you consider this behavior to be achievement motivated, or whether in your opinion this act has something to do with achievement motivation.
In addition, the participants were asked to indicate their age, gender, vocational training and experience as a coach or sport psychologist.

Data analysis
The acts were presented in a random order. The prototypicality ratings were checked for mean differences between the two samples of coaches and sport psychologists. As the null hypothesis (no difference between the groups) is the desired hypothesis, the alpha level was set to α = 0.20. Cohen's d was calculated to determine the size of the effect.
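The effect-size step described above can be sketched as follows. This is a minimal illustration, assuming the pooled-standard-deviation form of Cohen's d (the text does not state which variant was used); the function names and the ratings are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def label_effect(d):
    """Labels follow the thresholds used in the text: d > 0.5 moderate, d > 0.8 large."""
    size = abs(d)
    if size > 0.8:
        return "large"
    if size > 0.5:
        return "moderate"
    return "small or none"


# Hypothetical prototypicality ratings (1 to 5) of a single act
coach_ratings = [3, 4, 3, 2, 4, 3]
psychologist_ratings = [4, 5, 4, 4, 5, 4]

d = cohens_d(psychologist_ratings, coach_ratings)
print(round(d, 2), label_effect(d))
```

A positive d here means the second group (coaches) rated the act as less prototypical than the first group (sport psychologists); the sign convention is a choice of this sketch.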

Results
The results showed that the overall assessment leans towards fairly prototypical in both groups, with M = 3.75 (standard deviation [SD] = 0.99). At the level of the individual acts, prototypicality ranges between M = 4.53 ("He/she has set goals for himself/herself") and M = 2.74 ("He/she cried when he/she did not win despite a good performance in the competition because he/she judged himself/herself better than the opponents"), with only five of 58 acts lying slightly below the level of 3 (somewhat prototypical). Of the 58 acts, 14 showed significant differences between the two groups with p < 0.20. Of these, the group judgments differ for nine acts with a large (d > 0.8) or moderate (d > 0.5) effect: Seven acts were regarded as more prototypical by the sports psychologists (e.g. "He/she has asked how he/she could further develop themselves in sports"). Only two acts were considered more prototypical by the coaches (e.g. "He/she strived to be the best at performance comparisons in another type of sport"). For all other acts, there are no or only small group differences in terms of the prototypicality assessment.

Brief discussion of Study 2
The results show that, on average, the acts were rated as being fairly prototypical and therefore appropriate for measuring the concept of achievement-motivated behavior. For nine acts, the two groups disagree about the prototypicality to a meaningful degree (d > 0.5). These items were removed from the rating scale in a first step because the results of the group comparison indicate that the evidence based on test content of these acts is unclear.² In terms of differences in the perception of the two samples, a trend became apparent: behavior pointing to the concept of task orientation (Duda, 2007), that is, striving to achieve one's own goals and to constantly improve oneself, is regarded as more prototypical, and therefore as more relevant for achievement-motivated behavior, by the sports psychologists than by the coaches. This indicates that the two groups might conceive of achievement motivation in slightly different ways, or that the coaches' everyday experience is that task orientation is not as important as it is considered to be in research (Elbe & Wikman, 2017).
As, to our knowledge, there exists no threshold for sufficient prototypicality, and as only five acts showed prototypicalities slightly lower than somewhat prototypical, all other acts were temporarily retained in the item pool (but none of the lower prototypical acts was included in the final rating scale). The upcoming validation phase had to examine whether the remaining 49 acts could be arranged to form a reliable and valid external rating scale for assessing achievement-motivated behavior.

Study 3: Construction and validation
In Study 3, the first step (t1) was to reduce the number of items and to explore the underlying factor structure of the item pool, so that scales with at least three items each (Smith et al., 2003) form an economically applicable instrument for the measurement of the Achievement-Motivated Behavior in Individual Sports (AMBIS-I). The reduction of items seems to us to be a necessary precondition to ensure that the coaches, as the potential users of the coach rating scale, will be willing to use the instrument in further practice (Horvath & Röthlin, 2018). Checking the dimensionality seems worthwhile because, given the multitude of theories related to achievement motivation, a homogeneous 1-factor structure cannot be expected, and because the absence of an a priori theory within the inductive approach means that no factor structure is hypothesized in advance (Smith et al., 2003). The second step (t2/3) checks whether the designed instrument fulfills the requirements for reliability and internal structure. Finally, we checked the evidence for test-criterion relationship in terms of whether there are differences in achievement-motivated behavior depending on the level of performance.

Participants and procedure
The sample for Study 3 consisted of coaches who were asked to rate the achievement-motivated behavior of their athletes. Using the lists provided by the sport federations, 160 coaches were invited to participate in the investigation. A total of 69 coaches participated in the study at t1, rating 288 athletes in individual sports with AMBIS-I. According to the guidelines of Swiss Olympic, disciplines comprising teams with a small number of athletes (e.g. tennis, rowing) are also counted as individual sports. In a next step of the pre-analysis, the assessments of those coaches who indicated that they had known their athletes for less than half a year, or that they did not feel certain in their assessment, were removed from the dataset. The data were also checked for extreme responding, for which no evidence was found. The final sample of the coaches for Study 3 at t1 is therefore composed of 278 ratings by 67 coaches (19 women, 28.4%; 48 men, 71.6%; Mage = 41.88 ± 11.96 years) from the following sports: badminton, biathlon, curling, free skiing, golf, judo, artistic cycling, cross country, track and field, mountain biking, road cycling, sledding, rowing, swimming, alpine skiing, shooting, tennis and vaulting. The coaches exhibit a high level of education, with more than 50% having successfully completed the highest or 2nd highest level. They have M = 16.19 ± 10.93 years of professional experience and have known the athletes they judged for an average of M = 4.11 ± 3.45 years. At t2 (6-8 weeks after t1) and t3 (4 months after t1), 46 coaches (16 women, 35%; 29 men, 65%; Mage = 42.95 ± 11.67 years) participated and conducted ratings of 176 athletes for the second time. For 52 athletes, the assessments of two coaches (mostly head and assistant coaches) are available at the first measurement point to assess inter-rater reliability. All data were collected using an internet-based questionnaire (LimeSurvey Version 2.50). The coaches had the opportunity to pause and save their ratings to continue another day.

Measures
Achievement-motivated behavior. For Study 3, a coach-rating version was created from each of the 49 acts. A 4-point scale from 0 (never) to 3 (often) with an additional "not able to respond" option was used. For the second measurement time (t2/3), the answer format was extended by the category 4 (always), as we found ceiling effects at t1 (see Table 2). Because each coach had submitted a list with the names of athletes they were training at the moment, we were able to request their rating for each athlete individually:

How often did athlete A [name of one of the coach's athletes was inserted] display the acts mentioned below during the last 12 months?
Further variables. As mentioned before, coaches were asked how certain they felt in their assessment of the respective athlete (not at all, a little, somewhat, fairly much), and how long (in years) they had already known the rated athlete. Finally, the educational level of the coaches was assessed.
Performance level. As an external performance criterion to assess the evidence for test-criterion relationship, we checked at t1 whether the athletes rated by their coaches on AMBIS-I were currently holding a Swiss Olympic Card (SOC). SOCs reflect achievements reached in competitions and can additionally be considered an expression of existing potential. The national sport associations award them according to their specific selection concept (0: no SOC; 1: local SOC; 2: regional SOC; 3: national SOC; 4: international/elite SOC). Based on this criterion, we formed two groups: one group consisting of athletes at a regional or lower level, and therefore with a lower attributed potential, and a second group consisting of athletes at a national or international level, and therefore with a higher attributed potential.

Data analysis
The analysis of the data was done using IBM SPSS Statistics 24 and Mplus Version 8 (Muthén & Muthén, 2017). In order to construct a tool that can be applied economically, a large part of the original 49 items had to be removed. For this exploratory procedure and to gather evidence based on internal structure, exploratory structural equation modeling (ESEM) was chosen. We aimed to construct a tool that unites at least three items per factor, features a good model fit and at the same time avoids content redundancy as much as possible. The statistical criteria for the choice and elimination of the items are based on Appleton, Ntoumanis, Quested, Viladrich, and Duda (2016) and Payne, Hudson, Akehurst, and Ntoumanis (2013).

Exploratory structural equation modeling (ESEM).
The ESEM model for t1 was estimated using the robust maximum likelihood (MLR) estimator with oblique geomin rotation, in line with the exploratory nature of the analysis (e.g. epsilon was fixed at 0.5). In doing so, missing values were estimated by means of the full information maximum likelihood (FIML) procedure. Overall, the proportion of missing data was 4.3%. Following the recommendations of Schermelleh-Engel, Moosbrugger, and Müller (2003), a good fit is indicated when χ²/df ≤ 2.00; CFI (comparative fit index) and TLI (Tucker-Lewis Index) ≥ 0.97; and root mean square error of approximation (RMSEA) and standardized root mean square residual (SRMR) ≤ 0.05. To test the generalizability of the proposed model, the model was cross-validated with the data from t2 and t3. Because the factor structure was already known at that time, target rotation was chosen as a more confirmatory approach (Marsh, Morin, Parker, & Kaur, 2014).
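The cut-offs for good fit listed above can be written down as a simple check. The following is only an illustrative sketch: the function name and the fit values passed to it are hypothetical, and the actual indices would come from the Mplus output rather than from this code.

```python
def check_good_fit(chi2, df, cfi, tli, rmsea, srmr):
    """Compare fit indices with the cut-offs for good fit cited in the text."""
    return {
        "chi2/df <= 2.00": chi2 / df <= 2.00,
        "CFI >= 0.97": cfi >= 0.97,
        "TLI >= 0.97": tli >= 0.97,
        "RMSEA <= 0.05": rmsea <= 0.05,
        "SRMR <= 0.05": srmr <= 0.05,
    }


# Hypothetical fit values for illustration only
result = check_good_fit(chi2=30.0, df=18, cfi=0.98, tli=0.97, rmsea=0.049, srmr=0.03)
for criterion, passed in result.items():
    print(criterion, "met" if passed else "not met")
print("good fit on all criteria:", all(result.values()))
```

Returning one flag per criterion, rather than a single yes/no, makes it visible which index falls short when a model misses the combined standard.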

Reliability and validity.
To estimate the reliability of the indicators, we computed squared multiple correlations (SMC). To estimate the reliability of the constructs, we calculated the composite reliability (CR) (Bagozzi & Yi, 2012) and the average variance extracted (AVE) (Fornell & Larcker, 1981). We used SMC ≥ 0.40, CR ≥ 0.70 and AVE ≥ 0.50 as cut-offs for good reliabilities. To estimate discriminant evidence of the tool, the Fornell-Larcker criterion (Fornell & Larcker, 1981) was calculated. This criterion requires that the AVE of one factor be higher than any squared correlation with another factor; if this holds, discriminant evidence at the factor level is assumed. To determine the test-retest reliability, reference is made to the data at t2. The coaches of this partial sample were invited to participate again after 8 weeks. Altogether, 64 estimates were made after an average time of 3.2 ± 0.72 months. The inter-rater reliability between head coach and assistant coach was determined separately for each training group (consisting of athletes assessed by the same two coaches) using the intraclass correlation coefficient (ICC 2-way random effects model, absolute agreement), and then averaged over all training groups. The interpretation is based on the recommendations of Koo and Li (2016), who classified ICCs lower than 0.50 as poor, between 0.50 and 0.74 as sufficient, between 0.75 and 0.89 as good and higher than 0.90 as excellent.
To compare the extent of achievement-motivated behavior between the two performance groups, an independent t-test was performed using the values of the factors of AMBIS-I as dependent variables. The level of significance was set at p < 0.05 for all analyses.
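As a rough illustration of the construct-level indices described above, the following Python sketch computes CR and AVE from standardized loadings and checks the Fornell-Larcker criterion. It assumes standardized loadings with uncorrelated errors and ignores cross-loadings, so it may differ from what an ESEM solution yields; all function names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def composite_reliability(loadings: List[float]) -> float:
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    with error variance 1 - loading^2 for standardized loadings."""
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)


def average_variance_extracted(loadings: List[float]) -> float:
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def meets_fornell_larcker(ave: Dict[str, float],
                          correlations: Dict[Tuple[str, str], float]) -> bool:
    """True if each factor's AVE exceeds its squared correlation
    with every other factor."""
    for (a, b), r in correlations.items():
        if ave[a] <= r ** 2 or ave[b] <= r ** 2:
            return False
    return True


# Hypothetical loadings for one three-item factor (not from the study).
loadings = [0.7, 0.8, 0.6]
print(round(composite_reliability(loadings), 3),
      round(average_variance_extracted(loadings), 3))
```

Under these assumptions the SMC of an indicator is simply its squared standardized loading, which is why the AVE can be read as the mean indicator reliability of a factor.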
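The group comparison can be sketched as follows. This is a minimal Python illustration of a pooled-variance independent t statistic and Cohen's d; the text does not state which variant of d or of the t-test was used, so the pooled-variance form is an assumption, and the scores are invented.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def pooled_sd(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Pooled standard deviation of two independent groups."""
    n1, n2 = len(g1), len(g2)
    return sqrt(((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2))
                / (n1 + n2 - 2))


def cohens_d(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Standardized mean difference (assumed pooled-SD variant)."""
    return (mean(g1) - mean(g2)) / pooled_sd(g1, g2)


def t_statistic(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Independent-samples t statistic with pooled variance."""
    n1, n2 = len(g1), len(g2)
    return (mean(g1) - mean(g2)) / (pooled_sd(g1, g2) * sqrt(1 / n1 + 1 / n2))


# Invented factor scores for a higher and a lower performance group.
higher = [3.0, 4.0, 5.0]
lower = [1.0, 2.0, 3.0]
print(cohens_d(higher, lower), t_statistic(higher, lower))
```

The p-value is left out on purpose, since it requires the t distribution, which the Python standard library does not provide.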

Results
In a first step, we removed 24 items to which 12 to 42% of the coaches answered "not able to respond". It was assumed that if more than 10% of all coaches stated that they could not answer, this was not an incidental finding, and that the reasons for not answering made such items unsuitable for the rating scale. The following reasons are conceivable: (a) not directly observable (e.g. "He/she went home after the training to continue practicing there", 35% did not answer), (b) unsuitable for certain sports (e.g. "He/she cried after a final defeat", 22%), or (c) improper evaluation period (e.g. "He/she decided to attend a specialized sports school, even though his/her friends continued to attend the regular one", 15%; "He/she has moved his/her place of residence closer to the training location", 20%). As the items were all constructed by coaches without any experience in item and test construction (Study 1), those problematic points were not addressed automatically, as they would have been if the researchers themselves had constructed the items.
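The screening rule described above (remove an item if more than 10% of the coaches could not respond) can be sketched in Python as follows. The data layout, with `None` standing for "not able to respond", and the item labels are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def nonresponse_rate(responses: List[Optional[int]]) -> float:
    """Share of coaches who chose 'not able to respond' (coded as None)."""
    return sum(r is None for r in responses) / len(responses)


def screen_items(ratings: Dict[str, List[Optional[int]]],
                 max_rate: float = 0.10) -> Tuple[List[str], List[str]]:
    """Split items into kept and removed; an item is removed when its
    nonresponse rate exceeds max_rate."""
    kept, removed = [], []
    for item, responses in ratings.items():
        if nonresponse_rate(responses) > max_rate:
            removed.append(item)
        else:
            kept.append(item)
    return kept, removed


# Invented ratings from ten coaches for two hypothetical items.
ratings = {
    "item_a": [2, 3, None, 1, 2, 3, 2, 1, 0, 2],           # 10% nonresponse
    "item_b": [None, None, 1, 2, 3, None, 2, 1, 0, 2],     # 30% nonresponse
}
print(screen_items(ratings))
```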

Construction of AMBIS-I
For the remaining 25 items, six ESEM models with 1 to 6 factors were modeled and tested. The comparison of the different models points to the 3- or 4-factor model, as these two models have the lowest BIC (Schwarz's Bayesian Information Criterion), and thus display a better model fit taking into account the principle of economy (Masyn, 2013). In both the 3- and 4-factor versions, items were then removed incrementally if they featured main loadings ≤ 0.50, communalities ≤ 0.40, high cross-loadings or content redundancy. After multiple phases of ESEM modeling and item removal, a 3-factor model with 12 items displaying an excellent fit (CFI = 0.99, TLI = 0.97 and RMSEA = 0.04) and readily interpretable factors emerged as the best model (see Table 1, AMBIS-I-12 t1).
For the cross-validation, the data from t2 and t3 were used to check the proposed 12-item solution. The AMBIS-I-12 t2/3 solution displays an inferior fit with the data (cf. AMBIS-I-12 t2/3, Table 1). Because act 03 showed a low loading on its factor and therefore a small communality, and as act 47 showed high cross-loadings, we decided to remove these two items, resulting in an excellent match between the estimated and the observed data at t1 (cf. AMBIS-I-10 t1, Table 1), as well as at t2/3. The coach-rating scale for achievement-motivated behavior in individual sports (AMBIS-I) therefore consists of 10 items forming three factors, which were named proactivity, ambition and commitment according to their content (see discussion).
As shown by the descriptive statistics in Table 2, proactivity is least likely to be displayed, followed by ambition and commitment. Additionally, commitment exhibits restricted variance, especially at t1, and thus probably a ceiling effect. The higher values in all dimensions at t2/3 compared with t1 are at least partly due to the change in the scaling from 0-3 at t1 to 0-4 at t2/3. The factor intercorrelations lie in the range 0.22 < r < 0.67, which is acceptable with regard to the differentiation of content (Table 2). Due to the significant correlations of the three factors, especially at t2/3, the factors cannot be assumed to be uncorrelated. For this reason, the reliability of a total score, in the sense of a higher-order factor, will be examined as well. The disadvantage of the lower content specificity of a total score could possibly be compensated for by improved reliability (through the greater length of the test) and thereby associated with higher validity (Furr & Bacharach, 2014; Kingston, Scheuring, & Kramer, 2013).
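The model comparison by BIC can be illustrated with the standard formula BIC = -2 log L + k ln(n), where k is the number of free parameters and n the sample size. The Python sketch below uses invented log-likelihoods, parameter counts and sample size; none of these values come from the study.

```python
from math import log
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz's Bayesian Information Criterion; lower is better."""
    return -2 * log_likelihood + n_params * log(n_obs)


def best_by_bic(models: Dict[int, Tuple[float, int]], n_obs: int) -> int:
    """Return the number of factors whose model has the lowest BIC.
    models maps number of factors -> (log-likelihood, free parameters)."""
    return min(models, key=lambda f: bic(models[f][0], models[f][1], n_obs))


# Invented candidate models: factors -> (log-likelihood, free parameters).
candidates = {
    1: (-3200.0, 75),
    2: (-3100.0, 99),
    3: (-3020.0, 122),
    4: (-2990.0, 144),
}
print(best_by_bic(candidates, n_obs=200))
```

Because the penalty term grows with the number of parameters, a model with more factors is preferred only if its gain in log-likelihood outweighs the added complexity.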
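The expected gain in reliability from a longer test is commonly illustrated with the Spearman-Brown formula. The Python sketch below is not part of the reported analyses; it assumes parallel items, which is a strong simplification for a multidimensional scale, and the numbers are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor,
    assuming the added items are parallel to the existing ones."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical: a subscale with reliability 0.60, doubled in length.
print(spearman_brown(0.60, 2))
```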

Reliability and evidence based on internal structure
The corresponding items of AMBIS-I, their factor loadings and communalities, and the factor reliabilities for t1 and t2/3 are displayed in Table 3. It should be noted that the items were constructed and checked in German, and that the English translations of the items in Table 3 have not been validated and are only included for informational purposes. The original German wording of the items is available in Table 1 of the Electronic Supplementary Material 1. The factor loadings on the main factors lie between 0.48 and 0.84 for all items. Five items exhibit cross-loadings > 0.20. C1 has a somewhat low indicator reliability at t1. All other items display good communalities (> 0.40). The two factors proactivity and ambition display good factor reliabilities at both times of measurement. Only commitment has factor reliabilities that lie below the desired thresholds, especially at t1. In the cross-validation at t2/3, its CR = 0.68 and AVE = 0.49 values lie only just below the desired threshold values of 0.70 and 0.50. The test-retest reliability rtt is ≥ 0.70 for all three factors and thus within an acceptable range over a comparatively long period of 3.2 ± 0.72 months. The total score has the highest retest reliability, with rtt = 0.79. As the average variance extracted for each factor is higher than any squared correlation with another factor, all three factors meet the Fornell-Larcker criterion and therefore display discriminant evidence at the factor level.

Inter-rater reliability
The intraclass correlation coefficients (ICCs) of the coaches averaged over all training groups are displayed in Table 4. All ICCs except that for proactivity lie in a satisfactory range (0.51 < ICC < 0.70). Generally, the average measures, i.e. ICCs which use the mean values of the two coaches as a source of information, have higher values. The inter-rater reliabilities of the total score show the highest ICCs for both the average and the single rater measures.
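For readers who want to reproduce this kind of coefficient, the following Python sketch computes single and average measure ICCs for a two-way random effects model with absolute agreement from the usual ANOVA mean squares, and labels a value using the bands quoted in the data analysis section. The function names and the ratings are hypothetical; the handling of a value of exactly 0.90 is an assumption, as the quoted bands leave it open.

```python
from typing import List, Tuple


def icc_two_way_random(ratings: List[List[float]]) -> Tuple[float, float]:
    """ICC for a two-way random effects model, absolute agreement.
    ratings[i][j] is the score of athlete i given by rater j.
    Returns (single measure, average measure)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-athlete mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


def classify_icc(icc: float) -> str:
    """Bands as quoted in the text; 0.90 itself is treated as excellent here."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "sufficient"
    if icc < 0.90:
        return "good"
    return "excellent"


# Invented ratings of five athletes by a head coach and an assistant coach.
group = [[3, 2], [1, 1], [4, 3], [2, 2], [3, 4]]
single, average = icc_two_way_random(group)
print(classify_icc(single), classify_icc(average))
```

In the procedure described in the text, such coefficients would be computed per training group and then averaged over the groups.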

Evidence for test-criterion relationship
In terms of concurrent evidence for test-criterion relationship, all three factors, as well as the total score, differentiate with small to medium effects (0.30 < d < 0.62) between the two performance groups determined by the SOC (Table 4). Athletes on the national or international level were rated as showing more achievement-motivated behavior than those on the lower performance level.

Brief discussion of Study 3
In the first step of Study 3, we simultaneously reduced the number of items to ten and checked the dimensionality. The final 10-item tool AMBIS-I, mapping the three factors proactivity, ambition and commitment, showed excellent fit indices and was successfully replicated by the data at t2/3. The factor proactivity refers to getting involved in training processes on one's own initiative and for one's own sake. The impulse to act comes from the person him- or herself, and an action is carried out in a self-determined manner and does not require an external push (e.g. item P4: The athlete looks for opportunities to catch up on missed training content). Self-determined means that motivation emanates from the self (Ryan & Deci, 2000). Intrinsic motivation, as the most self-determined form of motivation and thus accompanied by proactivity, is characterized by the sensation of inherent pleasure stemming from performing an activity itself. Another theoretical link can be made to the concept of achievement goal orientations: A proactive athlete pursues his or her goals perseveringly (e.g. item P2: Athletes to be first on the training grounds and practicing technical processes independently). The items of the factor proactivity give no indication of which goals are being closely pursued. The fact that those goals are being pursued persistently furthermore makes the connection with volition obvious.
The factor ambition is characterized by the absolute will to successfully pursue self-imposed goals in competitions. Ambitious athletes aim at winning competitions (e.g. item A2: The athlete clearly communicates before the competition that he/she wanted to win). If those goals are not achieved, the athlete responds with dissatisfaction (e.g. item A1: The athlete has acted annoyed when he did not finish the competition in first place). Thus, the conceptual proximity of the factor ambition to the achievement goal orientations competitiveness and goal orientation (as the goals are self-imposed), and, to a lesser degree, ego orientation, becomes visible. In turn, the factor commitment is again localized in the setting of training. Committed athletes show that they are ready and willing to perform (e.g. item C3: The athlete shows an "active" stance in training). He or she displays the will to work hard to achieve a goal and tries to solve a task repeatedly, even in the face of adversity (Scanlan, Carpenter, Simons, Schmidt, & Keeler, 1993). Hence, commitment is closely related to discipline in training (e.g. item C1: In highly demanding exercises, the athlete works to the point of exhaustion) and volition, as volition is especially important for "realizing long and intense training loads during the course of an athletic career or for keeping up regular exercising" (Elbe et al., 2005a, p. 560). Because committed athletes orient towards and compare themselves to stronger athletes, a connection with competitiveness is also to be assumed (Gill & Deeter, 1988).
In the second step, we checked whether the designed instrument fulfills initial requirements regarding its psychometric properties. Overall, the results point to satisfactory reliability and evidence based on internal structure. The factor reliability of commitment is a bit more critical. However, as commitment shows the greatest breadth in terms of content, and at the same time the lowest variance (cf. Table 2), this is hardly surprising. In contrast to these results, commitment showed the best values for inter-rater reliability. Although the values for the inter-rater reliability are not very high overall, the results coincide with coefficients of a meta-analysis of the inter-rater reliability of supervisor ratings in an occupational context (Conway & Huffcutt, 1997). Besides, to increase the reliability of the statements, the coach assessment should where possible be carried out by two coaches (Musculus & Lobinger, 2018) and the values should then be averaged for further interpretation. The findings on test-criterion relationship point in the expected direction, and are, in the case of ambition and commitment, in line with previous findings from a self-report study on the difference between elite- and subelite-level athletes (Halldorsson, Helgason, & Thorlindsson, 2012). The magnitude of the uncovered differences, with small to medium effects, is substantial considering that performance in sports is determined in a multidimensional manner. Thus, in addition to psychological characteristics, such as achievement-motivated behavior, features from various other areas are also relevant, including motor performance, social support or the characteristics of training (Meyer, Gnacinski, & Flechter, 2018; Rees et al., 2016).

Table 4 Intraclass correlation coefficients (ICC) as measure of agreement between head and assistant coach averaged over all training groups (n = 13) for t1 (ICC 2-way random effects model, absolute agreement) and descriptive statistics and between-group analyses of AMBIS-I. The 13 training groups (athletes and two coaches) comprise two to eight athletes. Nine training groups comprise fewer than five athletes. M mean, SD standard deviation, CI confidence interval

General discussion
The studies presented aimed to construct a reliable, valid and economic tool for the scientifically sound assessment of achievement-motivated behavior in sports. This stemmed from the fact that the tools currently available for assessing achievement motivation display methodological problems such as social desirability of self-rating questionnaires, motives that were not directly accessible by coaches, and high costs if implicit motives should be assessed. Those three limitations in the field of selection decisions in youth elite sports can be reduced by using AMBIS-I. With AMBIS-I, socially desirable responding is less problematic as the assessment is based on coach-ratings instead of the ratings of the athletes themselves, and the focus is switched from motives to observable behaviors. The items generated in Study 1 refer to specific situations, in which the coaches' lay understanding of achievement motivation was clearly expressed in the behavior of their athletes. In Study 2, these items were judged by sport psychologists as experts in regard to achievement motivation, and by coaches as psychological laypersons in terms of their being content-valid/prototypical for the construct of achievement-motivated behavior. The results of Study 3 then indicate that AMBIS-I is an economic tool that meets many requirements of reliability and validity. The results can be summarized as follows: The external rating scale AMBIS-I contains the three factors proactivity, ambition and commitment based on ten forms of behavior, whose frequency of occurrence is rated by the coaches. The three factors display an excellent fit with the observed data, reasonable to good reliability coefficients and good evidence based on internal structure. The first findings on test-criterion relationship are promising when considering the potential use of AMBIS-I as an instrument in a battery of tests for talent selection in the future.
Due to the larger number of items, the total score is somewhat more reliable than the individual factors (Tavakol & Dennick, 2011). The calculation of a total score also makes sense from the point of view of content, as in Study 1 the coaches were questioned about achievement-motivated behavior in general and not about forms of behavior representing the retrospectively extracted factors proactivity, ambition or commitment. We therefore recommend using the individual factors as well as the overall score for interpretation. The individual factors provide more detailed information, whereas, from a statistical point of view, the overall score offers more reliable statements. AMBIS-I can therefore be described as a multidimensional test with correlated factors (according to the definition of Furr & Bacharach, 2014).
Because future users were included in the construction of the items and in assessing the quality of the items, it can be assumed that the final tool will be well accepted in practice. We recommend using AMBIS-I as part of a multidimensional test battery for the TID. From the point of view of talent development and further career planning, it might also be interesting to compare the coaches' ratings with the self-ratings of the athletes, and discuss possible differences or similarities (Musculus & Lobinger, 2018).
Despite the encouraging results so far, some critical points need to be addressed. The cross-validation with the same sample at a later date is not the gold standard; however, due to the limited sample size and accessibility of the coaches responsible for selection decisions, this seems to be an acceptable solution. Furthermore, the need to adapt the scaling of the AMBIS-I items between the first and the second measurement time is not ideal.
Nevertheless, the successful cross-validation and the improved distribution of the values at t2/3 support the decision to implement this adaptation. As is also the case for other test constructions, we cannot guarantee that the full breadth of achievement-motivated behavior is covered. However, a large number of acts were nominated in Study 1, and many of them overlapped, which suggests that the most important situations were probably mentioned. Additionally, Smith et al. (2003, p. 473) argue that inductive test construction has the advantage that ". . . investigators are likely to be able to identify all or most elements of the construct that are recognized by the target individuals. The approach lessens the possibility that a limited theoretical perspective will cause investigators to omit important construct facets".
In contrast to motor tests, there are no purely objective measured values when recording psychological characteristics. Although the social desirability of the self-ratings can be reduced with the coach-ratings used, not all possible response tendencies (e.g. recall or confirmation bias; Althubaiti, 2016) can be controlled for, even though a focus on specific behavior might not be as strongly affected by some response biases as other forms of assessment (McCrae & Weiss, 2010). The fact that the number of athletes to be rated by one coach is low speaks against a large effect of systematic errors caused by differences in individual standards of the raters. The same is true of our results for inter-rater reliability, which lie in an acceptable range and therefore speak against a high impact of individual effects of, for example, confirmation bias. To avoid primacy or recency effects, the order of athletes to be rated was randomly changed between the first and the second measurement point, which resulted in acceptable retest reliabilities. In addition, we should consider the whole talent selection process, for which it is strongly recommended to use multidimensional test batteries (motor performance, social support or the characteristics of training), and therefore also different data collection instruments such as motor tests or performance results (Vaeyens et al., 2008). Hence, selection decisions should not be based on just one instrument or variable. The effect of possible rating biases from AMBIS-I on the holistic selection decision can thus be considered small. However, whenever possible, it is recommended to obtain the ratings of two coaches (e.g. head and assistant coach) to mitigate subjective preferences and confirmation bias.
Despite the promising results based on content, internal structure and test-criterion relationship, the next step has to be the ongoing validation of the test. On the one hand, relationships between test scores and other measures must be examined by checking whether AMBIS-I is related to other instruments measuring motivational and volitional constructs and therefore refers to the underlying construct of achievement motives. From a content perspective, connections can be assumed to exist between proactivity and intrinsic motivation (Deci & Ryan, 1985), or between ambition and competitiveness (Gill & Deeter, 1988). In addition, it seems reasonable to examine the extent to which self-assessment regarding achievement-motivated behavior is related to the coaches' assessment. Although high correlations are not to be expected (Conway & Huffcutt, 1997), it seems important to review the overall trend and to involve different perspectives. On the other hand, there are still important questions regarding test-criterion relationship. Thus, an instrument that is meant to predict athletic performance or success should not only be examined using a cross-sectional design, as was done in the present study, but also in a prognostic manner (Vaeyens et al., 2008). Both coach assessment and objective success criteria can be used as performance criteria.
In addition to the extended validation, it also makes sense to check the retest reliability over a shorter time period. Participants took an average of more than 3 months to complete the online questionnaire, despite an early reminder for participation at the second date of measurement. Therefore the calculated reliability coefficient reflects, on the one hand, the degree to which measurement error affects the test and, on the other hand, the amount of change in the true scores (Furr & Bacharach, 2014). It must therefore be regarded as a mixture between the examination of stability and reliability. In the literature, many test-retest analyses are conducted over a period of 2 to 8 weeks (Furr & Bacharach, 2014).
Beyond learning more about the psychometric properties, advanced analyses of stability, change and age-relatedness of achievement-motivated behavior are of great interest. It can, for example, be assumed that very high values in proactivity are hard to achieve for the younger athletes, as they might, due to external circumstances, be unable to decide for themselves when to arrive at training, and will therefore be accompanied by parents. With the progressive development of behavioral autonomy, autonomous actions and decisions, higher values will be possible as the athletes move toward young adulthood (Collins, Gleason, & Sesma, 1997). The question of age-dependent standard values should therefore be addressed in the near future. In addition, it is necessary to examine whether the much-expressed assumption of the highest possible values in achievement-motivated behavior must be rated positively in every case, or whether extreme manifestations, for example in proactivity, can be accompanied by negative phenomena, such as over-involvement or burnout (Gardner & Moore, 2006). Additionally, it should be emphasized once again that AMBIS-I has so far only been tested for individual sports, and that there has been no validation yet for team sports. It is therefore essential for future studies to check whether the items of AMBIS-I are also suitable for team sports.
Based on the current state of knowledge, the newly constructed, economically usable coach-rating scale AMBIS-I allows a reliable assessment of observable achievement-motivated behavior in individual sports. Additionally, promising initial indications of validity can be reported. On this basis, a first step has been taken in identifying future successful athletes early on and accordingly promoting them more effectively.